
Digital Image Processing

Course 1


Bibliography
R.C. Gonzalez, R.E. Woods, Digital Image Processing, 3rd ed., Prentice Hall, 2008
R.C. Gonzalez, R.E. Woods, S.L. Eddins, Digital Image Processing Using MATLAB, Prentice Hall, 2003
http://www.imageprocessingplace.com/
M. Petrou, C. Petrou, Image Processing: The Fundamentals, 2nd ed., John Wiley, 2010
W. Burger, M.J. Burge, Digital Image Processing: An Algorithmic Introduction Using Java, Springer, 2008

Image Processing Toolbox (http://www.mathworks.com/products/image/)
C. Solomon, T. Breckon, Fundamentals of Digital Image Processing: A Practical Approach with Examples in Matlab, Wiley-Blackwell, 2011
W.K. Pratt, Digital Image Processing, Wiley-Interscience, 2007


Evaluation
MATLAB image processing test (50%)
Article/book presentations (50%)


Meet Lena!
The First Lady of the Internet


Lenna Soderberg (Sjööblom) and Jeff Seideman, photographed in May 1997 at the Imaging Science & Technology Conference


What is Digital Image Processing?

f : D ⊂ ℝ² → ℝ
f(x,y) = the intensity (gray level) of the image at the spatial point (x,y)
When x, y, and f(x,y) are finite, discrete quantities, we call f a digital image.
Digital Image Processing = processing digital images by means of a digital computer
A digital image is composed of a finite number of elements (location, intensity value):

(x_i, y_j, f_ij)

These elements are called picture elements, image elements, pels, or pixels.


Image processing is not limited to the visual band of the electromagnetic (EM) spectrum.
Image processing spans gamma rays to radio waves, as well as ultrasound, electron microscopy, and computer-generated images.
image processing vs. image analysis vs. computer vision?
Image processing = the discipline in which both the input and the output of a process are images
Computer vision = using computers to emulate human vision (AI): learning, making inferences, and taking actions based on visual inputs
Image analysis (image understanding) = segmentation, partitioning images into regions or objects
(the link between image processing and computer vision)


Distinction between image processing, image analysis, and computer vision:
low-level, mid-level, and high-level processes

Low-level processes: image preprocessing to reduce noise, contrast enhancement, image sharpening; both inputs and outputs are images
Mid-level processes: segmentation, partitioning images into regions or objects, description of the objects for computer processing, classification/recognition of individual objects; inputs are generally images, outputs are attributes extracted from the input image (e.g. edges, contours, identity of individual objects)
High-level processes: making sense of a set of recognized objects; performing the cognitive functions associated with vision


Digital Image Processing (Gonzalez + Woods) =
processes whose inputs and outputs are images +
processes that extract attributes from images, up to and including the recognition of individual objects
(low- and mid-level processes)
Example: automated analysis of text =
acquiring an image containing text,
preprocessing the image (enhancement, sharpening),
extracting (segmenting) the individual characters,
describing the characters in a form suitable for computer processing,
recognition of individual characters


The Origins of DIP

Newspaper industry: pictures were sent by submarine cable between London and New York
Before the Bartlane cable picture transmission system (early 1920s): about 1 week
With the Bartlane system: less than 3 hours
Specialized printing equipment coded pictures for cable transmission and reconstructed them at the receiving end
(1920s - 5 distinct levels of gray; 1929 - 15 levels)
This example is not DIP: no computer is involved
DIP is linked to and develops at the same rhythm as digital computers (data storage, display, and transmission)


A digital picture produced in 1921 from a coded tape by a telegraph printer with special type faces (McFarlane)

A digital picture made in 1922 from a tape punched after the signals had crossed the Atlantic twice (McFarlane)


1964: the Jet Propulsion Laboratory (Pasadena, California) processed pictures of the moon transmitted by Ranger 7 (corrected image distortions)

The first picture of the moon by a U.S. spacecraft. Ranger 7 took this image on July 31, 1964, about 17 minutes before impacting the lunar surface. (Courtesy of NASA)


1960-1970: image processing techniques were used in medical imaging, remote Earth resources observations, and astronomy
1970s: invention of CAT (computerized axial tomography)
http://www.virtualmedicalcentre.com/videos/cat-scans/793
CAT is a process in which a ring of detectors encircles an object (patient), and an X-ray source, concentric with the detector ring, rotates about the object. The X-rays pass through the patient and are collected at the opposite end by the detectors. As the source rotates, the procedure is repeated.
Tomography consists of algorithms that use the sensed data to construct an image that represents a slice through the object. Motion of the object in a direction perpendicular to the ring of detectors produces a set of slices, which can be assembled into a 3D representation of the inside of the object.


Geographers use DIP to study pollution patterns from aerial and satellite imagery.
Archeology: DIP allowed restoring blurred pictures that were the only records of rare artifacts lost or damaged after being photographed.
Physics: enhancing images of experiments (high-energy plasmas, electron microscopy).
Also: astronomy, biology, nuclear medicine, law enforcement, industry.
DIP is used in solving problems dealing with machine perception: extracting from an image information suitable for computer processing (statistical moments, Fourier transform coefficients, ...)
Examples: automatic character recognition, industrial machine vision for product assembly and inspection, military reconnaissance, automatic processing of fingerprints, machine processing of aerial and satellite imagery for weather prediction, the Internet.


Examples of Fields that Use DIP


Images can be classified according to their sources (visual, X-ray, ...)
Energy sources for images:
electromagnetic energy spectrum,
acoustic,
ultrasonic,
electronic,
computer-generated


Electromagnetic waves can be thought of as propagating sinusoidal waves of different wavelengths, or as a stream of massless particles, each moving in a wavelike pattern with the speed of light. Each massless particle contains a certain amount (bundle) of energy, and each bundle of energy is called a photon. If spectral bands are grouped according to energy per photon, we obtain the spectrum shown in the image above, ranging from gamma rays (highest energy) to radio waves (lowest energy).


Gamma-Ray Imaging

Nuclear medicine, astronomical observations
Nuclear medicine:
the approach is to inject a patient with a radioactive isotope that emits gamma rays as it decays.
Images are produced from the emissions collected by gamma-ray detectors.
Images of this sort are used to locate sites of bone pathology (infections, tumors).
PET (positron emission tomography): the patient is given a radioactive isotope that emits positrons as it decays.


Examples of gamma-ray imaging

Bone scan

PET image


X-Ray Imaging
Medical diagnostics, industry, astronomy
An X-ray tube is a vacuum tube with a cathode and an anode. The cathode is heated, causing free electrons to be released. The electrons flow at high speed to the positively charged anode. When the electrons strike a nucleus, energy is released in the form of X-ray radiation. The energy (penetrating power) of the X-rays is controlled by a voltage applied across the anode, and by a current applied to the filament in the cathode.
The intensity of the X-rays is modified by absorption as they pass through the patient, and the resulting energy falling on a film develops it, much in the same way that light develops photographic film.


Angiography = contrast-enhancement radiography
Angiograms = images of blood vessels
A catheter is inserted into an artery or vein in the groin. The catheter is threaded into the blood vessel and guided to the area to be studied. When it reaches that area, an X-ray contrast medium is injected through the catheter. This enhances the contrast of the blood vessels and enables the radiologist to see any irregularities or blockages.
X-rays are used in CAT (computerized axial tomography).
X-rays are used in industrial processes (examining circuit boards for flaws in manufacturing).
Industrial CAT scans are useful when the parts can be penetrated by X-rays.


Examples of X-ray imaging

Chest X-ray
Aortic angiogram

Head CT

Cygnus Loop

Circuit boards


Imaging in the Ultraviolet Band

Lithography, industrial inspection, microscopy, biological imaging, astronomical observations
Ultraviolet light is used in fluorescence microscopy.
Ultraviolet light is not visible to the human eye, but when a photon of ultraviolet radiation collides with an electron in an atom of a fluorescent material, it elevates the electron to a higher energy level. After that the electron relaxes to a lower level and emits light in the form of a lower-energy photon in the visible (red) light region.
Fluorescence = emission of light by a substance that has absorbed light or other electromagnetic radiation of a different wavelength
Fluorescence microscope = uses an excitation light to irradiate a prepared specimen and then separates the much weaker radiating fluorescent light from the brighter excitation light.


Imaging in the Visible and Infrared Bands

Light microscopy, astronomy, remote sensing, industry, law enforcement
The LANDSAT satellites obtained and transmitted images of the Earth from space for purposes of monitoring environmental conditions on the planet.
Weather observation and prediction are major applications of multispectral imaging from satellites.


Examples of light microscopy

Taxol (anticancer agent), magnified 250X
Nickel oxide thin film (600X)
Cholesterol (40X)
Surface of an audio CD (1750X)
Microprocessor (60X)
Organic superconductor (450X)


Automated visual inspection of manufactured goods

(a) circuit board controller
(b) packaged pills
(c) bottles
(d) air bubbles in a clear-plastic product
(e) cereal
(f) image of an intraocular implant


Imaging in the Microwave Band

The dominant application of imaging in the microwave band is radar.
Radar has the ability to collect data over virtually any region at any time, regardless of weather or ambient light conditions.
Some radar waves can penetrate clouds and, under certain conditions, can penetrate vegetation, ice, and dry sand.
Sometimes radar is the only way to explore inaccessible regions of the Earth's surface.
An imaging radar works like a flash camera: it provides its own illumination (microwave pulses) to light an area on the ground and take a snapshot image. Instead of a camera lens, a radar uses an antenna and a digital device to record the images. In a radar image one can see only the microwave energy that was reflected back toward the radar antenna.


Imaging in the Radio Band

Medicine, astronomy
MRI = Magnetic Resonance Imaging
This technique places the patient in a powerful magnet and passes short pulses of radio waves through his or her body. Each pulse causes a responding pulse of radio waves to be emitted by the patient's tissues. The location from which these signals originate and their strength are determined by a computer, which produces a 2D picture of a section of the patient.


MRI images of a human knee (left) and spine (right)


Images of the Crab Pulsar covering the electromagnetic spectrum

Gamma

X-ray

Optical

Infrared

Radio


Other Imaging Modalities

Acoustic imaging, electron microscopy, synthetic (computer-generated) imaging
Imaging using sound: geological exploration, industry, medicine
Mineral and oil exploration:
For image acquisition over land, one of the main approaches is to use a large truck and a large flat steel plate. The plate is pressed on the ground by the truck, and the truck is vibrated through a frequency spectrum up to 100 Hz. The strength and the speed of the returning sound waves are determined by the composition of the Earth below the surface. These are analysed by a computer, and images are generated from the resulting analysis.


Fundamental Steps in DIP


methods whose input and output are images
methods whose inputs are images but whose outputs are attributes extracted from those images


Outputs are images:

image acquisition
image filtering and enhancement
image restoration
color image processing
wavelets and multiresolution processing
compression
morphological processing


Outputs are attributes


morphological processing
segmentation
representation and description
object recognition


Image acquisition - may involve preprocessing such as scaling
Image enhancement:
manipulating an image so that the result is more suitable than the original for a specific operation
enhancement is problem oriented
there is no general theory of image enhancement
enhancement uses subjective methods for image improvement
enhancement is based on human subjective preferences regarding what is a "good" enhancement result


Image restoration:
improving the appearance of an image
restoration is objective - the techniques for restoration are based on mathematical or probabilistic models of image degradation

Color image processing:
fundamental concepts in color models
basic color processing in a digital domain

Wavelets and multiresolution processing:
representing images in various degrees of resolution


Compression:
reducing the storage required to save an image, or the bandwidth required to transmit it
Morphological processing:
tools for extracting image components that are useful in the representation and description of shape
a transition from processes that output images to processes that output image attributes


Segmentation:
partitioning an image into its constituent parts or objects
autonomous segmentation is one of the most difficult tasks of DIP
the more accurate the segmentation, the more likely recognition is to succeed
Representation and description (almost always follows segmentation):
segmentation produces either the boundary of a region or all the points in the region itself
converting the data produced by segmentation to a form suitable for computer processing


boundary representation: the focus is on external shape characteristics, such as corners or inflections
complete region: the focus is on internal properties, such as texture or skeletal shape
description is also called feature extraction: extracting attributes that result in some quantitative information of interest or are basic for differentiating one class of objects from another
Object recognition:
the process of assigning a label (e.g. "vehicle") to an object based on its descriptors
Knowledge database


Simplified diagram of a cross section of the human eye


Three membranes enclose the eye:
the cornea and sclera (outer cover)
the choroid
the retina
The cornea is a tough, transparent tissue that covers the anterior surface of the eye. Continuous with the cornea, the sclera is an opaque membrane that encloses the remainder of the optic globe.
The choroid lies directly below the sclera. This membrane contains a network of blood vessels (the major source of nutrition for the eye). The choroid is pigmented and helps reduce the amount of light entering the eye.
The choroid is divided (at its anterior extreme) into the ciliary body and the iris. The iris contracts and expands to control the amount of light.


The lens is made up of concentric layers of fibrous cells and is suspended by fibers that attach to the ciliary body (60-70% water, 6% fat, the rest protein). The lens is colored slightly yellow. The lens absorbs approximately 8% of the visible light (infrared and ultraviolet light are absorbed by proteins in the lens).
The innermost membrane is the retina. When the eye is properly focused, light from an object outside the eye is imaged on the retina.
Vision is possible because of the distribution of discrete light receptors on the surface of the retina: cones and rods (6-7 million cones, 75-150 million rods).
Cones: located in the central part of the retina (the fovea); sensitive to colors and to detail; each cone is linked to its own nerve.
cone vision = photopic or bright-light vision
Fovea = the place where the image of the object of interest falls


Rods: distributed over all the retina surface; several rods are connected to a single nerve; not specialized in detail vision;
serve to give a general, overall picture of the field of view
not involved in color vision
sensitive to low levels of illumination
Blind spot: region without receptors


Distribution of rods and cones in the retina


Image formation in the eye
Ordinary photographic camera: the lens has a fixed focal length; focusing at various distances is done by modifying the distance between the lens and the image plane (where the film or imaging chip is located).
Human eye: the distance between the lens and the retina (the imaging region) is fixed; the focal length needed to achieve proper focus is obtained by varying the shape of the lens (the fibers in the ciliary body accomplish this, flattening or thickening the lens for distant or near objects, respectively).
distance between lens and retina along the visual axis = 17 mm
range of focal lengths = 14 mm to 17 mm


Illustration of the Mach band effect

Perceived intensity is not a simple function of actual intensity.


All the inner squares have the same intensity, but they appear progressively darker as the
background becomes lighter


Optical illusions


Digital Image Processing

Course 2

- achromatic or monochromatic light - light that is void of color
- the attribute of such light is its intensity, or amount
- "gray level" is used to describe monochromatic intensity because it ranges from black, to grays, and finally to white
- chromatic light spans the electromagnetic energy spectrum from approximately 0.43 to 0.79 μm
- quantities that describe the quality of a chromatic light source:
o radiance
  the total amount of energy that flows from the light source, usually measured in watts (W)
o luminance
  measured in lumens (lm), gives a measure of the amount of energy an observer perceives from a light source


For example, light emitted from a source operating in the far


infrared region of the spectrum could have significant energy
(radiance), but an observer would hardly perceive it; its luminance
would be almost zero.
o brightness
a subjective descriptor of light perception that is
practically impossible to measure. It embodies the
achromatic notion of intensity and is one of the key
factors in describing color sensation.


f : D ⊂ ℝ² → ℝ ; the physical meaning of the values of f is determined by the source of the image.

For an image generated from a physical process, f(x,y) is proportional to the energy radiated by the physical source, so:

0 ≤ f(x,y) < ∞

f(x,y) is characterized by two components:
1. i(x,y) = the illumination component, the amount of source illumination incident on the scene being viewed;
2. r(x,y) = the reflectance component, the amount of illumination reflected by the objects in the scene;

f(x,y) = i(x,y) r(x,y) , 0 < i(x,y) < ∞ , 0 ≤ r(x,y) ≤ 1


r(x,y) = 0 : total absorption ; r(x,y) = 1 : total reflectance
i(x,y) is determined by the illumination source;
r(x,y) is determined by the characteristics of the imaged objects.

L_min ≤ l = f(x_0, y_0) ≤ L_max , where L_min = i_min r_min and L_max = i_max r_max
(L_min ≈ 10, L_max ≈ 1000 are typical indoor values without additional illumination)

The interval [L_min, L_max] is called the gray (or intensity) scale.

In practice: L_min = 0 and L_max = L-1, giving the scale [0, L-1], with l = 0 black and l = L-1 white.


Image Sampling and Quantization


converting a continuous image f to digital form

- digitizing (x,y) is called sampling


- digitizing f(x,y) is called quantization


Left: a continuous image projected onto a sensor array. Right: the result of image sampling and quantization.


Representing Digital Images

(x,y) , x = 0,1,...,M-1 , y = 0,1,...,N-1 : spatial variables or spatial coordinates

f(x,y) = \begin{bmatrix} f(0,0) & f(0,1) & \cdots & f(0,N-1) \\ f(1,0) & f(1,1) & \cdots & f(1,N-1) \\ \vdots & \vdots & & \vdots \\ f(M-1,0) & f(M-1,1) & \cdots & f(M-1,N-1) \end{bmatrix}

A = [a_{i,j}], an M×N matrix with a_{i,j} = f(x_i, y_j) = f(i,j)

a_{i,j} = image element, pixel

f(0,0) is at the upper left corner of the image.


M, N > 0 ; typically L = 2^k
a_{i,j} ∈ [0, L-1]
Dynamic range of an image = the ratio of the maximum measurable intensity to the minimum detectable intensity level in the system.
The upper limit is determined by saturation, the lower limit by noise.


Number of bits required to store a digitized image:

b = M × N × k ; for M = N , b = N² k

When an image can have 2^k intensity levels, the image is referred to as a "k-bit image";
256 discrete intensity values → an 8-bit image.
(For example, a 1024 × 1024 8-bit image needs 1024 × 1024 × 8 bits = 1,048,576 bytes = 1 MB.)


Spatial and Intensity Resolution

Spatial resolution = the smallest discernible detail in an image
Measures: line pairs per unit distance, dots (pixels) per unit distance
Image resolution = the largest number of discernible line pairs per unit distance (e.g. 100 line pairs per mm)
Dots per unit distance is the measure commonly used in printing and publishing; in the U.S. it is expressed in dots per inch (dpi)
(newspapers are printed at 75 dpi, glossy brochures at 175 dpi)
Intensity resolution = the smallest discernible change in intensity level

The number of intensity levels (L) is usually determined by hardware considerations: L = 2^k, most commonly with k = 8.
In practice, intensity resolution is given by k, the number of bits used to quantize intensity.


Fig. 1 Reducing spatial resolution: 1250 dpi (upper left), 300 dpi (upper right), 150 dpi (lower left), 72 dpi (lower right)

Reducing the number of gray levels: 256, 128, 64, 32

Reducing the number of gray levels: 16, 8, 4, 2


Image Interpolation
- used in zooming, shrinking, rotating, and geometric corrections
Shrinking and zooming = image resizing = image resampling methods
Interpolation is the process of using known data to estimate values at unknown locations.
Suppose we have an image of size 500 × 500 pixels that has to be enlarged 1.5 times, to 750 × 750 pixels. One way to do this is to create an imaginary 750 × 750 grid with the same pixel spacing as the original, and then shrink it so that it fits exactly over the original image. The pixel spacing in the 750 × 750 grid will be less than in the original image.
Problem: assignment of intensity levels in the new 750 × 750 grid.

Nearest neighbor interpolation: assign to every point in the new grid (750 × 750) the intensity of the closest pixel (nearest neighbor) in the old/original grid (500 × 500).


This technique has the tendency to produce undesirable effects, like severe distortion of straight edges.
Bilinear interpolation assigns to the new (x,y) location the intensity

v(x,y) = a x + b y + c x y + d

where the four coefficients are determined from the 4 equations in 4 unknowns that can be written using the 4 nearest neighbors of point (x,y).
Bilinear interpolation gives much better results than nearest neighbor interpolation, with a modest increase in computational effort.
Bicubic interpolation assigns to the new (x,y) location an intensity that involves the 16 nearest neighbors of the point:

v(x,y) = Σ_{i=0}^{3} Σ_{j=0}^{3} c_{i,j} x^i y^j

The coefficients c_{i,j} are obtained by solving the 16×16 linear system:

Σ_{i=0}^{3} Σ_{j=0}^{3} c_{i,j} x^i y^j = the intensity levels of the 16 nearest neighbors of (x,y)

Generally, bicubic interpolation does a better job of preserving fine detail than the bilinear technique. Bicubic interpolation is the standard used in commercial image editing programs, such as Adobe Photoshop and Corel Photopaint.
Figure 2(a) is the same as Fig. 1(d), which was obtained by reducing the resolution of the 1250 dpi image in Fig. 1(a) to 72 dpi (the size shrank from 3692 × 2812 to 213 × 162 pixels) and then zooming the reduced image back to its original size. To generate Fig. 1(d), nearest neighbor interpolation was used (both for shrinking and zooming).
Figures 2(b) and (c) were generated using the same steps but with bilinear and bicubic interpolation, respectively. Figures 2(d)+(e)+(f) were obtained by reducing the resolution from 1250 dpi to 150 dpi (instead of 72 dpi).
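To make the resampling step concrete, here is a minimal NumPy sketch (not from the course text; the function and variable names are ours) of zooming with bilinear interpolation. The nearest-neighbor variant would simply take the closest pixel instead of the weighted average:

```python
# Sketch: bilinear interpolation for image zooming (illustrative only).
import numpy as np

def zoom_bilinear(f, new_h, new_w):
    """Resample grayscale image f to (new_h, new_w) using bilinear interpolation."""
    h, w = f.shape
    out = np.zeros((new_h, new_w), dtype=np.float64)
    for x in range(new_h):          # rows of the new grid
        for y in range(new_w):      # columns of the new grid
            # map the new grid point back onto the original image
            v = x * (h - 1) / (new_h - 1)
            u = y * (w - 1) / (new_w - 1)
            v0, u0 = int(v), int(u)
            v1, u1 = min(v0 + 1, h - 1), min(u0 + 1, w - 1)
            dv, du = v - v0, u - u0
            # weighted average of the 4 nearest neighbors
            out[x, y] = (f[v0, u0] * (1 - dv) * (1 - du)
                         + f[v0, u1] * (1 - dv) * du
                         + f[v1, u0] * dv * (1 - du)
                         + f[v1, u1] * dv * du)
    return out

f = np.arange(9, dtype=np.float64).reshape(3, 3)
print(zoom_bilinear(f, 5, 5))
```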


Fig. 2 Interpolation examples for zooming and shrinking (nearest neighbor, bilinear, bicubic)


Neighbors of a Pixel
A pixel p at coordinates (x,y) has 4 horizontal and vertical neighbors:
horizontal: (x-1, y), (x+1, y) ; vertical: (x, y-1), (x, y+1)
This set of pixels, called the 4-neighbors of p, is denoted by N_4(p).
The 4 diagonal neighbors of p have coordinates:
(x-1, y-1), (x-1, y+1), (x+1, y-1), (x+1, y+1)
and are denoted by N_D(p).
The horizontal, vertical, and diagonal neighbors together are called the 8-neighbors of p, denoted N_8(p).
If (x,y) is on the border of the image, some of the neighbor locations in N_D(p) and N_8(p) fall outside the image.


Adjacency, Connectivity, Regions, Boundaries

Denote by V the set of intensity levels used to define adjacency.
- in a binary image, V ⊆ {0,1} (V = {0} or V = {1})
- in a gray-scale image with 256 possible gray levels, V can be any subset of {0,...,255}
We consider 3 types of adjacency:
(a) 4-adjacency: two pixels p and q with values from V are 4-adjacent if q ∈ N_4(p)
(b) 8-adjacency: two pixels p and q with values from V are 8-adjacent if q ∈ N_8(p)
(c) m-adjacency (mixed adjacency): two pixels p and q with values from V are m-adjacent if:
    q ∈ N_4(p), or
    q ∈ N_D(p) and the set N_4(p) ∩ N_4(q) has no pixels whose values are from V.

Mixed adjacency is a modification of 8-adjacency. It is introduced to eliminate the ambiguities that often arise when 8-adjacency is used. Consider the example:

V = {1}, binary image. Consider the 3×3 arrangement of pixels:

0 1 1
0 1 0
0 0 1

The three pixels at the top (first line) of the arrangement show multiple (ambiguous) 8-adjacency paths between the center pixel and the upper-right pixel, indicated by dashed lines in the original figure. This ambiguity is removed by using m-adjacency, which leaves a single path.


A (digital) path (or curve) from pixel p with coordinates (x,y) to pixel q with coordinates (s,t) is a sequence of distinct pixels with coordinates:

(x_0, y_0) = (x,y), (x_1, y_1), ..., (x_n, y_n) = (s,t)

where (x_{i-1}, y_{i-1}) and (x_i, y_i) are adjacent, i = 1, 2, ..., n.
The length of the path is n. If (x_0, y_0) = (x_n, y_n), the path is closed.
Depending on the type of adjacency considered, the paths are 4-, 8-, or m-paths.
Let S denote a subset of pixels in an image. Two pixels p and q are said to be connected in S if there exists a path between them consisting only of pixels from S.
S is a connected set if there is a path in S between any 2 pixels in S.
Let R be a subset of pixels in an image. R is a region of the image if R is a connected set.
Two regions R_1 and R_2 are said to be adjacent if R_1 ∪ R_2 forms a connected set. Regions that are not adjacent are said to be disjoint. When referring to regions, only 4- and 8-adjacency are considered.


Suppose that an image contains K disjoint regions, R_k, k = 1,...,K, none of which touches the image border.
Let R_u = ∪_{k=1}^{K} R_k , and let (R_u)^c denote the complement of R_u.
We call all the points in R_u the foreground of the image, and the points in (R_u)^c the background of the image.

The boundary (border or contour) of a region R is the set of points of R that are adjacent to points in the complement of R, (R)^c; that is, the set of pixels in the region that have at least one background neighbor. This definition is referred to as the inner border, to distinguish it from the notion of outer border, which is the corresponding border in the background.


Distance measures
For pixels p, q, and z, with coordinates (x,y), (s,t), and (v,w) respectively, D is a distance function or metric if:
(a) D(p,q) ≥ 0 , and D(p,q) = 0 iff p = q
(b) D(p,q) = D(q,p)
(c) D(p,z) ≤ D(p,q) + D(q,z)
The Euclidean distance between p and q is defined as:

D_e(p,q) = [(x-s)² + (y-t)²]^{1/2}

The pixels q for which D_e(p,q) ≤ r are the points contained in a disk of radius r centered at (x,y).


The D_4 distance (also called city-block distance) between p and q is defined as:

D_4(p,q) = |x-s| + |y-t|

The pixels q for which D_4(p,q) ≤ r form a diamond centered at (x,y). Example, D_4 ≤ 2:

        2
      2 1 2
    2 1 0 1 2
      2 1 2
        2

The pixels with D_4 = 1 are the 4-neighbors of (x,y).

The D_8 distance (also called chessboard distance) between p and q is defined as:

D_8(p,q) = max{|x-s| , |y-t|}

The pixels q for which D_8(p,q) ≤ r form a square centered at (x,y).


Example, D_8 ≤ 2:

    2 2 2 2 2
    2 1 1 1 2
    2 1 0 1 2
    2 1 1 1 2
    2 2 2 2 2

The pixels with D_8 = 1 are the 8-neighbors of (x,y).

The D_4 and D_8 distances are independent of any paths that might exist between p and q, because these distances involve only the coordinates of the points. If we consider m-adjacency, the distance D_m is defined as:
D_m(p,q) = the length of the shortest m-path between p and q
D_m depends on the values of the pixels along the path, as well as the values of their neighbors. Consider the following example:


Arrangement of pixels (V = {1}), where p, p2, p4 have value 1 and p1, p3 ∈ {0,1}:

      p3  p4
  p1  p2
  p

If p1 = p3 = 0, then D_m(p, p4) = 2 (path p, p2, p4).
If p1 = 1, then p2 and p are no longer m-adjacent, so D_m(p, p4) = 3 (path p, p1, p2, p4).
If p1 = 0 and p3 = 1, then D_m(p, p4) = 3.
If p1 = p3 = 1, then D_m(p, p4) = 4 (path p, p1, p2, p3, p4).
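The D_4 and D_8 definitions translate directly into code; a small sketch (helper names are ours):

```python
# Sketch: city-block (D4) and chessboard (D8) distances between two pixels.
def d4(p, q):
    """City-block distance |x-s| + |y-t|."""
    (x, y), (s, t) = p, q
    return abs(x - s) + abs(y - t)

def d8(p, q):
    """Chessboard distance max(|x-s|, |y-t|)."""
    (x, y), (s, t) = p, q
    return max(abs(x - s), abs(y - t))

print(d4((0, 0), (2, 1)))  # 3
print(d8((0, 0), (2, 1)))  # 2
```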


Array versus Matrix Operations

An array operation involving one or more images is carried out on a pixel-by-pixel basis. Consider two 2×2 images:

A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} , B = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}

Array product:

A .* B = \begin{bmatrix} a_{11}b_{11} & a_{12}b_{12} \\ a_{21}b_{21} & a_{22}b_{22} \end{bmatrix}

Matrix product:

A B = \begin{bmatrix} a_{11}b_{11}+a_{12}b_{21} & a_{11}b_{12}+a_{12}b_{22} \\ a_{21}b_{11}+a_{22}b_{21} & a_{21}b_{12}+a_{22}b_{22} \end{bmatrix}

We assume array operations unless stated otherwise!
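In NumPy the two products look like this (a minimal illustration with arbitrary values):

```python
# Sketch: array (element-wise) vs. matrix product in NumPy.
import numpy as np
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print(A * B)   # array product:  [[ 5 12] [21 32]]
print(A @ B)   # matrix product: [[19 22] [43 50]]
```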


Linear versus Nonlinear Operations

One of the most important classifications of an image-processing method is whether it is linear or nonlinear. Consider an operator H with

H[f(x,y)] = g(x,y)

H is said to be a linear operator if:

H[a f_1(x,y) + b f_2(x,y)] = a H[f_1(x,y)] + b H[f_2(x,y)]

for any scalars a, b and any images f_1, f_2.

Example of a nonlinear operator:

H[f] = max{f(x,y)} , the maximum value of the pixels of image f

Consider:

f_1 = [0 2; 2 3] , f_2 = [6 5; 4 7] , a = 1 , b = -1


max{a f_1 + b f_2} = max{ [0 2; 2 3] - [6 5; 4 7] } = max{ [-6 -3; -2 -4] } = -2

a max{f_1} + b max{f_2} = max{ [0 2; 2 3] } - max{ [6 5; 4 7] } = 3 - 7 = -4

Since -2 ≠ -4, the linearity condition fails, so max is a nonlinear operator.
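The same check done numerically (an illustrative sketch):

```python
# Sketch: verifying numerically that max is not linear, with f1, f2 above.
import numpy as np

f1 = np.array([[0, 2], [2, 3]])
f2 = np.array([[6, 5], [4, 7]])
a, b = 1, -1
print((a * f1 + b * f2).max())        # -2
print(a * f1.max() + b * f2.max())    # -4 -> not equal, so max is nonlinear
```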

Arithmetic Operations in Image Processing

Let g(x,y) denote a corrupted image formed by the addition of noise η(x,y) to a noiseless image f(x,y):

g(x,y) = f(x,y) + η(x,y)

where the noise η(x,y) is uncorrelated and has zero average value.


For a random variable z with mean m, E[(z-m)²] is the variance (E(·) is the expected value). The covariance of two random variables z_1 and z_2 is defined as E[(z_1-m_1)(z_2-m_2)]. The two random variables are uncorrelated when their covariance is 0.
Objective: reduce the noise by averaging a set of K noisy images g_i(x,y) (a technique frequently used in image enhancement):

ḡ(x,y) = (1/K) Σ_{i=1}^{K} g_i(x,y)

If the noise satisfies the properties stated above, we have:

E[ḡ(x,y)] = f(x,y) , σ²_ḡ(x,y) = (1/K) σ²_η(x,y)

where E[ḡ(x,y)] is the expected value of ḡ, and σ²_ḡ(x,y) and σ²_η(x,y) are the variances of ḡ and η, respectively. The standard deviation (square root of the variance) at any point in the average image is:


σ_ḡ(x,y) = (1/√K) σ_η(x,y)

As K increases, the variability (as measured by the variance or the standard deviation) of the pixel values at each location (x,y) decreases. Because E[ḡ(x,y)] = f(x,y), this means that ḡ(x,y) approaches f(x,y) as the number of noisy images used in the averaging process increases.
An important application of image averaging is in the field of astronomy, where imaging under very low light levels frequently causes sensor noise to render single images virtually useless for analysis. Figure 3 (top left) shows an 8-bit image in which corruption was simulated by adding to it Gaussian noise with zero mean and a standard deviation of 64 intensity levels. The remaining panels of Figure 3 show the result of averaging 5, 10, 20, 50 and 100 noisy images, respectively.
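A small simulation of the averaging effect (not from the course text; a synthetic constant image stands in for the scene, with noise parameters chosen to mirror the example):

```python
# Sketch: noise reduction by image averaging, following the formulas above.
import numpy as np

rng = np.random.default_rng(0)
f = np.full((64, 64), 100.0)          # noiseless image f(x,y)
K = 100
noisy = [f + rng.normal(0.0, 64.0, f.shape) for _ in range(K)]  # g_i = f + eta
g_bar = sum(noisy) / K                # averaged image

print(np.std(noisy[0] - f))   # ~64, noise std of a single image
print(np.std(g_bar - f))      # ~64/sqrt(100) = 6.4, as predicted
```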


Fig. 3 Image of Galaxy Pair NGC 3314 corrupted by additive Gaussian noise (top left); results of averaging 5, 10, 20, 50, and 100 noisy images


A frequent application of image subtraction is in the enhancement of differences between images.

Fig. 4 (a) Infrared image of the Washington, D.C. area; (b) image obtained from (a) by setting to zero the least significant bit of each pixel; (c) the difference between the two images

Figure 4(b) was obtained by setting to zero the least-significant bit of every pixel in Figure 4(a). The two images seem almost the same. Figure 4(c) is the difference between


images (a) and (b). Black (0) values in Figure 4(c) indicate locations where there is no difference between images (a) and (b).
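A sketch of the LSB experiment on a synthetic array (illustrative values only):

```python
# Sketch: enhancing differences by subtraction, mirroring the LSB example.
import numpy as np

rng = np.random.default_rng(1)
img = rng.integers(0, 256, (4, 4), dtype=np.uint8)
lsb_zeroed = img & 0xFE                    # set least-significant bit to 0
diff = img.astype(np.int16) - lsb_zeroed   # difference image: 0 or 1 everywhere
print(diff)
```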

Mask mode radiography:

g(x,y) = f(x,y) - h(x,y)

h(x,y), the mask, is an X-ray image of a region of a patient's body, captured by an intensified TV camera (instead of traditional X-ray film) located opposite an X-ray source. The procedure consists of injecting an X-ray contrast medium into the patient's bloodstream, taking a series of images, called live images (denoted f(x,y)), of the same anatomical region as h(x,y), and subtracting the mask from the series of incoming live images after injection of the contrast medium.
In g(x,y) we can find the differences between h and f, shown as enhanced detail.


With images being captured at TV rates, we obtain a movie showing how the contrast medium propagates through the various arteries in the area being observed.

Fig. 5 Angiography subtraction example: (a) mask image; (b) live image; (c) difference between (a) and (b); (d) image (c) enhanced


An important application of image multiplication (and division) is shading correction.
Suppose that an imaging sensor produces images in the form:

g(x,y) = f(x,y) h(x,y)

where f(x,y) is the "perfect" image and h(x,y) is the shading function.
When the shading function is known:

f(x,y) = g(x,y) / h(x,y)

When h(x,y) is unknown but we have access to the imaging system, we can obtain an approximation to the shading function by imaging a target of constant intensity. When the sensor is not available, the shading pattern can often be estimated from the image itself.
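A minimal sketch of the correction, with a fabricated smooth shading function standing in for a measured one:

```python
# Sketch: shading correction by division, under the model g = f * h.
import numpy as np

x = np.linspace(0.5, 1.0, 128)
h = np.outer(x, x)                    # synthetic shading function h(x,y)
f = np.full((128, 128), 200.0)        # "perfect" image
g = f * h                             # shaded image produced by the sensor
f_corrected = g / h                   # shading-corrected image
print(np.allclose(f_corrected, f))    # True
```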


Fig. 6 Shading correction: (a) shaded image of a tungsten filament, magnified 130X; (b) shading pattern; (c) corrected image


Another use of image multiplication is in masking, also called region of interest (ROI), operations. The process consists of multiplying a given image by a mask image that has 1s (white) in the ROI and 0s elsewhere. There can be more than one ROI in the mask image, and the shape of the ROI can be arbitrary, although usually it is rectangular.

Fig. 7 (a) digital dental X-ray image; (b) ROI mask for teeth with fillings; (c) product of (a) and (b)
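A sketch of an ROI mask product (the ROI rectangle coordinates below are arbitrary):

```python
# Sketch: region-of-interest masking by multiplication.
import numpy as np

rng = np.random.default_rng(2)
img = rng.integers(0, 256, (100, 100), dtype=np.uint8)
mask = np.zeros_like(img)
mask[20:60, 30:80] = 1                # 1s inside the ROI, 0s elsewhere
roi_only = img * mask                 # keeps the ROI, zeroes the rest
print(roi_only[0, 0], roi_only[25, 40] == img[25, 40])
```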


In practice, most images are displayed using 8 bits, so the image values are expected to be in the range [0,255].
For TIFF and JPEG images, conversion to this range is automatic; the conversion depends on the system used.
The difference of two images can produce an image with values in the range [-255, 255].
The addition of two images can produce values in the range [0, 510].
Many software packages simply set the negative values to 0 and set to 255 all values greater than 255.
A more appropriate procedure: compute

f_m = f - min(f)

which creates an image whose minimum value is 0, and then perform the scaling

f_s = K f_m / max(f_m)

which yields values in the full range [0, K] (K = 255 for 8-bit images).
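The two-step scaling as a small helper (sketch; names are ours):

```python
# Sketch: scaling an arbitrary-range result back to [0, 255].
import numpy as np

def scale_to_range(f, K=255):
    """Shift minimum to 0, then scale maximum to K."""
    f_m = f - f.min()
    return (K * f_m / f_m.max()).astype(np.uint8)

diff = np.array([[-255, 0], [128, 255]], dtype=np.int16)  # e.g. a difference image
print(scale_to_range(diff))
```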


Spatial Operations
- are performed directly on the pixels of a given image
There are three categories of spatial operations:
- single-pixel operations
- neighborhood operations
- geometric spatial transformations

Single-pixel operations
- change the values of intensity for the individual pixels:

s = T(z)

where z is the intensity of a pixel in the original image and s is the intensity of the corresponding pixel in the processed image. The figure below shows the transformation used to obtain the negative of an 8-bit image.


Intensity transformation function for the complement of an 8-bit image; original digital mammogram; negative image of the mammogram


Neighborhood operations
Let S_xy denote the set of coordinates of a neighborhood centered on an arbitrary point (x,y) in an image f. Neighborhood processing generates a new intensity level at point (x,y) based on the values of the intensities of the points in S_xy. For example, if S_xy is a rectangular neighborhood of size m × n centered at (x,y), we can assign the new value of intensity by computing the average value of the pixels in S_xy:

g(x,y) = (1/(m n)) Σ_{(r,c) ∈ S_xy} f(r,c)

The net effect is to perform local blurring of the original image. This type of process is used, for example, to eliminate small details and thus render "blobs" corresponding to the largest regions of an image.
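A direct, loop-based sketch of this neighborhood average (borders left untouched for simplicity; names are ours):

```python
# Sketch: local averaging over an m x n neighborhood, as in the formula above.
import numpy as np

def local_mean(f, m=3, n=3):
    out = f.astype(np.float64)        # astype makes a copy; borders keep f's values
    hm, hn = m // 2, n // 2
    for x in range(hm, f.shape[0] - hm):
        for y in range(hn, f.shape[1] - hn):
            out[x, y] = f[x-hm:x+hm+1, y-hn:y+hn+1].mean()
    return out

rng = np.random.default_rng(3)
img = rng.integers(0, 256, (32, 32)).astype(np.float64)
print(local_mean(img)[5, 5].round(1))
```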


Aortic angiogram (left) and the result of applying an averaging filter with m = n = 41 (right)

Digital Image Processing

Course 3



Geometric spatial transformations and image registration

- modify the spatial relationship between pixels in an image
- these transformations are often called rubber-sheet transformations (analogous to printing an image on a sheet of rubber and then stretching the sheet according to a predefined set of rules)
A geometric transformation consists of 2 basic operations:
(a) a spatial transformation of coordinates
(b) intensity interpolation that assigns intensity values to the spatially transformed pixels
The coordinate transformation:

(x,y) = T[(v,w)]

(v,w) = pixel coordinates in the original image
(x,y) = pixel coordinates in the transformed image


Example: T[(v,w)] = (v/2, w/2) shrinks the original image to half its size in both spatial directions.

Affine transform:

[x y 1] = [v w 1] T = [v w 1] \begin{bmatrix} t_{11} & t_{12} & 0 \\ t_{21} & t_{22} & 0 \\ t_{31} & t_{32} & 1 \end{bmatrix}

x = t_{11} v + t_{21} w + t_{31}
y = t_{12} v + t_{22} w + t_{32}          (AT)

This transform can scale, rotate, translate, or shear a set of coordinate points, depending on the elements of the matrix T. If we want to resize an image, rotate it, and move the result to some location, we simply form a 3×3 matrix equal to the matrix product of the scaling, rotation, and translation matrices from Table 1.
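As an illustration (not from the course text; Table 1 is not reproduced here, so the matrices below follow the standard forms under the row-vector convention of (AT), and all names are ours):

```python
# Sketch: composing scaling, rotation, and translation into one affine matrix,
# row-vector convention [x y 1] = [v w 1] T.
import numpy as np

def scaling(cx, cy):
    return np.array([[cx, 0, 0], [0, cy, 0], [0, 0, 1]], dtype=float)

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, s, 0], [-s, c, 0], [0, 0, 1]], dtype=float)

def translation(tx, ty):
    return np.array([[1, 0, 0], [0, 1, 0], [tx, ty, 1]], dtype=float)

T = scaling(2, 2) @ rotation(np.pi / 2) @ translation(10, 0)
v, w = 3.0, 4.0
x, y, _ = np.array([v, w, 1.0]) @ T   # transformed coordinates
print(x, y)                           # ~ (2.0, 6.0)
```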


Affine transformations


The preceding transformations relocate pixels on an image to new locations. To complete the process, we have to assign intensity values to those locations. This task is done by using intensity interpolation (nearest neighbor, bilinear, or bicubic interpolation).
In practice, we can use equation (AT) in two basic ways:

Forward mapping: scanning the pixels of the input image and, at each location (v,w), computing the spatial location (x,y) of the corresponding pixel in the output image using (AT) directly.
Problems:
- intensity assignment when 2 or more pixels in the original image are transformed to the same location in the output image,
- some output locations may have no correspondent in the original image (no intensity assignment)


Inverse mapping: scans the output pixel locations and, at each location (x,y), computes the corresponding location in the input image:

(v,w) = T⁻¹[(x,y)]

It then interpolates among the nearest input pixels to determine the intensity of the output pixel value.
Inverse mappings are more efficient to implement than forward mappings and are used in numerous commercial implementations of spatial transformations (MATLAB, for example).
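A sketch of inverse mapping with nearest-neighbor interpolation (function and variable names are ours):

```python
# Sketch: inverse-mapping image warp. For each output pixel (x,y) we go
# back through T^{-1} and sample the input image.
import numpy as np

def warp_inverse(f, T):
    """Apply affine matrix T (row-vector convention) to image f."""
    T_inv = np.linalg.inv(T)
    out = np.zeros_like(f)
    for x in range(out.shape[0]):
        for y in range(out.shape[1]):
            v, w, _ = np.array([x, y, 1.0]) @ T_inv
            vi, wi = int(round(v)), int(round(w))   # nearest neighbor
            if 0 <= vi < f.shape[0] and 0 <= wi < f.shape[1]:
                out[x, y] = f[vi, wi]
    return out

img = np.arange(16.0).reshape(4, 4)
shift = np.array([[1, 0, 0], [0, 1, 0], [1, 1, 1]], dtype=float)  # translate by (1,1)
print(warp_inverse(img, shift))
```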


Image registration = aligning two or more images of the same scene

In image registration we have available the input and output images, but the specific transformation that produced the output image from the input is generally unknown. The problem is to estimate the transformation function and then use it to register the two images.
- it may be of interest to align (register) two or more images taken at approximately the same time but with different imaging systems (an MRI scanner and a PET scanner, for example)
- or to align images of a given location, taken by the same instrument at different moments in time (satellite images)
Solving the problem: using tie points (also called control points), which are corresponding points whose locations are known precisely in the input and reference images.


How to select tie points?
- interactively selecting them
- using algorithms that try to detect these points automatically
- some imaging systems have physical artifacts (small metallic objects) embedded in the imaging sensors. These objects produce a set of known points (called reseau marks) directly on all images captured by the system, which can be used as guides for establishing tie points.
The problem of estimating the transformation is one of modeling. Suppose we have a set of 4 tie points both on the input image and on the reference image. A simple model based on a bilinear approximation is given by:

x = c_1 v + c_2 w + c_3 v w + c_4
y = c_5 v + c_6 w + c_7 v w + c_8

where (v,w) and (x,y) are the coordinates of the tie points (we get an 8×8 linear system for {c_i}).
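Since the x and y coefficient sets decouple, the 8×8 system splits into two 4×4 solves; a sketch with made-up tie-point values:

```python
# Sketch: solving for the bilinear registration coefficients c1..c8
# from 4 tie-point pairs (v,w) -> (x,y).
import numpy as np

src = np.array([[0, 0], [0, 100], [100, 0], [100, 100]], dtype=float)  # (v,w)
dst = np.array([[2, 3], [1, 105], [103, 2], [104, 107]], dtype=float)  # (x,y)

# each pair gives x = c1 v + c2 w + c3 v w + c4 (and similarly for y)
A = np.column_stack([src[:, 0], src[:, 1], src[:, 0] * src[:, 1], np.ones(4)])
cx = np.linalg.solve(A, dst[:, 0])   # c1..c4
cy = np.linalg.solve(A, dst[:, 1])   # c5..c8

v, w = 50.0, 50.0
print(A @ cx)                        # reproduces the x tie-point coordinates
print(cx @ [v, w, v * w, 1.0], cy @ [v, w, v * w, 1.0])  # mapped midpoint
```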


When 4 tie points are insufficient to obtain satisfactory registration, an approach used frequently is to select a larger number of tie points, use this new set to subdivide the image into rectangular regions marked by groups of 4 tie points, and apply the transformation model described above on each subregion marked by its 4 tie points.
The number of tie points and the sophistication of the model required to solve the registration problem depend on the severity of the geometric distortion.


(a) reference image; (b) geometrically distorted image; (c) registered image; (d) difference between (a) and (c)


Probabilistic Methods

Let z_i, i = 0,1,...,L-1, be the values of all possible intensities in an M × N digital image, and let p(z_k) denote the probability that intensity level z_k occurs in the given image:

p(z_k) = n_k / (M N)

where n_k is the number of times that intensity z_k occurs in the image (M N is the total number of pixels in the image).

Σ_{k=0}^{L-1} p(z_k) = 1

The mean (average) intensity of an image is given by:

m = Σ_{k=0}^{L-1} z_k p(z_k)


The variance of the intensities is:

σ² = Σ_{k=0}^{L-1} (z_k - m)² p(z_k)

The variance is a measure of the spread of the values of z about the mean, so it is a measure of image contrast. Usually, for measuring image contrast the standard deviation (σ) is used.
The n-th moment of a random variable z about the mean is defined as:

μ_n(z) = Σ_{k=0}^{L-1} (z_k - m)^n p(z_k)

(μ_0(z) = 1 , μ_1(z) = 0 , μ_2(z) = σ²)

μ_3(z) > 0 : the intensities are biased to values higher than the mean;
(μ_3(z) < 0 : the intensities are biased to values lower than the mean);


μ_3(z) ≈ 0 : the intensities are distributed approximately equally on both sides of the mean.

Fig. 1 (a) low contrast; (b) medium contrast; (c) high contrast

Figure 1(a): standard deviation 14.3 (variance = 204.5)
Figure 1(b): standard deviation 31.6 (variance = 998.6)
Figure 1(c): standard deviation 49.2 (variance = 2420.6)
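A sketch computing these statistics from the normalized histogram of a random test image:

```python
# Sketch: mean, variance, and third moment from the normalized histogram.
import numpy as np

rng = np.random.default_rng(4)
img = rng.integers(0, 256, (64, 64))
L = 256
z = np.arange(L, dtype=np.float64)
n_k = np.bincount(img.ravel(), minlength=L)
p = n_k / img.size                    # p(z_k) = n_k / MN

m = np.sum(z * p)                     # mean intensity
var = np.sum((z - m) ** 2 * p)        # variance (mu_2)
mu3 = np.sum((z - m) ** 3 * p)        # third moment (skew direction)
print(m, np.sqrt(var), mu3)
```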


Intensity Transformations and Spatial Filtering

g(x,y) = T[f(x,y)]

f(x,y) = input image, g(x,y) = output image, T = an operator on f defined over a neighborhood of (x,y).
- the neighborhood of the point (x,y), S_xy, usually is rectangular, centered on (x,y), and much smaller in size than the image


- in spatial filtering, the operator T (the neighborhood and the operation applied on it) is called a spatial filter (spatial mask, kernel, template, or window)

When S_xy = {(x,y)} (a 1×1 neighborhood), T becomes an intensity (gray-level or mapping) transformation function:

s = T(r)

where s and r denote, respectively, the intensity of g and f at (x,y).

Fig. 2 Intensity transformation functions: left - contrast stretching; right - thresholding function

Figure 2 (left): T produces an output image of higher contrast than the original, by darkening the intensity levels below k and brightening the levels above k. This technique is called contrast stretching.


Figure 2 (right): T produces a binary output image. A mapping of this form is called a thresholding function.

Some Basic Intensity Transformation Functions

Image Negatives
The negative of an image with intensity levels in [0, L-1] is obtained using the function:

s = T(r) = L - 1 - r

- the equivalent of a photographic negative
- a technique suited for enhancing white or gray detail embedded in dark regions of an image
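For an 8-bit image (L = 256) the negative is a one-liner (sketch):

```python
# Sketch: image negative s = L - 1 - r for an 8-bit image.
import numpy as np

img = np.array([[0, 64], [128, 255]], dtype=np.uint8)
negative = 255 - img
print(negative)    # [[255 191] [127   0]]
```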


Fig. 3
Left original digital mammogram
Right negative transformed image


Log Transformations

s = T(r) = c log(1 + r) , c a positive constant, r ≥ 0

Some basic intensity transformation functions


This transformation maps a narrow range of low intensity values in the input into a wider range of output levels. An operator of this type is used to expand the values of dark pixels in an image while compressing the higher-level values. The opposite is true of the inverse log transformation. The log function compresses the dynamic range of images with large variations in pixel values.

Fig. 4 (a) Fourier spectrum; (b) log transformation applied to (a), c = 1

Figure 4(a): intensity values in the range 0 to 1.5 × 10⁶
Figure 4(b): log transformation of Figure 4(a) with c = 1; range 0 to 6.2
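A sketch with synthetic values standing in for the Fourier spectrum; a base-10 log reproduces the 0 to ~6.2 range quoted above:

```python
# Sketch: log transformation with rescaling to [0, 255] for display.
import numpy as np

r = np.array([[0.0, 10.0], [1e3, 1.5e6]])
s = np.log10(1 + r)                   # s = c log(1 + r), c = 1, base-10 log
print(s.round(2))                     # max is ~6.18, cf. the 0-6.2 range above
s_8bit = (255 * s / s.max()).astype(np.uint8)
print(s_8bit)
```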


Power-Law (Gamma) Transformations

s = T(r) = c r^γ , with c, γ positive constants (sometimes written s = c (r + ε)^γ to account for a measurement offset)

Plots of the gamma transformation for different values of γ (c = 1)


Power-law curves with γ < 1 map a narrow range of dark input values into a wider range of output values, with the opposite being true for higher input values. The curves with γ > 1 have the opposite effect of those generated with γ < 1.
c = γ = 1 : the identity transformation.
A variety of devices used for image capture, printing, and display respond according to a power law. The process used to correct these power-law response phenomena is called gamma correction.
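A gamma-correction sketch on a normalized 8-bit image (the γ value is arbitrary):

```python
# Sketch: gamma transformation s = c * r**gamma with c = 1.
import numpy as np

img = np.array([[0, 64], [128, 255]], dtype=np.uint8)
r = img / 255.0                       # normalize to [0,1]
gamma = 0.5                           # gamma < 1 brightens dark regions
s = np.power(r, gamma)
print((255 * s).astype(np.uint8))     # [[  0 127] [180 255]]
```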


(a) aerial image; (b)-(d) results of applying the gamma transformation with c = 1 and γ = 3.0, 4.0, and 5.0, respectively


Piecewise-Linear Transformation Functions

Contrast stretching
- a process that expands the range of intensity levels in an image so that it spans the full intensity range of the recording medium or display device

Fig. 5


The transformation is defined by the control points (r_1, s_1) and (r_2, s_2):

T(r) = (s_1/r_1) r                                           , r ∈ [0, r_1]
T(r) = [s_2 (r - r_1) + s_1 (r_2 - r)] / (r_2 - r_1)         , r ∈ [r_1, r_2]
T(r) = [(L-1)(r - r_2) + s_2 (L-1 - r)] / (L-1 - r_2)        , r ∈ [r_2, L-1]


r_1 = s_1 , r_2 = s_2 : identity transformation (no change)
r_1 = r_2 , s_1 = 0 , s_2 = L-1 : thresholding function
Figure 5(b) shows an 8-bit image with low contrast.
Figure 5(c): contrast stretching, obtained by setting the parameters (r_1, s_1) = (r_min, 0) and (r_2, s_2) = (r_max, L-1), where r_min and r_max denote the minimum and maximum gray levels in the image, respectively. Thus, the transformation function stretched the levels linearly from their original range to the full range [0, L-1].
Figure 5(d): the thresholding function was used, with (r_1, s_1) = (m, 0) and (r_2, s_2) = (m, L-1), where m is the mean gray level in the image.
The original image on which these results are based is a scanning electron microscope image of pollen, magnified approximately 700 times.
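The (r_min, 0), (r_max, L-1) case reduces to min-max stretching; a sketch:

```python
# Sketch: min-max contrast stretching to the full range [0, L-1].
import numpy as np

def contrast_stretch(img, L=256):
    r = img.astype(np.float64)
    r_min, r_max = r.min(), r.max()
    return ((L - 1) * (r - r_min) / (r_max - r_min)).astype(np.uint8)

low_contrast = np.array([[90, 100], [110, 120]], dtype=np.uint8)
print(contrast_stretch(low_contrast))   # [[  0  85] [170 255]]
```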


Intensity-level slicing
- highlighting a specific range of intensities in an image
There are two approaches to intensity-level slicing (both sketched in code below):
1. display in one value (white, for example) all the values in the range of interest, and in another (say, black) all other intensities
2. brighten (or darken) the desired range of intensities but leave unchanged all other intensities in the image


Left: highlights the intensity range [A,B] and reduces all other intensities to a lower level.
Right: highlights the range [A,B] and preserves all other intensities.
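Both slicing approaches as small helpers (sketch; the range values are arbitrary):

```python
# Sketch: the two intensity-level slicing approaches for a range [A, B].
import numpy as np

def slice_binary(img, A, B):
    """Approach 1: white inside [A,B], black elsewhere."""
    return np.where((img >= A) & (img <= B), 255, 0).astype(np.uint8)

def slice_preserve(img, A, B, value=255):
    """Approach 2: brighten [A,B], leave other intensities unchanged."""
    out = img.copy()
    out[(img >= A) & (img <= B)] = value
    return out

img = np.array([[10, 120], [150, 240]], dtype=np.uint8)
print(slice_binary(img, 100, 200))
print(slice_preserve(img, 100, 200))
```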

Figure 6 (left): aortic angiogram near the kidney. The purpose of intensity slicing is to highlight the major blood vessels, which appear brighter as a result of injecting a contrast medium. Figure 6 (middle) shows the result of applying technique 1 for a band near the top of the intensity scale. This type of enhancement produces a binary image, which is useful for studying the shape of the flow of the contrast substance (to detect blockages).


In Figure 6 (right) the second technique was used: a band of intensities in the mid-gray region around the mean intensity was set to black; the other intensities remain unchanged.

Fig. 6 - Aortic angiogram and intensity-sliced versions


Bit-plane slicing
For an 8-bit image, f(x,y) is a number in [0,255], with an 8-bit representation in base 2.
This technique highlights the contribution made to the whole image appearance by each of the bits. An 8-bit image may be considered as being composed of eight 1-bit planes (plane 1 contains the lowest-order bit, plane 8 the highest-order bit).


The binary image for the 8th bit plane of an 8-bit image can be obtained by processing the input image with a thresholding intensity transformation function that maps all the intensities between 0 and 127 to 0, and all levels between 128 and 255 to 1.
The bit-slicing technique is useful for analyzing the relative importance of each bit in the image; it helps in determining the proper number of bits to use when quantizing the image. The technique is also useful for image compression.
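Extracting the eight bit planes with shifts and masks (sketch; values arbitrary):

```python
# Sketch: the eight bit planes of an 8-bit image.
import numpy as np

img = np.array([[0b10110100, 0b00001111],
                [0b11111111, 0b01000000]], dtype=np.uint8)
planes = [(img >> k) & 1 for k in range(8)]   # planes[0] = lowest-order bit
print(planes[7])   # highest-order plane == thresholding at 128
print(planes[0])
```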


Histogram processing
The histogram of a digital image with intensity levels in [0, L-1] is:

h(r_k) = n_k , k = 0,1,...,L-1

where r_k is the k-th intensity level and n_k is the number of pixels in the image with intensity r_k.

The normalized histogram for an M × N digital image is:

p(r_k) = n_k / (M N) , k = 0,1,...,L-1

p(r_k) = an estimate of the probability of occurrence of intensity level r_k in the image

Σ_{k=0}^{L-1} p(r_k) = 1
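Computing h(r_k) and p(r_k) in NumPy (sketch):

```python
# Sketch: histogram and normalized histogram of an 8-bit image.
import numpy as np

rng = np.random.default_rng(5)
img = rng.integers(0, 256, (64, 64))
h = np.bincount(img.ravel(), minlength=256)   # h(r_k) = n_k
p = h / img.size                              # p(r_k) = n_k / MN
print(p.sum())                                # 1.0
```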


Fig. 8 Dark, light, low-contrast, and high-contrast images and their histograms


Histogram Equalization
- determine a transformation function that seeks to produce an output image that has a uniform histogram

s = T(r) , 0 ≤ r ≤ L-1 , where:
(a) T(r) is monotonically increasing on [0, L-1]
(b) 0 ≤ T(r) ≤ L-1 for 0 ≤ r ≤ L-1

Condition (a), T(r) monotonically increasing, guarantees that the order of intensity values is preserved from input to output.
Condition (b) requires that both input and output images have the same range of intensities.


Histogram equalization (histogram linearization) transformation:

s_k = T(r_k) = (L-1) Σ_{j=0}^{k} p_r(r_j) = ((L-1)/(MN)) Σ_{j=0}^{k} n_j , k = 0,1,...,L-1

The output image is obtained by mapping each pixel in the input image with intensity r_k into a corresponding pixel with intensity s_k in the output image.
Consider the following example: a 3-bit image (L = 8) of size 64 × 64 (M = N = 64, MN = 4096).

Intensity distribution and histogram values for a 3-bit 64 × 64 digital image:

r_k       n_k      p_r(r_k) = n_k/MN
r_0 = 0   790      0.19
r_1 = 1   1023     0.25
r_2 = 2   850      0.21
r_3 = 3   656      0.16
r_4 = 4   329      0.08
r_5 = 5   245      0.06
r_6 = 6   122      0.03
r_7 = 7   81       0.02

s_0 = T(r_0) = 7 Σ_{j=0}^{0} p_r(r_j) = 7 p_r(r_0) = 1.33
s_1 = T(r_1) = 7 Σ_{j=0}^{1} p_r(r_j) = 7 p_r(r_0) + 7 p_r(r_1) = 3.08
s_2 = 4.55 , s_3 = 5.67 , s_4 = 6.23 , s_5 = 6.65 , s_6 = 6.86 , s_7 = 7.00

Rounding to the nearest integer in [0, 7]:

s_0 = 1.33 → 1     s_4 = 6.23 → 6
s_1 = 3.08 → 3     s_5 = 6.65 → 7
s_2 = 4.55 → 5     s_6 = 6.86 → 7
s_3 = 5.67 → 6     s_7 = 7.00 → 7
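The same computation in a few lines (sketch; reproduces the values above):

```python
# Sketch: histogram equalization of the 3-bit example (L = 8).
import numpy as np

L = 8
n = np.array([790, 1023, 850, 656, 329, 245, 122, 81])  # n_k from the table
p = n / n.sum()                       # p_r(r_k), MN = 4096
s = np.round((L - 1) * np.cumsum(p))  # s_k = (L-1) * sum_{j<=k} p_r(r_j)
print(s)                              # [1. 3. 5. 6. 6. 7. 7. 7.]
```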


Histogram Matching (Specification)

Sometimes it is useful to be able to specify the shape of the histogram that we wish the
output image to have. The method used to generate a processed image that has a specified
histogram is called histogram matching or histogram specification.
Suppose {z_q ; q = 0, ..., L-1} are the intensity values of the specified histogram p_z we wish to match.
Consider the histogram equalization transformation for the input image:

    s_k = T(r_k) = (L-1) \sum_{j=0}^{k} p_r(r_j) = \frac{L-1}{MN} \sum_{j=0}^{k} n_j ,  k = 0, 1, ..., L-1    (1)

Consider the analogous transformation for the specified histogram:

    G(z_q) = (L-1) \sum_{i=0}^{q} p_z(z_i) ,  q = 0, 1, ..., L-1    (2)

Since T(r_k) = s_k = G(z_q) for some value of q, it follows that:

    z_q = G^{-1}(s_k)


Histogram-specification procedure:
1) Compute the histogram p_r(r) of the input image, and compute the histogram
   equalization transformation (1). Round the resulting values s_k to integers in [0, L-1].
2) Compute all values of the transformation function G using relation (2), where p_z(z_i)
   are the values of the specified histogram. Round the values G(z_q) to integers in the
   range [0, L-1] and store these values in a table.
3) For every value of s_k, k = 0, 1, ..., L-1, use the stored values of G to find the
   corresponding value of z_q so that G(z_q) is closest to s_k, and store these mappings
   from s to z. When more than one value of z_q satisfies the property (i.e., the mapping
   is not unique), choose the smallest value by convention.
4) Form the histogram-specified image by first histogram-equalizing the input image
   and then mapping every equalized pixel value, s_k, of this image to the corresponding
   value z_q in the histogram-specified image using the mappings found in step 3).
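histeq also performs histogram specification when given a target histogram; a minimal sketch (the Gaussian-shaped target below is purely illustrative, not from the example):

f = imread('pout.tif');
hgram = exp(-((0:255) - 180).^2 / (2*40^2));   % hypothetical desired shape
g = histeq(f, hgram);                          % steps 1) through 4) in one call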


The intermediate step of equalizing the input image can be skipped by combining the
two transformation functions T and G^{-1}.
Reconsider the above example:

Fig. 9


Figure 9(a) shows the histogram of the original image. Figure 9(b) is the new histogram
to be achieved.
The first step is to obtain the scaled histogram-equalized values:

    s_0 = 1 ,  s_1 = 3 ,  s_2 = 5 ,  s_3 = 6 ,  s_4 = 6 ,  s_5 = 7 ,  s_6 = 7 ,  s_7 = 7

Then we compute the values of G:

    G(z_0) = 7 \sum_{i=0}^{0} p_z(z_i) = 0.00 -> 0 ,  G(z_1) = G(z_2) = 0.00 -> 0 ,  G(z_3) = 1.05 -> 1
    G(z_4) = 2.45 -> 2 ,  G(z_5) = 4.55 -> 5 ,  G(z_6) = 5.95 -> 6 ,  G(z_7) = 7.00 -> 7


The results of performing step 3) of the procedure are summarized in the next table.
In the last step of the algorithm, we use the mappings in this table to map every
pixel in the histogram-equalized image into a corresponding pixel in the newly-created
histogram-specified image. The values of the resulting histogram are listed in the third
column of Table 3.2, and the histogram is sketched in Figure 9(d).

Input counts: n(r_0) = 790, n(r_1) = 1023, n(r_2) = 850, n(r_3) = 656,
              n(r_4) = 329, n(r_5) = 245, n(r_6) = 122, n(r_7) = 81

    s_0 = 1                790 pixels              -> z_q = 3
    s_1 = 3                1023 pixels             -> z_q = 4
    s_2 = 5                850 pixels              -> z_q = 5
    s_3 = s_4 = 6          656 + 329 pixels        -> z_q = 6
    s_5 = s_6 = s_7 = 7    245 + 122 + 81 pixels   -> z_q = 7


Local Histogram Processing


The histogram processing techniques previously described are easily adaptable to local
enhancement. The procedure is to define a square or rectangular neighborhood and move
the center of this area from pixel to pixel. At each location, the histogram of the points in
the neighborhood is computed and either a histogram equalization or histogram
specification transformation function is obtained. This function is finally used to map the
gray level of the pixel centered in the neighborhood. The center of the neighborhood
region is then moved to an adjacent pixel location and the procedure is repeated.
Updating the histogram obtained in the previous location with the new data introduced at
each motion step is possible.
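A related tile-based local method (CLAHE) is available in MATLAB as adapthisteq; a sketch, with parameter values chosen arbitrarily for illustration:

f = imread('pout.tif');
g = adapthisteq(f, 'NumTiles', [8 8], 'ClipLimit', 0.01);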


Using Histogram Statistics for Image Enhancement

Let r denote a discrete random variable representing discrete gray levels in [0, L-1], and
let p(r_i) denote the normalized histogram component corresponding to the i-th value of
r. The n-th moment of r about its mean is defined as:

    \mu_n(r) = \sum_{i=0}^{L-1} (r_i - m)^n p(r_i)

where m is the mean (average intensity) value of r:

    m = \sum_{i=0}^{L-1} r_i p(r_i)   (a measure of average intensity)

    \sigma^2 = \mu_2(r) = \sum_{i=0}^{L-1} (r_i - m)^2 p(r_i)   (a measure of contrast)

Sample mean and sample variance:

    m = \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x,y) ,  \sigma^2 = \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} [f(x,y) - m]^2
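These statistics are one line each in MATLAB (mean2 and std2 are Toolbox helpers; the image name is an assumption):

f = imread('pout.tif');
m  = mean2(f);             % sample mean (average intensity)
s2 = std2(f)^2;            % sample variance (contrast measure)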


Spatial Filtering
The name filter is borrowed from frequency domain processing, where filtering means
accepting (passing) or rejecting certain frequency components. Filters that pass low
frequencies are called lowpass filters. A lowpass filter has the effect of blurring
(smoothing) an image. Filters are also called masks, kernels, templates, or windows.

The Mechanics of Spatial Filtering


A spatial filter consists of:
1) a neighborhood (usually a small rectangle)
2) a predefined operation performed on the pixels in the neighborhood
Filtering creates a new pixel with the same coordinates as the pixel in the center of the
neighborhood, and whose intensity value is modified by the filtering operation.


If the operation performed on the image pixels is linear, the filter is called a linear spatial
filter; otherwise, the filter is nonlinear.

Fig. 10 - Linear spatial filtering with a 3x3 filter mask


In Figure 10 a 3x3 linear filter is pictured:

    g(x,y) = w(-1,-1) f(x-1,y-1) + w(-1,0) f(x-1,y) + ... + w(0,0) f(x,y) + ... + w(1,1) f(x+1,y+1)

For a mask of size m x n, we assume m = 2a+1 and n = 2b+1, where a and b are positive
integers. The general expression of linear spatial filtering of an image of size M x N with a
filter of size m x n is:

    g(x,y) = \sum_{s=-a}^{a} \sum_{t=-b}^{b} w(s,t) f(x+s, y+t)

Spatial Correlation and Convolution

Correlation is the process of moving a filter mask over the image and computing the sum
of products at each location. Convolution is similar to correlation, except that the filter
is first rotated by 180°.


Correlation:

    w(x,y) ☆ f(x,y) = \sum_{s=-a}^{a} \sum_{t=-b}^{b} w(s,t) f(x+s, y+t)

Convolution:

    w(x,y) * f(x,y) = \sum_{s=-a}^{a} \sum_{t=-b}^{b} w(s,t) f(x-s, y-t)

A function that contains a single 1 with the rest being 0s is called a discrete unit
impulse. Correlating a filter with a discrete unit impulse produces a rotated
version of the filter at the location of the impulse.
Linear filters are also found in the DIP literature under the names convolution filter,
convolution mask, or convolution kernel.
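imfilter can perform either operation; a sketch showing that the two differ only by a 180° rotation of the mask (image and mask are illustrative assumptions):

f = im2double(imread('cameraman.tif'));
w = [1 2 1; 0 0 0; -1 -2 -1];               % example 3x3 mask
gc = imfilter(f, w, 'corr', 'replicate');   % correlation (imfilter default)
gv = imfilter(f, w, 'conv', 'replicate');   % convolution
% gv equals correlation with rot90(w, 2), the mask rotated by 180°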


Vector Representation of Linear Filtering

    R = w_1 z_1 + w_2 z_2 + ... + w_{mn} z_{mn} = \sum_{k=1}^{mn} w_k z_k = w^T z

where the w's are the coefficients of an m x n filter and the z's are the corresponding
image intensities encompassed by the filter. For a 3x3 mask:

    R = w_1 z_1 + w_2 z_2 + ... + w_9 z_9 = \sum_{k=1}^{9} w_k z_k = w^T z ,  w, z \in R^9


Smoothing Linear Filters


A smoothing linear filter computes the average of the pixels contained in the
neighborhood of the filter mask. These filters are sometimes called averaging filters or
lowpass filters.

The process of replacing the value of every pixel in an image by the average of the
intensity levels in the neighborhood defined by the filter mask produces an image with
reduced sharp transitions in intensities. Random noise is usually characterized by such
sharp transitions in intensity levels, so smoothing linear filters are applied for noise reduction.
The problem is that edges are also characterized by sharp intensity transitions, so
averaging filters have the undesirable effect of blurring edges.
A major use of averaging filters is the reduction of irrelevant detail in an image (pixel
regions that are small with respect to the size of the filter mask).


There is the possibility of using a weighted average: the pixels are multiplied by different
coefficients, thus giving more importance (weight) to some pixels at the expense of others.
A general weighted averaging filter of size m x n (m and n odd) applied to an M x N image is
given by the expression:

    g(x,y) = \frac{\sum_{s=-a}^{a} \sum_{t=-b}^{b} w(s,t) f(x+s, y+t)}{\sum_{s=-a}^{a} \sum_{t=-b}^{b} w(s,t)} ,
    x = 0, 1, ..., M-1 ,  y = 0, 1, ..., N-1
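A sketch of plain and weighted averaging in MATLAB (fspecial builds standard masks; the weighted mask below is one common choice, not prescribed by the text):

f = im2double(imread('cameraman.tif'));
box  = fspecial('average', 3);        % 3x3 box filter, weights sum to 1
wavg = [1 2 1; 2 4 2; 1 2 1] / 16;    % weighted average mask
g1 = imfilter(f, box,  'replicate');
g2 = imfilter(f, wavg, 'replicate');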


(a) original image, 500x500 pixels; (b)-(f) results of smoothing with square averaging
filters of size m = 3, 5, 9, 15, and 35, respectively.
The black squares at the top are of size 3, 5, 9, 15, 25, 35, 45, 55 pixels. The letters at the
bottom range in size from 10 to 24 points. The vertical bars are 5 pixels wide and 100 pixels
high, separated by 20 pixels. The diameter of the circles is 25 pixels, and their borders are
15 pixels apart. The noisy rectangles are 50x120 pixels.


An important application of spatial averaging is to blur an image for the purpose of
getting a gross representation of objects of interest, such that the intensity of smaller
objects blends with the background and larger objects become blob-like and easier to
detect. The size of the mask establishes the relative size of the objects that will
disappear into the background.

Left: image from the Hubble Space Telescope, 528x485 pixels; Middle: image filtered with a
15x15 averaging mask; Right: result of thresholding the middle image


Order-Statistic (Nonlinear) Filters


Order-statistic filters are nonlinear spatial filters based on
ordering (ranking) the pixels contained in the image area defined
by the selected neighborhood and replacing the value of the center
pixel with the value determined by the ranking result. The best
known filter in this class is the median filter, which replaces the
value of a pixel by the median of the intensity values in the
neighborhood of that pixel (the original value of the pixel is
included in the computation of the median). Median filters provide


excellent noise-reduction capabilities, with considerably less blurring than linear
smoothing filters of similar size. Median filters are particularly effective against
impulse noise (also called salt-and-pepper noise).
The median, ξ, of a set of values is such that half the values in the set are less than or
equal to ξ, and half are greater than or equal to ξ.
For a 3x3 neighborhood with intensity values (10, 15, 20, 20, 30, 20, 20, 25, 100) the
median is ξ = 20.


The effect of the median filter is to force points with distinct intensity levels to be more
like their neighbors. Isolated clusters of pixels that are light or dark with respect to their
neighbors, and whose area is less than m^2/2, are eliminated by an m x m median filter
(eliminated means forced to the median intensity of the neighbors).

The max/min filter replaces the intensity value of the pixel with the max/min value of
the pixels in the neighborhood. The max/min filter is useful for finding the
brightest/darkest points in an image.


Min filter: 0th percentile filter
Median filter: 50th percentile filter
Max filter: 100th percentile filter

(a)
(b)
(c)
(a) X-ray image of circuit board corrupted by salt&pepper noise
(b) noise reduction with a 33 averaging filter
(c) noise reduction with a 33 median filter
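A sketch reproducing this comparison (eight.tif is a Toolbox sample X-ray-like image; the noise density is an arbitrary choice):

f = imread('eight.tif');
g = imnoise(f, 'salt & pepper', 0.05);       % impulse noise, 5% density
ga = imfilter(g, fspecial('average', 3));    % 3x3 averaging
gm = medfilt2(g, [3 3]);                     % 3x3 median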


Sharpening Spatial Filters

The principal objective of sharpening is to highlight transitions in intensity. These
filters are applied in electronic printing, medical imaging, industrial inspection, and
autonomous guidance in military systems.
Averaging is analogous to integration; sharpening corresponds to spatial differentiation.
Image differentiation enhances edges and other discontinuities (noise, for example) and
deemphasizes areas with slowly varying intensities.


For digital images, discrete approximations of the derivatives are used:

    \partial f / \partial x = f(x+1) - f(x)

    \partial^2 f / \partial x^2 = f(x+1) + f(x-1) - 2 f(x)


Illustration of the first and second derivatives of a 1-D digital function


Using the Second Derivative for Image Sharpening: the Laplacian

Isotropic filters: the response of such a filter is independent of the direction of the
discontinuities in the image. Isotropic filters are rotation invariant, in the sense that
rotating the image and then applying the filter gives the same result as applying the
filter to the image and then rotating the result.
The simplest isotropic derivative operator is the Laplacian:

    \nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}


This operator is linear.

    \frac{\partial^2 f}{\partial x^2} = f(x+1,y) + f(x-1,y) - 2 f(x,y)

    \frac{\partial^2 f}{\partial y^2} = f(x,y+1) + f(x,y-1) - 2 f(x,y)

    \nabla^2 f(x,y) = f(x+1,y) + f(x-1,y) + f(x,y+1) + f(x,y-1) - 4 f(x,y)


Filter masks that approximate the Laplacian

The Laplacian, being a derivative operator, highlights gray-level discontinuities in an
image and deemphasizes regions with slowly varying gray levels. This tends to produce
images that have

grayish edge lines and other discontinuities, all superimposed on a dark, featureless
background. Background features can be recovered, while still preserving the sharpening
effect of the Laplacian operation, simply by adding the original and Laplacian images.
The basic way to use the Laplacian for image sharpening is given by:

    g(x,y) = f(x,y) + c \nabla^2 f(x,y)

The (discrete) Laplacian can contain both negative and positive values, so it needs to be
scaled for display.


Blurred image of the North Pole of the Moon; Laplacian-filtered image

Sharpening results with c = 1 and c = 2
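A sketch of Laplacian sharpening (with the center-negative mask below, adding c times the Laplacian amounts to subtracting the filtered image; moon.tif is a Toolbox sample):

f = im2double(imread('moon.tif'));
lapmask = [0 1 0; 1 -4 1; 0 1 0];        % Laplacian approximation
lap = imfilter(f, lapmask, 'replicate');
g = f - lap;                             % g = f + c*lap with c = -1 for this mask
imshowpair(f, g, 'montage')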


Unsharp Masking and Highboost Filtering


- process used in printing and publishing industry to sharpen
images
- subtracting an unsharp (smoothed) version of an image from
the original image
1.Blur the original image
2.Subtract the blurred image from the original (the resulting
difference is called the mask)
3.Add the mask to the original


Let \bar{f}(x,y) be the blurred image. The mask is given by:

    g_mask(x,y) = f(x,y) - \bar{f}(x,y)

    g(x,y) = f(x,y) + k g_mask(x,y)

k = 1: unsharp masking
k > 1: highboost filtering


original image

blurred image (Gaussian filter 55, =3)

mask difference between the above images

unsharp masking result

highboost filter result (k=4.5)
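The three steps in MATLAB (imgaussfilt assumes R2015a or newer; the sigma and k values mirror the figure):

f = im2double(imread('moon.tif'));
fblur = imgaussfilt(f, 3);        % step 1: blur (Gaussian, sigma = 3)
gmask = f - fblur;                % step 2: the mask
g1 = f + 1.0 * gmask;             % step 3: unsharp masking (k = 1)
g2 = f + 4.5 * gmask;             % highboost filtering (k = 4.5)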


The Gradient for (Nonlinear) Image Sharpening

    \nabla f = grad(f) = [g_x, g_y]^T = [\partial f/\partial x, \partial f/\partial y]^T

The gradient points in the direction of the greatest rate of change of f at location (x,y).
The magnitude (length) of the gradient is defined as:

    M(x,y) = mag(\nabla f) = \sqrt{g_x^2 + g_y^2}


M(x,y) is an image of the same size as the original, called the gradient image (or simply
the gradient). M(x,y) is rotation invariant (isotropic); the gradient vector \nabla f is not
isotropic. In some applications the following formula is used:

    M(x,y) ≈ |g_x| + |g_y|   (not isotropic)

Different ways of approximating g_x and g_y produce different filter operators.


Roberts cross-gradient operators (1965):

    g_x = f(x+1, y+1) - f(x, y) = \delta_1
    g_y = f(x, y+1) - f(x+1, y) = \delta_2

    M(x,y) = \sqrt{\delta_1^2 + \delta_2^2}   or   M(x,y) ≈ |\delta_1| + |\delta_2|


Sobel operators:

    g_x = [f(x-1,y+1) + 2 f(x,y+1) + f(x+1,y+1)] - [f(x-1,y-1) + 2 f(x,y-1) + f(x+1,y-1)]

    g_y = [f(x+1,y-1) + 2 f(x+1,y) + f(x+1,y+1)] - [f(x-1,y-1) + 2 f(x-1,y) + f(x-1,y+1)]


Roberts cross gradient operators

Sobel operators
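A sketch of the gradient image with Sobel masks (fspecial('sobel') returns the mask for one direction; its transpose gives the other):

f = im2double(imread('cameraman.tif'));
s1 = fspecial('sobel');              % [1 2 1; 0 0 0; -1 -2 -1]
s2 = s1';                            % the other direction
g1 = imfilter(f, s1, 'replicate');
g2 = imfilter(f, s2, 'replicate');
M  = sqrt(g1.^2 + g2.^2);            % gradient magnitude image
% M = abs(g1) + abs(g2);             % cheaper, non-isotropic variant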


Filtering in the Frequency Domain


Filter: a device or material for suppressing or minimizing waves or
oscillations of certain frequencies
Frequency: the number of times that a periodic function repeats the
same sequence of values during a unit variation of the independent
variable

Fourier series and Transform


Fourier, in a memoir in 1807 (published in 1822 in his book La Théorie Analytique de
la Chaleur), stated that any periodic function can be expressed as the sum of sines
and/or cosines of different


frequencies, each multiplied by a different coefficient (now called a Fourier series).
Even functions that are not periodic (but whose area under the curve is finite) can be
expressed as the integral of sines and/or cosines multiplied by a weighting function (the
Fourier transform). Both representations share the characteristic that a function
expressed as either a Fourier series or a Fourier transform can be reconstructed
(recovered) completely via an inverse process, with no loss of information. This allows
us to work in the Fourier domain and then return to the original domain of the function
without losing any information.


Complex Numbers

    C = R + i I ,  R, I \in \mathbb{R} ,  i = \sqrt{-1} ;  R is the real part, I the imaginary part

    C* = R - i I ,  the conjugate of the complex number C

    C = |C| (\cos\theta + i \sin\theta) ,  |C| = \sqrt{R^2 + I^2} ,  the complex number in polar coordinates

    e^{i\theta} = \cos\theta + i \sin\theta   (Euler's formula)

    C = |C| e^{i\theta}


Fourier series
For f(t) a periodic function with period T ( f(t + T) = f(t) for all t ):

    f(t) = \sum_{n=-\infty}^{\infty} c_n e^{i 2\pi n t / T}

    c_n = \frac{1}{T} \int_{-T/2}^{T/2} f(t) e^{-i 2\pi n t / T} dt ,  n = 0, \pm 1, \pm 2, ...

Impulses and the Sifting Property

A unit impulse located at t = 0, denoted \delta(t), is defined as:

    \delta(t) = \infty if t = 0 ,  \delta(t) = 0 if t \ne 0 ,  satisfying  \int_{-\infty}^{\infty} \delta(t) dt = 1


Physically, an impulse may be interpreted as a spike of infinite amplitude and zero
duration, having unit area. An impulse has the sifting property with respect to integration:

    \int_{-\infty}^{\infty} f(t) \delta(t) dt = f(0) ,  f continuous at t = 0

    \int_{-\infty}^{\infty} f(t) \delta(t - t_0) dt = f(t_0) ,  f continuous at t_0

The unit discrete impulse, \delta(x), is defined as:

    \delta(x) = 1 if x = 0 ,  \delta(x) = 0 if x \ne 0 ,  satisfying  \sum_{x=-\infty}^{\infty} \delta(x) = 1


The sifting property:

    \sum_{x=-\infty}^{\infty} f(x) \delta(x) = f(0) ,  \sum_{x=-\infty}^{\infty} f(x) \delta(x - x_0) = f(x_0)

The impulse train, s_{\Delta T}(t):

    s_{\Delta T}(t) = \sum_{n=-\infty}^{\infty} \delta(t - n \Delta T)


The Fourier Transform of a Function of One Continuous Variable

The Fourier transform of a continuous function f(t) of a continuous variable t is:

    \mathcal{F}\{f(t)\} = F(\mu) = \int_{-\infty}^{\infty} f(t) e^{-i 2\pi\mu t} dt

Conversely, given F(\mu), we can obtain f(t) back using the inverse Fourier transform,
f(t) = \mathcal{F}^{-1}\{F(\mu)\}, given by:

    f(t) = \int_{-\infty}^{\infty} F(\mu) e^{i 2\pi\mu t} d\mu


    F(\mu) = \int_{-\infty}^{\infty} f(t) [\cos(2\pi\mu t) - i \sin(2\pi\mu t)] dt

The sinc function:

    sinc(x) = \frac{\sin(\pi x)}{\pi x} ,  sinc(0) = 1


The Fourier transform of the unit impulse:

    F(\mu) = \int_{-\infty}^{\infty} \delta(t) e^{-i 2\pi\mu t} dt = 1

    F(\mu) = \int_{-\infty}^{\infty} \delta(t - t_0) e^{-i 2\pi\mu t} dt = e^{-i 2\pi\mu t_0} = \cos(2\pi\mu t_0) - i \sin(2\pi\mu t_0)

The Fourier series for the impulse train s_{\Delta T}(t):

    s_{\Delta T}(t) = \frac{1}{\Delta T} \sum_{n=-\infty}^{\infty} e^{i \frac{2\pi n}{\Delta T} t}


The Fourier transform of the periodic impulse train, S(\mu), is also an impulse train:

    S(\mu) = \frac{1}{\Delta T} \sum_{n=-\infty}^{\infty} \delta\left(\mu - \frac{n}{\Delta T}\right)

Convolution (f, h continuous functions):

    (f \star h)(t) = \int_{-\infty}^{\infty} f(s) h(t - s) ds

    \mathcal{F}\{f(t) \star h(t)\} = H(\mu) F(\mu) ,  \mathcal{F}\{f(t) h(t)\} = H(\mu) \star F(\mu)


Convolution in the frequency domain is analogous to multiplication in the spatial
domain, and vice versa.
The convolution theorem is the foundation for filtering in the frequency domain.

Sampling and the Fourier Transform of Sampled Functions

Continuous functions have to be converted into a sequence of discrete values before
they can be processed by a computer. Consider a continuous function, f(t), that we wish
to sample at uniform


intervals \Delta T apart. We assume that the function extends from -\infty to \infty.
One way to model sampling is to multiply f(t) by an impulse train:

    \tilde{f}(t) = f(t) s_{\Delta T}(t) = \sum_{n=-\infty}^{\infty} f(t) \delta(t - n \Delta T) ,  \tilde{f}(t) the sampled function

The value f_k of an arbitrary sample in the sequence is given by:

    f_k = \int_{-\infty}^{\infty} f(t) \delta(t - k \Delta T) dt = f(k \Delta T)


The Fourier Transform of a Sampled Function

Let F(\mu) be the Fourier transform of a continuous function f(t) and let \tilde{f}(t) be
the sampled function. The Fourier transform of the sampled function is:

    \tilde{F}(\mu) = \mathcal{F}\{\tilde{f}(t)\} = \mathcal{F}\{f(t) s_{\Delta T}(t)\} = (F \star S)(\mu)

which evaluates to:

    \tilde{F}(\mu) = \frac{1}{\Delta T} \sum_{n=-\infty}^{\infty} F\left(\mu - \frac{n}{\Delta T}\right)


The Fourier transform \tilde{F}(\mu) of the sampled function \tilde{f}(t) is an infinite,
periodic sequence of copies of F(\mu); the period is 1/\Delta T.

The Sampling Theorem
Consider the problem of establishing the conditions under which a continuous function
can be recovered uniquely from a set of its samples.
A function f(t) is called band-limited if its Fourier transform is 0 outside the interval
[-\mu_max, \mu_max].


We can recover f(t) from its sampled version if we can isolate a copy of F(\mu) from the
periodic sequence of copies of this function contained in \tilde{F}(\mu), the transform of
the sampled function \tilde{f}(t).
Recall that \tilde{F}(\mu) is continuous and periodic with period 1/\Delta T. All we need
is one complete period to characterize the entire transform. This implies that we can
recover f(t) from that single period by using the inverse Fourier transform.
Extracting from \tilde{F}(\mu) a single period that is equal to F(\mu) is possible if the
separation between copies is sufficient, i.e.:

    \frac{1}{2 \Delta T} > \mu_{max}   \Leftrightarrow   \frac{1}{\Delta T} > 2 \mu_{max}


Sampling Theorem
A continuous, band-limited function can be recovered completely
from a set of its samples if the samples are acquired at a rate
exceeding twice the highest frequency content of the function.
The number 2\mu_{max} is called the Nyquist rate.


To see how the recovery of F(\mu) from \tilde{F}(\mu) is possible, we proceed as follows
(see Figure 4.8). Define the filter:

    H(\mu) = \Delta T for -\mu_{max} \le \mu \le \mu_{max} ,  H(\mu) = 0 otherwise

Then:

    F(\mu) = H(\mu) \tilde{F}(\mu)   and   f(t) = \int_{-\infty}^{\infty} F(\mu) e^{i 2\pi\mu t} d\mu


The function H(\mu) is called a lowpass filter because it passes frequencies at the low
end of the frequency range and eliminates (filters out) all higher frequencies. It is also
called an ideal lowpass filter.

The Discrete Fourier Transform (DFT) of One Variable

Obtaining the DFT from the Continuous Transform of a Sampled Function
The Fourier transform of a sampled, band-limited function extending from -\infty to
\infty is continuous and periodic, and also extends from -\infty to \infty. In practice, we
work with a finite number of samples, and the objective is to derive the DFT
corresponding to such sample sets.
    \tilde{F}(\mu) = \int_{-\infty}^{\infty} \tilde{f}(t) e^{-i 2\pi\mu t} dt = \int_{-\infty}^{\infty} \sum_{n=-\infty}^{\infty} f(t) \delta(t - n\Delta T) e^{-i 2\pi\mu t} dt = \sum_{n=-\infty}^{\infty} f_n e^{-i 2\pi\mu n \Delta T}    (1)

What is the discrete version of \tilde{F}(\mu)? All we need to characterize \tilde{F}(\mu)
is one period, and sampling one period is the basis of the DFT.
Suppose that we want to obtain M equally spaced samples of \tilde{F}(\mu) taken over
the period [0, 1/\Delta T]. Consider the frequencies:


    \mu = \frac{m}{M \Delta T} ,  m = 0, 1, ..., M-1

and substitute into (1):

    F_m = \sum_{n=0}^{M-1} f_n e^{-i 2\pi m n / M} ,  m = 0, 1, ..., M-1    (2)

This expression is the discrete Fourier transform (DFT).
Given a set {f_n} of M samples of f(t), equation (2) yields a sample set {F_m} of M
complex discrete values corresponding to the discrete Fourier transform of the input
sample set.


Conversely, given {F_m}, we can recover the sample set {f_n} by using the inverse
discrete Fourier transform (IDFT):

    f_n = \frac{1}{M} \sum_{m=0}^{M-1} F_m e^{i 2\pi m n / M} ,  n = 0, 1, ..., M-1
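MATLAB's fft/ifft implement exactly this DFT/IDFT pair (with the 1/M factor in ifft); a sketch with an assumed sampling setup:

fs = 100;                 % samples per unit time (1/DeltaT), an assumption
t  = (0:99) / fs;         % M = 100 samples
f  = sin(2*pi*5*t);       % a band-limited test signal
F  = fft(f);              % the samples F_m, m = 0,...,M-1
fr = real(ifft(F));       % recovers f_n up to round-off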


Extension to Functions of Two Variables

The 2-D Impulse and Its Sifting Property
Continuous case:

    \delta(t,z) = \infty if t = z = 0 ,  \delta(t,z) = 0 otherwise ,  with  \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \delta(t,z) dt dz = 1

Sifting property:

    \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(t,z) \delta(t,z) dt dz = f(0,0)

    \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(t,z) \delta(t - t_0, z - z_0) dt dz = f(t_0, z_0)


Discrete case:

    \delta(x,y) = 1 if x = y = 0 ,  \delta(x,y) = 0 otherwise

    \sum_{x=-\infty}^{\infty} \sum_{y=-\infty}^{\infty} f(x,y) \delta(x,y) = f(0,0)

    \sum_{x=-\infty}^{\infty} \sum_{y=-\infty}^{\infty} f(x,y) \delta(x - x_0, y - y_0) = f(x_0, y_0)


The 2-D Continuous Fourier Transform Pair

    F(\mu,\nu) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(t,z) e^{-i 2\pi(\mu t + \nu z)} dt dz

    f(t,z) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} F(\mu,\nu) e^{i 2\pi(\mu t + \nu z)} d\mu d\nu

Two-Dimensional Sampling and the 2-D Sampling Theorem

2-D impulse train:

    s_{\Delta T \Delta Z}(t,z) = \sum_{m=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} \delta(t - m\Delta T, z - n\Delta Z)

f(t,z) is band-limited if its Fourier transform is 0 outside the rectangle defined by the
intervals [-\mu_{max}, \mu_{max}] and [-\nu_{max}, \nu_{max}]:


    F(\mu,\nu) = 0 for |\mu| \ge \mu_{max} and |\nu| \ge \nu_{max}

The two-dimensional sampling theorem states that a continuous, band-limited function
f(t,z) can be recovered with no error from a set of its samples if the sampling intervals
satisfy:

    \Delta T < \frac{1}{2\mu_{max}}   and   \Delta Z < \frac{1}{2\nu_{max}}

(equivalently, 1/\Delta T > 2\mu_{max} and 1/\Delta Z > 2\nu_{max}).


The 2-D Discrete Fourier Transform and Its Inverse

    F(u,v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x,y) e^{-i 2\pi (ux/M + vy/N)}

where f(x,y) is a digital image of size M x N.
Given the transform F(u,v), we can obtain f(x,y) by using the inverse discrete Fourier
transform (IDFT):

    f(x,y) = \frac{1}{MN} \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} F(u,v) e^{i 2\pi (ux/M + vy/N)} ,  x = 0, 1, ..., M-1 ,  y = 0, 1, ..., N-1
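In MATLAB, fft2/ifft2 compute this pair; a sketch that also displays the centered log spectrum:

f = im2double(imread('cameraman.tif'));
F = fft2(f);                  % 2-D DFT
S = abs(fftshift(F));         % centered spectrum |F(u,v)|
imshow(log(1 + S), [])        % log scaling for display
fr = real(ifft2(F));          % the IDFT recovers the image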


Some Properties of the 2-D Discrete Fourier Transform

Relationships Between Spatial and Frequency Intervals
A digital image f(x,y) consists of M x N samples taken at intervals \Delta T and \Delta Z.
The separations between the corresponding discrete frequency domain variables are:

    \Delta u = \frac{1}{M \Delta T} ,  \Delta v = \frac{1}{N \Delta Z}


Translation and Rotation

    f(x,y) e^{i 2\pi (u_0 x / M + v_0 y / N)} \Leftrightarrow F(u - u_0, v - v_0)

    f(x - x_0, y - y_0) \Leftrightarrow F(u,v) e^{-i 2\pi (u x_0 / M + v y_0 / N)}

Using polar coordinates

    x = r \cos\theta ,  y = r \sin\theta ,  u = \omega \cos\varphi ,  v = \omega \sin\varphi

we get: rotating f(x,y) by an angle \theta_0 rotates its Fourier transform F by the same angle:

    f(r, \theta + \theta_0) \Leftrightarrow F(\omega, \varphi + \theta_0)


Periodicity

    F(u,v) = F(u + k_1 M, v) = F(u, v + k_2 N) = F(u + k_1 M, v + k_2 N)
    f(x,y) = f(x + k_1 M, y) = f(x, y + k_2 N) = f(x + k_1 M, y + k_2 N) ,  k_1, k_2 integers

    f(x,y) (-1)^{x+y} \Leftrightarrow F(u - M/2, v - N/2)

This last relation shifts the data so that F(0,0) is at the center of the frequency
rectangle defined by the intervals [0, M-1] and [0, N-1].


Symmetry Properties
Even and odd parts of a function:

    w(x,y) = w_e(x,y) + w_o(x,y)

    w_e(x,y) = \frac{w(x,y) + w(-x,-y)}{2} ,  w_o(x,y) = \frac{w(x,y) - w(-x,-y)}{2}

    w_e(x,y) = w_e(-x,-y)   (even, symmetric)
    w_o(x,y) = -w_o(-x,-y)   (odd, antisymmetric)


For digital images, evenness and oddness become:

    w_e(x,y) = w_e(M - x, N - y)
    w_o(x,y) = -w_o(M - x, N - y)

    \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} w_e(x,y) w_o(x,y) = 0


Fourier Spectrum and Phase Angle

Express the Fourier transform in polar coordinates:

    F(u,v) = |F(u,v)| e^{i\phi(u,v)}

    |F(u,v)| = \sqrt{R^2(u,v) + I^2(u,v)}   (the Fourier or frequency spectrum)

    \phi(u,v) = \arctan\frac{I(u,v)}{R(u,v)}   (the phase angle)

    P(u,v) = |F(u,v)|^2 = R^2(u,v) + I^2(u,v)   (the power spectrum)

For a real image f:

    |F(u,v)| = |F(-u,-v)| ,  \phi(u,v) = -\phi(-u,-v)


    F(0,0) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x,y) = MN \bar{f} ,  \bar{f} = \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x,y)

where \bar{f} is the average value of the image f, so

    |F(0,0)| = MN |\bar{f}|

Because MN usually is large, |F(0,0)| typically is the largest component of the spectrum,
by a factor that can be several orders of magnitude larger than other terms.


F(0,0) is sometimes called the dc component of the transform (dc = direct current, i.e.,
current of zero frequency).

The 2-D Convolution Theorem
2-D circular convolution:

    (f \star h)(x,y) = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} f(m,n) h(x - m, y - n) ,  x = 0, 1, ..., M-1 ,  y = 0, 1, ..., N-1

The 2-D convolution theorem:

    f(x,y) \star h(x,y) \Leftrightarrow F(u,v) H(u,v)
    f(x,y) h(x,y) \Leftrightarrow F(u,v) \star H(u,v)


Filtering in the Frequency Domain


Let f(x,y) be a digital image and F(u,v) its (discrete) Fourier
transform. Usually it is not possible to make direct
associations between specific components of an image and its
transform. We know that F(0,0) is proportional to the average
intensity of the image. Low frequencies correspond to the
slowly varying intensity components of an image, the higher
frequencies correspond to faster intensity change in an image
(edges, for ex.).


The 2-D Discrete Fourier Transform and Its Inverse (recall)

    F(u,v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x,y) e^{-i 2\pi (ux/M + vy/N)}

    f(x,y) = \frac{1}{MN} \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} F(u,v) e^{i 2\pi (ux/M + vy/N)} ,  x = 0, 1, ..., M-1 ,  y = 0, 1, ..., N-1

where f(x,y) is a digital image of size M x N.


    F(0,0) = MN \bar{f} ,  \bar{f} = \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x,y)   (the average value of the image f)

    f(x - x_0, y - y_0) \Leftrightarrow F(u,v) e^{-i 2\pi (u x_0 / M + v y_0 / N)}

    f(r, \theta + \theta_0) \Leftrightarrow F(\omega, \varphi + \theta_0)

The spectrum is insensitive to image translation, and it rotates by the same angle as the
image rotates.


(panels: image; centered Fourier spectrum; Fourier spectrum; log-transformed centered Fourier spectrum)


(panels: translated image and its Fourier spectrum; 45° rotated image and its Fourier spectrum)


The magnitude of the 2-D DFT is an array whose components determine the intensities
in the image, while the corresponding phase is an array of angles that carry the
information about where discernible objects are located in the image.


(panels: woman image; rectangle image; phase angle; reconstruction using only the phase
angle; reconstruction from the rectangle's spectrum with the woman's phase angle;
reconstruction from the rectangle's phase angle with the woman's spectrum)


The 2-D Convolution Theorem (recall)

2-D circular convolution:

    (f \star h)(x,y) = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} f(m,n) h(x - m, y - n) ,  x = 0, 1, ..., M-1 ,  y = 0, 1, ..., N-1

    f(x,y) \star h(x,y) \Leftrightarrow F(u,v) H(u,v) ,  f(x,y) h(x,y) \Leftrightarrow F(u,v) \star H(u,v)


If we use the DFT and the convolution theorem to obtain the same result as in the left
column of Figure 4.28, we must take into account the periodicity inherent in the
expression for the DFT. The problem that appears in Figure 4.28 is commonly referred
to as wraparound error. The solution to this problem is simple. Consider two functions
f and h composed of A and B samples, respectively. It can be shown that if we append
zeros to both functions so that they have the same length, denoted by P, then
wraparound is avoided by choosing:

    P \ge A + B - 1

This process is called zero padding.
Let f(x,y) and h(x,y) be two image arrays of size A x B and C x D pixels, respectively.
Wraparound error in their circular convolution can be avoided by padding these
functions with zeros:

    f_p(x,y) = f(x,y) for 0 \le x \le A-1 and 0 \le y \le B-1 ,  f_p(x,y) = 0 for A \le x \le P or B \le y \le Q

    h_p(x,y) = h(x,y) for 0 \le x \le C-1 and 0 \le y \le D-1 ,  h_p(x,y) = 0 for C \le x \le P or D \le y \le Q

with

    P \ge A + C - 1   (P = 2M - 1 for two images of equal size M x N)
    Q \ge B + D - 1   (Q = 2N - 1)
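A sketch of zero-padded frequency-domain convolution (fft2 pads with zeros when given output sizes; image and kernel are illustrative assumptions):

f = im2double(imread('cameraman.tif'));       % A x B image
h = fspecial('average', 9);                   % C x D kernel
[A, B] = size(f); [C, D] = size(h);
P = A + C - 1; Q = B + D - 1;                 % padded sizes
G = fft2(f, P, Q) .* fft2(h, P, Q);           % product of zero-padded DFTs
g = real(ifft2(G));                           % linear convolution, no wraparound
g = g(1:A, 1:B);                              % crop back to the original size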


Frequency Domain Filtering Fundamentals

Given a digital image f(x,y) of size M x N, the basic filtering equation has the form:

    g(x,y) = \mathcal{F}^{-1}[H(u,v) F(u,v)]    (1)

where \mathcal{F}^{-1} is the inverse discrete Fourier transform (IDFT), F(u,v) is the
discrete Fourier transform (DFT) of the input image, H(u,v) is a filter function (also
called the filter or the filter transfer function), and g(x,y) is the filtered (output) image.
F, H, and g are arrays of the same size as f, M x N.


H(u,v) symmetric about its center simplifies the computations and requires F(u,v) to be
centered as well. In order to obtain a centered F(u,v), the image f(x,y) is multiplied by
(-1)^{x+y} before computing its transform. For example:

    H(u,v) = 0 for (u,v) = (M/2, N/2)   (the point (u,v) = (0,0) in the uncentered transform)
    H(u,v) = 1 elsewhere

This filter rejects the dc term (responsible for the average intensity of an image) and
passes all other terms of F(u,v).


This filter will reduce the average intensity of the output image to zero.
Low frequencies in the transform are related to slowly varying intensity components in
an image (such as the walls of a room or a cloudless sky), while high frequencies are
caused by sharp transitions in intensity, such as edges and noise. A filter H(u,v) that
attenuates high frequencies while passing low frequencies (i.e., a lowpass filter) would
blur an image, while a filter with the opposite property (a highpass filter) would enhance
sharp detail but cause a reduction of contrast in the image.

Image of damaged integrated circuit

Fourier spectrum

F(0,0)=0


The DFT is a complex array of the form:

    F(u,v) = R(u,v) + i I(u,v)

    g(x,y) = \mathcal{F}^{-1}[H(u,v) R(u,v) + i H(u,v) I(u,v)]

The phase angle is not altered by filtering in this way. Filters that affect the real and the
imaginary parts equally, and thus have no effect on the phase, are called
zero-phase-shift filters. Even small changes in the phase angle can have undesirable
effects on the filtered output.


Main Steps for Filtering in the Frequency Domain

1. Given an input image f(x,y) of size M x N, obtain the padding parameters P and Q
   (usually P = 2M, Q = 2N).
2. Form a padded image f_p(x,y) of size P x Q by appending the necessary number of
   zeros to f(x,y) (f sits in the upper left corner of f_p).
3. Multiply f_p(x,y) by (-1)^{x+y} to center its transform.
4. Compute the DFT, F(u,v), of the image obtained in step 3.
5. Generate a real, symmetric filter function H(u,v) of size P x Q with center at
   coordinates (P/2, Q/2). Form the array product G(u,v) = H(u,v) F(u,v).
6. Obtain the processed image:

    g_p(x,y) = real{ \mathcal{F}^{-1}[G(u,v)] } (-1)^{x+y}

   The real part is selected in order to ignore parasitic complex components resulting
   from computational inaccuracies.
7. Obtain the output (filtered) image g(x,y) by extracting the M x N region from the top
   left corner of g_p(x,y).
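A sketch of steps 1 through 7 with a Gaussian lowpass as H (the image name and D0 = 60 are arbitrary assumptions):

f = im2double(imread('cameraman.tif'));
[M, N] = size(f);  P = 2*M;  Q = 2*N;           % step 1
fp = padarray(f, [M N], 0, 'post');             % step 2
[x, y] = meshgrid(0:Q-1, 0:P-1);                % column/row index grids
fp = fp .* (-1).^(x + y);                       % step 3: center the transform
F  = fft2(fp);                                  % step 4
D2 = (y - P/2).^2 + (x - Q/2).^2;               % squared distance to (P/2, Q/2)
H  = exp(-D2 / (2*60^2));                       % step 5: GLPF with D0 = 60
gp = real(ifft2(H .* F)) .* (-1).^(x + y);      % step 6
g  = gp(1:M, 1:N);                              % step 7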


Correspondence between Filtering in the Spatial and Frequency Domains

The link between filtering in the spatial domain and in the frequency domain is the
convolution theorem.
Given a filter H(u,v), suppose that we want to find its equivalent representation in the
spatial domain. For f(x,y) = \delta(x,y) we have F(u,v) = 1, and:

    g(x,y) = \mathcal{F}^{-1}[H(u,v) F(u,v)] = \mathcal{F}^{-1}[H(u,v)] = h(x,y)

The inverse transform of the frequency domain filter, h(x,y), is the corresponding filter
in the spatial domain. Conversely, given a spatial filter h(x,y), we obtain its frequency
domain representation by taking the Fourier transform of the spatial filter:

    h(x,y) \Leftrightarrow H(u,v)

h(x,y) is sometimes called the (finite) impulse response (FIR) of H(u,v).


One way to take advantage of the properties of both domains is to specify a filter in the
frequency domain, compute its IDFT, and then use the resulting full-size spatial filter as
a guide for constructing smaller spatial filter masks.
Let H(u) denote the 1-D frequency domain Gaussian filter:

    H(u) = A e^{-u^2 / (2\sigma^2)} ,  \sigma the standard deviation

The corresponding filter in the spatial domain is obtained by taking the inverse Fourier
transform of H(u):

    h(x) = \sqrt{2\pi} \sigma A e^{-2\pi^2 \sigma^2 x^2}

which is also a Gaussian filter. When H(u) has a broad profile (large value of \sigma),
h(x) has a narrow profile, and vice versa. As \sigma approaches infinity, H(u) tends
toward a constant function and h(x) tends toward an impulse, which implies no filtering
in either the frequency or the spatial domain.


Image Smoothing Using Frequency Domain Filters

Smoothing (blurring) is achieved in the frequency domain by high-frequency
attenuation, that is, by lowpass filtering. We consider three types of lowpass filters:

    ideal ,  Butterworth ,  Gaussian

The Butterworth filter has a parameter called the filter order. For high order values, the
Butterworth filter approaches the ideal filter; for low values, it is closer to a Gaussian
filter. All filters and images in these sections are considered padded with zeros, so they
have size P x Q. The Butterworth filter may be viewed as providing a transition between
the other two filters.

Ideal Lowpass Filter (ILPF)

    H(u,v) = 1 if D(u,v) \le D_0 ,  H(u,v) = 0 if D(u,v) > D_0

where D_0 > 0 is a positive constant and D(u,v) is the distance between (u,v) and the
center of the frequency rectangle:

    D(u,v) = \sqrt{(u - P/2)^2 + (v - Q/2)^2}    (DUV)

The name ideal indicates that all frequencies on or inside the


circle of radius D0 are passed without attenuation, whereas all
frequencies outside the circle are completely eliminated
(filtered out).
For an ILPF cross section, the point of transition between H(u,v) = 1 and H(u,v) = 0 is
called the cutoff frequency. The sharp cutoff frequencies of an ILPF cannot be realized
with electronic components, but they can be simulated in a computer.
We can compare lowpass filters by studying their behavior with respect to the cutoff
frequencies.


Butterworth Lowpass Filter (BLPF)

The transfer function of a Butterworth lowpass filter of order n, with cutoff frequency at
distance D_0 from the origin, is:

    H(u,v) = \frac{1}{1 + [D(u,v)/D_0]^{2n}}

where D(u,v) is given by relation (DUV).


The BLPF transfer function does not have a sharp discontinuity that gives a clear cutoff
between passed and filtered frequencies. For filters with smooth transfer functions, a
cutoff frequency locus is defined at the points for which H(u,v) is down to a certain
fraction of its maximum value.


Gaussian Lowpass Filter (GLPF)

    H(u,v) = e^{-D^2(u,v) / (2\sigma^2)} = e^{-D^2(u,v) / (2 D_0^2)}   (taking \sigma = D_0)

D_0 is the cutoff frequency. When D(u,v) = D_0, the GLPF is down to 0.607 of its
maximum value.
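A sketch building the three lowpass transfer functions on a P x Q grid (all numeric values are illustrative):

P = 512; Q = 512; D0 = 60; n = 2;
[x, y] = meshgrid(0:Q-1, 0:P-1);
D = sqrt((y - P/2).^2 + (x - Q/2).^2);     % relation (DUV)
Hi = double(D <= D0);                      % ideal lowpass
Hb = 1 ./ (1 + (D / D0).^(2*n));           % Butterworth, order n
Hg = exp(-D.^2 / (2 * D0^2));              % Gaussian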


Image Sharpening Using Frequency Domain Filters

Edges and other abrupt changes in intensity are associated with high-frequency
components, so image sharpening can be achieved in the frequency domain by highpass
filtering, which attenuates the low-frequency components without changing the
high-frequency information in the Fourier transform.
A highpass filter is obtained from a given lowpass filter using the equation:

    H_HP(u,v) = 1 - H_LP(u,v)

where H_LP(u,v) is the transfer function of the lowpass filter.


Ideal Highpass Filter (IHPF)

A 2-D ideal highpass filter is defined as:

    H(u,v) = 0 if D(u,v) \le D_0 ,  H(u,v) = 1 if D(u,v) > D_0

where D_0 is the cutoff frequency and D(u,v) is given by equation (DUV). As with the
ILPF, the IHPF is not physically realizable.


Butterworth Highpass Filter (BHPF)

The transfer function of a Butterworth highpass filter of order n, with cutoff frequency
at distance D_0 from the origin, is:

    H(u,v) = \frac{1}{1 + [D_0/D(u,v)]^{2n}}


Gaussian Highpass Filter (GHPF)

    H(u,v) = 1 - e^{-D^2(u,v) / (2 D_0^2)}


Figure 4.57(a) is a 1026x962 image of a thumb print in which smudges are present. A
key step in automated fingerprint recognition is the enhancement of print ridges and the
reduction of smudges. In this example a highpass filter was used to enhance the ridges
and reduce the effects of smudging. Enhancement of the ridges is accomplished by the
fact that they contain high frequencies, which are unchanged by a highpass filter. The
filter also reduces the low-frequency components, which correspond to slowly varying
intensities in the image, such as the background and smudges.
Figure 4.57(b) is the result of using a BHPF of order n = 4, with a cutoff frequency
D0 = 50.
Figure 4.57(c) is the result of setting to black all negative values and to white all positive
values in Figure 4.57(b) (a threshold intensity transformation).


The Laplacian in the Frequency Domain

The Laplacian can be implemented in the frequency domain using the filter:

    H(u,v) = -4\pi^2 (u^2 + v^2)

The centered Laplacian filter is:

    H(u,v) = -4\pi^2 [(u - P/2)^2 + (v - Q/2)^2] = -4\pi^2 D^2(u,v)

The Laplacian image is obtained as:

    \nabla^2 f(x,y) = \mathcal{F}^{-1}[H(u,v) F(u,v)]


Enhancement is obtained with the equation:

    g(x,y) = f(x,y) - \nabla^2 f(x,y)    (1)

Computing \nabla^2 f(x,y) with the above relation introduces DFT scaling factors that
can be several orders of magnitude larger than the maximum value of f. To fix this
problem, we normalize the values of f(x,y) to the range [0,1] (before computing its DFT)
and divide \nabla^2 f(x,y) by its maximum value, which brings it to the range [-1,1].
Alternatively, the two steps can be combined in the frequency domain:

    g(x,y) = \mathcal{F}^{-1}\{F(u,v) - H(u,v) F(u,v)\} = \mathcal{F}^{-1}\{[1 + 4\pi^2 D^2(u,v)] F(u,v)\}    (2)

The above formula is simple but has the same scaling problems as those mentioned
above; between (1) and (2), the former is preferred.


Unsharp Masking, Highboost Filtering and High-Frequency-Emphasis Filtering

    g_mask(x,y) = f(x,y) - f_LP(x,y) ,  f_LP(x,y) = \mathcal{F}^{-1}[H_LP(u,v) F(u,v)]

where H_LP(u,v) is a lowpass filter. Here f_LP(x,y) is a smoothed image analogous to
\bar{f}(x,y) from the spatial domain.

    g(x,y) = f(x,y) + k g_mask(x,y)   (k = 1: unsharp masking, k > 1: highboost filtering)

    g(x,y) = \mathcal{F}^{-1}\{[1 + k H_HP(u,v)] F(u,v)\}

The factor 1 + k H_HP(u,v) is called the high-frequency-emphasis filter. Highpass filters
set the dc term to zero, thus reducing the average intensity in the filtered image to 0.
The high-frequency-emphasis filter does not have this problem. The constant k gives
control over the proportion of high frequencies that influence the final result. A more
general high-frequency-emphasis filter is:

    g(x,y) = \mathcal{F}^{-1}\{[k_1 + k_2 H_HP(u,v)] F(u,v)\}

where k_1 \ge 0 controls the offset from the origin and k_2 \ge 0 controls the
contribution of high frequencies.
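A sketch of high-frequency emphasis with a Gaussian highpass (k1 = 0.5, k2 = 0.75, and D0 = 40 are arbitrary illustrative values; padding is omitted for brevity):

f = im2double(imread('cameraman.tif'));
[P, Q] = size(f);
[x, y] = meshgrid(0:Q-1, 0:P-1);
D2  = (y - P/2).^2 + (x - Q/2).^2;
Hhp = 1 - exp(-D2 / (2*40^2));                       % GHPF
F = fftshift(fft2(f));                               % centered DFT
g = real(ifft2(ifftshift((0.5 + 0.75*Hhp) .* F)));   % k1 + k2*Hhp applied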


Homomorphic Filtering
An image can be expressed as the product of its illumination i(x,y) and reflectance r(x,y)
components:

    f(x,y) = i(x,y) r(x,y)

Because \mathcal{F}\{f(x,y)\} \ne \mathcal{F}\{i(x,y)\} \mathcal{F}\{r(x,y)\}, consider instead:

    z(x,y) = \ln f(x,y) = \ln i(x,y) + \ln r(x,y)

Taking the Fourier transform of this relation we have:

    Z(u,v) = F_i(u,v) + F_r(u,v)

where Z, F_i, F_r are the Fourier transforms of z(x,y), \ln i(x,y), \ln r(x,y), respectively.
We can filter Z(u,v) using a filter H(u,v), so that:

    S(u,v) = H(u,v) Z(u,v) = H(u,v) F_i(u,v) + H(u,v) F_r(u,v)

The filtered image in the spatial domain is:

    s(x,y) = \mathcal{F}^{-1}\{S(u,v)\} = \mathcal{F}^{-1}\{H(u,v) F_i(u,v)\} + \mathcal{F}^{-1}\{H(u,v) F_r(u,v)\}

Define:

    i'(x,y) = \mathcal{F}^{-1}\{H(u,v) F_i(u,v)\} ,  r'(x,y) = \mathcal{F}^{-1}\{H(u,v) F_r(u,v)\}


Because z(x,y) = \ln f(x,y), we reverse the process to produce the output (filtered) image:

    s(x,y) = i'(x,y) + r'(x,y)

    g(x,y) = e^{s(x,y)} = e^{i'(x,y)} e^{r'(x,y)} = i_0(x,y) r_0(x,y)

    i_0(x,y) = e^{i'(x,y)} ,  the illumination of the output image
    r_0(x,y) = e^{r'(x,y)} ,  the reflectance of the output image


The illumination component of an image generally is


characterized by slow spatial variations, while the reflectance
component tends to vary abruptly, particularly at the junction
of dissimilar objects. These characteristics lead to associating
the low frequencies of the Fourier transform of the logarithm
of an image with illumination and the high frequencies with
reflectance.


Selective Filtering
There are applications in which it is of interest to process specific bands of frequencies
(bandreject or bandpass filters) or small regions of the frequency rectangle (notch filters).

Bandreject and Bandpass Filters

Ideal bandreject filter:

    H(u,v) = 0 if D_0 - W/2 \le D(u,v) \le D_0 + W/2 ,  H(u,v) = 1 otherwise


Butterworth bandreject filter:

    H(u,v) = \frac{1}{1 + \left[\frac{W D(u,v)}{D^2(u,v) - D_0^2}\right]^{2n}}

Gaussian bandreject filter:

    H(u,v) = 1 - e^{-\left[\frac{D^2(u,v) - D_0^2}{W D(u,v)}\right]^2}

In the above bandreject filters (ideal, Butterworth, and Gaussian), D(u,v) is the distance
from the center of the


frequency rectangle, given by relation (DUV); D_0 is the radial center of the band, and
W is the width of the band.
A bandpass filter is obtained from a bandreject filter using the formula:

    H_BP(u,v) = 1 - H_BR(u,v)


Notch Filters
A notch filter rejects (or passes) frequencies in a predefined neighborhood about the
center of the frequency rectangle. Zero-phase-shift filters must be symmetric about the
origin, so a notch filter with center at (u_0, v_0) must have a corresponding notch at
location (-u_0, -v_0).
Notch reject filters are constructed as products of highpass filters whose centers have
been translated to the centers of the notches. The general form is:

    H_NR(u,v) = \prod_{k=1}^{Q} H_k(u,v) H_{-k}(u,v)

where H_k(u,v) and H_{-k}(u,v) are highpass filters whose centers are at (u_k, v_k) and
(-u_k, -v_k), respectively. These centers are specified with respect to the center of the
frequency rectangle, (M/2, N/2). The distance computations for each filter are made
using the expressions:

    D_k(u,v) = \sqrt{(u - M/2 - u_k)^2 + (v - N/2 - v_k)^2}

    D_{-k}(u,v) = \sqrt{(u - M/2 + u_k)^2 + (v - N/2 + v_k)^2}

A Butterworth notch reject filter of order n with 3 notch pairs:

    H_NR(u,v) = \prod_{k=1}^{3} \left\{\frac{1}{1 + [D_{0k}/D_k(u,v)]^{2n}}\right\} \left\{\frac{1}{1 + [D_{0k}/D_{-k}(u,v)]^{2n}}\right\}


A notch pass filter is obtained from a notch reject filter using the expression:

    H_NP(u,v) = 1 - H_NR(u,v)

One of the applications of notch filtering is selectively modifying local regions of the
DFT. This type of processing is done interactively, working directly on DFTs obtained
without padding.


Figure 4.65(a) shows an image of part of the rings surrounding Saturn. The vertical
sinusoidal pattern was caused by an AC signal superimposed on the video camera signal
just prior to digitizing the image. Figure 4.65(b) shows the DFT spectrum; the white
vertical lines that appear in the DFT correspond to the nearly sinusoidal interference.
The problem was solved by using the narrow notch rectangle filter shown in
Figure 4.65(c).


Image Restoration and Reconstruction

Restoration attempts to recover an image that has been degraded, supposing we have
some knowledge of the degradation phenomenon. Restoration techniques are oriented
toward modeling the degradation and applying the inverse process in order to recover
the original image. This involves formulating a criterion of goodness that will produce
an optimal estimate of the desired result. Enhancement techniques, by contrast, are
basically heuristic procedures designed to manipulate an image in order to satisfy the
demands of the human visual system. Contrast stretching is considered an enhancement
technique (it is done to please, in some sense, the viewer), while removal of image blur
by applying a deblurring function is considered a restoration technique.

A Model of the Image Degradation/Restoration Process

We consider the case when the degraded image, g(x,y), is obtained from the original,
f(x,y), by applying a degradation function together with an additive noise term:

    g(x,y) = H[f(x,y)] + \eta(x,y)

Given g(x,y), some knowledge about the degradation function H, and some knowledge
about the additive noise term \eta(x,y), the objective of restoration is to obtain an
estimate \hat{f}(x,y) of the original image. The more we know about H and \eta, the
closer \hat{f}(x,y) will be to f(x,y).

Noise Models

    g(x,y) = f(x,y) + \eta(x,y)   (H = I, the identity)

The main sources of noise in digital images arise during image acquisition and/or
transmission (environmental conditions during image acquisition, the quality of the
sensors).


Parameters that define the spatial characteristics of the


noise and whether the noise is correlated with the image are
important properties to be studied. We assume that the noise
is independent of spatial coordinates and that it is
uncorrelated with the image itself (i.e. there is no correlation
between pixel values and the values of noise components).


Some Important Noise Probability Density Functions

The noise may be considered a random variable, characterized by a probability density
function (pdf).

Gaussian noise:

    p(z) = \frac{1}{\sqrt{2\pi}\sigma} e^{-(z - \bar{z})^2 / (2\sigma^2)}

where z represents intensity, \bar{z} is the mean value of z, and \sigma is its standard
deviation; \sigma^2 is called the variance of z.


Rayleigh noise:

    p(z) = \frac{2}{b}(z - a) e^{-(z-a)^2 / b} for z \ge a ,  p(z) = 0 for z < a

The mean and variance for this pdf are:

    \bar{z} = a + \sqrt{\pi b / 4} ,  \sigma^2 = \frac{b(4 - \pi)}{4}


Erlang (gamma) noise:

    p(z) = \frac{a^b z^{b-1}}{(b-1)!} e^{-a z} for z \ge 0 ,  p(z) = 0 for z < 0   (a > 0, b a positive integer)

    \bar{z} = \frac{b}{a} ,  \sigma^2 = \frac{b}{a^2}


Exponential noise:

    p(z) = a e^{-a z} for z \ge 0 ,  p(z) = 0 for z < 0   (a > 0)

    \bar{z} = \frac{1}{a} ,  \sigma^2 = \frac{1}{a^2}

(the Erlang pdf with b = 1)


Uniform noise:

    p(z) = \frac{1}{b - a} for a \le z \le b ,  p(z) = 0 otherwise

    \bar{z} = \frac{a + b}{2} ,  \sigma^2 = \frac{(b - a)^2}{12}


Impulse (salt-and-pepper) noise
The pdf of (bipolar) impulse noise is given by:

    p(z) = P_a for z = a ,  p(z) = P_b for z = b ,  p(z) = 0 otherwise

If b > a, intensity b appears as a light dot in the image and intensity a as a dark dot.
If P_a = 0 or P_b = 0, the impulse noise is called unipolar.


When P_a ≈ P_b, impulse noise values will resemble salt and pepper granules randomly
distributed over the image. For this reason, bipolar impulse noise is also called
salt-and-pepper noise.
Noise impulses can be negative or positive. Because impulse corruption usually is large
compared with the strength of the image signal, impulse noise generally is digitized as
extreme (pure black or white) values in an image. Thus, the assumption is that a and b
are equal to the minimum and maximum allowed values in the digitized image. As a
result, negative impulses appear as black (pepper) points in an image, and positive
impulses appear as white (salt) points.
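imnoise generates several of these noise models directly; a sketch (parameter values are arbitrary):

f = im2double(imread('cameraman.tif'));
g1 = imnoise(f, 'gaussian', 0, 0.01);     % Gaussian: mean 0, variance 0.01
g2 = imnoise(f, 'salt & pepper', 0.05);   % bipolar impulse noise, 5% density
g3 = f + 0.1 * randn(size(f));            % hand-rolled Gaussian noise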


Periodic noise

Periodic noise arises from electrical or electromechanical


interference during image acquisition. This type of noise is
spatially dependent and can be reduced significantly via
frequency domain filtering.


Figure 5.5(a) is corrupted by sinusoidal noise of various


frequencies. The Fourier transform of a pure sinusoid is a pair
of conjugate impulses located at the conjugate frequencies of
the sine wave. If the amplitude of a sine wave in the spatial
domain is strong enough, we would expect to see in the
spectrum of the image a pair of impulses for each sine wave
in the image. In Figure 5.5(b) we can see the impulses
appearing in a circle.


Estimation of Noise Parameters

The parameters of periodic noise are estimated by inspection


of the Fourier spectrum of the image. Sometimes it is possible
to deduce the periodicity of noise just by looking at the
image.
The parameters of noise pdfs may be known partially from sensor specifications. If the
imaging system is available, one simple way to study the characteristics of system noise
is to capture a set of images of flat environments (in the case of


an optical sensor, this means taking images of a solid gray board that is illuminated
uniformly). The resulting images are good indicators of system noise.
When only images already generated by a sensor are available, it is frequently possible
to estimate the parameters of the pdf from small portions of the image that have a
constant background intensity.
The simplest use of the data from such image strips is for calculating the mean and the
variance of the intensity levels.


Consider a subimage S and let p_S(z_i), i = 0, 1, 2, ..., L-1, denote the probability
estimates (normalized histogram values) of the intensities of the pixels in S, where L is
the number of possible intensities in the entire image. We estimate the mean and the
variance of the pixels in S:

    \bar{z} = \sum_{i=0}^{L-1} z_i p_S(z_i) ,  \sigma^2 = \sum_{i=0}^{L-1} (z_i - \bar{z})^2 p_S(z_i)


The shape of the histogram identifies the closest pdf match. If the shape is approximately
Gaussian, then the mean and the variance are all we need. For the other shapes, we use
the mean and the variance to solve for the parameters a and b.
Impulse noise is handled differently because the estimate needed is of the actual
probability of occurrence of white and black pixels. Obtaining this estimate requires that
both black and white pixels be visible, so a midgray, relatively constant area is needed in
the image in order to be able to compute a histogram. The heights of the peaks
corresponding to black and white pixels are the estimates of P_a and P_b.


Restoration in the Presence of Noise Only: Spatial Filtering

    g(x,y) = f(x,y) + \eta(x,y)    (1)

    G(u,v) = F(u,v) + N(u,v)    (2)

The noise terms are unknown. In the case of periodic noise, it is usually possible to
estimate N(u,v) from the spectrum of G(u,v). In this case, an estimate of the original
image is given by:

    \hat{f}(x,y) = \mathcal{F}^{-1}[G(u,v) - N_e(u,v)]

Spatial filtering is the method of choice in situations when only additive random noise
is present.

Mean Filters

Let S_xy represent a rectangular neighborhood of size m x n centered at point (x,y).

Arithmetic mean filter:

    \hat{f}(x,y) = \frac{1}{mn} \sum_{(s,t) \in S_xy} g(s,t)

A mean filter smooths local variations of an image, and noise is reduced as a result of
blurring.

Geometric mean filter:

    \hat{f}(x,y) = \left[\prod_{(s,t) \in S_xy} g(s,t)\right]^{1/(mn)}

A geometric mean filter achieves smoothing comparable to the arithmetic mean filter,
but it tends to lose less image detail.


Harmonic mean filter:

    \hat{f}(x,y) = \frac{mn}{\sum_{(s,t) \in S_xy} \frac{1}{g(s,t)}}

The harmonic mean filter works well for salt noise, but fails for pepper noise. It also
works well on Gaussian noise.

Contraharmonic mean filter:

    \hat{f}(x,y) = \frac{\sum_{(s,t) \in S_xy} g(s,t)^{Q+1}}{\sum_{(s,t) \in S_xy} g(s,t)^{Q}} ,  where Q is the order of the filter

This filter is good for reducing or virtually eliminating the effects of salt-and-pepper
noise. For Q > 0 the filter eliminates pepper noise; for Q < 0 it eliminates salt noise; it
cannot do both simultaneously.
Q = 0 gives the arithmetic mean filter; Q = -1 gives the harmonic mean filter.
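The contraharmonic filter is two correlations and a division; a sketch (Q = 1.5 attacks pepper noise; the test image and noise density are assumptions):

g = im2double(imnoise(imread('cameraman.tif'), 'salt & pepper', 0.05));
Q = 1.5;  w = ones(3);
num  = imfilter(g.^(Q+1), w, 'replicate');
den  = imfilter(g.^Q,     w, 'replicate');
fhat = num ./ (den + eps);              % eps guards against division by zero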


Order-Statistic Filters
Median filter:

    \hat{f}(x,y) = median{ g(s,t) ; (s,t) \in S_xy }

Median filters have excellent noise-reduction capabilities, with less blurring than linear
smoothing filters. Median filters


are particularly effective in the presence of bipolar and unipolar impulse noise.

Max and min filters:

    \hat{f}(x,y) = max{ g(s,t) ; (s,t) \in S_xy }

This filter is useful for finding the brightest points in an image; it reduces pepper noise.

    \hat{f}(x,y) = min{ g(s,t) ; (s,t) \in S_xy }

This filter is useful for finding the darkest points in an image; it reduces salt noise.

Midpoint filter

$$\hat{f}(x, y) = \frac{1}{2} \left[ \max_{(s,t) \in S_{xy}} \{ g(s, t) \} + \min_{(s,t) \in S_{xy}} \{ g(s, t) \} \right]$$

It works best for randomly distributed noise, like Gaussian or


uniform noise.
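SciPy's order-statistic filters map directly onto these definitions; a short illustrative sketch with a 3×3 window assumed:

    import numpy as np
    from scipy.ndimage import median_filter, maximum_filter, minimum_filter

    def midpoint_filter(g, size=3):
        # average of the local max and the local min
        g = g.astype(np.float64)
        return 0.5 * (maximum_filter(g, size) + minimum_filter(g, size))

    # median_filter(g, 3)  -- median filter
    # maximum_filter(g, 3) -- max filter (reduces pepper noise)
    # minimum_filter(g, 3) -- min filter (reduces salt noise)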

Linear, Position-Invariant Degradations

$$g(x, y) = H[f(x, y)] + \eta(x, y)$$

Assume that H is linear:

$$H[a f_1(x, y) + b f_2(x, y)] = a H[f_1(x, y)] + b H[f_2(x, y)]$$

for any scalars a, b and any images f1, f2.

The operator H[f(x,y)] = g(x,y) is said to be position (or
space) invariant if:

$$H[f(x - \alpha, y - \beta)] = g(x - \alpha, y - \beta)$$

for any f and any $\alpha$, $\beta$.

This definition indicates that the response at any point in the


image depends only on the value of the input at that point, not
on its position.
Let $\delta(x - \alpha, y - \beta)$ be an impulse; the impulse response of
H is:

$$h(x, \alpha, y, \beta) = H[\delta(x - \alpha, y - \beta)]$$

The function $h(x, \alpha, y, \beta)$ is also called the point spread
function.

We have the following relations:


$$g(x, y) = h(x, y) \star f(x, y) + \eta(x, y)$$

or in the frequency domain:

$$G(u, v) = H(u, v) F(u, v) + N(u, v)$$

A linear, spatially-invariant degradation system with additive


noise can be modeled in the spatial domain as the convolution
of the degradation (point spread) function with an image,
followed by the addition of noise. In the frequency domain
the transformation is given as the product of the transforms

of the image and degradation, followed by the addition of the
transform of the noise.
Because degradations are modeled as being the result of
convolution, and restoration is the reverse process, the term
image deconvolution is used for linear image restoration, and
the filters are referred to as deconvolution filters.

Estimating the Degradation Function

There are three principal ways to estimate the degradation
function for use in image restoration:
1. observation
2. experimentation
3. mathematical modelling

Estimation by Image Observation

Suppose that we have a degraded image without any


knowledge about the degradation function. Assuming that the

image was degraded by a linear, position-invariant process,


one way to estimate H is to gather information from the
image itself.
If the image is blurred, we can study a small rectangular
section of the image containing sample structures (part of an
object and the background). In order to reduce the effects of
noise, we would look for an area in which the signal content
is strong (e.g. an area of high contrast). The next step is to

process the subimage in order to unblur it as much as
possible (for example, by using a sharpening filter).
Let gs(x,y) denote the observed subimage, and let $\hat{f}_s(x, y)$ be the
processed subimage. Assuming that the effect of noise is
negligible (because of the choice of a strong-signal area), it
follows that:

$$H_s(u, v) = \frac{G_s(u, v)}{\hat{F}_s(u, v)}$$

Based on the assumption of position invariance, we can


deduce from the above function the characteristics of the
complete degradation function H.
Estimation by Experimentation

Suppose that equipment similar to the equipment used to
acquire the degraded image is available. Images similar to the
degraded image can be acquired with various system settings
until they are degraded as closely as possible to the image we
want to restore. The idea is to obtain the impulse response of

the degradation by imaging an impulse (a small dot of light)
using the same system settings. We know that a linear,
space-invariant system is characterized completely by its
impulse response.
An impulse is simulated by a bright dot of light, as bright as
possible to reduce the effect of noise almost to zero. Using
the relation:

$$G(u, v) = H(u, v) F(u, v) + N(u, v)$$

where F(u,v) = A (the Fourier transform of the impulse) and
neglecting the noise term, we get:

$$H(u, v) = \frac{G(u, v)}{A}$$

G(u,v) is the Fourier transform of the observed image and A
is a constant describing the strength of the impulse.

Estimation by Modelling

A degradation model proposed by Hufnagel and Stanley is
based on the physical characteristics of atmospheric
turbulence:

$$H(u, v) = e^{-k (u^2 + v^2)^{5/6}}$$

where k is a constant that depends on the nature of the
turbulence.
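A sketch of this model as a frequency-domain transfer function (illustrative, not from the slides; the value k = 0.0025 is just one plausible setting for fairly severe turbulence):

    import numpy as np

    def turbulence_otf(M, N, k=0.0025):
        # H(u,v) = exp(-k (u^2 + v^2)^(5/6)) on centered frequencies
        u = np.arange(M) - M // 2
        v = np.arange(N) - N // 2
        V, U = np.meshgrid(v, u)
        H = np.exp(-k * (U ** 2 + V ** 2) ** (5 / 6))
        return np.fft.ifftshift(H)   # align with an unshifted FFT

    # blurred = np.real(np.fft.ifft2(np.fft.fft2(f) * turbulence_otf(*f.shape)))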

Another approach in modeling is to derive a mathematical


model from basic principles.
Suppose that an image has been blurred by uniform linear
motion between the image and the sensor during image
acquisition. Suppose that an image f(x,y) undergoes planar
motion and that x0(t) and y0(t) are the time varying
components of motion in the x- and y-directions, respectively.

Assuming that shutter opening and closing takes place


instantaneously and that the optical imaging process is
perfect, we can simulate the effect of image motion.
If T is the duration of the exposure, we have:

$$g(x, y) = \int_0^T f(x - x_0(t), y - y_0(t)) \, dt$$

g(x,y) is the blurred image. We can compute the Fourier
transform of g with respect to the Fourier transform of the
unblurred image f:

$$G(u, v) = F(u, v) \int_0^T e^{-i 2\pi [u x_0(t) + v y_0(t)]} \, dt$$

$$H(u, v) = \int_0^T e^{-i 2\pi [u x_0(t) + v y_0(t)]} \, dt$$

$$G(u, v) = H(u, v) F(u, v)$$

If the motion variables x0(t) and y0(t) are known, the
transfer function H(u,v) can be computed using the formula
above. For uniform motion in the x-direction only:

$$x_0(t) = \frac{at}{T}, \;\; y_0(t) = 0 \;\Rightarrow\; H(u, v) = \frac{T}{\pi u a} \sin(\pi u a) \, e^{-i \pi u a}$$

For uniform motion in both directions:

$$x_0(t) = \frac{at}{T}, \;\; y_0(t) = \frac{bt}{T} \;\Rightarrow\; H(u, v) = \frac{T}{\pi (ua + vb)} \sin(\pi (ua + vb)) \, e^{-i \pi (ua + vb)}$$
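The same H(u,v) can be generated numerically; a hedged sketch using np.sinc, which equals sin(πx)/(πx) and therefore handles the (ua+vb) → 0 limit cleanly. The parameters a, b, and T here are illustrative choices:

    import numpy as np

    def motion_blur_otf(M, N, a=0.1, b=0.1, T=1.0):
        # H(u,v) = T/(pi(ua+vb)) sin(pi(ua+vb)) exp(-i pi(ua+vb))
        u = np.arange(M) - M // 2
        v = np.arange(N) - N // 2
        V, U = np.meshgrid(v, u)
        s = U * a + V * b
        H = T * np.sinc(s) * np.exp(-1j * np.pi * s)
        return np.fft.ifftshift(H)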

Inverse Filtering

The simplest approach to restoration is direct inverse filtering,


where we compute an estimate $\hat{F}(u, v)$ of the transform of the
original image simply by dividing the transform of the
degraded image G(u,v) by the degradation function (an
elementwise, array operation):

$$\hat{F}(u, v) = \frac{G(u, v)}{H(u, v)}$$

Since $G(u, v) = H(u, v) F(u, v) + N(u, v)$, it follows that:

$$\hat{F}(u, v) = F(u, v) + \frac{N(u, v)}{H(u, v)}$$

Even if we know the degradation function, we cannot recover
the undegraded image exactly because N(u,v) is not known.
Another problem appears when the degradation function has
zero or very small values: the term N(u,v)/H(u,v) then
dominates the estimate $\hat{F}(u, v)$.

Minimum Mean Square Error (Wiener) Filtering

This approach incorporates both the degradation function and
the statistical characteristics of the noise into the restoration
process. The method is founded on considering images and
noise as random variables, and the objective is to find an
estimate $\hat{f}$ of the uncorrupted image f such that the mean
square error between them is minimized. The error measure is
given by:

$$e^2 = E\{ (f - \hat{f})^2 \} \quad (1)$$

It is assumed that:
- the noise and the image are uncorrelated;
- the noise or the image has zero mean;
- the intensity levels in the estimate are a linear function of
the levels in the degraded image
From relation (1) we get:

$$\hat{F}(u, v) = \left[ \frac{1}{H(u, v)} \cdot \frac{|H(u, v)|^2}{|H(u, v)|^2 + S_\eta(u, v) / S_f(u, v)} \right] G(u, v) \quad (2)$$

H(u,v) — degradation function
$S_\eta(u, v) = |N(u, v)|^2$ — power spectrum of the noise
$S_f(u, v) = |F(u, v)|^2$ — power spectrum of the undegraded image

Relation (2) is known as the Wiener filter. The term of (2) inside
the brackets is referred to as the minimum mean square error
filter or the least square error filter.

A number of useful measures are based on the power
spectra of the noise and of the undegraded image.

Signal-to-noise ratio:

$$\mathrm{SNR} = \frac{\sum_{u=0}^{M-1} \sum_{v=0}^{N-1} |F(u, v)|^2}{\sum_{u=0}^{M-1} \sum_{v=0}^{N-1} |N(u, v)|^2}$$

This ratio gives a measure of the level of information-bearing
signal power (i.e. of the original, undegraded image) to the
level of noise power. Images with low noise tend to have a
high SNR; conversely, a high level of noise implies a low
SNR. This ratio is an important metric used to characterize
the performance of restoration algorithms.
Mean square error (a spatial-domain approximation of (1)):

$$\mathrm{MSE} = \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} \left[ f(x, y) - \hat{f}(x, y) \right]^2$$

If the restored image is considered to be "signal" and the
difference between this image and the original is "noise", we can
define a signal-to-noise ratio in the spatial domain as:

$$\mathrm{SNR} = \frac{\sum_{x=0}^{M-1} \sum_{y=0}^{N-1} \hat{f}(x, y)^2}{\sum_{x=0}^{M-1} \sum_{y=0}^{N-1} \left[ f(x, y) - \hat{f}(x, y) \right]^2}$$

The closer f and $\hat{f}$ are, the larger this ratio will be.

When we are dealing with spectrally white noise
(|N(u,v)|² = const.), relation (2) simplifies. However, the power
spectrum of the undegraded image is rarely known. An
approach used frequently in this case is to replace the ratio
$S_\eta / S_f$ with a constant K:

$$\hat{F}(u, v) = \left[ \frac{1}{H(u, v)} \cdot \frac{|H(u, v)|^2}{|H(u, v)|^2 + K} \right] G(u, v)$$
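In code it is safer to use the algebraically equivalent form conj(H)/(|H|² + K), which avoids dividing by H where H = 0. A minimal illustrative sketch (K is typically tuned interactively):

    import numpy as np

    def wiener_restore(g, H, K=0.01):
        # F_hat = conj(H)/(|H|^2 + K) * G == (1/H)(|H|^2/(|H|^2+K)) G
        G = np.fft.fft2(g)
        F_hat = np.conj(H) / (np.abs(H) ** 2 + K) * G
        return np.real(np.fft.ifft2(F_hat))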

Color Image Processing

Color is a very important characteristic of an image that in
most cases simplifies object identification and extraction
from a scene. The human eye can discern thousands of color
shades and intensities, but only about two dozen shades of gray.
Color image processing is divided into two major areas: full-color
(images acquired with a full-color sensor) and pseudocolor
(gray-scale images to which color is assigned) processing.

In 1666, Sir Isaac Newton discovered that when a beam of


sunlight passes through a glass prism, the emerging beam of
light is not white but consists instead of a continuous
spectrum of colors ranging from violet at one end to red at
the other. The color spectrum may be divided into 6 broad
regions: violet, blue, green, yellow, orange, and red.

The colors that humans perceive in an object are
determined by the nature of the light reflected from the
object. Visible light is composed of a relatively narrow band
of frequencies in the electromagnetic spectrum (roughly 390 nm
to 750 nm). A body that reflects light that is balanced in all
visible wavelengths appears white to the observer. A body
that favors reflectance in a limited range of the visible
spectrum exhibits some shade of color.

For example, blue objects reflect light with wavelengths from


450 to 475 nm, while absorbing most of the energy of other
wavelengths.

How to characterize light? If the light is achromatic (void of


color) its only attribute is its intensity (or amount)
determined by levels of gray (black-grays-white).
Chromatic light spans the electromagnetic spectrum from
approximately 400 to 720 nm. Three basic quantities are used
to describe the quality of a chromatic light source: radiance,
luminance, and brightness.
- Radiance is the total amount of energy that flows from the
light source (usually measured in watts).
- Luminance (measured in lumens, lm) gives a measure of
  the amount of energy an observer perceives from a light
  source. For example, light emitted from a source
  operating in the infrared region of the spectrum could have
  significant energy (radiance), but an observer would hardly
  perceive it (its luminance is almost zero).
- Brightness is a subjective descriptor that cannot be
  measured; it embodies the achromatic notion of intensity
  and is a key factor in describing color sensation.
Cones are the sensors in the eye responsible for color vision.
It has been established that the 6 to 7 million cones in the
human eye can be divided into three principal sensing
categories, corresponding roughly to red, green, and blue.
Approximately 65% of all cones are sensitive to red light,
33% are sensitive to green light, and only about 2% are
sensitive to blue (but the blue cones are the most sensitive).

Due to these absorption characteristics of the human eye,
colors are seen as variable combinations of the so-called
primary colors: red (R), green (G), and blue (B).

For the purpose of standardization, the CIE (Commission
Internationale de l'Eclairage) designated in 1931 the
following specific wavelength values for the three primary
colors: blue = 435.8 nm, green = 546.1 nm, and red = 700 nm.
The CIE standards correspond only approximately with
experimental data.
These three standard primary colors, when mixed in various
intensity proportions, can produce all visible colors.

The primary colors can be added to produce the secondary
colors of light: magenta (red+blue), cyan (green+blue), and
yellow (red+green). Mixing the three primaries, or a
secondary with its opposite primary color, in the right
intensities produces white light.
We must differentiate between the primary colors of light
and the primary colors of pigments. A primary color of
pigments is one that subtracts or absorbs a primary color of
light and reflects or transmits the other two. Therefore, the

primary colors of pigments are magenta, cyan, and yellow,


and the secondary colors are red, green, and blue.

The characteristics usually used to distinguish one color from
another are brightness, hue, and saturation. Brightness
embodies the achromatic notion of intensity. Hue is an
attribute associated with the dominant wavelength in a
mixture of light waves; it represents the dominant color as
perceived by an observer (when we call an object red,
orange, or yellow, we refer to its hue). Saturation refers to
the relative purity, or the amount of white light mixed with a
hue. The pure spectrum colors are fully saturated. Colors such

as pink (red+white) and lavender (violet+white) are less
saturated, with the degree of saturation being inversely
proportional to the amount of white light added.
Hue and saturation taken together are called chromaticity,
and therefore a color may be characterized by its brightness
and chromaticity.
The amounts of red, green, and blue needed to form any
particular color are called the tristimulus values and are

denoted X, Y, and Z, respectively. A color is then specified by its
trichromatic coefficients, defined as:

$$x = \frac{X}{X + Y + Z}, \quad y = \frac{Y}{X + Y + Z}, \quad z = \frac{Z}{X + Y + Z}$$

$$x + y + z = 1$$

For any wavelength of light in the visible spectrum, the
tristimulus values needed to produce the color corresponding
to that wavelength can be obtained from existing curves
or tables.
Another approach for specifying colors is to use the CIE
chromaticity diagram, which shows color composition as a
function of x (red) and y (green); z (blue) is obtained from
the relation z = 1 - x - y.

The positions of the various spectrum colors (from violet at


380 nm to red at 780 nm) are indicated around the boundary
of the tongue-shaped chromaticity diagram.
The chromaticity diagram is useful for color mixing because a
straight-line segment joining any two points in the diagram
defines all the different color variations that can be obtained
by combining these two colors. The procedure can be
extended to three colors: the triangle determined by the three

color-points on the diagram embodies all the possible colors


that can be obtained by mixing the three colors.

Color Models
A color model (color space or color system) is a
specification of a coordinate system and a subspace within
that system where each color is represented by a single point.
http://www.colorcube.com/articles/models/model.htm
Most color models in use today are oriented either toward
hardware (color monitors or printers) or toward applications
where color manipulation is a goal.

The most commonly used hardware-oriented model is RGB
(red-green-blue), used for color monitors and color video cameras.
The CMY (cyan-magenta-yellow) and CMYK
(cyan-magenta-yellow-black) models are used for color
printing.
The HSI (hue-saturation-intensity) model corresponds to
the way humans describe and interpret colors. The HSI model
has the advantage that it decouples the color and gray-scale

information in an image, making it suitable for use with
gray-scale image processing techniques.

The RGB Color Model

In the RGB model, each color appears decomposed into its
primary color components: red, green, and blue. This model is
based on a Cartesian coordinate system. The color subspace
of interest is the unit cube (Figure 6.7), in which the primary

and the secondary colors are at the corners; black is at the
origin, and white is at the corner farthest from the origin.
The gray scale (points of equal RGB values) extends from
black to white along the line joining these two points. The
different colors in this model are points on or inside the cube,


and are defined by vectors extending from the origin.

Images represented in the RGB color model consist of three


component images, one for each primary color. The number
of bits used to represent each pixel in RGB space is called the
pixel depth. Consider an RGB image in which each of the red,


green, and blue images are an 8-bit image. In this case, each
RGB color pixel has a depth of 24 bits. The term full-color
image is used often to denote a 24-bit RGB color image. The
total number of colors in a 24-bit RGB image is

$$(2^8)^3 = 16{,}777{,}216$$

A convenient way to view these colors is to generate color
planes (faces or cross sections of the cube).

A color image can be acquired by using three filters, sensitive


to red, green, and blue.

Because of the variety of systems in use, it is of considerable
interest to have a subset of colors that are likely to be
reproduced faithfully, reasonably independently of viewer
hardware capabilities. This subset of colors is called the set of
safe RGB colors, or the set of all-systems-safe colors. In
Internet applications, they are called safe Web colors or safe
browser colors.
We assume that 256 colors is the minimum number of colors
that can be reproduced faithfully by any system. Forty of
these 256 colors are known to be processed differently by
various operating systems. This leaves 216 colors that are
common to most systems; these are the safe colors, especially
in Internet applications. Each of the 216 safe colors has an
RGB representation with:

$$R, G, B \in \{ 0, 51, 102, 153, 204, 255 \}$$

We have 6³ = 216 possible color values. It is customary to
express these values in the hexadecimal number system.

Each safe color is formed from three of the two-digit hex
numbers from the above table. For example, purest red is
FF0000. The values 000000 and FFFFFF represent black and
white, respectively.
Figure 6.10(a) shows the 216 safe colors, organized in
descending RGB values. Figure 6.10(b) shows the hex codes
for all the possible gray colors in the 216 safe-color system.
Figure 6.11 shows the RGB safe-color cube.
http://www.techbomb.com/websafe/
The CMY and CMYK Color Models

Cyan, magenta, and yellow are the secondary colors of light
but the primary colors of pigments. For example, when a
surface coated with yellow pigment is illuminated with white
light, no blue light is reflected from the surface. Yellow
subtracts blue light from reflected white light (which is itself
composed of equal amounts of red, green, and blue light).
Most devices that deposit color pigments on paper, such as
color printers and copiers, require CMY data input and
perform an RGB to CMY conversion internally. Assuming that the
color values have been normalized to the range [0,1], this
conversion is:

$$\begin{bmatrix} C \\ M \\ Y \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} - \begin{bmatrix} R \\ G \\ B \end{bmatrix}$$

From this equation we can easily deduce that pure cyan does
not reflect red, pure magenta does not reflect green, and pure
yellow does not reflect blue.

Equal amounts of the pigment primaries (cyan, magenta, and
yellow) should produce black. In practice, combining these
colors for printing produces a muddy-looking black. In order
to produce true black (which is the predominant color in
printing), a fourth color, black, is added, giving rise to the
CMYK color model.

The HSI Color Model


The RGB, CMY, and other similar color models are not well
suited for describing colors in terms that are practical for
human interpretation.
We (humans) describe a color by its hue, saturation and
brightness. Hue is a color attribute that describes a pure color,
saturation gives a measure of the degree to which a pure color
is diluted by white light and brightness is a subjective
descriptor that embodies the achromatic notion of intensity.
The HSI (hue, saturation, intensity) color model decouples
the intensity component from the color information (hue and
saturation) in a color image.
What is the link between the RGB color model and the HSI
color model? Consider again the RGB unit cube. The intensity
axis is the line joining the black and the white vertices. Consider
a color point in the RGB cube. Let P be a plane perpendicular to
the intensity axis and containing the color point. The
intersection of this plane with the intensity axis gives us the
intensity of the color point. The saturation (purity) of the
color point increases as a function of its distance
from the intensity axis (the saturation of points on the
intensity axis is zero).
In order to determine how hue can be linked to a given RGB
point, consider a plane defined by black, white, and cyan. The
intensity axis is also included in this plane. The intersection
of this plane with the RGB cube is a triangle. All points
contained in this triangle have the same hue (i.e. cyan).
The HSI space is represented by a vertical intensity axis and
the locus of color points that lie on planes perpendicular to this
axis. As a plane moves up and down the intensity axis, the
boundary defined by its intersection with the
faces of the cube is either triangular or hexagonal in shape.
In the plane shown in Figure 6.13(a) the primary colors are
separated by 120°. The secondary colors are 60° from the
primaries. The hue of a point is determined by an angle
from some reference point. Usually (but not always) an angle
of 0° from the red axis designates 0 hue, and the hue increases
counterclockwise from there. The saturation (distance from
the vertical axis) is the length of the vector from the origin to
the point. The origin is defined by the intersection of the color
plane with the vertical intensity axis.
Converting colors from RGB to HSI

$$H = \begin{cases} \theta & \text{if } B \le G \\ 360° - \theta & \text{if } B > G \end{cases}, \quad \theta = \arccos\left( \frac{\tfrac{1}{2}[(R - G) + (R - B)]}{\left[ (R - G)^2 + (R - B)(G - B) \right]^{1/2}} \right)$$

$$S = 1 - \frac{3}{R + G + B} \min\{ R, G, B \}$$

$$I = \frac{1}{3}(R + G + B)$$
It is assumed that the RGB values have been normalized to
the range [0,1] and that the angle θ is measured with respect to
the red axis of the HSI space, as in Figure 6.13. Hue can be
normalized to the range [0,1] by dividing by 360°. The
other two HSI components are already in this range if the
given RGB values are in the interval [0,1].

Example: R = 100, G = 150, B = 200 gives H = 210°, S = 1/3, I = 150/255 ≈ 0.588.
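A scalar NumPy sketch of the conversion (illustrative, not from the slides); running it on the example above reproduces H = 210°, S = 1/3, I ≈ 0.588:

    import numpy as np

    def rgb_to_hsi(R, G, B):
        # RGB in [0,1]; returns H in degrees, S and I in [0,1]
        num = 0.5 * ((R - G) + (R - B))
        den = np.sqrt((R - G) ** 2 + (R - B) * (G - B)) + 1e-12
        theta = np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0)))
        H = theta if B <= G else 360.0 - theta
        S = 1.0 - 3.0 * min(R, G, B) / (R + G + B + 1e-12)
        I = (R + G + B) / 3.0
        return H, S, I

    print(rgb_to_hsi(100/255, 150/255, 200/255))   # ~ (210.0, 0.333, 0.588)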

Converting colors from HSI to RGB

Given values of H, S, I in the interval [0,1], we now want to find
the corresponding RGB values in the same range (multiply H by
360° to recover the angle).

RG sector (0° ≤ H < 120°):

$$B = I(1 - S), \quad R = I\left[ 1 + \frac{S \cos H}{\cos(60° - H)} \right], \quad G = 3I - (R + B)$$
GB sector (120° ≤ H < 240°):

$$H' = H - 120°, \quad R = I(1 - S), \quad G = I\left[ 1 + \frac{S \cos H'}{\cos(60° - H')} \right], \quad B = 3I - (R + G)$$

BR sector (240° ≤ H < 360°):

$$H' = H - 240°, \quad G = I(1 - S), \quad B = I\left[ 1 + \frac{S \cos H'}{\cos(60° - H')} \right], \quad R = 3I - (G + B)$$

Pseudocolor Image Processing


Pseudocolor (also called false color) image processing
consists of assigning colors to gray values based on a
specified criterion. The main use of pseudocolor is for human
visualization and interpretation of gray-scale events in an
image or sequence of images.

Intensity (Density) Slicing


If an image is viewed as a 3-D function, the method can be
described as one of placing planes parallel to the coordinate
plane of the image; each plane then slices the function in


the area of intersection.

The plane at $f(x, y) = l_i$ slices the image function into two
levels. If a different color is assigned to each side of the
plane, any pixel whose intensity level is above the plane will
be coded with one color and any pixel below the plane will be
coded with the other color. Levels that lie on the plane itself may
be arbitrarily assigned one of the two colors. The result is a
two-color image whose relative appearance can be controlled
by moving the slicing plane up and down the intensity axis.

Let [0, L-1] represent the gray scale, let level l0 represent black
(f(x,y) = 0), and let level lL-1 represent white (f(x,y) = L-1).
Suppose that P planes perpendicular to the intensity axis are
defined at levels l1, l2, ..., lP, with 0 < P < L-1. The P planes
partition the gray scale into P+1 intervals, V1, V2, ..., VP+1.
Intensity-to-color assignments are made according to the relation:

$$f(x, y) \mapsto c_k \quad \text{if } f(x, y) \in V_k$$
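A compact sketch of this mapping (illustrative; np.digitize computes the interval index k for every pixel at once):

    import numpy as np

    def intensity_slice(img, levels, colors):
        # levels: increasing slice levels l1..lP; colors: P+1 RGB rows
        colors = np.asarray(colors, dtype=np.uint8)
        k = np.digitize(img, levels)   # which interval V_k each pixel is in
        return colors[k]               # (H, W, 3) pseudocolor image

    # e.g. two slicing planes, three colors:
    # out = intensity_slice(img, [85, 170], [[0,0,255], [128,128,128], [255,0,0]])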

Measurements of rainfall levels with ground-based sensors are


difficult and expensive, and total rainfall figures are even
more difficult to obtain because a significant portion of
precipitations occurs over the ocean. One way to obtain these


figures is to use a satellite. The TRMM (Tropical Rainfall
Measuring Mission) satellite utilizes, among others, three
sensors specially designed to detect rain: a precipitation radar,
a microwave imager, and a visible and infrared scanner. The
results from the various rain sensors are processed, resulting
in estimates of average rainfall over a given time period in the
area monitored by the sensors. From these estimates, it is not
difficult to generate gray-scale images whose intensity values
correspond directly to rainfall, with each pixel representing a


physical land area whose size depends on the resolution of the
sensors.

Basics of Full-Color Image Processing

A color image is a vector-valued function f : D → R³ (RGB) or
f : D → R⁴ (CMYK); each color pixel is a vector:

$$\mathbf{c}(x, y) = \begin{bmatrix} c_R(x, y) \\ c_G(x, y) \\ c_B(x, y) \end{bmatrix} = \begin{bmatrix} R(x, y) \\ G(x, y) \\ B(x, y) \end{bmatrix}, \qquad \mathbf{c}(x, y) = \begin{bmatrix} C(x, y) \\ M(x, y) \\ Y(x, y) \\ K(x, y) \end{bmatrix}$$

Color Transformations
- processing the components of a color image within the
  context of a single color model

$$g(x, y) = T[f(x, y)]$$

$$s_i = T_i(r_1, r_2, ..., r_n), \quad i = 1, 2, ..., n$$

ri, si are the color components of f(x,y) and g(x,y), n is the
number of color components, and {T1, T2, ..., Tn} is a set of
transformations or color mapping functions that operate on
ri to produce si (n = 3 or n = 4).


In theory, any transformation can be performed in any color
model. In practice, some operations are better suited to
specific color models.
Suppose we wish to modify the intensity of a color image
using:

$$g(x, y) = k f(x, y), \quad 0 < k < 1$$

In the HSI color space, this can be done with:

$$s_1 = r_1, \quad s_2 = r_2, \quad s_3 = k r_3$$

In the RGB and CMY color models all components must be
transformed:

$$s_i = k r_i, \quad i = 1, 2, 3 \quad \text{(RGB)}$$

$$s_i = k r_i + (1 - k), \quad i = 1, 2, 3 \quad \text{(CMY)}$$

Although the HSI transformation involves the fewest
operations, the cost of converting an RGB or CMY(K)
image to the HSI color space outweighs the savings in the
transformation itself.

Color Complements

The hues directly opposite one another on the above color


circle are called complements (analogous to the gray-scale

negatives).

Unlike the intensity transformation, the RGB complement
transformation functions used in this example do not have a
straightforward HSI-space equivalent: the saturation
component of the complement cannot be computed from the
saturation component of the input image alone.

Color Slicing

Highlighting a specific range of colors in an image is useful
for separating objects from their surroundings. The basic idea
is either to:
is either to:
- display the colors of interest so they stand out from the

background
- use the region defined by the colors as a mask for further

processing.
One of the simplest ways to slice a color image is to map
the colors outside some range of interest to a neutral color.
If the colors of interest are enclosed by a cube (or
hypercube, if n>3) of width W and centered at a

prototypical (e.g. average) color with components
(a1, a2, ..., an), the set of transformations is:

$$s_i = \begin{cases} 0.5 & \text{if } |r_j - a_j| > \dfrac{W}{2} \text{ for any } 1 \le j \le n \\ r_i & \text{otherwise} \end{cases} \quad i = 1, 2, ..., n$$

These transformations highlight the colors around the
prototype by forcing all other colors to the midpoint of the
reference color space (an arbitrarily chosen neutral point).

For the RGB color space, for example, a suitable neutral point
is middle gray, the color (0.5, 0.5, 0.5).
If a sphere is used to specify the colors of interest, the
transformations become:

$$s_i = \begin{cases} 0.5 & \text{if } \displaystyle\sum_{j=1}^{n} (r_j - a_j)^2 > R_0^2 \\ r_i & \text{otherwise} \end{cases} \quad i = 1, 2, ..., n$$

where R0 is the radius of the enclosing sphere and
(a1, a2, ..., an) are the components of its center.
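A sketch of the sphere version (illustrative; RGB assumed normalized to [0,1], with the neutral point at middle gray):

    import numpy as np

    def color_slice_sphere(img, center, radius, neutral=0.5):
        # keep colors inside the sphere around `center`; gray out the rest
        img = np.asarray(img, dtype=np.float64)
        d2 = np.sum((img - np.asarray(center)) ** 2, axis=-1)
        out = img.copy()
        out[d2 > radius ** 2] = neutral
        return out

    # sliced = color_slice_sphere(rgb, (0.7, 0.2, 0.2), 0.25)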


Tone and Color Corrections

The effectiveness of such transformations is judged ultimately


in print. The transformations are developed and evaluated on
monitors. It is necessary to have a high degree of consistency
between the monitors and the output devices. This is best
accomplished with a device-independent color model that
relates the color gamuts of the monitors and output devices,
as well as any other devices being used, to one another. The
model of choice for many color management systems (CMS)
is the CIE L*a*b* model, also called CIELAB. The L*a*b*
color components are given by the following equations:

$$L^* = 116 \cdot h\!\left( \frac{Y}{Y_W} \right) - 16$$

$$a^* = 500 \left[ h\!\left( \frac{X}{X_W} \right) - h\!\left( \frac{Y}{Y_W} \right) \right]$$

$$b^* = 200 \left[ h\!\left( \frac{Y}{Y_W} \right) - h\!\left( \frac{Z}{Z_W} \right) \right]$$

$$h(q) = \begin{cases} \sqrt[3]{q} & q > 0.008856 \\ 7.787 q + \dfrac{16}{116} & q \le 0.008856 \end{cases}$$

$X_W$, $Y_W$, $Z_W$ are reference tristimulus values — typically the
white of a perfectly reflecting diffuser under CIE standard
D65 illumination (x = 0.3127, y = 0.3290, z = 1 - x - y).
The L*a*b* color space is colorimetric (i.e. colors perceived
as matching are encoded identically), perceptually uniform
(i.e. color differences among various hues are perceived
uniformly), and device independent. Like the HSI system, the


L*a*b* system is an excellent decoupler of intensity

(represented by lightness L*) and color (represented by a* for


red minus green and b* for green minus blue), making it
useful in both image manipulation (tone and contrast editing)
and image compression applications.
Histogram Processing

It is not advisable to histogram equalize the components of a


color image independently. This can produce wrong colors. A
more logical approach is to spread the color intensity


uniformly, leaving the colors themselves (e.g., hues)
unchanged. The HSI color space is ideally suited for this type
of approach.

The unprocessed image contains a large number of dark
colors that reduce the median intensity to 0.36. Histogram
equalizing the intensity component, without altering the hue
and saturation, produced the image in Figure 6.37(c). The image
is brighter. Figure 6.37(d) was obtained by also increasing the
saturation component.

Color Image Smoothing

Let Sxy denote a neighborhood centered at (x,y) in an RGB
color image, and let K be the number of pixels in it. The average
of the RGB component vectors in this neighborhood is:

$$\bar{\mathbf{c}}(x, y) = \frac{1}{K} \sum_{(s,t) \in S_{xy}} \mathbf{c}(s, t) = \begin{bmatrix} \dfrac{1}{K} \sum_{(s,t) \in S_{xy}} R(s, t) \\ \dfrac{1}{K} \sum_{(s,t) \in S_{xy}} G(s, t) \\ \dfrac{1}{K} \sum_{(s,t) \in S_{xy}} B(s, t) \end{bmatrix}$$

Smoothing by neighborhood averaging can therefore be carried
out on a per-color-plane basis, with the same result as averaging
the RGB vectors directly.

Color Image Sharpening

$$g(x, y) = f(x, y) + c \, \nabla^2 f(x, y)$$

where the Laplacian of the RGB vector image is computed per
component:

$$\nabla^2 \mathbf{c}(x, y) = \begin{bmatrix} \nabla^2 R(x, y) \\ \nabla^2 G(x, y) \\ \nabla^2 B(x, y) \end{bmatrix}$$

Morphological Image Processing


Morphology deals with form and structure. Mathematical
morphology is a tool for extracting image components that are
useful in the representation and description of region shape,
such as boundaries, skeletons, and the convex hull. In this
chapter, the inputs are images but the outputs are attributes
extracted from these images.

Preliminaries
The reflection of a set B, denoted $\hat{B}$, is defined as

$$\hat{B} = \{ w : w = -b, \text{ for } b \in B \}$$

The translation of a set B by point z = (z1, z2), denoted (B)z,
is defined as

$$(B)_z = \{ c : c = b + z, \text{ for } b \in B \}$$

Set reflection and translation are used in morphology to


formulate operations based on so-called structuring elements
(SE): small sets or subimages used to probe an image under
study for properties of interest. In addition to a definition of
which elements are members of the SE, the origin of a

structuring element also must be specified. The origin of the


SE is usually indicated by a black dot. When the SE is
symmetric and no dot is shown, the assumption is that the
origin is at the center of symmetry.
When working with images, it is required that structuring
elements are rectangular arrays. This is accomplished by
appending the smallest possible number of background
elements necessary to form a rectangular array.

Erosion and Dilation

Many morphological algorithms are based on two
primitive operations: erosion and dilation.

Erosion
Let A and B be two sets in $\mathbb{Z}^2$. The erosion of A by B,
denoted $A \ominus B$, is defined as

$$A \ominus B = \{ z : (B)_z \subseteq A \}$$

This definition indicates that the erosion of A by B is the set
of all points z such that B, translated by z, is contained in A.

In the following, set B is assumed to be a structuring element.
Because the statement that B has to be contained in A is
equivalent to B not sharing any common elements with the
background, erosion can be expressed equivalently as:

$$A \ominus B = \{ z : (B)_z \cap A^c = \emptyset \}$$

Equivalent definitions of erosion:

$$A \ominus B = \{ w \in \mathbb{Z}^2 : w + b \in A \text{ for every } b \in B \}$$

$$A \ominus B = \bigcap_{b \in B} (A)_{-b}$$

Erosion shrinks or thins objects in a binary image. We can
view erosion as a morphological filtering operation in which
image details smaller than the structuring element are filtered
(removed) from the image.

Dilation
Let A and B be two sets in $\mathbb{Z}^2$. The dilation of A by B,
denoted $A \oplus B$, is defined as:

$$A \oplus B = \{ z : (\hat{B})_z \cap A \neq \emptyset \}$$

The dilation of A by B is the set of all displacements z such
that $\hat{B}$ and A overlap by at least one element. The above
definition can be written equivalently as:

$$A \oplus B = \{ z : [(\hat{B})_z \cap A] \subseteq A \}$$

We assume that B is a structuring element.

Equivalent definitions of dilation:

$$A \oplus B = \{ w \in \mathbb{Z}^2 : w = a + b, \text{ for some } a \in A \text{ and } b \in B \}$$

$$A \oplus B = \bigcup_{b \in B} (A)_b$$

The basic process of reflecting B about its origin and then
successively displacing it so that it slides over set (image) A
is analogous to spatial convolution. Dilation, being based on
set operations, is a nonlinear operation, whereas convolution is
a linear operation.

Unlike erosion which is a shrinking or thinning operation,


dilation grows or thickens objects in a binary image. The
specific manner and the extent of this thickening are
controlled by the shape of the structuring element used.

One of the simplest applications of dilation is for bridging


gaps.
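These two primitives are available directly in SciPy; a small illustrative example on a synthetic image:

    import numpy as np
    from scipy.ndimage import binary_erosion, binary_dilation

    A = np.zeros((11, 11), dtype=bool)
    A[3:8, 3:8] = True                    # a 5x5 square of foreground pixels
    B = np.ones((3, 3), dtype=bool)       # 3x3 structuring element

    eroded = binary_erosion(A, structure=B)    # square shrinks to 3x3
    dilated = binary_dilation(A, structure=B)  # square grows to 7x7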

Duality
Erosion and dilation are duals of each other with respect to set
complementation and reflection:

$$(A \ominus B)^c = A^c \oplus \hat{B}$$

$$(A \oplus B)^c = A^c \ominus \hat{B}$$

The duality property is particularly useful when the
structuring element is symmetric with respect to its origin, so
that $\hat{B} = B$. Then we can obtain the erosion of an image by B

simply by dilating its background (i.e. dilating Ac) with the


same structuring element and complementing the result.
Opening and Closing
Opening generally smoothes the contour of an object, breaks
narrow isthmuses, and eliminates thin protrusions. Closing
also tends to smooth sections of contours but, as opposed to
opening, it generally fuses narrow breaks and long thin gulfs,
eliminates small holes, and fills gaps in the contour.

The opening of set A by structuring element B is defined as:

$$A \circ B = (A \ominus B) \oplus B$$

Thus, the opening of A by B is the erosion of A by B,
followed by a dilation of the result by B.
Similarly, the closing of set A by structuring element B is
defined as:

$$A \bullet B = (A \oplus B) \ominus B$$

which says that the closing of A by B is the dilation of A by
B, followed by an erosion of the result by B.
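The composite operations also exist as single SciPy calls; an illustrative sketch:

    import numpy as np
    from scipy.ndimage import binary_opening, binary_closing

    A = np.zeros((11, 11), dtype=bool); A[3:8, 3:8] = True
    B = np.ones((3, 3), dtype=bool)

    opened = binary_opening(A, structure=B)   # erosion, then dilation
    closed = binary_closing(A, structure=B)   # dilation, then erosion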

The opening operation has a simple geometric interpretation.
Suppose that we view the structuring element B as a (flat)
rolling ball. The boundary of $A \circ B$ is then established by
the points in B that reach the farthest into the boundary of A
as B is rolled around the inside of this boundary. The opening
of A by B is obtained by taking the union of all translates of B
that fit into A:

$$A \circ B = \bigcup \{ (B)_z : (B)_z \subseteq A \}$$

Closing has a similar geometric interpretation, except that


now we roll B on the outside of the boundary.

Opening and closing are duals of each other with respect to
set complementation and reflection:

$$(A \circ B)^c = A^c \bullet \hat{B}$$

$$(A \bullet B)^c = A^c \circ \hat{B}$$

The opening operation satisfies the following properties:

1. $A \circ B \subseteq A$
2. if $C \subseteq D$ then $C \circ B \subseteq D \circ B$
3. $(A \circ B) \circ B = A \circ B$

Similarly, the closing operation satisfies the following
properties:

1. $A \subseteq A \bullet B$
2. if $C \subseteq D$ then $C \bullet B \subseteq D \bullet B$
3. $(A \bullet B) \bullet B = A \bullet B$

Property 3 in both cases states that multiple openings or
closings of a set have no effect after the operator has been
applied once (idempotence).

The Hit-or-Miss Transformation


The morphological hit-or-miss transformation is a basic tool
for shape detection. Consider the set A from Figure 9.12
consisting of three shapes (subsets) denoted C, D, and E. The
objective is to locate one of the shapes, say, D.
Let the origin of each shape be located at its center of gravity.
Let D be enclosed by a small window, W. The local
background of D with respect to W is defined as the set

difference (W-D) (Figure 9.12(b)). Figure 9.12(c) shows the

complement of A. Figure 9.12(d) shows the erosion of A by D.
Figure 9.12(e) shows the erosion of the complement of A by
the local background set (W - D). From Figures 9.12(d) and
(e) we can see that the set of locations for which D exactly fits
inside A is the intersection of the erosion of A by D and the
erosion of Ac by (W - D), as shown in Figure 9.12(f).
If B denotes the set composed of D and its background, the
match (or the set of matches) of B in A, denoted $A \circledast B$, is:

$$A \circledast B = (A \ominus D) \cap \left[ A^c \ominus (W - D) \right]$$

We can generalize the notation by letting B = (B1, B2), where
B1 is the set formed from elements of B associated with an
object and B2 is the set of elements of B associated with the
corresponding background (B1 = D and B2 = W - D in the
preceding example):

$$A \circledast B = (A \ominus B_1) \cap (A^c \ominus B_2)$$

The set $A \circledast B$ contains all the (origin) points at which,
simultaneously, B1 found a match ("hit") in A and B2 found a

match in $A^c$. Taking into account the definition and properties
of erosion, we can rewrite the above relation as:

$$A \circledast B = (A \ominus B_1) - (A \oplus \hat{B}_2)$$

The above equations for $A \circledast B$ are referred to as the
morphological hit-or-miss transform.

Some Basic Morphological Algorithms

When dealing with binary images, one of the principal
applications of morphology is in extracting image
components that are useful in the representation and
description of shape. We consider morphological algorithms
for extracting boundaries, connected components, the convex
hull, and the skeleton of a region.
The images are shown graphically with 1s shaded and 0s in
white.

Boundary Extraction
The boundary of a set A, denoted $\beta(A)$, can be obtained by
first eroding A by B and then performing the set difference
between A and its erosion:

$$\beta(A) = A - (A \ominus B)$$

where B is a suitable structuring element.
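In code, the set difference becomes a logical AND-NOT; a one-function illustrative sketch:

    import numpy as np
    from scipy.ndimage import binary_erosion

    def boundary(A, B=np.ones((3, 3), dtype=bool)):
        # beta(A) = A - (A erode B)
        return A & ~binary_erosion(A, structure=B)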

Hole Filling
A hole may be defined as a background region surrounded by
a connected border of foreground pixels. We present an
algorithm based on set dilation, complementation, and
intersection for filling holes in an image.
Let A denote a set whose elements are 8-connected
boundaries, each boundary enclosing a background region
(i.e. a hole). Given a point in each hole, the objective is to fill
all the holes with 1s.
We form an array, X0, of 0s (the same size as the array
containing A), except at the locations in X0 corresponding to
the given point in each hole, which are set to 1. The following
procedure fills all the holes with 1s:

$$X_k = (X_{k-1} \oplus B) \cap A^c, \quad k = 1, 2, 3, ...$$

where B is the symmetric structuring element in Figure
9.15(c). The algorithm terminates at iteration step k if

Xk=Xk-1. The set Xk then contains all the filled holes. The set

union of Xk and A contains all the filled holes and their


boundaries.
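A direct transcription of the iteration (illustrative; the 4-connected cross SE of Figure 9.15(c) keeps the dilation from leaking through 8-connected boundaries):

    import numpy as np
    from scipy.ndimage import binary_dilation, generate_binary_structure

    def fill_holes(A, seeds):
        # X_k = (X_{k-1} dilate B) AND A^c, iterated until stable;
        # `seeds` has a single 1 inside each hole.
        B = generate_binary_structure(2, 1)    # cross-shaped SE
        X = seeds.copy()
        while True:
            X_next = binary_dilation(X, structure=B) & ~A
            if np.array_equal(X_next, X):
                return X | A     # filled holes plus their boundaries
            X = X_next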

Extraction of Connected Components

Extraction of connected components from binary images is


important in many automated image analysis applications.
Let A be a set containing one or more connected
components. Form an array X0 (of the same size as the array
containing A) whose elements are 0s (background values),
except at each location known to correspond to a point in
each connected component in A, which we set to 1

(foreground value). The objective is to start with X0 and find


all the connected components.
The procedure that accomplishes this task is the following:

$$X_k = (X_{k-1} \oplus B) \cap A, \quad k = 1, 2, 3, ...$$

where B is a suitable structuring element. The procedure
terminates when Xk = Xk-1, with Xk containing all the connected
components of the input image.
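The growth iteration translates almost verbatim (illustrative sketch; scipy.ndimage.label offers the same result for all components without seed points):

    import numpy as np
    from scipy.ndimage import binary_dilation

    def extract_components(A, seeds, B=np.ones((3, 3), dtype=bool)):
        # X_k = (X_{k-1} dilate B) AND A, grown from the seeds until stable
        X = seeds.copy()
        while True:
            X_next = binary_dilation(X, structure=B) & A
            if np.array_equal(X_next, X):
                return X
            X = X_next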

Figure 9.18(a) shows an X-ray image of a chicken breast that
contains bone fragments. It is of considerable interest to be
able to detect such objects in processed food before packing
and/or shipping. In this case, the density of the bones is such
that their normal intensity values are different from the
background. This makes extraction of the bones from the
background a simple matter by using a single threshold. The
result is the binary image in Figure 9.18(b). We can erode the
thresholded image so that only objects of significant size

remain. In this example, we define as significant any object
that remains after erosion with a 5×5 structuring element of 1s.
The result of erosion is shown in Figure 9.18(c). The next
step is to analyze the objects that remain. We identify these
objects by extracting the connected components in the image.
There are a total of 15 connected components, with four of
them being of dominant size. This is enough to determine that
significant undesirable objects are contained in the original
image.

Convex Hull

A set A is said to be convex if the straight line segment
joining any two points in A lies entirely within A. The convex
hull H of an arbitrary set S is the smallest convex set
containing S. The set difference H - S is called the convex
deficiency of S. The convex hull and convex deficiency are
useful for object description.
We present a simple morphological algorithm for obtaining
the convex hull C(A) of a set A.

Let $B^i$, i = 1, 2, 3, 4, represent the four structuring elements in
Figure 9.19(a). The procedure consists of implementing the
equation:

$$X_0^i = A, \quad X_k^i = (X_{k-1}^i \circledast B^i) \cup A, \quad i = 1, 2, 3, 4 \text{ and } k = 1, 2, 3, ...$$

When the procedure converges ($X_k^i = X_{k-1}^i$), we let $D^i = X_k^i$.
Then the convex hull of A is

$$C(A) = \bigcup_{i=1}^{4} D^i$$
The method consists of iteratively applying the hit-or-miss
transform to A with B1; when no further changes occur, we
perform the union with A and call the result D1. The
procedure is repeated with B2 (applied to A) until no further
changes occur, and so on. The union of the four resulting Ds
constitutes the convex hull of A.

Morphological Reconstruction
Morphological reconstruction is a powerful morphological
transformation that involves two images and a structuring
element. One image, the marker, contains the starting points
for the transformation. The second image, the mask,
constrains the transformation. The structuring element is used
to define connectivity.

Geodesic dilation and erosion

Let F denote the marker image and G the mask image. We
assume that both F and G are binary images and that F ⊆ G.
The geodesic dilation of size 1 of the marker image with
respect to the mask is defined as:

$$D_G^{(1)}(F) = (F \oplus B) \cap G$$

The geodesic dilation of size n of F with respect to G is
defined as:

$$D_G^{(n)}(F) = D_G^{(1)}\left[ D_G^{(n-1)}(F) \right], \quad \text{with } D_G^{(0)}(F) = F$$

Similarly, the geodesic erosion of size 1 of marker F with
respect to mask G is defined as:

$$E_G^{(1)}(F) = (F \ominus B) \cup G$$

The geodesic erosion of size n of F with respect to G is
defined as:

$$E_G^{(n)}(F) = E_G^{(1)}\left[ E_G^{(n-1)}(F) \right], \quad \text{with } E_G^{(0)}(F) = F$$

Morphological reconstruction by dilation and erosion

Morphological reconstruction by dilation of a mask image G
from a marker image F, denoted $R_G^D(F)$, is defined as the
geodesic dilation of F with respect to G, iterated until
stability is achieved:

$$R_G^D(F) = D_G^{(k)}(F), \quad \text{with } k \text{ such that } D_G^{(k)}(F) = D_G^{(k+1)}(F)$$

In a similar manner, the morphological reconstruction by
erosion of a mask image G from a marker image F, denoted
$R_G^E(F)$, is defined as the geodesic erosion of F with respect to
G, iterated until stability is achieved:

$$R_G^E(F) = E_G^{(k)}(F), \quad \text{with } k \text{ such that } E_G^{(k)}(F) = E_G^{(k+1)}(F)$$
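Reconstruction by dilation is the same loop with the mask intersection inside it; an illustrative binary sketch (assuming the marker F is a subset of the mask G):

    import numpy as np
    from scipy.ndimage import binary_dilation

    def reconstruct_by_dilation(F, G, B=np.ones((3, 3), dtype=bool)):
        # iterate D^(1)(X) = (X dilate B) AND G until stability
        X = F.copy()
        while True:
            X_next = binary_dilation(X, structure=B) & G
            if np.array_equal(X_next, X):
                return X
            X = X_next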

Opening by reconstruction
In a morphological opening, erosion removes small objects
and the subsequent dilation attempts to restore the shape of the
objects that remain. The accuracy of this restoration depends on
the similarity between the shapes of the objects and the
structuring element used. Opening by reconstruction restores
exactly the shapes of the objects that remain after erosion. The
opening by reconstruction of size n of an image F is defined as
the reconstruction by dilation of F from the erosion of size n of F:

$$O_R^{(n)}(F) = R_F^D\left( F \ominus nB \right)$$

where $(F \ominus nB)$ denotes n successive erosions of F by B.
Figure 9.29 shows an example of opening by reconstruction.
We are interested in extracting from this image the characters
that contain long, vertical strokes. Opening by reconstruction
requires at least one erosion; this was performed and produced
Figure 9.29(b). The structuring element's size was proportional
to the average height of the tall characters: 51 pixels tall and
one pixel wide.

Figure 9.29(c) shows the opening of the image using the same
structuring element. Figure 9.29(d) is the opening by
reconstruction of size 1 of F.

Filling holes
The following is a fully automated procedure for
filling holes based on morphological reconstruction. Let
I(x,y) denote a binary image, and form a marker image F
that is 0 everywhere except at the image border, where it is
set to 1 - I, that is:

$$F(x, y) = \begin{cases} 1 - I(x, y) & \text{if } (x, y) \text{ is on the border of } I \\ 0 & \text{otherwise} \end{cases}$$

Then

$$H = \left[ R_{I^c}^D(F) \right]^c$$

is a binary image equal to I with all holes filled.
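A self-contained illustrative sketch for boolean images (scipy.ndimage.binary_fill_holes implements the same idea directly):

    import numpy as np
    from scipy.ndimage import binary_dilation

    def fill_holes_auto(I):
        # marker F = (1 - I) on the border, 0 elsewhere; mask G = I^c
        F = np.zeros_like(I)
        F[0, :], F[-1, :] = ~I[0, :], ~I[-1, :]
        F[:, 0], F[:, -1] = ~I[:, 0], ~I[:, -1]
        G = ~I
        while True:                       # reconstruction by dilation
            F_next = binary_dilation(F) & G
            if np.array_equal(F_next, F):
                return ~F                 # H = complement of R_{I^c}^D(F)
            F = F_next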

Gray-Scale Morphology
In this section we denote by f(x,y) the gray-scale image and
by b(x,y) the structuring element. Structuring elements in
gray-scale morphology are of two categories: nonflat and flat.

As in the binary case, the origin of the structuring element
must be clearly identified. In this section only symmetrical,
flat structuring elements of unit height whose origin is at the
center are considered. The reflection of a structuring element
in gray-scale morphology is defined as:

$$\hat{b}(x, y) = b(-x, -y)$$

Erosion and Dilation

The erosion of f by a flat structuring element b at any location
(x,y) is defined as the minimum value of the image in the
region coincident with b when the origin of b is at (x,y):

$$[f \ominus b](x, y) = \min_{(s,t) \in b} f(x + s, y + t)$$

To find the erosion of f by b, we place the origin of the
structuring element at every pixel location in the image. The
erosion at any location is determined by selecting the

minimum value of f from all the values of f contained in the
region coincident with b.
Similarly, the dilation of f by a flat structuring element b at
any location (x,y) is defined as the maximum value of the
image in the window outlined by $\hat{b}$ when the origin of $\hat{b}$
is at (x,y):

$$[f \oplus b](x, y) = \max_{(s,t) \in \hat{b}} f(x - s, y - t)$$

Because gray-scale erosion with a flat SE computes the
minimum value of f in every neighborhood of (x,y) coincident

with b, we expect in general that an eroded gray-scale image
will be darker than the original, that the sizes (with respect to
the size of the SE) of bright features will be reduced, and that
the sizes of dark features will be increased.
Nonflat SEs have gray-scale values that vary over their domain
of definition. The erosion of image f by nonflat structuring
element bN is defined as:

$$[f \ominus b_N](x, y) = \min_{(s,t) \in b_N} \left\{ f(x + s, y + t) - b_N(s, t) \right\}$$

Erosion using a nonflat SE is not bounded in general by
the values of f, which can present problems in interpreting
results; that is the reason this type of erosion is
seldom used in practice.
In a similar manner, dilation using a nonflat SE is defined as:

$$[f \oplus b_N](x, y) = \max_{(s,t) \in \hat{b}_N} \left\{ f(x - s, y - t) + \hat{b}_N(s, t) \right\}$$

As in the binary case, erosion and dilation are duals with
respect to function complementation and reflection, that is:
$$(f \ominus b)^c(x, y) = (f^c \oplus \hat{b})(x, y)$$

where $f^c(x, y) = -f(x, y)$ and $\hat{b}(x, y) = b(-x, -y)$. Similarly:

$$(f \oplus b)^c = (f^c \ominus \hat{b})$$
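Flat gray-scale erosion and dilation are the local-min and local-max filters; an illustrative SciPy sketch:

    import numpy as np
    from scipy.ndimage import grey_erosion, grey_dilation

    f = np.random.randint(0, 256, (64, 64)).astype(np.float64)

    eroded = grey_erosion(f, size=(3, 3))    # local minimum, flat 3x3 SE
    dilated = grey_dilation(f, size=(3, 3))  # local maximum, flat 3x3 SE
    # for a nonflat SE, pass its values via the `structure` argument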

Opening and Closing

The expressions for opening and closing of gray-scale images
have the same form as in the binary case. The opening of
image f by structuring element b is:

$$f \circ b = (f \ominus b) \oplus b$$

Opening is simply the erosion of f by b, followed by dilation
of the result with b. Similarly, the closing of f by b is:

$$f \bullet b = (f \oplus b) \ominus b$$

The opening and the closing of gray-scale images are duals
with respect to complementation and SE reflection:

$$(f \circ b)^c = f^c \bullet \hat{b}, \qquad (f \bullet b)^c = f^c \circ \hat{b}$$

Opening and closing of images have a simple geometric
interpretation. Suppose that an image function f(x,y) is
viewed as a 3-D surface. The opening of f by b can be

interpreted geometrically as pushing the structuring element


up from below against the undersurface of f. At each location
of the origin of b, the opening is the highest value reached by
any part of b as it pushes up against the undersurface of f. The
complete opening is the set of all such values obtained by
having the origin of b visit every (x,y) coordinate of f. Figure
9.36 illustrates the concept in one dimension.
The gray-scale opening operation satisfies the following
properties:

1. $f \circ b \sqsubseteq f$
2. If $f_1 \sqsubseteq f_2$ then $f_1 \circ b \sqsubseteq f_2 \circ b$
3. $(f \circ b) \circ b = f \circ b$

The notation $e \sqsubseteq r$ indicates that the domain of e is a subset
of the domain of r, and also that e(x, y) ≤ r(x, y) for any (x, y)
in the domain of e.
Similarly, the closing operation satisfies the following
properties:

a. $f \sqsubseteq f \bullet b$
b. If $f_1 \sqsubseteq f_2$ then $f_1 \bullet b \sqsubseteq f_2 \bullet b$
c. $(f \bullet b) \bullet b = f \bullet b$

Some Basic Gray-Scale Morphological Algorithms


Morphological smoothing
Because opening suppresses bright details smaller than the
specified SE, and closing suppresses dark details, they are

used often in combination as morphological filters for image
smoothing and noise removal.
Figure 9.38(a) is an image of the Cygnus Loop supernova taken
in the X-ray band; the region of interest is the central light
region, and the smaller components are noise. The objective is
to remove the noise. Figure 9.38(b) shows the result of opening
the original image with a flat disk of radius 2 and then closing
the opening with an SE of the same size. Figures 9.38(c) and
(d) show the results of the same operation using SEs of radii 3
and 5, respectively. The noise components on the lower side


of the image could not be removed completely because of
their density.

Morphological gradient
Dilation and erosion can be used in combination with image
subtraction to obtain the morphological gradient of an image:

$$g = (f \oplus b) - (f \ominus b)$$

The dilation thickens regions in an image and the erosion
shrinks them. Their difference emphasizes the boundaries
between regions. Homogeneous areas are not affected (as
long as the SE is relatively small), so the subtraction operation
tends to eliminate them. The net result is an image in which

the edges are enhanced and the contribution of the


homogeneous areas is suppressed, thus producing a
derivative-like (gradient) effect.
Figure 9.39(a) is a head CT scan, and the next two figures are
the opening and the closing with a 3 3 SE of all 1s. Figure
9.39(d) is the morphological gradient, in which the
boundaries between regions are clearly delineated.
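A short sketch of the morphological gradient, again assuming SciPy; the flat $3 \times 3$ SE of all 1s matches the example above:

```python
import numpy as np
from scipy import ndimage

def morphological_gradient(f, size=3):
    # g = (f dilated by b) - (f eroded by b), with a flat size x size SE.
    f = f.astype(np.int32)  # avoid wraparound when subtracting uint8 images
    dilated = ndimage.grey_dilation(f, size=(size, size))
    eroded = ndimage.grey_erosion(f, size=(size, size))
    return dilated - eroded
```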


Image Segmentation
Segmentation subdivides an image into its constituent regions
and objects. The level of detail to which the subdivision is
carried depends on the problem being solved. Segmentation
should stop when the objects or regions of interest in an
application have been detected. For example, in the
automated inspection of electronic assemblies, interest lies in
analyzing images of products with the objective of
determining the presence or absence of specific anomalies,


such as missing components or broken connection paths.


There is no point in carrying segmentation past the level of
detail required to identify those elements.
Segmentation of nontrivial images is one of the most
difficult tasks in image processing. Segmentation accuracy
determines the eventual success or failure of computerized
analysis procedures.


Most of the segmentation algorithms described in this


section are based on one of two basic properties of intensity
values:
- discontinuity
- similarity
In the first category, the approach is to partition an image
based on abrupt changes in intensity, such as edges. The
principal approaches in the second category are based on
partitioning an image into regions that are similar according


to a set of predefined criteria. Thresholding, region growing,


and region splitting and merging are examples of methods in
this category.

Fundamentals
Let R represent the entire spatial region occupied by an
image. Image segmentation can be viewed as a process that
partitions $R$ into $n$ subregions $R_1, R_2, \ldots, R_n$ such that:

(a) $\bigcup_{i=1}^{n} R_i = R$

(b) $R_i$ is a connected set, $i = 1, 2, \ldots, n$

(c) $R_i \cap R_j = \emptyset$ for all $i$ and $j$, $i \neq j$

(d) $Q(R_i) = \text{TRUE}$ for $i = 1, 2, \ldots, n$

(e) $Q(R_i \cup R_j) = \text{FALSE}$ for any adjacent regions $R_i$, $R_j$

Q(R) is a logical predicate defined over the points in set R.


Condition (a) indicates that the segmentation must be
complete; that is, every pixel must be in a region. Condition (b)


requires that points in a region be connected in some


predefined sense (e.g. the points must be 4- or 8-connected).
Condition (c) indicates that the regions must be disjoint.
Condition (d) deals with the properties that must be satisfied
by the pixels in a segmented region (for example Q(Ri ) is true
if all pixels have the same intensity level). Finally, condition
(e) indicates that two adjacent regions Ri and Rj must be
different in the sense of predicate Q.


The fundamental problem in segmentation is to partition an


image into regions that satisfy the preceding conditions.
Segmentation algorithms for monochrome images generally
are based on one of two basic categories dealing with
properties of intensity values: discontinuity and similarity. In
the first category, the assumption is that boundaries of regions
are sufficiently different from each other and from the
background to allow boundary detection based on local
discontinuities in intensity.


Edge-based segmentation is the principal approach used in

this category. Region-based segmentation approaches from


the second category are based on partitioning an image into
regions that are similar according to a set of predefined
criteria.


Point, Line, and Edge Detection


Edge pixels are pixels at which the intensity of an image

function changes abruptly, and edges (or edge segments) are


sets of connected edge pixels. Edge detectors are local image
processing methods designed to detect edge pixels. A line
may be viewed as an edge segment in which the intensity of
the background on either side of the line is either much higher
or much lower than the intensity of the line pixels.


Background
Abrupt, local changes in intensity can be detected using
derivatives, usually first- and second-order derivatives which
are defined in terms of differences.
Any approximation for a first derivative must be:
(1) zero in areas of constant intensity
(2) nonzero at the onset of an intensity step or ramp
(3) nonzero at points along an intensity ramp.


An approximation for a second derivative must be:


(1) zero in areas of constant intensity
(2) nonzero at the onset and end of an intensity step or ramp
(3) zero along intensity ramps.

$\frac{\partial f}{\partial x} = f'(x) = f(x+1) - f(x)$

$\frac{\partial^2 f}{\partial x^2} = \frac{\partial f'(x)}{\partial x} = f'(x+1) - f'(x) = f(x+2) - 2f(x+1) + f(x)$

$\frac{\partial^2 f}{\partial x^2} = f''(x) = f(x+1) + f(x-1) - 2f(x)$


(a) First-order derivatives generally produce thicker


edges in an image
(b) Second-order derivatives have a stronger response to
fine detail, such as thin lines, isolated points, and
noise
(c) Second-order derivatives produce a double-edge
response at ramp and step transitions in intensity


(d) The sign of the second-order derivative can be used


to determine whether a transition into an edge is
from light to dark or dark to light.
Computing first and second derivatives at every pixel location
is done using spatial filters.


The response of the mask at the center point of the region is:
$R = w_1 z_1 + w_2 z_2 + \cdots + w_9 z_9 = \sum_{k=1}^{9} w_k z_k \quad (1)$

where zk is the intensity of the pixel whose spatial location


corresponds to the location of the k-th coefficient in the mask.

Detection of isolated points


Point detection is based on computation of the second
derivative of the image.

$\nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}$ (the Laplacian)

$\frac{\partial^2 f}{\partial x^2} = f(x+1, y) + f(x-1, y) - 2f(x, y)$

$\frac{\partial^2 f}{\partial y^2} = f(x, y+1) + f(x, y-1) - 2f(x, y)$

$\nabla^2 f(x, y) = f(x+1, y) + f(x-1, y) + f(x, y+1) + f(x, y-1) - 4f(x, y)$


Using the Laplacian mask in Figure 10.4(a) we say that a


point has been detected at the location (x,y) on which the
mask is centered if the absolute value of the response of the
mask at that point exceeds a specified threshold. Such points
are labeled 1 in the output image and all others are labeled 0,
thus producing a binary image. The output is obtained using
the following expression:
$g(x, y) = \begin{cases} 1 & \text{if } |R(x, y)| \geq T \\ 0 & \text{otherwise} \end{cases}$


where g is the output image, $T > 0$ is the threshold, and R is


given by (1). This formulation measures the weighted
difference between a pixel and its 8-neighbors. The idea is
that the intensity of an isolated point will be quite different
from its surroundings and thus will be easily detectable by
this type of mask. The only differences in intensity that are
considered of interest are those large enough (as determined
by T) to be considered isolated points. The sum of the


coefficients of the mask is zero, indicating that the mask


response will be zero in areas of constant intensity.
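As a sketch, point detection with this mask reduces to one correlation and one threshold test (SciPy assumed; the choice of T is left to the application):

```python
import numpy as np
from scipy import ndimage

def detect_points(f, T):
    # 8-neighbor Laplacian mask; the coefficients sum to zero, so the
    # response is zero in areas of constant intensity.
    mask = np.array([[1,  1, 1],
                     [1, -8, 1],
                     [1,  1, 1]], dtype=np.float64)
    R = ndimage.correlate(f.astype(np.float64), mask)
    return (np.abs(R) >= T).astype(np.uint8)  # 1 where a point is detected
```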
Line Detection
For line detection we can expect second derivatives to result
in a stronger response and to produce thinner lines than first
derivatives. We can use the Laplacian mask in Figure 10.4(a)
for line detection also, taking care of the double-line effect of
the second order derivative.


Figure 10.5(a) shows a $486 \times 486$ (binary) portion of a


wire-bond mask for an electronic circuit and Figure 10.5(b)
shows its Laplacian. Scaling is necessary in this case (the
Laplacian image contains negative values). Mid gray
represents 0, darker shades of gray represent negative values,
and lighter shades are positive. It might appear that negative
values can be handled simply by taking the absolute value of
the Laplacian image. Figure 10.5(c) shows that this approach
doubles the thickness of the lines. A more suitable approach


is to use only the positive values of the Laplacian (Figure


10.5(d)).


The Laplacian detector in Figure 10.4(a) is isotropic, so its


response is independent of the direction (with respect to the
four directions of the $3 \times 3$ Laplacian mask: vertical,
horizontal, and two diagonals). Often, interest lies in
detecting lines in specified directions.
Consider the masks in Figure 10.6. Suppose that an image
with a constant background and containing various lines
(oriented at 0°, ±45°, and 90°) is filtered with the first mask.
The maximum responses would occur at image locations in


which horizontal lines passed through the middle row of the


mask.


A similar experiment would reveal that the second mask in Figure 10.6 responds best to lines oriented at +45°; the third mask to vertical lines; and the fourth mask to lines in the −45° direction.
Let $R_1$, $R_2$, $R_3$, and $R_4$ denote the responses of the masks in Figure 10.6 from left to right, where the R's are given by (1). Suppose that an image is filtered (individually) with the four masks. If at a given point in the image $|R_k| > |R_j|$ for all $j \neq k$,


that point is said to be more likely associated with a line in


the direction of mask k.
If we are interested in detecting all the lines in an image in the
direction defined by a given mask, we simply run the mask
through the image and threshold the absolute value of the
result. The points that are left are the strongest responses
which, for lines one pixel thick, correspond closest to the
direction defined by the mask.
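A sketch of this procedure, with the four $3 \times 3$ line masks of Figure 10.6 written out (the labeling of the two diagonal masks depends on the axis convention, so treat those labels as an assumption):

```python
import numpy as np
from scipy import ndimage

LINE_MASKS = {
    'horizontal': [[-1, -1, -1], [ 2,  2,  2], [-1, -1, -1]],
    '+45':        [[ 2, -1, -1], [-1,  2, -1], [-1, -1,  2]],
    'vertical':   [[-1,  2, -1], [-1,  2, -1], [-1,  2, -1]],
    '-45':        [[-1, -1,  2], [-1,  2, -1], [ 2, -1, -1]],
}

def detect_lines(f, direction, T):
    # Run the chosen mask through the image and threshold |R|.
    mask = np.array(LINE_MASKS[direction], dtype=np.float64)
    R = ndimage.correlate(f.astype(np.float64), mask)
    return np.abs(R) >= T
```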


In the image in Figure 10.7(a) we are interested in lines oriented at +45°. We use the second mask; the result is shown in Figure 10.7(b).


Edge Models
Edge detection is the approach used most frequently for
segmenting images based on abrupt (local) changes in
intensity.
Edge models are classified according to their intensity
profiles. A step edge involves a transition between two
intensity levels occurring ideally over the distance of 1 pixel.
Figure 10.8(a) shows a section of a vertical step edge and a
horizontal intensity profile through the edge.


In practice, digital images have edges that are blurred and


noisy, with the degree of blurring determined principally by
limitations in the focusing mechanism, and the noise level
determined principally by the electronic components of the
imaging system. In such situations, edges are more closely


modeled as having an intensity ramp profile, such as the edge


in Figure 10.8(b). The slope of the ramp is inversely
proportional to the degree of blurring in the edge. In this
model, we no longer have a thin (1 pixel thick) path. An edge
point now is any point contained in the ramp and an edge
segment would then be a set of such points that are
connected.
A third model of an edge is the so-called roof edge, having
the characteristics illustrated in Figure 10.8(c). Roof edges


are models of lines through a region, with the base (width) of


a roof edge being determined by the thickness and sharpness
of the line.
It is not unusual to find images that contain all three types of
edges.
The magnitude of the first derivative can be used to detect the
presence of an edge at a point in an image. Similarly, the sign
of the second derivative can be used to determine whether an


edge pixel lies on the dark or light side of an edge. The


second derivative has the following properties:
(1) it produces two values for every edge in an image (an
undesirable feature)
(2) its zero crossing can be used for locating the centers of
thick edges
The zero crossing of the second derivative is the intersection
between the zero intensity axis and a line extending between
the extrema of the second derivative.


There are three fundamental steps performed in edge detection:
1. Image smoothing for noise reduction.
2. Detection of edge points: this is a local operation that extracts from an image all points that are potential candidates to become edge points.
3. Edge localization: the objective of this step is to select from the candidate points only the points that are true members of the set of points comprising an edge.


Basic Edge Detection


The image gradient and its properties
The gradient of an image is the tool for finding edge strength
and direction at location (x,y):

$\nabla f = \mathrm{grad}(f) = \begin{bmatrix} g_x \\ g_y \end{bmatrix} = \begin{bmatrix} \partial f / \partial x \\ \partial f / \partial y \end{bmatrix}$


This vector has the important geometrical property that it


points in the direction of the greatest rate of change of f at
location (x,y).
The magnitude (length) of vector $\nabla f$,

$M(x, y) = \mathrm{mag}(\nabla f) = \sqrt{g_x^2 + g_y^2}$,

is the value of the rate of change in the direction of the


gradient vector.


The direction of the gradient vector is given by the angle

$\alpha(x, y) = \arctan\left( \frac{g_y}{g_x} \right)$

measured with respect to the x-axis. The direction of an edge at any arbitrary point (x,y) is orthogonal to the direction, $\alpha(x, y)$, of the gradient vector at the point.


The gradient vector sometimes is called the edge normal.


When the vector is normalized to unit length (by dividing it
by its magnitude) the resulting vector is commonly referred to
as the edge unit normal.

Gradient operators

$g_x = \frac{\partial f(x, y)}{\partial x} = f(x+1, y) - f(x, y)$

$g_y = \frac{\partial f(x, y)}{\partial y} = f(x, y+1) - f(x, y)$


When diagonal edge direction is of interest, we need a 2-D


mask. The Roberts cross-gradient operators are one of the
earliest attempts to use 2-D masks with a diagonal preference.
Consider the $3 \times 3$ region in Figure 10.14(a). The Roberts
operators are based on implementing the diagonal differences.


$g_x = \frac{\partial f}{\partial x} = z_9 - z_5 = f(x+1, y+1) - f(x, y)$

$g_y = \frac{\partial f}{\partial y} = z_8 - z_6 = f(x+1, y) - f(x, y+1)$

Masks of size $2 \times 2$ are simple conceptually, but they are not as useful for computing edge direction as masks that are symmetric about the center point, the smallest of which are of size $3 \times 3$.


Prewitt operators

$g_x = \frac{\partial f}{\partial x} = (z_7 + z_8 + z_9) - (z_1 + z_2 + z_3)$

$g_y = \frac{\partial f}{\partial y} = (z_3 + z_6 + z_9) - (z_1 + z_4 + z_7)$

Sobel operators

$g_x = \frac{\partial f}{\partial x} = (z_7 + 2z_8 + z_9) - (z_1 + 2z_2 + z_3)$

$g_y = \frac{\partial f}{\partial y} = (z_3 + 2z_6 + z_9) - (z_1 + 2z_4 + z_7)$


The Sobel masks have better noise-suppression (smoothing)


effects than the Prewitt masks.
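A minimal sketch of computing the Sobel gradient magnitude and angle with SciPy (taking axis 0 as the x of the text, i.e. the row direction, is an assumption about conventions):

```python
import numpy as np
from scipy import ndimage

def sobel_gradient(f):
    f = f.astype(np.float64)
    gx = ndimage.sobel(f, axis=0)   # partial derivative along image rows
    gy = ndimage.sobel(f, axis=1)   # partial derivative along image columns
    M = np.hypot(gx, gy)            # magnitude M = sqrt(gx^2 + gy^2)
    alpha = np.arctan2(gy, gx)      # gradient direction alpha(x, y)
    return M, alpha
```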


When interest lies both in highlighting the principal edges


and in maintaining as much connectivity as possible, it is
common practice to use both smoothing and thresholding.


More Advanced Techniques for Edge Detection


The edge-detection methods described until now are based on filtering an image with one or more masks, with no attempt to take into account the edge characteristics or the noise content of the image. In this section, the noise and the nature of the edges themselves are taken into account in more advanced edge-detection techniques.


The Marr-Hildreth edge detector


Marr and Hildreth noticed that:
(1) intensity changes are not independent of image scale
and so their detection requires the use of operators of
different sizes;
(2) a sudden intensity change will give rise to a peak or
trough in the first derivative or, equivalently, to a zero
crossing in the second derivative.


These ideas suggest that an operator used for edge detection


should have two features:
1) it should be a differential operator capable of
computing a digital approximation of the first or
second derivative at every point in the image
2) it should be capable of being tuned to act at any
desired scale, so that large operators can be used to
detect blurry edges and small operators to detect
sharply focused fine detail.


Marr and Hildreth argued that the most satisfactory operator fulfilling these conditions is the filter $\nabla^2 G$, the Laplacian of $G$, the 2-D Gaussian function with standard deviation $\sigma$:

$G(x, y) = e^{-\frac{x^2 + y^2}{2\sigma^2}} \quad (2)$

$\nabla^2 G(x, y) = \left( \frac{x^2 + y^2 - 2\sigma^2}{\sigma^4} \right) e^{-\frac{x^2 + y^2}{2\sigma^2}} \quad (3)$


The last expression is called the Laplacian of a Gaussian


(LoG).


Because of the shape illustrated in Figure 10.21(a), the LoG


function sometimes is called the Mexican hat operator. Figure
10.21(d) shows a $5 \times 5$ mask that approximates the shape in
Figure 10.21(a) (in practice, the negative of this mask is
used). This approximation is not unique. Its purpose is to
capture the essential shape of the LoG function.
Masks of arbitrary size can be generated by sampling
equation (3) and scaling the coefficients so that they sum to
zero. A more effective approach for generating LoG filters is


to sample equation (2) to the desired $n \times n$ size and then convolve the resulting array with a Laplacian mask, such as the mask in Figure 10.4(a).
There are two fundamental ideas behind the selection of the operator $\nabla^2 G$. First, the Gaussian part of the operator blurs the image, thus reducing the intensities of structures (including noise) at scales much smaller than $\sigma$. The Gaussian
function is smooth in both spatial and frequency domains and
is thus less likely to introduce artifacts (e.g. ringing) not


present in the original image. Although first derivatives can


be used for detecting abrupt changes in intensity, they are
directional operators. The Laplacian, on the other hand, has
the important advantage of being isotropic (invariant to
rotation), which not only corresponds to characteristics of the
human visual system but also responds equally to changes in
intensity in any mask direction, thus avoiding having to use
multiple masks to calculate the strongest response at any
point in the image.


The Marr-Hildreth algorithm consists of convolving the LoG


filter with an input image f(x,y)

$g(x, y) = \left[ \nabla^2 G(x, y) \right] \star f(x, y) = \nabla^2 \left[ G(x, y) \star f(x, y) \right]$

The Marr-Hildreth edge-detection algorithm may be summarized as follows:
1. Filter the input image with an $n \times n$ Gaussian lowpass filter obtained by sampling equation (2).
2. Compute the Laplacian of the image resulting from Step 1.
3. Find the zero crossings of the image from Step 2.


The size of an $n \times n$ LoG discrete filter should be such that n is the smallest odd integer greater than or equal to $6\sigma$. Choosing a filter mask smaller than this will tend to truncate the LoG function, with the degree of truncation being inversely proportional to the size of the mask; using a larger mask would make little difference in the result.
One approach for finding the zero crossings at any pixel, p, of the filtered image, g(x,y), is based on using a $3 \times 3$ neighborhood centered at p. A zero crossing at p implies that


the signs of at least two of its opposing neighboring pixels must differ. There are four cases to test: left/right, up/down, and the two diagonals.
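The three steps of the Marr-Hildreth algorithm, including this zero-crossing test, can be sketched as follows (SciPy assumed; sigma and the strength threshold are illustrative parameters):

```python
import numpy as np
from scipy import ndimage

def marr_hildreth(f, sigma=2.0, thresh=0.0):
    # Steps 1-2: Gaussian smoothing followed by the Laplacian (LoG).
    log = ndimage.gaussian_laplace(f.astype(np.float64), sigma=sigma)
    # Step 3: a zero crossing at p means two opposing neighbors differ
    # in sign; the four cases are left/right, up/down, and two diagonals.
    edges = np.zeros(log.shape, dtype=bool)
    for shift in [(0, 1), (1, 0), (1, 1), (1, -1)]:
        a = np.roll(log, shift, axis=(0, 1))
        b = np.roll(log, (-shift[0], -shift[1]), axis=(0, 1))
        edges |= (np.sign(a) != np.sign(b)) & (np.abs(a - b) > thresh)
    return edges
```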


The Canny edge detector

Canny's approach is based on three basic objectives:
1. Low error rate. All edges should be found, and there should be no false responses. The edges detected must be as close as possible to the true edges.
2. Edge points should be well localized. The edges located must be as close as possible to the true edges; that is, the distance between a point marked as an edge by the detector and the center of the true edge should be minimum.


3. Single edge response. The detector should return only one point for each true edge point. That is, the number of local maxima around the true edge should be minimum. This means that the detector should not identify multiple edge pixels where only a single edge point exists.
In general, it is difficult (or impossible) to find a closed-form solution that satisfies all the preceding objectives. However, using numerical optimization with 1-D step edges corrupted by additive white Gaussian noise led to the conclusion that a


good approximation to the optimal step edge detector is the


first derivative of a Gaussian:
$\frac{d}{dx} e^{-\frac{x^2}{2\sigma^2}} = -\frac{x}{\sigma^2} e^{-\frac{x^2}{2\sigma^2}}$

Let f(x,y) denote the input image and G(x,y) denote the Gaussian function:

$G(x, y) = e^{-\frac{x^2 + y^2}{2\sigma^2}}$

Form a smoothed image, $f_s(x, y)$, by convolving G and f:


$f_s(x, y) = G(x, y) \star f(x, y)$

We compute the gradient magnitude and the angle for $f_s$:

$M(x, y) = \sqrt{g_x^2 + g_y^2}, \quad g_x = \frac{\partial f_s}{\partial x}, \quad g_y = \frac{\partial f_s}{\partial y}$

$\alpha(x, y) = \arctan\left( \frac{g_y}{g_x} \right)$
M(x, y) contains ridges around local maxima. The next step is
to thin those ridges. One approach is to use nonmaxima
suppression. This can be done in several ways, but the


essence of this approach is to specify a number of discrete orientations of the edge normal (gradient vector). For example, in a $3 \times 3$ region we can define four orientations for an edge passing through the center point of the region: horizontal, vertical, +45° and −45°.


Let $d_1$, $d_2$, $d_3$, and $d_4$ denote the four basic edge directions for a $3 \times 3$ region: horizontal, −45°, vertical, and +45°, respectively.


We can formulate the following nonmaxima suppression scheme for a $3 \times 3$ region centered at every point (x,y) in $\alpha(x, y)$:
1. Find the direction $d_k$ that is closest to $\alpha(x, y)$.
2. If the value of M(x,y) is less than at least one of its two neighbors along $d_k$, let $g_N(x, y) = 0$ (suppression); otherwise, let $g_N(x, y) = M(x, y)$, where $g_N(x, y)$ is the nonmaxima-suppressed image.


The final operation is to threshold $g_N(x, y)$ to reduce false edge points. If we set the threshold too low, there will still be some false edges (called false positives). If the threshold is too high, then actual valid edge points will be eliminated (false negatives). Canny's algorithm attempts to improve on this situation by using hysteresis thresholding, which uses two thresholds: a low threshold $T_L$ and a high threshold $T_H$. Canny suggested that the ratio of the high to low threshold should be two or three to one.


We can visualize the thresholding operation as creating two additional images

$g_{NH}(x, y) = g_N(x, y) \geq T_H$

$g_{NL}(x, y) = g_N(x, y) \geq T_L$

where both are 0 initially. After thresholding, $g_{NH}(x,y)$ will have fewer nonzero pixels than $g_{NL}(x,y)$ in general, but all the nonzero pixels in $g_{NH}(x,y)$ will be contained in $g_{NL}(x,y)$ because the latter


image is formed with a lower threshold. We eliminate from $g_{NL}(x,y)$ all the nonzero pixels from $g_{NH}(x,y)$ by letting:

$g_{NL}(x, y) = g_{NL}(x, y) - g_{NH}(x, y)$

The nonzero pixels in $g_{NH}(x,y)$ and $g_{NL}(x,y)$ may be viewed as being strong and weak edge pixels, respectively.
After the thresholding operation, all strong pixels in
gNH(x,y) are assumed to be valid edge pixels and are so

marked immediately. Depending on the value of TH , the


edges in gNH(x,y) typically have gaps. Longer edges are


formed using the following procedure:
(a) Locate the next unvisited edge pixel, p, in gNH(x,y).
(b) Mark as valid edge pixels all the pixels in gNL(x,y) that
are connected to p (using 8-connectivity, for example)
(c) If all nonzero pixels in gNH(x,y) have been visited go to
Step (d). Else return to Step (a).
(d) Set to zero all pixels in gNL(x,y) that were not marked as
valid edge pixels.


At the end of this procedure, the final image output by the


Canny algorithm is formed by appending to gNH(x,y) all the
nonzero pixels from gNL(x,y).
In practice, hysteresis thresholding can be implemented
directly during nonmaxima suppression, and thresholding can
be implemented directly on gN(x,y) by forming a list of strong
pixels and the weak pixels connected to them.


The Canny edge detection algorithm consists of the following basic steps:
1. Smooth the input image with a Gaussian filter.
2. Compute the gradient magnitude and angle images.
3. Apply nonmaxima suppression to the gradient magnitude image.
4. Use double thresholding and connectivity analysis to detect and link edges.
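These four steps are implemented in common libraries; the sketch below uses scikit-image's canny, with illustrative parameter values (the normalization and the thresholds are assumptions, not values from the text):

```python
import numpy as np
from skimage import feature

def canny_edges(f, sigma=2.0, t_low=0.1, t_high=0.3):
    # Steps 1-4 (smoothing, gradient, nonmaxima suppression, and
    # hysteresis thresholding) are performed inside feature.canny.
    f = f.astype(np.float64)
    rng = f.max() - f.min()
    f = (f - f.min()) / rng if rng > 0 else f  # normalize to [0, 1]
    return feature.canny(f, sigma=sigma,
                         low_threshold=t_low, high_threshold=t_high)
```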


Edge Linking and Boundary Detection

Ideally, edge detection should yield sets of pixels lying only on edges. In practice, these pixels seldom characterize an edge completely because of noise, breaks in the edges due to nonuniform illumination, and other effects that introduce spurious discontinuities in intensity values. Therefore, edge detection typically is followed by linking algorithms designed to assemble edge pixels into meaningful edges and/or region boundaries. We discuss three fundamental approaches to edge


linking that are representative of techniques used in practice.


The first requires knowledge about edge points in a local
region; the second requires that points on the boundary of a
region be known; the third is a global approach that works
with an entire edge image.
Local processing

A simple way to link edge points is to analyze the


characteristics of pixels in a small neighborhood about every
point (x,y) that has been declared an edge point. All points


that are similar according to predefined criteria are linked,


forming an edge of pixels that share common properties
according to the specified criteria.
The two principal properties used for establishing similarity
of edge pixels in this kind of analysis are:
(1) the strength (magnitude)
(2) the direction
of the gradient vector. Let Sxy denote the set of coordinates of
a neighborhood centered at (x,y) in an image. An edge pixel


with coordinates (s,t) in $S_{xy}$ is similar in magnitude to the pixel at (x,y) if

$|M(s, t) - M(x, y)| \leq E$, where $E > 0$ is a positive magnitude threshold.

An edge pixel with coordinates (s,t) in $S_{xy}$ has an angle similar to the pixel at (x,y) if

$|\alpha(s, t) - \alpha(x, y)| \leq A$, where $A > 0$ is a positive angle threshold.

The direction of the edge at (x,y) is perpendicular to the direction of the gradient vector at that point.


A pixel with coordinates (s, t) in Sxy is linked to the pixel at

(x, y) if both magnitude and direction criteria are satisfied.


This process is repeated at every location in the image. A
record must be kept of linked points as the center of the
neighborhood is moved from pixel to pixel. A simple
procedure would be to assign a different intensity value to
each set of linked edge pixels.
The preceding formulation is computationally expensive
because all neighbors of every point have to be examined. A


simplification particularly well suited for real-time applications consists of the following steps:

1. Compute the gradient magnitude and angle arrays, M(x,y) and $\alpha(x, y)$, of the input image f(x,y).
2. Form a binary image g:

$g(x, y) = \begin{cases} 1 & \text{if } M(x, y) > T_M \text{ AND } \alpha(x, y) = A \pm T_A \\ 0 & \text{otherwise} \end{cases}$

where $T_M$ is a threshold, A is a specified angle direction, and $T_A$ defines a band of acceptable directions about A.


3. Scan the rows of g and fill (set to 1) all gaps (sets of 0s)
in each row that do not exceed a specified length K.
Note that a gap is bounded at both ends by one or more
1s. The rows are processed individually, with no
memory between them.
4. To detect gaps in any other direction, $\theta$, rotate g by this angle and apply the horizontal scanning procedure in Step 3. Rotate the result back by $-\theta$.


In general, image rotation is an expensive computational


process so, when linking in numerous angle directions is
required, it is more practical to combine Steps 3 and 4 into a
single radial scanning procedure.
Figure 10.27(a) shows an image of the rear of a vehicle. The
objective of this example is to illustrate the use of the
preceding algorithm for finding rectangles whose sizes make them suitable candidates for license plates. The formation of


these rectangles can be accomplished by detecting strong


horizontal and vertical edges.


$T_M$ = 30% of the maximum gradient value, A = 90°, $T_A$ = 45°, K = 25.
Regional processing
Often the location of the regions of interest in an image is
known or can be determined. This implies that knowledge is
available regarding the regional membership of pixels in the
corresponding edge image. We can use techniques for linking
pixels on a regional basis, with the desired result being an
approximation to the boundary of the region. One approach is


to fit a 2-D curve to the known points. Interest lies in


fast-executing techniques that yield an approximation to
essential features of the boundary, such as extreme points and
concavities.

Polygonal approximations are particularly attractive because they capture the essential shape features of a region while keeping the representation of the boundary relatively simple. We present an algorithm suitable for this purpose.


Two important requirements are necessary. First, two starting


points must be specified; second, all the points must be
ordered (e.g. in a clockwise or counterclockwise direction).
An algorithm for finding a polygonal fit to open and closed
curves may be stated as follows:
1. Let P be a sequence of ordered, distinct, 1-valued points
of a binary image. Specify two starting points, A and B.
These are the two starting vertices of the polygon.


2. Specify a threshold T, and two empty stacks OPEN and


CLOSED.
3. If the points in P correspond to a closed curve, put A
into OPEN and put B into OPEN and into CLOSED. If
the points correspond to an open curve, put A into
OPEN and put B into CLOSED.
4. Compute the parameters of the line passing from the last
vertex in CLOSED to the last vertex in OPEN.


5. Compute the distance from the line in Step 4 to all


points in P whose sequence places them between the
vertices from Step 4. Select the point Vmax with the
maximum distance Dmax (ties are resolved arbitrarily)
6. If Dmax > T, place Vmax at the end of the OPEN stack as a
new vertex. Go to step 4.
7. Else, remove the last vertex from OPEN and insert it as
the last vertex in CLOSED.


8. If OPEN is not empty go to Step 4.


9. Else, exit. The vertices in CLOSED are the vertices of
the polygonal fit to the points in P.


Global processing using the Hough transform


The previous methods assumed available knowledge about
pixels belonging to individual objects. Often, we work with
unstructured environments in which all we have is an edge
image and no knowledge about where objects of interest
might be. In such situations, all pixels are candidates for
linking and thus have to be accepted or eliminated based on
predefined global properties. The approach discussed in this section is based on whether sets of pixels lie on curves of a


specified shape. Once detected, these curves form the edges


or region boundaries of interest.
Given n points in an image, suppose that we want to find subsets of these points that lie on straight lines. One possible solution is to first find all lines determined by every pair of points and then find all subsets of points that are close to particular lines. This approach involves finding $n(n-1)/2 \sim n^2$ lines and then performing $n \cdot n(n-1)/2 \sim n^3$ comparisons of


every point to all lines. This is a computationally prohibitive


task.
Hough proposed an alternative approach, commonly referred to as the Hough transform. Consider a point $(x_i, y_i)$ in the xy-plane and the general equation of a line that passes through this point:

$y_i = a x_i + b$


Infinitely many lines pass through $(x_i, y_i)$, but they all satisfy the equation $y_i = a x_i + b$ for varying values of a and b. However, writing this equation as

$b = -x_i a + y_i$

and considering the ab-plane (also called parameter space) yields the equation of a single line for a fixed pair $(x_i, y_i)$. Furthermore, a second point $(x_j, y_j)$ also has a line in the parameter space associated with it, and, unless they are


parallel, this line intersects the line associated with (xi , yi) at
some point (a', b'). In fact, all the points on this line have
lines in parameter space that intersect at (a', b').


In principle, the parameter-space lines corresponding to all points $(x_k, y_k)$ in the xy-plane could be plotted, and the principal lines in that plane could be found by identifying points in parameter space where large numbers of parameter-space lines intersect. A practical difficulty with this approach, however, is that a tends to infinity as the line approaches the vertical direction. To avoid this problem, we use the normal representation of a line:

$x \cos\theta + y \sin\theta = \rho$


The computational attractiveness of the Hough transform arises from subdividing the $\rho\theta$ parameter space into so-called accumulator cells, as Figure 10.32(c) illustrates, where $(\rho_{min}, \rho_{max})$ and $(\theta_{min}, \theta_{max})$ are the expected ranges of the parameter values: $-90° \leq \theta \leq 90°$ and $-D \leq \rho \leq D$, where D is the maximum distance between opposite corners in an image. The cell at coordinates (i, j), with accumulator value A(i, j), corresponds to the square associated with parameter-space coordinates $(\rho_i, \theta_j)$. Initially, these cells are set to zero.


Then, for every non-background point $(x_k, y_k)$ in the xy-plane, we let $\theta$ equal each of the allowed subdivision values on the $\theta$-axis and solve for the corresponding $\rho$ using the equation $\rho = x_k \cos\theta + y_k \sin\theta$. The resulting $\rho$ values are then rounded off to the nearest allowed cell value along the $\rho$-axis. If a choice of $\theta_p$ results in solution $\rho_q$, then we let A(p, q) = A(p, q) + 1.

At the end of this procedure, a value of P in A(i, j) means that P points in the xy-plane lie on the line $x \cos\theta_j + y \sin\theta_j = \rho_i$.


The number of subdivisions in the $\rho\theta$-plane determines the accuracy of the collinearity of these points. It can be shown that the number of computations in the method described above is linear with respect to n, the number of non-background points in the xy-plane.
An approach based on the Hough transform for edge linking is as follows:
1. Obtain a binary edge image.
2. Specify subdivisions in the $\rho\theta$-plane.


3. Examine the counts of the accumulator cells for high pixel concentrations.
4. Examine the relationship (principally for continuity) between pixels in a chosen cell.
Continuity in this case usually is based on computing the
distance between disconnected pixels corresponding to a
given accumulator cell. A gap in a line associated with a
given cell is bridged if the length of the gap is less than a
specified threshold.
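The accumulator-cell procedure described above can be sketched directly (pure NumPy; one cell per degree of $\theta$ and per unit of $\rho$ is an arbitrary choice):

```python
import numpy as np

def hough_accumulator(edge_img, n_theta=180):
    rows, cols = edge_img.shape
    D = int(np.ceil(np.hypot(rows, cols)))    # max distance across the image
    thetas = np.deg2rad(np.linspace(-90.0, 90.0, n_theta))
    rhos = np.arange(-D, D + 1)
    A = np.zeros((len(rhos), n_theta), dtype=np.int64)
    ys, xs = np.nonzero(edge_img)             # non-background points
    for x, y in zip(xs, ys):
        # rho = x cos(theta) + y sin(theta), rounded to the nearest cell
        rho = x * np.cos(thetas) + y * np.sin(thetas)
        idx = np.round(rho).astype(int) + D
        A[idx, np.arange(n_theta)] += 1
    return A, rhos, thetas
```

Peaks in A then correspond to candidate lines; linking (Steps 3-4) examines the pixels that voted for each chosen cell.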


Image Segmentation Thresholding


We discuss techniques for partitioning images directly
into regions based on intensity values and/or properties
of these values.
Suppose that the intensity histogram in Figure 10.35(a)
corresponds to an image, f(x,y), composed of light
objects and a dark background, in such a way that object
and background pixels have intensity values grouped into
two dominant modes. One way to extract the objects


from the background is to select a threshold T that


separates these modes.

Any point (x,y) in the image for which f(x,y) > T is called
an object point; otherwise, the point is called a


background point. The segmented image, g(x,y), is given


by:
$g(x, y) = \begin{cases} 1 & \text{if } f(x, y) > T \\ 0 & \text{if } f(x, y) \leq T \end{cases}$

When T is a constant applicable over an entire image, the process given in this equation is referred to as global
thresholding. When the value of T changes over an
image, we use the term variable thresholding. The term
local or regional thresholding is used sometimes to


denote variable thresholding in which the value of T at


any point (x, y) in an image depends on properties of a
neighborhood of (x, y) (for example, the average
intensity of the pixels in the neighborhood). If T depends
on the spatial coordinates (x, y) themselves, then variable
thresholding is often referred to as dynamic or adaptive
thresholding.


If in an image we have, for example, two types of light


objects on a dark background, multiple thresholding is
used. The segmented image is given by:
$g(x, y) = \begin{cases} a & \text{if } f(x, y) > T_2 \\ b & \text{if } T_1 < f(x, y) \leq T_2 \\ c & \text{if } f(x, y) \leq T_1 \end{cases}$

where a, b, and c are any three distinct intensity values.


Segmentation problems requiring more than two


thresholds are difficult to solve, and better results usually


are obtained using other methods.
The success of intensity thresholding is directly related
to the width and depth of the valley(s) separating the
histogram modes. The key factors affecting the properties
of the valley(s) are:
(1) the separation between peaks (the further apart the
peaks are, the better the chances of separating the
modes)


(2) the noise content in the image (the modes broaden


as noise increases)
(3) the relative size of objects and background
(4) the uniformity of the illumination source
(5) the uniformity of the reflectance properties of the
image


The role of noise in image thresholding


Consider Figure 10.36(a): the image is free of noise and its histogram has two spike modes. Figure 10.36(b)
shows the original image corrupted by Gaussian noise of
zero mean and a standard deviation of 10 intensity levels.
Although the corresponding histogram modes are now
broader (Figure 10.36(e)), their separation is large
enough so that the depth of the valley between them is
sufficient to make the modes easy to separate.


Figure 10.36(c) shows the result of corrupting the image


with Gaussian noise of zero mean and a standard
deviation of 50 intensity levels. As the histogram in
Figure 10.36(f) shows, the situation is much more difficult, as there is no way to differentiate between the two modes.
The role of illumination and reflectance
Figure 10.37 illustrates the effect that illumination can
have on the histogram of an image. Figure 10.37(a) is the


noisy image from Figure 10.36(b), and Figure 10.37(d)


shows its histogram.


We can illustrate the effects of nonuniform illumination


by multiplying the image in Figure 10.37(a) by a variable intensity function, such as the intensity ramp in Figure 10.37(b), whose histogram is shown in Figure 10.37(e). Figure 10.37(c) shows the product of the image and this shading pattern. As Figure 10.37(f) shows, the deep valley between peaks was corrupted to the point where separation of the modes without additional processing is no longer possible.


Illumination and reflectance play a central role in the


success of image segmentation using thresholding or
other segmentation techniques. Therefore, controlling
these factors when it is possible to do so should be the
first step considered in the solution of segmentation
problem. There are three basic approaches to the problem
when control over these factors is not possible. One is to
correct the shading pattern directly. For example,
nonuniform (but fixed) illumination can be corrected by


multiplying the image by the inverse pattern, which can


be obtained by imaging a flat surface of constant
intensity. The second approach is to attempt to correct
the global shading pattern via processing it. The third
approach is to work around nonuniformities using
variable thresholding.


Basic Global Thresholding


When the intensity distributions of objects and
background pixels are sufficiently distinct, it is possible
to use a single (global) threshold applicable over the
entire image. An algorithm capable of estimating
automatically the threshold value for each image is
required. The following iterative algorithm can be used
for this purpose:


1. Select an initial estimate for the global threshold, T.
2. Segment the image using T. This will produce two groups of pixels: $G_1$, consisting of all pixels with intensity values > T, and $G_2$, consisting of pixels with values $\leq$ T.
3. Compute the average (mean) intensity values $m_1$ and $m_2$ for the pixels in $G_1$ and $G_2$, respectively.
4. Compute a new threshold value:

$T = \frac{1}{2}(m_1 + m_2)$


5. Repeat Steps 2 through 4 until the difference between values of T in successive iterations is smaller than a predefined parameter $\Delta T$.
This simple algorithm works well in situations where there is a reasonably clear valley between the modes of the histogram related to objects and background. Parameter $\Delta T$ is used to control the number of iterations in situations when speed is an important issue. The initial threshold must be chosen greater than the minimum and


less than the maximum intensity level in the image. The


average intensity of the image is a good initial choice for
T.
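The iterative algorithm translates almost line for line into code; a short NumPy sketch ($\Delta T$ and the empty-group guard are implementation details, not from the text):

```python
import numpy as np

def basic_global_threshold(f, delta_T=0.5):
    T = f.mean()                    # Step 1: average intensity as initial T
    while True:
        G1 = f[f > T]               # Step 2: pixels with intensity > T
        G2 = f[f <= T]              #         pixels with intensity <= T
        if G1.size == 0 or G2.size == 0:
            return T                # degenerate histogram; keep current T
        T_new = 0.5 * (G1.mean() + G2.mean())   # Steps 3-4
        if abs(T_new - T) < delta_T:            # Step 5
            return T_new
        T = T_new
```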


Optimum Global Thresholding Using Otsu's Method


Thresholding may be viewed as a statistical-decision
theory problem whose objective is to minimize the
average error that appears in assigning pixels to two or
more groups (also called classes). The solution (Bayes
decision rule) is based on only two parameters: the
probability density function (PDF) of the intensity levels
of each class and the probability that each class occurs in
a given application. Estimating PDFs is not a trivial task,


so the problem usually is simplified by making workable assumptions about the form of the PDFs, such as assuming that they are Gaussian functions.
Otsu's method offers an alternative solution. The method is optimum in the sense that it maximizes the between-class variance. The basic idea is that well-thresholded classes should be distinct with respect to the intensity values of their pixels and, conversely, that a threshold giving the best separation between classes in


terms of their intensity values would be the best (optimum) threshold. Otsu's method has the important property that it is based entirely on computations performed on the histogram of an image.
Let $\{0, 1, 2, \ldots, L-1\}$ denote the L distinct intensity levels in a digital image of size $M \times N$ pixels, and let $n_i$ denote the number of pixels with intensity i:

$MN \ (\text{total number of pixels}) = n_0 + n_1 + n_2 + \cdots + n_{L-1}$


The normalized histogram has components

$p_i = \frac{n_i}{MN}, \quad \text{with} \quad \sum_{i=0}^{L-1} p_i = 1, \quad p_i \geq 0$

Suppose that we select a threshold T(k) = k, 0 < k < L−1, and use it to threshold the image into two classes, $C_1$ and $C_2$, where $C_1$ consists of all pixels in the image with intensity values in the range [0, k] and $C_2$ consists of all pixels in the image with intensity values in the range [k+1, L−1]. Using


this threshold, the probability $P_1(k)$ that a pixel is assigned to class $C_1$ is given by the cumulative sum

$P_1(k) = \sum_{i=0}^{k} p_i$

This is the probability of class $C_1$ occurring. Similarly, the probability of class $C_2$ occurring is

$P_2(k) = \sum_{i=k+1}^{L-1} p_i$

The mean intensity value of the pixels assigned to class $C_1$ is

$m_1(k) = \sum_{i=0}^{k} i\, P(i \mid C_1) = \sum_{i=0}^{k} i\, P(C_1 \mid i) \frac{P(i)}{P(C_1)} = \frac{1}{P_1(k)} \sum_{i=0}^{k} i\, p_i$

$P(i \mid C_1)$ is the probability of value i, given that i comes from class $C_1$. We have used the Bayes formula:

$P(A \mid B) = P(B \mid A) \frac{P(A)}{P(B)}$

$P(C_1 \mid i) = 1$ is the probability of $C_1$ given i (i belongs to $C_1$).
Similarly, the mean intensity value of the pixels assigned to class $C_2$ is:

$m_2(k) = \sum_{i=k+1}^{L-1} i\, P(i \mid C_2) = \frac{1}{P_2(k)} \sum_{i=k+1}^{L-1} i\, p_i$

The cumulative mean (average intensity) up to level k is given by

$m(k) = \sum_{i=0}^{k} i\, p_i$

and the average intensity of the entire image (the global mean) is given by

$m_G = \sum_{i=0}^{L-1} i\, p_i$


We have:

$P_1 m_1 + P_2 m_2 = m_G, \quad P_1 + P_2 = 1$

In order to evaluate the goodness of the threshold at level k, we use the normalized, dimensionless metric

$\eta = \frac{\sigma_B^2}{\sigma_G^2}$

where $\sigma_G^2$ is the global variance:

$\sigma_G^2 = \sum_{i=0}^{L-1} (i - m_G)^2 p_i$


and $\sigma_B^2$ is the between-class variance, defined as

$\sigma_B^2 = P_1 (m_1 - m_G)^2 + P_2 (m_2 - m_G)^2 = P_1 P_2 (m_1 - m_2)^2 = \frac{(m_G P_1 - m)^2}{P_1 (1 - P_1)}$

From the above formula, we see that the farther the two means $m_1$ and $m_2$ are from each other, the larger $\sigma_B^2$ will be, indicating that the between-class variance is a measure of separability between classes. Because $\sigma_G^2$ is a constant, it


follows that $\eta$ is also a measure of separability, and maximizing this metric is equivalent to maximizing $\sigma_B^2$. The objective then is to determine the threshold value, k, that maximizes the between-class variance.
We have:

$\eta(k) = \frac{\sigma_B^2(k)}{\sigma_G^2}$


$\sigma_B^2(k) = \frac{\left[ m_G P_1(k) - m(k) \right]^2}{P_1(k)\left[ 1 - P_1(k) \right]}$

The optimum threshold is the value, $k^*$, that maximizes $\sigma_B^2(k)$:

$\sigma_B^2(k^*) = \max \{ \sigma_B^2(k) ;\; 0 \leq k \leq L-1,\; k \text{ integer} \}$

If the maximum exists for more than one value of k, it is customary to average the various values of k for which $\sigma_B^2(k)$ is maximum.
Once k* has been obtained, the input image is segmented as:


$g(x, y) = \begin{cases} 1 & \text{if } f(x, y) > k^* \\ 0 & \text{if } f(x, y) \leq k^* \end{cases}$

The metric $\eta(k^*)$ can be used to obtain a quantitative estimate of the separability of classes:

$0 \leq \eta(k^*) \leq 1$

The lower bound is attainable only by images with a single, constant intensity level; the upper bound is


attainable only by 2-valued images with intensities equal to 0 and L−1.
Otsu's algorithm may be summarized as follows:
1. Compute the normalized histogram of the input image, $p_i$, i = 0, 1, 2, …, L−1.
2. Compute the cumulative sums, $P_1(k)$, k = 0, 1, 2, …, L−1.
3. Compute the cumulative means, m(k), k = 0, 1, …, L−1.
4. Compute the global intensity mean, $m_G$.
5. Compute the between-class variance,


$\sigma_B^2(k)$, k = 0, 1, …, L−1.
6. Obtain the Otsu threshold, $k^*$, as the value of k for which $\sigma_B^2(k)$ is maximum. If the maximum is not unique, obtain $k^*$ by averaging the values of k corresponding to the various maxima detected.
7. Obtain the separability measure, $\eta(k^*)$.
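Since everything is computed from the histogram, the seven steps vectorize into a few lines; a sketch for an L-level image (the guard against empty classes is an implementation detail):

```python
import numpy as np

def otsu_threshold(f, L=256):
    n = np.bincount(f.ravel().astype(np.int64), minlength=L)[:L]
    p = n / n.sum()                       # Step 1: normalized histogram
    P1 = np.cumsum(p)                     # Step 2: cumulative sums P1(k)
    m = np.cumsum(np.arange(L) * p)       # Step 3: cumulative means m(k)
    mG = m[-1]                            # Step 4: global mean
    den = P1 * (1.0 - P1)                 # Step 5: between-class variance
    num = (mG * P1 - m) ** 2
    sigma_B2 = np.where(den > 0, num / np.where(den > 0, den, 1), 0.0)
    # Step 6: average the k values if the maximum is not unique.
    k_star = int(round(np.mean(np.flatnonzero(sigma_B2 == sigma_B2.max()))))
    sigma_G2 = np.sum((np.arange(L) - mG) ** 2 * p)
    eta = sigma_B2[k_star] / sigma_G2 if sigma_G2 > 0 else 0.0   # Step 7
    return k_star, eta
```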


Noise can turn a simple thresholding problem into an


unsolvable one. When noise cannot be reduced at the source,
and thresholding is the segmentation method used, a
technique that often enhances performances is to smooth the
image before thresholding it.


Multiple Thresholds
The thresholding idea used in Otsu's method can be extended to an arbitrary number of thresholds, because the separability measure on which it is based also extends to an arbitrary number of classes. In the case of K classes, $C_1, C_2, \ldots, C_K$, the between-class variance generalizes to the expression:

$\sigma_B^2 = \sum_{k=1}^{K} P_k (m_k - m_G)^2$


$P_k = \sum_{i \in C_k} p_i, \qquad m_k = \frac{1}{P_k} \sum_{i \in C_k} i\, p_i$

where $m_G$ is the global mean of the image. The K classes are separated by K−1 thresholds whose values $k_1^*, k_2^*, \ldots, k_{K-1}^*$ are the values that maximize $\sigma_B^2$:

$\sigma_B^2(k_1^*, k_2^*, \ldots, k_{K-1}^*) = \max_{\substack{0 < k_1 < k_2 < \cdots < k_{K-1} < L-1 \\ k_1, \ldots, k_{K-1} \text{ integers}}} \sigma_B^2(k_1, k_2, \ldots, k_{K-1})$


In practice, using multiple global thresholding is considered a


viable approach when there is reason to believe that the
problem can be solved effectively with two thresholds.
Applications that require more than two thresholds generally
are solved using more than just intensity values.
For three classes consisting of three intensity intervals
(which are separated by two thresholds) the between-class
variance is given by:

$\sigma_B^2 = P_1 (m_1 - m_G)^2 + P_2 (m_2 - m_G)^2 + P_3 (m_3 - m_G)^2$

$P_1 = \sum_{i=0}^{k_1} p_i, \qquad m_1 = \frac{1}{P_1} \sum_{i=0}^{k_1} i\, p_i$

$P_2 = \sum_{i=k_1+1}^{k_2} p_i, \qquad m_2 = \frac{1}{P_2} \sum_{i=k_1+1}^{k_2} i\, p_i$

$P_3 = \sum_{i=k_2+1}^{L-1} p_i, \qquad m_3 = \frac{1}{P_3} \sum_{i=k_2+1}^{L-1} i\, p_i$

$P_1 m_1 + P_2 m_2 + P_3 m_3 = m_G, \qquad P_1 + P_2 + P_3 = 1$

The two optimum threshold values, $k_1^*$ and $k_2^*$, are the values that maximize $\sigma_B^2(k_1, k_2)$:

$\sigma_B^2(k_1^*, k_2^*) = \max_{0 < k_1 < k_2 < L-1} \sigma_B^2(k_1, k_2)$


The thresholded image is given by:

$g(x, y) = \begin{cases} a & \text{if } f(x, y) \leq k_1^* \\ b & \text{if } k_1^* < f(x, y) \leq k_2^* \\ c & \text{if } f(x, y) > k_2^* \end{cases}$

where a, b, and c are any three distinct valid intensity values. The separability measure extended to multiple thresholds is given by:

$\eta(k_1^*, k_2^*) = \frac{\sigma_B^2(k_1^*, k_2^*)}{\sigma_G^2}$


Variable Thresholding
Image partitioning
One of the simplest approaches to variable thresholding is to
subdivide an image into nonoverlapping rectangles. This
approach is used to compensate for non-uniformities in
illumination and/or reflectance. The rectangles are chosen
small enough so that the illumination of each is
approximately uniform.


Image subdivision generally works well when the objects of interest and the background occupy regions of reasonably comparable size. When this is not the case, the method fails because of the likelihood of subdivisions containing only object or only background pixels.

Variable thresholding based on local image properties


A more general approach than the image subdivision method
is to compute a threshold at every point (x,y) in the image


based on one or more specified properties computed in a neighborhood of (x, y).
We illustrate the basic approach to local thresholding by using the standard deviation and mean of the pixels in a neighborhood of every point in an image. Let $\sigma_{xy}$ and $m_{xy}$ denote the standard deviation and mean value of the set of pixels contained in a neighborhood $S_{xy}$ centered at coordinates (x, y) in an image.
The following are common forms of variable, local thresholds:


$T_{xy} = a \sigma_{xy} + b m_{xy}, \quad a, b > 0$

$T_{xy} = a \sigma_{xy} + b m_G$, where $m_G$ is the global image mean.

The segmented image is computed as

$g(x, y) = \begin{cases} 1 & \text{if } f(x, y) > T_{xy} \\ 0 & \text{if } f(x, y) \leq T_{xy} \end{cases}$

Significant improvement can be obtained in local thresholding by using predicates based on the parameters computed in the neighborhood of (x, y):


$g(x, y) = \begin{cases} 1 & \text{if } Q(\text{local parameters}) \text{ is true} \\ 0 & \text{if } Q(\text{local parameters}) \text{ is false} \end{cases}$

where Q is a predicate based on parameters computed using the pixels in neighborhood $S_{xy}$, for example:

$Q(\sigma_{xy}, m_{xy}) = \begin{cases} \text{true} & \text{if } f(x, y) > a \sigma_{xy} \text{ AND } f(x, y) > b m_{xy} \\ \text{false} & \text{otherwise} \end{cases}$
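A sketch of this predicate-based local thresholding, computing $m_{xy}$ and $\sigma_{xy}$ for every pixel with uniform filters (the neighborhood size and the constants a and b are illustrative):

```python
import numpy as np
from scipy import ndimage

def local_threshold(f, size=3, a=30.0, b=1.5):
    f = f.astype(np.float64)
    m = ndimage.uniform_filter(f, size=size)        # local mean m_xy
    sq = ndimage.uniform_filter(f * f, size=size)
    sigma = np.sqrt(np.maximum(sq - m * m, 0.0))    # local std sigma_xy
    # Predicate form: f > a*sigma_xy AND f > b*m_xy
    return ((f > a * sigma) & (f > b * m)).astype(np.uint8)
```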


Using moving averages


A special case of the local thresholding method just
discussed is based on computing a moving average along
scan lines of an image. This implementation is useful in
document processing, where speed is a fundamental
requirement. The scanning is typically carried out line by
line in a zigzag pattern to reduce illumination bias. Let
zk+1 denote the intensity of the point encountered in the


scanning sequence at step k+1. The moving average (mean intensity) at this new point is given by

$m(k+1) = \frac{1}{n} \sum_{i=k+2-n}^{k+1} z_i = m(k) + \frac{1}{n}(z_{k+1} - z_{k-n}), \quad m(1) = \frac{z_1}{n}$

where n denotes the number of points used in computing the average. The algorithm is initialized only once, not at every row. Segmentation is implemented using the


variable threshold $T_{xy} = b m_{xy}$, where b is a constant and $m_{xy}$ is the moving average computed as above.
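A sketch of moving-average thresholding for documents (pure NumPy; the values of n and b are illustrative, and the zigzag scan is simulated by reversing every other row):

```python
import numpy as np

def moving_average_threshold(f, n=20, b=0.5):
    f = f.astype(np.float64)
    z = f.copy()
    z[1::2, :] = z[1::2, ::-1]          # zigzag scan: reverse odd rows
    flat = z.ravel()
    csum = np.cumsum(flat)
    m = np.empty_like(flat)             # running mean of the last n points
    m[:n] = csum[:n] / n                # initialized once, as in the text
    m[n:] = (csum[n:] - csum[:-n]) / n
    m = m.reshape(f.shape)
    m[1::2, :] = m[1::2, ::-1]          # undo the zigzag ordering
    return (f > b * m).astype(np.uint8) # threshold T_xy = b * m_xy
```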


Multivariable Thresholding
In some cases, a sensor can make available more than
one variable to characterize each pixel in an image, and
thus allow multivariable thresholding. A notable example
is color imaging where red (R), green (G), and blue (B)
components are used to form a composite color image. In
this case, each pixel is characterized by three values,
and can be represented as a 3-D vector z = (z1 , z2 , z3)T
whose components are the RGB colors at a point.


These 3D points often are referred to as voxels, to denote


volumetric elements, as opposed to image elements.
Multivariable thresholding may be viewed as a distance
computation. Suppose that we want to extract from a
color image all regions having a specified color range,
for example, reddish hues. Let a denote the average
reddish color in which we are interested. One way to
segment a color image based on this parameter is to
compute a distance measure, D(z,a) between an arbitrary


color point z and the average color a. Then we segment the input image as

$g = \begin{cases} 1 & \text{if } D(\mathbf{z}, \mathbf{a}) < T \\ 0 & \text{otherwise} \end{cases}$

where T is a threshold. D(z, a) may be the Euclidean distance

$D(\mathbf{z}, \mathbf{a}) = \| \mathbf{z} - \mathbf{a} \| = \left[ (\mathbf{z} - \mathbf{a})^T (\mathbf{z} - \mathbf{a}) \right]^{1/2}$

or the Mahalanobis distance

$D(\mathbf{z}, \mathbf{a}) = \left[ (\mathbf{z} - \mathbf{a})^T \mathbf{C}^{-1} (\mathbf{z} - \mathbf{a}) \right]^{1/2}$

where C is the covariance matrix of the z's.
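Both distance tests vectorize directly; a sketch (the interface is hypothetical: img is an H x W x 3 RGB array, a the average color, C the covariance matrix):

```python
import numpy as np

def segment_by_color(img, a, T, C=None):
    z = img.reshape(-1, 3).astype(np.float64) - np.asarray(a, np.float64)
    if C is None:
        D = np.sqrt(np.sum(z * z, axis=1))          # Euclidean distance
    else:
        D = np.sqrt(np.sum((z @ np.linalg.inv(C)) * z, axis=1))  # Mahalanobis
    return (D < T).reshape(img.shape[:2]).astype(np.uint8)
```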


Region-Based Segmentation
Region growing
Region growing is a procedure that groups pixels or subregions into larger regions based on predefined criteria for growth. The basic approach is to start with a set of seed points and from these grow regions by appending to each seed those neighboring pixels that have predefined properties


similar to the seed (such as specific ranges of intensity or color).
Selecting a set of one or more starting points often can be
based on the nature of the problem. When a priori information
is not available, the procedure is to compute at every pixel the
same set of properties that ultimately will be used to assign
pixels to regions during the growing process. If the result of
these computations shows clusters of values, the pixels whose


properties place them near the centroid of these clusters can


be used as seeds.
The selection of similarity criteria depends not only on the
problem under consideration, but also on the type of image
data available.
Another problem in region growing is the formulation of a
stopping rule. Region growth should stop when no more
pixels satisfy the criteria for inclusion in that region. Criteria
such as intensity values, texture, and color are local in nature

Digital Image Processing


Course 11

and do not take into account the history of region growth. Additional criteria that increase the power of a region-growing algorithm utilize the concept of size, likeness between a candidate pixel and the pixels grown so far, and the shape of the region being grown.
Let f(x,y) denote an input image array, S(x,y) denote a seed
array containing 1s at the locations of seed points and 0s
elsewhere, Q denote a predicate to be applied at each pixel
location (x, y). Arrays f and S are assumed to be of the same


size. A basic region-growing algorithm based on 8-connectivity may be stated as follows:

1. Find all connected components in S(x,y) and erode each connected component to one pixel; label all such pixels found as 1. All other pixels in S are labeled 0.
2. Form an image $f_Q$ such that, at a pair of coordinates (x,y), $f_Q(x,y) = 1$ if the input image satisfies the given predicate, Q, at those coordinates; otherwise, $f_Q(x,y) = 0$.

Digital Image Processing


Course 11

3. Let g be an image formed by appending to each seed


point in S all the 1-valued points in fQ that are

8-connected to that seed point.


4. Label each connected component in g with a different
region label. This is the segmented region obtained by
region growing.

An example of a predicate:

$$Q = \begin{cases} \text{TRUE} & \text{if the absolute difference of the intensities between the seed and the pixel at } (x,y) \text{ is} \le T \\ \text{FALSE} & \text{otherwise} \end{cases}$$
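A minimal sketch of this 8-connected region growing with the intensity-difference predicate (numpy assumed; seeds given as pixel coordinates):

import numpy as np
from collections import deque

def region_grow(f, seeds, T):
    """Grow regions from seed coordinates in image f, appending
    8-connected pixels whose absolute intensity difference from
    the originating seed does not exceed T."""
    labels = np.zeros(f.shape, dtype=int)
    for label, (sr, sc) in enumerate(seeds, start=1):
        queue = deque([(sr, sc)])
        labels[sr, sc] = label
        seed_val = float(f[sr, sc])
        while queue:
            r, c = queue.popleft()
            for dr in (-1, 0, 1):            # 8-neighborhood
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < f.shape[0] and 0 <= cc < f.shape[1]
                            and labels[rr, cc] == 0
                            and abs(float(f[rr, cc]) - seed_val) <= T):
                        labels[rr, cc] = label
                        queue.append((rr, cc))
    return labels

f = np.array([[10, 12, 90], [11, 13, 95], [80, 85, 92]])
print(region_grow(f, seeds=[(0, 0), (2, 2)], T=5))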

Region Splitting and Merging


The method used in this case is to subdivide an image initially
into a set of arbitrary, disjoint regions and then merge and/or
split the regions in an attempt to satisfy the condition of
segmentation.
Let R represent the entire image region and select a
predicate Q. One approach for segmenting R is to subdivide it
successively into smaller and smaller quadrant regions so
that, for any region Ri, Q(Ri) = TRUE. We start with the entire region. If Q(R) = FALSE, we divide the image into quadrants. If Q is FALSE for any quadrant, we subdivide that quadrant into
subquadrants, and so on. This particular splitting technique has a convenient representation in the form of so-called quadtrees, which are trees in which each node has exactly four descendants. The images corresponding to the nodes of a quadtree sometimes are called quadregions or quadimages.
Note that the root of the tree corresponds to the entire image and that each node corresponds to the subdivision of a node into four descendant nodes.

If only splitting is used, the final partition normally contains adjacent regions with identical properties. Satisfying the constraints of segmentation requires merging only adjacent regions whose combined pixels satisfy the predicate Q. That is, two adjacent regions Rj and Rk are merged only if Q(Rj ∪ Rk) = TRUE.
The procedure described above can be summarized as follows:
1. Split into four quadrants any region Ri for which Q(Ri) = FALSE.
2. When no further splitting is possible, merge any adjacent regions Rj and Rk for which Q(Rj ∪ Rk) = TRUE.
3. Stop when no further merging is possible.

It is customary to specify a minimum quadregion size beyond which no further splitting is carried out.
Numerous variations of the preceding basic theme are
possible. For example, a significant simplification results if in
Step 2 we allow merging for any two adjacent regions if each
one satisfies the predicate individually. This results in a much
simpler (and faster) algorithm, because testing the predicate is
limited to individual quadregions.

An example of a predicate:

$$Q = \begin{cases} \text{TRUE} & \text{if } \sigma > a \text{ AND } 0 < m < b \\ \text{FALSE} & \text{otherwise} \end{cases}$$

where m and σ are the mean and the standard deviation of the pixels in a quadregion, and a and b are constants.
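A sketch of the splitting phase driven by such a predicate (assumptions: square grayscale numpy image with power-of-two side, arbitrary constants a and b; the merging phase is omitted):

import numpy as np

def Q(block, a=10.0, b=200.0):
    """Predicate: TRUE if sigma > a and 0 < m < b (a, b chosen arbitrarily)."""
    m, sigma = block.mean(), block.std()
    return sigma > a and 0 < m < b

def split(img, r, c, size, min_size, regions):
    """Recursively split the square region (r, c, size) into quadrants
    while Q is FALSE and the region is larger than min_size."""
    block = img[r:r+size, c:c+size]
    if Q(block) or size <= min_size:
        regions.append((r, c, size))          # accept this quadregion
        return
    h = size // 2
    for dr, dc in [(0, 0), (0, h), (h, 0), (h, h)]:
        split(img, r + dr, c + dc, h, min_size, regions)

img = np.random.randint(0, 256, size=(64, 64))
regions = []
split(img, 0, 0, 64, min_size=8, regions=regions)
print(len(regions), "quadregions")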

Digital Image Processing

Course 12
Representation and Description


After segmentation, the resulting sets of pixels are
represented in a form suitable for further processing.
(1) Represent an image using the boundary (external
characteristics)
(2) Represent an image using the internal characteristics
(the pixels inside the region)
The next task is to describe the region based on the chosen
representation.

External representation is chosen when the primary focus is
on shape characteristics, internal representation is used when
the focus is on regional properties, such as color and texture.

Representation methods:
- boundary following
- chain codes
- polygonal approximations
- signatures
- skeletons

Boundary Following
We assume that the points in the boundary of a region are
ordered in a clockwise (or counterclockwise) direction. We
also assume that:
1. we are working with binary images in which objects are
labeled 1 and background 0;
2. the images are padded with a border of 0s to eliminate
the possibility of an object merging with the image
border.

Given a binary region R or its boundary, an algorithm for following the border of R consists of:
1. The starting point is b0, the uppermost, leftmost point in the image that is labeled 1. Let c0 be the west neighbor of b0, which is always a background point. Examine the 8-neighbors of b0, starting at c0 and proceeding in a clockwise direction. Let b1 denote the first neighbor whose value is 1, and let c1 be the background point immediately preceding b1. Store the locations of b0 and b1.

2. Let b = b1 and c = c1.
3. Let n1, n2, …, n8 be the 8-neighbors of b, starting at c in a clockwise direction. Find the first nk labeled 1.
4. Let b = nk and c = nk−1.
5. Repeat Steps 3 and 4 until b=b0 and the next boundary
point found is b1. The sequence of b points found when
the algorithm stops constitutes the set of ordered boundary
points.

This algorithm is referred to as the Moore boundary tracking algorithm.
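A minimal sketch of the Moore tracking procedure under the stated assumptions (binary numpy image padded with a 0 border); the stopping rule is simplified to "b equals b0 again":

import numpy as np

# 8-neighborhood offsets in clockwise order, starting from the west neighbor
OFFS = [(0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1)]

def moore_boundary(img):
    """Return the ordered boundary points of the 1-labeled object in a
    0-padded binary image, following the Moore tracking algorithm."""
    rows, cols = np.nonzero(img)
    b0 = (rows[0], cols[0])                 # uppermost, leftmost 1-pixel
    c = (b0[0], b0[1] - 1)                  # its west neighbor (background)
    boundary = [b0]
    b = b0
    while True:
        # index of c among the neighbors of b, then scan clockwise from it
        start = OFFS.index((c[0] - b[0], c[1] - b[1]))
        for k in range(1, 9):
            off = OFFS[(start + k) % 8]
            n = (b[0] + off[0], b[1] + off[1])
            if img[n] == 1:                 # first 1-valued neighbor
                prev_off = OFFS[(start + k - 1) % 8]
                c = (b[0] + prev_off[0], b[1] + prev_off[1])
                b = n
                break
        if b == b0 and len(boundary) > 1:   # back at the start: stop
            break
        boundary.append(b)
    return boundary

img = np.zeros((6, 6), dtype=int)
img[2:4, 2:4] = 1                           # a 2x2 square object
print(moore_boundary(img))                  # [(2,2), (2,3), (3,3), (3,2)]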

Chain Codes
Chain codes are used to represent a boundary by a connected
sequence of straight line segments of specified length and
direction.
The direction of each segment is coded by using a numbering
scheme such as the 4-directional or 8-directional numbering. A boundary code formed as a sequence of such directional numbers is referred to as a Freeman chain code.

Applying this method directly at the pixel level is generally unacceptable, for two reasons:
(a) The resulting chain of codes usually is quite long;
(b) Sensitive to noise: any small disturbances along the
boundary owing to noise or imperfect segmentation
cause changes in the code that may not necessarily be
related to the shape of the boundary.
A frequently used method to solve the problem is to resample
the boundary by selecting a larger grid spacing. A boundary
point is assigned to each node of the large grid, depending on the proximity of the original boundary to that node. The
accuracy of the resulting code representation depends on the
spacing of the sampling grid.

The chain code of a boundary depends on the starting point.


The problem is solved by normalization.

Normalization for starting point:


Treat the code as a circular sequence and redefine the starting point so that the resulting sequence of numbers forms an integer of minimum magnitude.
Normalization for rotation:
Use the first difference of the chain code instead of the code itself. The difference is obtained by counting, counterclockwise, the number of directions that separate two adjacent elements of the code.
Example: treating the code as circular, the first difference of the 4-direction chain code 10103322 is 33133030.
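A small sketch of both normalizations, assuming a 4-directional code given as a list of digits:

def first_difference(code, directions=4):
    """Circular first difference: counterclockwise steps between
    consecutive code elements (previous -> current)."""
    return [(code[i] - code[i - 1]) % directions for i in range(len(code))]

def normalize_start(code):
    """Rotate the code so the digit sequence forms the minimum integer."""
    rotations = [code[i:] + code[:i] for i in range(len(code))]
    return min(rotations)

code = [1, 0, 1, 0, 3, 3, 2, 2]
print(first_difference(code))   # [3, 3, 1, 3, 3, 0, 3, 0]
print(normalize_start(code))    # [0, 1, 0, 3, 3, 2, 2, 1]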

Polygonal Approximations
The objective is to capture the essence of the boundary shape with the fewest possible polygonal segments.
This problem in general is not trivial and can quickly turn into
a time-consuming iterative search.
Minimum-Perimeter Polygons
The approach for generating an MPP is to enclose the boundary by a set of concatenated cells. The boundary can be viewed as a rubber band constrained by the inner and outer walls of the region defined by the cells.

The size of the cells determines the accuracy of the polygonal approximation. The objective is to use the largest possible cell size acceptable in a given application, thus producing MPPs with the fewest number of vertices.

The boundary in the above figure consists of 4-connected straight line segments. Suppose we traverse this boundary in a counterclockwise direction. Every turn encountered in the traversal will be either a convex or a concave vertex, with the angle of a vertex being an interior angle of the 4-connected boundary. Convex and concave vertices are shown respectively as white and black dots in the above figure. Note that these vertices are the vertices of the inner wall of the light-gray bounding region in Fig. 11.7(b), and that every concave (black) vertex in the dark gray region has a corresponding
"mirror" vertex in the light gray wall, located diagonally
opposite the vertex. Figure 11.7(c) shows the mirrors of all
the concave vertices, with the MPP from Fig. 11.6(c)
superimposed for reference. We see that the vertices of the
MPP coincide either with convex vertices in the inner wall
(white dots) or with the mirrors of the concave vertices (black
dots) in the outer wall.

MPP algorithm
The set of cells enclosing a digital boundary is called a
cellular complex. We assume that the boundaries under
consideration are not self-intersecting, which leads to simply
connected cellular complexes. Based on these assumptions,
and letting white (W) and black (B) denote convex and
mirrored concave vertices, respectively, we state the
following observations:

1. The MPP bounded by a simply connected cellular complex is not self-intersecting.
2. Every convex vertex of the MPP is a W vertex, but not every W vertex of a boundary is a vertex of the MPP.
3. Every mirrored concave vertex of the MPP is a B
vertex, but not every B vertex of a boundary is a vertex
of the MPP.
4. All B vertices are on or outside the MPP, and all W
vertices are on or inside the MPP.

5. The uppermost, leftmost vertex in a sequence of vertices
contained in a cellular complex is always a W vertex of
the MPP.
Let a = (x1, y1), b = (x2, y2), c = (x3, y3) and

$$A = \begin{bmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{bmatrix}$$

$$\det A \;\begin{cases} > 0 & \text{if } (a,b,c) \text{ is a counterclockwise sequence} \\ = 0 & \text{if the points are collinear} \\ < 0 & \text{if } (a,b,c) \text{ is a clockwise sequence} \end{cases}$$

Denote sgn(a, b, c) = det(A). Geometrically, sgn(a, b, c) > 0 indicates that the point c lies on the positive side of the pair (a, b), i.e., c lies on the positive side of the line passing through points a and b.
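A one-function sketch of this orientation test:

import numpy as np

def sgn(a, b, c):
    """det of [[x1,y1,1],[x2,y2,1],[x3,y3,1]]: > 0 means (a, b, c) turns
    counterclockwise, i.e., c lies on the positive side of line (a, b)."""
    A = np.array([[a[0], a[1], 1.0],
                  [b[0], b[1], 1.0],
                  [c[0], c[1], 1.0]])
    return np.linalg.det(A)

print(sgn((0, 0), (1, 0), (0, 1)))   # 1.0 -> counterclockwise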
Suppose we have a list with the coordinates of each vertex
and the additional information whether the vertex is W or B.

It is important that the concave vertices be mirrored, that the vertices be in sequential order, and that the first vertex be the
uppermost leftmost vertex, which we know is a W vertex of
the MPP. Let V0 denote this vertex. We assume that the
vertices are arranged in the counterclockwise direction. The
algorithm for finding MPPs uses two "crawler" points: a
white crawler (W0) and a black (B0) crawler. W0 crawls along
convex (W) vertices, and B0 crawls along mirrored concave
(B) vertices.

The algorithm starts by setting W0 = B0 = V0. Then, at any step in the algorithm, let VL denote the last MPP vertex
found, and let Vk denote the current vertex being examined.
One of three conditions can exist between VL, Vk and the two
crawler points:
1. Vk lies to the positive side of the line through the pair of
points (VL, W0); that is sgn(VL, W0 , Vk ) > 0.
2. Vk lies to the negative side of the line through the pair
(VL, W0); that is, sgn(VL, W0, Vk) ≤ 0. At the same time Vk lies to the positive side of the line through (VL, B0) or is collinear with them; that is, sgn(VL, B0, Vk) ≥ 0.
3. Vk lies to the negative side of the line through (VL, B0);
that is sgn(VL, B0 , Vk )< 0.
If condition 1 holds, the next MPP vertex is W0 and VL = W0; we then set W0 = B0 = VL and continue with the next vertex after VL.
If condition 2 holds, Vk becomes a candidate MPP vertex. We set W0 = Vk if Vk is convex (i.e., labeled W); otherwise we set B0 = Vk. We then continue with the next vertex in the list.

If condition 3 holds, the next MPP vertex is B0 and VL = B0; we then set W0 = B0 = VL and continue with the next vertex after VL.
The algorithm terminates when it reaches the first vertex
again. The VL vertices found by the algorithm are the vertices
of the MPP.

Merging technique
The idea is to merge points along a boundary until the least-squares error of a line fit to the points merged so far exceeds a preset threshold. When this condition occurs, the parameters
of the line are stored, the error is set to 0, and the procedure is
repeated, merging new points along the boundary until the
error again exceeds the threshold. One of the main problems with this technique is that vertices do not correspond with corners in the boundary.

Splitting technique
One approach to boundary segment splitting is to subdivide a
segment successively into two parts until a specified criterion
is satisfied. For instance, a requirement might be that the
maximum perpendicular distance from a boundary segment to
the line joining its two end points not exceed a preset
threshold. If it does, the point having the greatest distance
from the line becomes a vertex, thus subdividing the initial
segment into two subsegments.

This approach has the advantage of seeking prominent
inflection points. For a closed boundary, the best starting
points usually are the two farthest points in the boundary.

Signatures
A signature is a 1-D functional representation of a boundary
and may be generated in various ways. One of the simplest is
to plot the distance from the centroid of the region to the
boundary as a function of angle.
The basic idea is to reduce the boundary representation to a
1-D function that presumably is easier to describe than the
original 2-D boundary. Signatures generated by the approach
just described are invariant to translation, but they do depend on rotation and scaling. Normalization with respect to rotation
can be achieved by finding a way to select the same starting
point to generate the signature, regardless of the shape's
orientation. One way to do so is to select the starting point as
the point farthest from the centroid, assuming that this point is
unique for each shape of interest.
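A minimal sketch (numpy assumed) of the distance-versus-angle signature for an ordered list of boundary points:

import numpy as np

def signature(boundary):
    """Return (angle, distance-from-centroid) pairs, sorted by angle,
    for an ordered list of boundary points given as (row, col) pairs."""
    pts = np.asarray(boundary, dtype=float)
    centroid = pts.mean(axis=0)
    d = pts - centroid
    r = np.hypot(d[:, 0], d[:, 1])          # distance to centroid
    theta = np.arctan2(d[:, 0], d[:, 1])    # angle of each boundary point
    order = np.argsort(theta)
    return theta[order], r[order]

# Square boundary: r(theta) oscillates between the half-side length
# (edge midpoints) and the half-diagonal (corners).
boundary = [(0, c) for c in range(5)] + [(r, 4) for r in range(1, 5)] \
         + [(4, c) for c in range(3, -1, -1)] + [(r, 0) for r in range(3, 0, -1)]
theta, r = signature(boundary)
print(r.min(), r.max())   # 2.0 and about 2.83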

Skeletons
The approach is to represent the structural shape of a plane image using graph theory. We first obtain the skeleton of the image via a thinning (skeletonizing) algorithm.
The skeleton of a region may be defined via the medial axis
transformation (MAT) proposed by Blum. Let R be a region
with border B. The MAT of a region is computed as follows:
for each point p in R, we find its closest neighbor in B. If p
has more than one such neighbor, then it belongs to the medial axis (skeleton) of R. The concept of "closest" (and the resulting MAT) depends on the definition of a distance.
The MAT of a region has an intuitive definition based on the
so-called "prairie fire concept." Consider an image region as a
prairie of uniform, dry grass, and suppose that a fire is lit
along its border. All fire fronts will advance into the region at
the same speed. The MAT of the region is the set of points
reached by more than one fire front at the same time.

Direct implementation of this definition is computationally expensive, since it potentially involves calculating the distance from every interior point to every point on the boundary of a region. Thinning algorithms for MAT computation iteratively delete boundary points of a

region subject to the constraints that deletion of these points
(1) does not remove end points, (2) does not break
connectivity, and (3) does not cause excessive erosion of the
region.
In the following we present an algorithm for thinning binary
regions. Region points are assumed to have value 1 and
background points to have value 0. The method consists of
successive passes of two basic steps applied to the border
points of the given region. A border point is any pixel with value 1 and having at least one neighbor valued 0. We
consider the 8-neighborhood pixels indexed p2, p3, …, p9, arranged clockwise around the center pixel p1, with p2 the north neighbor (so p4 is east, p6 is south, and p8 is west).

Step 1
A contour point p1 is flagged for deletion if the following conditions are satisfied:
a) 2 ≤ N(p1) ≤ 6
b) T(p1) = 1
c) p2 · p4 · p6 = 0
d) p4 · p6 · p8 = 0
where N(p1) = p2 + p3 + … + p8 + p9 (pi ∈ {0,1})

and T(p1) is the number of 0-1 transitions in the ordered sequence p2, p3, …, p8, p9, p2.
Step 2
Conditions a) and b) remain the same, and c) and d) are replaced by:
c') p2 · p4 · p8 = 0
d') p2 · p6 · p8 = 0
Step 1 is applied to every border pixel of the region. If one or
more of conditions a) - d) are violated, the value of the point

in question is not changed. If all conditions are fulfilled, the
point is flagged for deletion. However, the point is not deleted
until all border points have been processed. After Step 1 has
been applied to all border points, those that were flagged are
deleted (changed to 0). Then Step 2 is applied to the resulting
data in exactly the same manner as Step 1. Thus, one iteration
of the thinning algorithm consists of (1) applying Step 1 to
flag border points for deletion; (2) deleting the flagged points;
(3) applying Step 2 to flag the remaining border points for deletion; and (4) deleting the flagged points. This basic
procedure is applied iteratively until no further points are
deleted, at which time the algorithm terminates, yielding the
skeleton of the region.
Conditions c) and d) are satisfied simultaneously if:
(p4 = 0 or p6 = 0) or (p2 = 0 and p8 =0).
A point that satisfies all the conditions required for Step 1 is
an east or south boundary point or a northwest corner point in
the boundary. In either case, p1 is not part of the skeleton and should be removed. Similarly, conditions c') and d') are satisfied simultaneously if:
(p2 = 0 or p8 = 0) or (p4 = 0 and p6 =0).
These correspond to north or west points, or a southeast
corner point.
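A compact sketch of this two-step thinning procedure (essentially the Zhang-Suen algorithm), assuming a 0/1 numpy image whose outer border is background:

import numpy as np

def neighbors(img, r, c):
    """p2..p9, clockwise from the north neighbor of (r, c)."""
    return [img[r-1, c], img[r-1, c+1], img[r, c+1], img[r+1, c+1],
            img[r+1, c], img[r+1, c-1], img[r, c-1], img[r-1, c-1]]

def transitions(p):
    """T(p1): number of 0->1 transitions in p2, p3, ..., p9, p2."""
    seq = p + [p[0]]
    return sum(seq[i] == 0 and seq[i+1] == 1 for i in range(8))

def thin(img):
    img = img.copy()
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            flagged = []
            for r in range(1, img.shape[0] - 1):
                for c in range(1, img.shape[1] - 1):
                    if img[r, c] != 1:
                        continue
                    p = neighbors(img, r, c)        # p2..p9
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if not (2 <= sum(p) <= 6 and transitions(p) == 1):
                        continue
                    if step == 0 and p2*p4*p6 == 0 and p4*p6*p8 == 0:
                        flagged.append((r, c))      # conditions c), d)
                    if step == 1 and p2*p4*p8 == 0 and p2*p6*p8 == 0:
                        flagged.append((r, c))      # conditions c'), d')
            for r, c in flagged:                    # delete after a full pass
                img[r, c] = 0
            changed = changed or bool(flagged)
    return img

img = np.zeros((7, 9), dtype=int)
img[2:5, 1:8] = 1                                   # thick horizontal bar
print(thin(img))                                    # reduced to a thin line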

Boundary Descriptors
The length of a boundary is one of its simplest descriptors.
The number of pixels along a boundary gives a rough
approximation of its length.
The diameter of a boundary B is defined as:

$$\mathrm{Diam}(B) = \max\{D(p_i, p_j);\; p_i, p_j \in B\}$$

where D is a distance measure. The value of the diameter and the orientation of a line segment connecting the two extreme points that comprise the diameter (this line is called the major axis of the boundary) are useful descriptors of a boundary. The minor axis of a boundary is defined as the line perpendicular to the major axis, and of such length that a box passing through the outer four points of intersection of the boundary with the two axes completely encloses the boundary. The box just described is called the basic rectangle, and the ratio of the major axis to the minor axis is called the eccentricity of the boundary. This also is a useful descriptor. Curvature is defined as the rate of change of slope.

Shape numbers
Assume that the boundary is described by the first difference of the associated chain code. The shape number of such a boundary, based on the 4-directional code, is defined as the
first difference of smallest magnitude. The order n of a shape
number is defined as the number of digits in its
representation. Moreover, n is even for a closed boundary,
and its value limits the number of possible different shapes.

Although the first difference of a chain code is independent of rotation, in general the coded boundary depends on the
orientation of the grid. One way to normalize the grid
orientation is by aligning the chain-code grid with the sides of
the basic rectangle.
In practice, for a desired shape order, we find the rectangle of
order n whose eccentricity best approximates that of the basic
rectangle of the region and use this new rectangle to establish
the grid size.

Fourier descriptors
Assume we have a K-point digital boundary in the xy-plane:

$$(x_0, y_0), (x_1, y_1), \ldots, (x_{K-1}, y_{K-1})$$

These are the points of the boundary encountered in traversing the boundary, say, in the counterclockwise direction. In the complex plane we have:

$$s(k) = x(k) + i\,y(k), \quad k = 0, 1, \ldots, K-1$$

The discrete Fourier transform (DFT) of s(k) is:

$$a(u) = \sum_{k=0}^{K-1} s(k)\, e^{-i 2\pi u k / K}, \quad u = 0, 1, \ldots, K-1$$

The complex coefficients a(u) are called the Fourier descriptors of the boundary. The inverse Fourier transform of these coefficients restores s(k):

$$s(k) = \frac{1}{K} \sum_{u=0}^{K-1} a(u)\, e^{i 2\pi u k / K}, \quad k = 0, 1, \ldots, K-1$$

Suppose, however, that instead of all the Fourier coefficients, only the first P coefficients are used. This is equivalent to setting a(u) = 0 for u > P−1. The result is the following approximation to s(k):

$$\hat{s}(k) = \frac{1}{P} \sum_{u=0}^{P-1} a(u)\, e^{i 2\pi u k / P}, \quad k = 0, 1, \ldots, K-1$$

Although P terms are used to obtain each component of ŝ(k), k still ranges from 0 to K−1. That is, the same number of points exists in the approximate boundary, but not as many terms are used in the reconstruction of each point. The smaller P becomes, the more boundary detail is lost.
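A short sketch using numpy's FFT, which already uses the e^(-i2πuk/K) convention of the forward transform above. Note that np.fft.ifft realizes the zero-filled reconstruction with 1/K scaling, a common variant of the 1/P form given in the text:

import numpy as np

def fourier_descriptors(boundary):
    """a(u) for an ordered boundary given as (x, y) pairs."""
    pts = np.asarray(boundary, dtype=float)
    s = pts[:, 0] + 1j * pts[:, 1]           # s(k) = x(k) + i y(k)
    return np.fft.fft(s)                     # a(u), u = 0..K-1

def reconstruct(a, P):
    """Approximate boundary keeping only the first P descriptors."""
    K = len(a)
    a_trunc = np.zeros(K, dtype=complex)
    a_trunc[:P] = a[:P]                      # a(u) = 0 for u > P-1
    s_hat = np.fft.ifft(a_trunc)             # inverse DFT
    return np.column_stack([s_hat.real, s_hat.imag])

boundary = [(np.cos(t), np.sin(t))
            for t in np.linspace(0, 2*np.pi, 64, endpoint=False)]
a = fourier_descriptors(boundary)
approx = reconstruct(a, P=8)                 # coarse approximation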

Statistical moments
The shape of boundary segments (and of signature
waveforms) can be described quantitatively by using
statistical moments, such as the mean, variance, and higher
order moments.

We represent the segment of a boundary by a 1-D function g(r). This function is obtained by connecting the two end points of the segment and rotating the line segment until it is
horizontal. The coordinates of the points are rotated by the
same angle.
Let us treat the amplitude of g as a discrete random variable v
and form an amplitude histogram p(vi), i = 0, 1, 2, …, A−1, where A is the number of discrete amplitude increments in which we divide the amplitude scale. The nth moment of v about its mean is:

$$\mu_n(v) = \sum_{i=0}^{A-1} (v_i - m)^n p(v_i), \qquad m = \sum_{i=0}^{A-1} v_i\, p(v_i)$$
The quantity m is recognized as the mean or average value of v, and μ2 as its variance. Generally, only the first few moments are required to differentiate between signatures of clearly distinct shapes.

Regional descriptors
The area of a region is defined as the number of pixels in the
region. The perimeter of a region is the length of its
boundary. These two descriptors apply primarily to situations
in which the size of the regions of interest is invariant. A
more frequent use of these two descriptors is in measuring
compactness of a region:

$$\text{compactness} = \frac{(\text{perimeter})^2}{\text{area}} = \frac{P^2}{A}$$

Another descriptor of compactness is the circularity ratio:

$$\text{circularity ratio} = \frac{\text{area of the region}}{\text{area of the circle having the same perimeter}}$$

The area of a circle with perimeter length P is P²/(4π), so:

$$R_c = \frac{4\pi A}{P^2}$$
The value of this measure is 1 for a circular region and π/4 for a square. Compactness is a dimensionless measure and thus is insensitive to uniform scale changes; it is insensitive also to orientation, ignoring computational errors that may be introduced in resizing and rotating a digital region.
Other simple measures used as region descriptors include the
mean and median of the intensity levels, the minimum and
maximum intensity values, and the number of pixels with
values above and below the mean.
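A small sketch computing these regional descriptors for a binary region; the perimeter is approximated here simply as the number of boundary pixels, one of several possible conventions:

import numpy as np

def region_descriptors(mask):
    """Area, perimeter, compactness and circularity of a 0/1 region."""
    mask = mask.astype(bool)
    area = int(mask.sum())
    # Boundary pixels: region pixels with at least one 4-neighbor outside
    padded = np.pad(mask, 1)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    perimeter = int((mask & ~interior).sum())
    compactness = perimeter**2 / area
    circularity = 4 * np.pi * area / perimeter**2
    return area, perimeter, compactness, circularity

mask = np.zeros((20, 20), dtype=int)
mask[5:15, 5:15] = 1                   # a 10x10 square region
print(region_descriptors(mask))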

Topological Descriptors
Topology is the study of properties of a figure that are
unaffected by any deformation, as long as there is no tearing
or joining of the figure (sometimes these are called
rubber-sheet distortions).

For example, the above figure shows a region with two holes.
Thus if a topological descriptor is defined by the number of
holes (H) in the region, this property obviously will not be
affected by a stretching or rotation transformation. In general,
however, the number of holes will change if the region is torn
or folded. Note that, as stretching affects distance, topological
properties do not depend on the notion of distance or any
properties implicitly based on the concept of a distance
measure.

Another topological property useful for region description is
the number of connected components (C).
The number of holes H and connected components C in a
figure can be used to define the Euler number E:

E = C − H.
Regions represented by straight-line segments (referred to as
polygonal networks) have a particularly simple interpretation
in terms of the Euler number.

Figure 11.26 shows a polygonal network. Classifying interior
regions of such a network into faces and holes is often
important. Denoting the number of vertices by V, the number of edges by Q, and the number of faces by F gives the following relationship, called the Euler formula:

V − Q + F = C − H = E.

Suppose we want to segment the river from the image in Fig. 11.27(a). The image in Fig. 11.27(b) has 1591 connected
components (obtained using 8-connectivity) and its Euler
number is 1552, from which we deduce that the number of
holes is 39. Figure 11.27(c) shows the connected component
with the largest number of elements (8479). This is the
desired result, which we already know cannot be segmented
by itself from the image using a threshold.

Texture
An important approach to region description is to quantify its
texture content. Although no formal definition of texture
exists, this descriptor provides measures of properties such as
smoothness, coarseness and regularity. The three principal
approaches for describing the texture of a region are
statistical, structural, and spectral. Statistical approaches yield
characterizations of textures as smooth, coarse, grainy, and so on. Structural techniques deal with the arrangement of image primitives, such as the description of texture based on
regularly spaced parallel lines. Spectral techniques are based
on properties of the Fourier spectrum and are used primarily
to detect global periodicity in an image by identifying
high-energy, narrow peaks in the spectrum.

Statistical approaches
One of the simplest approaches for describing texture is to use statistical moments of the intensity histogram of an image or region. Let z be a random variable denoting intensity and let p(zi), i = 0, 1, 2, …, L−1, be the corresponding histogram, where L is the number of distinct intensity levels. The nth moment of z about the mean, where m is the mean value of z, is:

$$\mu_n(z) = \sum_{i=0}^{L-1} (z_i - m)^n p(z_i), \qquad m = \sum_{i=0}^{L-1} z_i\, p(z_i)$$

Note that μ0 = 1 and μ1 = 0. The second moment, the variance (σ²(z) = μ2(z)), is of particular importance in texture
description. It is a measure of intensity contrast that can be

used to establish descriptors of relative smoothness. For example, the measure:

$$R(z) = 1 - \frac{1}{1 + \sigma^2(z)}$$

is 0 for areas of constant intensity and approaches 1 for large values of σ²(z).
The third moment μ3(z) is a measure of the skewness of the histogram, while the fourth moment is a measure of its relative flatness. Some other useful texture measures are the uniformity and the average entropy:

$$U(z) = \sum_{i=0}^{L-1} p^2(z_i), \qquad e(z) = -\sum_{i=0}^{L-1} p(z_i)\log_2 p(z_i)$$
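A sketch computing these histogram-based texture statistics for a grayscale region (numpy assumed; σ² is used unnormalized here, though it is sometimes normalized by (L−1)² before computing R):

import numpy as np

def texture_stats(region, L=256):
    """Mean, variance, smoothness R, third moment, uniformity, entropy
    from the normalized intensity histogram of a region."""
    hist = np.bincount(region.ravel(), minlength=L).astype(float)
    p = hist / hist.sum()                       # p(z_i)
    z = np.arange(L, dtype=float)
    m = np.sum(z * p)                           # mean
    mu2 = np.sum((z - m) ** 2 * p)              # variance
    mu3 = np.sum((z - m) ** 3 * p)              # skewness measure
    R = 1 - 1 / (1 + mu2)                       # relative smoothness
    U = np.sum(p ** 2)                          # uniformity
    e = -np.sum(p[p > 0] * np.log2(p[p > 0]))   # entropy (0 log 0 := 0)
    return m, mu2, R, mu3, U, e

region = np.random.randint(0, 256, size=(32, 32))
print(texture_stats(region))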

Structural approach
Structural techniques deal with the arrangement of image
primitives. They use a set of predefined texture primitives and
a set of construction rules to define how a texture region is
constructed with the primitives and the rules.

Spectral approaches
Spectral techniques use the Fourier transform of the image
and its properties in order to detect global periodicity in an
image, by identifying high-energy, narrow peaks in the
spectrum.
The Fourier spectrum is ideally suited for describing the
directionality of periodic or almost periodic 2-D patterns in an
image.

Three features of the spectrum are suited for texture description:
(1) prominent peaks give the principal direction of the
patterns;
(2) the location of the peaks gives the fundamental spatial
period of the patterns;
(3) eliminating any periodic components via filtering
leaves nonperiodic image elements, which can be
described by statistical techniques.

We express the spectrum in polar coordinates to yield a function S(r, θ). For each direction θ, S(r, θ) may be considered a 1-D function Sθ(r). Similarly, for each frequency r, Sr(θ) is a 1-D function. Analyzing Sθ(r) for a fixed value of θ yields the behavior of the spectrum (such as the presence of peaks) along a radial direction from the origin, whereas analyzing Sr(θ) for a fixed value of r yields the behavior along a circle centered on the origin.

A more global description is obtained by using the following functions:

$$S(r) = \sum_{\theta=0}^{\pi} S_\theta(r), \qquad S(\theta) = \sum_{r=1}^{R_0} S_r(\theta)$$

where R0 is the radius of a circle centered at the origin. S(r) and S(θ) constitute a spectral-energy description of texture for an entire image or region under consideration.
Furthermore, descriptors of these functions themselves can be computed in order to characterize their behavior quantitatively. Descriptors typically used for this purpose are the location of the highest value, the mean and variance of
both the amplitude and axial variations, and the distance
between the mean and the highest value of the function.

Digital Image Processing

Course 13
Recognition of Image Patterns


Once an image is segmented, the next task is to recognize the
segmented objects or regions in the scene. Hence, the
objective in pattern recognition is to recognize objects in the
scene from a set of measurements of the objects.
Each object is a pattern and the measured values are the
features of the pattern. A set of similar objects possessing
more or less identical features are said to belong to a certain
pattern class.

Pattern recognition is an integral part of machine vision and image processing and finds applications ranging from biometric and biomedical image diagnostics to document classification, remote sensing, and many other fields.
There are many types of features and each feature has a
specific technique for measurement.
As an example, each letter in the English alphabet is
composed of a set of features like horizontal, vertical, slant
straight lines, as well as some curvilinear line segments.

While the letter A is described by two slant lines and one
horizontal line, letter B has a vertical line with two
curvilinear segments, joined in a specific structural format.
Some of the features of a two- or three-dimensional object
pattern are the area, volume, perimeter, surface, etc. which
can be measured by counting pixels. Similarly the shape of an
object may be characterized by its border. Some of the
attributes to characterize the shape of an object pattern are Fourier descriptors, invariant moments, the medial axis of the object, and so on.
The color of an object is an extremely important feature,
which can be described in various color spaces. Also various
types of textural attributes characterize the surface of an
object. The techniques to measure the features are known as
feature extraction techniques. Patterns may be described by a
set of features, all of which may not have enough
discriminatory power to discriminate one class of patterns from another. The selection and extraction of appropriate
features from patterns is the first major problem in pattern
recognition.

Decision Theoretic Pattern Classification


The classification of an unknown pattern is decided based on
some deterministic or statistical or even fuzzy set theoretic
principles. The block diagram of a decision theoretic pattern
classifier is shown in the below figure:

[Block diagram of a decision theoretic pattern classifier: a test pattern passes through feature extraction into the classifier, which produces the classified output; sample patterns pass through feature extraction into a learning stage that trains the classifier.]

The decision theoretic pattern recognition techniques are mainly of two types:

1. Classification methods based on supervised learning,
2. Classification methods using unsupervised techniques.
The supervised classification algorithms can further be
classified as
Parametric classifiers
Nonparametric classifiers
In parametric supervised classification, the classifier is
trained with a large set of labeled training pattern samples in
order to estimate the statistical parameters of each class of patterns, such as mean, variance, etc. By the term labeled
pattern samples, we mean the set of patterns whose class
memberships are known in advance. The input feature vectors
obtained during the training phase of the supervised
classification are assumed to be Gaussian in nature.
The minimum distance classifier and the maximum likelihood
classifier are some of the frequently used supervised
algorithms.

On the other hand, the parameters are not taken into
consideration in the nonparametric supervised classification
techniques. Some of the nonparametric techniques are
k-nearest neighbor, Parzen window technique, etc.
In the unsupervised case, the machine partitions the entire data
set based on some similarity criteria. This results in a set of
clusters, where each cluster of patterns belong to a specific
class.

Bayesian Decision Theory


Assume that there are N classes of patterns C1, C2, . . . , CN,
and an unknown pattern x in a d-dimensional feature space
x = [x1, x2, …, xd]. Hence the pattern is characterized by d
number of features. The problem of pattern classification is to
compute the probability of belongingness of the pattern x to
each class Ci, i = 1 , 2 , . . . , N . The pattern is classified to
the class Ck if probability of its belongingness to Ck is
maximum.

While classifying a pattern based on Bayesian decision
theory, we distinguish two kinds of probabilities: (1) apriori probability, and (2) aposteriori probability. The apriori
probability indicates the probability that the pattern should
belong to a class, say Ck, based on the prior belief or evidence
or knowledge. This probability is chosen even before making
any measurements, i.e., even before selection or extraction of
a feature. Sometimes this probability may be modeled using
Gaussian distribution, if the previous evidence suggests it. In cases where there exists no prior knowledge about the class
membership of the pattern, usually a uniform distribution is
used to model it. For example, in a four class problem, we
may choose the apriori probability as 0.25, assuming that the
pattern is equally likely to belong to any of the four classes.
The aposteriori probability P(Ci|x), on the other hand,
indicates the final probability of belongingness of the pattern
x to a class Ci. The aposteriori probability is computed based on the feature vector of the pattern, the class conditional probability density functions p(x|Ci) for each class Ci, and the apriori probability P(Ci) of each class Ci.
Bayesian decision theory states that the aposteriori probability of a pattern belonging to a pattern class Ck is given by:

$$P(C_k \mid x) = \frac{p(x \mid C_k)\, P(C_k)}{\sum_{i=1}^{N} p(x \mid C_i)\, P(C_i)}$$

For Gaussian classes:

$$p(x \mid C_i) = \frac{1}{(2\pi)^{d/2} (\det \Sigma_i)^{1/2}} \exp\left( -\frac{1}{2}(x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) \right)$$

where μi is the mean feature vector of the patterns in class Ci and Σi is the covariance matrix for class Ci. If the chosen features are statistically independent, the covariance matrix is a diagonal matrix, which simplifies computations.
The pattern x belongs to class Cp when:

$$P(C_p \mid x) = \max\{P(C_1 \mid x), P(C_2 \mid x), \ldots, P(C_N \mid x)\}$$
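A compact sketch of this Gaussian Bayes rule, with μi, Σi and P(Ci) estimated from labeled training data (numpy assumed):

import numpy as np

class GaussianBayes:
    """Bayes classifier with one multivariate Gaussian per class."""
    def fit(self, X, y):
        self.params = {}
        for c in np.unique(y):
            Xc = X[y == c]
            mu = Xc.mean(axis=0)
            Sigma = np.cov(Xc, rowvar=False)
            prior = len(Xc) / len(X)            # apriori probability P(Ci)
            self.params[c] = (mu, Sigma, prior)
        return self

    def predict(self, x):
        d = len(x)
        scores = {}
        for c, (mu, Sigma, prior) in self.params.items():
            diff = x - mu
            quad = diff @ np.linalg.inv(Sigma) @ diff
            dens = np.exp(-0.5 * quad) / np.sqrt((2*np.pi)**d * np.linalg.det(Sigma))
            scores[c] = dens * prior            # numerator of the Bayes rule
        return max(scores, key=scores.get)      # class with max posterior

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
y = np.array([0]*50 + [1]*50)
clf = GaussianBayes().fit(X, y)
print(clf.predict(np.array([3.5, 4.2])))        # expected: 1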

Minimum Distance Classification


Distance functions are used to measure the similarity or
dissimilarity between two classes of patterns. The smaller the
distance between two classes of patterns, the larger is the
similarity between them. The minimum distance classification
algorithm is computationally simple and commonly used.
The classifier finds the distances from a test input data vector to all the mean vectors representative of the target classes. The unknown pattern is assigned to the class to which its distance is smaller than its distances to all other classes.
Let us consider an N-class problem. If the class Ci contains a single prototype pattern μi (the mean vector) and the unknown pattern is x = [x1, x2, …, xd], then the pattern belongs to the class Ck if:

$$D_k = \min\{d(x, \mu_i);\; i = 1, 2, \ldots, N\}$$

where d is a distance function.

Minkowski Distance

$$d_p(y, z) = \left( \sum_{i=1}^{d} |y_i - z_i|^p \right)^{1/p}$$

p = 1: city block or Manhattan distance
p = 2: Euclidean distance

Mahalanobis Distance
If the parameters of the distribution of a specific pattern class are assumed to be Gaussian with mean feature vector μ and covariance matrix Σ, then the Mahalanobis distance

between the test pattern with feature vector x and that pattern class C is given by:

$$d(x, C) = (x - \mu)^T \Sigma^{-1} (x - \mu)$$

Bounded Distance
In many pattern classification problems, it may be useful to work with a bounded distance function, which lies in the range [0, 1). Any given distance function D(x,y) may be transformed into a bounded distance function d(x,y), where:

$$d(x, y) = \frac{D(x, y)}{1 + D(x, y)}$$
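A small sketch of a minimum distance classifier built on these distance functions; the class prototypes (means) are assumed given:

import numpy as np

def minkowski(y, z, p):
    """d_p(y, z); p=1 is city block, p=2 is Euclidean."""
    return np.sum(np.abs(y - z) ** p) ** (1.0 / p)

def bounded(D):
    """Map any distance D >= 0 into [0, 1)."""
    return D / (1.0 + D)

def min_distance_classify(x, means, p=2):
    """Assign x to the class whose prototype (mean) is nearest."""
    dists = [minkowski(x, mu, p) for mu in means]
    return int(np.argmin(dists))

means = [np.array([0.0, 0.0]), np.array([5.0, 5.0])]
x = np.array([4.0, 4.5])
print(min_distance_classify(x, means))   # 1: nearer to (5, 5)
print(bounded(minkowski(x, means[0], 1)))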

Nonparametric Classification
The nonparametric classification strategies are not dependent
on the estimation of parameters.

k-Nearest-Neighbor Classification
In many situations we may not have the complete statistical
knowledge about the underlying joint distribution of the observation or feature vector x and the true class C, to which
the pattern belongs. For an unknown test sample, the k-nearest-neighbor rule suggests that it should be assigned to the class to which the majority of its k nearest neighbors belong.
There are, however, certain problems in classifying an
unknown pattern using the nearest neighbor rule. If there are N sample patterns, then to ascertain the nearest
neighbor, we need to compute N distances from the test
pattern to each of the sample points. Also it is important to store all these N sample points. This increases the
computational as well as storage complexity of the k-nearest
neighbor problem. As the number of features increases, we
require more training data samples, and hence the storage and computational complexities increase as well.
To reduce these complexities various researchers have taken
different measures:

- Remove the redundant data from the data set, which will reduce the storage complexity.

- The training samples need to be sorted to achieve a better data structure for reducing the computational complexities.

- The distance measure used for computation should be simple.
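A minimal k-nearest-neighbor sketch (numpy; Euclidean distance and majority vote):

import numpy as np
from collections import Counter

def knn_classify(x, X_train, y_train, k=3):
    """Assign x to the majority class among its k nearest neighbors."""
    dists = np.linalg.norm(X_train - x, axis=1)   # N distances
    nearest = np.argsort(dists)[:k]               # indices of k nearest
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

X_train = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_classify(np.array([4.5, 5.2]), X_train, y_train, k=3))  # 1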

Linear Discriminant Analysis


An image can be described by a set of local features; these features can be extracted at each pixel of the image. Let fk(p) denote the k-th feature at pixel p. If each pixel in an image is associated with d features, we have a matrix

F = { f1, …, fd }

of dimension n × d, where n is the total number of pixels in the image. It may be noted here that this matrix contains a lot of local information of the entire image, much of which is

redundant. Discriminant analysis is employed to find
which variables discriminate between two classes and is
essentially analogous to the analysis of variance. In
discriminant analysis, we assume that the discriminant
function is linear, i.e.,

$$g(x) = w^T x + w_0 = 0$$

is a hyperplane, which partitions the feature space into two subspaces. In Fisher's linear discriminant approach, the

d-dimensional patterns x are projected onto a line, such that the projections of the data,

$$y = w^T x,$$
are well separated. The measure of this separation can be
chosen as
$$J(w) = \frac{(m_1 - m_2)^2}{S_1^2 + S_2^2}$$

where m1 and m2 are the projection means for classes C1 and C2, and S1² and S2² are the within-class variances of the projected data:

$$S_i^2 = \sum_{y \in C_i} (y - m_i)^2$$

This gives a measure of the scatter of the projected set of data points y. The objective function J(w) is maximized for the weight vector:

$$w = W^{-1}(m_1 - m_2), \qquad W = \Sigma_1 + \Sigma_2$$

where here m1 and m2 denote the class mean vectors in the original feature space and Σ1, Σ2 the class scatter matrices.

The Fisher linear discriminant function is widely used for
identifying the linear separating vector between pattern
classes. The procedure uses the maximization of the between-class scatter while minimizing the intra-class variances.
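A short sketch of Fisher's construction under the stated assumptions (two classes; within-class scatter taken as the sum of the class scatter matrices):

import numpy as np

def fisher_direction(X1, X2):
    """w = W^{-1}(m1 - m2), W the sum of class scatter matrices."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = np.cov(X1, rowvar=False) * (len(X1) - 1)   # scatter of class 1
    S2 = np.cov(X2, rowvar=False) * (len(X2) - 1)   # scatter of class 2
    W = S1 + S2
    return np.linalg.solve(W, m1 - m2)              # separating direction

rng = np.random.default_rng(1)
X1 = rng.normal([0, 0], 1.0, (100, 2))
X2 = rng.normal([3, 3], 1.0, (100, 2))
w = fisher_direction(X1, X2)
y1, y2 = X1 @ w, X2 @ w                             # projected data
print(y1.mean(), y2.mean())                         # well separated means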

Unsupervised Classification Strategies - Clustering


In a clustering problem, we have a set of patterns that have to be partitioned into a set of clusters such that the patterns within
a cluster are more similar to each other than the patterns from other clusters or partitions. Thus central to the goals of cluster
analysis lies the notion of similarity. There are a number of clustering methods, which can be divided into the following three classes:
1. Hierarchical methods
2. K-means methods
3. Graph theoretic methods
In hierarchical algorithms, the data set is partitioned into a
number of clusters in a hierarchical fashion. The hierarchical

clustering methods may again be subdivided into the following two categories:
1. Agglomerative clustering: we start with a set of singleton clusters, which are merged in each step, depending on some similarity criterion, until finally we get the appropriate set of clusters.
2. Divisive clustering: as the name suggests, the whole set of patterns initially is assumed to belong to a single cluster, which subsequently is divided into several partitions in each step.
The hierarchical clustering may be represented by dendrograms, tree structures which demonstrate the merging (fusion) or division of points in each step of hierarchical partitioning. Agglomerative clustering is the bottom-up clustering procedure, where each singleton pattern (the leaf nodes at the bottom of the dendrogram) merges with other patterns according to some similarity criterion. In the divisive algorithm,

on the other hand, starting with the root node S, we
recursively partition the set of patterns until singleton patterns
are reached at the bottom of the tree.

Single Linkage Clustering


The single linkage or nearest neighbor agglomerative clustering technique involves grouping of patterns based on a measure of intercluster distance (the distance between two clusters).

Assuming two clusters P1 and P2, each containing a finite number of patterns, in the single linkage method the distance between P1 and P2 is given by:

$$D_{\min}(P_1, P_2) = \min\{d(p_i^1, p_j^2);\; p_i^1 \in P_1,\; p_j^2 \in P_2\}$$

Complete Linkage Clustering

In complete linkage clustering, the distance between two clusters is defined as the distance between the most distant pair of patterns, one from each cluster. This method may thus be called the farthest-neighbor method. In the complete linkage method, the distance between P1 and P2 is given by:

$$D_{\max}(P_1, P_2) = \max\{d(p_i^1, p_j^2);\; p_i^1 \in P_1,\; p_j^2 \in P_2\}$$

Average Linkage Clustering


In average linkage clustering, the distance between two clusters is the average of the distances between all pairs of patterns, one from each cluster. In this method, the distance between P1 and P2 is given by:

$$D_{\mathrm{avg}}(P_1, P_2) = \mathrm{average}\{d(p_i^1, p_j^2);\; p_i^1 \in P_1,\; p_j^2 \in P_2\}$$

If there are ni patterns in cluster Pi, i = 1, 2, then:

$$D_{\mathrm{avg}}(P_1, P_2) = \frac{1}{n_1 n_2} \sum_{i,j} d(p_i^1, p_j^2)$$
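A sketch of the three intercluster distances, with clusters given as numpy arrays of points:

import numpy as np

def pairwise(P1, P2):
    """Matrix of Euclidean distances between all cross-cluster pairs."""
    return np.linalg.norm(P1[:, None, :] - P2[None, :, :], axis=-1)

def d_min(P1, P2):  return pairwise(P1, P2).min()    # single linkage
def d_max(P1, P2):  return pairwise(P1, P2).max()    # complete linkage
def d_avg(P1, P2):  return pairwise(P1, P2).mean()   # average linkage

P1 = np.array([[0.0, 0.0], [1.0, 0.0]])
P2 = np.array([[4.0, 0.0], [5.0, 0.0]])
print(d_min(P1, P2), d_max(P1, P2), d_avg(P1, P2))   # 3.0 5.0 4.0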

K-Means Clustering Algorithm


In K-means clustering approach, we partition the set of input
patterns S into a set of K partitions, where K is known in
advance. The method is based on the identification of the
centroids of each of the K clusters. Thus, instead of
computing the pairwise interpattern distances between all the
patterns in all the clusters, here the distances may be
computed only from the centroids. The method thus essentially reduces to searching for a best set of K centroids of the clusters, as follows:

Step 1: Select K initial cluster centers C1, C2, …, CK.
Step 2: Assign each pattern X ∈ S to the cluster Ci (1 ≤ i ≤ K) whose centroid is nearest to pattern X.
Step 3: Recompute the centroids of each cluster Cj (1 ≤ j ≤ K) in which there has been any addition or deletion of pattern points.
Step 4: Jump to Step 2, until convergence is achieved.
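A direct sketch of these four steps (numpy; the first K samples are taken as the initial centers, one of the initializations discussed below):

import numpy as np

def k_means(X, K, max_iter=100):
    centers = X[:K].copy()                       # Step 1: initial centers
    for _ in range(max_iter):
        # Step 2: assign each pattern to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Step 3: recompute centroids of clusters that changed
        new_centers = np.array([X[labels == j].mean(axis=0)
                                if np.any(labels == j) else centers[j]
                                for j in range(K)])
        if np.allclose(new_centers, centers):    # Step 4: convergence
            break
        centers = new_centers
    return centers, labels

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.5, (30, 2)), rng.normal(5, 0.5, (30, 2))])
centers, labels = k_means(X, K=2)
print(np.round(centers, 2))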

The major problem is the selection of initial cluster


configurations. It is possible either to select the first K
samples as the initial cluster centers or to randomly select K
samples from the pool of patterns as the cluster centers. A
rough partition in K clusters may, however, yield a better set
of initial cluster centers.

Syntactic Pattern Classification


It may be noted that there exists an inherent structure inside a
pattern and there is a positive interrelationship among the
primitive elements which form a pattern. The interrelationship
between pattern elements called primitives and the articulated
description of a pattern in terms of such relations provide a
basis of structural or linguistic approach to pattern
recognition.

In syntactic pattern recognition each pattern is characterized


by a string of primitives and the classification of a pattern in
this approach is based on analysis of the string with respect to
the grammar defining that pattern class.
The syntactic approach to pattern recognition involves a set of
processes:

1. Selection and extraction of a set of primitives (segmentation problem);
2. Analysis of the pattern description by identification of the interrelationship among the primitives;
3. Recognition of the allowable structures defining the interrelationship between the pattern primitives.

Primitive Selection Strategies


Segmentation of patterns poses the first major problem in
syntactic pattern recognition. A pattern may be described by a
string of subpatterns or primitives, which may easily be
identified. If each subpattern is complex in structure, each of
them may again be described by simpler subpatterns which
are easily identifiable.
Various approaches to primitive selection have been
suggested in the literature. One of the most frequently used schemes of boundary description is the chain code method
by Freeman. Under this approach, a rectangular grid is
overlaid on a two-dimensional pattern and straight line
segments are used to connect the adjacent grid points
covering the pattern.
Let us consider a sequence of n points {p1, p2, …, pn} which describes a closed curve. Here the point pi is a neighbor of pi−1 and pi+1 when 1 < i < n, and the curve closes: pn is the neighbor of pn−1 and p1, and p1 is the neighbor of p2 and pn. The

Freeman chain code contains the n vectors from pi−1 to pi, and each of these vectors is represented by an integer m = 0, 1, …, 7 according to its direction (the standard 8-directional numbering).

Each line segment is assigned an octal digit according to its


slope and the pattern is represented by a chain of octal digits.
This type of representation yields patterns composed of a
string of symbolic valued primitives.
This method may be used for coding any arbitrary two-dimensional figure composed of straight line or curved
segments and has been widely used in many shape
recognition applications. The major limitation of this procedure is that the patterns need adequate preprocessing
for ensuring proper representation.
Once a satisfactory solution to the primitive selection and
extraction problem is available, the next step is the
identification of structural interrelationship among the
extracted pattern primitives. A pattern may be described as
sets of strings or sentences belonging to specific pattern
classes. First order logic may be used for describing the
primitive interrelationship where a pattern is described by

Digital Image Processing


Course 13

certain predicates and objects occurring in the pattern may be


defined using the same predicates. When the patterns are
represented as strings of primitives, they may be considered as sentences of regular, context-free, or context-sensitive languages. Thus suitable grammars may be defined for
generating pattern languages by specifying a set of production
rules which generate the sentences in the said pattern
language. The corresponding computing machines known as

Digital Image Processing


Course 13

automata have the capability of recognizing whether a string


of primitives belongs to a specific pattern class.

High-Dimensional Pattern Grammars


The string representation of patterns is quite adequate for
structurally simpler forms of patterns. The classical string
grammars are, however, weak in handling noisy and
structurally complex pattern classes. This is because the only
relationship supported by string grammars is the concatenation relationship between the pattern primitives.

Here each primitive element is attached to only two other primitive elements: one to its right and the other to its left.
Such a simple structure thus may not be sufficient to
characterize more complex patterns, which may require better
connectivity relationship for their description. An appropriate
extension of string grammars has been suggested in the form
of high-dimensional grammars. These grammars are more
powerful as generators of language and are capable of generating complex patterns like chromosome patterns,
nuclear bubble chamber photographs, and so on.
In a string grammar each primitive symbol is attached with
only two other primitive elements, one to the right and the
other to the left of the element. A class of grammars was
suggested by Fedder, where a set of primitive elements may
be used with multiple connectivity structure. These grammars
are known as PLEX grammars. PLEX grammar involving
primitive structures called n-attaching point entities (NAPE) and a set of identifiers associated with each NAPE has been
used for pattern generation. The n-attaching point entities are
primitive elements on which there are n specified points where other attaching elements may be connected. Thus this class of grammars has
more generating capabilities compared to the string
grammars.

Syntactic Inference
A key problem in syntactic pattern recognition is inferring an appropriate grammar from a set of samples belonging to different pattern classes.
In syntactic pattern recognition, the problem of grammatical
inference is one of central importance. This approach is based
on the underlying assumption of the existence of at least one
grammar characterizing each pattern class. The identification
and extraction of the grammar characterizing each pattern class forms the core problem in the design of a syntactic
pattern classifier. The problem of grammatical inference
involves development of algorithms to derive grammars using
a set of sample patterns which are representatives of a pattern
class under study. This may thus be viewed as a learning
procedure using a finitely large and growing set of training
patterns. In syntactic pattern classification, the strings
belonging to a particular pattern class may be considered to
form sentences belonging to the language corresponding to the pattern class. A machine is said to recognize a pattern
class if for every string belonging to that pattern class, the
machine decides that it is a member of the language and for
any string not in the pattern class, it either rejects or loops
forever. A number of interesting techniques have been
suggested for the automated construction of automaton which
accepts the strings belonging to a particular pattern class.

Symbolic Projection Method


Here we will present a scene interpretation scheme based on a
work by Jungert. The structure is called symbolic projections.
The basic idea is to project the positions of all objects in a
scene or image along each coordinate axis and then generate a
string corresponding to each one of the axes. Each string
contains all the objects in their relative positions, that is, one
object is either equal to or less than any of the others.

Figure 1 shows how simple objects can be projected along the X- and Y-coordinate axes. The two operators used are "equal to" and "less than". The strings are called the U- and V-strings, where the U-string corresponds to the projections of the objects along the X-axis, and the V-string to the Y-axis. The symbolic projections are best suited for describing relative positions of objects, which is important in spatial reasoning about images.

One may use several spatial relational operators, such as equal, less than, greater than, etc., as follows:

- Equal (=): Two objects A and B are said to be equal in a spatial dimension, i.e., A = B, if and only if the centroid of A is the same as the centroid of B.

- Less than (<): Two objects A and B separated by a distance may be spatially related by A < B if and only if max(Ax) < min(Bx), where max(Ax) (or min(Bx)) indicates the maximum (or minimum) value of the projection of all the pixels in object A (or object B) along the X-direction. Similar relationships can be defined along the Y-axis.
- Greater than (>): Two objects A and B separated by a distance may be spatially related by A > B if and only if min(Ax) > max(Bx).
- Top and Bottom: Two objects A and B separated by a distance may be spatially related by "A on top of B" if and only if min(Ay) > max(By).
if min(Ay) > max(By).

In Figure 1 the object A is to the left of object B, A < B, and


object A is on top of object B.
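To make the construction concrete, here is a minimal sketch (assumed details, not Jungert's original algorithm) of deriving U- and V-strings from axis-aligned bounding boxes. For brevity the ordering compares object centroids, a simplification of the max/min definitions above; the object names and coordinates are hypothetical.

# Minimal sketch of symbolic projection: derive U- and V-strings from
# axis-aligned bounding boxes. Objects and coordinates are hypothetical.

objects = {
    "A": (0, 5, 2, 7),   # bounding box (xmin, ymin, xmax, ymax)
    "B": (4, 0, 6, 2),
    "C": (4, 0, 6, 2),   # C coincides with B along both axes
}

def projection_string(objs, lo, hi):
    """Order objects along one axis, joining them with '<' and '='."""
    centroid = lambda name: (objs[name][lo] + objs[name][hi]) / 2
    names = sorted(objs, key=centroid)
    out = names[0]
    for prev, cur in zip(names, names[1:]):
        # Equal centroids give '=', strictly increasing centroids give '<'.
        out += (" = " if centroid(prev) == centroid(cur) else " < ") + cur
    return out

u_string = projection_string(objects, 0, 2)  # projections along the X-axis
v_string = projection_string(objects, 1, 3)  # projections along the Y-axis
print("U:", u_string)  # A < B = C
print("V:", v_string)  # B = C < A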

Neural Networks
The approaches discussed until now are based on the use of sample patterns to estimate the statistical parameters of each pattern class (the mean vector of each class, the covariance matrix).
The patterns (of known class membership) used to estimate
these parameters usually are called training patterns, and a set of such patterns from each class is called a training set.


The process by which a training set is used to obtain decision
functions is called learning or training.
The training patterns of each class are used to compute the
parameters of the decision function corresponding to that
class. After the parameters in question have been estimated,
the structure of the classifier is fixed, and its eventual
performance will depend on how well the actual pattern
populations satisfy the underlying statistical assumptions made in the derivation of the classification method being used.
The statistical properties of the pattern classes in a problem
often are unknown or cannot be estimated. In practice, such
decision-theoretic problems are best handled by methods that
yield the required decision functions directly via training.
Then, making assumptions regarding the underlying probability density functions or other probabilistic information about the pattern classes under consideration is unnecessary.
Background
The idea of neural networks is the use of a multitude of
elemental nonlinear computing elements (called neurons)
organized as networks reminiscent of the way in which
neurons are believed to be interconnected in the brain. The
resulting models are referred to as neural networks.

We use these networks as vehicles for adaptively developing the coefficients of decision functions via successive presentations of training sets of patterns.

Perceptron for two pattern classes


In its most basic form, the perceptron learns a linear decision
function that dichotomizes two linearly separable training
sets. In the perceptron model for two pattern classes, the response of this basic device is based on a weighted sum of its inputs; that is,
d(x) = \sum_{i=1}^{n} w_i x_i + w_{n+1}

which is a linear decision function with respect to the components of the pattern vectors. The coefficients w_i, called
weights, modify the inputs before they are summed and fed
into the threshold element. In this sense, weights are
analogous to synapses in the human neural system. The function that maps the output of the summing junction into the final output of the device sometimes is called the
activation function.
When d(x) > 0, the threshold element causes the output of the perceptron to be +1, indicating that the pattern x was recognized as belonging to class C1; the reverse is true when d(x) < 0. When d(x) = 0, x lies on the decision surface separating the two pattern classes, giving an indeterminate condition.

The output of the threshold element is thus

O = \begin{cases} +1 & \text{if } \sum_{i=1}^{n} w_i x_i > -w_{n+1} \\ -1 & \text{if } \sum_{i=1}^{n} w_i x_i < -w_{n+1} \end{cases}

Using augmented vectors, the decision function can be written as

d(y) = \sum_{i=1}^{n+1} w_i y_i = y^T w,

where y = (y_1, y_2, \ldots, y_n, 1)^T is the augmented pattern vector and w = (w_1, w_2, \ldots, w_n, w_{n+1})^T is the weight vector.
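As a concrete illustration of the augmented formulation, the following minimal sketch computes d(y) and the thresholded output for one hypothetical pattern; the weights and the pattern are illustrative values.

import numpy as np

# Minimal sketch of the two-class perceptron response using the augmented
# pattern/weight vectors defined above. Values are hypothetical.

w = np.array([2.0, -1.0, 0.5])   # weight vector (w1, w2, w_{n+1}) for n = 2
x = np.array([1.5, 0.5])         # pattern vector
y = np.append(x, 1.0)            # augmented pattern vector (y1, y2, 1)

d = w @ y                        # d(y) = y^T w
if d > 0:
    print("O = +1: pattern assigned to class C1")
elif d < 0:
    print("O = -1: pattern assigned to class C2")
else:
    print("d(y) = 0: pattern lies on the decision surface")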

Training algorithms
Linearly separable classes: A simple, iterative algorithm for
obtaining a solution weight vector for two linearly separable
training sets follows. For two training sets of augmented
pattern vectors belonging to pattern classes C1 and C2,
respectively, let w(1) represent the initial weight vector, which
may be chosen arbitrarily. Then, at the kth iterative step:

w(k+1) = \begin{cases} w(k) + c\,y(k) & \text{if } y(k) \in C_1 \text{ and } w^T(k)\,y(k) \le 0 \\ w(k) - c\,y(k) & \text{if } y(k) \in C_2 \text{ and } w^T(k)\,y(k) \ge 0 \\ w(k) & \text{otherwise} \end{cases}

where c is a positive correction increment.


This algorithm makes a change in w only if the pattern being
considered at the kth step in the training sequence is
misclassified. The correction increment c is assumed to be
positive and, for now, to be constant. This algorithm sometimes is referred to as the fixed increment correction rule.
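A minimal sketch of this training loop follows; the two training sets and the increment c = 1 are illustrative assumptions, and training stops when a full pass over the patterns produces no correction.

import numpy as np

# Minimal sketch of the fixed increment correction rule for two linearly
# separable classes. Training data and c are illustrative assumptions.

C1 = [np.array([0.0, 0.0]), np.array([0.0, 1.0])]   # samples of class C1
C2 = [np.array([1.0, 0.0]), np.array([1.0, 1.0])]   # samples of class C2

# Augment each pattern with a trailing 1 and tag it with its class.
Y = [(np.append(x, 1.0), +1) for x in C1] + [(np.append(x, 1.0), -1) for x in C2]

w = np.zeros(3)   # w(1), chosen arbitrarily
c = 1.0           # positive correction increment

changed = True
while changed:                        # iterate until no pattern is misclassified
    changed = False
    for y, label in Y:
        d = w @ y
        if label == +1 and d <= 0:    # C1 pattern misclassified: add c*y
            w += c * y
            changed = True
        elif label == -1 and d >= 0:  # C2 pattern misclassified: subtract c*y
            w -= c * y
            changed = True

print("solution weight vector:", w)   # separates the two training sets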
Nonseparable classes: In practice, linearly separable pattern
classes are the (rare) exception, rather than the rule.
In the following we describe the original delta rule, also known as the Widrow-Hoff, or least-mean-square (LMS), delta rule for training perceptrons; the method minimizes the error between the actual and desired response at any training step. Consider the function

J(w) = \frac{1}{2}\,(r - w^T y)^2
where r is the desired response (r = +1 if y belongs to C1 and r = -1 if y belongs to C2). The task is to find the w which minimizes J(w). We have the following iterative method:

w(k+1) = w(k) + \alpha\,[r(k) - w^T(k)\,y(k)]\,y(k), \quad w(1) \text{ arbitrary},

where \alpha is a positive learning increment.
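A minimal sketch of the LMS rule under the same augmented-vector conventions; the data, the increment alpha = 0.1, and the fixed number of passes are illustrative choices.

import numpy as np

# Minimal sketch of the Widrow-Hoff (LMS) delta rule. Data, alpha, and
# the number of training passes are illustrative assumptions.

# Augmented patterns with desired responses r = +1 (C1) or r = -1 (C2).
Y = [(np.array([0.0, 0.0, 1.0]), +1.0),
     (np.array([0.0, 1.0, 1.0]), +1.0),
     (np.array([1.0, 0.0, 1.0]), -1.0),
     (np.array([1.0, 1.0, 1.0]), -1.0)]

w = np.zeros(3)   # w(1), chosen arbitrarily
alpha = 0.1       # positive learning increment

for _ in range(200):                # fixed number of training passes
    for y, r in Y:
        err = r - w @ y             # error between desired and actual response
        w += alpha * err * y        # delta-rule correction

print("trained weights:", w)        # approaches the minimizer of J(w)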

Multilayer Perceptron
The most popular neural network model is the multilayer
perceptron (MLP), which is an extension of the single layer
perceptron proposed by Rosenblatt. Multilayer perceptrons, in general, are feedforward networks, having distinct input, output, and hidden layers. The architecture of a multilayer perceptron with an error backpropagation network is shown in the figure below.

[Figure: architecture of a multilayer perceptron with error backpropagation]
In an M-class problem where the patterns are N-dimensional, the input layer consists of N neurons and the output layer
consists of M neurons. There can be one or more middle or
hidden layer(s). We will consider here a single hidden layer
case, which is extendable to any number of hidden layers. Let the hidden layer consist of p neurons. The output from each
neuron in the input layer is fed to all the neurons in the hidden
layer. No computations are performed at the input layer
neurons. The hidden layer neurons sum up their inputs, pass them through the sigmoid nonlinearity, and fan out multiple connections to the output layer neurons.
In feedforward activation, neurons of the first hidden layer compute their activation and output values and pass these on to the next layer as inputs to the neurons in the output layer, which produce the network's actual response to the input presented to the neurons at the input layer. Once the activation has proceeded forward from the input to the output neurons, the network's response is compared to the desired output: for each set of labeled pattern samples belonging to a specific class, there is a desired output. The
actual response of the neurons at the output layer will deviate
from the desired output, which may result in an error at the
output layer. The error at the output layer is used to compute
the error at the hidden layer immediately preceding the output
layer and the process continues.
In view of the above, the net input to the j-th hidden neuron
may be expressed as

I_j^h = \sum_{i=1}^{N} w_{ij}^h\,x_i + \theta_j^h.

The output of the j-th hidden layer neuron is

O_j = f(I_j^h) = \frac{1}{1 + \exp(-I_j^h)},

where x_1, \ldots, x_N is the input pattern vector, the weights w_{ij}^h represent the weights between the hidden layer and the input layer, and \theta_j^h is the bias term associated with each neuron in the hidden layer. Identical equations, with a change of subscripts, hold for the output layer. These calculations are known as the forward pass. In the output layer, the desired or
target output is set as Tk and the actual output obtained from
the network is Ok. The error (Tk - Ok) between the desired
signal and the actual output signal is propagated backward
during the backward pass. The equations governing the
backward pass are used to correct the weights. Thus the
network learns the desired mapping function by backpropagating the error, and hence the name error backpropagation. The generalized delta rule originates from minimizing the sum of squared error between the actual
network output and the desired output responses (T_k) over all the patterns. The average error E is a function of the weights, as shown:
E(w_{jk}) = \frac{1}{2} \sum_{k=1}^{M} (T_k - O_k)^2

w_{jk}^{(new)} = w_{jk}^{(old)} + \eta\,\delta_j\,O_j
where \eta is the learning rate of the hidden layer neurons.

\delta_j = O_j\,(1 - O_j)\,(T_j - O_j),

where T_j is the ideal response.
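To tie the forward and backward passes together, here is a minimal sketch of a single-hidden-layer network trained with error backpropagation on a small hypothetical two-class problem (XOR). The layer sizes, learning rate, iteration count, and the matrix form of the updates are illustrative assumptions rather than the exact scalar notation above.

import numpy as np

# Minimal sketch of a single-hidden-layer MLP trained with error
# backpropagation (generalized delta rule). Problem, layer sizes,
# learning rate, and iteration count are illustrative assumptions.

rng = np.random.default_rng(1)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # N = 2 inputs
T = np.array([[0], [1], [1], [0]], dtype=float)              # M = 1 output (XOR)

N, p, M = 2, 4, 1            # input, hidden, and output layer sizes
Wh = rng.normal(size=(N, p)) # weights between input and hidden layer
th = np.zeros(p)             # hidden-layer bias terms (theta^h)
Wo = rng.normal(size=(p, M)) # weights between hidden and output layer
to = np.zeros(M)             # output-layer bias terms
eta = 0.5                    # learning rate

def sigmoid(I):
    return 1.0 / (1.0 + np.exp(-I))

for _ in range(10000):
    # Forward pass: I_j^h = sum_i w_ij^h x_i + theta_j^h, O_j = f(I_j^h)
    Oh = sigmoid(X @ Wh + th)
    Ok = sigmoid(Oh @ Wo + to)

    # Backward pass: deltas use the sigmoid derivative O(1 - O)
    dk = Ok * (1 - Ok) * (T - Ok)          # output-layer delta
    dj = Oh * (1 - Oh) * (dk @ Wo.T)       # hidden-layer delta

    # Generalized delta rule updates
    Wo += eta * Oh.T @ dk
    to += eta * dk.sum(axis=0)
    Wh += eta * X.T @ dj
    th += eta * dj.sum(axis=0)

print(np.round(sigmoid(sigmoid(X @ Wh + th) @ Wo + to), 2))
# Expected to approach [[0], [1], [1], [0]] after training.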
