
Digital Image Processing

Course 1


Bibliography
R.C. Gonzalez, R.E. Woods, Digital Image Processing, 3rd ed., Prentice Hall, 2008
R.C. Gonzalez, R.E. Woods, S.L. Eddins, Digital Image Processing Using MATLAB, Prentice Hall, 2003
http://www.imageprocessingplace.com/
M. Petrou, C. Petrou, Image Processing: The Fundamentals, 2nd ed., John Wiley, 2010
W. Burger, M.J. Burge, Digital Image Processing: An Algorithmic Introduction Using Java, Springer, 2008

Image Processing Toolbox (http://www.mathworks.com/products/image/)
C. Solomon, T. Breckon, Fundamentals of Digital Image Processing: A Practical Approach with Examples in Matlab, Wiley-Blackwell, 2011
W.K. Pratt, Digital Image Processing, Wiley-Interscience, 2007


Evaluation
MATLAB image processing test (50%)
Article/book presentations (50%)


Meet Lena!
The First Lady of the Internet


Lenna Soderberg (Sjööblom) and Jeff Seideman, photographed in May 1997 at the Imaging Science & Technology Conference


What is Digital Image Processing?

f : D ⊂ ℝ² → ℝ
f(x,y) = the intensity (gray level) of the image at the spatial point (x,y)
When x, y, and f(x,y) are finite, discrete quantities, we call f a digital image.
Digital Image Processing = processing digital images by means of a digital computer
A digital image is composed of a finite number of elements (location, intensity value):

(x_i, y_j, f_ij)

These elements are called picture elements, image elements, pels, or pixels.


Image processing is not limited to the visual band of the electromagnetic (EM) spectrum.
Image processing spans gamma rays to radio waves, as well as ultrasound, electron microscopy, and computer-generated images.
image processing vs. image analysis vs. computer vision?
Image processing = the discipline in which both the input and the output of a process are images
Computer vision = using computers to emulate human vision (AI): learning, making inferences, and taking actions based on visual inputs
Image analysis (image understanding) = segmentation, partitioning images into regions or objects
(the link between image processing and computer vision)


Distinction between image processing, image analysis, and computer vision:
low-level, mid-level, and high-level processes

Low-level processes: image preprocessing to reduce noise, contrast enhancement, image sharpening; both inputs and outputs are images
Mid-level processes: segmentation, partitioning images into regions or objects, description of the objects for computer processing, classification/recognition of individual objects; inputs are generally images, outputs are attributes extracted from the input image (e.g. edges, contours, identity of individual objects)
High-level processes: making sense of a set of recognized objects; performing the cognitive functions associated with vision


Digital Image Processing (Gonzalez + Woods) =
processes whose inputs and outputs are images +
processes that extract attributes from images, up to and including the recognition of individual objects
(low- and mid-level processes)
Example: automated analysis of text =
acquiring an image containing text,
preprocessing the image (enhancement, sharpening),
extracting (segmenting) the individual characters,
describing the characters in a form suitable for computer processing,
recognition of individual characters


The Origins of DIP

Newspaper industry: pictures were sent by submarine cable between London and New York
Before the Bartlane cable picture transmission system (early 1920s): about 1 week
With the Bartlane system: less than 3 hours
Specialized printing equipment coded pictures for cable transmission and reconstructed them at the receiving end
(1920s - 5 distinct levels of gray; 1929 - 15 levels)
This example is not DIP: no computer is involved
DIP is linked to and develops at the same rhythm as digital computers (data storage, display, and transmission)


A digital picture produced in 1921 from a coded tape by a telegraph printer with special type faces (McFarlane)

A digital picture made in 1922 from a tape punched after the signals had crossed the Atlantic twice (McFarlane)


1964: the Jet Propulsion Laboratory (Pasadena, California) processed pictures of the moon transmitted by Ranger 7 (corrected image distortions)

The first picture of the moon by a U.S. spacecraft. Ranger 7 took this image on July 31, 1964, about 17 minutes before impacting the lunar surface. (Courtesy of NASA)


1960-1970: image processing techniques were used in medical imaging, remote Earth resources observations, and astronomy
1970s: invention of CAT (computerized axial tomography)
http://www.virtualmedicalcentre.com/videos/cat-scans/793
CAT is a process in which a ring of detectors encircles an object (patient), and an X-ray source, concentric with the detector ring, rotates about the object. The X-rays pass through the patient and are collected at the opposite end by the detectors. As the source rotates, the procedure is repeated.
Tomography consists of algorithms that use the sensed data to construct an image that represents a slice through the object. Motion of the object in a direction perpendicular to the ring of detectors produces a set of slices, which can be assembled into a 3D representation of the inside of the object.


Geographers use DIP to study pollution patterns from aerial and satellite imagery.
Archeology: DIP allowed restoring blurred pictures that were the only records of rare artifacts lost or damaged after being photographed.
Physics: enhancing images of experiments (high-energy plasmas, electron microscopy).
Also: astronomy, biology, nuclear medicine, law enforcement, industry.
DIP is used in solving problems dealing with machine perception: extracting from an image information suitable for computer processing (statistical moments, Fourier transform coefficients, ...)
Examples: automatic character recognition, industrial machine vision for product assembly and inspection, military reconnaissance, automatic processing of fingerprints, machine processing of aerial and satellite imagery for weather prediction, the Internet.


Examples of Fields that Use DIP


Images can be classified according to their sources (visual, X-ray, ...)
Energy sources for images:
electromagnetic energy spectrum,
acoustic,
ultrasonic,
electronic,
computer-generated


Electromagnetic waves can be thought of as propagating sinusoidal waves of different wavelengths, or as a stream of massless particles, each moving in a wavelike pattern with the speed of light. Each massless particle contains a certain amount (bundle) of energy, and each bundle of energy is called a photon. If spectral bands are grouped according to energy per photon, we obtain the spectrum shown in the image above, ranging from gamma rays (highest energy) to radio waves (lowest energy).


Gamma-Ray Imaging

Nuclear medicine, astronomical observations
Nuclear medicine:
the approach is to inject a patient with a radioactive isotope that emits gamma rays as it decays.
Images are produced from the emissions collected by gamma-ray detectors.
Images of this sort are used to locate sites of bone pathology (infections, tumors).
PET (positron emission tomography): the patient is given a radioactive isotope that emits positrons as it decays.


Examples of gamma-ray imaging

Bone scan

PET image


X-Ray Imaging
Medical diagnostics, industry, astronomy
An X-ray tube is a vacuum tube with a cathode and an anode. The cathode is heated, causing free electrons to be released. The electrons flow at high speed to the positively charged anode. When the electrons strike a nucleus, energy is released in the form of X-ray radiation. The energy (penetrating power) of the X-rays is controlled by a voltage applied across the anode, and by a current applied to the filament in the cathode.
The intensity of the X-rays is modified by absorption as they pass through the patient, and the resulting energy falling on a film develops it, much in the same way that light develops photographic film.


Angiography = contrast-enhancement radiography
Angiograms = images of blood vessels
A catheter is inserted into an artery or vein in the groin. The catheter is threaded into the blood vessel and guided to the area to be studied. When it reaches that area, an X-ray contrast medium is injected through the catheter. This enhances the contrast of the blood vessels and enables the radiologist to see any irregularities or blockages.
X-rays are used in CAT (computerized axial tomography).
X-rays are used in industrial processes (examining circuit boards for flaws in manufacturing).
Industrial CAT scans are useful when the parts can be penetrated by X-rays.


Examples of X-ray imaging

Chest X-ray
Aortic angiogram

Head CT

Cygnus Loop

Circuit boards


Imaging in the Ultraviolet Band

Lithography, industrial inspection, microscopy, biological imaging, astronomical observations
Ultraviolet light is used in fluorescence microscopy.
Ultraviolet light is not visible to the human eye, but when a photon of ultraviolet radiation collides with an electron in an atom of a fluorescent material, it elevates the electron to a higher energy level. After that the electron relaxes to a lower level and emits light in the form of a lower-energy photon in the visible (red) light region.
Fluorescence = emission of light by a substance that has absorbed light or other electromagnetic radiation of a different wavelength
Fluorescence microscope = uses an excitation light to irradiate a prepared specimen and then separates the much weaker radiating fluorescent light from the brighter excitation light.


Imaging in the Visible and Infrared Bands

Light microscopy, astronomy, remote sensing, industry, law enforcement
The LANDSAT satellites obtained and transmitted images of the Earth from space for purposes of monitoring environmental conditions on the planet.
Weather observation and prediction are major applications of multispectral imaging from satellites.


Examples of light microscopy

Taxol (anticancer agent), magnified 250X
Nickel oxide thin film (600X)
Cholesterol (40X)
Surface of an audio CD (1750X)
Microprocessor (60X)
Organic superconductor (450X)


Automated visual inspection of manufactured goods

(a) circuit board controller
(b) packaged pills
(c) bottles
(d) air bubbles in a clear-plastic product
(e) cereal
(f) image of an intraocular implant


Imaging in the Microwave Band

The dominant application of imaging in the microwave band is radar.
Radar has the ability to collect data over virtually any region at any time, regardless of weather or ambient light conditions.
Some radar waves can penetrate clouds and, under certain conditions, can penetrate vegetation, ice, and dry sand.
Sometimes radar is the only way to explore inaccessible regions of the Earth's surface.
An imaging radar works like a flash camera: it provides its own illumination (microwave pulses) to light an area on the ground and take a snapshot image. Instead of a camera lens, a radar uses an antenna and a digital device to record the images. In a radar image one can see only the microwave energy that was reflected back toward the radar antenna.


Imaging in the Radio Band

Medicine, astronomy
MRI = Magnetic Resonance Imaging
This technique places the patient in a powerful magnet and passes short pulses of radio waves through his or her body. Each pulse causes a responding pulse of radio waves to be emitted by the patient's tissues. The location from which these signals originate and their strength are determined by a computer, which produces a 2D picture of a section of the patient.


MRI images of a human knee (left) and spine (right)


Images of the Crab Pulsar covering the electromagnetic spectrum

Gamma

X-ray

Optical

Infrared

Radio


Other Imaging Modalities

Acoustic imaging, electron microscopy, synthetic (computer-generated) imaging
Imaging using sound: geological exploration, industry, medicine
Mineral and oil exploration:
For image acquisition over land, one of the main approaches is to use a large truck and a large flat steel plate. The plate is pressed on the ground by the truck, and the truck is vibrated through a frequency spectrum up to 100 Hz. The strength and the speed of the returning sound waves are determined by the composition of the Earth below the surface. These are analysed by a computer, and images are generated from the resulting analysis.


Fundamental Steps in DIP


methods whose input and output are images
methods whose inputs are images but whose outputs are attributes extracted from those images


Outputs are images:

image acquisition
image filtering and enhancement
image restoration
color image processing
wavelets and multiresolution processing
compression
morphological processing


Outputs are attributes


morphological processing
segmentation
representation and description
object recognition


Image acquisition - may involve preprocessing such as scaling
Image enhancement:
manipulating an image so that the result is more suitable than the original for a specific operation
enhancement is problem oriented
there is no general theory of image enhancement
enhancement uses subjective methods for image improvement
enhancement is based on human subjective preferences regarding what is a "good" enhancement result


Image restoration:
improving the appearance of an image
restoration is objective - the techniques for restoration are based on mathematical or probabilistic models of image degradation

Color image processing:
fundamental concepts in color models
basic color processing in a digital domain

Wavelets and multiresolution processing:
representing images in various degrees of resolution


Compression:
reducing the storage required to save an image, or the bandwidth required to transmit it
Morphological processing:
tools for extracting image components that are useful in the representation and description of shape
a transition from processes that output images to processes that output image attributes


Segmentation:
partitioning an image into its constituent parts or objects
autonomous segmentation is one of the most difficult tasks of DIP
the more accurate the segmentation, the more likely recognition is to succeed
Representation and description (almost always follows segmentation):
segmentation produces either the boundary of a region or all the points in the region itself
converting the data produced by segmentation to a form suitable for computer processing


boundary representation: the focus is on external shape characteristics, such as corners or inflections
complete region: the focus is on internal properties, such as texture or skeletal shape
description is also called feature extraction: extracting attributes that result in some quantitative information of interest or are basic for differentiating one class of objects from another
Object recognition:
the process of assigning a label (e.g. "vehicle") to an object based on its descriptors
Knowledge database


Simplified diagram of a cross section of the human eye


Three membranes enclose the eye:
the cornea and sclera (outer cover)
the choroid
the retina
The cornea is a tough, transparent tissue that covers the anterior surface of the eye. Continuous with the cornea, the sclera is an opaque membrane that encloses the remainder of the optic globe.
The choroid lies directly below the sclera. This membrane contains a network of blood vessels (the major source of nutrition for the eye). The choroid is pigmented and helps reduce the amount of light entering the eye.
The choroid is divided (at its anterior extreme) into the ciliary body and the iris. The iris contracts and expands to control the amount of light.


The lens is made up of concentric layers of fibrous cells and is suspended by fibers that attach to the ciliary body (60-70% water, 6% fat, the rest protein). The lens is colored slightly yellow. The lens absorbs approximately 8% of the visible light (infrared and ultraviolet light are absorbed by proteins in the lens).
The innermost membrane is the retina. When the eye is properly focused, light from an object outside the eye is imaged on the retina.
Vision is possible because of the distribution of discrete light receptors on the surface of the retina: cones and rods (6-7 million cones, 75-150 million rods).
Cones: located in the central part of the retina (the fovea); sensitive to colors and to detail; each cone is linked to its own nerve.
cone vision = photopic or bright-light vision
Fovea = the place where the image of the object of interest falls


Rods: distributed over all the retina surface; several rods are connected to a single nerve; not specialized in detail vision;
serve to give a general, overall picture of the field of view
not involved in color vision
sensitive to low levels of illumination
Blind spot: region without receptors


Distribution of rods and cones in the retina


Image formation in the eye
Ordinary photographic camera: the lens has a fixed focal length; focusing at various distances is done by modifying the distance between the lens and the image plane (where the film or imaging chip is located).
Human eye: the distance between the lens and the retina (the imaging region) is fixed; the focal length needed to achieve proper focus is obtained by varying the shape of the lens (the fibers in the ciliary body accomplish this, flattening or thickening the lens for distant or near objects, respectively).
distance between lens and retina along the visual axis = 17 mm
range of focal lengths = 14 mm to 17 mm


Illustration of the Mach band effect

Perceived intensity is not a simple function of actual intensity.


All the inner squares have the same intensity, but they appear progressively darker as the
background becomes lighter


Optical illusions


Digital Image Processing

Course 2

- achromatic or monochromatic light - light that is void of color
- the attribute of such light is its intensity, or amount
- "gray level" is used to describe monochromatic intensity because it ranges from black, to grays, and finally to white
- chromatic light spans the electromagnetic energy spectrum from approximately 0.43 to 0.79 μm
- quantities that describe the quality of a chromatic light source:
o radiance
  the total amount of energy that flows from the light source, usually measured in watts (W)
o luminance
  measured in lumens (lm), gives a measure of the amount of energy an observer perceives from a light source


For example, light emitted from a source operating in the far


infrared region of the spectrum could have significant energy
(radiance), but an observer would hardly perceive it; its luminance
would be almost zero.
o brightness
a subjective descriptor of light perception that is
practically impossible to measure. It embodies the
achromatic notion of intensity and is one of the key
factors in describing color sensation.


f : D ⊂ ℝ² → ℝ ; the physical meaning of the values of f is determined by the source of the image.

For an image generated from a physical process, f(x,y) is proportional to the energy radiated by the physical source, so:

0 ≤ f(x,y) < ∞

f(x,y) is characterized by two components:
1. i(x,y) = the illumination component, the amount of source illumination incident on the scene being viewed;
2. r(x,y) = the reflectance component, the amount of illumination reflected by the objects in the scene;

f(x,y) = i(x,y) r(x,y) , 0 < i(x,y) < ∞ , 0 ≤ r(x,y) ≤ 1


r(x,y) = 0 : total absorption ; r(x,y) = 1 : total reflectance
i(x,y) is determined by the illumination source;
r(x,y) is determined by the characteristics of the imaged objects.

L_min ≤ l = f(x_0, y_0) ≤ L_max , where L_min = i_min r_min and L_max = i_max r_max
(L_min ≈ 10, L_max ≈ 1000 are typical indoor values without additional illumination)

The interval [L_min, L_max] is called the gray (or intensity) scale.

In practice: L_min = 0 and L_max = L-1, giving the scale [0, L-1], with l = 0 black and l = L-1 white.


Image Sampling and Quantization


converting a continuous image f to digital form

- digitizing (x,y) is called sampling


- digitizing f(x,y) is called quantization


Left: a continuous image projected onto a sensor array. Right: the result of image sampling and quantization.


Representing Digital Images

(x,y) , x = 0,1,...,M-1 , y = 0,1,...,N-1 : spatial variables or spatial coordinates

f(x,y) = \begin{bmatrix} f(0,0) & f(0,1) & \cdots & f(0,N-1) \\ f(1,0) & f(1,1) & \cdots & f(1,N-1) \\ \vdots & \vdots & & \vdots \\ f(M-1,0) & f(M-1,1) & \cdots & f(M-1,N-1) \end{bmatrix}

A = [a_{i,j}], an M×N matrix with a_{i,j} = f(x_i, y_j) = f(i,j)

a_{i,j} = image element, pixel

f(0,0) is at the upper left corner of the image.


M, N > 0 ; typically L = 2^k
a_{i,j} ∈ [0, L-1]
Dynamic range of an image = the ratio of the maximum measurable intensity to the minimum detectable intensity level in the system.
The upper limit is determined by saturation, the lower limit by noise.


Number of bits required to store a digitized image:

b = M × N × k ; for M = N , b = N² k

When an image can have 2^k intensity levels, the image is referred to as a "k-bit image";
256 discrete intensity values → an 8-bit image.
(For example, a 1024 × 1024 8-bit image needs 1024 × 1024 × 8 bits = 1,048,576 bytes = 1 MB.)


Spatial and Intensity Resolution

Spatial resolution = the smallest discernible detail in an image
Measures: line pairs per unit distance, dots (pixels) per unit distance
Image resolution = the largest number of discernible line pairs per unit distance (e.g. 100 line pairs per mm)
Dots per unit distance is the measure commonly used in printing and publishing; in the U.S. it is expressed in dots per inch (dpi)
(newspapers are printed at 75 dpi, glossy brochures at 175 dpi)
Intensity resolution = the smallest discernible change in intensity level

The number of intensity levels (L) is usually determined by hardware considerations: L = 2^k, most commonly with k = 8.
In practice, intensity resolution is given by k, the number of bits used to quantize intensity.


Fig. 1 Reducing spatial resolution: 1250 dpi (upper left), 300 dpi (upper right), 150 dpi (lower left), 72 dpi (lower right)

Reducing the number of gray levels: 256, 128, 64, 32

Reducing the number of gray levels: 16, 8, 4, 2


Image Interpolation
- used in zooming, shrinking, rotating, and geometric corrections
Shrinking and zooming = image resizing = image resampling methods
Interpolation is the process of using known data to estimate values at unknown locations.
Suppose we have an image of size 500 × 500 pixels that has to be enlarged 1.5 times, to 750 × 750 pixels. One way to do this is to create an imaginary 750 × 750 grid with the same pixel spacing as the original, and then shrink it so that it fits exactly over the original image. The pixel spacing in the 750 × 750 grid will be less than in the original image.
Problem: assignment of intensity levels in the new 750 × 750 grid.

Nearest neighbor interpolation: assign to every point in the new grid (750 × 750) the intensity of the closest pixel (nearest neighbor) in the old/original grid (500 × 500).


This technique has the tendency to produce undesirable effects, like severe distortion of straight edges.
Bilinear interpolation assigns to the new (x,y) location the intensity

v(x,y) = a x + b y + c x y + d

where the four coefficients are determined from the 4 equations in 4 unknowns that can be written using the 4 nearest neighbors of point (x,y).
Bilinear interpolation gives much better results than nearest neighbor interpolation, with a modest increase in computational effort.
Bicubic interpolation assigns to the new (x,y) location an intensity that involves the 16 nearest neighbors of the point:

v(x,y) = Σ_{i=0}^{3} Σ_{j=0}^{3} c_{i,j} x^i y^j

The coefficients c_{i,j} are obtained by solving the 16×16 linear system:

Σ_{i=0}^{3} Σ_{j=0}^{3} c_{i,j} x^i y^j = the intensity levels of the 16 nearest neighbors of (x,y)

Generally, bicubic interpolation does a better job of preserving fine detail than the bilinear technique. Bicubic interpolation is the standard used in commercial image editing programs, such as Adobe Photoshop and Corel Photopaint.
Figure 2(a) is the same as Fig. 1(d), which was obtained by reducing the resolution of the 1250 dpi image in Fig. 1(a) to 72 dpi (the size shrank from 3692 × 2812 to 213 × 162 pixels) and then zooming the reduced image back to its original size. To generate Fig. 1(d), nearest neighbor interpolation was used (both for shrinking and zooming).
Figures 2(b) and (c) were generated using the same steps but with bilinear and bicubic interpolation, respectively. Figures 2(d)+(e)+(f) were obtained by reducing the resolution from 1250 dpi to 150 dpi (instead of 72 dpi).
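To make the resampling step concrete, here is a minimal NumPy sketch (not from the course text; the function and variable names are ours) of zooming with bilinear interpolation. The nearest-neighbor variant would simply take the closest pixel instead of the weighted average:

```python
# Sketch: bilinear interpolation for image zooming (illustrative only).
import numpy as np

def zoom_bilinear(f, new_h, new_w):
    """Resample grayscale image f to (new_h, new_w) using bilinear interpolation."""
    h, w = f.shape
    out = np.zeros((new_h, new_w), dtype=np.float64)
    for x in range(new_h):          # rows of the new grid
        for y in range(new_w):      # columns of the new grid
            # map the new grid point back onto the original image
            v = x * (h - 1) / (new_h - 1)
            u = y * (w - 1) / (new_w - 1)
            v0, u0 = int(v), int(u)
            v1, u1 = min(v0 + 1, h - 1), min(u0 + 1, w - 1)
            dv, du = v - v0, u - u0
            # weighted average of the 4 nearest neighbors
            out[x, y] = (f[v0, u0] * (1 - dv) * (1 - du)
                         + f[v0, u1] * (1 - dv) * du
                         + f[v1, u0] * dv * (1 - du)
                         + f[v1, u1] * dv * du)
    return out

f = np.arange(9, dtype=np.float64).reshape(3, 3)
print(zoom_bilinear(f, 5, 5))
```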


Fig. 2 Interpolation examples for zooming and shrinking (nearest neighbor, bilinear, bicubic)


Neighbors of a Pixel
A pixel p at coordinates (x,y) has 4 horizontal and vertical neighbors:
horizontal: (x-1, y), (x+1, y) ; vertical: (x, y-1), (x, y+1)
This set of pixels, called the 4-neighbors of p, is denoted by N_4(p).
The 4 diagonal neighbors of p have coordinates:
(x-1, y-1), (x-1, y+1), (x+1, y-1), (x+1, y+1)
and are denoted by N_D(p).
The horizontal, vertical, and diagonal neighbors together are called the 8-neighbors of p, denoted N_8(p).
If (x,y) is on the border of the image, some of the neighbor locations in N_D(p) and N_8(p) fall outside the image.


Adjacency, Connectivity, Regions, Boundaries

Denote by V the set of intensity levels used to define adjacency.
- in a binary image, V ⊆ {0,1} (V = {0} or V = {1})
- in a gray-scale image with 256 possible gray levels, V can be any subset of {0,...,255}
We consider 3 types of adjacency:
(a) 4-adjacency: two pixels p and q with values from V are 4-adjacent if q ∈ N_4(p)
(b) 8-adjacency: two pixels p and q with values from V are 8-adjacent if q ∈ N_8(p)
(c) m-adjacency (mixed adjacency): two pixels p and q with values from V are m-adjacent if:
    q ∈ N_4(p), or
    q ∈ N_D(p) and the set N_4(p) ∩ N_4(q) has no pixels whose values are from V.

Mixed adjacency is a modification of 8-adjacency. It is introduced to eliminate the ambiguities that often arise when 8-adjacency is used. Consider the example:

V = {1}, binary image. Consider the 3×3 arrangement of pixels:

0 1 1
0 1 0
0 0 1

The three pixels at the top (first line) of the arrangement show multiple (ambiguous) 8-adjacency paths between the center pixel and the upper-right pixel, indicated by dashed lines in the original figure. This ambiguity is removed by using m-adjacency, which leaves a single path.


A (digital) path (or curve) from pixel p with coordinates (x,y) to pixel q with coordinates (s,t) is a sequence of distinct pixels with coordinates:

(x_0, y_0) = (x,y), (x_1, y_1), ..., (x_n, y_n) = (s,t)

where (x_{i-1}, y_{i-1}) and (x_i, y_i) are adjacent, i = 1, 2, ..., n.
The length of the path is n. If (x_0, y_0) = (x_n, y_n), the path is closed.
Depending on the type of adjacency considered, the paths are 4-, 8-, or m-paths.
Let S denote a subset of pixels in an image. Two pixels p and q are said to be connected in S if there exists a path between them consisting only of pixels from S.
S is a connected set if there is a path in S between any 2 pixels in S.
Let R be a subset of pixels in an image. R is a region of the image if R is a connected set.
Two regions R_1 and R_2 are said to be adjacent if R_1 ∪ R_2 forms a connected set. Regions that are not adjacent are said to be disjoint. When referring to regions, only 4- and 8-adjacency are considered.


Suppose that an image contains K disjoint regions, R_k, k = 1,...,K, none of which touches the image border.
Let R_u = ∪_{k=1}^{K} R_k , and let (R_u)^c denote the complement of R_u.
We call all the points in R_u the foreground of the image, and the points in (R_u)^c the background of the image.

The boundary (border or contour) of a region R is the set of points of R that are adjacent to points in the complement of R, (R)^c; that is, the set of pixels in the region that have at least one background neighbor. This definition is referred to as the inner border, to distinguish it from the notion of outer border, which is the corresponding border in the background.


Distance measures
For pixels p, q, and z, with coordinates (x,y), (s,t), and (v,w) respectively, D is a distance function or metric if:
(a) D(p,q) ≥ 0 , and D(p,q) = 0 iff p = q
(b) D(p,q) = D(q,p)
(c) D(p,z) ≤ D(p,q) + D(q,z)
The Euclidean distance between p and q is defined as:

D_e(p,q) = [(x-s)² + (y-t)²]^{1/2}

The pixels q for which D_e(p,q) ≤ r are the points contained in a disk of radius r centered at (x,y).


The D_4 distance (also called city-block distance) between p and q is defined as:

D_4(p,q) = |x-s| + |y-t|

The pixels q for which D_4(p,q) ≤ r form a diamond centered at (x,y). Example, D_4 ≤ 2:

        2
      2 1 2
    2 1 0 1 2
      2 1 2
        2

The pixels with D_4 = 1 are the 4-neighbors of (x,y).

The D_8 distance (also called chessboard distance) between p and q is defined as:

D_8(p,q) = max{|x-s| , |y-t|}

The pixels q for which D_8(p,q) ≤ r form a square centered at (x,y).


Example, D_8 ≤ 2:

    2 2 2 2 2
    2 1 1 1 2
    2 1 0 1 2
    2 1 1 1 2
    2 2 2 2 2

The pixels with D_8 = 1 are the 8-neighbors of (x,y).

The D_4 and D_8 distances are independent of any paths that might exist between p and q, because these distances involve only the coordinates of the points. If we consider m-adjacency, the distance D_m is defined as:
D_m(p,q) = the length of the shortest m-path between p and q
D_m depends on the values of the pixels along the path, as well as the values of their neighbors. Consider the following example:


Arrangement of pixels (V = {1}), where p, p2, p4 have value 1 and p1, p3 ∈ {0,1}:

      p3  p4
  p1  p2
  p

If p1 = p3 = 0, then D_m(p, p4) = 2 (path p, p2, p4).
If p1 = 1, then p2 and p are no longer m-adjacent, so D_m(p, p4) = 3 (path p, p1, p2, p4).
If p1 = 0 and p3 = 1, then D_m(p, p4) = 3.
If p1 = p3 = 1, then D_m(p, p4) = 4 (path p, p1, p2, p3, p4).
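The D_4 and D_8 definitions translate directly into code; a small sketch (helper names are ours):

```python
# Sketch: city-block (D4) and chessboard (D8) distances between two pixels.
def d4(p, q):
    """City-block distance |x-s| + |y-t|."""
    (x, y), (s, t) = p, q
    return abs(x - s) + abs(y - t)

def d8(p, q):
    """Chessboard distance max(|x-s|, |y-t|)."""
    (x, y), (s, t) = p, q
    return max(abs(x - s), abs(y - t))

print(d4((0, 0), (2, 1)))  # 3
print(d8((0, 0), (2, 1)))  # 2
```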


Array versus Matrix Operations

An array operation involving one or more images is carried out on a pixel-by-pixel basis. Consider two 2×2 images:

A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} , B = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}

Array product:

A .* B = \begin{bmatrix} a_{11}b_{11} & a_{12}b_{12} \\ a_{21}b_{21} & a_{22}b_{22} \end{bmatrix}

Matrix product:

A B = \begin{bmatrix} a_{11}b_{11}+a_{12}b_{21} & a_{11}b_{12}+a_{12}b_{22} \\ a_{21}b_{11}+a_{22}b_{21} & a_{21}b_{12}+a_{22}b_{22} \end{bmatrix}

We assume array operations unless stated otherwise!
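In NumPy the two products look like this (a minimal illustration with arbitrary values):

```python
# Sketch: array (element-wise) vs. matrix product in NumPy.
import numpy as np
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print(A * B)   # array product:  [[ 5 12] [21 32]]
print(A @ B)   # matrix product: [[19 22] [43 50]]
```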


Linear versus Nonlinear Operations

One of the most important classifications of an image-processing method is whether it is linear or nonlinear. Consider an operator H with

H[f(x,y)] = g(x,y)

H is said to be a linear operator if:

H[a f_1(x,y) + b f_2(x,y)] = a H[f_1(x,y)] + b H[f_2(x,y)]

for any scalars a, b and any images f_1, f_2.

Example of a nonlinear operator:

H[f] = max{f(x,y)} , the maximum value of the pixels of image f

Consider:

f_1 = [0 2; 2 3] , f_2 = [6 5; 4 7] , a = 1 , b = -1


max{a f_1 + b f_2} = max{ [0 2; 2 3] - [6 5; 4 7] } = max{ [-6 -3; -2 -4] } = -2

a max{f_1} + b max{f_2} = max{ [0 2; 2 3] } - max{ [6 5; 4 7] } = 3 - 7 = -4

Since -2 ≠ -4, the linearity condition fails, so max is a nonlinear operator.
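The same check done numerically (an illustrative sketch):

```python
# Sketch: verifying numerically that max is not linear, with f1, f2 above.
import numpy as np

f1 = np.array([[0, 2], [2, 3]])
f2 = np.array([[6, 5], [4, 7]])
a, b = 1, -1
print((a * f1 + b * f2).max())        # -2
print(a * f1.max() + b * f2.max())    # -4 -> not equal, so max is nonlinear
```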

Arithmetic Operations in Image Processing

Let g(x,y) denote a corrupted image formed by the addition of noise η(x,y) to a noiseless image f(x,y):

g(x,y) = f(x,y) + η(x,y)

where the noise η(x,y) is uncorrelated and has zero average value.


For a random variable z with mean m, E[(z-m)²] is the variance (E(·) is the expected value). The covariance of two random variables z_1 and z_2 is defined as E[(z_1-m_1)(z_2-m_2)]. The two random variables are uncorrelated when their covariance is 0.
Objective: reduce the noise by averaging a set of K noisy images g_i(x,y) (a technique frequently used in image enhancement):

ḡ(x,y) = (1/K) Σ_{i=1}^{K} g_i(x,y)

If the noise satisfies the properties stated above, we have:

E[ḡ(x,y)] = f(x,y) , σ²_ḡ(x,y) = (1/K) σ²_η(x,y)

where E[ḡ(x,y)] is the expected value of ḡ, and σ²_ḡ(x,y) and σ²_η(x,y) are the variances of ḡ and η, respectively. The standard deviation (square root of the variance) at any point in the average image is:


σ_ḡ(x,y) = (1/√K) σ_η(x,y)

As K increases, the variability (as measured by the variance or the standard deviation) of the pixel values at each location (x,y) decreases. Because E[ḡ(x,y)] = f(x,y), this means that ḡ(x,y) approaches f(x,y) as the number of noisy images used in the averaging process increases.
An important application of image averaging is in the field of astronomy, where imaging under very low light levels frequently causes sensor noise to render single images virtually useless for analysis. Figure 3 (top left) shows an 8-bit image in which corruption was simulated by adding to it Gaussian noise with zero mean and a standard deviation of 64 intensity levels. The remaining panels of Figure 3 show the result of averaging 5, 10, 20, 50 and 100 noisy images, respectively.
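A small simulation of the averaging effect (not from the course text; a synthetic constant image stands in for the scene, with noise parameters chosen to mirror the example):

```python
# Sketch: noise reduction by image averaging, following the formulas above.
import numpy as np

rng = np.random.default_rng(0)
f = np.full((64, 64), 100.0)          # noiseless image f(x,y)
K = 100
noisy = [f + rng.normal(0.0, 64.0, f.shape) for _ in range(K)]  # g_i = f + eta
g_bar = sum(noisy) / K                # averaged image

print(np.std(noisy[0] - f))   # ~64, noise std of a single image
print(np.std(g_bar - f))      # ~64/sqrt(100) = 6.4, as predicted
```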


Fig. 3 Image of Galaxy Pair NGC 3314 corrupted by additive Gaussian noise (top left); results of averaging 5, 10, 20, 50, and 100 noisy images


A frequent application of image subtraction is in the enhancement of differences between images.

Fig. 4 (a) Infrared image of the Washington, D.C. area; (b) image obtained from (a) by setting to zero the least significant bit of each pixel; (c) the difference between the two images

Figure 4(b) was obtained by setting to zero the least-significant bit of every pixel in Figure 4(a). The two images seem almost the same. Figure 4(c) is the difference between


images (a) and (b). Black (0) values in Figure 4(c) indicate locations where there is no difference between images (a) and (b).
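A sketch of the LSB experiment on a synthetic array (illustrative values only):

```python
# Sketch: enhancing differences by subtraction, mirroring the LSB example.
import numpy as np

rng = np.random.default_rng(1)
img = rng.integers(0, 256, (4, 4), dtype=np.uint8)
lsb_zeroed = img & 0xFE                    # set least-significant bit to 0
diff = img.astype(np.int16) - lsb_zeroed   # difference image: 0 or 1 everywhere
print(diff)
```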

Mask mode radiography:

g(x,y) = f(x,y) - h(x,y)

h(x,y), the mask, is an X-ray image of a region of a patient's body, captured by an intensified TV camera (instead of traditional X-ray film) located opposite an X-ray source. The procedure consists of injecting an X-ray contrast medium into the patient's bloodstream, taking a series of images, called live images (denoted f(x,y)), of the same anatomical region as h(x,y), and subtracting the mask from the series of incoming live images after injection of the contrast medium.
In g(x,y) we can find the differences between h and f, shown as enhanced detail.


With images being captured at TV rates, we obtain a movie showing how the contrast medium propagates through the various arteries in the area being observed.

Fig. 5 Angiography subtraction example: (a) mask image; (b) live image; (c) difference between (a) and (b); (d) image (c) enhanced


An important application of image multiplication (and division) is shading correction.
Suppose that an imaging sensor produces images in the form:

g(x,y) = f(x,y) h(x,y)

where f(x,y) is the "perfect" image and h(x,y) is the shading function.
When the shading function is known:

f(x,y) = g(x,y) / h(x,y)

When h(x,y) is unknown but we have access to the imaging system, we can obtain an approximation to the shading function by imaging a target of constant intensity. When the sensor is not available, the shading pattern can often be estimated from the image itself.
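A minimal sketch of the correction, with a fabricated smooth shading function standing in for a measured one:

```python
# Sketch: shading correction by division, under the model g = f * h.
import numpy as np

x = np.linspace(0.5, 1.0, 128)
h = np.outer(x, x)                    # synthetic shading function h(x,y)
f = np.full((128, 128), 200.0)        # "perfect" image
g = f * h                             # shaded image produced by the sensor
f_corrected = g / h                   # shading-corrected image
print(np.allclose(f_corrected, f))    # True
```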


Fig. 6 Shading correction: (a) shaded image of a tungsten filament, magnified 130X; (b) shading pattern; (c) corrected image


Another use of image multiplication is in masking, also called region of interest (ROI), operations. The process consists of multiplying a given image by a mask image that has 1s (white) in the ROI and 0s elsewhere. There can be more than one ROI in the mask image, and the shape of the ROI can be arbitrary, although usually it is rectangular.

Fig. 7 (a) digital dental X-ray image; (b) ROI mask for teeth with fillings; (c) product of (a) and (b)
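A sketch of an ROI mask product (the ROI rectangle coordinates below are arbitrary):

```python
# Sketch: region-of-interest masking by multiplication.
import numpy as np

rng = np.random.default_rng(2)
img = rng.integers(0, 256, (100, 100), dtype=np.uint8)
mask = np.zeros_like(img)
mask[20:60, 30:80] = 1                # 1s inside the ROI, 0s elsewhere
roi_only = img * mask                 # keeps the ROI, zeroes the rest
print(roi_only[0, 0], roi_only[25, 40] == img[25, 40])
```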


In practice, most images are displayed using 8 bits, so the image values are expected to be in the range [0,255].
For TIFF and JPEG images, conversion to this range is automatic; the conversion depends on the system used.
The difference of two images can produce an image with values in the range [-255, 255].
The addition of two images can produce values in the range [0, 510].
Many software packages simply set the negative values to 0 and set to 255 all values greater than 255.
A more appropriate procedure: compute

f_m = f - min(f)

which creates an image whose minimum value is 0, and then perform the scaling

f_s = K f_m / max(f_m)

which yields values in the full range [0, K] (K = 255 for 8-bit images).
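The two-step scaling as a small helper (sketch; names are ours):

```python
# Sketch: scaling an arbitrary-range result back to [0, 255].
import numpy as np

def scale_to_range(f, K=255):
    """Shift minimum to 0, then scale maximum to K."""
    f_m = f - f.min()
    return (K * f_m / f_m.max()).astype(np.uint8)

diff = np.array([[-255, 0], [128, 255]], dtype=np.int16)  # e.g. a difference image
print(scale_to_range(diff))
```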


Spatial Operations
- are performed directly on the pixels of a given image
There are three categories of spatial operations:
- single-pixel operations
- neighborhood operations
- geometric spatial transformations

Single-pixel operations
- change the values of intensity for the individual pixels:

s = T(z)

where z is the intensity of a pixel in the original image and s is the intensity of the corresponding pixel in the processed image. The figure below shows the transformation used to obtain the negative of an 8-bit image.


Intensity transformation function for the complement of an 8-bit image; original digital mammogram; negative image of the mammogram


Neighborhood operations
Let S_xy denote the set of coordinates of a neighborhood centered on an arbitrary point (x,y) in an image f. Neighborhood processing generates a new intensity level at point (x,y) based on the values of the intensities of the points in S_xy. For example, if S_xy is a rectangular neighborhood of size m × n centered at (x,y), we can assign the new value of intensity by computing the average value of the pixels in S_xy:

g(x,y) = (1/(m n)) Σ_{(r,c) ∈ S_xy} f(r,c)

The net effect is to perform local blurring of the original image. This type of process is used, for example, to eliminate small details and thus render "blobs" corresponding to the largest regions of an image.
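A direct, loop-based sketch of this neighborhood average (borders left untouched for simplicity; names are ours):

```python
# Sketch: local averaging over an m x n neighborhood, as in the formula above.
import numpy as np

def local_mean(f, m=3, n=3):
    out = f.astype(np.float64)        # astype makes a copy; borders keep f's values
    hm, hn = m // 2, n // 2
    for x in range(hm, f.shape[0] - hm):
        for y in range(hn, f.shape[1] - hn):
            out[x, y] = f[x-hm:x+hm+1, y-hn:y+hn+1].mean()
    return out

rng = np.random.default_rng(3)
img = rng.integers(0, 256, (32, 32)).astype(np.float64)
print(local_mean(img)[5, 5].round(1))
```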


Aortic angiogram (left) and the result of applying an averaging filter with m = n = 41 (right)

Digital Image Processing

Course 3



Geometric spatial transformations and image registration

- modify the spatial relationship between pixels in an image
- these transformations are often called rubber-sheet transformations (analogous to printing an image on a sheet of rubber and then stretching the sheet according to a predefined set of rules)
A geometric transformation consists of 2 basic operations:
(a) a spatial transformation of coordinates
(b) intensity interpolation that assigns intensity values to the spatially transformed pixels
The coordinate transformation:

(x,y) = T[(v,w)]

(v,w) = pixel coordinates in the original image
(x,y) = pixel coordinates in the transformed image


Example: T[(v,w)] = (v/2, w/2) shrinks the original image to half its size in both spatial directions.

Affine transform:

[x y 1] = [v w 1] T = [v w 1] \begin{bmatrix} t_{11} & t_{12} & 0 \\ t_{21} & t_{22} & 0 \\ t_{31} & t_{32} & 1 \end{bmatrix}

x = t_{11} v + t_{21} w + t_{31}
y = t_{12} v + t_{22} w + t_{32}          (AT)

This transform can scale, rotate, translate, or shear a set of coordinate points, depending on the elements of the matrix T. If we want to resize an image, rotate it, and move the result to some location, we simply form a 3×3 matrix equal to the matrix product of the scaling, rotation, and translation matrices from Table 1.
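As an illustration (not from the course text; Table 1 is not reproduced here, so the matrices below follow the standard forms under the row-vector convention of (AT), and all names are ours):

```python
# Sketch: composing scaling, rotation, and translation into one affine matrix,
# row-vector convention [x y 1] = [v w 1] T.
import numpy as np

def scaling(cx, cy):
    return np.array([[cx, 0, 0], [0, cy, 0], [0, 0, 1]], dtype=float)

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, s, 0], [-s, c, 0], [0, 0, 1]], dtype=float)

def translation(tx, ty):
    return np.array([[1, 0, 0], [0, 1, 0], [tx, ty, 1]], dtype=float)

T = scaling(2, 2) @ rotation(np.pi / 2) @ translation(10, 0)
v, w = 3.0, 4.0
x, y, _ = np.array([v, w, 1.0]) @ T   # transformed coordinates
print(x, y)                           # ~ (2.0, 6.0)
```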


Affine transformations


The preceding transformations relocate pixels on an image to new locations. To complete the process, we have to assign intensity values to those locations. This task is done by using intensity interpolation (nearest neighbor, bilinear, or bicubic interpolation).
In practice, we can use equation (AT) in two basic ways:

Forward mapping: scanning the pixels of the input image and, at each location (v,w), computing the spatial location (x,y) of the corresponding pixel in the output image using (AT) directly.
Problems:
- intensity assignment when 2 or more pixels in the original image are transformed to the same location in the output image,
- some output locations may have no correspondent in the original image (no intensity assignment)


Inverse mapping: scans the output pixel locations and, at each location (x,y), computes the corresponding location in the input image:

(v,w) = T⁻¹[(x,y)]

It then interpolates among the nearest input pixels to determine the intensity of the output pixel value.
Inverse mappings are more efficient to implement than forward mappings and are used in numerous commercial implementations of spatial transformations (MATLAB, for example).
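A sketch of inverse mapping with nearest-neighbor interpolation (function and variable names are ours):

```python
# Sketch: inverse-mapping image warp. For each output pixel (x,y) we go
# back through T^{-1} and sample the input image.
import numpy as np

def warp_inverse(f, T):
    """Apply affine matrix T (row-vector convention) to image f."""
    T_inv = np.linalg.inv(T)
    out = np.zeros_like(f)
    for x in range(out.shape[0]):
        for y in range(out.shape[1]):
            v, w, _ = np.array([x, y, 1.0]) @ T_inv
            vi, wi = int(round(v)), int(round(w))   # nearest neighbor
            if 0 <= vi < f.shape[0] and 0 <= wi < f.shape[1]:
                out[x, y] = f[vi, wi]
    return out

img = np.arange(16.0).reshape(4, 4)
shift = np.array([[1, 0, 0], [0, 1, 0], [1, 1, 1]], dtype=float)  # translate by (1,1)
print(warp_inverse(img, shift))
```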


Image registration = aligning two or more images of the same scene

In image registration we have available the input and output images, but the specific transformation that produced the output image from the input is generally unknown. The problem is to estimate the transformation function and then use it to register the two images.
- it may be of interest to align (register) two or more images taken at approximately the same time but with different imaging systems (an MRI scanner and a PET scanner, for example)
- or to align images of a given location, taken by the same instrument at different moments in time (satellite images)
Solving the problem: using tie points (also called control points), which are corresponding points whose locations are known precisely in the input and reference images.


How to select tie points?
- interactively selecting them
- using algorithms that try to detect these points automatically
- some imaging systems have physical artifacts (small metallic objects) embedded in the imaging sensors. These objects produce a set of known points (called reseau marks) directly on all images captured by the system, which can be used as guides for establishing tie points.
The problem of estimating the transformation is one of modeling. Suppose we have a set of 4 tie points both on the input image and on the reference image. A simple model based on a bilinear approximation is given by:

x = c_1 v + c_2 w + c_3 v w + c_4
y = c_5 v + c_6 w + c_7 v w + c_8

where (v,w) and (x,y) are the coordinates of the tie points (we get an 8×8 linear system for {c_i}).
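Since the x and y coefficient sets decouple, the 8×8 system splits into two 4×4 solves; a sketch with made-up tie-point values:

```python
# Sketch: solving for the bilinear registration coefficients c1..c8
# from 4 tie-point pairs (v,w) -> (x,y).
import numpy as np

src = np.array([[0, 0], [0, 100], [100, 0], [100, 100]], dtype=float)  # (v,w)
dst = np.array([[2, 3], [1, 105], [103, 2], [104, 107]], dtype=float)  # (x,y)

# each pair gives x = c1 v + c2 w + c3 v w + c4 (and similarly for y)
A = np.column_stack([src[:, 0], src[:, 1], src[:, 0] * src[:, 1], np.ones(4)])
cx = np.linalg.solve(A, dst[:, 0])   # c1..c4
cy = np.linalg.solve(A, dst[:, 1])   # c5..c8

v, w = 50.0, 50.0
print(A @ cx)                        # reproduces the x tie-point coordinates
print(cx @ [v, w, v * w, 1.0], cy @ [v, w, v * w, 1.0])  # mapped midpoint
```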


When 4 tie points are insufficient to obtain satisfactory registration, an approach used frequently is to select a larger number of tie points, use this new set to subdivide the image into rectangular regions marked by groups of 4 tie points, and apply the transformation model described above on each subregion marked by its 4 tie points.
The number of tie points and the sophistication of the model required to solve the registration problem depend on the severity of the geometric distortion.


(a) reference image; (b) geometrically distorted image; (c) registered image; (d) difference between (a) and (c)


Probabilistic Methods

Let z_i, i = 0,1,...,L-1, be the values of all possible intensities in an M × N digital image, and let p(z_k) denote the probability that intensity level z_k occurs in the given image:

p(z_k) = n_k / (M N)

where n_k is the number of times that intensity z_k occurs in the image (M N is the total number of pixels in the image).

Σ_{k=0}^{L-1} p(z_k) = 1

The mean (average) intensity of an image is given by:

m = Σ_{k=0}^{L-1} z_k p(z_k)


The variance of the intensities is:

σ² = Σ_{k=0}^{L-1} (z_k - m)² p(z_k)

The variance is a measure of the spread of the values of z about the mean, so it is a measure of image contrast. Usually, for measuring image contrast the standard deviation (σ) is used.
The n-th moment of a random variable z about the mean is defined as:

μ_n(z) = Σ_{k=0}^{L-1} (z_k - m)^n p(z_k)

(μ_0(z) = 1 , μ_1(z) = 0 , μ_2(z) = σ²)

μ_3(z) > 0 : the intensities are biased to values higher than the mean;
(μ_3(z) < 0 : the intensities are biased to values lower than the mean);


μ_3(z) ≈ 0 : the intensities are distributed approximately equally on both sides of the mean.

Fig. 1 (a) low contrast; (b) medium contrast; (c) high contrast

Figure 1(a): standard deviation 14.3 (variance = 204.5)
Figure 1(b): standard deviation 31.6 (variance = 998.6)
Figure 1(c): standard deviation 49.2 (variance = 2420.6)
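A sketch computing these statistics from the normalized histogram of a random test image:

```python
# Sketch: mean, variance, and third moment from the normalized histogram.
import numpy as np

rng = np.random.default_rng(4)
img = rng.integers(0, 256, (64, 64))
L = 256
z = np.arange(L, dtype=np.float64)
n_k = np.bincount(img.ravel(), minlength=L)
p = n_k / img.size                    # p(z_k) = n_k / MN

m = np.sum(z * p)                     # mean intensity
var = np.sum((z - m) ** 2 * p)        # variance (mu_2)
mu3 = np.sum((z - m) ** 3 * p)        # third moment (skew direction)
print(m, np.sqrt(var), mu3)
```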


Intensity Transformations and Spatial Filtering

g(x,y) = T[f(x,y)]

f(x,y) = input image, g(x,y) = output image, T = an operator on f defined over a neighborhood of (x,y).
- the neighborhood of the point (x,y), S_xy, usually is rectangular, centered on (x,y), and much smaller in size than the image


- in spatial filtering, the operator T (the neighborhood and the operation applied on it) is called a spatial filter (spatial mask, kernel, template, or window)

When S_xy = {(x,y)} (a 1×1 neighborhood), T becomes an intensity (gray-level or mapping) transformation function:

s = T(r)

where s and r denote, respectively, the intensity of g and f at (x,y).

Fig. 2 Intensity transformation functions: left - contrast stretching; right - thresholding function

Figure 2 (left): T produces an output image of higher contrast than the original, by darkening the intensity levels below k and brightening the levels above k. This technique is called contrast stretching.


Figure 2 (right): T produces a binary output image. A mapping of this form is called a thresholding function.

Some Basic Intensity Transformation Functions

Image Negatives
The negative of an image with intensity levels in [0, L-1] is obtained using the function:

s = T(r) = L - 1 - r

- the equivalent of a photographic negative
- a technique suited for enhancing white or gray detail embedded in dark regions of an image
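For an 8-bit image (L = 256) the negative is a one-liner (sketch):

```python
# Sketch: image negative s = L - 1 - r for an 8-bit image.
import numpy as np

img = np.array([[0, 64], [128, 255]], dtype=np.uint8)
negative = 255 - img
print(negative)    # [[255 191] [127   0]]
```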


Fig. 3
Left original digital mammogram
Right negative transformed image


Log Transformations

s = T(r) = c log(1 + r) , c a positive constant, r ≥ 0

Some basic intensity transformation functions


This transformation maps a narrow range of low intensity values in the input into a wider range of output levels. An operator of this type is used to expand the values of dark pixels in an image while compressing the higher-level values. The opposite is true of the inverse log transformation. The log function compresses the dynamic range of images with large variations in pixel values.

Fig. 4 (a) Fourier spectrum; (b) log transformation applied to (a), c = 1

Figure 4(a): intensity values in the range 0 to 1.5 × 10⁶
Figure 4(b): log transformation of Figure 4(a) with c = 1; range 0 to 6.2
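A sketch with synthetic values standing in for the Fourier spectrum; a base-10 log reproduces the 0 to ~6.2 range quoted above:

```python
# Sketch: log transformation with rescaling to [0, 255] for display.
import numpy as np

r = np.array([[0.0, 10.0], [1e3, 1.5e6]])
s = np.log10(1 + r)                   # s = c log(1 + r), c = 1, base-10 log
print(s.round(2))                     # max is ~6.18, cf. the 0-6.2 range above
s_8bit = (255 * s / s.max()).astype(np.uint8)
print(s_8bit)
```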


Power-Law (Gamma) Transformations

s = T(r) = c r^γ , with c, γ positive constants (sometimes written s = c (r + ε)^γ to account for a measurement offset)

Plots of the gamma transformation for different values of γ (c = 1)


Power-law curves with γ < 1 map a narrow range of dark input values into a wider range of output values, with the opposite being true for higher input values. The curves with γ > 1 have the opposite effect of those generated with γ < 1.
c = γ = 1 : the identity transformation.
A variety of devices used for image capture, printing, and display respond according to a power law. The process used to correct these power-law response phenomena is called gamma correction.
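A gamma-correction sketch on a normalized 8-bit image (the γ value is arbitrary):

```python
# Sketch: gamma transformation s = c * r**gamma with c = 1.
import numpy as np

img = np.array([[0, 64], [128, 255]], dtype=np.uint8)
r = img / 255.0                       # normalize to [0,1]
gamma = 0.5                           # gamma < 1 brightens dark regions
s = np.power(r, gamma)
print((255 * s).astype(np.uint8))     # [[  0 127] [180 255]]
```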


(a) aerial image; (b)-(d) results of applying the gamma transformation with c = 1 and γ = 3.0, 4.0, and 5.0, respectively


Piecewise-Linear Transformation Functions

Contrast stretching
- a process that expands the range of intensity levels in an image so that it spans the full intensity range of the recording medium or display device

Fig. 5


The transformation is defined by the control points (r_1, s_1) and (r_2, s_2):

T(r) = (s_1/r_1) r                                           , r ∈ [0, r_1]
T(r) = [s_2 (r - r_1) + s_1 (r_2 - r)] / (r_2 - r_1)         , r ∈ [r_1, r_2]
T(r) = [(L-1)(r - r_2) + s_2 (L-1 - r)] / (L-1 - r_2)        , r ∈ [r_2, L-1]


r_1 = s_1 , r_2 = s_2 : identity transformation (no change)
r_1 = r_2 , s_1 = 0 , s_2 = L-1 : thresholding function
Figure 5(b) shows an 8-bit image with low contrast.
Figure 5(c): contrast stretching, obtained by setting the parameters (r_1, s_1) = (r_min, 0) and (r_2, s_2) = (r_max, L-1), where r_min and r_max denote the minimum and maximum gray levels in the image, respectively. Thus, the transformation function stretched the levels linearly from their original range to the full range [0, L-1].
Figure 5(d): the thresholding function was used, with (r_1, s_1) = (m, 0) and (r_2, s_2) = (m, L-1), where m is the mean gray level in the image.
The original image on which these results are based is a scanning electron microscope image of pollen, magnified approximately 700 times.
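The (r_min, 0), (r_max, L-1) case reduces to min-max stretching; a sketch:

```python
# Sketch: min-max contrast stretching to the full range [0, L-1].
import numpy as np

def contrast_stretch(img, L=256):
    r = img.astype(np.float64)
    r_min, r_max = r.min(), r.max()
    return ((L - 1) * (r - r_min) / (r_max - r_min)).astype(np.uint8)

low_contrast = np.array([[90, 100], [110, 120]], dtype=np.uint8)
print(contrast_stretch(low_contrast))   # [[  0  85] [170 255]]
```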


Intensity-level slicing
- highlighting a specific range of intensities in an image
There are two approaches to intensity-level slicing (both sketched in code below):
1. display in one value (white, for example) all the values in the range of interest, and in another (say, black) all other intensities
2. brighten (or darken) the desired range of intensities but leave unchanged all other intensities in the image


Left: highlights the intensity range [A,B] and reduces all other intensities to a lower level.
Right: highlights the range [A,B] and preserves all other intensities.
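Both slicing approaches as small helpers (sketch; the range values are arbitrary):

```python
# Sketch: the two intensity-level slicing approaches for a range [A, B].
import numpy as np

def slice_binary(img, A, B):
    """Approach 1: white inside [A,B], black elsewhere."""
    return np.where((img >= A) & (img <= B), 255, 0).astype(np.uint8)

def slice_preserve(img, A, B, value=255):
    """Approach 2: brighten [A,B], leave other intensities unchanged."""
    out = img.copy()
    out[(img >= A) & (img <= B)] = value
    return out

img = np.array([[10, 120], [150, 240]], dtype=np.uint8)
print(slice_binary(img, 100, 200))
print(slice_preserve(img, 100, 200))
```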

Figure 6 (left): aortic angiogram near the kidney. The purpose of intensity slicing is to highlight the major blood vessels, which appear brighter as a result of injecting a contrast medium. Figure 6 (middle) shows the result of applying technique 1 for a band near the top of the intensity scale. This type of enhancement produces a binary image, which is useful for studying the shape of the flow of the contrast substance (to detect blockages).


In Figure 6 (right) the second technique was used: a band of intensities in the mid-gray region around the mean intensity was set to black; the other intensities remain unchanged.

Fig. 6 - Aortic angiogram and intensity-sliced versions


Bit-plane slicing
For an 8-bit image, f(x,y) is a number in [0,255], with an 8-bit representation in base 2.
This technique highlights the contribution made to the whole image appearance by each of the bits. An 8-bit image may be considered as being composed of eight 1-bit planes (plane 1 contains the lowest-order bit, plane 8 the highest-order bit).


The binary image for the 8th bit plane of an 8-bit image can be obtained by processing the input image with a thresholding intensity transformation function that maps all the intensities between 0 and 127 to 0, and all levels between 128 and 255 to 1.
The bit-slicing technique is useful for analyzing the relative importance of each bit in the image; it helps in determining the proper number of bits to use when quantizing the image. The technique is also useful for image compression.
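Extracting the eight bit planes with shifts and masks (sketch; values arbitrary):

```python
# Sketch: the eight bit planes of an 8-bit image.
import numpy as np

img = np.array([[0b10110100, 0b00001111],
                [0b11111111, 0b01000000]], dtype=np.uint8)
planes = [(img >> k) & 1 for k in range(8)]   # planes[0] = lowest-order bit
print(planes[7])   # highest-order plane == thresholding at 128
print(planes[0])
```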


Histogram processing
The histogram of a digital image with intensity levels in [0, L-1] is:

h(r_k) = n_k , k = 0,1,...,L-1

where r_k is the k-th intensity level and n_k is the number of pixels in the image with intensity r_k.

The normalized histogram for an M × N digital image is:

p(r_k) = n_k / (M N) , k = 0,1,...,L-1

p(r_k) = an estimate of the probability of occurrence of intensity level r_k in the image

Σ_{k=0}^{L-1} p(r_k) = 1
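Computing h(r_k) and p(r_k) in NumPy (sketch):

```python
# Sketch: histogram and normalized histogram of an 8-bit image.
import numpy as np

rng = np.random.default_rng(5)
img = rng.integers(0, 256, (64, 64))
h = np.bincount(img.ravel(), minlength=256)   # h(r_k) = n_k
p = h / img.size                              # p(r_k) = n_k / MN
print(p.sum())                                # 1.0
```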


Fig. 8 Dark, light, low-contrast, and high-contrast images and their histograms


Histogram Equalization
- determine a transformation function that seeks to produce an output image that has a uniform histogram

s = T(r) , 0 ≤ r ≤ L-1 , where:
(a) T(r) is monotonically increasing on [0, L-1]
(b) 0 ≤ T(r) ≤ L-1 for 0 ≤ r ≤ L-1

Condition (a), T(r) monotonically increasing, guarantees that the order of intensity values is preserved from input to output.
Condition (b) requires that both input and output images have the same range of intensities.


Histogram equalization (histogram linearization) transformation:

s_k = T(r_k) = (L-1) Σ_{j=0}^{k} p_r(r_j) = ((L-1)/(MN)) Σ_{j=0}^{k} n_j , k = 0,1,...,L-1

The output image is obtained by mapping each pixel in the input image with intensity r_k into a corresponding pixel with intensity s_k in the output image.
Consider the following example: a 3-bit image (L = 8) of size 64 × 64 (M = N = 64, MN = 4096).

Intensity distribution and histogram values for a 3-bit 64 × 64 digital image:

r_k       n_k      p_r(r_k) = n_k/MN
r_0 = 0   790      0.19
r_1 = 1   1023     0.25
r_2 = 2   850      0.21
r_3 = 3   656      0.16
r_4 = 4   329      0.08
r_5 = 5   245      0.06
r_6 = 6   122      0.03
r_7 = 7   81       0.02

s_0 = T(r_0) = 7 Σ_{j=0}^{0} p_r(r_j) = 7 p_r(r_0) = 1.33
s_1 = T(r_1) = 7 Σ_{j=0}^{1} p_r(r_j) = 7 p_r(r_0) + 7 p_r(r_1) = 3.08
s_2 = 4.55 , s_3 = 5.67 , s_4 = 6.23 , s_5 = 6.65 , s_6 = 6.86 , s_7 = 7.00

Rounding to the nearest integer in [0, 7]:

s_0 = 1.33 → 1     s_4 = 6.23 → 6
s_1 = 3.08 → 3     s_5 = 6.65 → 7
s_2 = 4.55 → 5     s_6 = 6.86 → 7
s_3 = 5.67 → 6     s_7 = 7.00 → 7
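The same computation in a few lines (sketch; reproduces the values above):

```python
# Sketch: histogram equalization of the 3-bit example (L = 8).
import numpy as np

L = 8
n = np.array([790, 1023, 850, 656, 329, 245, 122, 81])  # n_k from the table
p = n / n.sum()                       # p_r(r_k), MN = 4096
s = np.round((L - 1) * np.cumsum(p))  # s_k = (L-1) * sum_{j<=k} p_r(r_j)
print(s)                              # [1. 3. 5. 6. 6. 7. 7. 7.]
```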


Histogram Matching (Specification)

Sometimes it is useful to be able to specify the shape of the histogram that we wish the
output image to have. The method used to generate a processed image that has a specified
histogram is called histogram matching or histogram specification.
Suppose {z_q ; q = 0, ..., L-1} are the intensity values of the specified histogram p_z we wish to match.
Consider the histogram equalization transformation for the input image:

    s_k = T(r_k) = (L-1) \sum_{j=0}^{k} p_r(r_j) = \frac{L-1}{MN} \sum_{j=0}^{k} n_j ,  k = 0, 1, ..., L-1    (1)

Consider the analogous transformation for the specified histogram:

    G(z_q) = (L-1) \sum_{i=0}^{q} p_z(z_i) ,  q = 0, 1, ..., L-1    (2)

Since T(r_k) = s_k = G(z_q) for some value of q, it follows that:

    z_q = G^{-1}(s_k)


Histogram-specification procedure:
1) Compute the histogram p_r(r) of the input image, and compute the histogram
   equalization transformation (1). Round the resulting values s_k to integers in [0, L-1].
2) Compute all values of the transformation function G using relation (2), where p_z(z_i)
   are the values of the specified histogram. Round the values G(z_q) to integers in the
   range [0, L-1] and store these values in a table.
3) For every value of s_k, k = 0, 1, ..., L-1, use the stored values of G to find the
   corresponding value of z_q so that G(z_q) is closest to s_k, and store these mappings
   from s to z. When more than one value of z_q satisfies the property (i.e., the mapping
   is not unique), choose the smallest value by convention.
4) Form the histogram-specified image by first histogram-equalizing the input image
   and then mapping every equalized pixel value, s_k, of this image to the corresponding
   value z_q in the histogram-specified image using the mappings found in step 3).
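histeq also performs histogram specification when given a target histogram; a minimal sketch (the Gaussian-shaped target below is purely illustrative, not from the example):

f = imread('pout.tif');
hgram = exp(-((0:255) - 180).^2 / (2*40^2));   % hypothetical desired shape
g = histeq(f, hgram);                          % steps 1) through 4) in one call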


The intermediate step of equalizing the input image can be skipped by combining the
two transformation functions T and G^{-1}.
Reconsider the above example:

Fig. 9


Figure 9(a) shows the histogram of the original image. Figure 9(b) is the new histogram
to be achieved.
The first step is to obtain the scaled histogram-equalized values:

    s_0 = 1 ,  s_1 = 3 ,  s_2 = 5 ,  s_3 = 6 ,  s_4 = 6 ,  s_5 = 7 ,  s_6 = 7 ,  s_7 = 7

Then we compute the values of G:

    G(z_0) = 7 \sum_{i=0}^{0} p_z(z_i) = 0.00 -> 0 ,  G(z_1) = G(z_2) = 0.00 -> 0 ,  G(z_3) = 1.05 -> 1
    G(z_4) = 2.45 -> 2 ,  G(z_5) = 4.55 -> 5 ,  G(z_6) = 5.95 -> 6 ,  G(z_7) = 7.00 -> 7


The results of performing step 3) of the procedure are summarized in the next table.
In the last step of the algorithm, we use the mappings in this table to map every
pixel in the histogram-equalized image into a corresponding pixel in the newly-created
histogram-specified image. The values of the resulting histogram are listed in the third
column of Table 3.2, and the histogram is sketched in Figure 9(d).

Input counts: n(r_0) = 790, n(r_1) = 1023, n(r_2) = 850, n(r_3) = 656,
              n(r_4) = 329, n(r_5) = 245, n(r_6) = 122, n(r_7) = 81

    s_0 = 1                790 pixels              -> z_q = 3
    s_1 = 3                1023 pixels             -> z_q = 4
    s_2 = 5                850 pixels              -> z_q = 5
    s_3 = s_4 = 6          656 + 329 pixels        -> z_q = 6
    s_5 = s_6 = s_7 = 7    245 + 122 + 81 pixels   -> z_q = 7


Local Histogram Processing


The histogram processing techniques previously described are easily adaptable to local
enhancement. The procedure is to define a square or rectangular neighborhood and move
the center of this area from pixel to pixel. At each location, the histogram of the points in
the neighborhood is computed and either a histogram equalization or histogram
specification transformation function is obtained. This function is finally used to map the
gray level of the pixel centered in the neighborhood. The center of the neighborhood
region is then moved to an adjacent pixel location and the procedure is repeated.
Updating the histogram obtained in the previous location with the new data introduced at
each motion step is possible.
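A related tile-based local method (CLAHE) is available in MATLAB as adapthisteq; a sketch, with parameter values chosen arbitrarily for illustration:

f = imread('pout.tif');
g = adapthisteq(f, 'NumTiles', [8 8], 'ClipLimit', 0.01);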


Using Histogram Statistics for Image Enhancement

Let r denote a discrete random variable representing discrete gray levels in [0, L-1], and
let p(r_i) denote the normalized histogram component corresponding to the i-th value of
r. The n-th moment of r about its mean is defined as:

    \mu_n(r) = \sum_{i=0}^{L-1} (r_i - m)^n p(r_i)

where m is the mean (average intensity) value of r:

    m = \sum_{i=0}^{L-1} r_i p(r_i)   (a measure of average intensity)

    \sigma^2 = \mu_2(r) = \sum_{i=0}^{L-1} (r_i - m)^2 p(r_i)   (a measure of contrast)

Sample mean and sample variance:

    m = \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x,y) ,  \sigma^2 = \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} [f(x,y) - m]^2
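These statistics are one line each in MATLAB (mean2 and std2 are Toolbox helpers; the image name is an assumption):

f = imread('pout.tif');
m  = mean2(f);             % sample mean (average intensity)
s2 = std2(f)^2;            % sample variance (contrast measure)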


Spatial Filtering
The name filter is borrowed from frequency domain processing, where filtering means
accepting (passing) or rejecting certain frequency components. Filters that pass low
frequencies are called lowpass filters. A lowpass filter has the effect of blurring
(smoothing) an image. Filters are also called masks, kernels, templates, or windows.

The Mechanics of Spatial Filtering


A spatial filter consists of:
1) a neighborhood (usually a small rectangle)
2) a predefined operation performed on the pixels in the neighborhood
Filtering creates a new pixel with the same coordinates as the pixel in the center of the
neighborhood, and whose intensity value is modified by the filtering operation.


If the operation performed on the image pixels is linear, the filter is called a linear spatial
filter; otherwise, the filter is nonlinear.

Fig. 10 - Linear spatial filtering with a 3x3 filter mask


In Figure 10 a 3x3 linear filter is pictured:

    g(x,y) = w(-1,-1) f(x-1,y-1) + w(-1,0) f(x-1,y) + ... + w(0,0) f(x,y) + ... + w(1,1) f(x+1,y+1)

For a mask of size m x n, we assume m = 2a+1 and n = 2b+1, where a and b are positive
integers. The general expression of linear spatial filtering of an image of size M x N with a
filter of size m x n is:

    g(x,y) = \sum_{s=-a}^{a} \sum_{t=-b}^{b} w(s,t) f(x+s, y+t)

Spatial Correlation and Convolution

Correlation is the process of moving a filter mask over the image and computing the sum
of products at each location. Convolution is similar to correlation, except that the filter
is first rotated by 180°.


Correlation:

    w(x,y) ☆ f(x,y) = \sum_{s=-a}^{a} \sum_{t=-b}^{b} w(s,t) f(x+s, y+t)

Convolution:

    w(x,y) * f(x,y) = \sum_{s=-a}^{a} \sum_{t=-b}^{b} w(s,t) f(x-s, y-t)

A function that contains a single 1 with the rest being 0s is called a discrete unit
impulse. Correlating a filter with a discrete unit impulse produces a rotated
version of the filter at the location of the impulse.
Linear filters are also found in the DIP literature under the names convolution filter,
convolution mask, or convolution kernel.
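imfilter can perform either operation; a sketch showing that the two differ only by a 180° rotation of the mask (image and mask are illustrative assumptions):

f = im2double(imread('cameraman.tif'));
w = [1 2 1; 0 0 0; -1 -2 -1];               % example 3x3 mask
gc = imfilter(f, w, 'corr', 'replicate');   % correlation (imfilter default)
gv = imfilter(f, w, 'conv', 'replicate');   % convolution
% gv equals correlation with rot90(w, 2), the mask rotated by 180°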


Vector Representation of Linear Filtering

    R = w_1 z_1 + w_2 z_2 + ... + w_{mn} z_{mn} = \sum_{k=1}^{mn} w_k z_k = w^T z

where the w's are the coefficients of an m x n filter and the z's are the corresponding
image intensities encompassed by the filter. For a 3x3 mask:

    R = w_1 z_1 + w_2 z_2 + ... + w_9 z_9 = \sum_{k=1}^{9} w_k z_k = w^T z ,  w, z \in R^9


Smoothing Linear Filters


A smoothing linear filter computes the average of the pixels contained in the
neighborhood of the filter mask. These filters are sometimes called averaging filters or
lowpass filters.

The process of replacing the value of every pixel in an image by the average of the
intensity levels in the neighborhood defined by the filter mask produces an image with
reduced sharp transitions in intensities. Random noise is usually characterized by such
sharp transitions in intensity levels, so smoothing linear filters are applied for noise reduction.
The problem is that edges are also characterized by sharp intensity transitions, so
averaging filters have the undesirable effect of blurring edges.
A major use of averaging filters is the reduction of irrelevant detail in an image (pixel
regions that are small with respect to the size of the filter mask).


There is the possibility of using a weighted average: the pixels are multiplied by different
coefficients, thus giving more importance (weight) to some pixels at the expense of others.
A general weighted averaging filter of size m x n (m and n odd) applied to an M x N image is
given by the expression:

    g(x,y) = \frac{\sum_{s=-a}^{a} \sum_{t=-b}^{b} w(s,t) f(x+s, y+t)}{\sum_{s=-a}^{a} \sum_{t=-b}^{b} w(s,t)} ,
    x = 0, 1, ..., M-1 ,  y = 0, 1, ..., N-1
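A sketch of plain and weighted averaging in MATLAB (fspecial builds standard masks; the weighted mask below is one common choice, not prescribed by the text):

f = im2double(imread('cameraman.tif'));
box  = fspecial('average', 3);        % 3x3 box filter, weights sum to 1
wavg = [1 2 1; 2 4 2; 1 2 1] / 16;    % weighted average mask
g1 = imfilter(f, box,  'replicate');
g2 = imfilter(f, wavg, 'replicate');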


(a) original image, 500x500 pixels; (b)-(f) results of smoothing with square averaging
filters of size m = 3, 5, 9, 15, and 35, respectively.
The black squares at the top are of size 3, 5, 9, 15, 25, 35, 45, 55 pixels. The letters at the
bottom range in size from 10 to 24 points. The vertical bars are 5 pixels wide and 100 pixels
high, separated by 20 pixels. The diameter of the circles is 25 pixels, and their borders are
15 pixels apart. The noisy rectangles are 50x120 pixels.


An important application of spatial averaging is to blur an image for the purpose of
getting a gross representation of objects of interest, such that the intensity of smaller
objects blends with the background and larger objects become blob-like and easier to
detect. The size of the mask establishes the relative size of the objects that will
disappear into the background.

Left: image from the Hubble Space Telescope, 528x485 pixels; Middle: image filtered with a
15x15 averaging mask; Right: result of thresholding the middle image


Order-Statistic (Nonlinear) Filters


Order-statistic filters are nonlinear spatial filters based on
ordering (ranking) the pixels contained in the image area defined
by the selected neighborhood and replacing the value of the center
pixel with the value determined by the ranking result. The best
known filter in this class is the median filter, which replaces the
value of a pixel by the median of the intensity values in the
neighborhood of that pixel (the original value of the pixel is
included in the computation of the median). Median filters provide


excellent noise-reduction capabilities, with considerably less blurring than linear
smoothing filters of similar size. Median filters are particularly effective against
impulse noise (also called salt-and-pepper noise).
The median, ξ, of a set of values is such that half the values in the set are less than or
equal to ξ, and half are greater than or equal to ξ.
For a 3x3 neighborhood with intensity values (10, 15, 20, 20, 30, 20, 20, 25, 100) the
median is ξ = 20.


The effect of the median filter is to force points with distinct intensity levels to be more
like their neighbors. Isolated clusters of pixels that are light or dark with respect to their
neighbors, and whose area is less than m^2/2, are eliminated by an m x m median filter
(eliminated means forced to the median intensity of the neighbors).

The max/min filter replaces the intensity value of the pixel with the max/min value of
the pixels in the neighborhood. The max/min filter is useful for finding the
brightest/darkest points in an image.


Min filter: 0th percentile filter
Median filter: 50th percentile filter
Max filter: 100th percentile filter

(a)
(b)
(c)
(a) X-ray image of circuit board corrupted by salt&pepper noise
(b) noise reduction with a 33 averaging filter
(c) noise reduction with a 33 median filter
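A sketch reproducing this comparison (eight.tif is a Toolbox sample X-ray-like image; the noise density is an arbitrary choice):

f = imread('eight.tif');
g = imnoise(f, 'salt & pepper', 0.05);       % impulse noise, 5% density
ga = imfilter(g, fspecial('average', 3));    % 3x3 averaging
gm = medfilt2(g, [3 3]);                     % 3x3 median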


Sharpening Spatial Filters

The principal objective of sharpening is to highlight transitions in intensity. These
filters are applied in electronic printing, medical imaging, industrial inspection, and
autonomous guidance in military systems.
Averaging is analogous to integration; sharpening corresponds to spatial differentiation.
Image differentiation enhances edges and other discontinuities (noise, for example) and
deemphasizes areas with slowly varying intensities.


For digital images, discrete approximations of the derivatives are used:

    \partial f / \partial x = f(x+1) - f(x)

    \partial^2 f / \partial x^2 = f(x+1) + f(x-1) - 2 f(x)


Illustration of the first and second derivatives of a 1-D digital function


Using the Second Derivative for Image Sharpening: the Laplacian

Isotropic filters: the response of such a filter is independent of the direction of the
discontinuities in the image. Isotropic filters are rotation invariant, in the sense that
rotating the image and then applying the filter gives the same result as applying the
filter to the image and then rotating the result.
The simplest isotropic derivative operator is the Laplacian:

    \nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}


This operator is linear.

    \frac{\partial^2 f}{\partial x^2} = f(x+1,y) + f(x-1,y) - 2 f(x,y)

    \frac{\partial^2 f}{\partial y^2} = f(x,y+1) + f(x,y-1) - 2 f(x,y)

    \nabla^2 f(x,y) = f(x+1,y) + f(x-1,y) + f(x,y+1) + f(x,y-1) - 4 f(x,y)


Filter masks that approximate the Laplacian

The Laplacian, being a derivative operator, highlights gray-level discontinuities in an
image and deemphasizes regions with slowly varying gray levels. This tends to produce
images that have

grayish edge lines and other discontinuities, all superimposed on a dark, featureless
background. Background features can be recovered, while still preserving the sharpening
effect of the Laplacian operation, simply by adding the original and Laplacian images.
The basic way to use the Laplacian for image sharpening is given by:

    g(x,y) = f(x,y) + c \nabla^2 f(x,y)

The (discrete) Laplacian can contain both negative and positive values, so it needs to be
scaled for display.


Blurred image of the North Pole of the Moon; Laplacian-filtered image

Sharpening results with c = 1 and c = 2
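A sketch of Laplacian sharpening (with the center-negative mask below, adding c times the Laplacian amounts to subtracting the filtered image; moon.tif is a Toolbox sample):

f = im2double(imread('moon.tif'));
lapmask = [0 1 0; 1 -4 1; 0 1 0];        % Laplacian approximation
lap = imfilter(f, lapmask, 'replicate');
g = f - lap;                             % g = f + c*lap with c = -1 for this mask
imshowpair(f, g, 'montage')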


Unsharp Masking and Highboost Filtering


- process used in printing and publishing industry to sharpen
images
- subtracting an unsharp (smoothed) version of an image from
the original image
1.Blur the original image
2.Subtract the blurred image from the original (the resulting
difference is called the mask)
3.Add the mask to the original


Let \bar{f}(x,y) be the blurred image. The mask is given by:

    g_mask(x,y) = f(x,y) - \bar{f}(x,y)

    g(x,y) = f(x,y) + k g_mask(x,y)

k = 1: unsharp masking
k > 1: highboost filtering


original image

blurred image (Gaussian filter 55, =3)

mask difference between the above images

unsharp masking result

highboost filter result (k=4.5)
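The three steps in MATLAB (imgaussfilt assumes R2015a or newer; the sigma and k values mirror the figure):

f = im2double(imread('moon.tif'));
fblur = imgaussfilt(f, 3);        % step 1: blur (Gaussian, sigma = 3)
gmask = f - fblur;                % step 2: the mask
g1 = f + 1.0 * gmask;             % step 3: unsharp masking (k = 1)
g2 = f + 4.5 * gmask;             % highboost filtering (k = 4.5)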


The Gradient for (Nonlinear) Image Sharpening

    \nabla f = grad(f) = [g_x, g_y]^T = [\partial f/\partial x, \partial f/\partial y]^T

The gradient points in the direction of the greatest rate of change of f at location (x,y).
The magnitude (length) of the gradient is defined as:

    M(x,y) = mag(\nabla f) = \sqrt{g_x^2 + g_y^2}


M(x,y) is an image of the same size as the original, called the gradient image (or simply
the gradient). M(x,y) is rotation invariant (isotropic); the gradient vector \nabla f is not
isotropic. In some applications the following formula is used:

    M(x,y) ≈ |g_x| + |g_y|   (not isotropic)

Different ways of approximating g_x and g_y produce different filter operators.


Roberts cross-gradient operators (1965):

    g_x = f(x+1, y+1) - f(x, y) = \delta_1
    g_y = f(x, y+1) - f(x+1, y) = \delta_2

    M(x,y) = \sqrt{\delta_1^2 + \delta_2^2}   or   M(x,y) ≈ |\delta_1| + |\delta_2|


Sobel operators:

    g_x = [f(x-1,y+1) + 2 f(x,y+1) + f(x+1,y+1)] - [f(x-1,y-1) + 2 f(x,y-1) + f(x+1,y-1)]

    g_y = [f(x+1,y-1) + 2 f(x+1,y) + f(x+1,y+1)] - [f(x-1,y-1) + 2 f(x-1,y) + f(x-1,y+1)]


Roberts cross gradient operators

Sobel operators
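A sketch of the gradient image with Sobel masks (fspecial('sobel') returns the mask for one direction; its transpose gives the other):

f = im2double(imread('cameraman.tif'));
s1 = fspecial('sobel');              % [1 2 1; 0 0 0; -1 -2 -1]
s2 = s1';                            % the other direction
g1 = imfilter(f, s1, 'replicate');
g2 = imfilter(f, s2, 'replicate');
M  = sqrt(g1.^2 + g2.^2);            % gradient magnitude image
% M = abs(g1) + abs(g2);             % cheaper, non-isotropic variant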


Filtering in the Frequency Domain


Filter: a device or material for suppressing or minimizing waves or
oscillations of certain frequencies
Frequency: the number of times that a periodic function repeats the
same sequence of values during a unit variation of the independent
variable

Fourier series and Transform


Fourier, in a memoir in 1807 (published in 1822 in his book La Théorie Analytique de
la Chaleur), stated that any periodic function can be expressed as the sum of sines
and/or cosines of different


frequencies, each multiplied by a different coefficient (now called a Fourier series).
Even functions that are not periodic (but whose area under the curve is finite) can be
expressed as the integral of sines and/or cosines multiplied by a weighting function (the
Fourier transform). Both representations share the characteristic that a function
expressed as either a Fourier series or a Fourier transform can be reconstructed
(recovered) completely via an inverse process, with no loss of information. This allows
us to work in the Fourier domain and then return to the original domain of the function
without losing any information.


Complex Numbers

    C = R + i I ,  R, I \in \mathbb{R} ,  i = \sqrt{-1} ;  R is the real part, I the imaginary part

    C* = R - i I ,  the conjugate of the complex number C

    C = |C| (\cos\theta + i \sin\theta) ,  |C| = \sqrt{R^2 + I^2} ,  the complex number in polar coordinates

    e^{i\theta} = \cos\theta + i \sin\theta   (Euler's formula)

    C = |C| e^{i\theta}


Fourier series
For f(t) a periodic function with period T ( f(t + T) = f(t) for all t ):

    f(t) = \sum_{n=-\infty}^{\infty} c_n e^{i 2\pi n t / T}

    c_n = \frac{1}{T} \int_{-T/2}^{T/2} f(t) e^{-i 2\pi n t / T} dt ,  n = 0, \pm 1, \pm 2, ...

Impulses and the Sifting Property

A unit impulse located at t = 0, denoted \delta(t), is defined as:

    \delta(t) = \infty if t = 0 ,  \delta(t) = 0 if t \ne 0 ,  satisfying  \int_{-\infty}^{\infty} \delta(t) dt = 1


Physically, an impulse may be interpreted as a spike of infinite amplitude and zero
duration, having unit area. An impulse has the sifting property with respect to integration:

    \int_{-\infty}^{\infty} f(t) \delta(t) dt = f(0) ,  f continuous at t = 0

    \int_{-\infty}^{\infty} f(t) \delta(t - t_0) dt = f(t_0) ,  f continuous at t_0

The unit discrete impulse, \delta(x), is defined as:

    \delta(x) = 1 if x = 0 ,  \delta(x) = 0 if x \ne 0 ,  satisfying  \sum_{x=-\infty}^{\infty} \delta(x) = 1


The sifting property:

    \sum_{x=-\infty}^{\infty} f(x) \delta(x) = f(0) ,  \sum_{x=-\infty}^{\infty} f(x) \delta(x - x_0) = f(x_0)

The impulse train, s_{\Delta T}(t):

    s_{\Delta T}(t) = \sum_{n=-\infty}^{\infty} \delta(t - n \Delta T)


The Fourier Transform of a Function of One Continuous Variable

The Fourier transform of a continuous function f(t) of a continuous variable t is:

    \mathcal{F}\{f(t)\} = F(\mu) = \int_{-\infty}^{\infty} f(t) e^{-i 2\pi\mu t} dt

Conversely, given F(\mu), we can obtain f(t) back using the inverse Fourier transform,
f(t) = \mathcal{F}^{-1}\{F(\mu)\}, given by:

    f(t) = \int_{-\infty}^{\infty} F(\mu) e^{i 2\pi\mu t} d\mu


    F(\mu) = \int_{-\infty}^{\infty} f(t) [\cos(2\pi\mu t) - i \sin(2\pi\mu t)] dt

The sinc function:

    sinc(x) = \frac{\sin(\pi x)}{\pi x} ,  sinc(0) = 1


The Fourier transform of the unit impulse:

    F(\mu) = \int_{-\infty}^{\infty} \delta(t) e^{-i 2\pi\mu t} dt = 1

    F(\mu) = \int_{-\infty}^{\infty} \delta(t - t_0) e^{-i 2\pi\mu t} dt = e^{-i 2\pi\mu t_0} = \cos(2\pi\mu t_0) - i \sin(2\pi\mu t_0)

The Fourier series for the impulse train s_{\Delta T}(t):

    s_{\Delta T}(t) = \frac{1}{\Delta T} \sum_{n=-\infty}^{\infty} e^{i \frac{2\pi n}{\Delta T} t}


The Fourier transform of the periodic impulse train, S(\mu), is also an impulse train:

    S(\mu) = \frac{1}{\Delta T} \sum_{n=-\infty}^{\infty} \delta\left(\mu - \frac{n}{\Delta T}\right)

Convolution (f, h continuous functions):

    (f \star h)(t) = \int_{-\infty}^{\infty} f(s) h(t - s) ds

    \mathcal{F}\{f(t) \star h(t)\} = H(\mu) F(\mu) ,  \mathcal{F}\{f(t) h(t)\} = H(\mu) \star F(\mu)


Convolution in the frequency domain is analogous to multiplication in the spatial
domain, and vice versa.
The convolution theorem is the foundation for filtering in the frequency domain.

Sampling and the Fourier Transform of Sampled Functions

Continuous functions have to be converted into a sequence of discrete values before
they can be processed by a computer. Consider a continuous function, f(t), that we wish
to sample at uniform


intervals \Delta T apart. We assume that the function extends from -\infty to \infty.
One way to model sampling is to multiply f(t) by an impulse train:

    \tilde{f}(t) = f(t) s_{\Delta T}(t) = \sum_{n=-\infty}^{\infty} f(t) \delta(t - n \Delta T) ,  \tilde{f}(t) the sampled function

The value f_k of an arbitrary sample in the sequence is given by:

    f_k = \int_{-\infty}^{\infty} f(t) \delta(t - k \Delta T) dt = f(k \Delta T)


The Fourier Transform of a Sampled Function

Let F(\mu) be the Fourier transform of a continuous function f(t) and let \tilde{f}(t) be
the sampled function. The Fourier transform of the sampled function is:

    \tilde{F}(\mu) = \mathcal{F}\{\tilde{f}(t)\} = \mathcal{F}\{f(t) s_{\Delta T}(t)\} = (F \star S)(\mu)

which evaluates to:

    \tilde{F}(\mu) = \frac{1}{\Delta T} \sum_{n=-\infty}^{\infty} F\left(\mu - \frac{n}{\Delta T}\right)


The Fourier transform \tilde{F}(\mu) of the sampled function \tilde{f}(t) is an infinite,
periodic sequence of copies of F(\mu); the period is 1/\Delta T.

The Sampling Theorem
Consider the problem of establishing the conditions under which a continuous function
can be recovered uniquely from a set of its samples.
A function f(t) is called band-limited if its Fourier transform is 0 outside the interval
[-\mu_max, \mu_max].


We can recover f(t) from its sampled version if we can isolate a copy of F(\mu) from the
periodic sequence of copies of this function contained in \tilde{F}(\mu), the transform of
the sampled function \tilde{f}(t).
Recall that \tilde{F}(\mu) is continuous and periodic with period 1/\Delta T. All we need
is one complete period to characterize the entire transform. This implies that we can
recover f(t) from that single period by using the inverse Fourier transform.
Extracting from \tilde{F}(\mu) a single period that is equal to F(\mu) is possible if the
separation between copies is sufficient, i.e.:

    \frac{1}{2 \Delta T} > \mu_{max}   \Leftrightarrow   \frac{1}{\Delta T} > 2 \mu_{max}


Sampling Theorem
A continuous, band-limited function can be recovered completely
from a set of its samples if the samples are acquired at a rate
exceeding twice the highest frequency content of the function.
The number 2\mu_{max} is called the Nyquist rate.


To see how the recovery of F(\mu) from \tilde{F}(\mu) is possible, we proceed as follows
(see Figure 4.8). Define the filter:

    H(\mu) = \Delta T for -\mu_{max} \le \mu \le \mu_{max} ,  H(\mu) = 0 otherwise

Then:

    F(\mu) = H(\mu) \tilde{F}(\mu)   and   f(t) = \int_{-\infty}^{\infty} F(\mu) e^{i 2\pi\mu t} d\mu


The function H(\mu) is called a lowpass filter because it passes frequencies at the low
end of the frequency range and eliminates (filters out) all higher frequencies. It is also
called an ideal lowpass filter.

The Discrete Fourier Transform (DFT) of One Variable

Obtaining the DFT from the Continuous Transform of a Sampled Function
The Fourier transform of a sampled, band-limited function extending from -\infty to
\infty is continuous and periodic, and also extends from -\infty to \infty. In practice, we
work with a finite number of samples, and the objective is to derive the DFT
corresponding to such sample sets.
    \tilde{F}(\mu) = \int_{-\infty}^{\infty} \tilde{f}(t) e^{-i 2\pi\mu t} dt = \int_{-\infty}^{\infty} \sum_{n=-\infty}^{\infty} f(t) \delta(t - n\Delta T) e^{-i 2\pi\mu t} dt = \sum_{n=-\infty}^{\infty} f_n e^{-i 2\pi\mu n \Delta T}    (1)

What is the discrete version of \tilde{F}(\mu)? All we need to characterize \tilde{F}(\mu)
is one period, and sampling one period is the basis of the DFT.
Suppose that we want to obtain M equally spaced samples of \tilde{F}(\mu) taken over
the period [0, 1/\Delta T]. Consider the frequencies:


    \mu = \frac{m}{M \Delta T} ,  m = 0, 1, ..., M-1

and substitute into (1):

    F_m = \sum_{n=0}^{M-1} f_n e^{-i 2\pi m n / M} ,  m = 0, 1, ..., M-1    (2)

This expression is the discrete Fourier transform (DFT).
Given a set {f_n} of M samples of f(t), equation (2) yields a sample set {F_m} of M
complex discrete values corresponding to the discrete Fourier transform of the input
sample set.


Conversely, given {F_m}, we can recover the sample set {f_n} by using the inverse
discrete Fourier transform (IDFT):

    f_n = \frac{1}{M} \sum_{m=0}^{M-1} F_m e^{i 2\pi m n / M} ,  n = 0, 1, ..., M-1
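MATLAB's fft/ifft implement exactly this DFT/IDFT pair (with the 1/M factor in ifft); a sketch with an assumed sampling setup:

fs = 100;                 % samples per unit time (1/DeltaT), an assumption
t  = (0:99) / fs;         % M = 100 samples
f  = sin(2*pi*5*t);       % a band-limited test signal
F  = fft(f);              % the samples F_m, m = 0,...,M-1
fr = real(ifft(F));       % recovers f_n up to round-off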


Extension to Functions of Two Variables

The 2-D Impulse and Its Sifting Property
Continuous case:

    \delta(t,z) = \infty if t = z = 0 ,  \delta(t,z) = 0 otherwise ,  with  \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \delta(t,z) dt dz = 1

Sifting property:

    \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(t,z) \delta(t,z) dt dz = f(0,0)

    \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(t,z) \delta(t - t_0, z - z_0) dt dz = f(t_0, z_0)


Discrete case:

    \delta(x,y) = 1 if x = y = 0 ,  \delta(x,y) = 0 otherwise

    \sum_{x=-\infty}^{\infty} \sum_{y=-\infty}^{\infty} f(x,y) \delta(x,y) = f(0,0)

    \sum_{x=-\infty}^{\infty} \sum_{y=-\infty}^{\infty} f(x,y) \delta(x - x_0, y - y_0) = f(x_0, y_0)


The 2-D Continuous Fourier Transform Pair

    F(\mu,\nu) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(t,z) e^{-i 2\pi(\mu t + \nu z)} dt dz

    f(t,z) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} F(\mu,\nu) e^{i 2\pi(\mu t + \nu z)} d\mu d\nu

Two-Dimensional Sampling and the 2-D Sampling Theorem

2-D impulse train:

    s_{\Delta T \Delta Z}(t,z) = \sum_{m=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} \delta(t - m\Delta T, z - n\Delta Z)

f(t,z) is band-limited if its Fourier transform is 0 outside the rectangle defined by the
intervals [-\mu_{max}, \mu_{max}] and [-\nu_{max}, \nu_{max}]:


    F(\mu,\nu) = 0 for |\mu| \ge \mu_{max} and |\nu| \ge \nu_{max}

The two-dimensional sampling theorem states that a continuous, band-limited function
f(t,z) can be recovered with no error from a set of its samples if the sampling intervals
satisfy:

    \Delta T < \frac{1}{2\mu_{max}}   and   \Delta Z < \frac{1}{2\nu_{max}}

(equivalently, 1/\Delta T > 2\mu_{max} and 1/\Delta Z > 2\nu_{max}).


The 2-D Discrete Fourier Transform and Its Inverse

    F(u,v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x,y) e^{-i 2\pi (ux/M + vy/N)}

where f(x,y) is a digital image of size M x N.
Given the transform F(u,v), we can obtain f(x,y) by using the inverse discrete Fourier
transform (IDFT):

    f(x,y) = \frac{1}{MN} \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} F(u,v) e^{i 2\pi (ux/M + vy/N)} ,  x = 0, 1, ..., M-1 ,  y = 0, 1, ..., N-1
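In MATLAB, fft2/ifft2 compute this pair; a sketch that also displays the centered log spectrum:

f = im2double(imread('cameraman.tif'));
F = fft2(f);                  % 2-D DFT
S = abs(fftshift(F));         % centered spectrum |F(u,v)|
imshow(log(1 + S), [])        % log scaling for display
fr = real(ifft2(F));          % the IDFT recovers the image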


Some Properties of the 2-D Discrete Fourier Transform

Relationships Between Spatial and Frequency Intervals
A digital image f(x,y) consists of M x N samples taken at intervals \Delta T and \Delta Z.
The separations between the corresponding discrete frequency domain variables are:

    \Delta u = \frac{1}{M \Delta T} ,  \Delta v = \frac{1}{N \Delta Z}


Translation and Rotation

    f(x,y) e^{i 2\pi (u_0 x / M + v_0 y / N)} \Leftrightarrow F(u - u_0, v - v_0)

    f(x - x_0, y - y_0) \Leftrightarrow F(u,v) e^{-i 2\pi (u x_0 / M + v y_0 / N)}

Using polar coordinates

    x = r \cos\theta ,  y = r \sin\theta ,  u = \omega \cos\varphi ,  v = \omega \sin\varphi

we get: rotating f(x,y) by an angle \theta_0 rotates its Fourier transform F by the same angle:

    f(r, \theta + \theta_0) \Leftrightarrow F(\omega, \varphi + \theta_0)


Periodicity

    F(u,v) = F(u + k_1 M, v) = F(u, v + k_2 N) = F(u + k_1 M, v + k_2 N)
    f(x,y) = f(x + k_1 M, y) = f(x, y + k_2 N) = f(x + k_1 M, y + k_2 N) ,  k_1, k_2 integers

    f(x,y) (-1)^{x+y} \Leftrightarrow F(u - M/2, v - N/2)

This last relation shifts the data so that F(0,0) is at the center of the frequency
rectangle defined by the intervals [0, M-1] and [0, N-1].


Symmetry Properties
Even and odd parts of a function:

    w(x,y) = w_e(x,y) + w_o(x,y)

    w_e(x,y) = \frac{w(x,y) + w(-x,-y)}{2} ,  w_o(x,y) = \frac{w(x,y) - w(-x,-y)}{2}

    w_e(x,y) = w_e(-x,-y)   (even, symmetric)
    w_o(x,y) = -w_o(-x,-y)   (odd, antisymmetric)


For digital images, evenness and oddness become:

    w_e(x,y) = w_e(M - x, N - y)
    w_o(x,y) = -w_o(M - x, N - y)

    \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} w_e(x,y) w_o(x,y) = 0


Fourier Spectrum and Phase Angle

Express the Fourier transform in polar coordinates:

    F(u,v) = |F(u,v)| e^{i\phi(u,v)}

    |F(u,v)| = \sqrt{R^2(u,v) + I^2(u,v)}   (the Fourier or frequency spectrum)

    \phi(u,v) = \arctan\frac{I(u,v)}{R(u,v)}   (the phase angle)

    P(u,v) = |F(u,v)|^2 = R^2(u,v) + I^2(u,v)   (the power spectrum)

For a real image f:

    |F(u,v)| = |F(-u,-v)| ,  \phi(u,v) = -\phi(-u,-v)


    F(0,0) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x,y) = MN \bar{f} ,  \bar{f} = \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x,y)

where \bar{f} is the average value of the image f, so

    |F(0,0)| = MN |\bar{f}|

Because MN usually is large, |F(0,0)| typically is the largest component of the spectrum,
by a factor that can be several orders of magnitude larger than other terms.


F(0,0) is sometimes called the dc component of the transform (dc = direct current, i.e.,
current of zero frequency).

The 2-D Convolution Theorem
2-D circular convolution:

    (f \star h)(x,y) = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} f(m,n) h(x - m, y - n) ,  x = 0, 1, ..., M-1 ,  y = 0, 1, ..., N-1

The 2-D convolution theorem:

    f(x,y) \star h(x,y) \Leftrightarrow F(u,v) H(u,v)
    f(x,y) h(x,y) \Leftrightarrow F(u,v) \star H(u,v)


Filtering in the Frequency Domain


Let f(x,y) be a digital image and F(u,v) its (discrete) Fourier
transform. Usually it is not possible to make direct
associations between specific components of an image and its
transform. We know that F(0,0) is proportional to the average
intensity of the image. Low frequencies correspond to the
slowly varying intensity components of an image, the higher
frequencies correspond to faster intensity change in an image
(edges, for ex.).


The 2-D Discrete Fourier Transform and Its Inverse (recall)

    F(u,v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x,y) e^{-i 2\pi (ux/M + vy/N)}

    f(x,y) = \frac{1}{MN} \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} F(u,v) e^{i 2\pi (ux/M + vy/N)} ,  x = 0, 1, ..., M-1 ,  y = 0, 1, ..., N-1

where f(x,y) is a digital image of size M x N.


    F(0,0) = MN \bar{f} ,  \bar{f} = \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x,y)   (the average value of the image f)

    f(x - x_0, y - y_0) \Leftrightarrow F(u,v) e^{-i 2\pi (u x_0 / M + v y_0 / N)}

    f(r, \theta + \theta_0) \Leftrightarrow F(\omega, \varphi + \theta_0)

The spectrum is insensitive to image translation, and it rotates by the same angle as the
image rotates.


(panels: image; centered Fourier spectrum; Fourier spectrum; log-transformed centered Fourier spectrum)


(panels: translated image and its Fourier spectrum; 45° rotated image and its Fourier spectrum)


The magnitude of the 2-D DFT is an array whose components determine the intensities
in the image, while the corresponding phase is an array of angles that carry the
information about where discernible objects are located in the image.


(panels: woman image; rectangle image; phase angle; reconstruction using only the phase
angle; reconstruction from the rectangle's spectrum with the woman's phase angle;
reconstruction from the rectangle's phase angle with the woman's spectrum)


The 2-D Convolution Theorem (recall)

2-D circular convolution:

    (f \star h)(x,y) = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} f(m,n) h(x - m, y - n) ,  x = 0, 1, ..., M-1 ,  y = 0, 1, ..., N-1

    f(x,y) \star h(x,y) \Leftrightarrow F(u,v) H(u,v) ,  f(x,y) h(x,y) \Leftrightarrow F(u,v) \star H(u,v)


If we use the DFT and the convolution theorem to obtain the same result as in the left
column of Figure 4.28, we must take into account the periodicity inherent in the
expression for the DFT. The problem that appears in Figure 4.28 is commonly referred
to as wraparound error. The solution to this problem is simple. Consider two functions
f and h composed of A and B samples, respectively. It can be shown that if we append
zeros to both functions so that they have the same length, denoted by P, then
wraparound is avoided by choosing:

    P \ge A + B - 1

This process is called zero padding.
Let f(x,y) and h(x,y) be two image arrays of size A x B and C x D pixels, respectively.
Wraparound error in their circular convolution can be avoided by padding these
functions with zeros:

    f_p(x,y) = f(x,y) for 0 \le x \le A-1 and 0 \le y \le B-1 ,  f_p(x,y) = 0 for A \le x \le P or B \le y \le Q

    h_p(x,y) = h(x,y) for 0 \le x \le C-1 and 0 \le y \le D-1 ,  h_p(x,y) = 0 for C \le x \le P or D \le y \le Q

with

    P \ge A + C - 1   (P = 2M - 1 for two images of equal size M x N)
    Q \ge B + D - 1   (Q = 2N - 1)
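A sketch of zero-padded frequency-domain convolution (fft2 pads with zeros when given output sizes; image and kernel are illustrative assumptions):

f = im2double(imread('cameraman.tif'));       % A x B image
h = fspecial('average', 9);                   % C x D kernel
[A, B] = size(f); [C, D] = size(h);
P = A + C - 1; Q = B + D - 1;                 % padded sizes
G = fft2(f, P, Q) .* fft2(h, P, Q);           % product of zero-padded DFTs
g = real(ifft2(G));                           % linear convolution, no wraparound
g = g(1:A, 1:B);                              % crop back to the original size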


Frequency Domain Filtering Fundamentals

Given a digital image f(x,y) of size M x N, the basic filtering equation has the form:

    g(x,y) = \mathcal{F}^{-1}[H(u,v) F(u,v)]    (1)

where \mathcal{F}^{-1} is the inverse discrete Fourier transform (IDFT), F(u,v) is the
discrete Fourier transform (DFT) of the input image, H(u,v) is a filter function (also
called the filter or the filter transfer function), and g(x,y) is the filtered (output) image.
F, H, and g are arrays of the same size as f, M x N.


H(u,v) symmetric about its center simplifies the computations and requires F(u,v) to be
centered as well. In order to obtain a centered F(u,v), the image f(x,y) is multiplied by
(-1)^{x+y} before computing its transform. For example:

    H(u,v) = 0 for (u,v) = (M/2, N/2)   (the point (u,v) = (0,0) in the uncentered transform)
    H(u,v) = 1 elsewhere

This filter rejects the dc term (responsible for the average intensity of an image) and
passes all other terms of F(u,v).


This filter will reduce the average intensity of the output image to zero.
Low frequencies in the transform are related to slowly varying intensity components in
an image (such as the walls of a room or a cloudless sky), while high frequencies are
caused by sharp transitions in intensity, such as edges and noise. A filter H(u,v) that
attenuates high frequencies while passing low frequencies (i.e., a lowpass filter) would
blur an image, while a filter with the opposite property (a highpass filter) would enhance
sharp detail but cause a reduction of contrast in the image.

Image of damaged integrated circuit

Fourier spectrum

F(0,0)=0


The DFT is a complex array of the form:

    F(u,v) = R(u,v) + i I(u,v)

    g(x,y) = \mathcal{F}^{-1}[H(u,v) R(u,v) + i H(u,v) I(u,v)]

The phase angle is not altered by filtering in this way. Filters that affect the real and the
imaginary parts equally, and thus have no effect on the phase, are called
zero-phase-shift filters. Even small changes in the phase angle can have undesirable
effects on the filtered output.


Main Steps for Filtering in the Frequency Domain

1. Given an input image f(x,y) of size M x N, obtain the padding parameters P and Q
   (usually P = 2M, Q = 2N).
2. Form a padded image f_p(x,y) of size P x Q by appending the necessary number of
   zeros to f(x,y) (f sits in the upper left corner of f_p).
3. Multiply f_p(x,y) by (-1)^{x+y} to center its transform.
4. Compute the DFT, F(u,v), of the image obtained in step 3.
5. Generate a real, symmetric filter function H(u,v) of size P x Q with center at
   coordinates (P/2, Q/2). Form the array product G(u,v) = H(u,v) F(u,v).
6. Obtain the processed image:

    g_p(x,y) = real{ \mathcal{F}^{-1}[G(u,v)] } (-1)^{x+y}

   The real part is selected in order to ignore parasitic complex components resulting
   from computational inaccuracies.
7. Obtain the output (filtered) image g(x,y) by extracting the M x N region from the top
   left corner of g_p(x,y).
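A sketch of steps 1 through 7 with a Gaussian lowpass as H (the image name and D0 = 60 are arbitrary assumptions):

f = im2double(imread('cameraman.tif'));
[M, N] = size(f);  P = 2*M;  Q = 2*N;           % step 1
fp = padarray(f, [M N], 0, 'post');             % step 2
[x, y] = meshgrid(0:Q-1, 0:P-1);                % column/row index grids
fp = fp .* (-1).^(x + y);                       % step 3: center the transform
F  = fft2(fp);                                  % step 4
D2 = (y - P/2).^2 + (x - Q/2).^2;               % squared distance to (P/2, Q/2)
H  = exp(-D2 / (2*60^2));                       % step 5: GLPF with D0 = 60
gp = real(ifft2(H .* F)) .* (-1).^(x + y);      % step 6
g  = gp(1:M, 1:N);                              % step 7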


Correspondence between Filtering in the Spatial and Frequency Domains

The link between filtering in the spatial domain and in the frequency domain is the
convolution theorem.
Given a filter H(u,v), suppose that we want to find its equivalent representation in the
spatial domain. For f(x,y) = \delta(x,y) we have F(u,v) = 1, and:

    g(x,y) = \mathcal{F}^{-1}[H(u,v) F(u,v)] = \mathcal{F}^{-1}[H(u,v)] = h(x,y)

The inverse transform of the frequency domain filter, h(x,y), is the corresponding filter
in the spatial domain. Conversely, given a spatial filter h(x,y), we obtain its frequency
domain representation by taking the Fourier transform of the spatial filter:

    h(x,y) \Leftrightarrow H(u,v)

h(x,y) is sometimes called the (finite) impulse response (FIR) of H(u,v).


One way to take advantage of the properties of both domains is to specify a filter in the
frequency domain, compute its IDFT, and then use the resulting full-size spatial filter as
a guide for constructing smaller spatial filter masks.
Let H(u) denote the 1-D frequency domain Gaussian filter:

    H(u) = A e^{-u^2 / (2\sigma^2)} ,  \sigma the standard deviation

The corresponding filter in the spatial domain is obtained by taking the inverse Fourier
transform of H(u):

    h(x) = \sqrt{2\pi} \sigma A e^{-2\pi^2 \sigma^2 x^2}

which is also a Gaussian filter. When H(u) has a broad profile (large value of \sigma),
h(x) has a narrow profile, and vice versa. As \sigma approaches infinity, H(u) tends
toward a constant function and h(x) tends toward an impulse, which implies no filtering
in either the frequency or the spatial domain.


Image Smoothing Using Frequency Domain Filters

Smoothing (blurring) is achieved in the frequency domain by high-frequency
attenuation, that is, by lowpass filtering. We consider three types of lowpass filters:

    ideal ,  Butterworth ,  Gaussian

The Butterworth filter has a parameter called the filter order. For high order values, the
Butterworth filter approaches the ideal filter; for low values, it is closer to a Gaussian
filter. All filters and images in these sections are considered padded with zeros, so they
have size P x Q. The Butterworth filter may be viewed as providing a transition between
the other two filters.

Ideal Lowpass Filter (ILPF)

    H(u,v) = 1 if D(u,v) \le D_0 ,  H(u,v) = 0 if D(u,v) > D_0

where D_0 > 0 is a positive constant and D(u,v) is the distance between (u,v) and the
center of the frequency rectangle:

    D(u,v) = \sqrt{(u - P/2)^2 + (v - Q/2)^2}    (DUV)

The name ideal indicates that all frequencies on or inside the


circle of radius D0 are passed without attenuation, whereas all
frequencies outside the circle are completely eliminated
(filtered out).
For an ILPF cross section, the point of transition between H(u,v) = 1 and H(u,v) = 0 is
called the cutoff frequency. The sharp cutoff frequencies of an ILPF cannot be realized
with electronic components, but they can be simulated in a computer.
We can compare lowpass filters by studying their behavior with respect to the cutoff
frequencies.


Butterworth Lowpass Filter (BLPF)

The transfer function of a Butterworth lowpass filter of order n, with cutoff frequency at
distance D_0 from the origin, is:

    H(u,v) = \frac{1}{1 + [D(u,v)/D_0]^{2n}}

where D(u,v) is given by relation (DUV).


The BLPF transfer function does not have a sharp discontinuity that gives a clear cutoff
between passed and filtered frequencies. For filters with smooth transfer functions, a
cutoff frequency locus is defined at the points for which H(u,v) is down to a certain
fraction of its maximum value.


Gaussian Lowpass Filter (GLPF)

    H(u,v) = e^{-D^2(u,v) / (2\sigma^2)} = e^{-D^2(u,v) / (2 D_0^2)}   (taking \sigma = D_0)

D_0 is the cutoff frequency. When D(u,v) = D_0, the GLPF is down to 0.607 of its
maximum value.
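A sketch building the three lowpass transfer functions on a P x Q grid (all numeric values are illustrative):

P = 512; Q = 512; D0 = 60; n = 2;
[x, y] = meshgrid(0:Q-1, 0:P-1);
D = sqrt((y - P/2).^2 + (x - Q/2).^2);     % relation (DUV)
Hi = double(D <= D0);                      % ideal lowpass
Hb = 1 ./ (1 + (D / D0).^(2*n));           % Butterworth, order n
Hg = exp(-D.^2 / (2 * D0^2));              % Gaussian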


Image Sharpening Using Frequency Domain Filters

Edges and other abrupt changes in intensity are associated with high-frequency
components, so image sharpening can be achieved in the frequency domain by highpass
filtering, which attenuates the low-frequency components without changing the
high-frequency information in the Fourier transform.
A highpass filter is obtained from a given lowpass filter using the equation:

    H_HP(u,v) = 1 - H_LP(u,v)

where H_LP(u,v) is the transfer function of the lowpass filter.


Ideal Highpass Filter (IHPF)

A 2-D ideal highpass filter is defined as:

    H(u,v) = 0 if D(u,v) \le D_0 ,  H(u,v) = 1 if D(u,v) > D_0

where D_0 is the cutoff frequency and D(u,v) is given by equation (DUV). As with the
ILPF, the IHPF is not physically realizable.


Butterworth Highpass Filter (BHPF)

The transfer function of a Butterworth highpass filter of order n, with cutoff frequency
at distance D_0 from the origin, is:

    H(u,v) = \frac{1}{1 + [D_0/D(u,v)]^{2n}}


Gaussian Highpass Filter (GHPF)

    H(u,v) = 1 - e^{-D^2(u,v) / (2 D_0^2)}


Figure 4.57(a) is a 1026x962 image of a thumb print in which smudges are present. A
key step in automated fingerprint recognition is the enhancement of print ridges and the
reduction of smudges. In this example a highpass filter was used to enhance the ridges
and reduce the effects of smudging. Enhancement of the ridges is accomplished by the
fact that they contain high frequencies, which are unchanged by a highpass filter. The
filter also reduces the low-frequency components, which correspond to slowly varying
intensities in the image, such as the background and smudges.
Figure 4.57(b) is the result of using a BHPF of order n = 4, with a cutoff frequency
D0 = 50.
Figure 4.57(c) is the result of setting to black all negative values and to white all positive
values in Figure 4.57(b) (a threshold intensity transformation).


The Laplacian in the Frequency Domain

The Laplacian can be implemented in the frequency domain using the filter:

    H(u,v) = -4\pi^2 (u^2 + v^2)

The centered Laplacian filter is:

    H(u,v) = -4\pi^2 [(u - P/2)^2 + (v - Q/2)^2] = -4\pi^2 D^2(u,v)

The Laplacian image is obtained as:

    \nabla^2 f(x,y) = \mathcal{F}^{-1}[H(u,v) F(u,v)]


Enhancement is obtained with the equation:

    g(x,y) = f(x,y) - \nabla^2 f(x,y)    (1)

Computing \nabla^2 f(x,y) with the above relation introduces DFT scaling factors that
can be several orders of magnitude larger than the maximum value of f. To fix this
problem, we normalize the values of f(x,y) to the range [0,1] (before computing its DFT)
and divide \nabla^2 f(x,y) by its maximum value, which brings it to the range [-1,1].
Alternatively, the two steps can be combined in the frequency domain:

    g(x,y) = \mathcal{F}^{-1}\{F(u,v) - H(u,v) F(u,v)\} = \mathcal{F}^{-1}\{[1 + 4\pi^2 D^2(u,v)] F(u,v)\}    (2)

The above formula is simple but has the same scaling problems as those mentioned
above; between (1) and (2), the former is preferred.


Unsharp Masking, Highboost Filtering and High-Frequency-Emphasis Filtering

    g_mask(x,y) = f(x,y) - f_LP(x,y) ,  f_LP(x,y) = \mathcal{F}^{-1}[H_LP(u,v) F(u,v)]

where H_LP(u,v) is a lowpass filter. Here f_LP(x,y) is a smoothed image analogous to
\bar{f}(x,y) from the spatial domain.

    g(x,y) = f(x,y) + k g_mask(x,y)   (k = 1: unsharp masking, k > 1: highboost filtering)

    g(x,y) = \mathcal{F}^{-1}\{[1 + k H_HP(u,v)] F(u,v)\}

The factor 1 + k H_HP(u,v) is called the high-frequency-emphasis filter. Highpass filters
set the dc term to zero, thus reducing the average intensity in the filtered image to 0.
The high-frequency-emphasis filter does not have this problem. The constant k gives
control over the proportion of high frequencies that influence the final result. A more
general high-frequency-emphasis filter is:

    g(x,y) = \mathcal{F}^{-1}\{[k_1 + k_2 H_HP(u,v)] F(u,v)\}

where k_1 \ge 0 controls the offset from the origin and k_2 \ge 0 controls the
contribution of high frequencies.
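A sketch of high-frequency emphasis with a Gaussian highpass (k1 = 0.5, k2 = 0.75, and D0 = 40 are arbitrary illustrative values; padding is omitted for brevity):

f = im2double(imread('cameraman.tif'));
[P, Q] = size(f);
[x, y] = meshgrid(0:Q-1, 0:P-1);
D2  = (y - P/2).^2 + (x - Q/2).^2;
Hhp = 1 - exp(-D2 / (2*40^2));                       % GHPF
F = fftshift(fft2(f));                               % centered DFT
g = real(ifft2(ifftshift((0.5 + 0.75*Hhp) .* F)));   % k1 + k2*Hhp applied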


Homomorphic Filtering
An image can be expressed as the product of its illumination i(x,y) and reflectance r(x,y)
components:

    f(x,y) = i(x,y) r(x,y)

Because \mathcal{F}\{f(x,y)\} \ne \mathcal{F}\{i(x,y)\} \mathcal{F}\{r(x,y)\}, consider instead:

    z(x,y) = \ln f(x,y) = \ln i(x,y) + \ln r(x,y)

Taking the Fourier transform of this relation we have:

    Z(u,v) = F_i(u,v) + F_r(u,v)

where Z, F_i, F_r are the Fourier transforms of z(x,y), \ln i(x,y), \ln r(x,y), respectively.
We can filter Z(u,v) using a filter H(u,v), so that:

    S(u,v) = H(u,v) Z(u,v) = H(u,v) F_i(u,v) + H(u,v) F_r(u,v)

The filtered image in the spatial domain is:

    s(x,y) = \mathcal{F}^{-1}\{S(u,v)\} = \mathcal{F}^{-1}\{H(u,v) F_i(u,v)\} + \mathcal{F}^{-1}\{H(u,v) F_r(u,v)\}

Define:

    i'(x,y) = \mathcal{F}^{-1}\{H(u,v) F_i(u,v)\} ,  r'(x,y) = \mathcal{F}^{-1}\{H(u,v) F_r(u,v)\}


Because z(x,y) = \ln f(x,y), we reverse the process to produce the output (filtered) image:

    s(x,y) = i'(x,y) + r'(x,y)

    g(x,y) = e^{s(x,y)} = e^{i'(x,y)} e^{r'(x,y)} = i_0(x,y) r_0(x,y)

    i_0(x,y) = e^{i'(x,y)} ,  the illumination of the output image
    r_0(x,y) = e^{r'(x,y)} ,  the reflectance of the output image


The illumination component of an image generally is


characterized by slow spatial variations, while the reflectance
component tends to vary abruptly, particularly at the junction
of dissimilar objects. These characteristics lead to associating
the low frequencies of the Fourier transform of the logarithm
of an image with illumination and the high frequencies with
reflectance.


Selective Filtering
There are applications in which it is of interest to process specific bands of frequencies
(bandreject or bandpass filters) or small regions of the frequency rectangle (notch filters).

Bandreject and Bandpass Filters

Ideal bandreject filter:

    H(u,v) = 0 if D_0 - W/2 \le D(u,v) \le D_0 + W/2 ,  H(u,v) = 1 otherwise


Butterworth bandreject filter:

    H(u,v) = \frac{1}{1 + \left[\frac{W D(u,v)}{D^2(u,v) - D_0^2}\right]^{2n}}

Gaussian bandreject filter:

    H(u,v) = 1 - e^{-\left[\frac{D^2(u,v) - D_0^2}{W D(u,v)}\right]^2}

In the above bandreject filters (ideal, Butterworth, and Gaussian), D(u,v) is the distance
from the center of the


frequency rectangle, given by relation (DUV); D_0 is the radial center of the band, and
W is the width of the band.
A bandpass filter is obtained from a bandreject filter using the formula:

    H_BP(u,v) = 1 - H_BR(u,v)


Notch Filters
A notch filter rejects (or passes) frequencies in a predefined neighborhood about the
center of the frequency rectangle. Zero-phase-shift filters must be symmetric about the
origin, so a notch filter with center at (u_0, v_0) must have a corresponding notch at
location (-u_0, -v_0).
Notch reject filters are constructed as products of highpass filters whose centers have
been translated to the centers of the notches. The general form is:

    H_NR(u,v) = \prod_{k=1}^{Q} H_k(u,v) H_{-k}(u,v)

where H_k(u,v) and H_{-k}(u,v) are highpass filters whose centers are at (u_k, v_k) and
(-u_k, -v_k), respectively. These centers are specified with respect to the center of the
frequency rectangle, (M/2, N/2). The distance computations for each filter are made
using the expressions:

    D_k(u,v) = \sqrt{(u - M/2 - u_k)^2 + (v - N/2 - v_k)^2}

    D_{-k}(u,v) = \sqrt{(u - M/2 + u_k)^2 + (v - N/2 + v_k)^2}

A Butterworth notch reject filter of order n with 3 notch pairs:

    H_NR(u,v) = \prod_{k=1}^{3} \left\{\frac{1}{1 + [D_{0k}/D_k(u,v)]^{2n}}\right\} \left\{\frac{1}{1 + [D_{0k}/D_{-k}(u,v)]^{2n}}\right\}


A notch pass filter is obtained from a notch reject filter using the expression:

    H_NP(u,v) = 1 - H_NR(u,v)

One of the applications of notch filtering is selectively modifying local regions of the
DFT. This type of processing is done interactively, working directly on DFTs obtained
without padding.


Figure 4.65(a) shows an image of part of the rings surrounding Saturn. The vertical
sinusoidal pattern was caused by an AC signal superimposed on the video camera signal
just prior to digitizing the image. Figure 4.65(b) shows the DFT spectrum; the white
vertical lines that appear in the DFT correspond to the nearly sinusoidal interference.
The problem was solved by using the narrow notch rectangle filter shown in
Figure 4.65(c).


Image Restoration and Reconstruction

Restoration attempts to recover an image that has been degraded, supposing we have
some knowledge of the degradation phenomenon. Restoration techniques are oriented
toward modeling the degradation and applying the inverse process in order to recover
the original image. This involves formulating a criterion of goodness that will produce
an optimal estimate of the desired result. Enhancement techniques, by contrast, are
basically heuristic procedures designed to manipulate an image in order to satisfy the
demands of the human visual system. Contrast stretching is considered an enhancement
technique (it is done to please, in some sense, the viewer), while removal of image blur
by applying a deblurring function is considered a restoration technique.

A Model of the Image Degradation/Restoration Process

We consider the case when the degraded image, g(x,y), is obtained from the original,
f(x,y), by applying a degradation function together with an additive noise term:

    g(x,y) = H[f(x,y)] + \eta(x,y)

Given g(x,y), some knowledge about the degradation function H, and some knowledge
about the additive noise term \eta(x,y), the objective of restoration is to obtain an
estimate \hat{f}(x,y) of the original image. The more we know about H and \eta, the
closer \hat{f}(x,y) will be to f(x,y).

Noise Models

    g(x,y) = f(x,y) + \eta(x,y)   (H = I, the identity)

The main sources of noise in digital images arise during image acquisition and/or
transmission (environmental conditions during image acquisition, the quality of the
sensors).


Parameters that define the spatial characteristics of the


noise and whether the noise is correlated with the image are
important properties to be studied. We assume that the noise
is independent of spatial coordinates and that it is
uncorrelated with the image itself (i.e. there is no correlation
between pixel values and the values of noise components).


Some Important Noise Probability Density Functions

The noise may be considered a random variable, characterized by a probability density
function (pdf).

Gaussian noise:

    p(z) = \frac{1}{\sqrt{2\pi}\sigma} e^{-(z - \bar{z})^2 / (2\sigma^2)}

where z represents intensity, \bar{z} is the mean value of z, and \sigma is its standard
deviation; \sigma^2 is called the variance of z.


Rayleigh noise:

    p(z) = \frac{2}{b}(z - a) e^{-(z-a)^2 / b} for z \ge a ,  p(z) = 0 for z < a

The mean and variance for this pdf are:

    \bar{z} = a + \sqrt{\pi b / 4} ,  \sigma^2 = \frac{b(4 - \pi)}{4}


Erlang (gamma) noise:

    p(z) = \frac{a^b z^{b-1}}{(b-1)!} e^{-a z} for z \ge 0 ,  p(z) = 0 for z < 0   (a > 0, b a positive integer)

    \bar{z} = \frac{b}{a} ,  \sigma^2 = \frac{b}{a^2}


Exponential noise:

    p(z) = a e^{-a z} for z \ge 0 ,  p(z) = 0 for z < 0   (a > 0)

    \bar{z} = \frac{1}{a} ,  \sigma^2 = \frac{1}{a^2}

(the Erlang pdf with b = 1)


Uniform noise:

    p(z) = \frac{1}{b - a} for a \le z \le b ,  p(z) = 0 otherwise

    \bar{z} = \frac{a + b}{2} ,  \sigma^2 = \frac{(b - a)^2}{12}


Impulse (salt-and-pepper) noise
The pdf of (bipolar) impulse noise is given by:

    p(z) = P_a for z = a ,  p(z) = P_b for z = b ,  p(z) = 0 otherwise

If b > a, intensity b appears as a light dot in the image and intensity a as a dark dot.
If P_a = 0 or P_b = 0, the impulse noise is called unipolar.


When P_a ≈ P_b, impulse noise values will resemble salt and pepper granules randomly
distributed over the image. For this reason, bipolar impulse noise is also called
salt-and-pepper noise.
Noise impulses can be negative or positive. Because impulse corruption usually is large
compared with the strength of the image signal, impulse noise generally is digitized as
extreme (pure black or white) values in an image. Thus, the assumption is that a and b
are equal to the minimum and maximum allowed values in the digitized image. As a
result, negative impulses appear as black (pepper) points in an image, and positive
impulses appear as white (salt) points.
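imnoise generates several of these noise models directly; a sketch (parameter values are arbitrary):

f = im2double(imread('cameraman.tif'));
g1 = imnoise(f, 'gaussian', 0, 0.01);     % Gaussian: mean 0, variance 0.01
g2 = imnoise(f, 'salt & pepper', 0.05);   % bipolar impulse noise, 5% density
g3 = f + 0.1 * randn(size(f));            % hand-rolled Gaussian noise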


Periodic noise

Periodic noise arises from electrical or electromechanical


interference during image acquisition. This type of noise is
spatially dependent and can be reduced significantly via
frequency domain filtering.


Figure 5.5(a) is corrupted by sinusoidal noise of various


frequencies. The Fourier transform of a pure sinusoid is a pair
of conjugate impulses located at the conjugate frequencies of
the sine wave. If the amplitude of a sine wave in the spatial
domain is strong enough, we would expect to see in the
spectrum of the image a pair of impulses for each sine wave
in the image. In Figure 5.5(b) we can see the impulses
appearing in a circle.


Estimation of Noise Parameters

The parameters of periodic noise are estimated by inspection


of the Fourier spectrum of the image. Sometimes it is possible
to deduce the periodicity of noise just by looking at the
image.
The parameters of noise pdfs may be known partially from sensor specifications. If the
imaging system is available, one simple way to study the characteristics of system noise
is to capture a set of images of flat environments (in the case of


an optical sensor, this means taking images of a solid gray board that is illuminated
uniformly). The resulting images are good indicators of system noise.
When only images already generated by a sensor are available, it is frequently possible
to estimate the parameters of the pdf from small portions of the image that have a
constant background intensity.
The simplest use of the data from such image strips is for calculating the mean and the
variance of the intensity levels.


Consider a subimage S and let p_S(z_i), i = 0, 1, 2, ..., L-1, denote the probability
estimates (normalized histogram values) of the intensities of the pixels in S, where L is
the number of possible intensities in the entire image. We estimate the mean and the
variance of the pixels in S:

    \bar{z} = \sum_{i=0}^{L-1} z_i p_S(z_i) ,  \sigma^2 = \sum_{i=0}^{L-1} (z_i - \bar{z})^2 p_S(z_i)


The shape of the histogram identifies the closest pdf match. If the shape is approximately
Gaussian, then the mean and the variance are all we need. For the other shapes, we use
the mean and the variance to solve for the parameters a and b.
Impulse noise is handled differently because the estimate needed is of the actual
probability of occurrence of white and black pixels. Obtaining this estimate requires that
both black and white pixels be visible, so a midgray, relatively constant area is needed in
the image in order to be able to compute a histogram. The heights of the peaks
corresponding to black and white pixels are the estimates of P_a and P_b.


Restoration in the Presence of Noise Only: Spatial Filtering

    g(x,y) = f(x,y) + \eta(x,y)    (1)

    G(u,v) = F(u,v) + N(u,v)    (2)

The noise terms are unknown. In the case of periodic noise, it is usually possible to
estimate N(u,v) from the spectrum of G(u,v). In this case, an estimate of the original
image is given by:

    \hat{f}(x,y) = \mathcal{F}^{-1}[G(u,v) - N_e(u,v)]

Spatial filtering is the method of choice in situations when only additive random noise
is present.

Mean Filters

Let S_xy represent a rectangular neighborhood of size m x n centered at point (x,y).

Arithmetic mean filter:

    \hat{f}(x,y) = \frac{1}{mn} \sum_{(s,t) \in S_xy} g(s,t)

A mean filter smooths local variations of an image, and noise is reduced as a result of
blurring.

Geometric mean filter:

    \hat{f}(x,y) = \left[\prod_{(s,t) \in S_xy} g(s,t)\right]^{1/(mn)}

A geometric mean filter achieves smoothing comparable to the arithmetic mean filter,
but it tends to lose less image detail.


Harmonic mean filter:

    \hat{f}(x,y) = \frac{mn}{\sum_{(s,t) \in S_xy} \frac{1}{g(s,t)}}

The harmonic mean filter works well for salt noise, but fails for pepper noise. It also
works well on Gaussian noise.

Contraharmonic mean filter:

    \hat{f}(x,y) = \frac{\sum_{(s,t) \in S_xy} g(s,t)^{Q+1}}{\sum_{(s,t) \in S_xy} g(s,t)^{Q}} ,  where Q is the order of the filter

This filter is good for reducing or virtually eliminating the effects of salt-and-pepper
noise. For Q > 0 the filter eliminates pepper noise; for Q < 0 it eliminates salt noise; it
cannot do both simultaneously.
Q = 0 gives the arithmetic mean filter; Q = -1 gives the harmonic mean filter.
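The contraharmonic filter is two correlations and a division; a sketch (Q = 1.5 attacks pepper noise; the test image and noise density are assumptions):

g = im2double(imnoise(imread('cameraman.tif'), 'salt & pepper', 0.05));
Q = 1.5;  w = ones(3);
num  = imfilter(g.^(Q+1), w, 'replicate');
den  = imfilter(g.^Q,     w, 'replicate');
fhat = num ./ (den + eps);              % eps guards against division by zero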


Order-Statistic Filters
Median filter:

    \hat{f}(x,y) = median{ g(s,t) ; (s,t) \in S_xy }

Median filters have excellent noise-reduction capabilities, with less blurring than linear
smoothing filters. Median filters


are particularly effective in the presence of bipolar and unipolar impulse noise.

Max and min filters:

    \hat{f}(x,y) = max{ g(s,t) ; (s,t) \in S_xy }

This filter is useful for finding the brightest points in an image; it reduces pepper noise.

    \hat{f}(x,y) = min{ g(s,t) ; (s,t) \in S_xy }

This filter is useful for finding the darkest points in an image; it reduces salt noise.

Midpoint filter

$$\hat{f}(x, y) = \frac{1}{2} \left[ \max_{(s,t) \in S_{xy}} \{ g(s, t) \} + \min_{(s,t) \in S_{xy}} \{ g(s, t) \} \right]$$

It works best for randomly distributed noise, like Gaussian or


uniform noise.
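SciPy's order-statistic filters map directly onto these definitions; a short illustrative sketch with a 3×3 window assumed:

    import numpy as np
    from scipy.ndimage import median_filter, maximum_filter, minimum_filter

    def midpoint_filter(g, size=3):
        # average of the local max and the local min
        g = g.astype(np.float64)
        return 0.5 * (maximum_filter(g, size) + minimum_filter(g, size))

    # median_filter(g, 3)  -- median filter
    # maximum_filter(g, 3) -- max filter (reduces pepper noise)
    # minimum_filter(g, 3) -- min filter (reduces salt noise)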

Linear, Position-Invariant Degradations

$$g(x, y) = H[f(x, y)] + \eta(x, y)$$

Assume that H is linear:

$$H[a f_1(x, y) + b f_2(x, y)] = a H[f_1(x, y)] + b H[f_2(x, y)]$$

for any scalars a, b and any images f1, f2.

The operator H[f(x,y)] = g(x,y) is said to be position (or
space) invariant if:

$$H[f(x - \alpha, y - \beta)] = g(x - \alpha, y - \beta)$$

for any f and any $\alpha$, $\beta$.

This definition indicates that the response at any point in the


image depends only on the value of the input at that point, not
on its position.
Let $\delta(x - \alpha, y - \beta)$ be an impulse; the impulse response of
H is:

$$h(x, \alpha, y, \beta) = H[\delta(x - \alpha, y - \beta)]$$

The function $h(x, \alpha, y, \beta)$ is also called the point spread
function.

We have the following relations:


$$g(x, y) = h(x, y) \star f(x, y) + \eta(x, y)$$

or in the frequency domain:

$$G(u, v) = H(u, v) F(u, v) + N(u, v)$$

A linear, spatially-invariant degradation system with additive


noise can be modeled in the spatial domain as the convolution
of the degradation (point spread) function with an image,
followed by the addition of noise. In the frequency domain
the transformation is given as the product of the transforms

of the image and degradation, followed by the addition of the
transform of the noise.
Because degradations are modeled as being the result of
convolution, and restoration is the reverse process, the term
image deconvolution is used for linear image restoration, and
the filters are referred to as deconvolution filters.

Estimating the Degradation Function

There are three principal ways to estimate the degradation
function for use in image restoration:
1. observation
2. experimentation
3. mathematical modelling

Estimation by Image Observation

Suppose that we have a degraded image without any


knowledge about the degradation function. Assuming that the

image was degraded by a linear, position-invariant process,


one way to estimate H is to gather information from the
image itself.
If the image is blurred, we can study a small rectangular
section of the image containing sample structures (part of an
object and the background). In order to reduce the effects of
noise, we would look for an area in which the signal content
is strong (e.g. an area of high contrast). The next step is to

process the subimage in order to unblur it as much as
possible (for example, by using a sharpening filter).
Let gs(x,y) denote the observed subimage, and let $\hat{f}_s(x, y)$ be the
processed subimage. Assuming that the effect of noise is
negligible (because of the choice of a strong-signal area), it
follows that:

$$H_s(u, v) = \frac{G_s(u, v)}{\hat{F}_s(u, v)}$$

Based on the assumption of position invariance, we can


deduce from the above function the characteristics of the
complete degradation function H.
Estimation by Experimentation

Suppose that equipment similar to the equipment used to
acquire the degraded image is available. Images similar to the
degraded image can be acquired with various system settings
until they are degraded as closely as possible to the image we
want to restore. The idea is to obtain the impulse response of

the degradation by imaging an impulse (a small dot of light)
using the same system settings. We know that a linear,
space-invariant system is characterized completely by its
impulse response.
An impulse is simulated by a bright dot of light, as bright as
possible to reduce the effect of noise almost to zero. Using
the relation:

$$G(u, v) = H(u, v) F(u, v) + N(u, v)$$

where F(u,v) = A (the Fourier transform of the impulse) and
neglecting the noise term, we get:

$$H(u, v) = \frac{G(u, v)}{A}$$

G(u,v) is the Fourier transform of the observed image and A
is a constant describing the strength of the impulse.

Estimation by Modelling

A degradation model proposed by Hufnagel and Stanley is
based on the physical characteristics of atmospheric
turbulence:

$$H(u, v) = e^{-k (u^2 + v^2)^{5/6}}$$

where k is a constant that depends on the nature of the
turbulence.
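A sketch of this model as a frequency-domain transfer function (illustrative, not from the slides; the value k = 0.0025 is just one plausible setting for fairly severe turbulence):

    import numpy as np

    def turbulence_otf(M, N, k=0.0025):
        # H(u,v) = exp(-k (u^2 + v^2)^(5/6)) on centered frequencies
        u = np.arange(M) - M // 2
        v = np.arange(N) - N // 2
        V, U = np.meshgrid(v, u)
        H = np.exp(-k * (U ** 2 + V ** 2) ** (5 / 6))
        return np.fft.ifftshift(H)   # align with an unshifted FFT

    # blurred = np.real(np.fft.ifft2(np.fft.fft2(f) * turbulence_otf(*f.shape)))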

Another approach in modeling is to derive a mathematical


model from basic principles.
Suppose that an image has been blurred by uniform linear
motion between the image and the sensor during image
acquisition. Suppose that an image f(x,y) undergoes planar
motion and that x0(t) and y0(t) are the time varying
components of motion in the x- and y-directions, respectively.

Assuming that shutter opening and closing takes place


instantaneously and that the optical imaging process is
perfect, we can simulate the effect of image motion.
If T is the duration of the exposure, we have:

$$g(x, y) = \int_0^T f(x - x_0(t), y - y_0(t)) \, dt$$

g(x,y) is the blurred image. We can compute the Fourier
transform of g with respect to the Fourier transform of the
unblurred image f:

$$G(u, v) = F(u, v) \int_0^T e^{-i 2\pi [u x_0(t) + v y_0(t)]} \, dt$$

$$H(u, v) = \int_0^T e^{-i 2\pi [u x_0(t) + v y_0(t)]} \, dt$$

$$G(u, v) = H(u, v) F(u, v)$$

If the motion variables x0(t) and y0(t) are known, the
transfer function H(u,v) can be computed using the formula
above. For uniform motion in the x-direction only:

$$x_0(t) = \frac{at}{T}, \;\; y_0(t) = 0 \;\Rightarrow\; H(u, v) = \frac{T}{\pi u a} \sin(\pi u a) \, e^{-i \pi u a}$$

For uniform motion in both directions:

$$x_0(t) = \frac{at}{T}, \;\; y_0(t) = \frac{bt}{T} \;\Rightarrow\; H(u, v) = \frac{T}{\pi (ua + vb)} \sin(\pi (ua + vb)) \, e^{-i \pi (ua + vb)}$$
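The same H(u,v) can be generated numerically; a hedged sketch using np.sinc, which equals sin(πx)/(πx) and therefore handles the (ua+vb) → 0 limit cleanly. The parameters a, b, and T here are illustrative choices:

    import numpy as np

    def motion_blur_otf(M, N, a=0.1, b=0.1, T=1.0):
        # H(u,v) = T/(pi(ua+vb)) sin(pi(ua+vb)) exp(-i pi(ua+vb))
        u = np.arange(M) - M // 2
        v = np.arange(N) - N // 2
        V, U = np.meshgrid(v, u)
        s = U * a + V * b
        H = T * np.sinc(s) * np.exp(-1j * np.pi * s)
        return np.fft.ifftshift(H)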

Inverse Filtering

The simplest approach to restoration is direct inverse filtering,


where we compute an estimate $\hat{F}(u, v)$ of the transform of the
original image simply by dividing the transform of the
degraded image G(u,v) by the degradation function (an
elementwise, array operation):

$$\hat{F}(u, v) = \frac{G(u, v)}{H(u, v)}$$

Since $G(u, v) = H(u, v) F(u, v) + N(u, v)$, it follows that:

$$\hat{F}(u, v) = F(u, v) + \frac{N(u, v)}{H(u, v)}$$

Even if we know the degradation function, we cannot recover
the undegraded image exactly because N(u,v) is not known.
Another problem appears when the degradation function has
zero or very small values: the term N(u,v)/H(u,v) then
dominates the estimate $\hat{F}(u, v)$.

Minimum Mean Square Error (Wiener) Filtering

This approach incorporates both the degradation function and
the statistical characteristics of the noise into the restoration
process. The method is founded on considering images and
noise as random variables, and the objective is to find an
estimate $\hat{f}$ of the uncorrupted image f such that the mean
square error between them is minimized. The error measure is
given by:

$$e^2 = E\{ (f - \hat{f})^2 \} \quad (1)$$

It is assumed that:
- the noise and the image are uncorrelated;
- the noise or the image has zero mean;
- the intensity levels in the estimate are a linear function of
the levels in the degraded image
From relation (1) we get:

$$\hat{F}(u, v) = \left[ \frac{1}{H(u, v)} \cdot \frac{|H(u, v)|^2}{|H(u, v)|^2 + S_\eta(u, v) / S_f(u, v)} \right] G(u, v) \quad (2)$$

H(u,v) — degradation function
$S_\eta(u, v) = |N(u, v)|^2$ — power spectrum of the noise
$S_f(u, v) = |F(u, v)|^2$ — power spectrum of the undegraded image

Relation (2) is known as the Wiener filter. The term of (2) inside
the brackets is referred to as the minimum mean square error
filter or the least square error filter.

A number of useful measures are based on the power
spectra of the noise and of the undegraded image.

Signal-to-noise ratio:

$$\mathrm{SNR} = \frac{\sum_{u=0}^{M-1} \sum_{v=0}^{N-1} |F(u, v)|^2}{\sum_{u=0}^{M-1} \sum_{v=0}^{N-1} |N(u, v)|^2}$$

This ratio gives a measure of the level of information-bearing
signal power (i.e. of the original, undegraded image) to the
level of noise power. Images with low noise tend to have a
high SNR; conversely, a high level of noise implies a low
SNR. This ratio is an important metric used to characterize
the performance of restoration algorithms.
Mean square error (a spatial-domain approximation of (1)):

$$\mathrm{MSE} = \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} \left[ f(x, y) - \hat{f}(x, y) \right]^2$$

If the restored image is considered to be "signal" and the
difference between this image and the original is "noise", we can
define a signal-to-noise ratio in the spatial domain as:

$$\mathrm{SNR} = \frac{\sum_{x=0}^{M-1} \sum_{y=0}^{N-1} \hat{f}(x, y)^2}{\sum_{x=0}^{M-1} \sum_{y=0}^{N-1} \left[ f(x, y) - \hat{f}(x, y) \right]^2}$$

The closer f and $\hat{f}$ are, the larger this ratio will be.

When we are dealing with spectrally white noise
(|N(u,v)|² = const.), relation (2) simplifies. However, the power
spectrum of the undegraded image is rarely known. An
approach used frequently in this case is to replace the ratio
$S_\eta / S_f$ with a constant K:

$$\hat{F}(u, v) = \left[ \frac{1}{H(u, v)} \cdot \frac{|H(u, v)|^2}{|H(u, v)|^2 + K} \right] G(u, v)$$
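In code it is safer to use the algebraically equivalent form conj(H)/(|H|² + K), which avoids dividing by H where H = 0. A minimal illustrative sketch (K is typically tuned interactively):

    import numpy as np

    def wiener_restore(g, H, K=0.01):
        # F_hat = conj(H)/(|H|^2 + K) * G == (1/H)(|H|^2/(|H|^2+K)) G
        G = np.fft.fft2(g)
        F_hat = np.conj(H) / (np.abs(H) ** 2 + K) * G
        return np.real(np.fft.ifft2(F_hat))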

Color Image Processing

Color is a very important characteristic of an image that in
most cases simplifies object identification and extraction
from a scene. The human eye can discern thousands of color
shades and intensities, but only about two dozen shades of gray.
Color image processing is divided into two major areas: full-color
(images acquired with a full-color sensor) and pseudocolor
(gray-scale images to which color is assigned) processing.

In 1666, Sir Isaac Newton discovered that when a beam of


sunlight passes through a glass prism, the emerging beam of
light is not white but consists instead of a continuous
spectrum of colors ranging from violet at one end to red at
the other. The color spectrum may be divided into 6 broad
regions: violet, blue, green, yellow, orange, and red.

The colors that humans perceive in an object are
determined by the nature of the light reflected from the
object. Visible light is composed of a relatively narrow band
of frequencies in the electromagnetic spectrum (roughly 390 nm
to 750 nm). A body that reflects light that is balanced in all
visible wavelengths appears white to the observer. A body
that favors reflectance in a limited range of the visible
spectrum exhibits some shade of color.

For example, blue objects reflect light with wavelengths from


450 to 475 nm, while absorbing most of the energy of other
wavelengths.

How to characterize light? If the light is achromatic (void of


color) its only attribute is its intensity (or amount)
determined by levels of gray (black-grays-white).
Chromatic light spans the electromagnetic spectrum from
approximately 400 to 720 nm. Three basic quantities are used
to describe the quality of a chromatic light source: radiance,
luminance, and brightness.
- Radiance is the total amount of energy that flows from the
light source (usually measured in watts).
- Luminance (measured in lumens, lm) gives a measure of
  the amount of energy an observer perceives from a light
  source. For example, light emitted from a source
  operating in the infrared region of the spectrum could have
  significant energy (radiance), but an observer would hardly
  perceive it (its luminance is almost zero).
- Brightness is a subjective descriptor that cannot be
  measured; it embodies the achromatic notion of intensity
  and is a key factor in describing color sensation.
Cones are the sensors in the eye responsible for color vision.
It has been established that the 6 to 7 million cones in the
human eye can be divided into three principal sensing
categories, corresponding roughly to red, green, and blue.
Approximately 65% of all cones are sensitive to red light,
33% are sensitive to green light, and only about 2% are
sensitive to blue (but the blue cones are the most sensitive).

Due to these absorption characteristics of the human eye,
colors are seen as variable combinations of the so-called
primary colors: red (R), green (G), and blue (B).

For the purpose of standardization, the CIE (Commission
Internationale de l'Eclairage) designated in 1931 the
following specific wavelength values for the three primary
colors: blue = 435.8 nm, green = 546.1 nm, and red = 700 nm.
The CIE standards correspond only approximately with
experimental data.
These three standard primary colors, when mixed in various
intensity proportions, can produce all visible colors.

The primary colors can be added to produce the secondary
colors of light: magenta (red+blue), cyan (green+blue), and
yellow (red+green). Mixing the three primaries, or a
secondary with its opposite primary color, in the right
intensities produces white light.
We must differentiate between the primary colors of light
and the primary colors of pigments. A primary color of
pigments is one that subtracts or absorbs a primary color of
light and reflects or transmits the other two. Therefore, the

primary colors of pigments are magenta, cyan, and yellow,


and the secondary colors are red, green, and blue.

The characteristics usually used to distinguish one color from
another are brightness, hue, and saturation. Brightness
embodies the achromatic notion of intensity. Hue is an
attribute associated with the dominant wavelength in a
mixture of light waves; it represents the dominant color as
perceived by an observer (when we call an object red,
orange, or yellow, we refer to its hue). Saturation refers to
the relative purity, or the amount of white light mixed with a
hue. The pure spectrum colors are fully saturated. Colors such

as pink (red+white) and lavender (violet+white) are less
saturated, with the degree of saturation being inversely
proportional to the amount of white light added.
Hue and saturation taken together are called chromaticity,
and therefore a color may be characterized by its brightness
and chromaticity.
The amounts of red, green, and blue needed to form any
particular color are called the tristimulus values and are

denoted X, Y, and Z, respectively. A color is then specified by its
trichromatic coefficients, defined as:

$$x = \frac{X}{X + Y + Z}, \quad y = \frac{Y}{X + Y + Z}, \quad z = \frac{Z}{X + Y + Z}$$

$$x + y + z = 1$$

For any wavelength of light in the visible spectrum, the
tristimulus values needed to produce the color corresponding
to that wavelength can be obtained from existing curves
or tables.
Another approach for specifying colors is to use the CIE
chromaticity diagram, which shows color composition as a
function of x (red) and y (green); z (blue) is obtained from
the relation z = 1 - x - y.

The positions of the various spectrum colors (from violet at


380 nm to red at 780 nm) are indicated around the boundary
of the tongue-shaped chromaticity diagram.
The chromaticity diagram is useful for color mixing because a
straight-line segment joining any two points in the diagram
defines all the different color variations that can be obtained
by combining these two colors. The procedure can be
extended to three colors: the triangle determined by the three

color-points on the diagram embodies all the possible colors


that can be obtained by mixing the three colors.

Color Models
A color model (color space or color system) is a
specification of a coordinate system and a subspace within
that system where each color is represented by a single point.
http://www.colorcube.com/articles/models/model.htm
Most color models in use today are oriented either toward
hardware (color monitors or printers) or toward applications
where color manipulation is a goal.

The most commonly used hardware-oriented model is RGB
(red-green-blue), used for color monitors and color video cameras.
The CMY (cyan-magenta-yellow) and CMYK
(cyan-magenta-yellow-black) models are used for color
printing.
The HSI (hue-saturation-intensity) model corresponds to
the way humans describe and interpret colors. The HSI model
has the advantage that it decouples the color and gray-scale

information in an image, making it suitable for use with
gray-scale image processing techniques.

The RGB Color Model

In the RGB model, each color appears decomposed into its
primary color components: red, green, and blue. This model is
based on a Cartesian coordinate system. The color subspace
of interest is the unit cube (Figure 6.7), in which the primary

and the secondary colors are at the corners; black is at the
origin, and white is at the corner farthest from the origin.
The gray scale (points of equal RGB values) extends from
black to white along the line joining these two points. The
different colors in this model are points on or inside the cube,


and are defined by vectors extending from the origin.

Images represented in the RGB color model consist of three


component images, one for each primary color. The number
of bits used to represent each pixel in RGB space is called the
pixel depth. Consider an RGB image in which each of the red,


green, and blue images are an 8-bit image. In this case, each
RGB color pixel has a depth of 24 bits. The term full-color
image is used often to denote a 24-bit RGB color image. The
total number of colors in a 24-bit RGB image is

$$(2^8)^3 = 16{,}777{,}216$$

A convenient way to view these colors is to generate color
planes (faces or cross sections of the cube).

A color image can be acquired by using three filters, sensitive


to red, green, and blue.

Because of the variety of systems in use, it is of considerable
interest to have a subset of colors that are likely to be
reproduced faithfully, reasonably independently of viewer
hardware capabilities. This subset of colors is called the set of
safe RGB colors, or the set of all-systems-safe colors. In
Internet applications, they are called safe Web colors or safe
browser colors.
We assume that 256 colors is the minimum number of colors
that can be reproduced faithfully by any system. Forty of
these 256 colors are known to be processed differently by
various operating systems. This leaves 216 colors that are
common to most systems; these are the safe colors, especially
in Internet applications. Each of the 216 safe colors has an
RGB representation with:

$$R, G, B \in \{ 0, 51, 102, 153, 204, 255 \}$$

We have 6³ = 216 possible color values. It is customary to
express these values in the hexadecimal number system.

Each safe color is formed from three of the two-digit hex
numbers from the above table. For example, purest red is
FF0000. The values 000000 and FFFFFF represent black and
white, respectively.
Figure 6.10(a) shows the 216 safe colors, organized in
descending RGB values. Figure 6.10(b) shows the hex codes
for all the possible gray colors in the 216 safe-color system.
Figure 6.11 shows the RGB safe-color cube.
http://www.techbomb.com/websafe/
The CMY and CMYK Color Models

Cyan, magenta, and yellow are the secondary colors of light
but the primary colors of pigments. For example, when a
surface coated with yellow pigment is illuminated with white
light, no blue light is reflected from the surface. Yellow
subtracts blue light from reflected white light (which is itself
composed of equal amounts of red, green, and blue light).
Most devices that deposit color pigments on paper, such as
color printers and copiers, require CMY data input and
perform an RGB to CMY conversion internally. Assuming that the
color values have been normalized to the range [0,1], this
conversion is:

$$\begin{bmatrix} C \\ M \\ Y \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} - \begin{bmatrix} R \\ G \\ B \end{bmatrix}$$

From this equation we can easily deduce that pure cyan does
not reflect red, pure magenta does not reflect green, and pure
yellow does not reflect blue.

Equal amounts of the pigment primaries (cyan, magenta, and
yellow) should produce black. In practice, combining these
colors for printing produces a muddy-looking black. In order
to produce true black (which is the predominant color in
printing), a fourth color, black, is added, giving rise to the
CMYK color model.

The HSI Color Model


The RGB, CMY, and other similar color models are not well
suited for describing colors in terms that are practical for
human interpretation.
We (humans) describe a color by its hue, saturation and
brightness. Hue is a color attribute that describes a pure color,
saturation gives a measure of the degree to which a pure color
is diluted by white light and brightness is a subjective
descriptor that embodies the achromatic notion of intensity.
The HSI (hue, saturation, intensity) color model decouples
the intensity component from the color information (hue and
saturation) in a color image.
What is the link between the RGB color model and the HSI
color model? Consider again the RGB unit cube. The intensity
axis is the line joining the black and the white vertices. Consider
a color point in the RGB cube. Let P be a plane perpendicular to
the intensity axis and containing the color point. The
intersection of this plane with the intensity axis gives us the
intensity of the color point. The saturation (purity) of the
color point increases as a function of its distance
from the intensity axis (the saturation of points on the
intensity axis is zero).
In order to determine how hue can be linked to a given RGB
point, consider a plane defined by black, white, and cyan. The
intensity axis is also included in this plane. The intersection
of this plane with the RGB cube is a triangle. All points
contained in this triangle have the same hue (i.e. cyan).
The HSI space is represented by a vertical intensity axis and
the locus of color points that lie on planes perpendicular to this
axis. As a plane moves up and down the intensity axis, the
boundary defined by its intersection with the
faces of the cube is either triangular or hexagonal in shape.
In the plane shown in Figure 6.13(a) the primary colors are
separated by 120°. The secondary colors are 60° from the
primaries. The hue of a point is determined by an angle
from some reference point. Usually (but not always) an angle
of 0° from the red axis designates 0 hue, and the hue increases
counterclockwise from there. The saturation (distance from
the vertical axis) is the length of the vector from the origin to
the point. The origin is defined by the intersection of the color
plane with the vertical intensity axis.
Converting colors from RGB to HSI

$$H = \begin{cases} \theta & \text{if } B \le G \\ 360° - \theta & \text{if } B > G \end{cases}, \quad \theta = \arccos\left( \frac{\tfrac{1}{2}[(R - G) + (R - B)]}{\left[ (R - G)^2 + (R - B)(G - B) \right]^{1/2}} \right)$$

$$S = 1 - \frac{3}{R + G + B} \min\{ R, G, B \}$$

$$I = \frac{1}{3}(R + G + B)$$
It is assumed that the RGB values have been normalized to
the range [0,1] and that the angle θ is measured with respect to
the red axis of the HSI space, as in Figure 6.13. Hue can be
normalized to the range [0,1] by dividing by 360°. The
other two HSI components are already in this range if the
given RGB values are in the interval [0,1].

Example: R = 100, G = 150, B = 200 gives H = 210°, S = 1/3, I = 150/255 ≈ 0.588.
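A scalar NumPy sketch of the conversion (illustrative, not from the slides); running it on the example above reproduces H = 210°, S = 1/3, I ≈ 0.588:

    import numpy as np

    def rgb_to_hsi(R, G, B):
        # RGB in [0,1]; returns H in degrees, S and I in [0,1]
        num = 0.5 * ((R - G) + (R - B))
        den = np.sqrt((R - G) ** 2 + (R - B) * (G - B)) + 1e-12
        theta = np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0)))
        H = theta if B <= G else 360.0 - theta
        S = 1.0 - 3.0 * min(R, G, B) / (R + G + B + 1e-12)
        I = (R + G + B) / 3.0
        return H, S, I

    print(rgb_to_hsi(100/255, 150/255, 200/255))   # ~ (210.0, 0.333, 0.588)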

Converting colors from HSI to RGB

Given values of H, S, I in the interval [0,1], we now want to find
the corresponding RGB values in the same range (multiply H by
360° to recover the angle).

RG sector (0° ≤ H < 120°):

$$B = I(1 - S), \quad R = I\left[ 1 + \frac{S \cos H}{\cos(60° - H)} \right], \quad G = 3I - (R + B)$$
GB sector (120° ≤ H < 240°):

$$H' = H - 120°, \quad R = I(1 - S), \quad G = I\left[ 1 + \frac{S \cos H'}{\cos(60° - H')} \right], \quad B = 3I - (R + G)$$

BR sector (240° ≤ H < 360°):

$$H' = H - 240°, \quad G = I(1 - S), \quad B = I\left[ 1 + \frac{S \cos H'}{\cos(60° - H')} \right], \quad R = 3I - (G + B)$$

Pseudocolor Image Processing


Pseudocolor (also called false color) image processing
consists of assigning colors to gray values based on a
specified criterion. The main use of pseudocolor is for human
visualization and interpretation of gray-scale events in an
image or sequence of images.

Intensity (Density) Slicing


If an image is viewed as a 3-D function, the method can be
described as one of placing planes parallel to the coordinate
plane of the image; each plane then slices the function in


the area of intersection.

The plane at $f(x, y) = l_i$ slices the image function into two
levels. If a different color is assigned to each side of the
plane, any pixel whose intensity level is above the plane will
be coded with one color and any pixel below the plane will be
coded with the other color. Levels that lie on the plane itself may
be arbitrarily assigned one of the two colors. The result is a
two-color image whose relative appearance can be controlled
by moving the slicing plane up and down the intensity axis.

Let [0, L-1] represent the gray scale, let level l0 represent black
(f(x,y) = 0), and let level lL-1 represent white (f(x,y) = L-1).
Suppose that P planes perpendicular to the intensity axis are
defined at levels l1, l2, ..., lP, with 0 < P < L-1. The P planes
partition the gray scale into P+1 intervals, V1, V2, ..., VP+1.
Intensity-to-color assignments are made according to the relation:

$$f(x, y) \mapsto c_k \quad \text{if } f(x, y) \in V_k$$
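A compact sketch of this mapping (illustrative; np.digitize computes the interval index k for every pixel at once):

    import numpy as np

    def intensity_slice(img, levels, colors):
        # levels: increasing slice levels l1..lP; colors: P+1 RGB rows
        colors = np.asarray(colors, dtype=np.uint8)
        k = np.digitize(img, levels)   # which interval V_k each pixel is in
        return colors[k]               # (H, W, 3) pseudocolor image

    # e.g. two slicing planes, three colors:
    # out = intensity_slice(img, [85, 170], [[0,0,255], [128,128,128], [255,0,0]])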

Measurements of rainfall levels with ground-based sensors are


difficult and expensive, and total rainfall figures are even
more difficult to obtain because a significant portion of
precipitations occurs over the ocean. One way to obtain these


figures is to use a satellite. The TRMM (Tropical Rainfall
Measuring Mission) satellite utilizes, among others, three
sensors specially designed to detect rain: a precipitation radar,
a microwave imager, and a visible and infrared scanner. The
results from the various rain sensors are processed, resulting
in estimates of average rainfall over a given time period in the
area monitored by the sensors. From these estimates, it is not
difficult to generate gray-scale images whose intensity values
correspond directly to rainfall, with each pixel representing a


physical land area whose size depends on the resolution of the
sensors.

Basics of Full-Color Image Processing

A color image is a vector-valued function f : D → R³ (RGB) or
f : D → R⁴ (CMYK); each color pixel is a vector:

$$\mathbf{c}(x, y) = \begin{bmatrix} c_R(x, y) \\ c_G(x, y) \\ c_B(x, y) \end{bmatrix} = \begin{bmatrix} R(x, y) \\ G(x, y) \\ B(x, y) \end{bmatrix}, \qquad \mathbf{c}(x, y) = \begin{bmatrix} C(x, y) \\ M(x, y) \\ Y(x, y) \\ K(x, y) \end{bmatrix}$$

Color Transformations
- processing the components of a color image within the
  context of a single color model

$$g(x, y) = T[f(x, y)]$$

$$s_i = T_i(r_1, r_2, ..., r_n), \quad i = 1, 2, ..., n$$

ri, si are the color components of f(x,y) and g(x,y), n is the
number of color components, and {T1, T2, ..., Tn} is a set of
transformations or color mapping functions that operate on
ri to produce si (n = 3 or n = 4).


In theory, any transformation can be performed in any color
model. In practice, some operations are better suited to
specific color models.
Suppose we wish to modify the intensity of a color image
using:

$$g(x, y) = k f(x, y), \quad 0 < k < 1$$

In the HSI color space, this can be done with:

$$s_1 = r_1, \quad s_2 = r_2, \quad s_3 = k r_3$$

In the RGB and CMY color models all components must be
transformed:

$$s_i = k r_i, \quad i = 1, 2, 3 \quad \text{(RGB)}$$

$$s_i = k r_i + (1 - k), \quad i = 1, 2, 3 \quad \text{(CMY)}$$

Although the HSI transformation involves the fewest
operations, the cost of converting an RGB or CMY(K)
image to the HSI color space outweighs the savings in the
transformation itself.

Color Complements

The hues directly opposite one another on the above color


circle are called complements (analogous to the gray-scale

negatives).

Unlike the intensity transformation, the RGB complement
transformation functions used in this example do not have a
straightforward HSI-space equivalent: the saturation
component of the complement cannot be computed from the
saturation component of the input image alone.

Color Slicing

Highlighting a specific range of colors in an image is useful
for separating objects from their surroundings. The basic idea
is either to:
is either to:
- display the colors of interest so they stand out from the

background
- use the region defined by the colors as a mask for further

processing.
One of the simplest ways to slice a color image is to map
the colors outside some range of interest to a neutral color.
If the colors of interest are enclosed by a cube (or
hypercube, if n>3) of width W and centered at a

prototypical (e.g. average) color with components
(a1, a2, ..., an), the set of transformations is:

$$s_i = \begin{cases} 0.5 & \text{if } |r_j - a_j| > \dfrac{W}{2} \text{ for any } 1 \le j \le n \\ r_i & \text{otherwise} \end{cases} \quad i = 1, 2, ..., n$$

These transformations highlight the colors around the
prototype by forcing all other colors to the midpoint of the
reference color space (an arbitrarily chosen neutral point).

For the RGB color space, for example, a suitable neutral point
is middle gray, the color (0.5, 0.5, 0.5).
If a sphere is used to specify the colors of interest, the
transformations become:

$$s_i = \begin{cases} 0.5 & \text{if } \displaystyle\sum_{j=1}^{n} (r_j - a_j)^2 > R_0^2 \\ r_i & \text{otherwise} \end{cases} \quad i = 1, 2, ..., n$$

where R0 is the radius of the enclosing sphere and
(a1, a2, ..., an) are the components of its center.
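A sketch of the sphere version (illustrative; RGB assumed normalized to [0,1], with the neutral point at middle gray):

    import numpy as np

    def color_slice_sphere(img, center, radius, neutral=0.5):
        # keep colors inside the sphere around `center`; gray out the rest
        img = np.asarray(img, dtype=np.float64)
        d2 = np.sum((img - np.asarray(center)) ** 2, axis=-1)
        out = img.copy()
        out[d2 > radius ** 2] = neutral
        return out

    # sliced = color_slice_sphere(rgb, (0.7, 0.2, 0.2), 0.25)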


Tone and Color Corrections

The effectiveness of such transformations is judged ultimately


in print. The transformations are developed and evaluated on
monitors. It is necessary to have a high degree of consistency
between the monitors and the output devices. This is best
accomplished with a device-independent color model that
relates the color gamuts of the monitors and output devices,
as well as any other devices being used, to one another. The
model of choice for many color management systems (CMS)
is the CIE L*a*b* model, also called CIELAB. The L*a*b*
color components are given by the following equations:

$$L^* = 116 \cdot h\!\left( \frac{Y}{Y_W} \right) - 16$$

$$a^* = 500 \left[ h\!\left( \frac{X}{X_W} \right) - h\!\left( \frac{Y}{Y_W} \right) \right]$$

$$b^* = 200 \left[ h\!\left( \frac{Y}{Y_W} \right) - h\!\left( \frac{Z}{Z_W} \right) \right]$$

$$h(q) = \begin{cases} \sqrt[3]{q} & q > 0.008856 \\ 7.787 q + \dfrac{16}{116} & q \le 0.008856 \end{cases}$$

$X_W$, $Y_W$, $Z_W$ are reference tristimulus values — typically the
white of a perfectly reflecting diffuser under CIE standard
D65 illumination (x = 0.3127, y = 0.3290, z = 1 - x - y).
The L*a*b* color space is colorimetric (i.e. colors perceived
as matching are encoded identically), perceptually uniform
(i.e. color differences among various hues are perceived
uniformly), and device independent. Like the HSI system, the


L*a*b* system is an excellent decoupler of intensity

(represented by lightness L*) and color (represented by a* for


red minus green and b* for green minus blue), making it
useful in both image manipulation (tone and contrast editing)
and image compression applications.
Histogram Processing

It is not advisable to histogram equalize the components of a


color image independently. This can produce wrong colors. A
more logical approach is to spread the color intensity


uniformly, leaving the colors themselves (e.g., hues)
unchanged. The HSI color space is ideally suited for this type
of approach.

The unprocessed image contains a large number of dark
colors that reduce the median intensity to 0.36. Histogram
equalizing the intensity component, without altering the hue
and saturation, produced the image in Figure 6.37(c). The image
is brighter. Figure 6.37(d) was obtained by also increasing the
saturation component.

Color Image Smoothing

Let Sxy denote a neighborhood centered at (x,y) in an RGB
color image, and let K be the number of pixels in it. The average
of the RGB component vectors in this neighborhood is:

$$\bar{\mathbf{c}}(x, y) = \frac{1}{K} \sum_{(s,t) \in S_{xy}} \mathbf{c}(s, t) = \begin{bmatrix} \dfrac{1}{K} \sum_{(s,t) \in S_{xy}} R(s, t) \\ \dfrac{1}{K} \sum_{(s,t) \in S_{xy}} G(s, t) \\ \dfrac{1}{K} \sum_{(s,t) \in S_{xy}} B(s, t) \end{bmatrix}$$

Smoothing by neighborhood averaging can therefore be carried
out on a per-color-plane basis, with the same result as averaging
the RGB vectors directly.

Color Image Sharpening

$$g(x, y) = f(x, y) + c \, \nabla^2 f(x, y)$$

where the Laplacian of the RGB vector image is computed per
component:

$$\nabla^2 \mathbf{c}(x, y) = \begin{bmatrix} \nabla^2 R(x, y) \\ \nabla^2 G(x, y) \\ \nabla^2 B(x, y) \end{bmatrix}$$

Morphological Image Processing


Morphology deals with form and structure. Mathematical
morphology is a tool for extracting image components that are
useful in the representation and description of region shape,
such as boundaries, skeletons, and the convex hull. In this
chapter, the inputs are images but the outputs are attributes
extracted from these images.

Preliminaries
The reflection of a set B, denoted $\hat{B}$, is defined as

$$\hat{B} = \{ w : w = -b, \text{ for } b \in B \}$$

The translation of a set B by point z = (z1, z2), denoted (B)z,
is defined as

$$(B)_z = \{ c : c = b + z, \text{ for } b \in B \}$$

Set reflection and translation are used in morphology to


formulate operations based on so-called structuring elements
(SE): small sets or subimages used to probe an image under
study for properties of interest. In addition to a definition of
which elements are members of the SE, the origin of a

structuring element also must be specified. The origin of the


SE is usually indicated by a black dot. When the SE is
symmetric and no dot is shown, the assumption is that the
origin is at the center of symmetry.
When working with images, it is required that structuring
elements are rectangular arrays. This is accomplished by
appending the smallest possible number of background
elements necessary to form a rectangular array.

Erosion and Dilation

Many morphological algorithms are based on two
primitive operations: erosion and dilation.

Erosion
Let A and B be two sets in $\mathbb{Z}^2$. The erosion of A by B,
denoted $A \ominus B$, is defined as

$$A \ominus B = \{ z : (B)_z \subseteq A \}$$

This definition indicates that the erosion of A by B is the set
of all points z such that B, translated by z, is contained in A.

In the following, set B is assumed to be a structuring element.
Because the statement that B has to be contained in A is
equivalent to B not sharing any common elements with the
background, erosion can be expressed equivalently as:

$$A \ominus B = \{ z : (B)_z \cap A^c = \emptyset \}$$

Equivalent definitions of erosion:

$$A \ominus B = \{ w \in \mathbb{Z}^2 : w + b \in A \text{ for every } b \in B \}$$

$$A \ominus B = \bigcap_{b \in B} (A)_{-b}$$

Erosion shrinks or thins objects in a binary image. We can
view erosion as a morphological filtering operation in which
image details smaller than the structuring element are filtered
(removed) from the image.

Dilation
Let A and B be two sets in $\mathbb{Z}^2$. The dilation of A by B,
denoted $A \oplus B$, is defined as:

$$A \oplus B = \{ z : (\hat{B})_z \cap A \neq \emptyset \}$$

The dilation of A by B is the set of all displacements z such
that $\hat{B}$ and A overlap by at least one element. The above
definition can be written equivalently as:

$$A \oplus B = \{ z : [(\hat{B})_z \cap A] \subseteq A \}$$

We assume that B is a structuring element.

Equivalent definitions of dilation:

$$A \oplus B = \{ w \in \mathbb{Z}^2 : w = a + b, \text{ for some } a \in A \text{ and } b \in B \}$$

$$A \oplus B = \bigcup_{b \in B} (A)_b$$

The basic process of reflecting B about its origin and then
successively displacing it so that it slides over set (image) A
is analogous to spatial convolution. Dilation, being based on
set operations, is a nonlinear operation, whereas convolution is
a linear operation.

Unlike erosion which is a shrinking or thinning operation,


dilation grows or thickens objects in a binary image. The
specific manner and the extent of this thickening are
controlled by the shape of the structuring element used.

One of the simplest applications of dilation is for bridging


gaps.
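These two primitives are available directly in SciPy; a small illustrative example on a synthetic image:

    import numpy as np
    from scipy.ndimage import binary_erosion, binary_dilation

    A = np.zeros((11, 11), dtype=bool)
    A[3:8, 3:8] = True                    # a 5x5 square of foreground pixels
    B = np.ones((3, 3), dtype=bool)       # 3x3 structuring element

    eroded = binary_erosion(A, structure=B)    # square shrinks to 3x3
    dilated = binary_dilation(A, structure=B)  # square grows to 7x7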

Duality
Erosion and dilation are duals of each other with respect to set
complementation and reflection:

$$(A \ominus B)^c = A^c \oplus \hat{B}$$

$$(A \oplus B)^c = A^c \ominus \hat{B}$$

The duality property is particularly useful when the
structuring element is symmetric with respect to its origin, so
that $\hat{B} = B$. Then we can obtain the erosion of an image by B

simply by dilating its background (i.e. dilating Ac) with the


same structuring element and complementing the result.
Opening and Closing
Opening generally smoothes the contour of an object, breaks
narrow isthmuses, and eliminates thin protrusions. Closing
also tends to smooth sections of contours but, as opposed to
opening, it generally fuses narrow breaks and long thin gulfs,
eliminates small holes, and fills gaps in the contour.

The opening of set A by structuring element B is defined as:

$$A \circ B = (A \ominus B) \oplus B$$

Thus, the opening of A by B is the erosion of A by B,
followed by a dilation of the result by B.
Similarly, the closing of set A by structuring element B is
defined as:

$$A \bullet B = (A \oplus B) \ominus B$$

which says that the closing of A by B is the dilation of A by
B, followed by an erosion of the result by B.
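The composite operations also exist as single SciPy calls; an illustrative sketch:

    import numpy as np
    from scipy.ndimage import binary_opening, binary_closing

    A = np.zeros((11, 11), dtype=bool); A[3:8, 3:8] = True
    B = np.ones((3, 3), dtype=bool)

    opened = binary_opening(A, structure=B)   # erosion, then dilation
    closed = binary_closing(A, structure=B)   # dilation, then erosion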

The opening operation has a simple geometric interpretation.
Suppose that we view the structuring element B as a (flat)
rolling ball. The boundary of $A \circ B$ is then established by
the points in B that reach the farthest into the boundary of A
as B is rolled around the inside of this boundary. The opening
of A by B is obtained by taking the union of all translates of B
that fit into A:

$$A \circ B = \bigcup \{ (B)_z : (B)_z \subseteq A \}$$

Closing has a similar geometric interpretation, except that


now we roll B on the outside of the boundary.

Opening and closing are duals of each other with respect to
set complementation and reflection:

$$(A \circ B)^c = A^c \bullet \hat{B}$$

$$(A \bullet B)^c = A^c \circ \hat{B}$$

The opening operation satisfies the following properties:

1. $A \circ B \subseteq A$
2. if $C \subseteq D$ then $C \circ B \subseteq D \circ B$
3. $(A \circ B) \circ B = A \circ B$

Similarly, the closing operation satisfies the following
properties:

1. $A \subseteq A \bullet B$
2. if $C \subseteq D$ then $C \bullet B \subseteq D \bullet B$
3. $(A \bullet B) \bullet B = A \bullet B$

Property 3 in both cases states that multiple openings or
closings of a set have no effect after the operator has been
applied once (idempotence).

The Hit-or-Miss Transformation


The morphological hit-or-miss transformation is a basic tool
for shape detection. Consider the set A from Figure 9.12
consisting of three shapes (subsets) denoted C, D, and E. The
objective is to locate one of the shapes, say, D.
Let the origin of each shape be located at its center of gravity.
Let D be enclosed by a small window, W. The local
background of D with respect to W is defined as the set

difference (W-D) (Figure 9.12(b)). Figure 9.12(c) shows the

complement of A. Figure 9.12(d) shows the erosion of A by D.
Figure 9.12(e) shows the erosion of the complement of A by
the local background set (W - D). From Figures 9.12(d) and
(e) we can see that the set of locations for which D exactly fits
inside A is the intersection of the erosion of A by D and the
erosion of Ac by (W - D), as shown in Figure 9.12(f).
If B denotes the set composed of D and its background, the
match (or the set of matches) of B in A, denoted $A \circledast B$, is:

$$A \circledast B = (A \ominus D) \cap \left[ A^c \ominus (W - D) \right]$$

We can generalize the notation by letting B = (B1, B2), where
B1 is the set formed from elements of B associated with an
object and B2 is the set of elements of B associated with the
corresponding background (B1 = D and B2 = W - D in the
preceding example):

$$A \circledast B = (A \ominus B_1) \cap (A^c \ominus B_2)$$

The set $A \circledast B$ contains all the (origin) points at which,
simultaneously, B1 found a match ("hit") in A and B2 found a

match in $A^c$. Taking into account the definition and properties
of erosion, we can rewrite the above relation as:

$$A \circledast B = (A \ominus B_1) - (A \oplus \hat{B}_2)$$

The above equations for $A \circledast B$ are referred to as the
morphological hit-or-miss transform.

Some Basic Morphological Algorithms

When dealing with binary images, one of the principal
applications of morphology is in extracting image
components that are useful in the representation and
description of shape. We consider morphological algorithms
for extracting boundaries, connected components, the convex
hull, and the skeleton of a region.
The images are shown graphically with 1s shaded and 0s in
white.

Boundary Extraction
The boundary of a set A, denoted $\beta(A)$, can be obtained by
first eroding A by B and then performing the set difference
between A and its erosion:

$$\beta(A) = A - (A \ominus B)$$

where B is a suitable structuring element.
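In code, the set difference becomes a logical AND-NOT; a one-function illustrative sketch:

    import numpy as np
    from scipy.ndimage import binary_erosion

    def boundary(A, B=np.ones((3, 3), dtype=bool)):
        # beta(A) = A - (A erode B)
        return A & ~binary_erosion(A, structure=B)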

Hole Filling
A hole may be defined as a background region surrounded by
a connected border of foreground pixels. We present an
algorithm based on set dilation, complementation, and
intersection for filling holes in an image.
Let A denote a set whose elements are 8-connected
boundaries, each boundary enclosing a background region
(i.e. a hole). Given a point in each hole, the objective is to fill
all the holes with 1s.
We form an array, X0, of 0s (the same size as the array
containing A), except at the locations in X0 corresponding to
the given point in each hole, which are set to 1. The following
procedure fills all the holes with 1s:

$$X_k = (X_{k-1} \oplus B) \cap A^c, \quad k = 1, 2, 3, ...$$

where B is the symmetric structuring element in Figure
9.15(c). The algorithm terminates at iteration step k if

Xk=Xk-1. The set Xk then contains all the filled holes. The set

union of Xk and A contains all the filled holes and their


boundaries.
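A direct transcription of the iteration (illustrative; the 4-connected cross SE of Figure 9.15(c) keeps the dilation from leaking through 8-connected boundaries):

    import numpy as np
    from scipy.ndimage import binary_dilation, generate_binary_structure

    def fill_holes(A, seeds):
        # X_k = (X_{k-1} dilate B) AND A^c, iterated until stable;
        # `seeds` has a single 1 inside each hole.
        B = generate_binary_structure(2, 1)    # cross-shaped SE
        X = seeds.copy()
        while True:
            X_next = binary_dilation(X, structure=B) & ~A
            if np.array_equal(X_next, X):
                return X | A     # filled holes plus their boundaries
            X = X_next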

Extraction of Connected Components

Extraction of connected components from binary images is


important in many automated image analysis applications.
Let A be a set containing one or more connected
components. Form an array X0 (of the same size as the array
containing A) whose elements are 0s (background values),
except at each location known to correspond to a point in
each connected component in A, which we set to 1

(foreground value). The objective is to start with X0 and find


all the connected components.
The procedure that accomplishes this task is the following:

$$X_k = (X_{k-1} \oplus B) \cap A, \quad k = 1, 2, 3, ...$$

where B is a suitable structuring element. The procedure
terminates when Xk = Xk-1, with Xk containing all the connected
components of the input image.
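The growth iteration translates almost verbatim (illustrative sketch; scipy.ndimage.label offers the same result for all components without seed points):

    import numpy as np
    from scipy.ndimage import binary_dilation

    def extract_components(A, seeds, B=np.ones((3, 3), dtype=bool)):
        # X_k = (X_{k-1} dilate B) AND A, grown from the seeds until stable
        X = seeds.copy()
        while True:
            X_next = binary_dilation(X, structure=B) & A
            if np.array_equal(X_next, X):
                return X
            X = X_next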

Figure 9.18(a) shows an X-ray image of a chicken breast that
contains bone fragments. It is of considerable interest to be
able to detect such objects in processed food before packing
and/or shipping. In this case, the density of the bones is such
that their normal intensity values are different from the
background. This makes extraction of the bones from the
background a simple matter by using a single threshold. The
result is the binary image in Figure 9.18(b). We can erode the
thresholded image so that only objects of significant size

remain. In this example, we define as significant any object
that remains after erosion with a 5×5 structuring element of 1s.
The result of erosion is shown in Figure 9.18(c). The next
step is to analyze the objects that remain. We identify these
objects by extracting the connected components in the image.
There are a total of 15 connected components, with four of
them being of dominant size. This is enough to determine that
significant undesirable objects are contained in the original
image.

Convex Hull

A set A is said to be convex if the straight line segment
joining any two points in A lies entirely within A. The convex
hull H of an arbitrary set S is the smallest convex set
containing S. The set difference H - S is called the convex
deficiency of S. The convex hull and convex deficiency are
useful for object description.
We present a simple morphological algorithm for obtaining
the convex hull C(A) of a set A.

Let $B^i$, i = 1, 2, 3, 4, represent the four structuring elements in
Figure 9.19(a). The procedure consists of implementing the
equation:

$$X_0^i = A, \quad X_k^i = (X_{k-1}^i \circledast B^i) \cup A, \quad i = 1, 2, 3, 4 \text{ and } k = 1, 2, 3, ...$$

When the procedure converges ($X_k^i = X_{k-1}^i$), we let $D^i = X_k^i$.
Then the convex hull of A is

$$C(A) = \bigcup_{i=1}^{4} D^i$$
The method consists of iteratively applying the hit-or-miss
transform to A with B1; when no further changes occur, we
perform the union with A and call the result D1. The
procedure is repeated with B2 (applied to A) until no further
changes occur, and so on. The union of the four resulting Ds
constitutes the convex hull of A.

Morphological Reconstruction
Morphological reconstruction is a powerful morphological
transformation that involves two images and a structuring
element. One image, the marker, contains the starting points
for the transformation. The second image, the mask,
constrains the transformation. The structuring element is used
to define connectivity.

Geodesic dilation and erosion

Let F denote the marker image and G the mask image. We
assume that both F and G are binary images and that F ⊆ G.
The geodesic dilation of size 1 of the marker image with
respect to the mask is defined as:

$$D_G^{(1)}(F) = (F \oplus B) \cap G$$

The geodesic dilation of size n of F with respect to G is
defined as:

$$D_G^{(n)}(F) = D_G^{(1)}\left[ D_G^{(n-1)}(F) \right], \quad \text{with } D_G^{(0)}(F) = F$$

Similarly, the geodesic erosion of size 1 of marker F with
respect to mask G is defined as:

$$E_G^{(1)}(F) = (F \ominus B) \cup G$$

The geodesic erosion of size n of F with respect to G is
defined as:

$$E_G^{(n)}(F) = E_G^{(1)}\left[ E_G^{(n-1)}(F) \right], \quad \text{with } E_G^{(0)}(F) = F$$

Morphological reconstruction by dilation and erosion

Morphological reconstruction by dilation of a mask image G
from a marker image F, denoted $R_G^D(F)$, is defined as the
geodesic dilation of F with respect to G, iterated until
stability is achieved:

$$R_G^D(F) = D_G^{(k)}(F), \quad \text{with } k \text{ such that } D_G^{(k)}(F) = D_G^{(k+1)}(F)$$

In a similar manner, the morphological reconstruction by
erosion of a mask image G from a marker image F, denoted
$R_G^E(F)$, is defined as the geodesic erosion of F with respect to
G, iterated until stability is achieved:

$$R_G^E(F) = E_G^{(k)}(F), \quad \text{with } k \text{ such that } E_G^{(k)}(F) = E_G^{(k+1)}(F)$$
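Reconstruction by dilation is the same loop with the mask intersection inside it; an illustrative binary sketch (assuming the marker F is a subset of the mask G):

    import numpy as np
    from scipy.ndimage import binary_dilation

    def reconstruct_by_dilation(F, G, B=np.ones((3, 3), dtype=bool)):
        # iterate D^(1)(X) = (X dilate B) AND G until stability
        X = F.copy()
        while True:
            X_next = binary_dilation(X, structure=B) & G
            if np.array_equal(X_next, X):
                return X
            X = X_next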

Opening by reconstruction
In a morphological opening, erosion removes small objects
and the subsequent dilation attempts to restore the shape of the
objects that remain. The accuracy of this restoration depends on
the similarity between the shapes of the objects and the
structuring element used. Opening by reconstruction restores
exactly the shapes of the objects that remain after erosion. The
opening by reconstruction of size n of an image F is defined as
the reconstruction by dilation of F from the erosion of size n of F:

$$O_R^{(n)}(F) = R_F^D\left( F \ominus nB \right)$$

where $(F \ominus nB)$ denotes n successive erosions of F by B.
Figure 9.29 shows an example of opening by reconstruction.
We are interested in extracting from this image the characters
that contain long, vertical strokes. Opening by reconstruction
requires at least one erosion; this was performed and produced
Figure 9.29(b). The structuring element's size was proportional
to the average height of the tall characters: 51 pixels tall and
one pixel wide.

Figure 9.29(c) shows the opening of the image using the same
structuring element. Figure 9.29(d) is the opening by
reconstruction of size 1 of F.

Filling holes
The following is a fully automated procedure for
filling holes based on morphological reconstruction. Let
I(x,y) denote a binary image, and form a marker image F
that is 0 everywhere except at the image border, where it is
set to 1 - I, that is:

$$F(x, y) = \begin{cases} 1 - I(x, y) & \text{if } (x, y) \text{ is on the border of } I \\ 0 & \text{otherwise} \end{cases}$$

Then

$$H = \left[ R_{I^c}^D(F) \right]^c$$

is a binary image equal to I with all holes filled.
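A self-contained illustrative sketch for boolean images (scipy.ndimage.binary_fill_holes implements the same idea directly):

    import numpy as np
    from scipy.ndimage import binary_dilation

    def fill_holes_auto(I):
        # marker F = (1 - I) on the border, 0 elsewhere; mask G = I^c
        F = np.zeros_like(I)
        F[0, :], F[-1, :] = ~I[0, :], ~I[-1, :]
        F[:, 0], F[:, -1] = ~I[:, 0], ~I[:, -1]
        G = ~I
        while True:                       # reconstruction by dilation
            F_next = binary_dilation(F) & G
            if np.array_equal(F_next, F):
                return ~F                 # H = complement of R_{I^c}^D(F)
            F = F_next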

Gray-Scale Morphology
In this section we denote by f(x,y) the gray-scale image and
by b(x,y) the structuring element. Structuring elements in
gray-scale morphology are of two categories: nonflat and flat.

As in the binary case, the origin of the structuring element
must be clearly identified. In this section only symmetrical,
flat structuring elements of unit height whose origin is at the
center are considered. The reflection of a structuring element
in gray-scale morphology is defined as:

$$\hat{b}(x, y) = b(-x, -y)$$

Erosion and Dilation

The erosion of f by a flat structuring element b at any location
(x,y) is defined as the minimum value of the image in the
region coincident with b when the origin of b is at (x,y):

$$[f \ominus b](x, y) = \min_{(s,t) \in b} f(x + s, y + t)$$

To find the erosion of f by b, we place the origin of the
structuring element at every pixel location in the image. The
erosion at any location is determined by selecting the

minimum value of f from all the values of f contained in the
region coincident with b.
Similarly, the dilation of f by a flat structuring element b at
any location (x,y) is defined as the maximum value of the
image in the window outlined by $\hat{b}$ when the origin of $\hat{b}$
is at (x,y):

$$[f \oplus b](x, y) = \max_{(s,t) \in \hat{b}} f(x - s, y - t)$$

Because gray-scale erosion with a flat SE computes the
minimum value of f in every neighborhood of (x,y) coincident

with b, we expect in general that an eroded gray-scale image
will be darker than the original, that the sizes (with respect to
the size of the SE) of bright features will be reduced, and that
the sizes of dark features will be increased.
Nonflat SEs have gray-scale values that vary over their domain
of definition. The erosion of image f by nonflat structuring
element bN is defined as:

$$[f \ominus b_N](x, y) = \min_{(s,t) \in b_N} \left\{ f(x + s, y + t) - b_N(s, t) \right\}$$

Erosion using a nonflat SE is not bounded in general by
the values of f, which can present problems in interpreting
results; that is the reason this type of erosion is
seldom used in practice.
In a similar manner, dilation using a nonflat SE is defined as:

$$[f \oplus b_N](x, y) = \max_{(s,t) \in \hat{b}_N} \left\{ f(x - s, y - t) + \hat{b}_N(s, t) \right\}$$

As in the binary case, erosion and dilation are duals with
respect to function complementation and reflection, that is:
$$(f \ominus b)^c(x, y) = (f^c \oplus \hat{b})(x, y)$$

where $f^c(x, y) = -f(x, y)$ and $\hat{b}(x, y) = b(-x, -y)$. Similarly:

$$(f \oplus b)^c = (f^c \ominus \hat{b})$$
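Flat gray-scale erosion and dilation are the local-min and local-max filters; an illustrative SciPy sketch:

    import numpy as np
    from scipy.ndimage import grey_erosion, grey_dilation

    f = np.random.randint(0, 256, (64, 64)).astype(np.float64)

    eroded = grey_erosion(f, size=(3, 3))    # local minimum, flat 3x3 SE
    dilated = grey_dilation(f, size=(3, 3))  # local maximum, flat 3x3 SE
    # for a nonflat SE, pass its values via the `structure` argument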

Opening and Closing

The expressions for opening and closing of gray-scale images
have the same form as in the binary case. The opening of
image f by structuring element b is:

$$f \circ b = (f \ominus b) \oplus b$$

Opening is simply the erosion of f by b, followed by dilation
of the result with b. Similarly, the closing of f by b is:

$$f \bullet b = (f \oplus b) \ominus b$$

The opening and the closing of gray-scale images are duals
with respect to complementation and SE reflection:

$$(f \circ b)^c = f^c \bullet \hat{b}, \qquad (f \bullet b)^c = f^c \circ \hat{b}$$

Opening and closing of images have a simple geometric
interpretation. Suppose that an image function f(x,y) is
viewed as a 3-D surface. The opening of f by b can be

interpreted geometrically as pushing the structuring element


up from below against the undersurface of f. At each location
of the origin of b, the opening is the highest value reached by
any part of b as it pushes up against the undersurface of f. The
complete opening is the set of all such values obtained by
having the origin of b visit every (x,y) coordinate of f. Figure
9.36 illustrates the concept in one dimension.
The gray-scale opening operation satisfies the following
properties:

1. $f \circ b \sqsubseteq f$
2. If $f_1 \sqsubseteq f_2$ then $f_1 \circ b \sqsubseteq f_2 \circ b$
3. $(f \circ b) \circ b = f \circ b$

The notation $e \sqsubseteq r$ indicates that the domain of e is a subset
of the domain of r, and also that e(x, y) ≤ r(x, y) for any (x, y)
in the domain of e.
Similarly, the closing operation satisfies the following
properties:

a. $f \sqsubseteq f \bullet b$
b. If $f_1 \sqsubseteq f_2$ then $f_1 \bullet b \sqsubseteq f_2 \bullet b$
c. $(f \bullet b) \bullet b = f \bullet b$

Some Basic Gray-Scale Morphological Algorithms


Morphological smoothing
Because opening suppresses bright details smaller than the
specified SE, and closing suppresses dark details, they are

used often in combination as morphological filters for image
smoothing and noise removal.
Figure 9.38(a) is an image of the Cygnus Loop supernova taken
in the X-ray band; the region of interest is the central light
region, and the smaller components are noise. The objective is
to remove the noise. Figure 9.38(b) shows the result of opening
the original image with a flat disk of radius 2 and then closing
the opening with an SE of the same size. Figures 9.38(c) and
(d) show the results of the same operation using SEs of radii 3
and 5, respectively. The noise components on the lower side


of the image could not be removed completely because of
their density.

Morphological gradient
Dilation and erosion can be used in combination with image
subtraction to obtain the morphological gradient of an image:

$$g = (f \oplus b) - (f \ominus b)$$

The dilation thickens regions in an image and the erosion
shrinks them. Their difference emphasizes the boundaries
between regions. Homogeneous areas are not affected (as
long as the SE is relatively small), so the subtraction operation
tends to eliminate them. The net result is an image in which

the edges are enhanced and the contribution of the


homogeneous areas is suppressed, thus producing a
derivative-like (gradient) effect.
Figure 9.39(a) is a head CT scan, and the next two figures are
the opening and the closing with a 3 3 SE of all 1s. Figure
9.39(d) is the morphological gradient, in which the
boundaries between regions are clearly delineated.
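A short sketch of the morphological gradient, again assuming SciPy; the flat $3 \times 3$ SE of all 1s matches the example above:

```python
import numpy as np
from scipy import ndimage

def morphological_gradient(f, size=3):
    # g = (f dilated by b) - (f eroded by b), with a flat size x size SE.
    f = f.astype(np.int32)  # avoid wraparound when subtracting uint8 images
    dilated = ndimage.grey_dilation(f, size=(size, size))
    eroded = ndimage.grey_erosion(f, size=(size, size))
    return dilated - eroded
```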


Image Segmentation
Segmentation subdivides an image into its constituent regions
and objects. The level of detail to which the subdivision is
carried depends on the problem being solved. Segmentation
should stop when the objects or regions of interest in an
application have been detected. For example, in the
automated inspection of electronic assemblies, interest lies in
analyzing images of products with the objective of
determining the presence or absence of specific anomalies,


such as missing components or broken connection paths.


There is no point in carrying segmentation past the level of
detail required to identify those elements.
Segmentation of nontrivial images is one of the most
difficult tasks in image processing. Segmentation accuracy
determines the eventual success or failure of computerized
analysis procedures.


Most of the segmentation algorithms described in this


section are based on one of two basic properties of intensity
values:
- discontinuity
- similarity
In the first category, the approach is to partition an image
based on abrupt changes in intensity, such as edges. The
principal approaches in the second category are based on
partitioning an image into regions that are similar according


to a set of predefined criteria. Thresholding, region growing,


and region splitting and merging are examples of methods in
this category.

Fundamentals
Let R represent the entire spatial region occupied by an
image. Image segmentation can be viewed as a process that
partitions $R$ into $n$ subregions $R_1, R_2, \ldots, R_n$ such that:

(a) $\bigcup_{i=1}^{n} R_i = R$

(b) $R_i$ is a connected set, $i = 1, 2, \ldots, n$

(c) $R_i \cap R_j = \emptyset$ for all $i$ and $j$, $i \neq j$

(d) $Q(R_i) = \text{TRUE}$ for $i = 1, 2, \ldots, n$

(e) $Q(R_i \cup R_j) = \text{FALSE}$ for any adjacent regions $R_i$, $R_j$

Q(R) is a logical predicate defined over the points in set R.


Condition (a) indicates that the segmentation must be
complete; that is, every pixel must be in a region. Condition (b)


requires that points in a region be connected in some


predefined sense (e.g. the points must be 4- or 8-connected).
Condition (c) indicates that the regions must be disjoint.
Condition (d) deals with the properties that must be satisfied
by the pixels in a segmented region (for example Q(Ri ) is true
if all pixels have the same intensity level). Finally, condition
(e) indicates that two adjacent regions Ri and Rj must be
different in the sense of predicate Q.


The fundamental problem in segmentation is to partition an


image into regions that satisfy the preceding conditions.
Segmentation algorithms for monochrome images generally
are based on one of two basic categories dealing with
properties of intensity values: discontinuity and similarity. In
the first category, the assumption is that boundaries of regions
are sufficiently different from each other and from the
background to allow boundary detection based on local
discontinuities in intensity.


Edge-based segmentation is the principal approach used in

this category. Region-based segmentation approaches from


the second category are based on partitioning an image into
regions that are similar according to a set of predefined
criteria.


Point, Line, and Edge Detection


Edge pixels are pixels at which the intensity of an image

function changes abruptly, and edges (or edge segments) are


sets of connected edge pixels. Edge detectors are local image
processing methods designed to detect edge pixels. A line
may be viewed as an edge segment in which the intensity of
the background on either side of the line is either much higher
or much lower than the intensity of the line pixels.


Background
Abrupt, local changes in intensity can be detected using
derivatives, usually first- and second-order derivatives which
are defined in terms of differences.
Any approximation for a first derivative must be:
(1) zero in areas of constant intensity
(2) nonzero at the onset of an intensity step or ramp
(3) nonzero at points along an intensity ramp.


An approximation for a second derivative must be:


(1) zero in areas of constant intensity
(2) nonzero at the onset and end of an intensity step or ramp
(3) zero along intensity ramps.

$\frac{\partial f}{\partial x} = f'(x) = f(x+1) - f(x)$

$\frac{\partial^2 f}{\partial x^2} = \frac{\partial f'(x)}{\partial x} = f'(x+1) - f'(x) = f(x+2) - 2f(x+1) + f(x)$

$\frac{\partial^2 f}{\partial x^2} = f''(x) = f(x+1) + f(x-1) - 2f(x)$


(a) First-order derivatives generally produce thicker


edges in an image
(b) Second-order derivatives have a stronger response to
fine detail, such as thin lines, isolated points, and
noise
(c) Second-order derivatives produce a double-edge
response at ramp and step transitions in intensity


(d) The sign of the second-order derivative can be used


to determine whether a transition into an edge is
from light to dark or dark to light.
Computing first and second derivatives at every pixel location
is done using spatial filters.


The response of the mask at the center point of the region is:
$R = w_1 z_1 + w_2 z_2 + \cdots + w_9 z_9 = \sum_{k=1}^{9} w_k z_k \quad (1)$

where zk is the intensity of the pixel whose spatial location


corresponds to the location of the k-th coefficient in the mask.

Detection of isolated points


Point detection is based on computation of the second
derivative of the image.

$\nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}$ (the Laplacian)

$\frac{\partial^2 f}{\partial x^2} = f(x+1, y) + f(x-1, y) - 2f(x, y)$

$\frac{\partial^2 f}{\partial y^2} = f(x, y+1) + f(x, y-1) - 2f(x, y)$

$\nabla^2 f(x, y) = f(x+1, y) + f(x-1, y) + f(x, y+1) + f(x, y-1) - 4f(x, y)$


Using the Laplacian mask in Figure 10.4(a) we say that a


point has been detected at the location (x,y) on which the
mask is centered if the absolute value of the response of the
mask at that point exceeds a specified threshold. Such points
are labeled 1 in the output image and all others are labeled 0,
thus producing a binary image. The output is obtained using
the following expression:
$g(x, y) = \begin{cases} 1 & \text{if } |R(x, y)| \geq T \\ 0 & \text{otherwise} \end{cases}$


where g is the output image, $T > 0$ is the threshold, and R is


given by (1). This formulation measures the weighted
difference between a pixel and its 8-neighbors. The idea is
that the intensity of an isolated point will be quite different
from its surroundings and thus will be easily detectable by
this type of mask. The only differences in intensity that are
considered of interest are those large enough (as determined
by T) to be considered isolated points. The sum of the


coefficients of the mask is zero, indicating that the mask


response will be zero in areas of constant intensity.
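As a sketch, point detection with this mask reduces to one correlation and one threshold test (SciPy assumed; the choice of T is left to the application):

```python
import numpy as np
from scipy import ndimage

def detect_points(f, T):
    # 8-neighbor Laplacian mask; the coefficients sum to zero, so the
    # response is zero in areas of constant intensity.
    mask = np.array([[1,  1, 1],
                     [1, -8, 1],
                     [1,  1, 1]], dtype=np.float64)
    R = ndimage.correlate(f.astype(np.float64), mask)
    return (np.abs(R) >= T).astype(np.uint8)  # 1 where a point is detected
```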
Line Detection
For line detection we can expect second derivatives to result
in a stronger response and to produce thinner lines than first
derivatives. We can use the Laplacian mask in Figure 10.4(a)
for line detection also, taking care of the double-line effect of
the second order derivative.


Figure 10.5(a) shows a $486 \times 486$ (binary) portion of a


wire-bond mask for an electronic circuit and Figure 10.5(b)
shows its Laplacian. Scaling is necessary in this case (the
Laplacian image contains negative values). Mid gray
represents 0, darker shades of gray represent negative values,
and lighter shades are positive. It might appear that negative
values can be handled simply by taking the absolute value of
the Laplacian image. Figure 10.5(c) shows that this approach
doubles the thickness of the lines. A more suitable approach


is to use only the positive values of the Laplacian (Figure


10.5(d)).


The Laplacian detector in Figure 10.4(a) is isotropic, so its


response is independent of the direction (with respect to the
four directions of the $3 \times 3$ Laplacian mask: vertical,
horizontal, and two diagonals). Often, interest lies in
detecting lines in specified directions.
Consider the masks in Figure 10.6. Suppose that an image
with a constant background and containing various lines
(oriented at 0°, ±45°, and 90°) is filtered with the first mask.
The maximum responses would occur at image locations in


which horizontal lines passed through the middle row of the


mask.


A similar experiment would reveal that the second mask in Figure 10.6 responds best to lines oriented at +45°; the third mask to vertical lines; and the fourth mask to lines in the −45° direction.
Let $R_1$, $R_2$, $R_3$, and $R_4$ denote the responses of the masks in Figure 10.6 from left to right, where the R's are given by (1). Suppose that an image is filtered (individually) with the four masks. If at a given point in the image $|R_k| > |R_j|$ for all $j \neq k$,


that point is said to be more likely associated with a line in


the direction of mask k.
If we are interested in detecting all the lines in an image in the
direction defined by a given mask, we simply run the mask
through the image and threshold the absolute value of the
result. The points that are left are the strongest responses
which, for lines one pixel thick, correspond closest to the
direction defined by the mask.
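A sketch of this procedure, with the four $3 \times 3$ line masks of Figure 10.6 written out (the labeling of the two diagonal masks depends on the axis convention, so treat those labels as an assumption):

```python
import numpy as np
from scipy import ndimage

LINE_MASKS = {
    'horizontal': [[-1, -1, -1], [ 2,  2,  2], [-1, -1, -1]],
    '+45':        [[ 2, -1, -1], [-1,  2, -1], [-1, -1,  2]],
    'vertical':   [[-1,  2, -1], [-1,  2, -1], [-1,  2, -1]],
    '-45':        [[-1, -1,  2], [-1,  2, -1], [ 2, -1, -1]],
}

def detect_lines(f, direction, T):
    # Run the chosen mask through the image and threshold |R|.
    mask = np.array(LINE_MASKS[direction], dtype=np.float64)
    R = ndimage.correlate(f.astype(np.float64), mask)
    return np.abs(R) >= T
```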


In the image in Figure 10.7(a) we are interested in lines oriented at +45°. We use the second mask; the result is shown in Figure 10.7(b).


Edge Models
Edge detection is the approach used most frequently for
segmenting images based on abrupt (local) changes in
intensity.
Edge models are classified according to their intensity
profiles. A step edge involves a transition between two
intensity levels occurring ideally over the distance of 1 pixel.
Figure 10.8(a) shows a section of a vertical step edge and a
horizontal intensity profile through the edge.


In practice, digital images have edges that are blurred and


noisy, with the degree of blurring determined principally by
limitations in the focusing mechanism, and the noise level
determined principally by the electronic components of the
imaging system. In such situations, edges are more closely


modeled as having an intensity ramp profile, such as the edge


in Figure 10.8(b). The slope of the ramp is inversely
proportional to the degree of blurring in the edge. In this
model, we no longer have a thin (1 pixel thick) path. An edge
point now is any point contained in the ramp and an edge
segment would then be a set of such points that are
connected.
A third model of an edge is the so-called roof edge, having
the characteristics illustrated in Figure 10.8(c). Roof edges


are models of lines through a region, with the base (width) of


a roof edge being determined by the thickness and sharpness
of the line.
It is not unusual to find images that contain all three types of
edges.
The magnitude of the first derivative can be used to detect the
presence of an edge at a point in an image. Similarly, the sign
of the second derivative can be used to determine whether an


edge pixel lies on the dark or light side of an edge. The


second derivative has the following properties:
(1) it produces two values for every edge in an image (an
undesirable feature)
(2) its zero crossing can be used for locating the centers of
thick edges
The zero crossing of the second derivative is the intersection
between the zero intensity axis and a line extending between
the extrema of the second derivative.


There are three fundamental steps performed in edge detection:
1. Image smoothing for noise reduction.
2. Detection of edge points: this is a local operation that extracts from an image all points that are potential candidates to become edge points.
3. Edge localization: the objective of this step is to select from the candidate points only the points that are true members of the set of points comprising an edge.


Basic Edge Detection


The image gradient and its properties
The gradient of an image is the tool for finding edge strength
and direction at location (x,y):

$\nabla f = \mathrm{grad}(f) = \begin{bmatrix} g_x \\ g_y \end{bmatrix} = \begin{bmatrix} \partial f / \partial x \\ \partial f / \partial y \end{bmatrix}$


This vector has the important geometrical property that it


points in the direction of the greatest rate of change of f at
location (x,y).
The magnitude (length) of vector $\nabla f$,

$M(x, y) = \mathrm{mag}(\nabla f) = \sqrt{g_x^2 + g_y^2}$,

is the value of the rate of change in the direction of the


gradient vector.


The direction of the gradient vector is given by the angle

$\alpha(x, y) = \arctan\left( \frac{g_y}{g_x} \right)$

measured with respect to the x-axis. The direction of an edge at any arbitrary point (x,y) is orthogonal to the direction, $\alpha(x, y)$, of the gradient vector at the point.


The gradient vector sometimes is called the edge normal.


When the vector is normalized to unit length (by dividing it
by its magnitude) the resulting vector is commonly referred to
as the edge unit normal.

Gradient operators

$g_x = \frac{\partial f(x, y)}{\partial x} = f(x+1, y) - f(x, y)$

$g_y = \frac{\partial f(x, y)}{\partial y} = f(x, y+1) - f(x, y)$


When diagonal edge direction is of interest, we need a 2-D


mask. The Roberts cross-gradient operators are one of the
earliest attempts to use 2-D masks with a diagonal preference.
Consider the $3 \times 3$ region in Figure 10.14(a). The Roberts
operators are based on implementing the diagonal differences.


$g_x = \frac{\partial f}{\partial x} = z_9 - z_5 = f(x+1, y+1) - f(x, y)$

$g_y = \frac{\partial f}{\partial y} = z_8 - z_6 = f(x+1, y) - f(x, y+1)$

Masks of size $2 \times 2$ are simple conceptually, but they are not as useful for computing edge direction as masks that are symmetric about the center point, the smallest of which are of size $3 \times 3$.


Prewitt operators

$g_x = \frac{\partial f}{\partial x} = (z_7 + z_8 + z_9) - (z_1 + z_2 + z_3)$

$g_y = \frac{\partial f}{\partial y} = (z_3 + z_6 + z_9) - (z_1 + z_4 + z_7)$

Sobel operators

$g_x = \frac{\partial f}{\partial x} = (z_7 + 2z_8 + z_9) - (z_1 + 2z_2 + z_3)$

$g_y = \frac{\partial f}{\partial y} = (z_3 + 2z_6 + z_9) - (z_1 + 2z_4 + z_7)$


The Sobel masks have better noise-suppression (smoothing)


effects than the Prewitt masks.
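A minimal sketch of computing the Sobel gradient magnitude and angle with SciPy (taking axis 0 as the x of the text, i.e. the row direction, is an assumption about conventions):

```python
import numpy as np
from scipy import ndimage

def sobel_gradient(f):
    f = f.astype(np.float64)
    gx = ndimage.sobel(f, axis=0)   # partial derivative along image rows
    gy = ndimage.sobel(f, axis=1)   # partial derivative along image columns
    M = np.hypot(gx, gy)            # magnitude M = sqrt(gx^2 + gy^2)
    alpha = np.arctan2(gy, gx)      # gradient direction alpha(x, y)
    return M, alpha
```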


When interest lies both in highlighting the principal edges


and in maintaining as much connectivity as possible, it is
common practice to use both smoothing and thresholding.


More Advanced Techniques for Edge Detection


The edge-detection methods described until now are based on filtering an image with one or more masks, with no attempt to take into account the edge characteristics or the noise content of the image. In this section, the noise and the nature of the edges themselves are taken into account in more advanced edge-detection techniques.


The Marr-Hildreth edge detector


Marr and Hildreth noticed that:
(1) intensity changes are not independent of image scale
and so their detection requires the use of operators of
different sizes;
(2) a sudden intensity change will give rise to a peak or
trough in the first derivative or, equivalently, to a zero
crossing in the second derivative.


These ideas suggest that an operator used for edge detection


should have two features:
1) it should be a differential operator capable of
computing a digital approximation of the first or
second derivative at every point in the image
2) it should be capable of being tuned to act at any
desired scale, so that large operators can be used to
detect blurry edges and small operators to detect
sharply focused fine detail.


Marr and Hildreth argued that the most satisfactory operator fulfilling these conditions is the filter $\nabla^2 G$, the Laplacian of $G$, the 2-D Gaussian function with standard deviation $\sigma$:

$G(x, y) = e^{-\frac{x^2 + y^2}{2\sigma^2}} \quad (2)$

$\nabla^2 G(x, y) = \left( \frac{x^2 + y^2 - 2\sigma^2}{\sigma^4} \right) e^{-\frac{x^2 + y^2}{2\sigma^2}} \quad (3)$


The last expression is called the Laplacian of a Gaussian


(LoG).


Because of the shape illustrated in Figure 10.21(a), the LoG


function sometimes is called the Mexican hat operator. Figure
10.21(d) shows a $5 \times 5$ mask that approximates the shape in
Figure 10.21(a) (in practice, the negative of this mask is
used). This approximation is not unique. Its purpose is to
capture the essential shape of the LoG function.
Masks of arbitrary size can be generated by sampling
equation (3) and scaling the coefficients so that they sum to
zero. A more effective approach for generating LoG filters is


to sample equation (2) to the desired $n \times n$ size and then convolve the resulting array with a Laplacian mask, such as the mask in Figure 10.4(a).
There are two fundamental ideas behind the selection of the operator $\nabla^2 G$. First, the Gaussian part of the operator blurs the image, thus reducing the intensities of structures (including noise) at scales much smaller than $\sigma$. The Gaussian
function is smooth in both spatial and frequency domains and
is thus less likely to introduce artifacts (e.g. ringing) not


present in the original image. Although first derivatives can


be used for detecting abrupt changes in intensity, they are
directional operators. The Laplacian, on the other hand, has
the important advantage of being isotropic (invariant to
rotation), which not only corresponds to characteristics of the
human visual system but also responds equally to changes in
intensity in any mask direction, thus avoiding having to use
multiple masks to calculate the strongest response at any
point in the image.


The Marr-Hildreth algorithm consists of convolving the LoG


filter with an input image f(x,y)

$g(x, y) = \left[ \nabla^2 G(x, y) \right] \star f(x, y) = \nabla^2 \left[ G(x, y) \star f(x, y) \right]$

The Marr-Hildreth edge-detection algorithm may be summarized as follows:
1. Filter the input image with an $n \times n$ Gaussian lowpass filter obtained by sampling equation (2).
2. Compute the Laplacian of the image resulting from Step 1.
3. Find the zero crossings of the image from Step 2.


The size of an $n \times n$ LoG discrete filter should be such that n is the smallest odd integer greater than or equal to $6\sigma$. Choosing a filter mask smaller than this will tend to truncate the LoG function, with the degree of truncation being inversely proportional to the size of the mask; using a larger mask would make little difference in the result.
One approach for finding the zero crossings at any pixel, p, of the filtered image, g(x,y), is based on using a $3 \times 3$ neighborhood centered at p. A zero crossing at p implies that


the signs of at least two of its opposing neighboring pixels must differ. There are four cases to test: left/right, up/down, and the two diagonals.
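The three steps of the Marr-Hildreth algorithm, including this zero-crossing test, can be sketched as follows (SciPy assumed; sigma and the strength threshold are illustrative parameters):

```python
import numpy as np
from scipy import ndimage

def marr_hildreth(f, sigma=2.0, thresh=0.0):
    # Steps 1-2: Gaussian smoothing followed by the Laplacian (LoG).
    log = ndimage.gaussian_laplace(f.astype(np.float64), sigma=sigma)
    # Step 3: a zero crossing at p means two opposing neighbors differ
    # in sign; the four cases are left/right, up/down, and two diagonals.
    edges = np.zeros(log.shape, dtype=bool)
    for shift in [(0, 1), (1, 0), (1, 1), (1, -1)]:
        a = np.roll(log, shift, axis=(0, 1))
        b = np.roll(log, (-shift[0], -shift[1]), axis=(0, 1))
        edges |= (np.sign(a) != np.sign(b)) & (np.abs(a - b) > thresh)
    return edges
```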


The Canny edge detector

Canny's approach is based on three basic objectives:
1. Low error rate. All edges should be found, and there should be no false responses. The edges detected must be as close as possible to the true edges.
2. Edge points should be well localized. The edges located must be as close as possible to the true edges; that is, the distance between a point marked as an edge by the detector and the center of the true edge should be minimum.


3. Single edge response. The detector should return only one point for each true edge point. That is, the number of local maxima around the true edge should be minimum. This means that the detector should not identify multiple edge pixels where only a single edge point exists.
In general, it is difficult (or impossible) to find a closed-form solution that satisfies all the preceding objectives. However, using numerical optimization with 1-D step edges corrupted by additive white Gaussian noise led to the conclusion that a


good approximation to the optimal step edge detector is the


first derivative of a Gaussian:
$\frac{d}{dx} e^{-\frac{x^2}{2\sigma^2}} = -\frac{x}{\sigma^2} e^{-\frac{x^2}{2\sigma^2}}$

Let f(x,y) denote the input image and G(x,y) denote the Gaussian function:

$G(x, y) = e^{-\frac{x^2 + y^2}{2\sigma^2}}$

Form a smoothed image, $f_s(x, y)$, by convolving G and f:


$f_s(x, y) = G(x, y) \star f(x, y)$

We compute the gradient magnitude and the angle for $f_s$:

$M(x, y) = \sqrt{g_x^2 + g_y^2}, \quad g_x = \frac{\partial f_s}{\partial x}, \quad g_y = \frac{\partial f_s}{\partial y}$

$\alpha(x, y) = \arctan\left( \frac{g_y}{g_x} \right)$
M(x, y) contains ridges around local maxima. The next step is
to thin those ridges. One approach is to use nonmaxima
suppression. This can be done in several ways, but the


essence of this approach is to specify a number of discrete orientations of the edge normal (gradient vector). For example, in a $3 \times 3$ region we can define four orientations for an edge passing through the center point of the region: horizontal, vertical, +45° and −45°.


Let $d_1$, $d_2$, $d_3$, and $d_4$ denote the four basic edge directions for a $3 \times 3$ region: horizontal, −45°, vertical, and +45°, respectively.


We can formulate the following nonmaxima suppression scheme for a $3 \times 3$ region centered at every point (x,y) in $\alpha(x, y)$:
1. Find the direction $d_k$ that is closest to $\alpha(x, y)$.
2. If the value of M(x,y) is less than at least one of its two neighbors along $d_k$, let $g_N(x, y) = 0$ (suppression); otherwise, let $g_N(x, y) = M(x, y)$, where $g_N(x, y)$ is the nonmaxima-suppressed image.


The final operation is to threshold $g_N(x, y)$ to reduce false edge points. If we set the threshold too low, there will still be some false edges (called false positives). If the threshold is too high, then actual valid edge points will be eliminated (false negatives). Canny's algorithm attempts to improve on this situation by using hysteresis thresholding, which uses two thresholds: a low threshold $T_L$ and a high threshold $T_H$. Canny suggested that the ratio of the high to low threshold should be two or three to one.


We can visualize the thresholding operation as creating two additional images

$g_{NH}(x, y) = g_N(x, y) \geq T_H$

$g_{NL}(x, y) = g_N(x, y) \geq T_L$

where both are 0 initially. After thresholding, $g_{NH}(x,y)$ will have fewer nonzero pixels than $g_{NL}(x,y)$ in general, but all the nonzero pixels in $g_{NH}(x,y)$ will be contained in $g_{NL}(x,y)$ because the latter


image is formed with a lower threshold. We eliminate from $g_{NL}(x,y)$ all the nonzero pixels from $g_{NH}(x,y)$ by letting:

$g_{NL}(x, y) = g_{NL}(x, y) - g_{NH}(x, y)$

The nonzero pixels in $g_{NH}(x,y)$ and $g_{NL}(x,y)$ may be viewed as being strong and weak edge pixels, respectively.
After the thresholding operation, all strong pixels in
gNH(x,y) are assumed to be valid edge pixels and are so

marked immediately. Depending on the value of TH , the


edges in gNH(x,y) typically have gaps. Longer edges are


formed using the following procedure:
(a) Locate the next unvisited edge pixel, p, in gNH(x,y).
(b) Mark as valid edge pixels all the pixels in gNL(x,y) that
are connected to p (using 8-connectivity, for example)
(c) If all nonzero pixels in gNH(x,y) have been visited go to
Step (d). Else return to Step (a).
(d) Set to zero all pixels in gNL(x,y) that were not marked as
valid edge pixels.


At the end of this procedure, the final image output by the


Canny algorithm is formed by appending to gNH(x,y) all the
nonzero pixels from gNL(x,y).
In practice, hysteresis thresholding can be implemented
directly during nonmaxima suppression, and thresholding can
be implemented directly on gN(x,y) by forming a list of strong
pixels and the weak pixels connected to them.


The Canny edge detection algorithm consists of the following basic steps:
1. Smooth the input image with a Gaussian filter.
2. Compute the gradient magnitude and angle images.
3. Apply nonmaxima suppression to the gradient magnitude image.
4. Use double thresholding and connectivity analysis to detect and link edges.
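These four steps are implemented in common libraries; the sketch below uses scikit-image's canny, with illustrative parameter values (the normalization and the thresholds are assumptions, not values from the text):

```python
import numpy as np
from skimage import feature

def canny_edges(f, sigma=2.0, t_low=0.1, t_high=0.3):
    # Steps 1-4 (smoothing, gradient, nonmaxima suppression, and
    # hysteresis thresholding) are performed inside feature.canny.
    f = f.astype(np.float64)
    rng = f.max() - f.min()
    f = (f - f.min()) / rng if rng > 0 else f  # normalize to [0, 1]
    return feature.canny(f, sigma=sigma,
                         low_threshold=t_low, high_threshold=t_high)
```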


Edge Linking and Boundary Detection

Ideally, edge detection should yield sets of pixels lying only on edges. In practice, these pixels seldom characterize an edge completely because of noise, breaks in the edges due to nonuniform illumination, and other effects that introduce spurious discontinuities in intensity values. Therefore, edge detection typically is followed by linking algorithms designed to assemble edge pixels into meaningful edges and/or region boundaries. We discuss three fundamental approaches to edge


linking that are representative of techniques used in practice.


The first requires knowledge about edge points in a local
region; the second requires that points on the boundary of a
region be known; the third is a global approach that works
with an entire edge image.
Local processing

A simple way to link edge points is to analyze the


characteristics of pixels in a small neighborhood about every
point (x,y) that has been declared an edge point. All points


that are similar according to predefined criteria are linked,


forming an edge of pixels that share common properties
according to the specified criteria.
The two principal properties used for establishing similarity
of edge pixels in this kind of analysis are:
(1) the strength (magnitude)
(2) the direction
of the gradient vector. Let Sxy denote the set of coordinates of
a neighborhood centered at (x,y) in an image. An edge pixel


with coordinates (s,t) in $S_{xy}$ is similar in magnitude to the pixel at (x,y) if

$|M(s, t) - M(x, y)| \leq E$, where $E > 0$ is a positive magnitude threshold.

An edge pixel with coordinates (s,t) in $S_{xy}$ has an angle similar to the pixel at (x,y) if

$|\alpha(s, t) - \alpha(x, y)| \leq A$, where $A > 0$ is a positive angle threshold.

The direction of the edge at (x,y) is perpendicular to the direction of the gradient vector at that point.


A pixel with coordinates (s, t) in Sxy is linked to the pixel at

(x, y) if both magnitude and direction criteria are satisfied.


This process is repeated at every location in the image. A
record must be kept of linked points as the center of the
neighborhood is moved from pixel to pixel. A simple
procedure would be to assign a different intensity value to
each set of linked edge pixels.
The preceding formulation is computationally expensive
because all neighbors of every point have to be examined. A


simplification particularly well suited for real-time applications consists of the following steps:

1. Compute the gradient magnitude and angle arrays, M(x,y) and $\alpha(x, y)$, of the input image f(x,y).
2. Form a binary image g:

$g(x, y) = \begin{cases} 1 & \text{if } M(x, y) > T_M \text{ AND } \alpha(x, y) = A \pm T_A \\ 0 & \text{otherwise} \end{cases}$

where $T_M$ is a threshold, A is a specified angle direction, and $T_A$ defines a band of acceptable directions about A.


3. Scan the rows of g and fill (set to 1) all gaps (sets of 0s)
in each row that do not exceed a specified length K.
Note that a gap is bounded at both ends by one or more
1s. The rows are processed individually, with no
memory between them.
4. To detect gaps in any other direction, $\theta$, rotate g by this angle and apply the horizontal scanning procedure in Step 3. Rotate the result back by $-\theta$.


In general, image rotation is an expensive computational


process so, when linking in numerous angle directions is
required, it is more practical to combine Steps 3 and 4 into a
single radial scanning procedure.
Figure 10.27(a) shows an image of the rear of a vehicle. The
objective of this example is to illustrate the use of the
preceding algorithm for finding rectangles whose sizes make them suitable candidates for license plates. The formation of


these rectangles can be accomplished by detecting strong


horizontal and vertical edges.


$T_M$ = 30% of the maximum gradient value, A = 90°, $T_A$ = 45°, K = 25.
Regional processing
Often the location of the regions of interest in an image is
known or can be determined. This implies that knowledge is
available regarding the regional membership of pixels in the
corresponding edge image. We can use techniques for linking
pixels on a regional basis, with the desired result being an
approximation to the boundary of the region. One approach is


to fit a 2-D curve to the known points. Interest lies in


fast-executing techniques that yield an approximation to
essential features of the boundary, such as extreme points and
concavities.

Polygonal approximations are particularly attractive because they capture the essential shape features of a region while keeping the representation of the boundary relatively simple. We present an algorithm suitable for this purpose.


Two important requirements are necessary. First, two starting


points must be specified; second, all the points must be
ordered (e.g. in a clockwise or counterclockwise direction).
An algorithm for finding a polygonal fit to open and closed
curves may be stated as follows:
1. Let P be a sequence of ordered, distinct, 1-valued points
of a binary image. Specify two starting points, A and B.
These are the two starting vertices of the polygon.


2. Specify a threshold T, and two empty stacks OPEN and


CLOSED.
3. If the points in P correspond to a closed curve, put A
into OPEN and put B into OPEN and into CLOSED. If
the points correspond to an open curve, put A into
OPEN and put B into CLOSED.
4. Compute the parameters of the line passing from the last
vertex in CLOSED to the last vertex in OPEN.


5. Compute the distance from the line in Step 4 to all


points in P whose sequence places them between the
vertices from Step 4. Select the point Vmax with the
maximum distance Dmax (ties are resolved arbitrarily)
6. If Dmax > T, place Vmax at the end of the OPEN stack as a
new vertex. Go to step 4.
7. Else, remove the last vertex from OPEN and insert it as
the last vertex in CLOSED.


8. If OPEN is not empty go to Step 4.


9. Else, exit. The vertices in CLOSED are the vertices of
the polygonal fit to the points in P.


Global processing using the Hough transform


The previous methods assumed available knowledge about
pixels belonging to individual objects. Often, we work with
unstructured environments in which all we have is an edge
image and no knowledge about where objects of interest
might be. In such situations, all pixels are candidates for
linking and thus have to be accepted or eliminated based on
predefined global properties. The approach discussed in this section is based on whether sets of pixels lie on curves of a


specified shape. Once detected, these curves form the edges


or region boundaries of interest.
Given n points in an image, suppose that we want to find subsets of these points that lie on straight lines. One possible solution is to first find all lines determined by every pair of points and then find all subsets of points that are close to particular lines. This approach involves finding $n(n-1)/2 \sim n^2$ lines and then performing $n \cdot n(n-1)/2 \sim n^3$ comparisons of


every point to all lines. This is a computationally prohibitive


task.
Hough proposed an alternative approach, commonly referred to as the Hough transform. Consider a point $(x_i, y_i)$ in the xy-plane and the general equation of a line that passes through this point:

$y_i = a x_i + b$


Infinitely many lines pass through $(x_i, y_i)$, but they all satisfy the equation $y_i = a x_i + b$ for varying values of a and b. However, writing this equation as

$b = -x_i a + y_i$

and considering the ab-plane (also called parameter space) yields the equation of a single line for a fixed pair $(x_i, y_i)$. Furthermore, a second point $(x_j, y_j)$ also has a line in the parameter space associated with it, and, unless they are


parallel, this line intersects the line associated with (xi , yi) at
some point (a', b'). In fact, all the points on this line have
lines in parameter space that intersect at (a', b').


In principle, the parameter-space lines corresponding to all points $(x_k, y_k)$ in the xy-plane could be plotted, and the principal lines in that plane could be found by identifying points in parameter space where large numbers of parameter-space lines intersect. A practical difficulty with this approach, however, is that a tends to infinity as the line approaches the vertical direction. To avoid this problem, we use the normal representation of a line:

$x \cos\theta + y \sin\theta = \rho$


The computational attractiveness of the Hough transform arises from subdividing the $\rho\theta$ parameter space into so-called accumulator cells, as Figure 10.32(c) illustrates, where $(\rho_{min}, \rho_{max})$ and $(\theta_{min}, \theta_{max})$ are the expected ranges of the parameter values: $-90° \leq \theta \leq 90°$ and $-D \leq \rho \leq D$, where D is the maximum distance between opposite corners in an image. The cell at coordinates (i, j), with accumulator value A(i, j), corresponds to the square associated with parameter-space coordinates $(\rho_i, \theta_j)$. Initially, these cells are set to zero.


Then, for every non-background point $(x_k, y_k)$ in the xy-plane, we let $\theta$ equal each of the allowed subdivision values on the $\theta$-axis and solve for the corresponding $\rho$ using the equation $\rho = x_k \cos\theta + y_k \sin\theta$. The resulting $\rho$ values are then rounded off to the nearest allowed cell value along the $\rho$-axis. If a choice of $\theta_p$ results in solution $\rho_q$, then we let A(p, q) = A(p, q) + 1.

At the end of this procedure, a value of P in A(i, j) means that P points in the xy-plane lie on the line $x \cos\theta_j + y \sin\theta_j = \rho_i$.


The number of subdivisions in the $\rho\theta$-plane determines the accuracy of the collinearity of these points. It can be shown that the number of computations in the method described above is linear with respect to n, the number of non-background points in the xy-plane.
An approach based on the Hough transform for edge linking is as follows:
1. Obtain a binary edge image.
2. Specify subdivisions in the $\rho\theta$-plane.


3. Examine the counts of the accumulator cells for high pixel concentrations.
4. Examine the relationship (principally for continuity) between pixels in a chosen cell.
Continuity in this case usually is based on computing the
distance between disconnected pixels corresponding to a
given accumulator cell. A gap in a line associated with a
given cell is bridged if the length of the gap is less than a
specified threshold.
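The accumulator-cell procedure described above can be sketched directly (pure NumPy; one cell per degree of $\theta$ and per unit of $\rho$ is an arbitrary choice):

```python
import numpy as np

def hough_accumulator(edge_img, n_theta=180):
    rows, cols = edge_img.shape
    D = int(np.ceil(np.hypot(rows, cols)))    # max distance across the image
    thetas = np.deg2rad(np.linspace(-90.0, 90.0, n_theta))
    rhos = np.arange(-D, D + 1)
    A = np.zeros((len(rhos), n_theta), dtype=np.int64)
    ys, xs = np.nonzero(edge_img)             # non-background points
    for x, y in zip(xs, ys):
        # rho = x cos(theta) + y sin(theta), rounded to the nearest cell
        rho = x * np.cos(thetas) + y * np.sin(thetas)
        idx = np.round(rho).astype(int) + D
        A[idx, np.arange(n_theta)] += 1
    return A, rhos, thetas
```

Peaks in A then correspond to candidate lines; linking (Steps 3-4) examines the pixels that voted for each chosen cell.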


Image Segmentation Thresholding


We discuss techniques for partitioning images directly
into regions based on intensity values and/or properties
of these values.
Suppose that the intensity histogram in Figure 10.35(a)
corresponds to an image, f(x,y), composed of light
objects and a dark background, in such a way that object
and background pixels have intensity values grouped into
two dominant modes. One way to extract the objects


from the background is to select a threshold T that


separates these modes.

Any point (x,y) in the image for which f(x,y) > T is called
an object point; otherwise, the point is called a


background point. The segmented image, g(x,y), is given


by:
$g(x, y) = \begin{cases} 1 & \text{if } f(x, y) > T \\ 0 & \text{if } f(x, y) \leq T \end{cases}$

When T is a constant applicable over an entire image, the process given in this equation is referred to as global
thresholding. When the value of T changes over an
image, we use the term variable thresholding. The term
local or regional thresholding is used sometimes to


denote variable thresholding in which the value of T at


any point (x, y) in an image depends on properties of a
neighborhood of (x, y) (for example, the average
intensity of the pixels in the neighborhood). If T depends
on the spatial coordinates (x, y) themselves, then variable
thresholding is often referred to as dynamic or adaptive
thresholding.


If in an image we have, for example, two types of light


objects on a dark background, multiple thresholding is
used. The segmented image is given by:
$g(x, y) = \begin{cases} a & \text{if } f(x, y) > T_2 \\ b & \text{if } T_1 < f(x, y) \leq T_2 \\ c & \text{if } f(x, y) \leq T_1 \end{cases}$

where a, b, and c are any three distinct intensity values.


Segmentation problems requiring more than two


thresholds are difficult to solve, and better results usually


are obtained using other methods.
The success of intensity thresholding is directly related
to the width and depth of the valley(s) separating the
histogram modes. The key factors affecting the properties
of the valley(s) are:
(1) the separation between peaks (the further apart the
peaks are, the better the chances of separating the
modes)


(2) the noise content in the image (the modes broaden


as noise increases)
(3) the relative size of objects and background
(4) the uniformity of the illumination source
(5) the uniformity of the reflectance properties of the
image


The role of noise in image thresholding


Consider Figure 10.36(a): the image is free of noise and its histogram has two spike modes. Figure 10.36(b)
shows the original image corrupted by Gaussian noise of
zero mean and a standard deviation of 10 intensity levels.
Although the corresponding histogram modes are now
broader (Figure 10.36(e)), their separation is large
enough so that the depth of the valley between them is
sufficient to make the modes easy to separate.


Figure 10.36(c) shows the result of corrupting the image


with Gaussian noise of zero mean and a standard
deviation of 50 intensity levels. As the histogram in
Figure 10.36(f) shows, the situation is much more difficult, as there is no way to differentiate between the two modes.
The role of illumination and reflectance
Figure 10.37 illustrates the effect that illumination can
have on the histogram of an image. Figure 10.37(a) is the


noisy image from Figure 10.36(b), and Figure 10.37(d)


shows its histogram.


We can illustrate the effects of nonuniform illumination


by multiplying the image in Figure 10.37(a) by a variable intensity function, such as the intensity ramp in Figure 10.37(b), whose histogram is shown in Figure 10.37(e). Figure 10.37(c) shows the product of the image and this shading pattern. As Figure 10.37(f) shows, the deep valley between peaks was corrupted to the point where separation of the modes without additional processing is no longer possible.


Illumination and reflectance play a central role in the


success of image segmentation using thresholding or
other segmentation techniques. Therefore, controlling
these factors when it is possible to do so should be the
first step considered in the solution of segmentation
problem. There are three basic approaches to the problem
when control over these factors is not possible. One is to
correct the shading pattern directly. For example,
nonuniform (but fixed) illumination can be corrected by


multiplying the image by the inverse pattern, which can


be obtained by imaging a flat surface of constant
intensity. The second approach is to attempt to correct
the global shading pattern via processing it. The third
approach is to work around nonuniformities using
variable thresholding.


Basic Global Thresholding


When the intensity distributions of objects and
background pixels are sufficiently distinct, it is possible
to use a single (global) threshold applicable over the
entire image. An algorithm capable of estimating
automatically the threshold value for each image is
required. The following iterative algorithm can be used
for this purpose:


1. Select an initial estimate for the global threshold, T.
2. Segment the image using T. This will produce two groups of pixels: $G_1$, consisting of all pixels with intensity values > T, and $G_2$, consisting of pixels with values $\leq$ T.
3. Compute the average (mean) intensity values $m_1$ and $m_2$ for the pixels in $G_1$ and $G_2$, respectively.
4. Compute a new threshold value:

$T = \frac{1}{2}(m_1 + m_2)$


5. Repeat Steps 2 through 4 until the difference between values of T in successive iterations is smaller than a predefined parameter $\Delta T$.
This simple algorithm works well in situations where there is a reasonably clear valley between the modes of the histogram related to objects and background. Parameter $\Delta T$ is used to control the number of iterations in situations when speed is an important issue. The initial threshold must be chosen greater than the minimum and


less than the maximum intensity level in the image. The


average intensity of the image is a good initial choice for
T.
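The iterative algorithm translates almost line for line into code; a short NumPy sketch ($\Delta T$ and the empty-group guard are implementation details, not from the text):

```python
import numpy as np

def basic_global_threshold(f, delta_T=0.5):
    T = f.mean()                    # Step 1: average intensity as initial T
    while True:
        G1 = f[f > T]               # Step 2: pixels with intensity > T
        G2 = f[f <= T]              #         pixels with intensity <= T
        if G1.size == 0 or G2.size == 0:
            return T                # degenerate histogram; keep current T
        T_new = 0.5 * (G1.mean() + G2.mean())   # Steps 3-4
        if abs(T_new - T) < delta_T:            # Step 5
            return T_new
        T = T_new
```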


Optimum Global Thresholding Using Otsu's Method


Thresholding may be viewed as a statistical-decision
theory problem whose objective is to minimize the
average error that appears in assigning pixels to two or
more groups (also called classes). The solution (Bayes
decision rule) is based on only two parameters: the
probability density function (PDF) of the intensity levels
of each class and the probability that each class occurs in
a given application. Estimating PDFs is not a trivial task,


so the problem usually is simplified by making workable assumptions about the form of the PDFs, such as assuming that they are Gaussian functions.
Otsu's method offers an alternative solution. The method is optimum in the sense that it maximizes the between-class variance. The basic idea is that well-thresholded classes should be distinct with respect to the intensity values of their pixels and, conversely, that a threshold giving the best separation between classes in


terms of their intensity values would be the best (optimum) threshold. Otsu's method has the important property that it is based entirely on computations performed on the histogram of an image.
Let $\{0, 1, 2, \ldots, L-1\}$ denote the L distinct intensity levels in a digital image of size $M \times N$ pixels, and let $n_i$ denote the number of pixels with intensity i:

$MN \ (\text{total number of pixels}) = n_0 + n_1 + n_2 + \cdots + n_{L-1}$


The normalized histogram has components

$p_i = \frac{n_i}{MN}, \quad \text{with} \quad \sum_{i=0}^{L-1} p_i = 1, \quad p_i \geq 0$

Suppose that we select a threshold T(k) = k, 0 < k < L−1, and use it to threshold the image into two classes, $C_1$ and $C_2$, where $C_1$ consists of all pixels in the image with intensity values in the range [0, k] and $C_2$ consists of all pixels in the image with intensity values in the range [k+1, L−1]. Using


this threshold, the probability $P_1(k)$ that a pixel is assigned to class $C_1$ is given by the cumulative sum

$P_1(k) = \sum_{i=0}^{k} p_i$

This is the probability of class $C_1$ occurring. Similarly, the probability of class $C_2$ occurring is

$P_2(k) = \sum_{i=k+1}^{L-1} p_i$

The mean intensity value of the pixels assigned to class $C_1$ is

$m_1(k) = \sum_{i=0}^{k} i\, P(i \mid C_1) = \sum_{i=0}^{k} i\, P(C_1 \mid i) \frac{P(i)}{P(C_1)} = \frac{1}{P_1(k)} \sum_{i=0}^{k} i\, p_i$

$P(i \mid C_1)$ is the probability of value i, given that i comes from class $C_1$. We have used the Bayes formula:

$P(A \mid B) = P(B \mid A) \frac{P(A)}{P(B)}$

$P(C_1 \mid i) = 1$ is the probability of $C_1$ given i (i belongs to $C_1$).
Similarly, the mean intensity value of the pixels assigned to class $C_2$ is:

$m_2(k) = \sum_{i=k+1}^{L-1} i\, P(i \mid C_2) = \frac{1}{P_2(k)} \sum_{i=k+1}^{L-1} i\, p_i$

The cumulative mean (average intensity) up to level k is given by

$m(k) = \sum_{i=0}^{k} i\, p_i$

and the average intensity of the entire image (the global mean) is given by

$m_G = \sum_{i=0}^{L-1} i\, p_i$


We have:

$P_1 m_1 + P_2 m_2 = m_G, \quad P_1 + P_2 = 1$

In order to evaluate the goodness of the threshold at level k, we use the normalized, dimensionless metric

$\eta = \frac{\sigma_B^2}{\sigma_G^2}$

where $\sigma_G^2$ is the global variance:

$\sigma_G^2 = \sum_{i=0}^{L-1} (i - m_G)^2 p_i$


and $\sigma_B^2$ is the between-class variance, defined as

$\sigma_B^2 = P_1 (m_1 - m_G)^2 + P_2 (m_2 - m_G)^2 = P_1 P_2 (m_1 - m_2)^2 = \frac{(m_G P_1 - m)^2}{P_1 (1 - P_1)}$

From the above formula, we see that the farther the two means $m_1$ and $m_2$ are from each other, the larger $\sigma_B^2$ will be, indicating that the between-class variance is a measure of separability between classes. Because $\sigma_G^2$ is a constant, it


follows that $\eta$ is also a measure of separability, and maximizing this metric is equivalent to maximizing $\sigma_B^2$. The objective then is to determine the threshold value, k, that maximizes the between-class variance.
We have:

$\eta(k) = \frac{\sigma_B^2(k)}{\sigma_G^2}$


$\sigma_B^2(k) = \frac{\left[ m_G P_1(k) - m(k) \right]^2}{P_1(k)\left[ 1 - P_1(k) \right]}$

The optimum threshold is the value, $k^*$, that maximizes $\sigma_B^2(k)$:

$\sigma_B^2(k^*) = \max \{ \sigma_B^2(k) ;\; 0 \leq k \leq L-1,\; k \text{ integer} \}$

If the maximum exists for more than one value of k, it is customary to average the various values of k for which $\sigma_B^2(k)$ is maximum.
Once k* has been obtained, the input image is segmented as:


$g(x, y) = \begin{cases} 1 & \text{if } f(x, y) > k^* \\ 0 & \text{if } f(x, y) \leq k^* \end{cases}$

The metric $\eta(k^*)$ can be used to obtain a quantitative estimate of the separability of classes:

$0 \leq \eta(k^*) \leq 1$

The lower bound is attainable only by images with a single, constant intensity level; the upper bound is


attainable only by 2-valued images with intensities equal to 0 and L−1.
Otsu's algorithm may be summarized as follows:
1. Compute the normalized histogram of the input image, $p_i$, i = 0, 1, 2, …, L−1.
2. Compute the cumulative sums, $P_1(k)$, k = 0, 1, 2, …, L−1.
3. Compute the cumulative means, m(k), k = 0, 1, …, L−1.
4. Compute the global intensity mean, $m_G$.
5. Compute the between-class variance,


$\sigma_B^2(k)$, k = 0, 1, …, L−1.
6. Obtain the Otsu threshold, $k^*$, as the value of k for which $\sigma_B^2(k)$ is maximum. If the maximum is not unique, obtain $k^*$ by averaging the values of k corresponding to the various maxima detected.
7. Obtain the separability measure, $\eta(k^*)$.
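Since everything is computed from the histogram, the seven steps vectorize into a few lines; a sketch for an L-level image (the guard against empty classes is an implementation detail):

```python
import numpy as np

def otsu_threshold(f, L=256):
    n = np.bincount(f.ravel().astype(np.int64), minlength=L)[:L]
    p = n / n.sum()                       # Step 1: normalized histogram
    P1 = np.cumsum(p)                     # Step 2: cumulative sums P1(k)
    m = np.cumsum(np.arange(L) * p)       # Step 3: cumulative means m(k)
    mG = m[-1]                            # Step 4: global mean
    den = P1 * (1.0 - P1)                 # Step 5: between-class variance
    num = (mG * P1 - m) ** 2
    sigma_B2 = np.where(den > 0, num / np.where(den > 0, den, 1), 0.0)
    # Step 6: average the k values if the maximum is not unique.
    k_star = int(round(np.mean(np.flatnonzero(sigma_B2 == sigma_B2.max()))))
    sigma_G2 = np.sum((np.arange(L) - mG) ** 2 * p)
    eta = sigma_B2[k_star] / sigma_G2 if sigma_G2 > 0 else 0.0   # Step 7
    return k_star, eta
```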


Noise can turn a simple thresholding problem into an


unsolvable one. When noise cannot be reduced at the source,
and thresholding is the segmentation method used, a
technique that often enhances performances is to smooth the
image before thresholding it.


Multiple Thresholds
The thresholding idea used in Otsu's method can be extended to an arbitrary number of thresholds, because the separability measure on which it is based also extends to an arbitrary number of classes. In the case of K classes, $C_1, C_2, \ldots, C_K$, the between-class variance generalizes to the expression:

$\sigma_B^2 = \sum_{k=1}^{K} P_k (m_k - m_G)^2$


$P_k = \sum_{i \in C_k} p_i, \qquad m_k = \frac{1}{P_k} \sum_{i \in C_k} i\, p_i$

where $m_G$ is the global mean of the image. The K classes are separated by K−1 thresholds whose values $k_1^*, k_2^*, \ldots, k_{K-1}^*$ are the values that maximize $\sigma_B^2$:

$\sigma_B^2(k_1^*, k_2^*, \ldots, k_{K-1}^*) = \max_{\substack{0 < k_1 < k_2 < \cdots < k_{K-1} < L-1 \\ k_1, \ldots, k_{K-1} \text{ integers}}} \sigma_B^2(k_1, k_2, \ldots, k_{K-1})$


In practice, using multiple global thresholding is considered a


viable approach when there is reason to believe that the
problem can be solved effectively with two thresholds.
Applications that require more than two thresholds generally
are solved using more than just intensity values.
For three classes consisting of three intensity intervals
(which are separated by two thresholds) the between-class
variance is given by:

$\sigma_B^2 = P_1 (m_1 - m_G)^2 + P_2 (m_2 - m_G)^2 + P_3 (m_3 - m_G)^2$

$P_1 = \sum_{i=0}^{k_1} p_i, \qquad m_1 = \frac{1}{P_1} \sum_{i=0}^{k_1} i\, p_i$

$P_2 = \sum_{i=k_1+1}^{k_2} p_i, \qquad m_2 = \frac{1}{P_2} \sum_{i=k_1+1}^{k_2} i\, p_i$

$P_3 = \sum_{i=k_2+1}^{L-1} p_i, \qquad m_3 = \frac{1}{P_3} \sum_{i=k_2+1}^{L-1} i\, p_i$

$P_1 m_1 + P_2 m_2 + P_3 m_3 = m_G, \qquad P_1 + P_2 + P_3 = 1$

The two optimum threshold values, $k_1^*$ and $k_2^*$, are the values that maximize $\sigma_B^2(k_1, k_2)$:

$\sigma_B^2(k_1^*, k_2^*) = \max_{0 < k_1 < k_2 < L-1} \sigma_B^2(k_1, k_2)$


The thresholded image is given by:

$g(x, y) = \begin{cases} a & \text{if } f(x, y) \leq k_1^* \\ b & \text{if } k_1^* < f(x, y) \leq k_2^* \\ c & \text{if } f(x, y) > k_2^* \end{cases}$

where a, b, and c are any three distinct valid intensity values. The separability measure extended to multiple thresholds is given by:

$\eta(k_1^*, k_2^*) = \frac{\sigma_B^2(k_1^*, k_2^*)}{\sigma_G^2}$


Variable Thresholding
Image partitioning
One of the simplest approaches to variable thresholding is to
subdivide an image into nonoverlapping rectangles. This
approach is used to compensate for non-uniformities in
illumination and/or reflectance. The rectangles are chosen
small enough so that the illumination of each is
approximately uniform.


Image subdivision generally works well when the objects of interest and the background occupy regions of reasonably comparable size. When this is not the case, the method fails because of the likelihood of subdivisions containing only object or only background pixels.

Variable thresholding based on local image properties


A more general approach than the image subdivision method
is to compute a threshold at every point (x,y) in the image


based on one or more specified properties computed in a neighborhood of (x, y).
We illustrate the basic approach to local thresholding by using the standard deviation and mean of the pixels in a neighborhood of every point in an image. Let $\sigma_{xy}$ and $m_{xy}$ denote the standard deviation and mean value of the set of pixels contained in a neighborhood $S_{xy}$ centered at coordinates (x, y) in an image.
The following are common forms of variable, local thresholds:


$T_{xy} = a \sigma_{xy} + b m_{xy}, \quad a, b > 0$

$T_{xy} = a \sigma_{xy} + b m_G$, where $m_G$ is the global image mean.

The segmented image is computed as

$g(x, y) = \begin{cases} 1 & \text{if } f(x, y) > T_{xy} \\ 0 & \text{if } f(x, y) \leq T_{xy} \end{cases}$

Significant improvement can be obtained in local thresholding by using predicates based on the parameters computed in the neighborhood of (x, y):


$g(x, y) = \begin{cases} 1 & \text{if } Q(\text{local parameters}) \text{ is true} \\ 0 & \text{if } Q(\text{local parameters}) \text{ is false} \end{cases}$

where Q is a predicate based on parameters computed using the pixels in neighborhood $S_{xy}$, for example:

$Q(\sigma_{xy}, m_{xy}) = \begin{cases} \text{true} & \text{if } f(x, y) > a \sigma_{xy} \text{ AND } f(x, y) > b m_{xy} \\ \text{false} & \text{otherwise} \end{cases}$
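A sketch of this predicate-based local thresholding, computing $m_{xy}$ and $\sigma_{xy}$ for every pixel with uniform filters (the neighborhood size and the constants a and b are illustrative):

```python
import numpy as np
from scipy import ndimage

def local_threshold(f, size=3, a=30.0, b=1.5):
    f = f.astype(np.float64)
    m = ndimage.uniform_filter(f, size=size)        # local mean m_xy
    sq = ndimage.uniform_filter(f * f, size=size)
    sigma = np.sqrt(np.maximum(sq - m * m, 0.0))    # local std sigma_xy
    # Predicate form: f > a*sigma_xy AND f > b*m_xy
    return ((f > a * sigma) & (f > b * m)).astype(np.uint8)
```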


Using moving averages


A special case of the local thresholding method just
discussed is based on computing a moving average along
scan lines of an image. This implementation is useful in
document processing, where speed is a fundamental
requirement. The scanning is typically carried out line by
line in a zigzag pattern to reduce illumination bias. Let
zk+1 denote the intensity of the point encountered in the


scanning sequence at step k+1. The moving average (mean intensity) at this new point is given by

$m(k+1) = \frac{1}{n} \sum_{i=k+2-n}^{k+1} z_i = m(k) + \frac{1}{n}(z_{k+1} - z_{k-n}), \quad m(1) = \frac{z_1}{n}$

where n denotes the number of points used in computing the average. The algorithm is initialized only once, not at every row. Segmentation is implemented using the


variable threshold $T_{xy} = b m_{xy}$, where b is a constant and $m_{xy}$ is the moving average computed as above.
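A sketch of moving-average thresholding for documents (pure NumPy; the values of n and b are illustrative, and the zigzag scan is simulated by reversing every other row):

```python
import numpy as np

def moving_average_threshold(f, n=20, b=0.5):
    f = f.astype(np.float64)
    z = f.copy()
    z[1::2, :] = z[1::2, ::-1]          # zigzag scan: reverse odd rows
    flat = z.ravel()
    csum = np.cumsum(flat)
    m = np.empty_like(flat)             # running mean of the last n points
    m[:n] = csum[:n] / n                # initialized once, as in the text
    m[n:] = (csum[n:] - csum[:-n]) / n
    m = m.reshape(f.shape)
    m[1::2, :] = m[1::2, ::-1]          # undo the zigzag ordering
    return (f > b * m).astype(np.uint8) # threshold T_xy = b * m_xy
```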


Multivariable Thresholding
In some cases, a sensor can make available more than
one variable to characterize each pixel in an image, and
thus allow multivariable thresholding. A notable example
is color imaging where red (R), green (G), and blue (B)
components are used to form a composite color image. In
this case, each pixel is characterized by three values,
and can be represented as a 3-D vector z = (z1 , z2 , z3)T
whose components are the RGB colors at a point.


These 3D points often are referred to as voxels, to denote


volumetric elements, as opposed to image elements.
Multivariable thresholding may be viewed as a distance
computation. Suppose that we want to extract from a
color image all regions having a specified color range,
for example, reddish hues. Let a denote the average
reddish color in which we are interested. One way to
segment a color image based on this parameter is to
compute a distance measure, D(z,a) between an arbitrary


color point z and the average color a. Then we segment the input image as

$g = \begin{cases} 1 & \text{if } D(\mathbf{z}, \mathbf{a}) < T \\ 0 & \text{otherwise} \end{cases}$

where T is a threshold. D(z, a) may be the Euclidean distance

$D(\mathbf{z}, \mathbf{a}) = \| \mathbf{z} - \mathbf{a} \| = \left[ (\mathbf{z} - \mathbf{a})^T (\mathbf{z} - \mathbf{a}) \right]^{1/2}$

or the Mahalanobis distance

$D(\mathbf{z}, \mathbf{a}) = \left[ (\mathbf{z} - \mathbf{a})^T \mathbf{C}^{-1} (\mathbf{z} - \mathbf{a}) \right]^{1/2}$

where C is the covariance matrix of the z's.
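Both distance tests vectorize directly; a sketch (the interface is hypothetical: img is an H x W x 3 RGB array, a the average color, C the covariance matrix):

```python
import numpy as np

def segment_by_color(img, a, T, C=None):
    z = img.reshape(-1, 3).astype(np.float64) - np.asarray(a, np.float64)
    if C is None:
        D = np.sqrt(np.sum(z * z, axis=1))          # Euclidean distance
    else:
        D = np.sqrt(np.sum((z @ np.linalg.inv(C)) * z, axis=1))  # Mahalanobis
    return (D < T).reshape(img.shape[:2]).astype(np.uint8)
```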


Region-Based Segmentation
Region growing
Region growing is a procedure that groups pixels or subregions into larger regions based on predefined criteria for growth. The basic approach is to start with a set of seed points and from these grow regions by appending to each seed those neighboring pixels that have predefined properties


similar to the seed (such as specific ranges of intensity or color).
Selecting a set of one or more starting points often can be
based on the nature of the problem. When a priori information
is not available, the procedure is to compute at every pixel the
same set of properties that ultimately will be used to assign
pixels to regions during the growing process. If the result of
these computations shows clusters of values, the pixels whose


properties place them near the centroid of these clusters can


be used as seeds.
The selection of similarity criteria depends not only on the
problem under consideration, but also on the type of image
data available.
Another problem in region growing is the formulation of a
stopping rule. Region growth should stop when no more
pixels satisfy the criteria for inclusion in that region. Criteria
such as intensity values, texture, and color are local in nature

Digital Image Processing


Course 11

and do not take into account the history of region growth. Additional criteria that increase the power of a region-growing algorithm utilize the concept of size, likeness between a candidate pixel and the pixels grown so far, and the shape of the region being grown.
Let f(x,y) denote an input image array, S(x,y) denote a seed
array containing 1s at the locations of seed points and 0s
elsewhere, Q denote a predicate to be applied at each pixel
location (x, y). Arrays f and S are assumed to be of the same


size. A basic region-growing algorithm based on 8-connectivity may be stated as follows:

1. Find all connected components in S(x,y) and erode each connected component to one pixel; label all such pixels found as 1. All other pixels in S are labeled 0.
2. Form an image $f_Q$ such that, at a pair of coordinates (x,y), $f_Q(x,y) = 1$ if the input image satisfies the given predicate, Q, at those coordinates; otherwise, $f_Q(x,y) = 0$.

Digital Image Processing


Course 11

3. Let g be an image formed by appending to each seed


point in S all the 1-valued points in fQ that are

8-connected to that seed point.


4. Label each connected component in g with a different
region label. This is the segmented region obtained by
region growing.

An example of a predicate:

$$Q = \begin{cases} \text{TRUE} & \text{if the absolute difference of the intensities between the seed and the pixel at } (x,y) \text{ is} \le T \\ \text{FALSE} & \text{otherwise} \end{cases}$$
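A minimal sketch of this 8-connected region growing with the intensity-difference predicate (numpy assumed; seeds given as pixel coordinates):

import numpy as np
from collections import deque

def region_grow(f, seeds, T):
    """Grow regions from seed coordinates in image f, appending
    8-connected pixels whose absolute intensity difference from
    the originating seed does not exceed T."""
    labels = np.zeros(f.shape, dtype=int)
    for label, (sr, sc) in enumerate(seeds, start=1):
        queue = deque([(sr, sc)])
        labels[sr, sc] = label
        seed_val = float(f[sr, sc])
        while queue:
            r, c = queue.popleft()
            for dr in (-1, 0, 1):            # 8-neighborhood
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < f.shape[0] and 0 <= cc < f.shape[1]
                            and labels[rr, cc] == 0
                            and abs(float(f[rr, cc]) - seed_val) <= T):
                        labels[rr, cc] = label
                        queue.append((rr, cc))
    return labels

f = np.array([[10, 12, 90], [11, 13, 95], [80, 85, 92]])
print(region_grow(f, seeds=[(0, 0), (2, 2)], T=5))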

Region Splitting and Merging


The method used in this case is to subdivide an image initially
into a set of arbitrary, disjoint regions and then merge and/or
split the regions in an attempt to satisfy the condition of
segmentation.
Let R represent the entire image region and select a
predicate Q. One approach for segmenting R is to subdivide it
successively into smaller and smaller quadrant regions so
that, for any region Ri, Q(Ri) = TRUE. We start with the entire region. If Q(R) = FALSE, we divide the image into quadrants. If Q is FALSE for any quadrant, we subdivide that quadrant into
subquadrants, and so on. This particular splitting technique has a convenient representation in the form of so-called quadtrees, which are trees in which each node has exactly four descendants. The images corresponding to the nodes of a quadtree sometimes are called quadregions or quadimages.
Note that the root of the tree corresponds to the entire image and that each node corresponds to the subdivision of a node into four descendant nodes.

If only splitting is used, the final partition normally contains adjacent regions with identical properties. Satisfying the constraints of segmentation requires merging only adjacent regions whose combined pixels satisfy the predicate Q. That is, two adjacent regions Rj and Rk are merged only if Q(Rj ∪ Rk) = TRUE.
The procedure described above can be summarized as follows:
1. Split into four quadrants any region Ri for which Q(Ri) = FALSE.
2. When no further splitting is possible, merge any adjacent regions Rj and Rk for which Q(Rj ∪ Rk) = TRUE.
3. Stop when no further merging is possible.

It is customary to specify a minimum quadregion size beyond which no further splitting is carried out.
Numerous variations of the preceding basic theme are
possible. For example, a significant simplification results if in
Step 2 we allow merging for any two adjacent regions if each
one satisfies the predicate individually. This results in a much
simpler (and faster) algorithm, because testing the predicate is
limited to individual quadregions.

An example of a predicate:

$$Q = \begin{cases} \text{TRUE} & \text{if } \sigma > a \text{ AND } 0 < m < b \\ \text{FALSE} & \text{otherwise} \end{cases}$$

where m and σ are the mean and the standard deviation of the pixels in a quadregion, and a and b are constants.
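A sketch of the splitting phase driven by such a predicate (assumptions: square grayscale numpy image with power-of-two side, arbitrary constants a and b; the merging phase is omitted):

import numpy as np

def Q(block, a=10.0, b=200.0):
    """Predicate: TRUE if sigma > a and 0 < m < b (a, b chosen arbitrarily)."""
    m, sigma = block.mean(), block.std()
    return sigma > a and 0 < m < b

def split(img, r, c, size, min_size, regions):
    """Recursively split the square region (r, c, size) into quadrants
    while Q is FALSE and the region is larger than min_size."""
    block = img[r:r+size, c:c+size]
    if Q(block) or size <= min_size:
        regions.append((r, c, size))          # accept this quadregion
        return
    h = size // 2
    for dr, dc in [(0, 0), (0, h), (h, 0), (h, h)]:
        split(img, r + dr, c + dc, h, min_size, regions)

img = np.random.randint(0, 256, size=(64, 64))
regions = []
split(img, 0, 0, 64, min_size=8, regions=regions)
print(len(regions), "quadregions")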

Digital Image Processing

Course 12
Representation and Description


After segmentation, the resulting sets of pixels are
represented in a form suitable for further processing.
(1) Represent an image using the boundary (external
characteristics)
(2) Represent an image using the internal characteristics
(the pixels inside the region)
The next task is to describe the region based on the chosen
representation.

External representation is chosen when the primary focus is
on shape characteristics, internal representation is used when
the focus is on regional properties, such as color and texture.

Representation methods:
- boundary following
- chain codes
- polygonal approximations
- signatures
- skeletons

Boundary Following
We assume that the points in the boundary of a region are
ordered in a clockwise (or counterclockwise) direction. We
also assume that:
1. we are working with binary images in which objects are
labeled 1 and background 0;
2. the images are padded with a border of 0s to eliminate
the possibility of an object merging with the image
border.

Given a binary region R or its boundary, an algorithm for following the border of R consists of:
1. The starting point is b0, the uppermost, leftmost point in the image that is labeled 1. Let c0 be the west neighbor of b0, which is always a background point. Examine the 8-neighbors of b0, starting at c0 and proceeding in a clockwise direction. Let b1 denote the first neighbor whose value is 1, and let c1 be the background point immediately preceding b1. Store the locations of b0 and b1.

2. Let b = b1 and c = c1.
3. Let n1, n2, …, n8 be the 8-neighbors of b, starting at c in a clockwise direction. Find the first nk labeled 1.
4. Let b = nk and c = nk−1.
5. Repeat Steps 3 and 4 until b=b0 and the next boundary
point found is b1. The sequence of b points found when
the algorithm stops constitutes the set of ordered boundary
points.

This algorithm is referred to as the Moore boundary tracking algorithm.
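A minimal sketch of the Moore tracking procedure under the stated assumptions (binary numpy image padded with a 0 border); the stopping rule is simplified to "b equals b0 again":

import numpy as np

# 8-neighborhood offsets in clockwise order, starting from the west neighbor
OFFS = [(0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1)]

def moore_boundary(img):
    """Return the ordered boundary points of the 1-labeled object in a
    0-padded binary image, following the Moore tracking algorithm."""
    rows, cols = np.nonzero(img)
    b0 = (rows[0], cols[0])                 # uppermost, leftmost 1-pixel
    c = (b0[0], b0[1] - 1)                  # its west neighbor (background)
    boundary = [b0]
    b = b0
    while True:
        # index of c among the neighbors of b, then scan clockwise from it
        start = OFFS.index((c[0] - b[0], c[1] - b[1]))
        for k in range(1, 9):
            off = OFFS[(start + k) % 8]
            n = (b[0] + off[0], b[1] + off[1])
            if img[n] == 1:                 # first 1-valued neighbor
                prev_off = OFFS[(start + k - 1) % 8]
                c = (b[0] + prev_off[0], b[1] + prev_off[1])
                b = n
                break
        if b == b0 and len(boundary) > 1:   # back at the start: stop
            break
        boundary.append(b)
    return boundary

img = np.zeros((6, 6), dtype=int)
img[2:4, 2:4] = 1                           # a 2x2 square object
print(moore_boundary(img))                  # [(2,2), (2,3), (3,3), (3,2)]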

Chain Codes
Chain codes are used to represent a boundary by a connected
sequence of straight line segments of specified length and
direction.
The direction of each segment is coded by using a numbering
scheme such as the 4-directional or 8-directional numbering. A boundary code formed as a sequence of such directional numbers is referred to as a Freeman chain code.

Applying this method directly at the pixel level is generally unacceptable, for two reasons:
(a) The resulting chain of codes usually is quite long;
(b) Sensitive to noise: any small disturbances along the
boundary owing to noise or imperfect segmentation
cause changes in the code that may not necessarily be
related to the shape of the boundary.
A frequently used method to solve the problem is to resample
the boundary by selecting a larger grid spacing. A boundary
point is assigned to each node of the large grid, depending on the proximity of the original boundary to that node. The
accuracy of the resulting code representation depends on the
spacing of the sampling grid.

The chain code of a boundary depends on the starting point.


The problem is solved by normalization.

Normalization for starting point:


Treat the code as a circular sequence and redefine the starting point so that the resulting sequence of numbers forms an integer of minimum magnitude.
Normalization for rotation:
Use the first difference of the chain code instead of the code itself. The difference is obtained by counting, counterclockwise, the number of directions that separate two adjacent elements of the code.
Example: treating the code as circular, the first difference of the 4-direction chain code 10103322 is 33133030.
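A small sketch of both normalizations, assuming a 4-directional code given as a list of digits:

def first_difference(code, directions=4):
    """Circular first difference: counterclockwise steps between
    consecutive code elements (previous -> current)."""
    return [(code[i] - code[i - 1]) % directions for i in range(len(code))]

def normalize_start(code):
    """Rotate the code so the digit sequence forms the minimum integer."""
    rotations = [code[i:] + code[:i] for i in range(len(code))]
    return min(rotations)

code = [1, 0, 1, 0, 3, 3, 2, 2]
print(first_difference(code))   # [3, 3, 1, 3, 3, 0, 3, 0]
print(normalize_start(code))    # [0, 1, 0, 3, 3, 2, 2, 1]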

Polygonal Approximations
The objective is to capture the essence of the boundary shape with the fewest possible polygonal segments.
This problem in general is not trivial and can quickly turn into
a time-consuming iterative search.
Minimum-Perimeter Polygons
The approach for generating an MPP is to enclose the boundary by a set of concatenated cells. The boundary can be viewed as a rubber band constrained by the inner and outer walls of the region defined by the cells.

The size of the cells determines the accuracy of the polygonal approximation. The objective is to use the largest possible cell size acceptable in a given application, thus producing MPPs with the fewest number of vertices.

The boundary in the above figure consists of 4-connected straight line segments. Suppose we traverse this boundary in a counterclockwise direction. Every turn encountered in the traversal will be either a convex or a concave vertex, with the angle of a vertex being an interior angle of the 4-connected boundary. Convex and concave vertices are shown respectively as white and black dots in the above figure. Note that these vertices are the vertices of the inner wall of the light-gray bounding region in Fig. 11.7(b), and that every concave (black) vertex in the dark gray region has a corresponding
"mirror" vertex in the light gray wall, located diagonally
opposite the vertex. Figure 11.7(c) shows the mirrors of all
the concave vertices, with the MPP from Fig. 11.6(c)
superimposed for reference. We see that the vertices of the
MPP coincide either with convex vertices in the inner wall
(white dots) or with the mirrors of the concave vertices (black
dots) in the outer wall.

MPP algorithm
The set of cells enclosing a digital boundary is called a
cellular complex. We assume that the boundaries under
consideration are not self-intersecting, which leads to simply
connected cellular complexes. Based on these assumptions,
and letting white (W) and black (B) denote convex and
mirrored concave vertices, respectively, we state the
following observations:

1. The MPP bounded by a simply connected cellular complex is not self-intersecting.
2. Every convex vertex of the MPP is a W vertex, but not every W vertex of a boundary is a vertex of the MPP.
3. Every mirrored concave vertex of the MPP is a B
vertex, but not every B vertex of a boundary is a vertex
of the MPP.
4. All B vertices are on or outside the MPP, and all W
vertices are on or inside the MPP.

5. The uppermost, leftmost vertex in a sequence of vertices
contained in a cellular complex is always a W vertex of
the MPP.
Let a = (x1, y1), b = (x2, y2), c = (x3, y3) and

$$A = \begin{bmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{bmatrix}$$

$$\det A \;\begin{cases} > 0 & \text{if } (a,b,c) \text{ is a counterclockwise sequence} \\ = 0 & \text{if the points are collinear} \\ < 0 & \text{if } (a,b,c) \text{ is a clockwise sequence} \end{cases}$$

Denote sgn(a, b, c) = det(A). Geometrically, sgn(a, b, c) > 0 indicates that the point c lies on the positive side of the pair (a, b), i.e., c lies on the positive side of the line passing through points a and b.
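A one-function sketch of this orientation test:

import numpy as np

def sgn(a, b, c):
    """det of [[x1,y1,1],[x2,y2,1],[x3,y3,1]]: > 0 means (a, b, c) turns
    counterclockwise, i.e., c lies on the positive side of line (a, b)."""
    A = np.array([[a[0], a[1], 1.0],
                  [b[0], b[1], 1.0],
                  [c[0], c[1], 1.0]])
    return np.linalg.det(A)

print(sgn((0, 0), (1, 0), (0, 1)))   # 1.0 -> counterclockwise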
Suppose we have a list with the coordinates of each vertex
and the additional information whether the vertex is W or B.

It is important that the concave vertices be mirrored, that the vertices be in sequential order, and that the first vertex be the
uppermost leftmost vertex, which we know is a W vertex of
the MPP. Let V0 denote this vertex. We assume that the
vertices are arranged in the counterclockwise direction. The
algorithm for finding MPPs uses two "crawler" points: a
white crawler (W0) and a black (B0) crawler. W0 crawls along
convex (W) vertices, and B0 crawls along mirrored concave
(B) vertices.

The algorithm starts by setting W0 = B0 = V0. Then, at any step in the algorithm, let VL denote the last MPP vertex
found, and let Vk denote the current vertex being examined.
One of three conditions can exist between VL, Vk and the two
crawler points:
1. Vk lies to the positive side of the line through the pair of
points (VL, W0); that is sgn(VL, W0 , Vk ) > 0.
2. Vk lies to the negative side of the line through the pair
(VL, W0); that is, sgn(VL, W0, Vk) ≤ 0. At the same time Vk lies to the positive side of the line through (VL, B0) or is collinear with them; that is, sgn(VL, B0, Vk) ≥ 0.
3. Vk lies to the negative side of the line through (VL, B0);
that is sgn(VL, B0 , Vk )< 0.
If condition 1 holds, the next MPP vertex is W0 and VL = W0; we then set W0 = B0 = VL and continue with the next vertex after VL.
If condition 2 holds, Vk becomes a candidate MPP vertex. We set W0 = Vk if Vk is convex (i.e., labeled W); otherwise we set B0 = Vk. We then continue with the next vertex in the list.

If condition 3 holds, the next MPP vertex is B0 and VL = B0; we then set W0 = B0 = VL and continue with the next vertex after VL.
The algorithm terminates when it reaches the first vertex
again. The VL vertices found by the algorithm are the vertices
of the MPP.

Merging technique
The idea is to merge points along a boundary until the least-squares error of a line fit to the points merged so far exceeds a preset threshold. When this condition occurs, the parameters
of the line are stored, the error is set to 0, and the procedure is
repeated, merging new points along the boundary until the
error again exceeds the threshold. One of the main problems with this technique is that vertices do not correspond with corners in the boundary.

Splitting technique
One approach to boundary segment splitting is to subdivide a
segment successively into two parts until a specified criterion
is satisfied. For instance, a requirement might be that the
maximum perpendicular distance from a boundary segment to
the line joining its two end points not exceed a preset
threshold. If it does, the point having the greatest distance
from the line becomes a vertex, thus subdividing the initial
segment into two subsegments.

This approach has the advantage of seeking prominent
inflection points. For a closed boundary, the best starting
points usually are the two farthest points in the boundary.

Signatures
A signature is a 1-D functional representation of a boundary
and may be generated in various ways. One of the simplest is
to plot the distance from the centroid of the region to the
boundary as a function of angle.
The basic idea is to reduce the boundary representation to a
1-D function that presumably is easier to describe than the
original 2-D boundary. Signatures generated by the approach
just described are invariant to translation, but they do depend on rotation and scaling. Normalization with respect to rotation
can be achieved by finding a way to select the same starting
point to generate the signature, regardless of the shape's
orientation. One way to do so is to select the starting point as
the point farthest from the centroid, assuming that this point is
unique for each shape of interest.
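A minimal sketch (numpy assumed) of the distance-versus-angle signature for an ordered list of boundary points:

import numpy as np

def signature(boundary):
    """Return (angle, distance-from-centroid) pairs, sorted by angle,
    for an ordered list of boundary points given as (row, col) pairs."""
    pts = np.asarray(boundary, dtype=float)
    centroid = pts.mean(axis=0)
    d = pts - centroid
    r = np.hypot(d[:, 0], d[:, 1])          # distance to centroid
    theta = np.arctan2(d[:, 0], d[:, 1])    # angle of each boundary point
    order = np.argsort(theta)
    return theta[order], r[order]

# Square boundary: r(theta) oscillates between the half-side length
# (edge midpoints) and the half-diagonal (corners).
boundary = [(0, c) for c in range(5)] + [(r, 4) for r in range(1, 5)] \
         + [(4, c) for c in range(3, -1, -1)] + [(r, 0) for r in range(3, 0, -1)]
theta, r = signature(boundary)
print(r.min(), r.max())   # 2.0 and about 2.83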

Skeletons
The approach is to represent the structural shape of a plane image using graph theory. We first obtain the skeleton of the image via a thinning (skeletonizing) algorithm.
The skeleton of a region may be defined via the medial axis
transformation (MAT) proposed by Blum. Let R be a region
with border B. The MAT of a region is computed as follows:
for each point p in R, we find its closest neighbor in B. If p
has more than one such neighbor, then it belongs to the medial axis (skeleton) of R. The concept of "closest" (and the resulting MAT) depends on the definition of a distance.
The MAT of a region has an intuitive definition based on the
so-called "prairie fire concept." Consider an image region as a
prairie of uniform, dry grass, and suppose that a fire is lit
along its border. All fire fronts will advance into the region at
the same speed. The MAT of the region is the set of points
reached by more than one fire front at the same time.

Direct implementation of this definition is computationally expensive, since it potentially involves calculating the distance from every interior point to every point on the boundary of a region. Thinning algorithms for MAT computation iteratively delete boundary points of a

region subject to the constraints that deletion of these points
(1) does not remove end points, (2) does not break
connectivity, and (3) does not cause excessive erosion of the
region.
In the following we present an algorithm for thinning binary
regions. Region points are assumed to have value 1 and
background points to have value 0. The method consists of
successive passes of two basic steps applied to the border
points of the given region. A border point is any pixel with value 1 and having at least one neighbor valued 0. We
consider the 8-neighborhood pixels indexed p2, p3, …, p9, arranged clockwise around the center pixel p1, with p2 the north neighbor (so p4 is east, p6 is south, and p8 is west).

Step 1
A contour point p1 is flagged for deletion if the following conditions are satisfied:
a) 2 ≤ N(p1) ≤ 6
b) T(p1) = 1
c) p2 · p4 · p6 = 0
d) p4 · p6 · p8 = 0
where N(p1) = p2 + p3 + … + p8 + p9 (pi ∈ {0,1})

and T(p1) is the number of 0-1 transitions in the ordered sequence p2, p3, …, p8, p9, p2.
Step 2
Conditions a) and b) remain the same, and c) and d) are replaced by:
c') p2 · p4 · p8 = 0
d') p2 · p6 · p8 = 0
Step 1 is applied to every border pixel of the region. If one or
more of conditions a) - d) are violated, the value of the point

in question is not changed. If all conditions are fulfilled, the
point is flagged for deletion. However, the point is not deleted
until all border points have been processed. After Step 1 has
been applied to all border points, those that were flagged are
deleted (changed to 0). Then Step 2 is applied to the resulting
data in exactly the same manner as Step 1. Thus, one iteration
of the thinning algorithm consists of (1) applying Step 1 to
flag border points for deletion; (2) deleting the flagged points;
(3) applying Step 2 to flag the remaining border points for deletion; and (4) deleting the flagged points. This basic
procedure is applied iteratively until no further points are
deleted, at which time the algorithm terminates, yielding the
skeleton of the region.
Conditions c) and d) are satisfied simultaneously if:
(p4 = 0 or p6 = 0) or (p2 = 0 and p8 =0).
A point that satisfies all the conditions required for Step 1 is
an east or south boundary point or a northwest corner point in
the boundary. In either case, p1 is not part of the skeleton and should be removed. Similarly, conditions c') and d') are satisfied simultaneously if:
(p2 = 0 or p8 = 0) or (p4 = 0 and p6 =0).
These correspond to north or west points, or a southeast
corner point.
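A compact sketch of this two-step thinning procedure (essentially the Zhang-Suen algorithm), assuming a 0/1 numpy image whose outer border is background:

import numpy as np

def neighbors(img, r, c):
    """p2..p9, clockwise from the north neighbor of (r, c)."""
    return [img[r-1, c], img[r-1, c+1], img[r, c+1], img[r+1, c+1],
            img[r+1, c], img[r+1, c-1], img[r, c-1], img[r-1, c-1]]

def transitions(p):
    """T(p1): number of 0->1 transitions in p2, p3, ..., p9, p2."""
    seq = p + [p[0]]
    return sum(seq[i] == 0 and seq[i+1] == 1 for i in range(8))

def thin(img):
    img = img.copy()
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            flagged = []
            for r in range(1, img.shape[0] - 1):
                for c in range(1, img.shape[1] - 1):
                    if img[r, c] != 1:
                        continue
                    p = neighbors(img, r, c)        # p2..p9
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if not (2 <= sum(p) <= 6 and transitions(p) == 1):
                        continue
                    if step == 0 and p2*p4*p6 == 0 and p4*p6*p8 == 0:
                        flagged.append((r, c))      # conditions c), d)
                    if step == 1 and p2*p4*p8 == 0 and p2*p6*p8 == 0:
                        flagged.append((r, c))      # conditions c'), d')
            for r, c in flagged:                    # delete after a full pass
                img[r, c] = 0
            changed = changed or bool(flagged)
    return img

img = np.zeros((7, 9), dtype=int)
img[2:5, 1:8] = 1                                   # thick horizontal bar
print(thin(img))                                    # reduced to a thin line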

Boundary Descriptors
The length of a boundary is one of its simplest descriptors.
The number of pixels along a boundary gives a rough
approximation of its length.
The diameter of a boundary B is defined as:

$$\mathrm{Diam}(B) = \max\{D(p_i, p_j);\; p_i, p_j \in B\}$$

where D is a distance measure. The value of the diameter and the orientation of a line segment connecting the two extreme points that comprise the diameter (this line is called the major axis of the boundary) are useful descriptors of a boundary. The minor axis of a boundary is defined as the line perpendicular to the major axis, and of such length that a box passing through the outer four points of intersection of the boundary with the two axes completely encloses the boundary. The box just described is called the basic rectangle, and the ratio of the major axis to the minor axis is called the eccentricity of the boundary. This also is a useful descriptor. Curvature is defined as the rate of change of slope.

Shape numbers
Assume that the boundary is described by the first difference of the associated chain code. The shape number of such a boundary, based on the 4-directional code, is defined as the
first difference of smallest magnitude. The order n of a shape
number is defined as the number of digits in its
representation. Moreover, n is even for a closed boundary,
and its value limits the number of possible different shapes.

Although the first difference of a chain code is independent of rotation, in general the coded boundary depends on the
orientation of the grid. One way to normalize the grid
orientation is by aligning the chain-code grid with the sides of
the basic rectangle.
In practice, for a desired shape order, we find the rectangle of
order n whose eccentricity best approximates that of the basic
rectangle of the region and use this new rectangle to establish
the grid size.

Fourier descriptors
Assume we have a K-point digital boundary in the xy-plane:

$$(x_0, y_0), (x_1, y_1), \ldots, (x_{K-1}, y_{K-1})$$

These are the points of the boundary encountered in traversing the boundary, say, in the counterclockwise direction. In the complex plane we have:

$$s(k) = x(k) + i\,y(k), \quad k = 0, 1, \ldots, K-1$$

The discrete Fourier transform (DFT) of s(k) is:

$$a(u) = \sum_{k=0}^{K-1} s(k)\, e^{-i 2\pi u k / K}, \quad u = 0, 1, \ldots, K-1$$

The complex coefficients a(u) are called the Fourier descriptors of the boundary. The inverse Fourier transform of these coefficients restores s(k):

$$s(k) = \frac{1}{K} \sum_{u=0}^{K-1} a(u)\, e^{i 2\pi u k / K}, \quad k = 0, 1, \ldots, K-1$$

Suppose, however, that instead of all the Fourier coefficients, only the first P coefficients are used. This is equivalent to setting a(u) = 0 for u > P−1. The result is the following approximation to s(k):

$$\hat{s}(k) = \frac{1}{P} \sum_{u=0}^{P-1} a(u)\, e^{i 2\pi u k / P}, \quad k = 0, 1, \ldots, K-1$$

Although P terms are used to obtain each component of ŝ(k), k still ranges from 0 to K−1. That is, the same number of points exists in the approximate boundary, but not as many terms are used in the reconstruction of each point. The smaller P becomes, the more boundary detail is lost.
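A short sketch using numpy's FFT, which already uses the e^(-i2πuk/K) convention of the forward transform above. Note that np.fft.ifft realizes the zero-filled reconstruction with 1/K scaling, a common variant of the 1/P form given in the text:

import numpy as np

def fourier_descriptors(boundary):
    """a(u) for an ordered boundary given as (x, y) pairs."""
    pts = np.asarray(boundary, dtype=float)
    s = pts[:, 0] + 1j * pts[:, 1]           # s(k) = x(k) + i y(k)
    return np.fft.fft(s)                     # a(u), u = 0..K-1

def reconstruct(a, P):
    """Approximate boundary keeping only the first P descriptors."""
    K = len(a)
    a_trunc = np.zeros(K, dtype=complex)
    a_trunc[:P] = a[:P]                      # a(u) = 0 for u > P-1
    s_hat = np.fft.ifft(a_trunc)             # inverse DFT
    return np.column_stack([s_hat.real, s_hat.imag])

boundary = [(np.cos(t), np.sin(t))
            for t in np.linspace(0, 2*np.pi, 64, endpoint=False)]
a = fourier_descriptors(boundary)
approx = reconstruct(a, P=8)                 # coarse approximation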

Statistical moments
The shape of boundary segments (and of signature
waveforms) can be described quantitatively by using
statistical moments, such as the mean, variance, and higher
order moments.

We represent the segment of a boundary by a 1-D function g(r). This function is obtained by connecting the two end points of the segment and rotating the line segment until it is
horizontal. The coordinates of the points are rotated by the
same angle.
Let us treat the amplitude of g as a discrete random variable v
and form an amplitude histogram p(vi), i = 0, 1, 2, …, A−1, where A is the number of discrete amplitude increments in which we divide the amplitude scale. The nth moment of v about its mean is:

$$\mu_n(v) = \sum_{i=0}^{A-1} (v_i - m)^n p(v_i), \qquad m = \sum_{i=0}^{A-1} v_i\, p(v_i)$$
The quantity m is recognized as the mean or average value of v, and μ2 as its variance. Generally, only the first few moments are required to differentiate between signatures of clearly distinct shapes.

Regional descriptors
The area of a region is defined as the number of pixels in the
region. The perimeter of a region is the length of its
boundary. These two descriptors apply primarily to situations
in which the size of the regions of interest is invariant. A
more frequent use of these two descriptors is in measuring
compactness of a region:

$$\text{compactness} = \frac{(\text{perimeter})^2}{\text{area}} = \frac{P^2}{A}$$

Another descriptor of compactness is the circularity ratio:

$$\text{circularity ratio} = \frac{\text{area of the region}}{\text{area of the circle having the same perimeter}}$$

The area of a circle with perimeter length P is P²/(4π), so:

$$R_c = \frac{4\pi A}{P^2}$$
The value of this measure is 1 for a circular region and π/4 for a square. Compactness is a dimensionless measure and thus is insensitive to uniform scale changes; it is insensitive also to orientation, ignoring computational errors that may be introduced in resizing and rotating a digital region.
Other simple measures used as region descriptors include the
mean and median of the intensity levels, the minimum and
maximum intensity values, and the number of pixels with
values above and below the mean.
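A small sketch computing these regional descriptors for a binary region; the perimeter is approximated here simply as the number of boundary pixels, one of several possible conventions:

import numpy as np

def region_descriptors(mask):
    """Area, perimeter, compactness and circularity of a 0/1 region."""
    mask = mask.astype(bool)
    area = int(mask.sum())
    # Boundary pixels: region pixels with at least one 4-neighbor outside
    padded = np.pad(mask, 1)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    perimeter = int((mask & ~interior).sum())
    compactness = perimeter**2 / area
    circularity = 4 * np.pi * area / perimeter**2
    return area, perimeter, compactness, circularity

mask = np.zeros((20, 20), dtype=int)
mask[5:15, 5:15] = 1                   # a 10x10 square region
print(region_descriptors(mask))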

Topological Descriptors
Topology is the study of properties of a figure that are
unaffected by any deformation, as long as there is no tearing
or joining of the figure (sometimes these are called
rubber-sheet distortions).

For example, the above figure shows a region with two holes.
Thus if a topological descriptor is defined by the number of
holes (H) in the region, this property obviously will not be
affected by a stretching or rotation transformation. In general,
however, the number of holes will change if the region is torn
or folded. Note that, as stretching affects distance, topological
properties do not depend on the notion of distance or any
properties implicitly based on the concept of a distance
measure.

Another topological property useful for region description is
the number of connected components (C).
The number of holes H and connected components C in a
figure can be used to define the Euler number E:

E = C − H.
Regions represented by straight-line segments (referred to as
polygonal networks) have a particularly simple interpretation
in terms of the Euler number.

Figure 11.26 shows a polygonal network. Classifying interior
regions of such a network into faces and holes is often
important. Denoting the number of vertices by V, the number of edges by Q, and the number of faces by F gives the following relationship, called the Euler formula:

V − Q + F = C − H = E.

Suppose we want to segment the river from the image in Fig. 11.27(a). The image in Fig. 11.27(b) has 1591 connected
components (obtained using 8-connectivity) and its Euler
number is 1552, from which we deduce that the number of
holes is 39. Figure 11.27(c) shows the connected component
with the largest number of elements (8479). This is the
desired result, which we already know cannot be segmented
by itself from the image using a threshold.

Texture
An important approach to region description is to quantify its
texture content. Although no formal definition of texture
exists, this descriptor provides measures of properties such as
smoothness, coarseness and regularity. The three principal
approaches for describing the texture of a region are
statistical, structural, and spectral. Statistical approaches yield
characterizations of textures as smooth, coarse, grainy, and so on. Structural techniques deal with the arrangement of image primitives, such as the description of texture based on
regularly spaced parallel lines. Spectral techniques are based
on properties of the Fourier spectrum and are used primarily
to detect global periodicity in an image by identifying
high-energy, narrow peaks in the spectrum.

Statistical approaches
One of the simplest approaches for describing texture is to use statistical moments of the intensity histogram of an image or region. Let z be a random variable denoting intensity and let p(zi), i = 0, 1, 2, …, L−1, be the corresponding histogram, where L is the number of distinct intensity levels. The nth moment of z about the mean, where m is the mean value of z, is:

$$\mu_n(z) = \sum_{i=0}^{L-1} (z_i - m)^n p(z_i), \qquad m = \sum_{i=0}^{L-1} z_i\, p(z_i)$$

Note that μ0 = 1 and μ1 = 0. The second moment, the variance (σ²(z) = μ2(z)), is of particular importance in texture
description. It is a measure of intensity contrast that can be

used to establish descriptors of relative smoothness. For example, the measure:

$$R(z) = 1 - \frac{1}{1 + \sigma^2(z)}$$

is 0 for areas of constant intensity and approaches 1 for large values of σ²(z).
The third moment μ3(z) is a measure of the skewness of the histogram, while the fourth moment is a measure of its relative flatness. Some other useful texture measures are the uniformity and the average entropy:

$$U(z) = \sum_{i=0}^{L-1} p^2(z_i), \qquad e(z) = -\sum_{i=0}^{L-1} p(z_i)\log_2 p(z_i)$$
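A sketch computing these histogram-based texture statistics for a grayscale region (numpy assumed; σ² is used unnormalized here, though it is sometimes normalized by (L−1)² before computing R):

import numpy as np

def texture_stats(region, L=256):
    """Mean, variance, smoothness R, third moment, uniformity, entropy
    from the normalized intensity histogram of a region."""
    hist = np.bincount(region.ravel(), minlength=L).astype(float)
    p = hist / hist.sum()                       # p(z_i)
    z = np.arange(L, dtype=float)
    m = np.sum(z * p)                           # mean
    mu2 = np.sum((z - m) ** 2 * p)              # variance
    mu3 = np.sum((z - m) ** 3 * p)              # skewness measure
    R = 1 - 1 / (1 + mu2)                       # relative smoothness
    U = np.sum(p ** 2)                          # uniformity
    e = -np.sum(p[p > 0] * np.log2(p[p > 0]))   # entropy (0 log 0 := 0)
    return m, mu2, R, mu3, U, e

region = np.random.randint(0, 256, size=(32, 32))
print(texture_stats(region))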

Structural approach
Structural techniques deal with the arrangement of image
primitives. They use a set of predefined texture primitives and
a set of construction rules to define how a texture region is
constructed with the primitives and the rules.

Spectral approaches
Spectral techniques use the Fourier transform of the image
and its properties in order to detect global periodicity in an
image, by identifying high-energy, narrow peaks in the
spectrum.
The Fourier spectrum is ideally suited for describing the
directionality of periodic or almost periodic 2-D patterns in an
image.

Three features of the spectrum are suited for texture description:
(1) prominent peaks give the principal direction of the
patterns;
(2) the location of the peaks gives the fundamental spatial
period of the patterns;
(3) eliminating any periodic components via filtering
leaves nonperiodic image elements, which can be
described by statistical techniques.

We express the spectrum in polar coordinates to yield a function S(r, θ). For each direction θ, S(r, θ) may be considered a 1-D function Sθ(r). Similarly, for each frequency r, Sr(θ) is a 1-D function. Analyzing Sθ(r) for a fixed value of θ yields the behavior of the spectrum (such as the presence of peaks) along a radial direction from the origin, whereas analyzing Sr(θ) for a fixed value of r yields the behavior along a circle centered on the origin.

A more global description is obtained by using the following functions:

$$S(r) = \sum_{\theta=0}^{\pi} S_\theta(r), \qquad S(\theta) = \sum_{r=1}^{R_0} S_r(\theta)$$

where R0 is the radius of a circle centered at the origin. S(r) and S(θ) constitute a spectral-energy description of texture for an entire image or region under consideration.
Furthermore, descriptors of these functions themselves can be computed in order to characterize their behavior quantitatively. Descriptors typically used for this purpose are the location of the highest value, the mean and variance of
both the amplitude and axial variations, and the distance
between the mean and the highest value of the function.

Digital Image Processing

Course 13
Recognition of Image Patterns


Once an image is segmented, the next task is to recognize the
segmented objects or regions in the scene. Hence, the
objective in pattern recognition is to recognize objects in the
scene from a set of measurements of the objects.
Each object is a pattern and the measured values are the
features of the pattern. A set of similar objects possessing
more or less identical features are said to belong to a certain
pattern class.

Pattern recognition is an integral part of machine vision and image processing and finds applications ranging from biometric and biomedical image diagnostics to document classification, remote sensing, and many other fields.
There are many types of features and each feature has a
specific technique for measurement.
As an example, each letter in the English alphabet is
composed of a set of features like horizontal, vertical, slant
straight lines, as well as some curvilinear line segments.

While the letter A is described by two slant lines and one
horizontal line, letter B has a vertical line with two
curvilinear segments, joined in a specific structural format.
Some of the features of a two- or three-dimensional object
pattern are the area, volume, perimeter, surface, etc. which
can be measured by counting pixels. Similarly the shape of an
object may be characterized by its border. Some of the
attributes to characterize the shape of an object pattern are Fourier descriptors, invariant moments, the medial axis of the object, and so on.
The color of an object is an extremely important feature,
which can be described in various color spaces. Also various
types of textural attributes characterize the surface of an
object. The techniques to measure the features are known as
feature extraction techniques. Patterns may be described by a
set of features, all of which may not have enough
discriminatory power to discriminate one class of patterns from another. The selection and extraction of appropriate
features from patterns is the first major problem in pattern
recognition.

Decision Theoretic Pattern Classification


The classification of an unknown pattern is decided based on
some deterministic or statistical or even fuzzy set theoretic
principles. The block diagram of a decision theoretic pattern
classifier is shown in the below figure:

[Block diagram of a decision theoretic pattern classifier: a test pattern passes through feature extraction into the classifier, which produces the classified output; sample patterns pass through feature extraction into a learning stage that trains the classifier.]

The decision theoretic pattern recognition techniques are mainly of two types:

1. Classification methods based on supervised learning,
2. Classification methods using unsupervised techniques.
The supervised classification algorithms can further be
classified as
Parametric classifiers
Nonparametric classifiers
In parametric supervised classification, the classifier is
trained with a large set of labeled training pattern samples in
order to estimate the statistical parameters of each class of patterns, such as mean, variance, etc. By the term labeled
pattern samples, we mean the set of patterns whose class
memberships are known in advance. The input feature vectors
obtained during the training phase of the supervised
classification are assumed to be Gaussian in nature.
The minimum distance classifier and the maximum likelihood
classifier are some of the frequently used supervised
algorithms.

On the other hand, the parameters are not taken into
consideration in the nonparametric supervised classification
techniques. Some of the nonparametric techniques are
k-nearest neighbor, Parzen window technique, etc.
In the unsupervised case, the machine partitions the entire data
set based on some similarity criteria. This results in a set of
clusters, where each cluster of patterns belong to a specific
class.

Bayesian Decision Theory


Assume that there are N classes of patterns C1, C2, . . . , CN,
and an unknown pattern x in a d-dimensional feature space
x = [x1, x2, …, xd]. Hence the pattern is characterized by d
number of features. The problem of pattern classification is to
compute the probability of belongingness of the pattern x to
each class Ci, i = 1 , 2 , . . . , N . The pattern is classified to
the class Ck if probability of its belongingness to Ck is
maximum.

While classifying a pattern based on Bayesian decision
theory, we distinguish two kinds of probabilities: (1) apriori probability, and (2) aposteriori probability. The apriori
probability indicates the probability that the pattern should
belong to a class, say Ck, based on the prior belief or evidence
or knowledge. This probability is chosen even before making
any measurements, i.e., even before selection or extraction of
a feature. Sometimes this probability may be modeled using
Gaussian distribution, if the previous evidence suggests it. In cases where there exists no prior knowledge about the class
membership of the pattern, usually a uniform distribution is
used to model it. For example, in a four class problem, we
may choose the apriori probability as 0.25, assuming that the
pattern is equally likely to belong to any of the four classes.
The aposteriori probability P(Ci|x), on the other hand,
indicates the final probability of belongingness of the pattern
x to a class Ci. The aposteriori probability is computed based on the feature vector of the pattern, the class conditional probability density functions p(x|Ci) for each class Ci, and the apriori probability P(Ci) of each class Ci.
Bayesian decision theory states that the aposteriori probability of a pattern belonging to a pattern class Ck is given by:

$$P(C_k \mid x) = \frac{p(x \mid C_k)\, P(C_k)}{\sum_{i=1}^{N} p(x \mid C_i)\, P(C_i)}$$

For Gaussian classes:

$$p(x \mid C_i) = \frac{1}{(2\pi)^{d/2} (\det \Sigma_i)^{1/2}} \exp\left( -\frac{1}{2}(x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) \right)$$

where μi is the mean feature vector of the patterns in class Ci and Σi is the covariance matrix for class Ci. If the chosen features are statistically independent, the covariance matrix is a diagonal matrix, which simplifies computations.
The pattern x belongs to class Cp when:

$$P(C_p \mid x) = \max\{P(C_1 \mid x), P(C_2 \mid x), \ldots, P(C_N \mid x)\}$$
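A compact sketch of this Gaussian Bayes rule, with μi, Σi and P(Ci) estimated from labeled training data (numpy assumed):

import numpy as np

class GaussianBayes:
    """Bayes classifier with one multivariate Gaussian per class."""
    def fit(self, X, y):
        self.params = {}
        for c in np.unique(y):
            Xc = X[y == c]
            mu = Xc.mean(axis=0)
            Sigma = np.cov(Xc, rowvar=False)
            prior = len(Xc) / len(X)            # apriori probability P(Ci)
            self.params[c] = (mu, Sigma, prior)
        return self

    def predict(self, x):
        d = len(x)
        scores = {}
        for c, (mu, Sigma, prior) in self.params.items():
            diff = x - mu
            quad = diff @ np.linalg.inv(Sigma) @ diff
            dens = np.exp(-0.5 * quad) / np.sqrt((2*np.pi)**d * np.linalg.det(Sigma))
            scores[c] = dens * prior            # numerator of the Bayes rule
        return max(scores, key=scores.get)      # class with max posterior

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
y = np.array([0]*50 + [1]*50)
clf = GaussianBayes().fit(X, y)
print(clf.predict(np.array([3.5, 4.2])))        # expected: 1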

Minimum Distance Classification


Distance functions are used to measure the similarity or
dissimilarity between two classes of patterns. The smaller the
distance between two classes of patterns, the larger is the
similarity between them. The minimum distance classification
algorithm is computationally simple and commonly used.
The classifier finds the distances from a test input data vector to all the mean vectors representative of the target classes. The unknown pattern is assigned to the class to which its distance is smaller than its distances to all other classes.
Let us consider an N-class problem. If the class Ci contains a single prototype pattern μi (the mean vector) and the unknown pattern is x = [x1, x2, …, xd], then the pattern belongs to the class Ck if:

$$D_k = \min\{d(x, \mu_i);\; i = 1, 2, \ldots, N\}$$

where d is a distance function.

Minkowski Distance

$$d_p(y, z) = \left( \sum_{i=1}^{d} |y_i - z_i|^p \right)^{1/p}$$

p = 1: city block or Manhattan distance
p = 2: Euclidean distance

Mahalanobis Distance
If the parameters of the distribution of a specific pattern class are assumed to be Gaussian with mean feature vector μ and covariance matrix Σ, then the Mahalanobis distance

between the test pattern with feature vector x and that pattern class C is given by:

$$d(x, C) = (x - \mu)^T \Sigma^{-1} (x - \mu)$$

Bounded Distance
In many pattern classification problems, it may be useful to work with a bounded distance function, which lies in the range [0, 1). Any given distance function D(x,y) may be transformed into a bounded distance function d(x,y), where:

$$d(x, y) = \frac{D(x, y)}{1 + D(x, y)}$$
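A small sketch of a minimum distance classifier built on these distance functions; the class prototypes (means) are assumed given:

import numpy as np

def minkowski(y, z, p):
    """d_p(y, z); p=1 is city block, p=2 is Euclidean."""
    return np.sum(np.abs(y - z) ** p) ** (1.0 / p)

def bounded(D):
    """Map any distance D >= 0 into [0, 1)."""
    return D / (1.0 + D)

def min_distance_classify(x, means, p=2):
    """Assign x to the class whose prototype (mean) is nearest."""
    dists = [minkowski(x, mu, p) for mu in means]
    return int(np.argmin(dists))

means = [np.array([0.0, 0.0]), np.array([5.0, 5.0])]
x = np.array([4.0, 4.5])
print(min_distance_classify(x, means))   # 1: nearer to (5, 5)
print(bounded(minkowski(x, means[0], 1)))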

Nonparametric Classification
The nonparametric classification strategies are not dependent
on the estimation of parameters.

k-Nearest-Neighbor Classification
In many situations we may not have the complete statistical
knowledge about the underlying joint distribution of the observation or feature vector x and the true class C, to which
the pattern belongs. For an unknown test sample, the k-nearest-neighbor rule suggests that it should be assigned to the class to which the majority of its k nearest neighbors belong.
There are, however, certain problems in classifying an
unknown pattern using the nearest neighbor rule. If there are N sample patterns, then to ascertain the nearest
neighbor, we need to compute N distances from the test
pattern to each of the sample points. Also it is important to store all these N sample points. This increases the
computational as well as storage complexity of the k-nearest
neighbor problem. As the number of features increases, we
require more training data samples, and hence the storage and computational complexities increase as well.
To reduce these complexities various researchers have taken
different measures:

- Remove the redundant data from the data set, which will reduce the storage complexity.

- The training samples need to be sorted to achieve a better data structure for reducing the computational complexities.

- The distance measure used for computation should be simple.
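A minimal k-nearest-neighbor sketch (numpy; Euclidean distance and majority vote):

import numpy as np
from collections import Counter

def knn_classify(x, X_train, y_train, k=3):
    """Assign x to the majority class among its k nearest neighbors."""
    dists = np.linalg.norm(X_train - x, axis=1)   # N distances
    nearest = np.argsort(dists)[:k]               # indices of k nearest
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

X_train = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_classify(np.array([4.5, 5.2]), X_train, y_train, k=3))  # 1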

Linear Discriminant Analysis


An image can be described by a set of local features; these features can be extracted at each pixel of the image. Let fk(p) denote the k-th feature at pixel p. If each pixel in an image is associated with d features, we have a matrix

F = { f1, …, fd }

of dimension n × d, where n is the total number of pixels in the image. It may be noted here that this matrix contains a lot of local information of the entire image, much of which is

redundant. Discriminant analysis is employed to find
which variables discriminate between two classes and is
essentially analogous to the analysis of variance. In
discriminant analysis, we assume that the discriminant
function is linear, i.e.,

$$g(x) = w^T x + w_0 = 0$$

is a hyperplane, which partitions the feature space into two subspaces. In Fisher's linear discriminant approach, the

d-dimensional patterns x are projected onto a line, such that the projections of the data,

$$y = w^T x,$$
are well separated. The measure of this separation can be
chosen as
$$J(w) = \frac{(m_1 - m_2)^2}{S_1^2 + S_2^2}$$

where m1 and m2 are the projection means for classes C1 and C2, and S1² and S2² are the within-class variances of the projected data:

$$S_i^2 = \sum_{y \in C_i} (y - m_i)^2$$

This gives a measure of the scatter of the projected set of data points y. The objective function J(w) is maximized for the weight vector:

$$w = W^{-1}(m_1 - m_2), \qquad W = \Sigma_1 + \Sigma_2$$

where here m1 and m2 denote the class mean vectors in the original feature space and Σ1, Σ2 the class scatter matrices.

The Fisher linear discriminant function is widely used for
identifying the linear separating vector between pattern
classes. The procedure uses the maximization of the between-class scatter while minimizing the intra-class variances.
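A short sketch of Fisher's construction under the stated assumptions (two classes; within-class scatter taken as the sum of the class scatter matrices):

import numpy as np

def fisher_direction(X1, X2):
    """w = W^{-1}(m1 - m2), W the sum of class scatter matrices."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = np.cov(X1, rowvar=False) * (len(X1) - 1)   # scatter of class 1
    S2 = np.cov(X2, rowvar=False) * (len(X2) - 1)   # scatter of class 2
    W = S1 + S2
    return np.linalg.solve(W, m1 - m2)              # separating direction

rng = np.random.default_rng(1)
X1 = rng.normal([0, 0], 1.0, (100, 2))
X2 = rng.normal([3, 3], 1.0, (100, 2))
w = fisher_direction(X1, X2)
y1, y2 = X1 @ w, X2 @ w                             # projected data
print(y1.mean(), y2.mean())                         # well separated means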

Unsupervised Classification Strategies - Clustering


In a clustering problem, we have a set of patterns that have to be partitioned into a set of clusters such that the patterns within
a cluster are more similar to each other than the patterns from other clusters or partitions. Thus central to the goals of cluster
analysis lies the notion of similarity. There are a number of clustering methods, which can be divided into the following three classes:
1. Hierarchical methods
2. K-means methods
3. Graph theoretic methods
In hierarchical algorithms, the data set is partitioned into a
number of clusters in a hierarchical fashion. The hierarchical

clustering methods may again be subdivided into the following two categories:
1. Agglomerative clustering: we start with a set of singleton clusters, which are merged in each step, depending on some similarity criterion, until finally we get the appropriate set of clusters.
2. Divisive clustering: as the name suggests, the whole set of patterns initially is assumed to belong to a single cluster, which subsequently is divided into several partitions in each step.
The hierarchical clustering may be represented by dendrograms, tree structures which demonstrate the merging (fusion) or division of points in each step of hierarchical partitioning. Agglomerative clustering is the bottom-up clustering procedure, where each singleton pattern (the leaf nodes at the bottom of the dendrogram) merges with other patterns according to some similarity criterion. In the divisive algorithm,

on the other hand, starting with the root node S, we
recursively partition the set of patterns until singleton patterns
are reached at the bottom of the tree.

Single Linkage Clustering


The single linkage or nearest neighbor agglomerative clustering technique involves grouping of patterns based on a measure of intercluster distance (the distance between two clusters).

Assuming two clusters P1 and P2, each containing a finite number of patterns, in the single linkage method the distance between P1 and P2 is given by:

$$D_{\min}(P_1, P_2) = \min\{d(p_i^1, p_j^2);\; p_i^1 \in P_1,\; p_j^2 \in P_2\}$$

Complete Linkage Clustering

In complete linkage clustering, the distance between two clusters is defined as the distance between the most distant pair of patterns, one from each cluster. This method may thus be called the farthest-neighbor method. In the complete linkage method, the distance between P1 and P2 is given by:

$$D_{\max}(P_1, P_2) = \max\{d(p_i^1, p_j^2);\; p_i^1 \in P_1,\; p_j^2 \in P_2\}$$

Average Linkage Clustering


In average linkage clustering, the distance between two clusters is the average of the distances between all pairs of patterns, one from each cluster. In this method, the distance between P1 and P2 is given by:

$$D_{\mathrm{avg}}(P_1, P_2) = \mathrm{average}\{d(p_i^1, p_j^2);\; p_i^1 \in P_1,\; p_j^2 \in P_2\}$$

If there are ni patterns in cluster Pi, i = 1, 2, then:

$$D_{\mathrm{avg}}(P_1, P_2) = \frac{1}{n_1 n_2} \sum_{i,j} d(p_i^1, p_j^2)$$
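A sketch of the three intercluster distances, with clusters given as numpy arrays of points:

import numpy as np

def pairwise(P1, P2):
    """Matrix of Euclidean distances between all cross-cluster pairs."""
    return np.linalg.norm(P1[:, None, :] - P2[None, :, :], axis=-1)

def d_min(P1, P2):  return pairwise(P1, P2).min()    # single linkage
def d_max(P1, P2):  return pairwise(P1, P2).max()    # complete linkage
def d_avg(P1, P2):  return pairwise(P1, P2).mean()   # average linkage

P1 = np.array([[0.0, 0.0], [1.0, 0.0]])
P2 = np.array([[4.0, 0.0], [5.0, 0.0]])
print(d_min(P1, P2), d_max(P1, P2), d_avg(P1, P2))   # 3.0 5.0 4.0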

K-Means Clustering Algorithm


In K-means clustering approach, we partition the set of input
patterns S into a set of K partitions, where K is known in
advance. The method is based on the identification of the
centroids of each of the K clusters. Thus, instead of
computing the pairwise interpattern distances between all the
patterns in all the clusters, here the distances may be
computed only from the centroids. The method thus essentially reduces to searching for a best set of K centroids of the clusters, as follows:

Step 1: Select K initial cluster centers C1, C2, …, CK.
Step 2: Assign each pattern X ∈ S to the cluster Ci (1 ≤ i ≤ K) whose centroid is nearest to pattern X.
Step 3: Recompute the centroids of each cluster Cj (1 ≤ j ≤ K) in which there has been any addition or deletion of pattern points.
Step 4: Jump to Step 2, until convergence is achieved.
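A direct sketch of these four steps (numpy; the first K samples are taken as the initial centers, one of the initializations discussed below):

import numpy as np

def k_means(X, K, max_iter=100):
    centers = X[:K].copy()                       # Step 1: initial centers
    for _ in range(max_iter):
        # Step 2: assign each pattern to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Step 3: recompute centroids of clusters that changed
        new_centers = np.array([X[labels == j].mean(axis=0)
                                if np.any(labels == j) else centers[j]
                                for j in range(K)])
        if np.allclose(new_centers, centers):    # Step 4: convergence
            break
        centers = new_centers
    return centers, labels

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.5, (30, 2)), rng.normal(5, 0.5, (30, 2))])
centers, labels = k_means(X, K=2)
print(np.round(centers, 2))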

The major problem is the selection of initial cluster


configurations. It is possible either to select the first K
samples as the initial cluster centers or to randomly select K
samples from the pool of patterns as the cluster centers. A
rough partition in K clusters may, however, yield a better set
of initial cluster centers.

Syntactic Pattern Classification


It may be noted that there exists an inherent structure inside a
pattern and there is a positive interrelationship among the
primitive elements which form a pattern. The interrelationship
between pattern elements called primitives and the articulated
description of a pattern in terms of such relations provide a
basis of structural or linguistic approach to pattern
recognition.

In syntactic pattern recognition each pattern is characterized


by a string of primitives and the classification of a pattern in
this approach is based on analysis of the string with respect to
the grammar defining that pattern class.
The syntactic approach to pattern recognition involves a set of
processes:

1. Selection and extraction of a set of primitives (segmentation problem);
2. Analysis of the pattern description by identification of the interrelationship among the primitives;
3. Recognition of the allowable structures defining the interrelationship between the pattern primitives.

Primitive Selection Strategies


Segmentation of patterns poses the first major problem in
syntactic pattern recognition. A pattern may be described by a
string of subpatterns or primitives, which may easily be
identified. If each subpattern is complex in structure, each of
them may again be described by simpler subpatterns which
are easily identifiable.
Various approaches to primitive selection have been
suggested in the literature. One of the most frequently used schemes of boundary description is the chain code method
by Freeman. Under this approach, a rectangular grid is
overlaid on a two-dimensional pattern and straight line
segments are used to connect the adjacent grid points
covering the pattern.
Let us consider a sequence of n points {p1, p2, …, pn} which describes a closed curve. Here the point pi is a neighbor of pi−1 and pi+1 when 1 < i < n, and the curve closes: pn is the neighbor of pn−1 and p1, and p1 is the neighbor of p2 and pn. The

Freeman chain code contains the n vectors from pi−1 to pi, and each of these vectors is represented by an integer m = 0, 1, …, 7 according to its direction (the standard 8-directional numbering).

Each line segment is assigned an octal digit according to its


slope and the pattern is represented by a chain of octal digits.
This type of representation yields patterns composed of a
string of symbolic valued primitives.
This method may be used for coding any arbitrary two-dimensional figure composed of straight line or curved
segments and has been widely used in many shape
recognition applications. The major limitation of this procedure is that the patterns need adequate preprocessing
for ensuring proper representation.
Once a satisfactory solution to the primitive selection and
extraction problem is available, the next step is the
identification of structural interrelationship among the
extracted pattern primitives. A pattern may be described as
sets of strings or sentences belonging to specific pattern
classes. First order logic may be used for describing the
primitive interrelationship where a pattern is described by

Digital Image Processing


Course 13

certain predicates and objects occurring in the pattern may be


defined using the same predicates. When the patterns are
represented as strings of primitives, they may be considered as sentences of regular, context-free, or context-sensitive languages. Thus suitable grammars may be defined for
generating pattern languages by specifying a set of production
rules which generate the sentences in the said pattern
language. The corresponding computing machines known as

Digital Image Processing


Course 13

automata have the capability of recognizing whether a string


of primitives belongs to a specific pattern class.

High-Dimensional Pattern Grammars


The string representation of patterns is quite adequate for
structurally simpler forms of patterns. The classical string
grammars are, however, weak in handling noisy and
structurally complex pattern classes. This is because the only
relationship supported by string grammars is the concatenation relationship between the pattern primitives.

Here each primitive element is attached to only two other primitive elements: one to its right and the other to its left.
Such a simple structure thus may not be sufficient to
characterize more complex patterns, which may require better
connectivity relationship for their description. An appropriate
extension of string grammars has been suggested in the form
of high-dimensional grammars. These grammars are more
powerful as generators of language and are capable of generating complex patterns like chromosome patterns,
nuclear bubble chamber photographs, and so on.
In a string grammar each primitive symbol is attached with
only two other primitive elements, one to the right and the
other to the left of the element. A class of grammars was
suggested by Fedder, where a set of primitive elements may
be used with multiple connectivity structure. These grammars
are known as PLEX grammars. PLEX grammar involving
primitive structures called n-attaching point entities (NAPE) and a set of identifiers associated with each NAPE has been
used for pattern generation. The n-attaching point entities are
primitive elements on which there are n specified points where other attaching elements may be connected. Thus this class of grammars has
more generating capabilities compared to the string
grammars.

Syntactic Inference
A key problem in syntactic pattern recognition is inferring an appropriate grammar from a set of samples belonging to different pattern classes.
In syntactic pattern recognition, the problem of grammatical
inference is one of central importance. This approach is based
on the underlying assumption of the existence of at least one
grammar characterizing each pattern class. The identification
and extraction of the grammar characterizing each pattern class forms the core problem in the design of a syntactic
pattern classifier. The problem of grammatical inference
involves development of algorithms to derive grammars using
a set of sample patterns which are representatives of a pattern
class under study. This may thus be viewed as a learning
procedure using a finitely large and growing set of training
patterns. In syntactic pattern classification, the strings
belonging to a particular pattern class may be considered to
form sentences belonging to the language corresponding to the pattern class. A machine is said to recognize a pattern
class if for every string belonging to that pattern class, the
machine decides that it is a member of the language and for
any string not in the pattern class, it either rejects or loops
forever. A number of interesting techniques have been
suggested for the automated construction of automaton which
accepts the strings belonging to a particular pattern class.

Symbolic Projection Method


Here we will present a scene interpretation scheme based on a
work by Jungert. The structure is called symbolic projections.
The basic idea is to project the positions of all objects in a
scene or image along each coordinate axis and then generate a
string corresponding to each one of the axes. Each string
contains all the objects in their relative positions, that is, one
object is either equal to or less than any of the others.

Figure 1 shows how simple objects can be projected along the X- and Y-coordinate axes. The two operators used are "equal to" and "less than". The strings are called the U- and V-strings, where the U-string corresponds to the projections of the objects along the X-axis, and the V-string to the Y-axis. The symbolic projections are best suited for describing relative positions of objects, which is important in spatial reasoning about images.

One may use several spatial relational operators, such as equal, less than, greater than, etc., as follows:

- Equal (=): Two objects A and B are said to be equal in a spatial dimension, i.e., A = B, if and only if the centroid of A is the same as the centroid of B.

- Less than (<): Two objects A and B separated by a distance may be spatially related by A < B if and only if max(Ax) < min(Bx), where max(Ax) (or min(Bx)) indicates the maximum (or minimum) value of the projection of all the pixels in object A (or object B) along the X-direction. Similar relationships can be defined along the Y-axis.
- Greater than (>): Two objects A and B separated by a distance may be spatially related by A > B if and only if min(Ax) > max(Bx).
- Top and Bottom: Two objects A and B separated by a distance may be spatially related by "A on top of B" if and only if min(Ay) > max(By).
if min(Ay) > max(By).

In Figure 1 the object A is to the left of object B, A < B, and


object A is on top of object B.
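To make the construction concrete, here is a minimal sketch (assumed details, not Jungert's original algorithm) of deriving U- and V-strings from axis-aligned bounding boxes. For brevity the ordering compares object centroids, a simplification of the max/min definitions above; the object names and coordinates are hypothetical.

# Minimal sketch of symbolic projection: derive U- and V-strings from
# axis-aligned bounding boxes. Objects and coordinates are hypothetical.

objects = {
    "A": (0, 5, 2, 7),   # bounding box (xmin, ymin, xmax, ymax)
    "B": (4, 0, 6, 2),
    "C": (4, 0, 6, 2),   # C coincides with B along both axes
}

def projection_string(objs, lo, hi):
    """Order objects along one axis, joining them with '<' and '='."""
    centroid = lambda name: (objs[name][lo] + objs[name][hi]) / 2
    names = sorted(objs, key=centroid)
    out = names[0]
    for prev, cur in zip(names, names[1:]):
        # Equal centroids give '=', strictly increasing centroids give '<'.
        out += (" = " if centroid(prev) == centroid(cur) else " < ") + cur
    return out

u_string = projection_string(objects, 0, 2)  # projections along the X-axis
v_string = projection_string(objects, 1, 3)  # projections along the Y-axis
print("U:", u_string)  # A < B = C
print("V:", v_string)  # B = C < A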

Neural Networks
The approaches discussed until now are based on the use of sample patterns to estimate the statistical parameters of each pattern class (the mean vector of each class, the covariance matrix).
The patterns (of known class membership) used to estimate
these parameters usually are called training patterns, and a set of such patterns from each class is called a training set.


The process by which a training set is used to obtain decision
functions is called learning or training.
The training patterns of each class are used to compute the
parameters of the decision function corresponding to that
class. After the parameters in question have been estimated,
the structure of the classifier is fixed, and its eventual
performance will depend on how well the actual pattern
populations satisfy the underlying statistical assumptions made in the derivation of the classification method being used.
The statistical properties of the pattern classes in a problem
often are unknown or cannot be estimated. In practice, such
decision-theoretic problems are best handled by methods that
yield the required decision functions directly via training.
Then, making assumptions regarding the underlying probability density functions or other probabilistic information about the pattern classes under consideration is unnecessary.
Background
The idea of neural networks is the use of a multitude of
elemental nonlinear computing elements (called neurons)
organized as networks reminiscent of the way in which
neurons are believed to be interconnected in the brain. The
resulting models are referred to as neural networks.

We use these networks as vehicles for adaptively developing the coefficients of decision functions via successive presentations of training sets of patterns.

Perceptron for two pattern classes


In its most basic form, the perceptron learns a linear decision
function that dichotomizes two linearly separable training
sets. In the perceptron model for two pattern classes, the response of this basic device is based on a weighted sum of its inputs; that is,
d(x) = \sum_{i=1}^{n} w_i x_i + w_{n+1}

which is a linear decision function with respect to the components of the pattern vectors. The coefficients w_i, called
weights, modify the inputs before they are summed and fed
into the threshold element. In this sense, weights are
analogous to synapses in the human neural system. The function that maps the output of the summing junction into the final output of the device sometimes is called the
activation function.
When d(x) > 0, the threshold element causes the output of the perceptron to be +1, indicating that the pattern x was recognized as belonging to class C1; the reverse is true when d(x) < 0. When d(x) = 0, x lies on the decision surface separating the two pattern classes, giving an indeterminate condition.

The output of the threshold element is thus

O = \begin{cases} +1 & \text{if } \sum_{i=1}^{n} w_i x_i > -w_{n+1} \\ -1 & \text{if } \sum_{i=1}^{n} w_i x_i < -w_{n+1} \end{cases}

Using augmented vectors, the decision function can be written as

d(y) = \sum_{i=1}^{n+1} w_i y_i = y^T w,

where y = (y_1, y_2, \ldots, y_n, 1)^T is the augmented pattern vector and w = (w_1, w_2, \ldots, w_n, w_{n+1})^T is the weight vector.
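As a concrete illustration of the augmented formulation, the following minimal sketch computes d(y) and the thresholded output for one hypothetical pattern; the weights and the pattern are illustrative values.

import numpy as np

# Minimal sketch of the two-class perceptron response using the augmented
# pattern/weight vectors defined above. Values are hypothetical.

w = np.array([2.0, -1.0, 0.5])   # weight vector (w1, w2, w_{n+1}) for n = 2
x = np.array([1.5, 0.5])         # pattern vector
y = np.append(x, 1.0)            # augmented pattern vector (y1, y2, 1)

d = w @ y                        # d(y) = y^T w
if d > 0:
    print("O = +1: pattern assigned to class C1")
elif d < 0:
    print("O = -1: pattern assigned to class C2")
else:
    print("d(y) = 0: pattern lies on the decision surface")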

Training algorithms
Linearly separable classes: A simple, iterative algorithm for
obtaining a solution weight vector for two linearly separable
training sets follows. For two training sets of augmented
pattern vectors belonging to pattern classes C1 and C2,
respectively, let w(1) represent the initial weight vector, which
may be chosen arbitrarily. Then, at the kth iterative step:

w(k+1) = \begin{cases} w(k) + c\,y(k) & \text{if } y(k) \in C_1 \text{ and } w^T(k)\,y(k) \le 0 \\ w(k) - c\,y(k) & \text{if } y(k) \in C_2 \text{ and } w^T(k)\,y(k) \ge 0 \\ w(k) & \text{otherwise} \end{cases}

where c is a positive correction increment.


This algorithm makes a change in w only if the pattern being
considered at the kth step in the training sequence is
misclassified. The correction increment c is assumed to be
positive and, for now, to be constant. This algorithm sometimes is referred to as the fixed increment correction rule.
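A minimal sketch of this training loop follows; the two training sets and the increment c = 1 are illustrative assumptions, and training stops when a full pass over the patterns produces no correction.

import numpy as np

# Minimal sketch of the fixed increment correction rule for two linearly
# separable classes. Training data and c are illustrative assumptions.

C1 = [np.array([0.0, 0.0]), np.array([0.0, 1.0])]   # samples of class C1
C2 = [np.array([1.0, 0.0]), np.array([1.0, 1.0])]   # samples of class C2

# Augment each pattern with a trailing 1 and tag it with its class.
Y = [(np.append(x, 1.0), +1) for x in C1] + [(np.append(x, 1.0), -1) for x in C2]

w = np.zeros(3)   # w(1), chosen arbitrarily
c = 1.0           # positive correction increment

changed = True
while changed:                        # iterate until no pattern is misclassified
    changed = False
    for y, label in Y:
        d = w @ y
        if label == +1 and d <= 0:    # C1 pattern misclassified: add c*y
            w += c * y
            changed = True
        elif label == -1 and d >= 0:  # C2 pattern misclassified: subtract c*y
            w -= c * y
            changed = True

print("solution weight vector:", w)   # separates the two training sets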
Nonseparable classes: In practice, linearly separable pattern
classes are the (rare) exception, rather than the rule.
In the following we describe the original delta rule, also known as the Widrow-Hoff, or least-mean-square (LMS), delta rule for training perceptrons; the method minimizes the error between the actual and desired response at any training step. Consider the function

J(w) = \frac{1}{2}\,(r - w^T y)^2
where r is the desired response (r = +1 if y belongs to C1 and r = -1 if y belongs to C2). The task is to find the w which minimizes J(w). We have the following iterative method:

w(k+1) = w(k) + \alpha\,[r(k) - w^T(k)\,y(k)]\,y(k), \quad w(1) \text{ arbitrary},

where \alpha is a positive learning increment.
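A minimal sketch of the LMS rule under the same augmented-vector conventions; the data, the increment alpha = 0.1, and the fixed number of passes are illustrative choices.

import numpy as np

# Minimal sketch of the Widrow-Hoff (LMS) delta rule. Data, alpha, and
# the number of training passes are illustrative assumptions.

# Augmented patterns with desired responses r = +1 (C1) or r = -1 (C2).
Y = [(np.array([0.0, 0.0, 1.0]), +1.0),
     (np.array([0.0, 1.0, 1.0]), +1.0),
     (np.array([1.0, 0.0, 1.0]), -1.0),
     (np.array([1.0, 1.0, 1.0]), -1.0)]

w = np.zeros(3)   # w(1), chosen arbitrarily
alpha = 0.1       # positive learning increment

for _ in range(200):                # fixed number of training passes
    for y, r in Y:
        err = r - w @ y             # error between desired and actual response
        w += alpha * err * y        # delta-rule correction

print("trained weights:", w)        # approaches the minimizer of J(w)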

Multilayer Perceptron
The most popular neural network model is the multilayer
perceptron (MLP), which is an extension of the single layer
perceptron proposed by Rosenblatt. Multilayer perceptrons, in general, are feedforward networks, having distinct input, output, and hidden layers. The architecture of a multilayer perceptron with an error backpropagation network is shown in the figure below.

[Figure: architecture of a multilayer perceptron with error backpropagation]
In an M-class problem where the patterns are N-dimensional, the input layer consists of N neurons and the output layer
consists of M neurons. There can be one or more middle or
hidden layer(s). We will consider here a single hidden layer
case, which is extendable to any number of hidden layers. Let the hidden layer consist of p neurons. The output from each
neuron in the input layer is fed to all the neurons in the hidden
layer. No computations are performed at the input layer
neurons. The hidden layer neurons sum up their inputs, pass them through the sigmoid nonlinearity, and fan out multiple connections to the output layer neurons.
In feedforward activation, neurons of the first hidden layer compute their activation and output values and pass these on to the next layer as inputs to the neurons in the output layer, which produce the network's actual response to the input presented to the neurons at the input layer. Once the activation has proceeded forward from the input to the output neurons, the network's response is compared to the desired output: for each set of labeled pattern samples belonging to a specific class, there is a desired output. The
actual response of the neurons at the output layer will deviate
from the desired output, which may result in an error at the
output layer. The error at the output layer is used to compute
the error at the hidden layer immediately preceding the output
layer and the process continues.
In view of the above, the net input to the j-th hidden neuron
may be expressed as

I_j^h = \sum_{i=1}^{N} w_{ij}^h\,x_i + \theta_j^h.

The output of the j-th hidden layer neuron is

O_j = f(I_j^h) = \frac{1}{1 + \exp(-I_j^h)},

where x_1, \ldots, x_N is the input pattern vector, the weights w_{ij}^h represent the weights between the hidden layer and the input layer, and \theta_j^h is the bias term associated with each neuron in the hidden layer. Identical equations, with a change of subscripts, hold for the output layer. These calculations are known as the forward pass. In the output layer, the desired or
target output is set as Tk and the actual output obtained from
the network is Ok. The error (Tk - Ok) between the desired
signal and the actual output signal is propagated backward
during the backward pass. The equations governing the
backward pass are used to correct the weights. Thus the
network learns the desired mapping function by backpropagating the error, and hence the name error backpropagation. The generalized delta rule originates from minimizing the sum of squared error between the actual
network output and the desired output responses (T_k) over all the patterns. The average error E is a function of the weights, as shown:
E(w_{jk}) = \frac{1}{2} \sum_{k=1}^{M} (T_k - O_k)^2

w_{jk}^{(new)} = w_{jk}^{(old)} + \eta\,\delta_j\,O_j
where \eta is the learning rate of the hidden layer neurons.

\delta_j = O_j\,(1 - O_j)\,(T_j - O_j),

where T_j is the ideal response.
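To tie the forward and backward passes together, here is a minimal sketch of a single-hidden-layer network trained with error backpropagation on a small hypothetical two-class problem (XOR). The layer sizes, learning rate, iteration count, and the matrix form of the updates are illustrative assumptions rather than the exact scalar notation above.

import numpy as np

# Minimal sketch of a single-hidden-layer MLP trained with error
# backpropagation (generalized delta rule). Problem, layer sizes,
# learning rate, and iteration count are illustrative assumptions.

rng = np.random.default_rng(1)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # N = 2 inputs
T = np.array([[0], [1], [1], [0]], dtype=float)              # M = 1 output (XOR)

N, p, M = 2, 4, 1            # input, hidden, and output layer sizes
Wh = rng.normal(size=(N, p)) # weights between input and hidden layer
th = np.zeros(p)             # hidden-layer bias terms (theta^h)
Wo = rng.normal(size=(p, M)) # weights between hidden and output layer
to = np.zeros(M)             # output-layer bias terms
eta = 0.5                    # learning rate

def sigmoid(I):
    return 1.0 / (1.0 + np.exp(-I))

for _ in range(10000):
    # Forward pass: I_j^h = sum_i w_ij^h x_i + theta_j^h, O_j = f(I_j^h)
    Oh = sigmoid(X @ Wh + th)
    Ok = sigmoid(Oh @ Wo + to)

    # Backward pass: deltas use the sigmoid derivative O(1 - O)
    dk = Ok * (1 - Ok) * (T - Ok)          # output-layer delta
    dj = Oh * (1 - Oh) * (dk @ Wo.T)       # hidden-layer delta

    # Generalized delta rule updates
    Wo += eta * Oh.T @ dk
    to += eta * dk.sum(axis=0)
    Wh += eta * X.T @ dj
    th += eta * dj.sum(axis=0)

print(np.round(sigmoid(sigmoid(X @ Wh + th) @ Wo + to), 2))
# Expected to approach [[0], [1], [1], [0]] after training.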
