CHAPTER 1
INTRODUCTION
1.1 Overview
Image segmentation is an essential task in the fields of image processing and computer vision. It is the process of partitioning a digital image into a finite number of meaningful regions that are easier to analyze, and it is used to locate region boundaries. The simplest method for image segmentation is thresholding. Thresholding is an important technique in image segmentation, enhancement, and object detection. The output of the thresholding process is a binary image in which a gray level of 0 (black) indicates a pixel belonging to a print, legend, drawing, or target, and a gray level of 1 (white) indicates the background. The main complexity coupled with thresholding in document applications arises when the associated noise process is non-stationary.
The factors that make thresholding difficult are ambient illumination, variance of gray levels within the object and the background, insufficient contrast, and object shape and size non-commensurate with the scene. The lack of objective measures to assess the performance of thresholding algorithms is another handicap. Many methods have been reported in the literature; they extract the object from the background by grouping the intensity values according to a threshold value.
Thresholding divides the image into patches, and each patch is thresholded with a threshold value that depends on the patch contents. In order to decrease the effects of noise, common practice is to first smooth a boundary prior to partitioning. The binarization technique is aimed to be used as a primary phase in various manuscript analysis, processing, and retrieval tasks. So, the unique manuscript characteristics, such as textual properties, graphics, line drawings, and complex mixtures of layout semantics, should be included in the requirements.
On the other hand, the technique should be simple while taking all the document
analysis demands into consideration. The threshold evaluation techniques are adapted to
Department of E.C.E, MRITS

Figure 1.1: Five degraded document image examples. (a)–(d) are taken from the DIBCO series datasets and (e) is taken from the Bickley diary dataset.
Though document image binarization has been studied for many years, the
thresholding of degraded document images is still an unsolved problem due to the high
inter/intra-variation between the text stroke and the document background across
different document images.
As illustrated in Figure 1.1, the handwritten text within degraded documents often shows a certain amount of variation in terms of stroke width, stroke brightness, stroke connection, and document background. In addition, historical documents are often degraded by bleed-through, as illustrated in Figure 1.1(a) and (c), where the ink of the other side seeps through to the front. Historical documents are also often degraded by different types of imaging artifacts, as illustrated in Figure 1.1(e). These different types of document degradation tend to induce thresholding errors and make degraded document image binarization a big challenge to most state-of-the-art techniques.
The recent Document Image Binarization Contest (DIBCO) held under the
framework of the International Conference on Document Analysis and Recognition
Bernsen's contrast is defined as

C(i, j) = Imax(i, j) − Imin(i, j)    (1.1)

where Imax(i, j) and Imin(i, j) denote the maximum and minimum intensities within a local neighborhood window of (i, j), respectively. If the local contrast C(i, j) is smaller than a threshold, the pixel is set as background directly. Otherwise it is classified into text or background by comparing it with the mean of Imax(i, j) and Imin(i, j). Bernsen's method is simple but cannot work properly on degraded document images with a complex document background. We have earlier proposed a novel document image binarization method using the local image contrast, which is evaluated as follows:
C(i, j) = (Imax(i, j) − Imin(i, j)) / (Imax(i, j) + Imin(i, j) + ε)    (1.2)

where ε is a positive but infinitely small number that is added in case the local maximum is equal to 0. Compared with Bernsen's contrast in Equation 1.1, the local image contrast in Equation 1.2 introduces a normalization factor (the denominator) to compensate for the image variation within the document background. Take the text within shaded document areas, such as that in the sample document image in Figure 1.1(b), as an example. The small image contrast around the text stroke edges in Equation 1.1 (resulting from the shading) is compensated by the small normalization factor in Equation 1.2 (due to the dark document background).
CHAPTER 2
LITERATURE SURVEY
The main source of noise in digital images arises during image acquisition (digitization) or during image transmission. The performance of an image sensor is affected by a variety of factors, such as environmental conditions during image acquisition or the quality of the sensing elements themselves.
Image noise may be classified in different ways according to different criteria, including the cause of the noise, the shape of the noise amplitude distribution over time, the noise spectrum shape, and the relationship between noise and signal. For example, image noise can be divided into additive noise and multiplicative noise according to the relationship between noise and signal. Common types of image noise include additive noise, multiplicative noise, salt and pepper noise, and Gaussian noise.
Image noise is random (not present in the object imaged) variation of brightness
or color information in images, and is usually an aspect of electronic noise. It can be
produced by the sensor and circuitry of a scanner or digital camera. Image noise can also
originate in film grain and in the unavoidable shot noise of an ideal photon detector.
Image noise is an undesirable by-product of image capture that adds spurious and
extraneous information. The original meaning of "noise" was and remains "unwanted
signal"; unwanted electrical fluctuations in signals received by AM radios caused audible
acoustic noise ("static"). By analogy unwanted electrical fluctuations themselves came to
be known as "noise". Image noise is, of course, inaudible.
The magnitude of image noise can range from almost imperceptible specks on a digital photograph taken in good light to optical and radio astronomical images that are almost entirely noise, from which a small amount of information can be derived by sophisticated processing (a noise level that would be totally unacceptable in a photograph, since it would be impossible to determine even what the subject was).
The probability density function (PDF) of Gaussian noise is given by

p(z) = (1 / (σ √(2π))) e^(−(z − μ)² / (2σ²))    (2.1)

where z represents the gray level, μ is the mean (average) value of z, and σ is its standard deviation. Salt and pepper noise refers to a wide variety of processes that result in the same basic image degradation: only a few pixels are noisy, but they are very noisy. The PDF of salt and pepper noise is given by

p(z) = Pa for z = a,  p(z) = Pb for z = b,  p(z) = 0 otherwise    (2.2)
The Lee filter assumes that:
1. The signal and the noise are statistically independent of each other.
2. The sample mean and variance of a single pixel are equal to the mean and variance of the local area that is centered on that pixel.
The Lee filter converts the multiplicative model into an additive one, thereby reducing the problem of dealing with speckle noise to a known, tractable case. The speckle can also carry useful information, particularly when it is linked to the laser speckle and to the dynamic speckle phenomenon, where the changes of the speckle pattern over time can be a measurement of the surface's activity.
2.2.3 Salt and Pepper Noise
Salt and pepper noise is a form of noise typically seen in images. It presents itself as randomly occurring white and black pixels. Effective noise reduction methods for this type of noise include the median filter, the morphological filter, and the contra-harmonic mean filter. Salt and pepper noise creeps into images in situations where quick transients, such as faulty switching, take place. In a noisy image the white dots represent the salt and the black dots represent the pepper. In an 8-bit image, a salt pixel takes the maximum value 255 (all ones in binary), whereas a pepper pixel takes the minimum value 0 (all zeros in binary).
Figure 2.2: (a) Original image, (b) salt and pepper noise image.
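A minimal median-filter sketch (in Python for illustration; a 3×3 window applied to a tiny synthetic image) shows how both impulse types are removed:

```python
def median_filter(img, w=1):
    # 3x3 median filter; the window is clipped at the image borders.
    h, wd = len(img), len(img[0])
    out = [[0] * wd for _ in range(h)]
    for i in range(h):
        for j in range(wd):
            vals = sorted(img[r][c]
                          for r in range(max(0, i - w), min(h, i + w + 1))
                          for c in range(max(0, j - w), min(wd, j + w + 1)))
            out[i][j] = vals[len(vals) // 2]
    return out

# A flat gray image corrupted by one salt (255) and one pepper (0) pixel:
noisy = [[128, 128, 128, 128],
         [128, 255, 128, 128],
         [128, 128, 0, 128],
         [128, 128, 128, 128]]
clean = median_filter(noisy)  # both impulses are replaced by the median, 128
```

Because isolated impulses never reach the median of their neighborhood, they are removed without blurring the flat region, which is why the median filter is the standard choice for this noise type.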
2.2.4 Gaussian Noise
Gaussian noise is statistical noise that has a probability density function equal to that of the normal distribution, which is also known as the Gaussian distribution. In other words, the values that the noise can take on are Gaussian-distributed. A special case is white Gaussian noise, in which the values at any pair of times are identically distributed and statistically independent, and hence uncorrelated. In applications, Gaussian noise is most commonly used as additive white noise to yield additive white Gaussian noise.
The probability density function of a Gaussian random variable is given by

p(z) = (1 / (σ √(2π))) e^(−(z − μ)² / (2σ²))    (2.3)

where z is the gray level, μ is the mean, and σ is the standard deviation.
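As a quick check of Equation 2.3, the sketch below (Python for illustration; the mean and standard deviation are arbitrary sample values) evaluates the PDF and verifies that it sums to approximately 1 over the gray-level range:

```python
import math

def gaussian_pdf(z, mu, sigma):
    # Equation 2.3: p(z) = 1 / (sigma * sqrt(2*pi)) * exp(-(z - mu)^2 / (2*sigma^2))
    return math.exp(-((z - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

mu, sigma = 128.0, 20.0
peak = gaussian_pdf(mu, mu, sigma)                          # maximum at z = mu
area = sum(gaussian_pdf(z, mu, sigma) for z in range(256))  # unit-step Riemann sum
# area is close to 1, as expected for a probability density function.
```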
2.3 Thresholding
We categorize the thresholding methods into groups according to the information they exploit. These categories are:
1. Histogram shape-based methods, where, for example, the peaks, valleys and
curvatures of the smoothed histogram are analyzed
2. Clustering-based methods, where the gray-level samples are clustered in two parts
as background and foreground (object), or alternately are modeled as a mixture of
two Gaussians
3. Entropy-based methods result in algorithms that use the entropy of the foreground
and background regions, the cross-entropy between the original and binarized
image, etc.
4. Object attribute-based methods search for a measure of similarity between the gray-level and the binarized images, such as fuzzy shape similarity, edge coincidence, etc.
In the sequel, we use the following notation. The histogram and the probability mass function (PMF) of the image are indicated, respectively, by h(g) and by p(g), g = 0, …, G, where G is the maximum luminance value in the image, typically 255 if 8-bit quantization is assumed. If the gray value range is not explicitly indicated as [gmin, gmax], it will be assumed to extend from 0 to G. The cumulative probability function is defined as

P(g) = Σ_{i=0}^{g} p(i)    (2.4)
It is assumed that the PMF is estimated from the histogram of the image by
normalizing it to the total number of samples. In the context of document processing, the
foreground (object) becomes the set of pixels with luminance values less than T, while
the background pixels have luminance value above this threshold.
Pf(T) = Pf = Σ_{g=0}^{T} p(g),   Pb(T) = Pb = Σ_{g=T+1}^{G} p(g)    (2.5)
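Equations 2.4 and 2.5 can be sketched directly (Python for illustration; the 2×3 sample image is arbitrary):

```python
def class_probabilities(img, T, G=255):
    # Histogram h(g) and PMF p(g), then the class probabilities of Equation 2.5.
    pixels = [v for row in img for v in row]
    h = [0] * (G + 1)
    for v in pixels:
        h[v] += 1
    p = [c / len(pixels) for c in h]            # PMF: histogram normalized by N
    Pf = sum(p[g] for g in range(0, T + 1))      # foreground: luminance <= T
    Pb = sum(p[g] for g in range(T + 1, G + 1))  # background: luminance > T
    return Pf, Pb

img = [[10, 10, 200], [200, 200, 10]]
Pf, Pb = class_probabilities(img, T=100)  # Pf = Pb = 0.5 for this sample
```

By construction Pf + Pb = 1, which is why thresholding methods only need to optimize the placement of T.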
In our work, we have found that g = 1 yields good results. Variations on this theme are provided by Boukharouba, Rebordao, and Wendel, where the cumulative distribution of the image is first expanded in terms of Chebyshev functions, followed by curvature analysis.
Tsai obtains a smoothed histogram via Gaussians, and the resulting histogram is
investigated for the presence of both valleys and sharp curvature points. We point out that
the curvature analysis becomes effective when the histogram has lost its bimodality due
to the excessive overlapping of class histograms.
In a similar vein, Carlotto and Olivo consider the multi-scale analysis of the PMF and interpret its fingerprints, that is, the course of its zero crossings and extrema over the scales. Using a discrete dyadic wavelet transform, one obtains a multi-resolution analysis of the histogram, p_s(g) = p(g) * c_s(g), s = 1, 2, …, where p_0(g) = p(g) is the original normalized histogram. The threshold is defined as the valley (minimum) point following the first peak in the smoothed histogram. This threshold position is successively refined over the scales, starting from the coarsest resolution. Thus, starting with the valley point T(k) at the kth coarse level, the position is backtracked to the corresponding extremum in the higher-resolution histogram p^(k−1)(g).
TC(T) = Cb(T) + Cf(T) = Σ_{g=0}^{T} {log [p(g) / P(T)]}² + Σ_{g=T+1}^{G} {log [p(g) / (1 − P(T))]}²    (2.7)
2.4 Binarization
Many binarization techniques used in processing tasks are aimed at simplifying and unifying the image data at hand. The simplification is performed to benefit the subsequent processing characteristics, such as computational load, algorithm complexity, and real-time requirements in industrial-like environments.
One of the key reasons the binarization step fails to provide high-quality data to the subsequent processing is the different types and degrees of degradation introduced to the source image.
…in the object/background area ratio, intensity transition slope,…
T(x, y) = m(x, y) · [1 + k (s(x, y) / R − 1)]    (2.9)

where m(x, y) and s(x, y) are as in Niblack's formula, R is the dynamic range of the standard deviation, and the parameter k takes positive values.
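A sketch of Equation 2.9 on a single local window (Python for illustration; k = 0.5 and R = 128 are typical choices, not values prescribed by this text):

```python
import math

def sauvola_threshold(window, k=0.5, R=128.0):
    # Equation 2.9: T = m * (1 + k * (s / R - 1)), with m and s the local
    # mean and standard deviation (as in Niblack's formula).
    n = len(window)
    m = sum(window) / n
    s = math.sqrt(sum((v - m) ** 2 for v in window) / n)
    return m * (1 + k * (s / R - 1))

# In a flat window (s = 0) the threshold drops to m * (1 - k), suppressing
# background noise; with local contrast it moves back toward the mean m.
t_flat = sauvola_threshold([200] * 9)            # 200 * 0.5 = 100
t_text = sauvola_threshold([0] * 4 + [255] * 5)
```

This is the key difference from Niblack: in homogeneous background regions the threshold is pulled well below the mean, so noise is not binarized as text.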
∇f = [Gx, Gy]ᵀ = [∂f/∂x, ∂f/∂y]ᵀ    (2.10)
The magnitude of this vector is

∇f = mag(∇f) = [Gx² + Gy²]^(1/2)    (2.11)
This quantity gives the maximum rate of increase of f(x, y) per unit distance in the direction of ∇f. It is a common (although not strictly correct) practice to refer to ∇f as the gradient. The direction of the gradient vector is given by

α(x, y) = tan⁻¹(Gy / Gx)    (2.12)
where the angle is measured with respect to the x-axis. The direction of an edge at (x, y) is perpendicular to the direction of the gradient vector at that point. Computation of the gradient of an image is based on obtaining the partial derivatives ∂f/∂x and ∂f/∂y at every pixel location. Let the 3×3 area shown in the figure below represent the gray levels in a neighborhood of an image. One of the simplest ways to implement a first-order partial derivative at point z5 is to use the following Roberts cross-gradient operators:

Gx = z9 − z5
Gy = z8 − z6
2.10.1 Prewitt and Sobel Masks
These derivatives can be implemented for an entire image by using the masks shown in Figure 2.11. Masks of size 2 × 2 are awkward to implement because they do not have a clear center. An approach using masks of size 3 × 3 is given by
Figure 2.11: A 3 × 3 region of an image (the z's are gray-level values) and various masks used to compute the gradient at the point labeled z5.
Gx = (z7 + z8 + z9) − (z1 + z2 + z3),  Gy = (z3 + z6 + z9) − (z1 + z4 + z7)    (2.13)
The Sobel operators use a weight of 2 in the center coefficients:

Gx = (z7 + 2z8 + z9) − (z1 + 2z2 + z3),  Gy = (z3 + 2z6 + z9) − (z1 + 2z4 + z7)    (2.14)
Figure 2.12: Prewitt masks for detecting diagonal edges.
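The masks can be sketched for a single 3×3 neighborhood (Python for illustration; the Sobel variant of Equation 2.14 is used together with the magnitude of Equation 2.11):

```python
def sobel_gradient(z):
    # z is a 3x3 neighborhood; z[r][c] corresponds to z1..z9, row by row.
    # Sobel masks (Equation 2.14, with weight 2 in the center coefficients):
    gx = (z[2][0] + 2 * z[2][1] + z[2][2]) - (z[0][0] + 2 * z[0][1] + z[0][2])
    gy = (z[0][2] + 2 * z[1][2] + z[2][2]) - (z[0][0] + 2 * z[1][0] + z[2][0])
    mag = (gx ** 2 + gy ** 2) ** 0.5  # Equation 2.11
    return gx, gy, mag

# A vertical edge: dark left columns, bright right column.
patch = [[0, 0, 255],
         [0, 0, 255],
         [0, 0, 255]]
gx, gy, mag = sobel_gradient(patch)
# gx = 0 (no horizontal edge); gy = 1020, so the gradient points across
# the vertical edge, perpendicular to the edge direction.
```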
CHAPTER 3
ADAPTIVE IMAGE CONTRAST
This section describes the proposed document image binarization technique. Given a degraded document image, an adaptive contrast map is first constructed, and the text stroke edges are then detected through the combination of the binarized adaptive contrast map and the Canny edge map. The text is then segmented based on the local threshold that is estimated from the detected text stroke edge pixels. Some post-processing is further applied to improve the document binarization quality.
Ca(i, j) = α C(i, j) + (1 − α) (Imax(i, j) − Imin(i, j))    (3.1)
where C(i, j) denotes the local contrast in Equation 1.2 and (Imax(i, j) − Imin(i, j)) refers to the local image gradient that is normalized to [0, 1]. The local window size is set to 3 empirically. α is the weight between local contrast and local gradient that is controlled based on the document image statistical information.
Ideally, the image contrast will be assigned a high weight (i.e., large α) when the document image has significant intensity variation, so that the proposed binarization technique depends more on the local image contrast, which can capture the intensity variation well. α is computed as

α = (Std / 128)^γ    (3.2)
where Std denotes the document image intensity standard deviation and γ is a pre-defined parameter. The power function has a nice property in that it monotonically and smoothly increases from 0 to 1, and its shape can be easily controlled by different γ. γ can be selected from [0, ∞), where the power function becomes a linear function when γ = 1. Therefore, the local image gradient will play the major role in Equation 3.1 when γ is large, and the local image contrast will play the major role when γ is small.
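A per-pixel sketch of Equations 3.1 and 3.2 (Python for illustration; normalizing the gradient term by 255 and the sample intensities are assumptions of this sketch, not specified above):

```python
def adaptive_contrast(imax, imin, std, gamma=1.0, eps=1e-8):
    # Equation 3.2: the weight alpha grows with the image standard deviation.
    alpha = (std / 128.0) ** gamma
    # Equation 1.2: local contrast; Equation 3.1: combination with the local
    # image gradient (Imax - Imin), normalized here by 255 (an assumption).
    contrast = (imax - imin) / (imax + imin + eps)
    gradient = (imax - imin) / 255.0
    return alpha * contrast + (1 - alpha) * gradient

# With large intensity variation (large Std) the contrast term dominates;
# with small variation the gradient term dominates:
c_high = adaptive_contrast(200, 40, std=120)
c_low = adaptive_contrast(200, 40, std=20)
```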
R(x, y) = 1 if I(x, y) ≤ Emean + Estd / 2, and R(x, y) = 0 otherwise    (3.3)

where Emean and Estd are the mean and standard deviation of the intensity of the detected text stroke edge pixels within a neighborhood window W, respectively. The neighborhood window should be at least as large as the stroke width in order to contain stroke edge pixels. So the size of the neighborhood window W can be set based on the stroke width of the document image under study, EW, which can be estimated from the detected stroke edges as stated in Algorithm 1. Since we do not need a precise stroke width, we just calculate the most frequently occurring distance between two adjacent edge pixels (which denote the two side edges of a stroke) in the horizontal direction and use it as the estimated stroke width.
First, the edge image is scanned horizontally row by row, and the edge pixel candidates are selected as described in step 3. If the edge pixels (which are labeled 0 (background), while the pixels next to them are labeled 1 (edge) in the edge map Edg) are correctly detected, they should have higher intensities than the following few pixels (which should be the text stroke pixels).
So those improperly detected edge pixels are removed in step 4. Among the remaining edge pixels in the same row, two adjacent edge pixels are likely the two sides of a stroke, so adjacent edge pixels are matched into pairs and the distance between them is calculated in step 5.
Algorithm 1 Edge Width Estimation
Require: The Input Document Image I and Corresponding Binary Text Stroke Edge
Image Edg
Ensure: The Estimated Text Stroke Edge Width EW
1. Get the width and height of I
2. for Each Row i = 1 to height in Edg do
3. Scan from left to right to find edge pixels that meet the following criteria:
a) its label is 0 (background);
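A simplified sketch of Algorithm 1 (Python for illustration; the intensity-based filtering of edge candidates in steps 3–4 is omitted, so only the pairing and most-frequent-distance steps are shown):

```python
from collections import Counter

def estimate_stroke_width(edge_rows):
    # Scan each row; adjacent edge pixels are matched into pairs as the two
    # sides of a stroke, and the most frequent pair distance is returned.
    distances = []
    for row in edge_rows:
        cols = [j for j, v in enumerate(row) if v == 1]
        for a, b in zip(cols[::2], cols[1::2]):  # match edges into pairs
            distances.append(b - a)
    return Counter(distances).most_common(1)[0][0] if distances else 0

edges = [[0, 1, 0, 0, 1, 0, 0, 0],
         [0, 1, 0, 0, 1, 0, 0, 0],
         [1, 0, 0, 1, 0, 0, 0, 0]]
ew = estimate_stroke_width(edges)  # most frequent pair distance is 3
```

Using the mode rather than the mean keeps the estimate robust against occasional mismatched pairs, which suffices because a precise stroke width is not needed.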
3.4 Post-Processing
Once the initial binarization result is derived from Equation 3.3 as described in the previous subsections, the binarization result can be further improved by incorporating certain domain knowledge, as described in Algorithm 2. First, the isolated foreground pixels that do not connect with other foreground pixels are filtered out to make the edge pixel set more precise.
Second, the neighborhood pixel pair that lies on symmetric sides of a text stroke edge pixel should belong to different classes (i.e., either the document background or the foreground text). One pixel of the pixel pair is therefore relabeled to the other category if both pixels belong to the same class. Finally, some single-pixel artifacts along the text stroke boundaries are filtered out by using several logical operators as described in [4].
Algorithm 2 Post-Processing Procedure
Require: The Input Document Image I , Initial Binary Result B and Corresponding
Binary Text Stroke Edge Image Edg
1. Find all the connected components of the stroke edge pixels in Edg.
2. Remove those pixels that do not connect with other pixels.
3. for each remaining edge pixel (i, j) do
4. Get its neighborhood pairs: (i − 1, j) and (i + 1, j); (i, j − 1) and (i, j + 1)
5. if the pixels in the same pair belong to the same class (both text or background) then
6. Assign the pixel with lower intensity to the foreground class (text), and the other to the background class.
7. end if
8. end for
9. Remove single-pixel artifacts along the text stroke boundaries after the document thresholding.
10. Store the new binary result to Bf.
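Steps 3–7 of Algorithm 2 can be sketched as follows (Python for illustration; the single-row example, the class encoding 0 = text and 1 = background, and the handling of equal intensities are assumptions of this sketch):

```python
def relabel_pairs(img, binary, edge):
    # For each text stroke edge pixel, the two pixels on symmetric sides of
    # it should fall into different classes. If a pair agrees (both text or
    # both background), the darker pixel is reassigned to text (0) and the
    # brighter one to background (1).
    h, w = len(img), len(img[0])
    out = [row[:] for row in binary]
    for i in range(h):
        for j in range(w):
            if edge[i][j] != 1:
                continue
            pairs = [((i - 1, j), (i + 1, j)), ((i, j - 1), (i, j + 1))]
            for (r1, c1), (r2, c2) in pairs:
                if r1 < 0 or c1 < 0 or r2 >= h or c2 >= w:
                    continue  # pair falls outside the image
                if out[r1][c1] == out[r2][c2]:
                    if img[r1][c1] < img[r2][c2]:
                        out[r1][c1], out[r2][c2] = 0, 1
                    else:
                        out[r1][c1], out[r2][c2] = 1, 0
    return out

# One edge pixel between a dark (text) and a bright (background) neighbor,
# both initially labeled background:
img = [[50, 120, 200]]
binary = [[1, 1, 1]]
edge = [[0, 1, 0]]
result = relabel_pairs(img, binary, edge)  # the darker neighbor becomes text
```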
Recall = N_TP / (N_TP + N_FN)    (3.5)

Precision = N_TP / (N_TP + N_FP)    (3.6)
Recall in this context is also referred to as the true positive rate or sensitivity, and
precision is also referred to as positive predictive value (PPV).
3.5.2. F-Measure
In statistical analysis of binary classification, the F1 score (also F-score or F-measure) is a measure of a test's accuracy. It considers both the precision p and
the recall r of the test to compute the score: p is the number of correct positive results
divided by the number of all positive results, and r is the number of correct positive
results divided by the number of positive results that should have been returned. The
F1 score can be interpreted as a weighted average of the precision and recall, where an
F1 score reaches its best value at 1 and worst at 0.
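In code, the relation between precision, recall, and F1 is simply the harmonic mean (Python sketch; the counts are arbitrary):

```python
def f1_score(tp, fp, fn):
    # Precision p = TP / (TP + FP), recall r = TP / (TP + FN),
    # and F1 = 2 * p * r / (p + r), the harmonic mean of p and r.
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return 2 * p * r / (p + r)

f1 = f1_score(tp=80, fp=20, fn=20)  # p = r = 0.8, so F1 = 0.8
```

Because the harmonic mean is dominated by the smaller of the two values, a method cannot score well by maximizing only precision or only recall.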
F1 = 2 · p · r / (p + r)    (3.7)

3.5.3. p-Recall
Taking into account the skeletonized ground truth image, we are able to automatically measure the performance of any binarization algorithm in terms of recall. p-Recall is defined as the percentage of the skeletonized ground truth image SG that is detected in the resulting M×N binary image B. p-Recall is given by the following equation:
p-Recall = 100 · Σ_{x,y} (SG(x, y) · B(x, y)) / Σ_{x,y} SG(x, y)    (3.9)
3.5.4. PSNR

MSE = Σ_{x=1}^{M} Σ_{y=1}^{N} (I(x, y) − I′(x, y))² / (M N)    (3.11)

PSNR = 10 log₁₀ (C² / MSE)    (3.12)
PSNR is a measure of how close one image is to another. Therefore, the higher the value of PSNR, the higher the similarity of the two M×N images. We consider that the difference between foreground and background equals C.
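A sketch of the PSNR computation for two binary images (Python for illustration; C = 1 for binary images is an assumption consistent with the definition above):

```python
import math

def psnr(im1, im2, C=1.0):
    # Mean squared error over two MxN images, then PSNR = 10*log10(C^2 / MSE),
    # where C is the difference between foreground and background values.
    m, n = len(im1), len(im1[0])
    mse = sum((im1[i][j] - im2[i][j]) ** 2
              for i in range(m) for j in range(n)) / (m * n)
    return float("inf") if mse == 0 else 10.0 * math.log10(C ** 2 / mse)

gt   = [[0, 1, 1, 1], [0, 1, 1, 1]]
pred = [[0, 1, 1, 1], [1, 1, 1, 1]]  # one of eight pixels is wrong
value = psnr(gt, pred)  # MSE = 1/8, so PSNR = 10*log10(8) ≈ 9.03 dB
```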
3.5.5. NRM
The negative rate metric (NRM) is based on the pixel-wise mismatches between the ground truth (GT) and the prediction. It combines the false negative rate NR_FN and the false positive rate NR_FP:

NRM = (NR_FN + NR_FP) / 2    (3.13)

where

NR_FN = N_FN / (N_FN + N_TP)  and  NR_FP = N_FP / (N_FP + N_TN)    (3.14)
N_TP denotes the number of true positives, N_FP the number of false positives, N_TN the number of true negatives, and N_FN the number of false negatives.
In contrast to F-Measure and PSNR, the binarization quality is better for lower
NRM.
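The NRM computation reduces to a few ratios (Python sketch; the confusion-matrix counts are arbitrary):

```python
def nrm(n_tp, n_fp, n_fn, n_tn):
    # NRM = (NR_FN + NR_FP) / 2, with NR_FN = N_FN / (N_FN + N_TP)
    # and NR_FP = N_FP / (N_FP + N_TN); lower values are better.
    nr_fn = n_fn / (n_fn + n_tp)
    nr_fp = n_fp / (n_fp + n_tn)
    return (nr_fn + nr_fp) / 2

value = nrm(n_tp=90, n_fp=5, n_fn=10, n_tn=95)  # (0.1 + 0.05) / 2 = 0.075
```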
3.5.6. Misclassification penalty metric (MPM)
The misclassification penalty metric (MPM) evaluates the prediction against the ground truth (GT) on an object-by-object basis. Misclassified pixels are penalized by their distance from the ground truth object's border.
MPM = (MP_FN + MP_FP) / 2    (3.15)

where

MP_FN = Σ_i d_FN^i / D  and  MP_FP = Σ_j d_FP^j / D    (3.16)

Here d_FN^i and d_FP^j denote the distances of the ith false negative and the jth false positive pixel from the contour of the ground truth segmentation, and D is a normalization factor.
CHAPTER 4
RESULTS & DISCUSSIONS
Table 1: Evaluation Results of the Dataset of DIBCO 2009

Methods     F-Measure   PF-measure   PSNR      NRM      MPM      DRD
Adaptive    66.22528    0.89023       9.401    0.292    0.052    161.99
BERN        62.15014    0.47091       6.753    0.355    0.121    298.842
SAUV        66.95062    0.91148       9.431    0.287    0.054    161.21
Proposed    64.41689    1.63160      12.4043   0.2839   0.0185    80.7279
CHAPTER 5
CONCLUSION & FUTURE SCOPE
5.1. Conclusion
This work presents an adaptive image contrast based document image binarization technique that is tolerant to different types of document degradation, such as uneven illumination and document smear. The proposed technique is simple and robust; only a few parameters are involved. Moreover, it works for different kinds of degraded document images. The proposed technique makes use of the local image contrast that is evaluated based on the local maximum and minimum. The proposed method has been tested on various datasets. Experiments show that the proposed method outperforms most reported document binarization methods in terms of F-measure, pseudo F-measure, PSNR, NRM, MPM, and DRD.
REFERENCES
[1] B. Gatos, K. Ntirogiannis, and I. Pratikakis, "ICDAR 2009 document image binarization contest (DIBCO 2009)," in Proc. Int. Conf. Document Anal. Recognit., Jul. 2009, pp. 1375–1382.
[2] I. Pratikakis, B. Gatos, and K. Ntirogiannis, "ICDAR 2011 document image binarization contest (DIBCO 2011)," in Proc. Int. Conf. Document Anal. Recognit., Sep. 2011, pp. 1506–1510.
[3] I. Pratikakis, B. Gatos, and K. Ntirogiannis, "H-DIBCO 2010 handwritten document image binarization competition," in Proc. Int. Conf. Frontiers Handwrit. Recognit., Nov. 2010, pp. 727–732.
[4] S. Lu, B. Su, and C. L. Tan, "Document image binarization using background estimation and stroke edges," Int. J. Document Anal. Recognit., vol. 13, no. 4, pp. 303–314, Dec. 2010.
APPENDIX A
SOFTWARE REQUIREMENT
MATLAB
MATLAB is a high-performance language for technical computing. It integrates computation, visualization, and programming in an easy-to-use environment. MATLAB stands for matrix laboratory. It was originally written to provide easy access to matrix software developed by the LINPACK (linear system package) and EISPACK (eigensystem package) projects. MATLAB is therefore built on a foundation of sophisticated matrix software in which the basic element is a matrix that does not require pre-dimensioning.
Algorithm development
Data acquisition

Features of MATLAB
The MATLAB system consists of: the programming language; graphics (2-D graphics, 3-D graphics, color and lighting, animation); computation (linear algebra, signal processing, quadrature, etc.); external interfaces (interfaces with C and FORTRAN programs); toolboxes (signal processing, image processing, control systems, neural networks, communications, robust control, statistics); and the development environment.
Lay out the GUI: Using the Layout Editor, we can lay out a GUI easily by clicking and dragging GUI components -- such as panels, buttons, text fields, sliders, menus, and so on -- into the layout area.
Program the GUI: GUIDE automatically generates an M-file that controls how
the GUI operates. The M-file initializes the GUI and contains a framework for all the
GUI callbacks -- the commands that are executed when a user clicks a GUI component.
Using the M-file editor, we can add code to the callbacks to perform the functions.
GUIDE stores a GUI in two files, which are generated the first time when we save or run
the GUI:
A FIG-file, with extension .fig, which contains a complete description of the GUI
layout and the components of the GUI: push buttons, menus, axes, and so on.
An M-file, with extension .m, which contains the code that controls the GUI,
including the callbacks for its components.
These two files correspond to the tasks of laying out and programming the GUI. When we lay out the GUI in the Layout Editor, our work is stored in the FIG-file. When we program the GUI, our work is stored in the M-file.
The MATLAB Application Program Interface (API)