CHAPTER 1
INTRODUCTION
1.1 Overview
Image segmentation is an essential task in the fields of image processing and computer vision. It is the process of partitioning a digital image into a finite number of meaningful regions that are easier to analyze, and it is used to locate region boundaries. The simplest method for image segmentation is thresholding. Thresholding is an important technique in image segmentation, enhancement, and object detection. The output of the thresholding process is a binary image in which a gray level of 0 (black) indicates a pixel belonging to a print, legend, drawing, or target, and a gray level of 1 (white) indicates the background. The main complexity coupled with thresholding in document applications arises when the associated noise process is non-stationary.
The factors that make thresholding difficult are ambient illumination, variance of gray levels within the object and the background, insufficient contrast, and object shape and size non-commensurate with the scene. The lack of objective measures to assess the performance of thresholding algorithms is another handicap. Many methods have been reported in the literature; they extract the object from the background by grouping the intensity values according to a threshold value.
Thresholding divides the image into patches, and each patch is thresholded with a threshold value that depends on the patch contents. In order to decrease the effects of noise, common practice is to first smooth a boundary prior to partitioning. The binarization technique is aimed to be used as a primary phase in various manuscript analysis, processing, and retrieval tasks. So, the unique manuscript characteristics, such as textual properties, graphics, line drawings, and complex mixtures of layout semantics, should be included in the requirements.
On the other hand, the technique should be simple while taking all the document
analysis demands into consideration. The threshold evaluation techniques are adapted to
Department of E.C.E, MRITS

Figure 1.1: Five degraded document image examples. (a)–(d) are taken from the DIBCO series datasets and (e) is taken from the Bickley diary dataset.
Though document image binarization has been studied for many years, the
thresholding of degraded document images is still an unsolved problem due to the high
inter/intra-variation between the text stroke and the document background across
different document images.
As illustrated in Figure 1.1, the handwritten text within degraded documents often shows a certain amount of variation in terms of stroke width, stroke brightness, stroke connection, and document background. In addition, historical documents are often degraded by bleed-through, as illustrated in Figure 1.1(a) and (c), where the ink of the other side seeps through to the front. Historical documents are also often degraded by different types of imaging artifacts, as illustrated in Figure 1.1(e). These different types of document degradation tend to induce thresholding errors and make degraded document image binarization a big challenge to most state-of-the-art techniques.
The recent Document Image Binarization Contest (DIBCO) held under the
framework of the International Conference on Document Analysis and Recognition
Bernsen's contrast is defined as

C(i, j) = Imax(i, j) − Imin(i, j)    (1.1)

where Imax(i, j) and Imin(i, j) denote the maximum and minimum intensities within a local neighborhood window of (i, j), respectively. If the local contrast C(i, j) is smaller than a threshold, the pixel is set as background directly. Otherwise it is classified into text or background by comparing it with the mean of Imax(i, j) and Imin(i, j). Bernsen's method is simple but cannot work properly on degraded document images with a complex document background. We have earlier proposed a novel document image binarization method using the local image contrast, which is evaluated as follows:
C(i, j) = (Imax(i, j) − Imin(i, j)) / (Imax(i, j) + Imin(i, j) + ε)    (1.2)

where ε is a positive but infinitely small number that is added in case the local maximum is equal to 0. Compared with Bernsen's contrast in Equation 1.1, the local image contrast in Equation 1.2 introduces a normalization factor (the denominator) to compensate for the image variation within the document background. Take the text within shaded document areas, such as that in the sample document image in Figure 1.1(b), as an example. The small image contrast around the text stroke edges in Equation 1.1 (resulting from the shading) is compensated by the small normalization factor in Equation 1.2 (due to the dark document background).
CHAPTER 2
LITERATURE SURVEY
The main source of noise in digital images arises during image acquisition (digitization) or during image transmission. The performance of an image sensor is affected by a variety of factors, such as environmental conditions during image acquisition or the quality of the sensing elements themselves.
Image noise may be classified in different ways according to different criteria, including the cause of the noise, the shape of the noise amplitude distribution over time, the noise spectrum shape, and the relationship between noise and signal. For example, image noise can be divided into additive noise and multiplicative noise according to the relationship between noise and signal. Common types of image noise include additive noise, multiplicative noise, salt and pepper noise, and Gaussian noise.
Image noise is random (not present in the object imaged) variation of brightness
or color information in images, and is usually an aspect of electronic noise. It can be
produced by the sensor and circuitry of a scanner or digital camera. Image noise can also
originate in film grain and in the unavoidable shot noise of an ideal photon detector.
Image noise is an undesirable by-product of image capture that adds spurious and
extraneous information. The original meaning of "noise" was and remains "unwanted
signal"; unwanted electrical fluctuations in signals received by AM radios caused audible
acoustic noise ("static"). By analogy unwanted electrical fluctuations themselves came to
be known as "noise". Image noise is, of course, inaudible.
The magnitude of image noise can range from almost imperceptible specks on a digital photograph taken in good light to optical and radio astronomical images that are almost entirely noise, from which a small amount of information can be derived by sophisticated processing (a noise level that would be totally unacceptable in a photograph, since it would be impossible to determine even what the subject was).
The probability density function (PDF) of Gaussian noise is given by

p(z) = (1 / (σ √(2π))) e^(−(z − μ)² / (2σ²))    (2.1)

where z represents the gray level, μ is the mean (average) value of z, and σ is its standard deviation. Salt and pepper noise refers to a wide variety of processes that result in the same basic image degradation: only a few pixels are noisy, but they are very noisy. The PDF of salt and pepper noise is given by

p(z) = Pa for z = a,  p(z) = Pb for z = b,  p(z) = 0 otherwise    (2.2)
The Lee filter assumes that:
1. The signal and the noise are statistically independent of each other.
2. The sample mean and variance of a single pixel are equal to the mean and variance of the local area that is centered on that pixel.
The Lee filter converts the multiplicative model into an additive one, thereby reducing the problem of dealing with speckle noise to a known, tractable case. The speckle can also carry useful information, particularly when it is linked to the laser speckle and to the dynamic speckle phenomenon, where the changes of the speckle pattern over time can be a measurement of the surface's activity.
2.2.3 Salt and Pepper Noise
Salt and pepper noise is a form of noise typically seen in images. It presents itself as randomly occurring white and black pixels. Effective noise reduction methods for this type of noise include the median filter, the morphological filter, and the contra-harmonic mean filter. Salt and pepper noise creeps into images in situations where quick transients, such as faulty switching, take place. In a noisy image the white dots represent the salt and the black dots represent the pepper. In an 8-bit image, a salt pixel takes the maximum value 255 (all ones in binary), whereas a pepper pixel takes the minimum value 0 (all zeros in binary).
Figure 2.2: (a) Original image, (b) salt and pepper noise image.
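A minimal median-filter sketch (in Python for illustration; a 3×3 window applied to a tiny synthetic image) shows how both impulse types are removed:

```python
def median_filter(img, w=1):
    # 3x3 median filter; the window is clipped at the image borders.
    h, wd = len(img), len(img[0])
    out = [[0] * wd for _ in range(h)]
    for i in range(h):
        for j in range(wd):
            vals = sorted(img[r][c]
                          for r in range(max(0, i - w), min(h, i + w + 1))
                          for c in range(max(0, j - w), min(wd, j + w + 1)))
            out[i][j] = vals[len(vals) // 2]
    return out

# A flat gray image corrupted by one salt (255) and one pepper (0) pixel:
noisy = [[128, 128, 128, 128],
         [128, 255, 128, 128],
         [128, 128, 0, 128],
         [128, 128, 128, 128]]
clean = median_filter(noisy)  # both impulses are replaced by the median, 128
```

Because isolated impulses never reach the median of their neighborhood, they are removed without blurring the flat region, which is why the median filter is the standard choice for this noise type.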
2.2.4 Gaussian Noise
Gaussian noise is statistical noise that has a probability density function equal to that of the normal distribution, which is also known as the Gaussian distribution. In other words, the values that the noise can take on are Gaussian-distributed. A special case is white Gaussian noise, in which the values at any pair of times are identically distributed and statistically independent, and hence uncorrelated. In applications, Gaussian noise is most commonly used as additive white noise to yield additive white Gaussian noise.
The probability density function of a Gaussian random variable is given by

p(z) = (1 / (σ √(2π))) e^(−(z − μ)² / (2σ²))    (2.3)

where z is the gray level, μ is the mean, and σ is the standard deviation.
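As a quick check of Equation 2.3, the sketch below (Python for illustration; the mean and standard deviation are arbitrary sample values) evaluates the PDF and verifies that it sums to approximately 1 over the gray-level range:

```python
import math

def gaussian_pdf(z, mu, sigma):
    # Equation 2.3: p(z) = 1 / (sigma * sqrt(2*pi)) * exp(-(z - mu)^2 / (2*sigma^2))
    return math.exp(-((z - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

mu, sigma = 128.0, 20.0
peak = gaussian_pdf(mu, mu, sigma)                          # maximum at z = mu
area = sum(gaussian_pdf(z, mu, sigma) for z in range(256))  # unit-step Riemann sum
# area is close to 1, as expected for a probability density function.
```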
2.3 Thresholding
We categorize the thresholding methods into groups according to the information they exploit. These categories are:
1. Histogram shape-based methods, where, for example, the peaks, valleys and
curvatures of the smoothed histogram are analyzed
2. Clustering-based methods, where the gray-level samples are clustered in two parts
as background and foreground (object), or alternately are modeled as a mixture of
two Gaussians
3. Entropy-based methods result in algorithms that use the entropy of the foreground
and background regions, the cross-entropy between the original and binarized
image, etc.
4. Object attribute-based methods search for a measure of similarity between the gray-level and the binarized images, such as fuzzy shape similarity, edge coincidence, etc.
In the sequel, we use the following notation. The histogram and the probability mass function (PMF) of the image are indicated, respectively, by h(g) and by p(g), g = 0, …, G, where G is the maximum luminance value in the image, typically 255 if 8-bit quantization is assumed. If the gray value range is not explicitly indicated as [gmin, gmax], it will be assumed to extend from 0 to G. The cumulative probability function is defined as

P(g) = Σ_{i=0}^{g} p(i)    (2.4)
It is assumed that the PMF is estimated from the histogram of the image by
normalizing it to the total number of samples. In the context of document processing, the
foreground (object) becomes the set of pixels with luminance values less than T, while
the background pixels have luminance value above this threshold.
Pf(T) = Pf = Σ_{g=0}^{T} p(g),   Pb(T) = Pb = Σ_{g=T+1}^{G} p(g)    (2.5)
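Equations 2.4 and 2.5 can be sketched directly (Python for illustration; the 2×3 sample image is arbitrary):

```python
def class_probabilities(img, T, G=255):
    # Histogram h(g) and PMF p(g), then the class probabilities of Equation 2.5.
    pixels = [v for row in img for v in row]
    h = [0] * (G + 1)
    for v in pixels:
        h[v] += 1
    p = [c / len(pixels) for c in h]            # PMF: histogram normalized by N
    Pf = sum(p[g] for g in range(0, T + 1))      # foreground: luminance <= T
    Pb = sum(p[g] for g in range(T + 1, G + 1))  # background: luminance > T
    return Pf, Pb

img = [[10, 10, 200], [200, 200, 10]]
Pf, Pb = class_probabilities(img, T=100)  # Pf = Pb = 0.5 for this sample
```

By construction Pf + Pb = 1, which is why thresholding methods only need to optimize the placement of T.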
In our work, we have found that g = 1 yields good results. Variations on this theme are provided by Boukharouba, Rebordao, and Wendel, where the cumulative distribution of the image is first expanded in terms of Chebyshev functions, followed by curvature analysis.
Tsai obtains a smoothed histogram via Gaussians, and the resulting histogram is
investigated for the presence of both valleys and sharp curvature points. We point out that
the curvature analysis becomes effective when the histogram has lost its bimodality due
to the excessive overlapping of class histograms.
In a similar vein, Carlotto and Olivo consider the multi-scale analysis of the PMF and interpret its fingerprints, that is, the course of its zero crossings and extrema over the scales. Using a discrete dyadic wavelet transform, one obtains a multi-resolution analysis of the histogram, p_s(g) = p(g) * c_s(g), s = 1, 2, …, where p_0(g) = p(g) is the original normalized histogram. The threshold is defined as the valley (minimum) point following the first peak in the smoothed histogram. This threshold position is successively refined over the scales, starting from the coarsest resolution. Thus, starting with the valley point T(k) at the kth coarse level, the position is backtracked to the corresponding extremum in the higher-resolution histogram p^(k−1)(g).
TC(T) = Cb(T) + Cf(T) = Σ_{g=0}^{T} {log [p(g) / P(T)]}² + Σ_{g=T+1}^{G} {log [p(g) / (1 − P(T))]}²    (2.7)
2.4 Binarization
Many binarization techniques used in processing tasks are aimed at simplifying and unifying the image data at hand. The simplification is performed to benefit the subsequent processing characteristics, such as computational load, algorithm complexity, and real-time requirements in industrial-like environments.
One of the key reasons the binarization step fails to provide high-quality data to the subsequent processing is the different types and degrees of degradation introduced to the source image.
…in the object/background area ratio, intensity transition slope,…
T(x, y) = m(x, y) · [1 + k (s(x, y) / R − 1)]    (2.9)

where m(x, y) and s(x, y) are as in Niblack's formula, R is the dynamic range of the standard deviation, and the parameter k takes positive values.
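A sketch of Equation 2.9 on a single local window (Python for illustration; k = 0.5 and R = 128 are typical choices, not values prescribed by this text):

```python
import math

def sauvola_threshold(window, k=0.5, R=128.0):
    # Equation 2.9: T = m * (1 + k * (s / R - 1)), with m and s the local
    # mean and standard deviation (as in Niblack's formula).
    n = len(window)
    m = sum(window) / n
    s = math.sqrt(sum((v - m) ** 2 for v in window) / n)
    return m * (1 + k * (s / R - 1))

# In a flat window (s = 0) the threshold drops to m * (1 - k), suppressing
# background noise; with local contrast it moves back toward the mean m.
t_flat = sauvola_threshold([200] * 9)            # 200 * 0.5 = 100
t_text = sauvola_threshold([0] * 4 + [255] * 5)
```

This is the key difference from Niblack: in homogeneous background regions the threshold is pulled well below the mean, so noise is not binarized as text.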
∇f = [Gx, Gy]ᵀ = [∂f/∂x, ∂f/∂y]ᵀ    (2.10)
The magnitude of this vector is

∇f = mag(∇f) = [Gx² + Gy²]^(1/2)    (2.11)
This quantity gives the maximum rate of increase of f(x, y) per unit distance in the direction of ∇f. It is a common (although not strictly correct) practice to refer to ∇f as the gradient. The direction of the gradient vector is given by

α(x, y) = tan⁻¹(Gy / Gx)    (2.12)
where the angle is measured with respect to the x-axis. The direction of an edge at (x, y) is perpendicular to the direction of the gradient vector at that point. Computation of the gradient of an image is based on obtaining the partial derivatives ∂f/∂x and ∂f/∂y at every pixel location. Let the 3×3 area shown in the figure below represent the gray levels in a neighborhood of an image. One of the simplest ways to implement a first-order partial derivative at point z5 is to use the following Roberts cross-gradient operators:

Gx = z9 − z5
Gy = z8 − z6
2.10.1 Prewitt and Sobel Masks
These derivatives can be implemented for an entire image by using the masks shown in Figure 2.11. Masks of size 2 × 2 are awkward to implement because they do not have a clear center. An approach using masks of size 3 × 3 is given by
Figure 2.11: A 3 × 3 region of an image (the z's are gray-level values) and various masks used to compute the gradient at the point labeled z5.
Gx = (z7 + z8 + z9) − (z1 + z2 + z3),  Gy = (z3 + z6 + z9) − (z1 + z4 + z7)    (2.13)
The Sobel operators use a weight of 2 in the center coefficients:

Gx = (z7 + 2z8 + z9) − (z1 + 2z2 + z3),  Gy = (z3 + 2z6 + z9) − (z1 + 2z4 + z7)    (2.14)
Figure 2.12: Prewitt masks for detecting diagonal edges.
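The masks can be sketched for a single 3×3 neighborhood (Python for illustration; the Sobel variant of Equation 2.14 is used together with the magnitude of Equation 2.11):

```python
def sobel_gradient(z):
    # z is a 3x3 neighborhood; z[r][c] corresponds to z1..z9, row by row.
    # Sobel masks (Equation 2.14, with weight 2 in the center coefficients):
    gx = (z[2][0] + 2 * z[2][1] + z[2][2]) - (z[0][0] + 2 * z[0][1] + z[0][2])
    gy = (z[0][2] + 2 * z[1][2] + z[2][2]) - (z[0][0] + 2 * z[1][0] + z[2][0])
    mag = (gx ** 2 + gy ** 2) ** 0.5  # Equation 2.11
    return gx, gy, mag

# A vertical edge: dark left columns, bright right column.
patch = [[0, 0, 255],
         [0, 0, 255],
         [0, 0, 255]]
gx, gy, mag = sobel_gradient(patch)
# gx = 0 (no horizontal edge); gy = 1020, so the gradient points across
# the vertical edge, perpendicular to the edge direction.
```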
CHAPTER 3
ADAPTIVE IMAGE CONTRAST
This section describes the proposed document image binarization technique. Given a degraded document image, an adaptive contrast map is first constructed, and the text stroke edges are then detected through the combination of the binarized adaptive contrast map and the Canny edge map. The text is then segmented based on the local threshold that is estimated from the detected text stroke edge pixels. Some post-processing is further applied to improve the document binarization quality.
Ca(i, j) = α C(i, j) + (1 − α) (Imax(i, j) − Imin(i, j))    (3.1)
where C(i, j) denotes the local contrast in Equation 1.2 and (Imax(i, j) − Imin(i, j)) refers to the local image gradient that is normalized to [0, 1]. The local window size is set to 3 empirically. α is the weight between local contrast and local gradient that is controlled based on the document image statistical information.
Ideally, the image contrast will be assigned a high weight (i.e., large α) when the document image has significant intensity variation, so that the proposed binarization technique depends more on the local image contrast, which can capture the intensity variation well. α is computed as

α = (Std / 128)^γ    (3.2)
where Std denotes the document image intensity standard deviation and γ is a pre-defined parameter. The power function has a nice property in that it monotonically and smoothly increases from 0 to 1, and its shape can be easily controlled by different γ. γ can be selected from [0, ∞), where the power function becomes a linear function when γ = 1. Therefore, the local image gradient will play the major role in Equation 3.1 when γ is large, and the local image contrast will play the major role when γ is small.
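A per-pixel sketch of Equations 3.1 and 3.2 (Python for illustration; normalizing the gradient term by 255 and the sample intensities are assumptions of this sketch, not specified above):

```python
def adaptive_contrast(imax, imin, std, gamma=1.0, eps=1e-8):
    # Equation 3.2: the weight alpha grows with the image standard deviation.
    alpha = (std / 128.0) ** gamma
    # Equation 1.2: local contrast; Equation 3.1: combination with the local
    # image gradient (Imax - Imin), normalized here by 255 (an assumption).
    contrast = (imax - imin) / (imax + imin + eps)
    gradient = (imax - imin) / 255.0
    return alpha * contrast + (1 - alpha) * gradient

# With large intensity variation (large Std) the contrast term dominates;
# with small variation the gradient term dominates:
c_high = adaptive_contrast(200, 40, std=120)
c_low = adaptive_contrast(200, 40, std=20)
```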
R(x, y) = 1 if I(x, y) ≤ Emean + Estd / 2, and R(x, y) = 0 otherwise    (3.3)

where Emean and Estd are the mean and standard deviation of the intensity of the detected text stroke edge pixels within a neighborhood window W, respectively. The neighborhood window should be at least as large as the stroke width in order to contain stroke edge pixels. So the size of the neighborhood window W can be set based on the stroke width of the document image under study, EW, which can be estimated from the detected stroke edges as stated in Algorithm 1. Since we do not need a precise stroke width, we just calculate the most frequently occurring distance between two adjacent edge pixels (which denote the two side edges of a stroke) in the horizontal direction and use it as the estimated stroke width.
First, the edge image is scanned horizontally row by row, and the edge pixel candidates are selected as described in step 3. If the edge pixels (which are labeled 0 (background), while the pixels next to them are labeled 1 (edge) in the edge map Edg) are correctly detected, they should have higher intensities than the following few pixels (which should be the text stroke pixels).
So those improperly detected edge pixels are removed in step 4. Among the remaining edge pixels in the same row, two adjacent edge pixels are likely the two sides of a stroke, so adjacent edge pixels are matched into pairs and the distance between them is calculated in step 5.
Algorithm 1 Edge Width Estimation
Require: The Input Document Image I and Corresponding Binary Text Stroke Edge
Image Edg
Ensure: The Estimated Text Stroke Edge Width EW
1. Get the width and height of I
2. for Each Row i = 1 to height in Edg do
3. Scan from left to right to find edge pixels that meet the following criteria:
a) its label is 0 (background);
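A simplified sketch of Algorithm 1 (Python for illustration; the intensity-based filtering of edge candidates in steps 3–4 is omitted, so only the pairing and most-frequent-distance steps are shown):

```python
from collections import Counter

def estimate_stroke_width(edge_rows):
    # Scan each row; adjacent edge pixels are matched into pairs as the two
    # sides of a stroke, and the most frequent pair distance is returned.
    distances = []
    for row in edge_rows:
        cols = [j for j, v in enumerate(row) if v == 1]
        for a, b in zip(cols[::2], cols[1::2]):  # match edges into pairs
            distances.append(b - a)
    return Counter(distances).most_common(1)[0][0] if distances else 0

edges = [[0, 1, 0, 0, 1, 0, 0, 0],
         [0, 1, 0, 0, 1, 0, 0, 0],
         [1, 0, 0, 1, 0, 0, 0, 0]]
ew = estimate_stroke_width(edges)  # most frequent pair distance is 3
```

Using the mode rather than the mean keeps the estimate robust against occasional mismatched pairs, which suffices because a precise stroke width is not needed.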
3.4 Post-Processing
Once the initial binarization result is derived from Equation 3.3 as described in the previous subsections, the binarization result can be further improved by incorporating certain domain knowledge, as described in Algorithm 2. First, the isolated foreground pixels that do not connect with other foreground pixels are filtered out to make the edge pixel set more precise.
Second, the neighborhood pixel pair that lies on symmetric sides of a text stroke edge pixel should belong to different classes (i.e., either the document background or the foreground text). One pixel of the pixel pair is therefore relabeled to the other category if both pixels belong to the same class. Finally, some single-pixel artifacts along the text stroke boundaries are filtered out by using several logical operators as described in [4].
Algorithm 2 Post-Processing Procedure
Require: The Input Document Image I , Initial Binary Result B and Corresponding
Binary Text Stroke Edge Image Edg
1. Find all the connected components of the stroke edge pixels in Edg.
2. Remove those pixels that do not connect with other pixels.
3. for each remaining edge pixel (i, j) do
4. Get its neighborhood pairs: (i − 1, j) and (i + 1, j); (i, j − 1) and (i, j + 1)
5. if the pixels in the same pair belong to the same class (both text or background) then
6. Assign the pixel with lower intensity to the foreground class (text), and the other to the background class.
7. end if
8. end for
9. Remove single-pixel artifacts along the text stroke boundaries after the document thresholding.
10. Store the new binary result to Bf.
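Steps 3–7 of Algorithm 2 can be sketched as follows (Python for illustration; the single-row example, the class encoding 0 = text and 1 = background, and the handling of equal intensities are assumptions of this sketch):

```python
def relabel_pairs(img, binary, edge):
    # For each text stroke edge pixel, the two pixels on symmetric sides of
    # it should fall into different classes. If a pair agrees (both text or
    # both background), the darker pixel is reassigned to text (0) and the
    # brighter one to background (1).
    h, w = len(img), len(img[0])
    out = [row[:] for row in binary]
    for i in range(h):
        for j in range(w):
            if edge[i][j] != 1:
                continue
            pairs = [((i - 1, j), (i + 1, j)), ((i, j - 1), (i, j + 1))]
            for (r1, c1), (r2, c2) in pairs:
                if r1 < 0 or c1 < 0 or r2 >= h or c2 >= w:
                    continue  # pair falls outside the image
                if out[r1][c1] == out[r2][c2]:
                    if img[r1][c1] < img[r2][c2]:
                        out[r1][c1], out[r2][c2] = 0, 1
                    else:
                        out[r1][c1], out[r2][c2] = 1, 0
    return out

# One edge pixel between a dark (text) and a bright (background) neighbor,
# both initially labeled background:
img = [[50, 120, 200]]
binary = [[1, 1, 1]]
edge = [[0, 1, 0]]
result = relabel_pairs(img, binary, edge)  # the darker neighbor becomes text
```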
Recall = N_TP / (N_TP + N_FN)    (3.5)

Precision = N_TP / (N_TP + N_FP)    (3.6)
Recall in this context is also referred to as the true positive rate or sensitivity, and
precision is also referred to as positive predictive value (PPV).
3.5.2. F-Measure
In statistical analysis of binary classification, the F1 score (also F-score or F-measure) is a measure of a test's accuracy. It considers both the precision p and
the recall r of the test to compute the score: p is the number of correct positive results
divided by the number of all positive results, and r is the number of correct positive
results divided by the number of positive results that should have been returned. The
F1 score can be interpreted as a weighted average of the precision and recall, where an
F1 score reaches its best value at 1 and worst at 0.
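In code, the relation between precision, recall, and F1 is simply the harmonic mean (Python sketch; the counts are arbitrary):

```python
def f1_score(tp, fp, fn):
    # Precision p = TP / (TP + FP), recall r = TP / (TP + FN),
    # and F1 = 2 * p * r / (p + r), the harmonic mean of p and r.
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return 2 * p * r / (p + r)

f1 = f1_score(tp=80, fp=20, fn=20)  # p = r = 0.8, so F1 = 0.8
```

Because the harmonic mean is dominated by the smaller of the two values, a method cannot score well by maximizing only precision or only recall.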
F1 = 2 · p · r / (p + r)    (3.7)

3.5.3. p-Recall
Taking into account the skeletonized ground truth image, we are able to automatically measure the performance of any binarization algorithm in terms of recall. p-Recall is defined as the percentage of the skeletonized ground truth image SG that is detected in the resulting M×N binary image B. p-Recall is given by the following equation:
p-Recall = 100 · Σ_{x,y} (SG(x, y) · B(x, y)) / Σ_{x,y} SG(x, y)    (3.9)
3.5.4. PSNR

MSE = Σ_{x=1}^{M} Σ_{y=1}^{N} (I(x, y) − I′(x, y))² / (M N)    (3.11)

PSNR = 10 log₁₀ (C² / MSE)    (3.12)
PSNR is a measure of how close one image is to another. Therefore, the higher the value of PSNR, the higher the similarity of the two M×N images. We consider that the difference between foreground and background equals C.
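A sketch of the PSNR computation for two binary images (Python for illustration; C = 1 for binary images is an assumption consistent with the definition above):

```python
import math

def psnr(im1, im2, C=1.0):
    # Mean squared error over two MxN images, then PSNR = 10*log10(C^2 / MSE),
    # where C is the difference between foreground and background values.
    m, n = len(im1), len(im1[0])
    mse = sum((im1[i][j] - im2[i][j]) ** 2
              for i in range(m) for j in range(n)) / (m * n)
    return float("inf") if mse == 0 else 10.0 * math.log10(C ** 2 / mse)

gt   = [[0, 1, 1, 1], [0, 1, 1, 1]]
pred = [[0, 1, 1, 1], [1, 1, 1, 1]]  # one of eight pixels is wrong
value = psnr(gt, pred)  # MSE = 1/8, so PSNR = 10*log10(8) ≈ 9.03 dB
```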
3.5.5. NRM
The negative rate metric (NRM) is based on the pixel-wise mismatches between the ground truth (GT) and the prediction. It combines the false negative rate NR_FN and the false positive rate NR_FP:

NRM = (NR_FN + NR_FP) / 2    (3.13)

where

NR_FN = N_FN / (N_FN + N_TP)  and  NR_FP = N_FP / (N_FP + N_TN)    (3.14)
N_TP denotes the number of true positives, N_FP the number of false positives, N_TN the number of true negatives, and N_FN the number of false negatives.
In contrast to F-Measure and PSNR, the binarization quality is better for lower
NRM.
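The NRM computation reduces to a few ratios (Python sketch; the confusion-matrix counts are arbitrary):

```python
def nrm(n_tp, n_fp, n_fn, n_tn):
    # NRM = (NR_FN + NR_FP) / 2, with NR_FN = N_FN / (N_FN + N_TP)
    # and NR_FP = N_FP / (N_FP + N_TN); lower values are better.
    nr_fn = n_fn / (n_fn + n_tp)
    nr_fp = n_fp / (n_fp + n_tn)
    return (nr_fn + nr_fp) / 2

value = nrm(n_tp=90, n_fp=5, n_fn=10, n_tn=95)  # (0.1 + 0.05) / 2 = 0.075
```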
3.5.6. Misclassification penalty metric (MPM)
The misclassification penalty metric (MPM) evaluates the prediction against the ground truth (GT) on an object-by-object basis. Misclassified pixels are penalized by their distance from the ground truth object's border.
MPM = (MP_FN + MP_FP) / 2    (3.15)

where

MP_FN = Σ_i d_FN^i / D  and  MP_FP = Σ_j d_FP^j / D    (3.16)

Here d_FN^i and d_FP^j denote the distances of the ith false negative and the jth false positive pixel from the contour of the ground truth segmentation, and D is a normalization factor.
CHAPTER 4
RESULTS & DISCUSSIONS
Table 1: Evaluation Results of the Dataset of DIBCO 2009

Methods     F-Measure   PF-measure   PSNR      NRM      MPM      DRD
Adaptive    66.22528    0.89023       9.401    0.292    0.052    161.99
BERN        62.15014    0.47091       6.753    0.355    0.121    298.842
SAUV        66.95062    0.91148       9.431    0.287    0.054    161.21
Proposed    64.41689    1.63160      12.4043   0.2839   0.0185    80.7279
CHAPTER 5
CONCLUSION & FUTURE SCOPE
5.1. Conclusion
This work presents an adaptive image contrast based document image binarization technique that is tolerant to different types of document degradation, such as uneven illumination and document smear. The proposed technique is simple and robust; only a few parameters are involved. Moreover, it works for different kinds of degraded document images. The proposed technique makes use of the local image contrast that is evaluated based on the local maximum and minimum. The proposed method has been tested on various datasets. Experiments show that the proposed method outperforms most reported document binarization methods in terms of F-measure, pseudo F-measure, PSNR, NRM, MPM, and DRD.
REFERENCES
[1] B. Gatos, K. Ntirogiannis, and I. Pratikakis, "ICDAR 2009 document image binarization contest (DIBCO 2009)," in Proc. Int. Conf. Document Anal. Recognit., Jul. 2009, pp. 1375–1382.
[2] I. Pratikakis, B. Gatos, and K. Ntirogiannis, "ICDAR 2011 document image binarization contest (DIBCO 2011)," in Proc. Int. Conf. Document Anal. Recognit., Sep. 2011, pp. 1506–1510.
[3] I. Pratikakis, B. Gatos, and K. Ntirogiannis, "H-DIBCO 2010 handwritten document image binarization competition," in Proc. Int. Conf. Frontiers Handwrit. Recognit., Nov. 2010, pp. 727–732.
[4] S. Lu, B. Su, and C. L. Tan, "Document image binarization using background estimation and stroke edges," Int. J. Document Anal. Recognit., vol. 13, no. 4, pp. 303–314, Dec. 2010.
APPENDIX A
SOFTWARE REQUIREMENT
MATLAB
MATLAB is a high-performance language for technical computing. It integrates computation, visualization, and programming in an easy-to-use environment. MATLAB stands for matrix laboratory. It was originally written to provide easy access to matrix software developed by the LINPACK (linear system package) and EISPACK (eigensystem package) projects. MATLAB is therefore built on a foundation of sophisticated matrix software in which the basic element is a matrix that does not require pre-dimensioning.
Algorithm development
Data acquisition

Features of MATLAB
The MATLAB system consists of: the programming language; graphics (2-D graphics, 3-D graphics, color and lighting, animation); computation (linear algebra, signal processing, quadrature, etc.); external interfaces (interfaces with C and FORTRAN programs); toolboxes (signal processing, image processing, control systems, neural networks, communications, robust control, statistics); and the development environment.
Lay out the GUI: Using the Layout Editor, we can lay out a GUI easily by clicking and dragging GUI components -- such as panels, buttons, text fields, sliders, menus, and so on -- into the layout area.
Program the GUI: GUIDE automatically generates an M-file that controls how
the GUI operates. The M-file initializes the GUI and contains a framework for all the
GUI callbacks -- the commands that are executed when a user clicks a GUI component.
Using the M-file editor, we can add code to the callbacks to perform the functions.
GUIDE stores a GUI in two files, which are generated the first time when we save or run
the GUI:
A FIG-file, with extension .fig, which contains a complete description of the GUI
layout and the components of the GUI: push buttons, menus, axes, and so on.
An M-file, with extension .m, which contains the code that controls the GUI,
including the callbacks for its components.
These two files correspond to the tasks of laying out and programming the GUI. When we lay out the GUI in the Layout Editor, our work is stored in the FIG-file. When we program the GUI, our work is stored in the M-file.
The MATLAB Application Program Interface (API)