Character Segmentation Techniques for Handwritten Text - A Survey
Christopher E. Dunn and P. S. P. Wang
College of Computer Science
Northeastern University
Boston, MA 02166
USA

0-8186-2915-0/92 $3.00 © 1992 IEEE

Abstract

This paper is a survey of techniques for segmenting images of handwritten text into individual characters. The topic is broken into two categories: straight segmentation and segmentation-recognition techniques. Several approaches to each are outlined, and each is analyzed for its relevance to printed, cursive, on-line and off-line input data.

1 Introduction

In character recognition, the process of segmenting the data has become more important as recognition techniques have improved. The unconstrained nature of handwritten text has become the next hurdle to overcome [14,16].

Segmentation is needed since handwritten characters frequently interfere with one another. Common ways in which characters can interfere include overlapping, touching, connected, and intersecting pairs (Figure 1) [8]. An additional problem with printed text is the occurrence of broken characters, such as multi-stroke characters, e.g. "5" and "t", as well as those that are broken by definition, e.g. "i" and "j".

Figure 1: Overlapping, touching, connected, and intersecting character pairs.

Cursive writing by definition is a connected sequence of characters, hence all characters will need to be segmented at one or more points. In addition to all the other problems of interference, the representation of cursive text is inherently ambiguous. For example, a "u" followed by an "s" is represented in a similar manner as "w". Compounding this ambiguity is the imprecision with which writers may form letters such as "m" and "n", where the top is concave as opposed to convex, so that they appear to be "w" and "u" respectively.

Another differentiation to make in handwritten data is whether the input has been acquired by on-line or off-line techniques. On-line data is acquired by any device that uniquely orders the stroke information. This makes segmentation easier, since characters are usually written as a sequential order of strokes; the exceptions are the cross on a "t" and the dot on top of an "i", which tend to come at the end of a word in cursive writing. Off-line data, in contrast, is typically scanned from a previously written text image. If the actual stroke sequence is needed, then other techniques must be used to infer this information.

Related to the above difficulty in separating characters is the problem of eliminating connecting strokes and tails of characters which are extraneous to the characters themselves. Some segmentation techniques will attempt to identify these ligatures as separate entities rather than as part of the characters themselves. Finally, noise due to preprocessing techniques, such as thresholding and thinning of the input raster image, must be eliminated.

2 Straight Segmentation

Straight segmentation techniques are those which can be used as an individual component in a text analysis process, as opposed to integrated segmentation-recognition techniques, which by design depend on the recognition process. This type of segmentation is usually designed with rules that attempt to identify only and all character segmentation points. A technique of this sort may still be integrated with the recognition process to verify its success, but its use does not depend on it.

2.1 Region Finding

One straightforward approach to segmentation of printed characters is to use a series of region finding, region grouping and splitting algorithms. Region finding is a simple technique that identifies all disjoint regions in the image. We will assume that all the images discussed are unthinned, binary images unless otherwise stated. The pixels are originally labeled ON/OFF, where ON signifies the data areas. To find regions, the image is examined
pixel by pixel until an ON value is found. Once found, it is labeled with a new region number, and its neighbors are searched for additional ON values. If a neighbor is ON, then it is given the same label, and the search proceeds to its neighbors. Thus the search proceeds recursively on the neighbors of ON pixels until no ON neighbors are found. The algorithm then returns to its search of the entire input image. The result is that all disjoint regions will be identified and all pixels in any region will be labeled with a unique number. Region finding is sufficient to segment characters which are overlapping [5,9]. If the size of each region is calculated, then very small regions which are noise may be eliminated at this point.

2.2 Grouping Regions

In order to deal with broken characters or those which have separated parts, grouping procedures are applied. One simple grouping method is to calculate the smallest bounding box that completely encloses each region. If, for any two regions, the bounding box of one region completely encloses another region, then the enclosed region is relabeled to the value of the enclosing region. Thus the resulting region is composed of two disjoint sub-regions [7]. This is helpful for connecting regions that have been separated due to noise induced by thresholding procedures, which transform grey level images to binary ones. An enhancement to this method is to allow a given percentage of the enclosed region to be outside the bounding box of the enclosing region.

Besides simple enclosure grouping, regions may also be grouped by proximity operators. A proximity operator is a set of bounding boxes, where each box is empirically determined based on the relative position of individual pen strokes in a character class. For example, the character "5" can be broken into two regions, where one is the top horizontal line and the second region is the rest of the character. These two regions correspond to the two pen strokes which are typically made when constructing the character, and the character "5" is frequently constructed such that the two strokes are not touching. A proximity operator for the character class "5" would be a set of two bounding boxes such that, when overlaid with the two regions of a broken "5", the regions would be enclosed by the corresponding bounding boxes.

Proximity operators can be constructed by overlaying several test images of a character class, then computing the bounding box of the composite image of each sub-region. Matching proximity operators can be done by aligning the input regions and proximity operators via the centroids of the two sets. As with the simpler grouping procedure, the matching may be done on the basis of a threshold corresponding to the percentage of enclosed area for each region and bounding box pair [5].

2.3 Splitting Regions

Touching characters mean that some regions will require splitting. This is typically done by first identifying the maxima and minima of a contour along the bottom and top of the region respectively. The region is then split by describing a path from a minimum point to a maximum point, where the minimum and maximum are aligned vertically within some threshold distance. If the two points are connected by a single solid area, then the splitting path can be made by bisecting this region. When the minimum and maximum do not align properly, one of them must be chosen and a cut is made vertically through the adjacent solid area [2,5]. An estimate of the average size of a single character can be used to identify which regions are candidates for splitting.

An alternative method for splitting regions has been tried on connected pairs of handwritten digits. This method first bisects the region with a straight horizontal line. Then the points at which the line crosses the region's data are calculated. If an even number are found, then the split is started at a point midway between the two middle crossing points. The split follows the middle of the non-data area in both an upward and downward direction. If a data region is found during the split, then it is cut vertically until a non-data area is again found. The split finishes at the top and bottom of the region's bounding box. With digits there will be only one or two crossings of the horizontal line per digit, so the starting point will be accurate if there is an even number of crossings and it is known that the region consists of two characters. For an odd number of crossings, the roughness of the left and right profiles of the region is inspected. A rough profile indicates the number of crossings that the digit may have, and this is used to decide the starting point. This splitting technique can be considered a context-directed technique, since it relies on specific information about the character domain, such as the roughness profiles and the number of vertical lines in digits [12,13].

2.4 Tail Removal

Extraneous stroke tails may need to be eliminated from regions, since they may produce errors in the subsequent recognition process. This can be done by calculating the vertical density (vertical histogram) of pixels over the input region. Tails will be indicated by low-density sections connected to higher-density sections. Heuristics based upon the maximum likely tail length for a given character set are then devised to truncate the identified tails [5].
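
The region finding of Section 2.1 and the bounding-box enclosure grouping of Section 2.2 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the surveyed authors' code: the image is a list of rows of 0/1 values (1 = ON), 8-connectivity is used, the recursive neighbor search is replaced by an explicit stack to avoid Python's recursion limit, and the "given percentage outside" enhancement is modeled as a `tolerance` fraction. The function names and the `min_size` and `tolerance` parameters are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical helpers sketching Sections 2.1 and 2.2; not code from the survey.
Image = List[List[int]]          # 1 = ON (data), 0 = OFF
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def find_regions(image: Image) -> List[List[int]]:
    """Label each disjoint ON region with a unique number (0 = background)."""
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 1
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or labels[r][c] != 0:
                continue
            # Grow the new region through ON neighbors (explicit stack, not recursion).
            labels[r][c] = next_label
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and labels[ny][nx] == 0):
                            labels[ny][nx] = next_label
                            stack.append((ny, nx))
            next_label += 1
    return labels


def region_pixels(labels: List[List[int]]) -> Dict[int, List[Tuple[int, int]]]:
    """Pixel coordinates of every labeled region."""
    pixels: Dict[int, List[Tuple[int, int]]] = {}
    for r, row in enumerate(labels):
        for c, lab in enumerate(row):
            if lab:
                pixels.setdefault(lab, []).append((r, c))
    return pixels


def drop_small_regions(labels: List[List[int]], min_size: int) -> List[List[int]]:
    """Relabel regions with fewer than min_size pixels as background (noise)."""
    sizes = {lab: len(pts) for lab, pts in region_pixels(labels).items()}
    return [[lab if lab and sizes[lab] >= min_size else 0 for lab in row]
            for row in labels]


def group_enclosed(labels: List[List[int]], tolerance: float = 0.0) -> List[List[int]]:
    """Merge each region into the smallest larger region whose bounding box
    encloses it, allowing a `tolerance` fraction of its pixels to lie outside."""
    pixels = region_pixels(labels)
    boxes: Dict[int, Box] = {
        lab: (min(r for r, _ in pts), min(c for _, c in pts),
              max(r for r, _ in pts), max(c for _, c in pts))
        for lab, pts in pixels.items()
    }

    def area(box: Box) -> int:
        return (box[2] - box[0] + 1) * (box[3] - box[1] + 1)

    def outside_fraction(lab: int, box: Box) -> float:
        top, left, bottom, right = box
        out = sum(1 for r, c in pixels[lab]
                  if not (top <= r <= bottom and left <= c <= right))
        return out / len(pixels[lab])

    parent: Dict[int, int] = {}
    for b in boxes:
        hosts = [a for a in boxes
                 if area(boxes[a]) > area(boxes[b])
                 and outside_fraction(b, boxes[a]) <= tolerance]
        if hosts:
            parent[b] = min(hosts, key=lambda a: area(boxes[a]))

    def root(lab: int) -> int:
        # Hosts are strictly larger, so following parents always terminates.
        while lab in parent:
            lab = parent[lab]
        return lab

    return [[root(lab) if lab else 0 for lab in row] for row in labels]
```

On a "C"-shaped stroke with a detached fragment inside its bounding box, `find_regions` reports two regions and `group_enclosed` merges them. Running `drop_small_regions` first would instead discard a one-pixel fragment as noise, so the order of the two steps is a design choice.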

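The digit-pair splitting method of Section 2.3 (bisect horizontally, start midway between the two middle crossings, then follow the middle of the background gap up and down, cutting straight through any data met on the way) might be sketched as follows. The sketch assumes the input is the region's bounding box as a 0/1 grid already known to contain two digits, treats each run of ON pixels on the middle row as one crossing, and simply returns `None` for an odd crossing count rather than modeling the profile-roughness test. `split_path`, `apply_split` and the helper names are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical sketch of the digit-pair splitting method of Section 2.3.
Grid = List[List[int]]  # the region's bounding box; 1 = data (ON), 0 = background


def runs_on_row(row: List[int]) -> List[Tuple[int, int]]:
    """(start, end) columns of each run of ON pixels, i.e. each crossing."""
    runs, start = [], None
    for c, v in enumerate(row):
        if v and start is None:
            start = c
        elif not v and start is not None:
            runs.append((start, c - 1))
            start = None
    if start is not None:
        runs.append((start, len(row) - 1))
    return runs


def gap_middle(row: List[int], col: int) -> int:
    """Middle column of the run of background pixels that contains `col`."""
    left = col
    while left - 1 >= 0 and not row[left - 1]:
        left -= 1
    right = col
    while right + 1 < len(row) and not row[right + 1]:
        right += 1
    return (left + right) // 2


def split_path(region: Grid) -> Optional[List[int]]:
    """Column of the split path in every row, or None for an odd crossing count."""
    rows = len(region)
    mid = rows // 2
    runs = runs_on_row(region[mid])  # crossings of the horizontal bisector
    k = len(runs)
    if k == 0 or k % 2:
        return None  # the survey's profile-roughness rule is not modeled here
    left_run, right_run = runs[k // 2 - 1], runs[k // 2]
    path = [0] * rows
    path[mid] = (left_run[1] + 1 + right_run[0] - 1) // 2  # middle of the gap
    for step in (-1, 1):  # trace upward, then downward
        col = path[mid]
        r = mid + step
        while 0 <= r < rows:
            if not region[r][col]:
                col = gap_middle(region[r], col)  # follow the middle of the gap
            # otherwise a data pixel was met: keep the column, cutting vertically
            path[r] = col
            r += step
    return path


def apply_split(region: Grid, path: List[int]) -> Tuple[Grid, Grid]:
    """Pixels left and right of the path; pixels on the path are cut away."""
    left = [[v if c < path[r] else 0 for c, v in enumerate(row)]
            for r, row in enumerate(region)]
    right = [[v if c > path[r] else 0 for c, v in enumerate(row)]
             for r, row in enumerate(region)]
    return left, right
```

In this sketch the path column may jump between rows when the background gap shifts; a fuller implementation may want to limit how far the path can move from one row to the next.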
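
The tail-removal heuristic of Section 2.4 can be illustrated with a column histogram. In this sketch the "low density" cut-off (`low_density`) and the maximum likely tail length (`max_tail_length`) are hypothetical parameters that would, as the text says, be chosen for a given character set; only low-density runs at the left and right ends of the region that lead into a denser section are truncated.

```python
from typing import List, Sequence

# Hypothetical sketch of Section 2.4; parameter values are placeholders.
Grid = List[List[int]]  # 1 = ON, 0 = OFF


def vertical_histogram(region: Grid) -> List[int]:
    """Number of ON pixels in each column of the region."""
    return [sum(column) for column in zip(*region)]


def tail_length(hist: List[int], order: Sequence[int], low_density: int) -> int:
    """Length of the low-density run at one end of the region, or 0 if that run
    does not lead into a higher-density section."""
    length = 0
    for idx in order:
        if 0 < hist[idx] <= low_density:
            length += 1
        else:
            return length if hist[idx] > low_density else 0
    return 0  # the whole region is low density, so there is no body to keep


def remove_tails(region: Grid, low_density: int = 1, max_tail_length: int = 3) -> Grid:
    """Clear end columns that look like tails no longer than max_tail_length."""
    hist = vertical_histogram(region)
    n = len(hist)
    lead = tail_length(hist, range(n), low_density)
    trail = tail_length(hist, range(n - 1, -1, -1), low_density)
    first = lead if lead <= max_tail_length else 0
    last = n - 1 - trail if trail <= max_tail_length else n - 1
    # Clear the truncated columns but keep the region's size.
    return [[v if first <= c <= last else 0 for c, v in enumerate(row)]
            for row in region]
```

A low-density run longer than `max_tail_length` is left untouched, on the reasoning that it is more likely part of the character than an extraneous tail.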
2.5 Presegmentation

An alternative to splitting regions as described above, where the rules attempt to find only and all segmentation points, is to use a simpler algorithm for finding all "possible" segmentation points, with the intention of identifying which are the actual segmentation points at a later time. This is referred to as presegmentation, and it has two advantages. One is that most presegmentation algorithms overestimate the possible segmentation points, so that points tend not to be missed as they are with other direct segmentation methods. The other is that presegmentation leads to schemes where evidence for the confidence value of each segmentation hypothesis (presegmentation point) is successively gained through later steps in the segmentation process. The result is a more robust segmentation process, since several rules may act in concert to increase or decrease the confidence of a segmentation point, rather than a single rule or algorithm as was previously described.

Presegmentation may be accomplished in several ways. With printed characters there is a high probability that connected characters will either intersect one another with a four-way or three-way junction, or touch each other with the strokes forming a high degree of curvature where they meet. Presegmentation will consist of identifying all such junctions, or points of high curvature. Notice that the character "4" has both these types of features and will be presegmented, with the understanding that later rules must throw out these segmentation hypotheses. In the case of junction points, standard algorithms for thinning the data will have the byproduct that junction points are identified [9]. Points of high curvature are found by calculating the derivative of the slope of the input lines, then finding the minima or maxima outside of an empirically determined threshold value [4,10,11].

2.6 Rule Based Methods

One way to verify presegmentation points is to develop heuristics based on the structure of characters in the domain of interest and the possible ways they may interfere. This has been done for digits [8]. For example, if two three-way junctions are connected vertically by a single line, then the line is said to be common to two digits. On the other hand, if the line connecting the three-way junctions is horizontal and near the top of the image, then it is said to be a connecting ligature. However, these rule-based heuristics may be difficult or impossible to design for very large sets of character classes.

3 Segmentation-Recognition Techniques

Unlike straight segmentation techniques, segmentation-recognition techniques rely on character recognition methods to alter the confidence values of segmentation hypotheses. Cursive text requires this type of segmentation due to the inherent ambiguity found when letters are juxtaposed. This ambiguity derives primarily from the fact that script letters are connected and, in English script, many similar strokes are shared among characters. This makes cursive text unsuitable for straight segmentation techniques. Segmentation-recognition has two common properties among its variants. First, the input word is presegmented into segmentation hypotheses such that it is highly probable that all the true segmentation points between characters are accounted for. Second, subsets of segmentation hypotheses are searched to find the optimal set of segmentation points, based on stroke or character recognition information.

3.1 Elastic Matching

One of the first segmentation-recognition strategies to be investigated was elastic matching of on-line script [15]. In this process text data is gathered via an on-line device and encoded as a series of discrete vectors of equal length. The ends of the vectors are the presegmentation points. Elastic matching compares the vector sequence against the vector information of character prototypes. A distance metric is formulated to gauge the difference between the prototype characters and the input vector sequence. This metric takes into account the difference in relative position and slope of the vectors, and is summed over all matched vector pairs. It is termed elastic matching since the prototype vectors are not required to be mapped in a one-to-one relationship with the input vectors. In the case of an input vector sequence corresponding to one character, the elastic matching process is a search to find the optimum mapping between the input and each character prototype based on the distance metric. To optimize the search, a threshold is set such that the distance of all partial mappings during the search must be below the threshold, or else they are pruned from the search process. In general, since a mapping between a character prototype and the input text may use only a portion of the input, the remaining input may be used to match against another character prototype. In this way the input is mapped against all possible combinations of prototype characters, and with all possible segmentation points between the input vectors. Further optimization can be done by designing heuristics based on the average size of an input character to reduce the number of mappings searched.

3.2 Presegmentation at Higher Structural Levels

Most other techniques for segmentation-recognition attempt to presegment the input at a higher structural level. One such technique for cursive writing uses the minima along the input contour. First the input is slant corrected so that strokes are aligned vertically. Then two horizontal reference lines are found such that they align
with the top and bottom of the small lowercase letters (e.g. a, c, e, i, m, n, o, r, s, u, v, w, x). Next, the local minima of the contour which lie inside these reference lines are found. This allows only the connecting strokes to be used as minima, since these always fall within the region defined by the reference lines. Of the minima found, those which have an additional stroke directly above are thrown out. This eliminates presegmentation in the middle of some letters (e.g. a, c, o, s, x). The earlier slant correction makes this filtering easier, since the search for a stroke above only needs to proceed vertically from the minimum point. Other points may be thrown out based on their proximity to each other, thus allowing only one presegmentation point per connecting ligature [1,4].

After presegmentation, the text input is represented as a sequence of text sections. Letters are then hypothesized over all subsequences of text sections via some recognition procedure. Then the possible letter sequences are searched, as above, with a best-first strategy and pruning of letter sequences whose cost value is above an empirically determined threshold. The above method has been used for off-line cursive script.

Recognition of presegmented text is sometimes the most computationally intensive part of the overall text recognition process, while searching for the appropriate combination of recognized segments is less expensive. In this case, reducing the complexity of segment recognition will have the most effect on reducing the overall complexity. This is one motivation for presegmenting at the stroke level, which has been done with on-line printed text [6] as well as with off-line cursive text [3]. On-line printed text is naturally presegmented by the input process, which includes pen-up and pen-down information. For on-line script, local minima and maxima, and junction points can be used. The curve segments between presegmentation points are matched to a set of basic stroke types via an alignment procedure. The alignment process yields a "goodness of fit" value between the prototype stroke and the curve segment. Search is then performed to find the optimum combination of strokes such that they form a sequence of characters.

4 Conclusions

Straight segmentation is the technique of forming rules to identify members of a character set without identifying their specific classification. It is useful for printed character sets but will not work for cursive text. The primary advantage of straight segmentation is that it greatly reduces the complexity of the search for a word hypothesis, since the character boundaries are predetermined. However, this type of segmentation is subject to error even in the case of printed letters. Segmentation-recognition strategies are more expensive due to the increased complexity of the search for optimum word hypotheses. However, the inherent ambiguity of cursive text requires this type of segmentation.

References

[1] R. M. Bozinovic and S. N. Srihari, Off-Line Cursive Script Word Recognition, IEEE PAMI, 11(1):68-83, Jan. 1989.
[2] E. Cohen, J. J. Hull and S. N. Srihari, Reading and Understanding Handwritten Addresses, Proc. USPS Adv. Tech. Conf., pp. 822-836, 1990.
[3] S. Edelman, T. Flash and S. Ullman, Reading Cursive Handwriting by Alignment of Letter Prototypes, International Journal of Computer Vision, 5(3):303-331, 1990.
[4] J. T. Favata and S. N. Srihari, Recognition of Handwritten Words for Address Reading, Proc. USPS Adv. Tech. Conf., pp. 191-205, 1990.
[5] R. Fenrich and S. Krishnamoorthy, Segmenting Diverse Quality Handwritten Digit Strings in Near Real-Time, Proc. USPS Adv. Tech. Conf., pp. 523-537, 1990.
[6] T. Fujisaki, T. E. Chefalas, J. Kim, C. C. Tappert and C. G. Wolf, Online Run-on Character Recognizer: Design and Performance, IBM Research Report, 1990.
[7] E. Mandler, Advanced Preprocessing Technique for On-Line Recognition of Handprinted Symbols, in Computer Recognition and Human Production of Handwriting, Eds. R. Plamondon, C. Y. Suen and M. L. Simner, World Scientific, pp. 19-36, 1989.
[8] B. T. Mitchell and A. M. Gillies, A Model-Based Computer Vision System for Recognizing Handwritten ZIP Codes, Machine Vision and Applications, 2:231-243, 1989.
[9] T. Pavlidis, Algorithms for Graphics and Image Processing, Computer Science Press, Rockville MD, pp. 129-214, 1982.
[10] R. Plamondon and P. Yergeau, A System for the Analysis and Synthesis of Handwriting, Proc. Int. Workshop on Frontiers in Handwriting Recognition, Chateau de Bonas, France, pp. 167-179, 1991.
[11] L. R. B. Schomaker and H. Teulings, A Handwriting Recognition System Based on the Properties and Architectures of the Human Motor System, Proc. Int. Workshop on Frontiers in Handwriting Recognition, Chateau de Bonas, France, pp. 195-211, 1991.
[12] M. Shridhar and A. Badreldin, Recognition of Isolated and Simply Connected Handwritten Numerals, Pattern Recognition, 19(1):1-12, 1986.
[13] M. Shridhar and A. Badreldin, Context-Directed Segmentation Algorithm for Handwritten Numerical Strings, Image and Vision Computing, 5(1):3-9, Feb. 1987.
[14] C. Y. Suen, Ed., Frontiers in Handwriting Recognition, Concordia Univ. Press, Montreal, 1990.
[15] C. C. Tappert, Cursive Script Recognition by Elastic Matching, IBM J. Res. Development, 26(6):765-771, Nov. 1982.
[16] P. S. P. Wang, Ed., Character and Handwriting Recognition - Expanding Frontiers, World Scientific, Teaneck, NJ, 1991.
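
For Section 2.5, the curvature test ("the derivative of the slope of the input lines") can be sketched on an ordered list of points along a stroke or contour. As an assumption of this sketch, slope is measured as a direction angle with `math.atan2`, so vertical segments need no special case, and its derivative is the wrapped change in angle between successive segments; points whose turn is a local extremum beyond a hypothetical `threshold` (in radians) become presegmentation hypotheses. Junction detection via thinning is not covered.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical sketch of curvature-based presegmentation (Section 2.5).
Point = Tuple[float, float]


def turning_angles(points: Sequence[Point]) -> List[float]:
    """Change in direction between successive segments; entry k belongs to points[k + 1]."""
    angles = [math.atan2(y1 - y0, x1 - x0)
              for (x0, y0), (x1, y1) in zip(points, points[1:])]
    turns = []
    for a0, a1 in zip(angles, angles[1:]):
        d = a1 - a0
        while d <= -math.pi:  # wrap into (-pi, pi] so small turns stay small
            d += 2 * math.pi
        while d > math.pi:
            d -= 2 * math.pi
        turns.append(d)
    return turns


def high_curvature_points(points: Sequence[Point], threshold: float) -> List[int]:
    """Indices of points whose turn exceeds `threshold` and is a local extremum."""
    turns = turning_angles(points)
    hits = []
    for k, t in enumerate(turns):
        if abs(t) <= threshold:
            continue
        before = abs(turns[k - 1]) if k > 0 else 0.0
        after = abs(turns[k + 1]) if k + 1 < len(turns) else 0.0
        if abs(t) >= before and abs(t) >= after:
            hits.append(k + 1)
    return hits
```

For an "L"-shaped polyline such as `[(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]`, only the corner point `(2, 0)` turns by more than a 0.5 radian threshold, so it is the single hypothesis returned.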

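The two example digit rules of Section 2.6 can be written as a small decision function. The sketch assumes the two three-way junctions are already known to be joined by a single line and are given as (x, y) points with y increasing downward; the angle tolerance and the "near the top" fraction are hypothetical values, and only the two rules quoted in the text are encoded.

```python
import math
from typing import Tuple

# Hypothetical encoding of the two example rules of Section 2.6.
Point = Tuple[float, float]  # (x, y), y = 0 at the top of the image


def classify_junction_link(j1: Point, j2: Point, image_height: float,
                           angle_tol_deg: float = 20.0,
                           top_fraction: float = 0.25) -> str:
    """Classify the single line joining two three-way junctions."""
    dx, dy = abs(j2[0] - j1[0]), abs(j2[1] - j1[1])
    angle = math.degrees(math.atan2(dy, dx))  # 0 = horizontal, 90 = vertical
    if angle >= 90.0 - angle_tol_deg:
        return "shared"      # vertical line: common to two digits
    near_top = (j1[1] + j2[1]) / 2 <= top_fraction * image_height
    if angle <= angle_tol_deg and near_top:
        return "ligature"    # horizontal line near the top: connecting stroke
    return "undecided"       # neither rule applies; leave to later evidence
```

Returning "undecided" rather than forcing a choice fits the presegmentation scheme of Section 2.5, where later evidence can still raise or lower the confidence of a hypothesis.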
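
Section 3.1's elastic matching can be approximated by a dynamic-programming alignment in which one input vector may match several prototype vectors and vice versa, combined with a second dynamic program over vector boundaries that tries every segmentation point. This is a sketch under explicit assumptions, and the exact metric used in [15] may differ: each vector is a hypothetical `(x, y, angle)` triple, positions are made relative to the first vector of the piece being matched, the local cost is Euclidean position difference plus a weighted angle difference, partial mappings above `prune` are discarded, and `max_len` stands in for the average-character-size heuristic.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical sketch of elastic matching with segmentation (Section 3.1).
Vector = Tuple[float, float, float]  # (x, y, direction in radians) of one input vector


def local_cost(a: Vector, b: Vector, slope_weight: float = 1.0) -> float:
    """Position difference plus weighted difference in direction (slope)."""
    turn = math.atan2(math.sin(a[2] - b[2]), math.cos(a[2] - b[2]))
    return math.hypot(a[0] - b[0], a[1] - b[1]) + slope_weight * abs(turn)


def relative(seq: Sequence[Vector]) -> List[Vector]:
    """Express positions relative to the first vector of the sequence."""
    x0, y0, _ = seq[0]
    return [(x - x0, y - y0, t) for x, y, t in seq]


def elastic_distance(proto: Sequence[Vector], piece: Sequence[Vector],
                     prune: float = math.inf) -> float:
    """Cheapest elastic mapping of a whole prototype onto a whole input piece."""
    p, s = relative(proto), relative(piece)
    prev = [math.inf] * len(p)
    for i, v in enumerate(s):
        cur = [math.inf] * len(p)
        for j, q in enumerate(p):
            if i == 0 and j == 0:
                best = 0.0
            else:
                best = min(prev[j],
                           cur[j - 1] if j else math.inf,
                           prev[j - 1] if j else math.inf)
            d = best + local_cost(v, q)
            cur[j] = d if d <= prune else math.inf  # prune costly partial mappings
        prev = cur
    return prev[-1]


def segment_and_recognize(vectors: Sequence[Vector],
                          prototypes: Dict[str, Sequence[Vector]],
                          prune: float = math.inf,
                          max_len: Optional[int] = None
                          ) -> Optional[Tuple[str, List[int]]]:
    """Best label string and the vector indices where later characters start."""
    n = len(vectors)
    best = [math.inf] * (n + 1)
    back: List[Optional[Tuple[int, str]]] = [None] * (n + 1)
    best[0] = 0.0
    for end in range(1, n + 1):
        first = 0 if max_len is None else max(0, end - max_len)
        for start in range(first, end):
            if best[start] == math.inf:
                continue
            for label, proto in prototypes.items():
                cost = best[start] + elastic_distance(proto, vectors[start:end], prune)
                if cost < best[end]:
                    best[end], back[end] = cost, (start, label)
    if best[n] == math.inf:
        return None
    labels, cuts, end = [], [], n
    while end > 0:
        start, label = back[end]
        labels.append(label)
        cuts.append(start)
        end = start
    return "".join(reversed(labels)), sorted(c for c in cuts if c > 0)
```

With a horizontal-stroke prototype `"h"` and a vertical-stroke prototype `"v"`, an input of two rightward vectors followed by two downward vectors is read as `"hv"` with a segmentation point at vector index 2. Because positions are relative, a prototype matches a translated copy of itself at zero cost.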

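The contour-minima presegmentation at the start of Section 3.2 might look like the following for an off-line word image that has already been slant corrected and whose two reference lines are known (row indices `upper_ref` and `lower_ref`, with row 0 at the top). As assumptions of the sketch, the lowest ON pixel of each column serves as the lower contour, a "local minimum" is a column whose contour point is at least as low as both neighbors, a stroke "directly above" is any further ON pixel in the same column after leaving the current stroke, and survivors within a hypothetical `min_gap` columns of each other are collapsed to their middle point.

```python
from typing import List

# Hypothetical sketch of contour-minima presegmentation (Section 3.2).
Grid = List[List[int]]  # slant-corrected word image; row 0 is the top, 1 = ON


def lower_contour(img: Grid) -> List[int]:
    """Row of the lowest ON pixel in each column, or -1 for an empty column."""
    contour = []
    for c in range(len(img[0])):
        low = -1
        for r in range(len(img)):
            if img[r][c]:
                low = r
        contour.append(low)
    return contour


def has_stroke_above(img: Grid, r: int, c: int) -> bool:
    """True if another stroke lies directly above the stroke through (r, c)."""
    while r >= 0 and img[r][c]:  # climb out of the stroke holding the minimum
        r -= 1
    return any(img[k][c] for k in range(r, -1, -1))


def presegment_minima(img: Grid, upper_ref: int, lower_ref: int,
                      min_gap: int = 2) -> List[int]:
    """Columns of presegmentation points, one per cluster of nearby minima."""
    low = lower_contour(img)
    n = len(low)
    candidates = []
    for c in range(n):
        r = low[c]
        if r < 0 or not (upper_ref <= r <= lower_ref):
            continue  # keep only minima between the reference lines
        left = low[c - 1] if c > 0 else -1
        right = low[c + 1] if c + 1 < n else -1
        if r >= left and r >= right and not has_stroke_above(img, r, c):
            candidates.append(c)
    points: List[int] = []
    group: List[int] = []
    for c in candidates:
        if group and c - group[-1] > min_gap:
            points.append(group[len(group) // 2])
            group = []
        group.append(c)
    if group:
        points.append(group[len(group) // 2])
    return points
```

On a toy image of a closed loop joined by a ligature to a vertical stroke, the bottom of the loop is rejected because the top of the loop lies above it, the ligature survives as a single point, and the bottom of the final stroke is also kept; that last point is the kind of overestimate the later recognition step is expected to discard.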
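
The search that follows presegmentation in Section 3.2 (best-first expansion over letter hypotheses, pruning sequences whose cost exceeds a threshold) can be sketched with a priority queue. The sketch assumes the recognizer has already produced, for each span of text sections `(start, end)`, a list of `(letter, cost)` hypotheses with non-negative costs, lower being better; under that assumption the first complete sequence popped from the queue is the cheapest one. The data layout and names are hypothetical, and no lexicon is used.

```python
import heapq
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical sketch of the best-first search over letter hypotheses (Section 3.2).
Hypotheses = Dict[Tuple[int, int], List[Tuple[str, float]]]


def best_first_word(n_sections: int, hypotheses: Hypotheses,
                    threshold: float) -> Optional[Tuple[str, float]]:
    """Cheapest letter sequence covering sections 0..n_sections, or None if every
    sequence is pruned by the cost threshold."""
    queue: List[Tuple[float, int, str]] = [(0.0, 0, "")]
    expanded: Set[int] = set()
    while queue:
        cost, end, word = heapq.heappop(queue)
        if end == n_sections:
            return word, cost
        if end in expanded:  # a cheaper sequence already reached this boundary
            continue
        expanded.add(end)
        for (start, stop), letters in hypotheses.items():
            if start != end:
                continue
            for letter, letter_cost in letters:
                new_cost = cost + letter_cost
                if new_cost <= threshold:  # prune costly letter sequences
                    heapq.heappush(queue, (new_cost, stop, word + letter))
    return None
```

Because costs are non-negative and each section boundary is expanded only once, the sketch behaves like a uniform-cost search; a lexicon check could be added at the point where a letter is appended.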