OCR is an abbreviation which stands for Optical Character Recognition. It has been an active subject of research since the early days of computers. Despite its age, it remains one of the most challenging and exciting areas of research in computer science, and it has recently grown into a mature discipline [1]. OCR can be defined as the task of transforming text represented in the spatial form of graphical marks (e.g., handwriting) into its symbolic representation in a computer system. The importance of OCR emerges from the expectation that paper will become obsolete in the age of digital computers. Most students in a lecture theatre, for example, will feel more comfortable with computer-written documents or notes than with handwritten ones. OCR provides a convenient way of quickly converting a handwritten text into a computer-typed text [1].
1.2 Objective
The objective of this work is to design and develop a new algorithm for detecting & recognizing a handwritten Arabic letter and to display the corresponding character in the computer system.
1.3 Methodologies & Tools
MATLAB software will be used to manipulate the image of the letter. The design and development processes will all take place in the MATLAB environment. Otsu's method will be used in thresholding the image. The proposed features will be extracted using built-in algorithms in MATLAB. A new algorithm will be developed to extract the ratio of the letter's width to its height. The letters will be clustered and classified manually into classes according to the extracted features. A feed-forward neural network that consists of only one neuron will be used for each class to distinguish between letters in the same class. The final numeric output of the neural network will be translated into a meaningful output that represents the recognized letter. Finally, an interface that interacts with the user will be implemented, enabling MATLAB to read the image directly.
Chapter 3 - Design & [...]
Arabic letters and a conceptual design for the system, flowcharts and algorithms developed and specified to achieve our objective.
APPENDIX A: contains tables of the width-to-height ratios measured for all Arabic letters.
APPENDIX C: represents the system code.
APPENDIX D: contains tables of some measured values that the reader needs in order to understand some values used in the code.
CHAPTER TWO
Literature Review
This chapter gives a brief description of the literature related to the work in the area of OCR. OCR basically depends on Digital Image Processing for manipulating the image and extracting features, and on either Hidden Markov Models (HMM) or Artificial Neural Networks (ANN) for the recognition. Before that, previous works on OCR for Arabic letters are discussed.
Digital image processing (DIP) is a science in which a special kind of signals known as images are manipulated within a computer system in order to extract information relevant to the objective(s) of the manipulation. It is a discipline of DSP (digital signal processing). A digital image is represented as a 2-dimensional matrix in the computer system. It is a function of 2 variables f(x,y), where x and y indicate the row and the column in which the pixel lies, respectively. The value of the function f(x,y) represents the gray level associated with the pixel (x,y); it varies from 0 (black) to 255 (white), and the values in between are mixtures of white and black (hence the name gray levels). The above description applies to the grayscale image. The colored image is represented as a 3-dimensional matrix where the 3rd dimension indicates the RGB (Red, Green, and Blue) components. A colored image can be visualized as 3 images placed as layers; each layer is a grayscale image that differs from the others.
Figure (2.2) Red, green, and blue components (from left to right) of the digital image in figure (2.1)

The thresholding process is histogram-based; it requires knowing the histogram of the image to define the threshold that can divide the image into its object(s) and background. A histogram of a digital image with gray levels in the range [0, L-1] is the discrete function:

$h(r_k) = n_k$    (2.2)

where:
r_k: the kth gray level.
n_k: the number of pixels in the image having gray level r_k.
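Figure (2.4) A 2-dimensional image (left) and its histogram (right)

The threshold value can be computed using several mathematical methods. One of these methods is to compute the threshold from the image's statistics (the histogram) using Otsu's method, which is described as follows [2]:

Evaluate the normalized histogram for the image, i.e., treat the histogram as a discrete probability density function as in:

$p_r(r_q) = \frac{n_q}{n}, \quad q = 0, 1, 2, \ldots, L-1$    (2.3)

where n is the total number of pixels in the image and n_q is the number of pixels having gray level r_q.

Suppose that a threshold (k) is chosen such that C0 is the set of pixels with levels [0, 1, 2, ..., k-1] and C1 is the set of pixels with levels [k, k+1, k+2, ..., L-1]. Choose the value of (k) that maximizes the between-class variance $\sigma_B^2$:

$\sigma_B^2 = \omega_0 (\mu_0 - \mu_T)^2 + \omega_1 (\mu_1 - \mu_T)^2$    (2.4)

where:

$\omega_0 = \sum_{q=0}^{k-1} p_r(r_q)$    (2.5)

$\omega_1 = \sum_{q=k}^{L-1} p_r(r_q)$    (2.6)

$\mu_0 = \sum_{q=0}^{k-1} q \, p_r(r_q) / \omega_0$    (2.7)

$\mu_1 = \sum_{q=k}^{L-1} q \, p_r(r_q) / \omega_1$    (2.8)

$\mu_T = \sum_{q=0}^{L-1} q \, p_r(r_q)$    (2.9)

The resulting value of (k) is the threshold; it is reported as a normalized value from 0.0 to 1.0.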
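The search for the threshold described above can be sketched in a few lines. The sketch below is written in plain Python purely for illustration (the system itself is implemented in MATLAB); the function name `otsu_threshold` and the toy histogram are assumptions introduced here, not part of the thesis code. It tries every candidate level k, splits the levels into C0 = [0, k-1] and C1 = [k, L-1], and keeps the k with the largest between-class variance.

```python
def otsu_threshold(hist):
    """Return (k, k_normalized) for the level k that maximizes the
    between-class variance. `hist` is a non-empty list of pixel counts,
    one entry per gray level (hypothetical helper, for illustration only)."""
    L = len(hist)                       # number of gray levels
    n = sum(hist)                       # total number of pixels
    p = [h / n for h in hist]           # normalized histogram
    mu_T = sum(q * p[q] for q in range(L))   # global mean gray level

    best_k, best_var = 0, -1.0
    for k in range(1, L):               # C0 = levels 0..k-1, C1 = levels k..L-1
        w0 = sum(p[:k])                 # probability of class C0
        w1 = sum(p[k:])                 # probability of class C1
        if w0 == 0 or w1 == 0:
            continue                    # one class is empty, skip this k
        mu0 = sum(q * p[q] for q in range(k)) / w0       # mean of C0
        mu1 = sum(q * p[q] for q in range(k, L)) / w1    # mean of C1
        var_b = w0 * (mu0 - mu_T) ** 2 + w1 * (mu1 - mu_T) ** 2
        if var_b > best_var:            # keep the first k reaching the maximum
            best_k, best_var = k, var_b
    return best_k, best_k / (L - 1)     # raw level and value normalized to [0, 1]


# Toy histogram with 8 gray levels: dark pixels at levels 0-1, bright at 6-7.
hist = [10, 10, 0, 0, 0, 0, 10, 10]
k, k_norm = otsu_threshold(hist)
```

Any k that falls in the empty gap between the two groups of levels gives the same variance here, so the sketch simply keeps the first one it meets. Dividing by L-1 is one possible way of normalizing the level to the range 0.0 to 1.0.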
Simple regional descriptors include area, perimeter, compactness, mean, and median of the gray level. Topological descriptors are useful for global description of regions in the image plane. Topology is the study of properties of a figure that are unaffected by any deformation, as long as there is no tearing or joining of the figure [3]. Topological descriptors include the following:
i. Number of Holes (H).
ii. Number of Connected Components (C).
iii. Euler Number (E).
where:

$E = C - H$    (2.10)
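The three topological descriptors can be illustrated on a small binary image. The following is a minimal Python sketch (not the MATLAB built-in routines used by the system); the helper names `count_regions` and `topological_descriptors` are hypothetical. It assumes that object pixels are 1 and background pixels are 0, that components are counted with 8-connectivity, and that a hole is a 4-connected background region that does not touch the image border.

```python
from collections import deque


def count_regions(grid, value, connectivity):
    """Count connected regions of `value` in a 2-D list of 0/1 pixels.
    Returns (total regions, regions that do not touch the image border)."""
    rows, cols = len(grid), len(grid[0])
    if connectivity == 8:
        steps = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                 (0, 1), (1, -1), (1, 0), (1, 1)]
    else:
        steps = [(-1, 0), (0, -1), (0, 1), (1, 0)]
    seen = [[False] * cols for _ in range(rows)]
    total = interior = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != value or seen[r][c]:
                continue
            total += 1                      # found a new region, flood-fill it
            touches_border = False
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                if y in (0, rows - 1) or x in (0, cols - 1):
                    touches_border = True
                for dy, dx in steps:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] == value):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if not touches_border:
                interior += 1
    return total, interior


def topological_descriptors(grid):
    """Return (C, H, E) with E = C - H for a binary image."""
    C, _ = count_regions(grid, 1, 8)    # connected components of the object
    _, H = count_regions(grid, 0, 4)    # enclosed background regions = holes
    return C, H, C - H


# A closed loop with one separate dot: 2 components, 1 hole.
grid = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
descriptors = topological_descriptors(grid)
```

The hole test assumes the object does not touch the image border; an image cropped tightly around the letter would need a margin of background pixels first.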
Figure (2.6) Artificial Neuron Model

The neurons are usually arranged in layers to form a neural network. Each layer of neurons receives the outputs of the previous layer as inputs, calculates its own outputs, and delivers them to the next layer.

Figure (2.7) Layers of neurons

There are several activation functions used in neural networks. Some of them are shown in figure (2.8) below. The choice of the activation function is based upon the purpose of the neural network design. The chosen function must be able to classify given input pattern vectors into the desired classes (which are called targets).

Figure (2.8) Some of the activation functions used in ANN

Neural networks can be classified into architectures based on the type of the activation function used. One of the famous architectures is the perceptron, which is a neuron with a hard-limiting activation function.
algorithms developed to perform the learning process; each group of rules is applicable to certain structures of neural networks. The concept of learning is based upon the modification of the connecting weights so that the mean square error (the mean square of the difference between the desired output & the actual output of the network) is as small as possible (the goal is ideally zero) and the relationship between the input and the output is estimated with more accuracy. This process is similar to the curve fitting problems in numerical analysis, where certain points in an n-dimensional space are given and a curve that passes through them is to be estimated. Learning in which the targets are known is called supervised learning.
Figure (2.9) Block diagram for the supervised learning process

Training is the process of collecting as many (input pattern, output) pairs as possible from the application domain where the neural network is intended to be used, and presenting them to it in an appropriate form so that its weights are adjusted according to the learning rule embedded within it to map the inputs to the desired outputs. An important term in training is the epoch, which is defined as one pass through all the training data.
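Supervised training of a single neuron and the notion of an epoch can be sketched as follows. This is a minimal Python illustration using the classic perceptron learning rule; the text above does not name the specific learning rule used, so the rule, the function names `hardlim` and `train_perceptron`, and the four training pairs are all assumptions made for the example.

```python
def hardlim(n):
    """Hard-limiting activation: 1 if n >= 0, otherwise 0."""
    return 1 if n >= 0 else 0


def train_perceptron(samples, max_epochs=100):
    """Train one neuron with a scalar input on (input, target) pairs.
    Returns (weight, bias, number of epochs used)."""
    w, b = 0.0, 0.0
    for epoch in range(1, max_epochs + 1):      # one epoch = one pass over the data
        misclassified = 0
        for x, target in samples:
            error = target - hardlim(w * x + b)  # desired output minus actual output
            if error != 0:
                w += error * x                   # adjust the weight
                b += error                       # adjust the bias
                misclassified += 1
        if misclassified == 0:                   # every target reproduced: stop
            return w, b, epoch
    return w, b, max_epochs


# Made-up, linearly separable (input, target) pairs.
samples = [(0.2, 0), (0.3, 0), (1.5, 1), (1.8, 1)]
w, b, epochs = train_perceptron(samples)
predictions = [hardlim(w * x + b) for x, _ in samples]
```

Because the two groups of inputs do not overlap, the rule stops after a few epochs; with overlapping groups the loop would run until `max_epochs` without reaching zero error.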
In this chapter an analysis of the shapes of Arabic letters in terms of the topological descriptors described earlier is introduced. This analysis is followed by the conceptual design of the system.
E H K N Q T WWX [ ^
F I L O R U Y \ _
G J M P S V Z ] `
Table (3.2) Classification of the Arabic letters into the proposed classes
Class 202 211
Letters
T K _ F ` Z S J E Y R
^ M L P Q N \ I a ] V G [ U O WX H
A simple inspection of the 3 descriptors [C, H, E] identifies the class to which the letter belongs; what remains is to find out (or to recognize) which letter it is within the class. This is valid for all classes except the 2 classes [1 2 -1] & [3 1 2], since each one of them consists of only one letter as shown in the previous table. The ratio of the letter's width to its height will be taken as an additional descriptor to obtain a 4-dimensional pattern [C, H, E, R], where (R) is this ratio. It will be assumed that each letter maintains its own ratio regardless of the handwriting. Figure (3.1) illustrates the concept of the ratio.

where

$R = \frac{\text{width}}{\text{height}}$    (3.1)
The ratios were measured for all Arabic letters typed by computer using 3 types of fonts: Traditional, Courier, and Transparent. A single neuron can solve a classification problem only if the classes are linearly separable, which implies that each Arabic letter should have a range of ratios that does not intersect with those of other letters. So certain letters from each class were chosen because they satisfy this criterion (see appendix A). Measurements of the mean, variance, and confidence interval were taken for the ratios of 100 random samples of the selected letters. These letters were extracted from the “Arabic Database Project” developed by students from Sudan University for Science & Technology. Some results were excluded because they generated extreme values. Some letters were excluded from their classes because they affected the confidence intervals of other letters. As a result, some classes will have only one letter (as shown in tables (3.3) to (3.8)). The red circles indicate which letters were selected.
Table (3.3) Sample of letters in class 202
Letter  Mean    Variance  Min
J       0.8730  0.1112    0.200
S       1.0143  0.1264    0.08
`       0.6613  0.0432    0.0435
As the confidence intervals become narrower, the confidence level decreases. So the 2 letters ` and S were selected while the letter J was excluded.

Table (3.4) Sample of letters in class 303
Letter  Mean    Variance  Min     Max     No. of Samples  70% confidence interval  60% confidence interval
M       1.3923  0.5621    0.30    3.20    93              [0.8105 - 1.9741]        [0.9173 - 1.8673]
^       0.8048  0.2195    0.0313  2.0769  85              [0.5776 - 1.0320]        [0.6193 - 0.9903]
Since the intervals for the letters M and ^ still did not separate at a confidence level of 60%, it was decided not to proceed further, and only one letter, M, was selected. The same procedure was followed for class 404. Only the letter P was selected (see table (3.5)).
Table (3.5) Sample of letters in class 404
Letter  Mean    Variance  Min     Max     No. of Samples  70% confidence interval  60% confidence interval
P       0.9836  0.1156    0.2692  1.7188  86              [0.8640 - 1.1032]        [0.8859 - 1.0813]
L       0.8410  0.0712    0.1829  1.3810  88              [0.7673 - 0.9147]        [0.7808 - 0.9012]
Table (3.6) Sample of letters in class 101
Letter  Mean    Variance  Min     Max     No. of Samples  99.8% confidence interval
I       1.6512  0.0747    1.1250  2.2692  57              [1.4196 - 1.8828]
]       1.0078  0.0189    0.7391  1.2500  61              [0.9492 - 1.0664]
V       0.9390  0.0333    0.5352  1.2667  56              [0.8358 - 1.0422]
G       0.1679  0.0018    0.0847  0.2708  68              [0.1623 - 0.1735]

Narrower confidence intervals for the letters ] and V:
]       [0.9767 - 1.0389]  [0.9835 - 1.0321]
V       [0.8842 - 0.9938]  [0.8962 - 0.9818]
Both letters G and I have separated confidence intervals at a high confidence level (99.8%). The 2 letters ] and V had separated intervals at a confidence level of 70%.

Table (3.7) Sample of letters in class 211
Letter  Mean    Variance  Min     Max     No. of Samples  70% confidence interval  60% confidence interval
R       0.9615  0.1555    0.1042  1.5333  75              [0.8006 - 1.1224]        [0.8301 - 1.0929]
Y       0.7108  0.1007    0.0811  1.4091  89              [0.6066 - 0.8150]        [0.6257 - 0.7959]
Letters R and Y had separated intervals at a confidence level of 60%. So any ratio for those 2 letters outside those regions (intervals) will be excluded.

Table (3.8) Sample of letters in class 110
Letter  Mean    Variance  Min     Max     No. of Samples  95% confidence interval
O       1.6486  0.0652    1.1224  2.2258  56              [1.5208 - 1.7764]
U       0.8021  0.0153    0.6029  1.0488  51              [0.7721 - 0.8321]
[       0.9448  0.0297    0.6531  1.2500  41              [0.8866 - 1.0030]
In class 110, all letters were selected since separated confidence intervals were achieved for them at a high confidence level (see table (3.8)).
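The selection criterion used throughout tables (3.3) to (3.8), namely that the ratio intervals of two letters must not intersect, can be checked mechanically. The Python sketch below uses the intervals listed for the letters R and Y of class 211; it assumes that the first pair of intervals in that table is the 70% pair and that the second, narrower pair is the 60% pair. The function name `intervals_separated` is hypothetical.

```python
def intervals_separated(a, b):
    """True when the closed intervals a = (low, high) and b do not intersect."""
    return a[1] < b[0] or b[1] < a[0]


# Ratio intervals for the letters R and Y of class 211 (values from the table).
ci_70 = {"R": (0.8006, 1.1224), "Y": (0.6066, 0.8150)}
ci_60 = {"R": (0.8301, 1.0929), "Y": (0.6257, 0.7959)}

separated_at_70 = intervals_separated(ci_70["R"], ci_70["Y"])
separated_at_60 = intervals_separated(ci_60["R"], ci_60["Y"])
```

With these values the wider intervals still share the range 0.8006 to 0.8150, while the narrower ones leave a gap between 0.7959 and 0.8301, which matches the statement that R and Y separate only at the lower level.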
The OCR system is divided into 3 sub-modules: thresholding, description, and recognition, as shown in figure (3.3).
From the label matrix, find the pixel list, which is defined as the list of all pixels in the region [2].
From the pixel list, calculate the width and the height of the region. Calculate the ratio of width to height.
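The last two steps can be sketched as follows. This is a minimal Python illustration, not the algorithm listed in appendix C; it assumes the pixel list is a sequence of (x, y) coordinate pairs of one region and that width and height are counted in pixels, inclusive of both ends. The function name `width_height_ratio` is hypothetical.

```python
def width_height_ratio(pixel_list):
    """Ratio of the bounding-box width to its height for a list of (x, y) pixels."""
    xs = [x for x, _ in pixel_list]
    ys = [y for _, y in pixel_list]
    width = max(xs) - min(xs) + 1     # columns spanned by the region
    height = max(ys) - min(ys) + 1    # rows spanned by the region
    return width / height


# A small L-shaped region: 3 pixels wide and 2 pixels high.
pixel_list = [(2, 5), (3, 5), (4, 5), (4, 6)]
ratio = width_height_ratio(pixel_list)
```

A single-pixel region gives a ratio of 1, so no division by zero can occur for a non-empty pixel list.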
will be a scalar quantity that represents the letter. Both weight and bias values will be determined upon training.
Figure (3.6) Neuron models used in the ANN module: linear (left) and perceptron (right)
Network Name: class202_NN
Training data: Data collected from previous graphs.
Network Type: Perceptron
Targets: One of the numbers [0 1]
Meaning of targets: 0: `   1: S

Table (3.11) Training data for class 110
Network Name: class110_NN
Training data: Data collected from previous graphs.
Network Type: Linear
Targets: One of the numbers [1 2 4+]
Meaning of targets: 1: U   2: [   4+: O

Table (3.12) Training data for class 211
Network Name: class211_NN
Training data: Data collected from previous graphs.
Network Type: Perceptron
Targets: One of the numbers [0 1]
Meaning of targets: 0: Y   1: R
The minimum value of the mean square error was 0.10226, and with this value only 7.1% of the training samples were misclassified. Since the curve was stuck at this value, it was decided to stop here and take the values of the weight and bias.
Table (3.14) Training results for class202_NN
Network Name: class202_NN
Max. Number of epochs: 12
Mean Square Error (MSE): 0.0
Samples Size: 95
Samples misclassified: 0
Error probability: 0.0
Weight: 1.306
Bias: -1

The mean square error was expected to reach zero since the neuron used was a perceptron and the input data was linearly separable, i.e. the ratios have separated regions for each letter in class 202.

Table (3.15) Training results for class110_NN
Network Name: class110_NN
Max. Number of epochs: 400
Mean Square Error (MSE): 0.0684
Samples Size: 41
Samples misclassified: 0
Error probability: 0.0
Weight: 3.2759
Bias: -1.3793

In table (3.15) the value of the mean square error obtained was less than 0.1 and none of the training samples was misclassified. So the values of the weight and bias were taken.

Table (3.16) Training results for class211_NN
Network Name: class211_NN
Max. Number of epochs: 10
Mean Square Error (MSE): 0.0
Samples Size: 32
Samples misclassified: 0
Error probability: 0.0
Weight: 1.1583
Bias: -1

The solution for class 211 was achieved in only 10 epochs because the input data was linearly separable. Figures (3.7) to (3.10) show the performance value (mean square error) vs. number of epochs for the class 202, 101, 110, and 211 ANNs respectively.
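As a worked check of the training results, the reported weights and biases can be applied to the mean ratios of the selected letters. The Python sketch below is an illustration only: the dictionary `NETWORKS` and the function `neuron_output` are names introduced here, the network types are taken from the training-data tables, and rounding the linear output to the nearest integer is an assumption (the way the targets [1 2 4+] are decoded is not stated above).

```python
def hardlim(n):
    """Hard-limiting activation of the perceptron: 1 if n >= 0, otherwise 0."""
    return 1 if n >= 0 else 0


# Weights and biases as reported in the training results tables.
NETWORKS = {
    "class202_NN": {"type": "perceptron", "w": 1.306, "b": -1.0},
    "class110_NN": {"type": "linear", "w": 3.2759, "b": -1.3793},
    "class211_NN": {"type": "perceptron", "w": 1.1583, "b": -1.0},
}


def neuron_output(name, ratio):
    """Output of the single-neuron network `name` for a width-to-height ratio."""
    net = NETWORKS[name]
    n = net["w"] * ratio + net["b"]          # weighted input plus bias
    return hardlim(n) if net["type"] == "perceptron" else n


# Mean ratios of the selected letters, taken from the sample tables.
out_S = neuron_output("class202_NN", 1.0143)      # letter S, target 1
out_thal = neuron_output("class202_NN", 0.6613)   # letter `, target 0
out_R = neuron_output("class211_NN", 0.9615)      # letter R, target 1
out_Y = neuron_output("class211_NN", 0.7108)      # letter Y, target 0
out_U = neuron_output("class110_NN", 0.8021)      # letter U, target 1
out_waw = neuron_output("class110_NN", 0.9448)    # letter [, target 2
out_O = neuron_output("class110_NN", 1.6486)      # letter O, target 4+
```

For the mean ratios, the two perceptrons reproduce their 0/1 targets, and the linear neuron gives values close to 1.25, 1.72, and 4.02, which round to the targets 1, 2, and 4.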
4.1 Implementation
In order to implement & run this OCR system, the developed M-files (please refer to appendix C to see their names & code) need to be placed in MATLAB's work directory (version 7.0 or later). To run the system, “OCR” or “ocr” is typed in the command window and a dialog box will appear as shown in Figure (4.1). Through this dialog box the user can browse for the desired image file. Once “open” is clicked, the OCR system is executed and the result of the recognition is printed in MATLAB's command window.
Figure (4.1) Dialog box that appears when running the system

Figure (4.2) Snapshot of the system execution

Table (4.1) lists the message that will be displayed for each Arabic letter, provided that the system recognized it.

Table (4.1) The messages that must be printed in correspondence to each letter
letter  message      letter  message
G       Alif         O       Sad
M       Teh          R       Dad
P       Theh         U       Tah
S       Jeem         Y       Thah
V       Hah          H       Qaf
]       Dal          WX      Haa
`       Thal         [       Waw
I       Seen
The forms were collected. Scanners were used for data acquisition of images. Images of each letter were subjected to the system. Table (4.2) shows the results obtained from the testing. Figure (4.3) displays the results graphically.
Table (4.2) Test results
Letter | Misclassified in the correct class (%) | Misclassified in the correct ratio region (%)
1
14
M P S V ] ` I O R U Y H WX [
- - 7 28 62 21 2 6 10 39 33 - - 42
Figure (4.3) Test results

It is clear that the recognition percentage varies from one letter to another. The samples that the OCR system could not recognize can be classified into 2 categories:
i. Those which were not classified in the correct [C, H, E] class.
ii. Those which were not recognized by their ratios but were classified successfully (they had the right [C, H, E] class).

The first category was due to handwriting errors such as: imperfect holes, unnecessary additional holes, connected dots, disconnected components (writing without raising the pen to write dots or “hamza”); these errors are responsible for mistaking a letter's class for another. Figure (5.1) shows examples of handwritten letters which lead to this misclassification.

Figure (5.1) Handwritten letters which lead to misclassification

The second category of errors arose because it was assumed that “the letters maintain their ratios regardless of the handwriting”. The results showed this assumption to be wrong, but not completely; there were 2 letters which complied with this assumption, they are:
leaning letters with slope more than the usual; this leaning results in a ratio value outside the expected region for a certain letter, which leads to a wrong classification.
Figure (5.2) Examples of extremely leaning letters

Generally there is a small percentage of errors in which the following error message appears instead of a misclassification of the input letter:

This error is due to the variation of the ink level along the handwritten letter, which splits the existing components into several ones (this disconnection happens after thresholding), so that additional connected components emerge.

Figure (5.3) An example of ink variation along the letter's body (left) and its effect in introducing more connected components after thresholding (right)
The variable 'answer' in the code (please refer to the code in appendix C) is assigned a certain value according to the value of the variable 'c', which indicates the number of connected components. It is impossible to take into account all possible values of 'c', which are unlimited, by writing commands such as the following:

    if (c == 5) [first second third fourth fifth] = D.PixelList; end;
    if (c == 6) [first second third fourth fifth sixth] = D.PixelList; end;
    if (c == 7) [first second third fourth fifth sixth seventh] = D.PixelList; end;
    % .......etc.

That is because the field D.PixelList contains a number of matrices in the form [x y] equal to the number of connected components and with different lengths, so it is impossible to assign it all to a single variable (because this generates a compile-time error) unless each matrix in this field is assigned to its own variable as shown in the previous commands. Since the written code tests only whether the variable 'c' is 1, 2, 3, or 4, this error occurs only when c > 4. This error can be considered to belong to the first category of errors mentioned earlier, as it only occurs when the number of connected components is greater than 4 or, in other words, when the letter is assigned to a class other than the correct one.
U Y H WX [
] ` I O R
G M P S V
Although the system can achieve its objective this way, OCR systems (and any system that deals with human beings in general) are intended to be easy to apply and convenient to the user, so it is not satisfactory to force the user to comply with the system rather than forcing ourselves as system designers to comply with the user's needs. It can also be concluded that the ratio alone is not a sufficient classifier for Arabic letters, except for very few letters as discussed before.
On the positive side, it can be concluded that the objectives have been achieved partially; image type constraints and image resizing requirements were eliminated.
REFERENCES
[1] Peter Burrow, Arabic Handwriting Recognition, Master of Science, School of Informatics, University of Edinburgh, 2004.
[2] Rafael C. Gonzalez, Richard E. Woods, Steven L. Eddins, Digital Image Processing using MATLAB, 2004.
[3] Rafael C. Gonzalez, Richard E. Woods, Digital Image Processing, 2nd edition, 2002.
[4] Christopher Bishop, Neural Networks for Pattern Recognition, Oxford, 1995.
[5] Ben Kröse, Patrick van der Smagt, An Introduction to Neural Networks, 8th edition, November 1996.