
International Journal of Computer Trends and Technology, Volume 4, Issue 3, 2013

Performance of English Character Recognition with and without Noise


Priya Sharma#1, Randhir Singh*2
# Lecturer, Department of ECE, SAI Polytechnic, Badhani, Punjab, India
* Professor & Head, Department of ECE, SSCET, Badhani, Punjab, India

Abstract: Character recognition has been one of the most interesting and challenging research areas in recent years. Many researchers have developed scripts based on the different approaches used to design character recognition systems. This paper provides a survey and classification of various character recognition techniques and describes a technique for converting textual content from a paper document into machine-readable form.

Keywords: CR, Character Segmentation, Features, Recognition, Pre-processing

I. INTRODUCTION

Character Recognition (CR) is the process of translating images of handwritten, typewritten, or printed text into a format understood by machines for the purposes of editing, indexing/searching, and reducing storage size [1]. Character recognition is a field of research in pattern recognition, artificial intelligence and machine vision. A CR system enables you to take a book or a magazine article, feed it directly into an electronic computer file, and then edit the file using a word processor. All CR systems include an optical scanner for reading text and sophisticated software for analyzing images. Most CR systems use a combination of hardware (specialized circuit boards) and software to recognize characters, although some inexpensive systems do it entirely through software. Advanced CR systems for Roman script can read text in a large variety of fonts, but they still have difficulty with handwritten text.

Character recognition is mainly of two types: online and offline. In online character recognition, data is captured during the writing process with the help of a special pen on an electronic surface. In offline recognition, prewritten data, generally written on a sheet of paper, is scanned.

1) Offline Character Recognition: Generally, all printed or typewritten characters are classified in offline mode. Offline handwritten character recognition refers to the process of recognizing characters in a document that has been scanned from a surface such as a sheet of paper and stored digitally in gray scale format. Scanned documents are bulky to store, and many processing operations, such as searching for content, editing and maintenance, are either hard or impossible. Such documents require human beings to process them manually, for example, a postman's manual recognition and sorting of postal addresses and zip codes. Character recognition systems translate such scanned images of printed, typewritten or handwritten documents into machine-encoded text. This machine-encoded text can be easily edited, searched and processed in many other ways according to requirements. It also requires far less storage space than the scanned documents.

2) Online Character Recognition: The online mode of recognition is mostly used to recognize handwritten characters only. Here the handwriting is captured and stored in digital form by various means. Usually, a special pen is used in conjunction with an electronic surface. As the pen moves across the surface, the two-dimensional coordinates of successive points are represented as a function of time and stored in order. Recently, owing to the increased use of handheld devices, online handwriting recognition has attracted the attention of researchers worldwide. Online handwriting recognition aims to give users a natural interface by letting them write on a pad instead of typing on a keyboard. It has great potential to improve communication between user and computer. In online handwriting recognition, it is very natural for the user to detect and correct misrecognized characters on the spot by verifying the recognition results as they appear. The user is encouraged to modify his or her writing style so as to improve recognition accuracy. Also, a machine can be trained to a particular user's style: samples of the user's misrecognized characters are stored to aid subsequent recognition. Thus both writer adaptation and machine adaptation are possible.

II. HISTORY OF CHARACTER RECOGNITION

In this section we look at the history of CR [4, 5, 7], its development, recognition methods, computer technologies, and the differences between humans and machines [2, 3, 6, 8, 9]. It is always fascinating to be able to find ways of enabling a computer to mimic human functions, like the ability to read, to write, to see things, and so on. CR research and development can be traced back to the early 1950s, when scientists tried to capture the images of characters and texts,

ISSN: 2231-2803 http://www.internationaljournalssrg.org

Page 400



first by mechanical and optical means using rotating disks and photomultipliers, then a flying spot scanner with a cathode ray tube lens, followed by photocells and arrays of them. At first, the scanning operation was slow and only one line of characters could be digitized at a time by moving the scanner or the paper medium. Subsequently, drum and flatbed scanners were invented, which extended scanning to the full page. Then, advances in digital integrated circuits brought photo-arrays with higher density, faster document transports and higher speed in scanning and digital conversion. These important improvements greatly accelerated character recognition, reduced its cost, and opened up the possibility of processing a great variety of forms and documents. Throughout the 1960s and 1970s, new CR applications sprang up in retail businesses, banks, hospitals, post offices; insurance, railroad, and aircraft companies; newspaper publishers, and many other industries [4, 5].

In parallel with these advances in hardware development, intensive research on character recognition was taking place in the research laboratories of both the academic and industrial sectors [7, 8]. Because neither recognition techniques nor computers were very powerful in the early days (1960s), CR machines tended to make many errors when the print quality was poor, whether caused by wide variations in type fonts and roughness of the paper surface or by the cotton ribbons of the typewriters [6]. To make CR work efficiently and economically, there was a big push from CR manufacturers and suppliers toward the standardization of print fonts, paper, and ink qualities for CR applications. New fonts such as OCR-A and OCR-B were designed in the 1970s by the American National Standards Institute (ANSI) and the European Computer Manufacturers Association (ECMA), respectively.
These special fonts were quickly adopted by the International Organization for Standardization (ISO) to facilitate the recognition process [4, 5, 7, 8]. As a result, very high recognition rates became achievable at high speed and at reasonable cost. Such accomplishments also brought better printing quality of data and paper for practical applications. In fact, they completely revolutionized the data input industry [7] and eliminated the jobs of thousands of keypunch operators who were doing the mundane work of keying data into the computer.

III. STEPS OF CHARACTER RECOGNITION

A character recognition system involves many steps to recognize characters and produce machine-encoded text. The phases involved in character recognition are: pre-processing, segmentation, feature extraction and classification. The block diagram of the proposed recognition system is shown in Figure 1.

SCANNED DOCUMENT -> PRE-PROCESSING -> SEGMENTATION -> FEATURE EXTRACTION -> CLASSIFICATION

Figure 1: Block diagram of the character recognition system

(A) PRE-PROCESSING
The input image is first converted to a gray scale image. The gray scale image is then converted to a binary image. This process is called digitization of the image. In practice no scanner is perfect, so the scanned image may contain some noise. This noise may be due to unnecessary details present in the image.
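The digitization and denoising steps described above can be illustrated with a minimal NumPy sketch. The paper does not state which threshold or noise filter was used, so the mean-intensity threshold and the 3 x 3 median filter below are assumptions, and the function names `to_grayscale`, `binarize` and `median_denoise` are hypothetical.

```python
import numpy as np

def to_grayscale(rgb):
    """Convert an H x W x 3 RGB array to gray scale with common luminance weights."""
    weights = np.array([0.299, 0.587, 0.114])
    return rgb[..., :3].astype(float) @ weights

def binarize(gray, threshold=None):
    """Return a boolean image in which True marks dark (ink) pixels.

    Assumption: when no threshold is given, the mean intensity is used.
    """
    if threshold is None:
        threshold = gray.mean()
    return gray < threshold

def median_denoise(binary):
    """Apply a 3 x 3 median filter to a binary image.

    For binary pixels the median of nine values is True exactly when
    at least five of them are True.
    """
    h, w = binary.shape
    padded = np.pad(binary.astype(np.uint8), 1, mode="edge")
    total = np.zeros((h, w), dtype=np.uint8)
    for dy in range(3):
        for dx in range(3):
            total += padded[dy:dy + h, dx:dx + w]
    return total >= 5

# Usage: a white page with a dark 3 x 3 blob and one isolated noise pixel.
page = np.full((9, 9, 3), 255, dtype=np.uint8)
page[3:6, 3:6] = 0      # character stroke
page[1, 7] = 0          # scanner noise
binary = binarize(to_grayscale(page), threshold=128)
clean = median_denoise(binary)
```

A median filter removes isolated specks but also rounds off thin strokes, so a real system would choose the filter size according to the stroke width.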

Figure 2: Digitized image

The denoised image thus obtained is saved for further processing. Next, all the pre-designed templates of the alphabet are loaded into the system.

(B) SEGMENTATION
In segmentation, the position of the object, i.e., the character, in the image is located and the image is cropped to the template size. Segmentation can be external and



internal. External segmentation is the isolation of various writing units, such as paragraphs, sentences or words. In internal segmentation, an image of a sequence of characters is decomposed into sub-images of individual characters.

Figure 3: Segmented image

(C) FEATURE EXTRACTION
We have used the following features for our experiment. The first three types of features, namely zone density, projection histograms and 8-directional zone density, can be categorized as statistical features, while the fourth to tenth types can be categorized as geometric features. On the basis of these feature types we have formed different combinations. The characters are classified using each of these feature vectors in a neural network classifier.

(D) CLASSIFICATION
Classification is performed using a single MLPNN trained with the gradient descent with momentum and adaptive learning rate backpropagation algorithm. The sigmoid activation function is used in the hidden and output layers. The computed features are used for classification. In the present study, we have divided the features into three equal parts for training, validation and testing. We have used the extracted features separately for training, validation and testing, as well as the combined features obtained by appending them.

IV. RESULTS

The experimental results are mainly based on two programs: feature extraction and image recognition. For image recognition, the recognition rate is obtained by using the character recognition engine to recognize images of the 26 capital English letters at different resolutions. The evaluation process includes two steps. First, we collected image features from the images of the 26 capital English letters at a resolution of 256 × 256 and saved them to a model library, which is stored in the MATLAB data file format. Figures 4.1 to 4.30 show the results of our character recognition system. We evaluate the performance of the system with and without noise.
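The segmentation step of Section III(B) and two of the statistical features of Section III(C) can be sketched as follows. This is only an illustration under stated assumptions: the paper gives neither the template size nor the zone grid, so the 32 x 32 target size, the 4 x 4 grid, the nearest-neighbour resizing and all function names here are hypothetical choices.

```python
import numpy as np

def crop_to_character(binary):
    """Crop a boolean image to the bounding box of its foreground pixels."""
    rows = np.flatnonzero(binary.any(axis=1))
    cols = np.flatnonzero(binary.any(axis=0))
    if rows.size == 0:
        raise ValueError("image contains no foreground pixels")
    return binary[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

def resize_nearest(binary, size=(32, 32)):
    """Resize to a fixed template size by nearest-neighbour sampling."""
    h, w = binary.shape
    row_idx = np.arange(size[0]) * h // size[0]
    col_idx = np.arange(size[1]) * w // size[1]
    return binary[np.ix_(row_idx, col_idx)]

def zone_density(binary, zones=(4, 4)):
    """Fraction of foreground pixels in each zone of a regular grid."""
    zh, zw = binary.shape[0] // zones[0], binary.shape[1] // zones[1]
    trimmed = binary[:zh * zones[0], :zw * zones[1]].astype(float)
    blocks = trimmed.reshape(zones[0], zh, zones[1], zw)
    return blocks.mean(axis=(1, 3)).ravel()

def projection_histograms(binary):
    """Foreground pixel counts per row, followed by counts per column."""
    return np.concatenate([binary.sum(axis=1), binary.sum(axis=0)])

# Usage: a character occupying part of a larger scanned region.
scan = np.zeros((10, 12), dtype=bool)
scan[2:6, 3:9] = True
glyph = resize_nearest(crop_to_character(scan), size=(8, 8))
features = np.concatenate([zone_density(glyph), projection_histograms(glyph)])
```

Appending the two vectors mirrors the combined feature vectors mentioned in Section III(D); the individual vectors can equally be used on their own.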

Figure 4: Step of CR applied to input image
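The classifier described in Section III(D) can be approximated by the small NumPy sketch below: one hidden layer, sigmoid activations in the hidden and output layers, and gradient descent with momentum. It is a simplified illustration, not the authors' implementation. The adaptive learning rate part of the algorithm is omitted, and the layer sizes, learning rate, momentum value and the names `TinyMLP` and `split_three_ways` are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def split_three_ways(features, labels, seed=0):
    """Shuffle the samples and split them into training, validation and test parts."""
    order = np.random.default_rng(seed).permutation(len(features))
    return [(features[part], labels[part]) for part in np.array_split(order, 3)]

class TinyMLP:
    """One-hidden-layer perceptron trained by gradient descent with momentum."""

    def __init__(self, n_in, n_hidden, n_out, lr=0.5, momentum=0.9, seed=0):
        rng = np.random.default_rng(seed)
        self.params = [rng.normal(0.0, 0.5, (n_in, n_hidden)), np.zeros(n_hidden),
                       rng.normal(0.0, 0.5, (n_hidden, n_out)), np.zeros(n_out)]
        self.velocity = [np.zeros_like(p) for p in self.params]
        self.lr, self.momentum = lr, momentum

    def forward(self, x):
        w1, b1, w2, b2 = self.params
        hidden = sigmoid(x @ w1 + b1)
        return hidden, sigmoid(hidden @ w2 + b2)

    def train_step(self, x, targets):
        """Run one full-batch update and return the error measured before it."""
        w2 = self.params[2]
        hidden, out = self.forward(x)
        # Backpropagate the squared error (up to a constant factor)
        # through both sigmoid layers.
        delta_out = (out - targets) * out * (1.0 - out) / len(x)
        delta_hid = (delta_out @ w2.T) * hidden * (1.0 - hidden)
        grads = [x.T @ delta_hid, delta_hid.sum(axis=0),
                 hidden.T @ delta_out, delta_out.sum(axis=0)]
        for p, v, g in zip(self.params, self.velocity, grads):
            v *= self.momentum      # keep a fraction of the previous step
            v -= self.lr * g        # add the new gradient step
            p += v                  # update the parameter in place
        return float(np.mean((out - targets) ** 2))

# Usage on a toy two-class problem with one-hot targets.
x = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
t = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
net = TinyMLP(n_in=2, n_hidden=4, n_out=2)
losses = [net.train_step(x, t) for _ in range(50)]
predicted_class = net.forward(x)[1].argmax(axis=1)
```

In a real experiment the validation part would be used to decide when to stop training and the test part would be held back for the reported recognition rate.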

Figure 5: Results in notepad

Figure 6: Step of CR applied to input noisy image
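To make the with-noise and without-noise comparison concrete, the following sketch corrupts a binary character image and recognizes it by template matching, the method named in the conclusions. The paper does not specify the noise model or the similarity measure, so the pixel-flipping (salt-and-pepper) noise, the normalized correlation score, the tiny 5 x 5 templates and all function names are assumptions made for illustration.

```python
import numpy as np

def add_salt_and_pepper(binary, fraction, seed=0):
    """Flip the given fraction of pixels, chosen at random without repetition."""
    rng = np.random.default_rng(seed)
    noisy = binary.copy()
    n_flip = int(round(fraction * binary.size))
    idx = rng.choice(binary.size, size=n_flip, replace=False)
    noisy.flat[idx] = ~noisy.flat[idx]
    return noisy

def match_score(image, template):
    """Normalized correlation between two equally sized images, in [-1, 1]."""
    a = image.astype(float) - image.mean()
    b = template.astype(float) - template.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def recognize(image, templates):
    """Return the label of the template that correlates best with the image."""
    return max(templates, key=lambda label: match_score(image, templates[label]))

def recognition_rate(templates, fraction):
    """Fraction of templates still recognized after noise is added."""
    hits = sum(
        recognize(add_salt_and_pepper(img, fraction, seed=i), templates) == label
        for i, (label, img) in enumerate(templates.items())
    )
    return hits / len(templates)

# Usage: three toy 5 x 5 templates standing in for the letter images.
letter_i = np.zeros((5, 5), dtype=bool)
letter_i[:, 2] = True
letter_l = np.zeros((5, 5), dtype=bool)
letter_l[:, 0] = True
letter_l[4, :] = True
letter_t = np.zeros((5, 5), dtype=bool)
letter_t[0, :] = True
letter_t[:, 2] = True
templates = {"I": letter_i, "L": letter_l, "T": letter_t}

noisy_i = add_salt_and_pepper(letter_i, fraction=0.08)
label = recognize(noisy_i, templates)
clean_rate = recognition_rate(templates, fraction=0.0)
```

Sweeping `fraction` over a range of values and recording `recognition_rate` would give a simple curve of accuracy against noise level.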


Figure 5: Results in notepad

V. CONCLUSIONS

A number of techniques used for character recognition have been discussed. Current research focuses mainly on extending character recognition to the popular native languages of India, such as Punjabi, Telugu and Tamil. The template matching method is easy to implement owing to its algorithmic simplicity, and it offers a high degree of flexibility when the recognition target classes change. This paper reports the performance of a character recognition system based on template matching.

REFERENCES
[1] Krunal M. Patel and Amrut N. Patel. Approaches for Multi-Font/Size Character Recognition: A Review. Quest International Multidisciplinary Research Journal, Volume I, Issue II, December 2012.
[2] H. Bunke and P. S. P. Wang. Handbook of Character Recognition and Document Image Analysis. World Scientific Publishing, Singapore, 1997.
[3] S. Mori, H. Nishida, and H. Yamada. Optical Character Recognition. Wiley Interscience, New Jersey, 1999.
[4] Optical Character Recognition and the Years Ahead. The Business Press, Elmhurst, IL, 1969.
[5] (No author.) Auerbach on Optical Character Recognition. Auerbach Publishers, Inc., Princeton, 1971.
[6] S. V. Rice, G. Nagy, and T. A. Nartker. Optical Character Recognition: An Illustrated Guide to the Frontier. Kluwer Academic Publishers, Boston, 1999.
[7] H. F. Schantz. The History of OCR. Recognition Technologies Users Association, Boston, 1982.
[8] C. Y. Suen. Character recognition by computer and applications. In T. Y. Young and K. S. Fu, editors, Handbook of Pattern Recognition and Image Processing. Academic Press, Inc., Orlando, FL, 1986, pp. 569-586.
