
Optical Character Recognition

The Final Presentation

4/11/12

Curtain Raiser
OCR: a software tool of great utility.

It lets the user create a meaningful text document from an image without actually typing it.

It provides functions for better image processing.

It allows the user to change the recognized result as required.


Idea Behind Developing the Tool


In our country, where technology has taken a long time to reach the homes of common people, record maintenance has always been done on paper.

With the advancement in technology, the need has now arisen for a software tool that can help with the gruesome task of converting written documents into electronic or digital format.

The applications of this technique range from document digitizing and preservation to handwritten text recognition.

Practical applications
One can always write a document somewhere without access to a computer and rest assured that it can be digitized later without any further time investment.

All the paperwork in offices could easily be converted to digital format without a manual operator actually typing the whole text.

Existing software system?


The existing software systems work on older methods of artificial intelligence and are far less efficient.

Also, many of the tools don't allow the user to supply more than a single character at a time, which keeps the user from recognizing whole text.

Some hardware-dependent systems are dedicated to the task of Optical Character Recognition. They require users to buy the complete hardware.

We present a better tool!!!


The tool we bring to the user is far more capable software that gives the user the liberty of doing the mammoth task of digitizing a paper-based document in a friendly and easy manner.

The tool also has a dedicated module for handwriting recognition, which makes it even more desirable when users want a digitized copy of their own handwritten text.


Construction Process

1. Information Gathering
2. Technology used


Information gathering
For this project, the information we collected came mainly from the internet.

We searched for existing software on the internet.

We discussed the project scope with regular users of word processors and with people involved in programming, especially in the field of artificial neural networks.

We discussed the problems we would encounter while developing the project.

Technologies used
We developed the tool mainly using:
Core Java concepts and Java Swing.
NetBeans as the IDE for Java programming.
MySQL for back-end database connectivity.


System Analysis
Information flow representation

1. Use case Diagram
2. Activity Diagram
3. Class Diagram
4. Sequence Diagram


Use Case diagram


Activity diagram


Activity Diagram for handwritten users


Class diagram


Sequence diagram


Sequence diagram


Architecture Design

1. Architecture Behavioral Diagram
2. Modular Approach
3. Algorithm design for operation


Architectural behavior of the software


Description of the diagram


First, the image is loaded in the initial module, from where it reaches the pixel extractor module in its original form, i.e. in image format.

The pixel extractor module then brings out the equivalent array form of the image pixels. The image is converted into a grey-scaled version of the input image, and a corresponding array of the grey-scaled image is produced.

This array is passed to the segmentation module, where the system evaluates each pixel of the input image and separates the pixels forming the text from those forming the background.
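The pixel extraction and text/background separation described on this slide might be sketched as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, the grey level is a plain average of the red, green and blue channels, and text is separated from background with a single fixed threshold. The slides do not say which formula or separation rule the tool actually uses.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper; names and formulas are illustrative, not taken from the project.
final class PixelExtractorSketch {

    // Convert an image into a 2-D array of grey levels (0 = black, 255 = white).
    static int[][] toGreyArray(BufferedImage image) {
        int height = image.getHeight();
        int width = image.getWidth();
        int[][] grey = new int[height][width];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int rgb = image.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Assumption: simple channel average as the grey level.
                grey[y][x] = (r + g + b) / 3;
            }
        }
        return grey;
    }

    // Mark every pixel darker than the threshold as text (true);
    // everything else is treated as background (false).
    static boolean[][] separateText(int[][] grey, int threshold) {
        boolean[][] isText = new boolean[grey.length][];
        for (int y = 0; y < grey.length; y++) {
            isText[y] = new boolean[grey[y].length];
            for (int x = 0; x < grey[y].length; x++) {
                isText[y][x] = grey[y][x] < threshold;
            }
        }
        return isText;
    }
}
```

A call such as `PixelExtractorSketch.separateText(PixelExtractorSketch.toGreyArray(image), 128)` would then yield a mask of text pixels; the threshold value 128 is an arbitrary choice for dark text on a light page.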

Continued
Each of the separate arrays so formed is fed to the neural network, where each pixel value forms an input node and the output nodes are those obtained from the database.

The SOM then identifies the character and suggests the winning neuron from the output nodes.

This output neuron, called the winning neuron, signifies a character and is sent to the text editor.
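Selecting the winning neuron can be illustrated with the usual self-organizing map rule: the output node whose weight vector lies closest to the input vector wins. The sketch below assumes Euclidean distance, one weight vector per output node (standing in for the nodes obtained from the database) and a parallel array of character labels. All names are hypothetical, and the slides do not confirm which distance measure the tool uses.

```java
// Hypothetical sketch of winner selection in a self-organizing map.
final class SomSketch {

    // Squared Euclidean distance between two vectors of equal length.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Index of the output node whose weights are closest to the input.
    static int winningNeuron(double[][] weights, double[] input) {
        int winner = 0;
        double best = Double.POSITIVE_INFINITY;
        for (int n = 0; n < weights.length; n++) {
            double dist = squaredDistance(weights[n], input);
            if (dist < best) {
                best = dist;
                winner = n;
            }
        }
        return winner;
    }

    // Assumption: labels[n] is the character that output node n stands for.
    static char recognise(double[][] weights, char[] labels, double[] input) {
        return labels[winningNeuron(weights, input)];
    }
}
```

Comparing squared distances avoids a square root per node without changing which node wins.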


Modular approach
Modules used
a) Image loading/processing module
b) Pixel extractor
c) Segmentation module
d) Scanning
e) Self Organizing Map
f) Conversion to text
g) Spell checker
h) Saving

Testing

1. Purpose of Testing
2. Test Cases


Purpose of Testing
Software testing is an unavoidable step in software quality assurance and quality control.

Testing is the process of executing a program with the intent of finding errors and eliminating them, to produce error-free software that meets the specification.

Its objective is to identify faults as quickly as possible after they occur, and to identify their cause so that remedial steps can be taken.

It is important for making the project more robust.



Some test cases


Test case no. 1.

The expected outcome is, of course, the same characters written in text format, i.e.

The quick brown fox jumps over the lazy dog.

What we achieved was

The quick bl-n lx jumps oVer the la_ dog
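One way to put a number on the gap between the expected and the achieved text is the Levenshtein edit distance: the minimum count of single-character insertions, deletions and substitutions needed to turn one string into the other. The sketch below is a generic implementation offered only as an illustration. The class name is hypothetical, and the slides do not say that the project measured accuracy this way.

```java
// Hypothetical helper for comparing recognized text against the expected text.
final class OcrAccuracySketch {

    // Levenshtein distance computed with two rolling rows of the DP table.
    static int editDistance(String expected, String actual) {
        int[] prev = new int[actual.length() + 1];
        int[] curr = new int[actual.length() + 1];
        for (int j = 0; j <= actual.length(); j++) {
            prev[j] = j; // distance from the empty prefix of 'expected'
        }
        for (int i = 1; i <= expected.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= actual.length(); j++) {
                int cost = expected.charAt(i - 1) == actual.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[actual.length()];
    }
}
```

Dividing the distance by the length of the expected sentence gives a rough character error rate for a test image.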



Test cases continued


Test case no. 2 is for the login for handwriting recognition. As input we gave a username and the corresponding password.

Expected outcome: the user gets entry if he or she holds a valid account, and must be denied entry if they don't have an account.

Output: an authenticated user is taken to the picture upload module. Else:

Test cases continued


If the person is not an authenticated


Test cases continued


Test case no. 3 is for the spell check module.

The input given in the text pane was a word with a wrong spelling, like brwn.

Expected outcome: the word is reported as wrong, and all words that contain the letters b, r, w, n are suggested.
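The expected behaviour, suggesting every dictionary word that contains all the letters of the misspelled input, could be sketched like this. The class name, the method names and the in-memory word list are assumptions made for illustration. The slides do not describe how the spell checker stores or looks up its dictionary.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of letter-based spelling suggestions.
final class SpellSuggestSketch {

    // True if every letter of 'typed' occurs somewhere in 'candidate'.
    static boolean containsAllLetters(String candidate, String typed) {
        String lower = candidate.toLowerCase();
        for (char c : typed.toLowerCase().toCharArray()) {
            if (lower.indexOf(c) < 0) {
                return false;
            }
        }
        return true;
    }

    // Collect the dictionary words that contain all letters of the typed word,
    // keeping the dictionary's order.
    static List<String> suggest(String typed, List<String> dictionary) {
        List<String> suggestions = new ArrayList<>();
        for (String word : dictionary) {
            if (containsAllLetters(word, typed)) {
                suggestions.add(word);
            }
        }
        return suggestions;
    }
}
```

For the input brwn and a dictionary holding brown, blue, brawn and bow, only brown and brawn contain all four letters and would be suggested.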


Test cases continued


The output was found as:


Test cases continued


Test case no. 4 is for opening the image.

Validation: unless an image has been taken as input, no button should work.
Expected: a message should be displayed asking the user to load the image first.
Output: an alert appears asking the user to open an image first.


Limitations of the software


Limitations
The image should be of identifiable quality.

The image should be in a valid image file format, i.e. only formats like .jpg, .bmp and .png are usable.

A handwritten document should be readable.

For handwriting scanning, the system first needs to be trained with images of the user's handwritten documents. Only then will the system be able to recognize the input image.

The image should not contain text in cursive handwriting.

The input image should be correctly oriented, not upside down.
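The file-format restriction could be enforced before loading with a check along these lines. This is a minimal sketch that only inspects the file name, and the class and method names are hypothetical. A real check might also inspect the file contents, which the slides do not discuss.

```java
import java.util.Locale;

// Hypothetical guard for the supported image formats named on this slide.
final class ImageFormatSketch {

    // Accept only .jpg, .bmp and .png file names, ignoring letter case.
    static boolean hasSupportedExtension(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        return lower.endsWith(".jpg")
                || lower.endsWith(".bmp")
                || lower.endsWith(".png");
    }
}
```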

Future Scope

Things that could be added at some later point in time to enhance the functionality of the project


In future, the application can be enhanced.

The application will become more efficient, scanning characters in less time.

The text editor will become smarter, so that it detects wrongly guessed characters and corrects them automatically.

We hope to include a technology that would be able to recognize cursive handwriting as well.

