ABSTRACT
Inspired by the well-known iPhone app Word Lens, we intend to develop an
Android-based text translation application that can recognize text captured
by a mobile phone camera, translate it, and display the translation back on
the phone's screen in real time. Text present in images and video contains
useful information for automatic annotation, indexing, and structuring of
images. We describe a system for smartphones which detects and extracts text
and translates it into a user-friendly language. Our optical character
recognition (OCR) is based on the Tesseract OCR engine, and translation is
done through Google Translate.
TABLE OF CONTENTS

TITLE                                  PAGE NO.
ABSTRACT                               ii
ACKNOWLEDGMENT                         iii
DECLARATION                            iv
1. INTRODUCTION                        1
   1.1 General
   1.2 …
       1.2.1 General
       1.2.2.1 General
   1.3 …                               13
   1.4 …                               15
2. REQUIREMENTS
   2.1 …
   2.2 …                               19
ACKNOWLEDGMENT
We take this opportunity to express our deepest gratitude to those who have
generously helped us by providing valuable knowledge and expertise during the
course of the project. We express our sincere gratitude to Mr. Amit Kumar
Srivastava for his thorough guidance, and to Dr. Shishir Kumar (H.O.D., C.S.E.)
for his efforts in providing us with this project. Finally, we would like to
thank each and every person who has contributed in any way to the training.
DECLARATION
Date: __________
SIDDHANT SHARMA
UTKARSH KUMAR AGARWAL
VIKASH CHANDRA
Certified that the above statement made by the students is correct to the best
of our knowledge and belief.
CHAPTER 1 : INTRODUCTION
The motivation for a real-time text translation mobile application is to help tourists
navigate a foreign-language environment. The application we developed enables users to get
text translated with the ease of a button click. The camera captures the text and returns the
translated result in real time.
The system we developed includes automatic text detection, OCR (optical character
recognition), text correction, and text translation. Although the current version of our
application is limited to only a few selected languages, it can easily be extended to a much
wider range of languages.
1.1. Text in images
A variety of approaches to text information extraction (TIE) from images and video have
been proposed for specific applications, including page segmentation, license plate location,
and content-based image/video indexing. In spite of such extensive studies, it is still not easy
to design a general-purpose TIE system, because there are so many possible sources of
variation: text may appear on a shaded or textured background, or in images with variations
in font size, style, colour, orientation, and alignment.
1.2. What is text information extraction (TIE)?
A TIE system receives an input in the form of a still image or a sequence of images (frames
of videos). The images can be in gray scale or colour, compressed or un-compressed, and the
text in the images may or may not move. The TIE problem can be divided into the following
sub-problems:
i. detection,
ii. localization,
iii. tracking,
iv. extraction and enhancement, and
v. recognition (OCR).
intensity values and returning them to the next stage for further processing. This stage uses
region-based methods for text localization. Region-based methods use the properties of the
colour or grayscale values in a text region, or their differences from the corresponding
properties of the background.
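As a minimal sketch of the region-based idea (the method and threshold value here are our own illustrative assumptions, not part of the system itself), a window can be flagged as a candidate text region when the spread between its darkest and brightest pixels is large, since character strokes contrast strongly with the background:

```java
// Region-based localization sketch: text regions tend to show strong
// contrast between character strokes and the background, so a simple
// intensity-range test over a small window can flag candidates.
class RegionLocalizer {

    // True if the gray-scale window's intensity range (max - min)
    // reaches the given contrast threshold.
    static boolean isCandidateTextRegion(int[][] window, int contrastThreshold) {
        int min = 255, max = 0;
        for (int[] row : window) {
            for (int v : row) {
                if (v < min) min = v;
                if (v > max) max = v;
            }
        }
        return (max - min) >= contrastThreshold;
    }

    public static void main(String[] args) {
        int[][] textLike = { {20, 230}, {235, 15} };   // dark strokes on light background
        int[][] flat     = { {128, 130}, {129, 127} }; // uniform background
        System.out.println(isCandidateTextRegion(textLike, 100)); // prints true
        System.out.println(isCandidateTextRegion(flat, 100));     // prints false
    }
}
```

A real system would slide such a window over the whole frame and merge the flagged windows into bounding boxes.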
2.4. Segmentation
After the text is localized, the text segmentation step deals with the separation of the text
pixels from the background pixels. The output of this step is a binary image in which black
text characters appear on a white background. This stage includes extraction of the actual
text regions by grouping pixels with similar properties into contours or segments and
discarding the redundant portions of the frame.
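The core of this step can be sketched as a simple thresholding pass that maps dark pixels to black text (0) and everything else to a white background (255). The threshold value below is an illustrative assumption:

```java
// Segmentation sketch: separate text pixels from background pixels by
// thresholding a gray-scale image into a binary one, with black text
// (0) on a white background (255).
class TextSegmenter {

    static int[][] binarize(int[][] gray, int threshold) {
        int[][] out = new int[gray.length][gray[0].length];
        for (int y = 0; y < gray.length; y++) {
            for (int x = 0; x < gray[0].length; x++) {
                // Pixels darker than the threshold are treated as text.
                out[y][x] = (gray[y][x] < threshold) ? 0 : 255;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] gray = { {200, 40, 210}, {220, 35, 205} }; // a dark stroke on a light page
        int[][] binary = binarize(gray, 128);
        // binary is { {255, 0, 255}, {255, 0, 255} }
    }
}
```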
2.5. Recognition
This stage is the final stage of the project. It includes the actual recognition of the extracted
characters, combining the features extracted in previous stages to produce the actual text
with the help of a supervised neural network. The output of the segmentation stage is taken
as input, the characters contained in the image are compared against the pre-defined neural
network training set, and the character whose training value is closest to the character
appearing in the image is displayed as the recognised character.
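The "closest training value" idea can be illustrated with a much simpler stand-in for the neural network: nearest-template matching, where an extracted binary glyph is compared pixel-by-pixel against stored templates and the label of the closest one is returned. The glyphs and labels below are made up for illustration:

```java
// Recognition sketch: compare an extracted binary glyph against a small
// set of training templates and report the label of the closest one.
// (The report describes a neural network; nearest-template matching is
// shown here only as a simpler stand-in for the same idea.)
class TemplateRecognizer {

    // Number of differing pixels between two equally sized binary glyphs.
    static int distance(int[][] a, int[][] b) {
        int d = 0;
        for (int y = 0; y < a.length; y++)
            for (int x = 0; x < a[0].length; x++)
                if (a[y][x] != b[y][x]) d++;
        return d;
    }

    // Label of the template with the smallest pixel distance to the glyph.
    static char recognize(int[][] glyph, int[][][] templates, char[] labels) {
        int best = 0;
        for (int i = 1; i < templates.length; i++)
            if (distance(glyph, templates[i]) < distance(glyph, templates[best])) best = i;
        return labels[best];
    }

    public static void main(String[] args) {
        int[][] tplI  = { {0,1,0}, {0,1,0}, {0,1,0} };  // a crude 'I'
        int[][] tplL  = { {0,1,0}, {0,1,0}, {0,1,1} };  // a crude 'L'
        int[][] glyph = { {0,1,0}, {0,1,0}, {0,1,0} };  // input, closest to 'I'
        char c = recognize(glyph, new int[][][]{tplI, tplL}, new char[]{'I', 'L'});
        System.out.println(c); // prints I
    }
}
```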
2.5.1 OPTICAL CHARACTER RECOGNITION via Tesseract
Tesseract is an open-source OCR engine originally developed at HP Labs and now
maintained by Google. It is one of the most accurate open-source OCR engines,
able to read a wide variety of image formats and convert them to text in over
60 languages. The Tesseract library would be used in our project. The Tesseract
algorithm assumes its input is a binary image; it performs its own
preprocessing first, followed by a recognition stage.
Fig 3. DPI
3.2 Binarisation
Binarisation is the conversion of an image to black and white. Tesseract does this
internally, but it can make mistakes, particularly if the page background is of uneven
darkness. Adaptive thresholding is essential.
Fig 4. Binarisation
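The advantage of adaptive thresholding over a single global threshold can be sketched as follows: each pixel is compared against the mean of its local neighbourhood minus a small constant, so a page with uneven lighting still binarises cleanly. The window radius and the constant C below are illustrative choices, and a 1-D row is used for brevity:

```java
// Adaptive thresholding sketch: each pixel is compared against the mean
// of its local neighbourhood (a 1-D window here for brevity) minus a
// small constant C, so an unevenly lit page still binarises cleanly.
class AdaptiveThreshold {

    static int[] binarizeRow(int[] gray, int radius, int c) {
        int[] out = new int[gray.length];
        for (int x = 0; x < gray.length; x++) {
            int sum = 0, count = 0;
            // Mean over the window, clipped at the row borders.
            for (int k = Math.max(0, x - radius); k <= Math.min(gray.length - 1, x + radius); k++) {
                sum += gray[k];
                count++;
            }
            int localMean = sum / count;
            out[x] = (gray[x] < localMean - c) ? 0 : 255;
        }
        return out;
    }

    public static void main(String[] args) {
        // A row whose background brightness drops from left to right,
        // with one dark stroke pixel (100 and 40) in each half.
        int[] row = {200, 100, 200, 90, 40, 90};
        int[] binary = binarizeRow(row, 1, 15);
        // Both stroke pixels come out black, even though a single global
        // threshold would miss one of them.
    }
}
```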
3.3 Noise
Noise is random variation of brightness or colour in an image that can make the text
in the image more difficult to read. Certain types of noise cannot be removed by
Tesseract in the binarisation step, which can cause accuracy rates to drop.
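One standard way to remove such noise before binarisation is a median filter, shown here as a minimal sketch (the 3x3 window and border handling are our own simplifying choices): isolated salt-and-pepper pixels are replaced by the median of their neighbourhood, which ordinary thresholding would otherwise keep.

```java
import java.util.Arrays;

// Noise-reduction sketch: a 3x3 median filter removes isolated
// salt-and-pepper pixels. Border pixels are copied unchanged for
// simplicity.
class MedianFilter {

    static int[][] filter(int[][] img) {
        int h = img.length, w = img[0].length;
        int[][] out = new int[h][w];
        for (int y = 0; y < h; y++) out[y] = img[y].clone();
        for (int y = 1; y < h - 1; y++) {
            for (int x = 1; x < w - 1; x++) {
                int[] window = new int[9];
                int i = 0;
                for (int dy = -1; dy <= 1; dy++)
                    for (int dx = -1; dx <= 1; dx++)
                        window[i++] = img[y + dy][x + dx];
                Arrays.sort(window);
                out[y][x] = window[4]; // the median of the 3x3 neighbourhood
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] noisy = {
            {100, 100, 100},
            {100, 255, 100},   // an isolated bright "salt" pixel
            {100, 100, 100}
        };
        int[][] clean = filter(noisy);
        System.out.println(clean[1][1]); // prints 100
    }
}
```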
Fig 5. Orientation
3.5 Borders
Scanned pages often have dark borders around them. These can be erroneously picked
up as extra characters, especially if they vary in shape and gradation.
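A simple pre-processing step to deal with this, sketched below under our own assumptions (a row-mean darkness test with an illustrative threshold), is to scan inward from each edge and drop rows whose average intensity is border-dark:

```java
// Border-removal sketch: rows near the page edge whose mean intensity
// is very dark are assumed to be scanner border and skipped. The
// darkness threshold is an illustrative assumption; columns would be
// handled the same way.
class BorderTrimmer {

    static boolean isDarkRow(int[][] img, int y, int threshold) {
        int sum = 0;
        for (int v : img[y]) sum += v;
        return sum / img[y].length < threshold;
    }

    // Index of the first row from the top that is not border-dark.
    static int firstContentRow(int[][] img, int threshold) {
        int y = 0;
        while (y < img.length - 1 && isDarkRow(img, y, threshold)) y++;
        return y;
    }

    public static void main(String[] args) {
        int[][] page = {
            {10, 12, 8},      // dark scanner border
            {240, 230, 235},  // page content
            {245, 238, 240}
        };
        System.out.println(firstContentRow(page, 64)); // prints 1
    }
}
```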
3.6 Spaces between words
Fully justified text in narrow columns can have vastly varying spacing on different
lines.
CHAPTER 4 : PLATFORM/TECHNOLOGY
4.1. Operating System
Windows XP/Vista/7
4.2. Language
Java Platform, Standard Edition or Java SE is a widely used platform for programming in
the Java language. It is the Java Platform used to deploy portable applications for general
use. In practical terms, Java SE consists of a virtual machine, which must be used to run
Java programs, together with a set of libraries (or "packages") needed to allow the use of
file systems, networks, graphical interfaces, and so on, from within those programs.
MATLAB
The Image Processing Toolbox of MATLAB provides a comprehensive set of
reference-standard algorithms, functions, and apps for image processing, analysis,
visualization, and algorithm development. You can perform image analysis, image
segmentation, image enhancement, noise reduction, geometric transformations, and
image registration.
CHAPTER 5 : APPLICATIONS
5.1. Applications
1. Modern TV programmes carry more and more scrolling video text, which can
provide important information (e.g. the latest news) within the programme; this
text can be extracted.
2. Extracting the number of a vehicle in textual form:
   - It can be used for tracking a vehicle.
   - It can extract and recognise a number written in any format or font.
3. Extracting text from a presentation video.
4. Multi-language translator for reading symbols and traffic sign boards in foreign
countries.
5. Wearable or portable computers: with the rapid development of computer hardware
technology, wearable computers are now a reality (Google Goggles).
CHAPTER 6 : CONCLUSION
The problem statement of the project has been discussed and explained in this report,
along with potential problems that could affect the translation result. Recognition will
be the final stage of the project, carried out once the pre-processing of the image is
complete.
We would implement the project for the English language first; it can later be extended
to other languages. Such an extension in future implementations would greatly broaden
the usefulness of the system.