
CHAPTER ONE

INTRODUCTION
1.1 Project Overview :
This project demonstrates a combination of image, text, and voice technologies. The user can extract the text contained in an image, or write text directly in the editor, and then have this text read aloud by a text-to-speech system. The edited text can also be saved as a file in a chosen location, and each saved file is recorded in a list of recent documents produced with this editor.
This project explores these ideas by developing Optical Character Recognition (OCR) software and then demonstrating that software through a basic implementation of a text-to-speech conversion system. The system loads an image (JPEG, BMP, GIF, or TIFF), extracts the text found in the image, reads this text aloud, and stores the edited text in a file. The user can also type text, or copy and paste it, into the editor directly.

1.2 Problem :
Because information technology is advancing so quickly, technology is now closely connected with almost every other field of our lives. Technology, both software and hardware, is used in many places by people of all ages, adults and children alike. The main problem is that one group of people has particular difficulty dealing with technology: blind people. Our project therefore aims to help this group by converting edited text into speech that blind users can listen to.

Another aim of our project is to recover text from images. Many images contain text that the user needs for different purposes. In this case, our project helps the user extract the text contained in an image by using Optical Character Recognition (OCR).

1.3 Objectives :
A full realization of this concept involves a few distinct steps:
- To extract the text contained in an image using an OCR system.
- To develop a text editor whose text can be obtained from an image or written directly into the editor.
- To read the text contained in the text editor aloud using a text-to-speech system.
- To develop the above system as a standalone application that operates independently of an external computing source and handles its software inputs and outputs by itself.
Such a system would be integrated with the user's resources: it would use the computer's speakers as its output device and issue control files to software already installed on the computer. Several significant factors must be considered when designing both the OCR and the text-to-speech systems so that they produce clear text and speech output.

1.4 Introduction To OCR :
The goal of Optical Character Recognition (OCR) is to classify optical patterns (often contained in a digital image) corresponding to alphanumeric or other characters. The process of OCR involves several steps, including segmentation, feature extraction, and classification. Each of these steps is a field unto itself, and each is described briefly here in the context of a Matlab implementation of OCR.

1.5 Text-to-Speech Software :
A Text-To-Speech (TTS) system is a computer-based system that should be able to read any text aloud, whether the text was typed directly into the computer by an operator or scanned and submitted to an Optical Character Recognition system. In TTS synthesis it is impractical to record and store every word of a language. It is therefore more appropriate to define TTS as the automatic production of speech by converting the graphemes (written letters) of the input sentences into phonemes (speech sounds).
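As a toy illustration of grapheme-to-phoneme conversion, the Matlab sketch below maps each letter of a word to a phoneme symbol through a lookup table. The table g2p and its ARPAbet-style symbols are hypothetical and are not taken from this project. Real letter-to-sound rules for English depend on context (for example, "c" sounds different in "cat" and "city"), so a practical system needs a pronunciation dictionary or rules that look at neighboring letters.

```matlab
% Hypothetical letter-to-phoneme table (toy data, not this project's rules)
g2p = containers.Map({'c', 'a', 't', 's'}, {'K', 'AE', 'T', 'S'});
word = 'cats';
phonemes = cell(1, numel(word));
for k = 1:numel(word)
    phonemes{k} = g2p(word(k));   % look up the phoneme for each letter
end
disp(strjoin(phonemes, ' '))      % prints: K AE T S
```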

1.6 Project Methodologies :

1.6.1 OCR Methodology :

OCR software has existed for almost as long as computers have, as a way to connect the printed world with the electronic one. Traditional document imaging methods use templates and algorithms in a two-dimensional environment to recognize objects and patterns. Modern OCR methods recognize a spectrum of colors and can distinguish between the background and the foreground of a document. They de-skew, de-speckle, and apply 3-D image correction in order to work with lower-resolution images taken from sources such as faxes, the internet, and cell phone cameras.
OCR software uses two different kinds of optical character recognition: feature extraction and matrix matching. Feature extraction recognizes shapes by using statistical and mathematical techniques to detect edges, corners, and ridges in a text font, and so identifies the letters in a word, sentence, and paragraph. Matrix matching instead compares each character image, pixel by pixel, with a stored set of character templates and picks the closest one. OCR software achieves the best results when the image meets the following conditions:
- It is a clean, straight image.
- It uses a very distinguishable font such as Arial or Helvetica.
- It uses black letters on a clear background.
- It has a resolution of at least 300 dpi.
However, these conditions cannot always be met. The best OCR techniques can still read words accurately in less ideal circumstances by using matrix matching.
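To make matrix matching concrete, the Matlab sketch below scores an input glyph against two stored templates with the 2-D correlation coefficient, the same quantity that the Image Processing Toolbox function corr2 computes and that the read_letter function in Chapter Three uses. The tiny 5x3 templates and the noisy glyph are made-up data, and the anonymous function corrcoef2 is a hypothetical helper written out only so that the example runs without the toolbox.

```matlab
% Two made-up 5x3 binary templates: the digit '1' and the letter 'L'
T1 = [0 1 0; 1 1 0; 0 1 0; 0 1 0; 1 1 1];
TL = [1 0 0; 1 0 0; 1 0 0; 1 0 0; 1 1 1];
templates = {T1, TL};
labels = '1L';
% A noisy input glyph: an 'L' with one extra ink pixel
glyph = [1 0 0; 1 0 0; 1 1 0; 1 0 0; 1 1 1];
% 2-D correlation coefficient (the same formula corr2 uses)
corrcoef2 = @(A, B) sum((A(:) - mean(A(:))) .* (B(:) - mean(B(:)))) / ...
    sqrt(sum((A(:) - mean(A(:))).^2) * sum((B(:) - mean(B(:))).^2));
scores = zeros(1, numel(templates));
for k = 1:numel(templates)
    scores(k) = corrcoef2(glyph, templates{k});   % match against each template
end
[~, best] = max(scores);                          % highest correlation wins
fprintf('Recognized: %c\n', labels(best));        % prints: Recognized: L
```

The glyph differs from the "L" template in a single pixel, so it correlates at 0.875 with "L" and only about 0.2 with "1".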
One example of OCR is shown in Figure 1.1. A portion of a scanned image of text, borrowed from the web, is shown along with the corresponding (human-recognized) characters from that text.

Figure 1.1 : Scanned image of text and its corresponding recognized representation.

1.6.2 Text to Speech Methodology :

As described in Section 1.5, a TTS system reads any text aloud, whether the text was typed by an operator or produced by an OCR system from a scanned image. Figure 1.2 shows the overall structure of such a system.

Figure 1.2 : TTS System.

1.7 Speech Synthesis :
Synthesized speech can be created by concatenating pieces of recorded speech that are stored in a database. The quality of a speech synthesizer is judged by its similarity to the human voice and by its ability to be understood. The most important qualities of a speech synthesis system are naturalness and intelligibility.
Naturalness describes how closely the output sounds like human speech, whereas intelligibility is the ease with which the output is understood. The ideal speech synthesizer is both natural and intelligible, so speech synthesis systems usually try to maximize both characteristics. Several significant factors must be considered when designing a text-to-speech system that produces clear speech.

Figure 1.3 : Flowchart of Text to Speech Recognition.

1.7.1 Text To Speech System :

A TTS synthesizer is a computer-based system that should be able to read any text clearly, whether the text was entered into the computer by an operator or scanned and submitted to an Optical Character Recognition (OCR) system. The purpose of a text-to-speech system is to convert arbitrary input text into a speech waveform. Its two main components are text processing and speech production. The two primary methods for producing synthetic speech waveforms are concatenative synthesis and formant synthesis. We used concatenative synthesis for our TTS. Concatenative synthesis is based on joining together pieces of recorded words, and it usually produces the most natural-sounding synthesized speech.
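A minimal Matlab sketch of the concatenation step follows. It assumes the recorded words are already available as sample vectors; here two short sine tones stand in for real recordings (a real system would load them from its speech database, for example with audioread). The 8 kHz sampling rate and the 50 ms pause between words are arbitrary choices for illustration.

```matlab
fs = 8000;                        % sampling rate in Hz (assumed)
t = (0:round(0.3 * fs) - 1) / fs; % 0.3 s time axis for each stand-in word
% Stand-in "recordings": a real system would load these from its speech
% database, for example with audioread
units = containers.Map();
units('hello') = 0.5 * sin(2 * pi * 220 * t);
units('world') = 0.5 * sin(2 * pi * 330 * t);
gap = zeros(1, round(0.05 * fs)); % 50 ms of silence between words
sentence = {'hello', 'world'};
y = [];
for k = 1:numel(sentence)
    % Append the stored waveform of each word, followed by a short pause
    y = [y, units(sentence{k}), gap]; %#ok<AGROW>
end
% sound(y, fs);                   % uncomment to listen to the result
```

Joining whole words with a fixed pause is the simplest form of concatenative synthesis; systems built from smaller units usually also smooth the joins so that no clicks are heard.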

1.7.2 Speech Generation Component :

Given a sequence of phonemes, the purpose of the speech generation component is to synthesize the acoustic waveform. Speech generation has been attempted by concatenating recorded words. Recent state-of-the-art speech synthesis produces natural-sounding speech by using a huge number of speech units. Storing a huge number of units and retrieving them in real time is feasible because memory and computing power are now cheap. The problems of a unit-selection speech synthesis system fall into three areas: the choice of unit size, the generation of the speech database, and the criteria for selecting a unit.
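One common way to apply the selection criteria, shown here as a hypothetical Matlab sketch rather than as part of this project, is to give every candidate unit a target cost (how well it matches the wanted sound) and every pair of neighboring units a join cost (how smoothly they connect), and then to find the cheapest sequence with dynamic programming (the Viterbi algorithm). All cost values below are invented for illustration.

```matlab
% Hypothetical costs: 3 target positions (rows), 2 candidate units (columns)
target_cost = [1 3;
               2 1;
               4 2];
% join_cost(j, k): cost of following candidate j with candidate k
% (assumed to be the same at every position to keep the example short)
join_cost = [0 5;
             5 0];
[N, C] = size(target_cost);
cost = zeros(N, C);   % cheapest total cost of a path ending in each candidate
back = zeros(N, C);   % back-pointers to recover that path
cost(1, :) = target_cost(1, :);
for i = 2:N
    for k = 1:C
        [best, j] = min(cost(i - 1, :) + join_cost(:, k)');
        cost(i, k) = best + target_cost(i, k);
        back(i, k) = j;
    end
end
% Trace the cheapest sequence of units backwards from the last position
best_path = zeros(1, N);
[total, last] = min(cost(N, :));
best_path(N) = last;
for i = N:-1:2
    best_path(i - 1) = back(i, best_path(i));
end
fprintf('Selected units: %s (total cost %g)\n', mat2str(best_path), total);
```

Here candidate 2 is chosen at every position, for a total cost of 6, even though candidate 1 is cheaper at the first position: the lower target costs of candidate 2 at the last two positions outweigh that, and any switch between candidates costs 5.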

1.7.3 Speech Synthesis Process :

This TTS system is able to read any written text. Before the text can be spoken, it goes through a stage called text normalization (also known as preprocessing or tokenization), in which the raw text is cleaned and split into words. In this system, we have developed a phonetics-based text-to-speech synthesis system, implemented in the Matlab language, which we also use to improve the speech quality. The following figure shows the block diagram of the TTS system.

Figure 1.4 : Block Diagram for Text to speech Synthesis.

Figure 1.5 : Flow chart for TTS with example.
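The normalization stage can be sketched in Matlab as follows. The abbreviation and number tables are tiny hypothetical examples, not the rules used in this project; a real normalizer needs much larger tables and must also handle dates, currencies, and multi-digit numbers.

```matlab
raw = 'Dr. Smith has 2 cats!';
% Hypothetical expansion tables; a real normalizer needs much larger ones
abbrev = {'dr.', 'doctor'};
digit_words = {'zero', 'one', 'two', 'three', 'four', ...
               'five', 'six', 'seven', 'eight', 'nine'};
s = lower(raw);
s = strrep(s, abbrev{1}, abbrev{2});   % expand the abbreviation
s = regexprep(s, '[^a-z0-9 ]', '');    % drop punctuation
tokens = strsplit(strtrim(s), ' ');     % tokenize on spaces
for k = 1:numel(tokens)
    % Spell out single-digit numbers
    if numel(tokens{k}) == 1 && isstrprop(tokens{k}, 'digit')
        tokens{k} = digit_words{str2double(tokens{k}) + 1};
    end
end
disp(strjoin(tokens, ' '))              % prints: doctor smith has two cats
```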

1.8 Speech Synthesis Technology :

Research in the area of speech synthesis has been going on for decades. As we found in our research, numerous models and theories exist for the best way to implement a speech synthesis system. Although the models seemed intuitive from a high-level perspective, they quickly grew in complexity as we got closer to implementation.

1.9 MATLAB Overview :


Matlab is widely used in all areas of applied mathematics, in education and research at universities, and in industry. Matlab stands for MATrix LABoratory, and the software is built around vectors and matrices. This makes it particularly useful for linear algebra, but Matlab is also a great tool for solving algebraic and differential equations and for numerical integration. Matlab has powerful graphics tools and can produce high-quality figures in both 2D and 3D. It is also a programming language, and one of the easiest programming languages for writing mathematical programs. Matlab also has toolboxes for signal processing, image processing, optimization, etc.
Matlab is a high-performance language for technical computing. It
integrates computation, visualization, and programming in an easy-to-use
environment where problems and solutions are expressed in familiar
mathematical notation. Typical uses include:

- Math and computation
- Algorithm development
- Modeling, simulation, and prototyping
- Data analysis, exploration, and visualization
- Scientific and engineering graphics
- Application development, including Graphical User Interface building

Matlab is an interactive system whose basic data element is an array that does not require dimensioning. This allows you to solve many technical computing problems, especially those with matrix and vector formulations, in a fraction of the time it would take to write a program in a scalar, noninteractive language such as C or Fortran.
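As a small illustration of this point, the Matlab fragment below computes the sum of squares of a vector twice: once with an explicit loop in the style of a scalar language, and once as a single vectorized expression. Both give the same result, but the vectorized form needs no loop counter and no declared array size.

```matlab
x = 1:5;   % no size declaration is needed
% Loop in the style of a scalar language such as C or Fortran
total_loop = 0;
for k = 1:numel(x)
    total_loop = total_loop + x(k)^2;
end
% The same computation as one vectorized expression
total_vec = sum(x.^2);
fprintf('%d %d\n', total_loop, total_vec);   % prints: 55 55
```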
Matlab was originally written to provide easy access to matrix software
developed by the LINPACK and EISPACK projects, which together represent
the state-of-the-art in software for matrix computation.
Matlab has evolved over a period of years with input from many users.
In university environments, it is the standard instructional tool for
introductory and advanced courses in mathematics, engineering, and
science. In industry, Matlab is the tool of choice for high-productivity
research, development, and analysis.

Matlab features a family of application-specific solutions called toolboxes. Very important to most users of Matlab, toolboxes allow you to learn and apply specialized technology. Toolboxes are comprehensive collections of Matlab functions (M-files) that extend the Matlab environment to solve particular classes of problems. Areas in which toolboxes are available include signal processing, control systems, neural networks, fuzzy logic, wavelets, simulation, and many others.

1.10 History of Matlab :

Cleve Moler, the chairman of the computer science department at the University of New Mexico, started developing Matlab in the late 1970s.
He designed it to give his students access to LINPACK and EISPACK without
them having to learn Fortran. It soon spread to other universities and found a
strong audience within the applied mathematics community. Jack Little, an
engineer, was exposed to it during a visit Moler made to Stanford
University in 1983. Recognizing its commercial potential, he joined with
Moler and Steve Bangert. They rewrote matlab in C and
founded MathWorks in 1984 to continue its development. These rewritten
libraries were known as JACKPAC. In 2000, matlab was rewritten to use a
newer set of libraries for matrix manipulation, LAPACK.
Matlab was first adopted by researchers and practitioners in control
engineering, Little's specialty, but quickly spread to many other domains. It
is now also used in education, in particular the teaching of linear
algebra and numerical analysis, and is popular amongst scientists involved
in image processing.

1.11 SQL Server Overview :

Generically, the term "SQL server" refers to any database management system (DBMS) that can respond to queries from client machines formatted in the SQL language. When capitalized, the term generally refers to either of two database management products from Sybase and Microsoft. Both companies offer client-server DBMS products called SQL Server.

1.12 The History of SQL Server :


IBM invented a computer language back in the 1970s designed
specifically for database queries called SEQUEL, which stood for Structured
English Query Language. Over time the language has been added to, so that
it is not just a language for queries but can also be used to build databases
and manage security of the database engine. IBM released SEQUEL into the
public domain, where it became known as SQL.

Because of this heritage you can pronounce it as "sequel" or spell it out as "S-Q-L" when talking about it. Various versions of SQL are used in today's database engines. Microsoft SQL Server uses a version called Transact-SQL. This project uses only the basics of the language; Sams Publishing's book Teach Yourself Transact-SQL in 21 Days covers the language and its usage in more detail.
Microsoft initially developed SQL Server (a database product that
understands the SQL language) with Sybase Corporation for use on the IBM
OS/2 platform. When Microsoft and IBM split, Microsoft abandoned OS/2 in
favor of its new network operating system, Windows NT Advanced Server. At
that point, Microsoft decided to further develop the SQL Server engine for
Windows NT by itself. The resulting product was Microsoft SQL Server 4.2,
which was updated to 4.21. After Microsoft and Sybase parted ways, Sybase
further developed its database engine to run on Windows NT (Sybase System
10 and now System 11), and Microsoft developed SQL Server 6.0, then SQL
Server 6.5, which also ran on top of Windows NT. SQL Server 7.0 runs on
Windows NT as well as on Windows 95 and Windows 98.
Although you can run SQL Server 7.0 on a Windows 9x system, you do
not get all the functionality of SQL Server. When running it on the Windows
9x platform, you lose the capability to use multiple processors, Windows NT
security, NTFS (New Technology File System) volumes, and much more. We
strongly urge you to use SQL Server 7.0 on Windows NT rather than on
Windows 9x. Windows NT has other advantages as well. The NT platform is
designed to support multiple users. Windows 9x is not designed this way,
and your SQL Server performance degrades rapidly as you add more users.
SQL Server 7.0 is implemented as a service on either NT Workstation or
NT Server (which makes it run on the server side of Windows NT) and as an
application on Windows 95/98. The included utilities, such as the SQL Server
Enterprise Manager, operate from the client side of Windows NT Server or NT
Workstation. Of course, just like all other applications on Windows 9x, the
tools run as applications.
A service is an application NT can start when booting up that adds
functionality to the server side of NT. Services also have a generic application
programming interface (API) that can be controlled programmatically.
Threads originating from a service are automatically given a higher priority
than threads originating from an application.

1.13 SQL Server 2008 R2 :


Microsoft described SQL Server 2008 R2 as the most advanced, trusted, and scalable data platform released to date. Building on the success of the original SQL Server 2008 release, SQL Server 2008 R2 has made an impact on organizations worldwide with its capabilities: empowering end users through self-service business intelligence (BI), bolstering efficiency and collaboration between database administrators (DBAs) and application developers, and scaling to accommodate the most demanding data workloads.
This chapter introduced the new SQL Server 2008 R2 features, capabilities, and editions from a DBA's perspective. It also discusses why Windows Server 2008 R2 is recommended as the underlying operating system for deploying SQL Server 2008 R2. Last, SQL Server 2008.

CHAPTER TWO

PROJECT ANALYSIS
2.1 The Classification Process :

There are two steps in building a classifier: training and testing. These steps can be broken down further into sub-steps:
1. Training:
a. Pre-processing: process the data so that it is in a suitable form for use.
b. Feature extraction: reduce the amount of data by extracting the relevant information, which usually results in a vector of scalar values.
c. Model estimation: from the finite set of feature vectors, estimate a model (usually statistical) for each class of the training data.
2. Testing:
a. Pre-processing.
b. Feature extraction (both the same as above).
c. Classification: compare the feature vectors with the various models and find the closest match, for example by using a distance measure.

Figure 2.1 : The pattern classification process.
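The training and testing steps can be sketched in Matlab with the simplest possible model: one mean feature vector per class, and classification by Euclidean distance to the nearest mean. The feature values are made up, and pre-processing and feature extraction are assumed to have already produced them. This minimum-distance classifier is only an illustration; it is not the method used later in this project, which correlates whole character images with templates.

```matlab
% Made-up 2-D feature vectors, one row per training sample
train_X = [1.0 1.2; 0.8 1.0; 1.1 0.9;    % class 1
           4.0 4.2; 3.8 4.1; 4.1 3.9];   % class 2
train_y = [1; 1; 1; 2; 2; 2];
% Training (model estimation): the mean feature vector of each class
classes = unique(train_y);
means = zeros(numel(classes), size(train_X, 2));
for c = 1:numel(classes)
    means(c, :) = mean(train_X(train_y == classes(c), :), 1);
end
% Testing (classification): nearest class mean by Euclidean distance
test_X = [0.9 1.1; 3.9 4.0];
pred = zeros(size(test_X, 1), 1);
for i = 1:size(test_X, 1)
    dist = sqrt(sum(bsxfun(@minus, means, test_X(i, :)).^2, 2));
    [~, idx] = min(dist);
    pred(i) = classes(idx);
end
disp(pred')   % predicted class of each test vector
```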

2.2 OCR Pre-processing :

These are the pre-processing steps often performed in OCR:
- Binarization: the input is usually a grayscale image, so binarization is simply a matter of choosing a threshold value.
- Morphological operators: remove isolated specks and holes in characters; the majority operator can be used for this.
- Segmentation: check the connectivity of shapes, then label and isolate them. Matlab 6.1's bwlabel and regionprops functions can be used. Characters that are not connected cause difficulties, e.g. the letter i, a semicolon, or a colon (; or :).
Segmentation is by far the most important aspect of the pre-processing stage. It allows the recognizer to extract features from each individual character. In the more complicated case of handwritten text, the segmentation problem becomes much more difficult, as letters tend to be connected to each other.
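The binarization and segmentation steps can be illustrated with the short Matlab sketch below. It applies a fixed threshold to a made-up 6x9 grayscale image and then splits the characters at empty columns (a vertical projection), which is the same idea as the letter_crop function in Chapter Three. It is not the bwlabel-based connected-component approach mentioned above, and it cannot separate characters that touch.

```matlab
% Made-up 6x9 grayscale image: 255 = white paper, low values = dark ink
img = 255 * ones(6, 9);
img(2:5, 2:3) = 20;     % first "character"
img(2:5, 6:8) = 30;     % second "character"
% Binarization with one global threshold; ink pixels become 1
level = 128;            % assumed threshold (graythresh would choose one)
bw = img < level;
% Segmentation by vertical projection: empty columns separate characters
col_has_ink = any(bw, 1);
d = diff([0 col_has_ink 0]);
starts = find(d == 1);          % first column of each character
stops = find(d == -1) - 1;      % last column of each character
chars = cell(1, numel(starts));
for k = 1:numel(starts)
    chars{k} = bw(:, starts(k):stops(k));
end
fprintf('%d characters found\n', numel(chars));   % prints: 2 characters found
```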

2.3 OCR Feature extraction :

Given a segmented (isolated) character, useful features for recognition are:
1. Moment-based features:
Think of each character as a 2-D probability density function (pdf) f(x, y). The 2-D moments of the character are:
$$m_{pq} = \sum_{x}\sum_{y} x^{p}\, y^{q}\, f(x, y), \qquad p, q = 0, 1, 2, \ldots$$
From the moments we can compute features like:
1. Total mass (number of pixels in a binarized character)
2. Centroid (center of mass)
3. Elliptical parameters
i. Eccentricity (ratio of major to minor axis)
ii. Orientation (angle of major axis)
4. Skewness
5. Kurtosis
6. Higher-order moments
2. Hough and chain code transforms
3. Fourier transform and series
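The moment features listed above can be computed in a few lines of Matlab. The sketch below uses a made-up 4x4 binary character (a vertical bar two pixels wide) and takes x as the column index and y as the row index. The orientation and axis-ratio formulas come from the eigen-decomposition of the second-order central moments, which is one standard way to obtain the elliptical parameters.

```matlab
% Made-up binarized character: a vertical bar two pixels wide (1 = ink)
f = [0 1 1 0;
     0 1 1 0;
     0 1 1 0;
     0 1 1 0];
[nr, nc] = size(f);
[X, Y] = meshgrid(1:nc, 1:nr);             % x = column index, y = row index
m = @(p, q) sum(sum(X.^p .* Y.^q .* f));   % raw moment m_pq
mass = m(0, 0);                            % total mass = number of ink pixels
xc = m(1, 0) / mass;                       % centroid (center of mass)
yc = m(0, 1) / mass;
% Normalized second-order central moments
mu20 = sum(sum((X - xc).^2 .* f)) / mass;
mu02 = sum(sum((Y - yc).^2 .* f)) / mass;
mu11 = sum(sum((X - xc) .* (Y - yc) .* f)) / mass;
% Elliptical parameters from the 2x2 second-moment matrix
theta = 0.5 * atan2(2 * mu11, mu20 - mu02);      % orientation of major axis
lambda = eig([mu20 mu11; mu11 mu02]);
axis_ratio = sqrt(max(lambda) / min(lambda));    % major/minor axis ratio
```

For this bar the mass is 8, the centroid is (2.5, 2.5), the major axis is vertical (theta = ±pi/2), and the major-to-minor axis ratio is sqrt(5), about 2.24.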

2.4 OCR - Model Estimation :

Given labeled sets of features for many characters, where the labels correspond to the particular
classes that the characters belong to, we wish to estimate a statistical model for each character
class. For example, suppose we compute two features for each realization of the characters 0
through 9. Plotting each character class as a function of the two features we have:

Figure 2.2 : Character classes plotted as a function of two features.
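A statistical model for each class can be as simple as a Gaussian with its own mean vector and covariance matrix. The Matlab sketch below estimates these from made-up two-feature samples of two hypothetical classes, "0" and "1", and classifies a new feature vector by the larger Gaussian log-likelihood (constant terms dropped). It illustrates model estimation in general; the project itself recognizes characters by template correlation instead.

```matlab
% Made-up two-feature training samples for two hypothetical classes
X0 = [1.0 2.0; 1.2 2.2; 0.9 1.8; 1.1 2.1];   % class '0'
X1 = [3.0 1.0; 3.2 1.1; 2.9 0.8; 3.1 1.2];   % class '1'
data = {X0, X1};
labels = '01';
% Model estimation: a mean vector and covariance matrix for each class
mu = cell(1, 2);
S = cell(1, 2);
for c = 1:2
    mu{c} = mean(data{c}, 1);
    S{c} = cov(data{c});
end
% Classification: the class with the highest Gaussian log-likelihood
% (the constant term shared by both classes is dropped)
x = [1.05 2.0];
loglik = zeros(1, 2);
for c = 1:2
    d = x - mu{c};
    loglik(c) = -0.5 * (d / S{c}) * d' - 0.5 * log(det(S{c}));
end
[~, best] = max(loglik);
fprintf('Class: %c\n', labels(best));   % prints: Class: 0
```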


Figure 2.3 : Flowchart of recognizing words

Optical Character Recognition deals with the recognition of optically processed characters. Reliably interpreting text from real-world photos is a challenging problem because of variations in environmental factors, although using the best open-source OCR engines makes it easier.

CHAPTER THREE

PROJECT DESIGN
The Project Design with the GUI (Graphical User Interface) :

Figure 3.1 : The main GUI of the project.

Load Image :

Figure 3.2 : Loading an image from computer into the application.

The Matlab code :

[filename, pathname] = uigetfile({'*.jpg';'*.bmp';'*.gif';'*.tif'}, ...
    'Pick an Image File');
if isequal(filename, 0)
    warndlg('You did not select any file');   % file is not selected
    return;
end
img = imread(fullfile(pathname, filename));
% Cosmetic progress bar (the image is already loaded)
h = waitbar(0, 'Please wait...');
steps = 100;
for step = 1:steps
    waitbar(step / steps)
end
close(h)
% Enable the controls that need an image
set(handles.btnConvert, 'Enable', 'on');
set(handles.path, 'Enable', 'on');
set(handles.imageInfo, 'Enable', 'on');
set(handles.img_display, 'Visible', 'on');
set(handles.text1, 'String', filename);
set(handles.text1, 'FontSize', 14);
set(handles.path, 'String', pathname);
% Show image
axes(handles.img_display);
imshow(img);

Recognize Text :
The character templates are stored as image files in the folder "letters_numbers".

Figure 3.3 : Recognize text pattern.

Create Templates :
%CREATE TEMPLATES
% Reads the 62 reference character images from the folder
% 'letters_numbers' and saves them as a 1x62 cell array in templates.mat.
% Order: A-Z (1-26), 1-9 and 0 (27-36), a-z (37-62); read_letter relies
% on this order. Every image must be 42 x 24 pixels.
clc;
close all;
names = [num2cell('A':'Z'), num2cell('1':'9'), {'0'}, num2cell('a':'z')];
% Upper-case letters and digits are .bmp files, lower-case letters .png
exts = [repmat({'.bmp'}, 1, 36), repmat({'.png'}, 1, 26)];
templates = cell(1, numel(names));
for k = 1:numel(names)
    templates{k} = imread(fullfile('letters_numbers', [names{k} exts{k}]));
end
save('templates', 'templates')
clear all

Read Letter :
%function read_letter
function letter = read_letter(imagn, num_letras)
% Computes the correlation between each template and the input image
% and returns the character of the best-matching template.
% Size of 'imagn' must be 42 x 24 pixels.
% Example:
%   imagn = imread('D.bmp');
%   letter = read_letter(imagn, 62)
global templates   % loaded from templates.mat by the caller
% Template order used in CREATE TEMPLATES:
% A-Z (1-26), 1-9 and 0 (27-36), a-z (37-62)
charset = ['A':'Z', '1':'9', '0', 'a':'z'];
comp = zeros(1, num_letras);
for n = 1:num_letras
    comp(n) = corr2(templates{1, n}, imagn);
end
% Index of the highest correlation (the first one if there is a tie)
[~, vd] = max(comp);
letter = charset(vd);
end

Letter Crop :
%function letter_in_a_line
function [fl, re, space] = letter_crop(im_texto)
% Divides a text line into its first letter and the rest of the line.
% fl    -> first letter (cropped)
% re    -> remaining part of the line (cropped), [] if none
% space -> width in pixels of the blank gap between fl and re
im_texto = clip(im_texto);
num_cols = size(im_texto, 2);
% Defaults for a line that contains a single letter
fl = im_texto;
re = [];
space = 0;
for s = 1:num_cols
    if sum(im_texto(:, s)) == 0            % first empty column ends the letter
        nm = im_texto(:, 1:s-1);            % first letter matrix
        rm = im_texto(:, s:end);            % remaining line matrix
        fl = clip(nm);
        re = clip(rm);
        space = size(rm, 2) - size(re, 2);  % blank columns before next letter
        % Uncomment to see the result:
        % subplot(2,1,1); imshow(fl); subplot(2,1,2); imshow(re);
        break
    end
end
end

function img_out = clip(img_in)
% Crops the image to the bounding box of its nonzero pixels.
[f, c] = find(img_in);
img_out = img_in(min(f):max(f), min(c):max(c));
end
Lines Crop :
function [fl, re] = lines(im_texto)
% Divides a binary text image (text = 1) into its first line and the rest.
% im_texto -> input image; fl -> first line; re -> remaining lines ([] if none)
% Example:
%   im_texto = imread('TEST_3.jpg');
%   [fl, re] = lines(im_texto);
%   subplot(3,1,1); imshow(im_texto); title('INPUT IMAGE')
%   subplot(3,1,2); imshow(fl); title('FIRST LINE')
%   subplot(3,1,3); imshow(re); title('REMAIN LINES')
im_texto = clip(im_texto);
num_filas = size(im_texto, 1);
% Defaults for an image that contains a single line
fl = im_texto;
re = [];
for s = 1:num_filas
    if sum(im_texto(s, :)) == 0      % first empty row ends the line
        nm = im_texto(1:s-1, :);      % first line matrix
        rm = im_texto(s:end, :);      % remaining lines matrix
        fl = clip(nm);
        re = clip(rm);
        % Uncomment to see the result:
        % subplot(2,1,1); imshow(fl); subplot(2,1,2); imshow(re);
        break
    end
end
end

function img_out = clip(img_in)
% Crops the image to the bounding box of its nonzero pixels.
[f, c] = find(img_in);
img_out = img_in(min(f):max(f), min(c):max(c));
end

Figure 3.4 : Recognize text in the project.

% --- Executes on button press in btnConvert.
function btnConvert_Callback(hObject, eventdata, handles)
% hObject    handle to btnConvert (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
pathname = get(handles.path, 'String');
filename = get(handles.text1, 'String');
imagen = imread(fullfile(pathname, filename));
% Convert to gray scale
if size(imagen, 3) == 3   % RGB image
    imagen = rgb2gray(imagen);
end
% Convert to BW (inverted so that text pixels are 1)
threshold = graythresh(imagen);
imagen = ~im2bw(imagen, threshold);
% Remove all objects containing fewer than 30 pixels
imagen = bwareaopen(imagen, 30);
% Load templates (declare the global before assigning it)
global templates
S = load('templates');
templates = S.templates;
% Compute the number of letters in the template file
num_letras = size(templates, 2);
word = [];     % letters of the current line
text = '';     % recognized lines, one per row
re = imagen;
while 1
    % Fcn 'lines' separates lines in text
    [fl, re] = lines(re);
    n = 0;
    % Uncomment line below to see lines one by one
    % figure, imshow(fl); pause(2)
    spacevector = [];   % spaces between adjacent letters of this line
    rc = fl;
    while 1
        % Fcn 'letter_crop' separates letters in a line:
        % fc = first letter, rc = remaining cropped line,
        % space = gap between the cropped letter and the next one
        [fc, rc, space] = letter_crop(rc);
        % Uncomment below line to see letters one by one
        % figure, imshow(fc); pause(0.5)
        % Resize letter so that correlation can be performed
        img_r = imresize(fc, [42 24]);
        n = n + 1;
        spacevector(n) = space;
        % Fcn 'read_letter' correlates the cropped letter with the
        % images given in the folder 'letters_numbers'
        letter = read_letter(img_r, num_letras);
        word = [word letter];    % letter concatenation
        if isempty(rc)            % no more characters in this line
            break;
        end
    end
    % Insert a space after every letter that is followed by a gap wider
    % than 75% of the largest gap in the line
    max_space = max(spacevector);
    line_txt = '';
    for x = 1:n
        line_txt = [line_txt word(x)];
        if spacevector(x) > 0.75 * max_space
            line_txt = [line_txt ' '];
        end
    end
    if isempty(text)
        text = line_txt;
    else
        text = char(text, line_txt);   % add as a new row
    end
    % Clear 'word' variable
    word = [];
    % When the sentences finish, break the loop
    if isempty(re)   % see variable 're' in Fcn 'lines'
        break
    end
end
% Cosmetic progress bar (recognition is already finished)
h = waitbar(0, 'Please wait...');
steps = 100;
for step = 1:steps
    waitbar(step / steps)
end
close(h)
set(handles.text2, 'String', text);
set(handles.text2, 'FontSize', 24);
set(handles.Speak, 'Enable', 'on');
guidata(hObject, handles);
Save to NotePad :

Figure 3.5 : Save to Notepad file format.


% --- Executes on button press in btnOpen.
function btnOpen_Callback(hObject, eventdata, handles)
% hObject    handle to btnOpen (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% Pass the recognized text to the save dialog through the root object
value = get(handles.text2, 'String');
setappdata(0, 'txt', value)
file_fig();

Figure 3.6 : Saving a text file.

% --- Executes on button press in btnOk.


function btnOk_Callback(hObject, eventdata, handles)
% hObject
handle to btnOk (see GCBO)
% eventdata reserved - to be defined in a future
version of MATLAB
% handles
structure with handles and user data (see
GUIDATA)
%Opens text.txt as file for write
fname=get(handles.edit_name,'String');
filename=strcat(fname,'.txt');
pathname=get(handles.edit_location,'String');
filepath=fullfile(pathname,filename);
if isequal(exist(filepath,'file'),2)
button = questdlg('file name already exist ', ...
'Warning','Override','Cancle','Cancle');
switch button
33

case 'Override'
fid = fopen(filepath, 'wt');
case 'Cancle'
return;
end
else
fid = fopen(filepath, 'wt');
end
h = waitbar(0,'Please wait...');
steps = 100;
for step = 1:steps
% computations take place here
waitbar(step / steps)
end
close(h)
%fprintf(fid,'%s\n',lower(word)); % Write 'word' in text file (lower case)
txt=getappdata(0,'txt');
rmappdata(0,'txt');
% The text is a char matrix after recognition but a cell array after a
% file has been loaded; cellstr turns either form into a cell array
txt = cellstr(txt);
nRows = numel(txt);
stxt='';
for k=1:nRows
fprintf(fid,'%s\n',txt{k}); % write one line to the file
stxt=[stxt txt{k} char(10)]; % keep the text for the database record
end
fclose(fid);
date1=date;
decr=get(handles.edit_note,'String');
if strcmp(decr,'Write Note here ...')
decr=NaN;
end


%data1 = cell(1,6);
columns={'id','name_file','text','path_file','time','note'};
data1={handles.lastid fname stxt pathname date1 decr};
conn = database('dbFiles','sa','123');
insert(conn,'File_Data',columns,data1);
close(conn)
% Update handles structure
guidata(hObject, handles);
% Open the saved file with the default program for .txt files (Notepad)
winopen(filepath)
close
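The save routine above has to cope with text that arrives either as a padded char matrix (after recognition) or as a cell array of lines (after loading a file). The sketch below factors that step into a standalone function; the name `save_text_lines` is hypothetical and not part of the project. It normalizes the input with `cellstr`, checks the result of `fopen`, and uses `onCleanup` so the file is closed even if writing fails. It returns the lines joined with newline characters, which is one possible value for the `text` column of `File_Data`.

```matlab
function joined = save_text_lines(filepath, txt)
% SAVE_TEXT_LINES  Write text to a plain-text file, one line per row.
%   Hypothetical helper, not part of the original project.
%   TXT may be a char matrix (as left by the recognition step) or a cell
%   array of character vectors (as left by loading a .txt file).
%   JOINED is the text with lines separated by newline characters.

textLines = cellstr(txt);   % char matrix -> cell array, row padding removed
[fid, msg] = fopen(filepath, 'wt');
if fid == -1
    error('save_text_lines:open', 'Cannot open %s: %s', filepath, msg);
end
closer = onCleanup(@() fclose(fid));   % closes the file however we exit

for k = 1:numel(textLines)
    fprintf(fid, '%s\n', textLines{k});
end

joined = strjoin(textLines(:)', char(10));
end
```

In `btnOk_Callback` a call such as `stxt = save_text_lines(filepath, txt);` could then replace the `fopen`/`fprintf`/`fclose` sequence, assuming the path has already passed the "file exists" check.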

Figure 3.7 : Edited text in a Notepad file format.

Load Text File :

Figure 3.8 : Loading a text file (Notepad file format).

% --- Executes on button press in load.


function load_Callback(hObject, eventdata, handles)
% hObject    handle to load (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
[filename,pathname] = uigetfile('*.txt','Select a txt file');
% uigetfile returns 0 when the user cancels the dialog
if isequal(filename,0)
return;
end
filepath=fullfile(pathname,filename);
h = waitbar(0,'Please wait...');
steps = 100;
for step = 1:steps
% cosmetic progress bar only: no work is done inside this loop
waitbar(step / steps)
end
close(h);
% Preallocate a large cell array for the lines of the file
txt=cell(10000,1);
sizS = 10000;
lineCt = 1;
fid = fopen(filepath,'r');
if fid == -1
errordlg(['Cannot open ' filepath],'Error');
return;
end
tline = fgetl(fid);
while ischar(tline)
txt{lineCt} = tline;
lineCt = lineCt + 1;
% Grow txt by another 10000 cells if necessary
if lineCt > sizS
txt = [txt;cell(10000,1)];
sizS = sizS + 10000;
end
tline = fgetl(fid);
end
% Remove the unused preallocated cells
txt(lineCt:end) = [];
set(handles.text2,'String',txt)
set(handles.Speak,'Enable','on')
fclose(fid);
Loading a file in the edit tool :
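The `load_Callback` above preallocates 10000 cells and grows the array by hand while reading with `fgetl`. A shorter alternative, sketched here under the assumption that the files are small enough to read in one go, reads the whole file with `fileread` and splits it on newlines; `read_text_lines` is a hypothetical name. Windows line endings are normalized first, and the empty piece produced by a trailing newline is dropped so the result matches what the `fgetl` loop returns.

```matlab
function textLines = read_text_lines(filepath)
% READ_TEXT_LINES  Read a text file into an N-by-1 cell array of lines.
%   Hypothetical helper: an alternative to the preallocated fgetl loop.
%   Assumes the whole file fits comfortably in memory.

raw = fileread(filepath);                          % entire file as one char row
raw = strrep(raw, sprintf('\r\n'), sprintf('\n')); % Windows -> Unix line ends
textLines = regexp(raw, '\n', 'split');            % one cell per line
textLines = textLines(:);                          % column, like the original txt
if ~isempty(textLines) && isempty(textLines{end})
    textLines(end) = [];   % drop the empty piece after a final newline
end
end
```

Blank lines inside the file are kept, as they are with `fgetl`; only the artefact of the final newline is removed.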

Figure 3.9 : Loading a text of notepad file format in the edit tool.

Text to Speech :

% --- Executes on button press in Speak.


function Speak_Callback(hObject, eventdata, handles)
% hObject    handle to Speak (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
text=get(handles.text2,'String');
if isempty(text)
text = 'Write something to speak';
end
% Count the rows only after the default text has been substituted;
% otherwise an empty editor gives nRows = 0 and nothing is spoken
nRows = size(text, 1);
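try
% Load the .NET speech assembly (Windows only)
NET.addAssembly('System.Speech');
Speaker = System.Speech.Synthesis.SpeechSynthesizer;
for n=1:nRows
rwtxt=text(n,:);
% A row is a char vector (recognized text) or a 1x1 cell (loaded file)
if ~iscell(rwtxt)
rwtxt = {rwtxt};
end
for k=1:length(rwtxt)
Speaker.Speak(rwtxt{k});
end
end
catch err
warning('Text-to-speech failed: %s', err.message);
end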
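`Speak_Callback` handles two shapes of editor content: a char matrix produced by recognition and a cell array produced by loading a file. The sketch below, with the hypothetical name `text_to_speak_lines`, normalizes both into a cell array of trimmed, non-blank lines and falls back to the same default sentence when nothing is left. It only prepares the strings; the speaking itself would still go through the `System.Speech` synthesizer as in the callback, which is Windows-only and is not exercised here.

```matlab
function speakLines = text_to_speak_lines(editorText, defaultText)
% TEXT_TO_SPEAK_LINES  Normalise editor content into lines to be spoken.
%   Hypothetical helper. EDITORTEXT is what get(handles.text2,'String')
%   returns: a char vector, a char matrix or a cell array of char vectors.
%   Blank lines are dropped; if nothing is left, DEFAULTTEXT is used.
if nargin < 2
    defaultText = 'Write something to speak';
end
speakLines = strtrim(cellstr(editorText));        % any shape -> trimmed cell array
speakLines = speakLines(~cellfun(@isempty, speakLines));
if isempty(speakLines)
    speakLines = {defaultText};                   % same fallback as Speak_Callback
end
end
```

With this helper the loop in the callback reduces to `for k = 1:numel(speakLines), Speaker.Speak(speakLines{k}); end`, and blank rows no longer produce silent calls to the synthesizer.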
Database Design (using SQL Server 2008 R) :

Table Name : File_Data :

Figure 3.10 : File data.

Some Data in a Table :

Figure 3.11 : Some data in a database table.

Microsoft SQL Server ODBC in MATLAB for Windows :

Figure 3.12 : Database Explorer in MATLAB.

List of Text in Database :

Figure 3.13 : List of text in the database.

On Opening Form :

% --- Executes just before list_files is made visible.


function list_files_OpeningFcn(hObject, eventdata, handles, varargin)
% This function has no output args, see OutputFcn.
% hObject    handle to figure
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% varargin   command line arguments to list_files (see VARARGIN)
handles.edit=0;
conn = database('dbFiles','sa','123');
curs = exec(conn,['select * from File_Data']);
setdbprefs('DataReturnFormat','cellarray')
curs=fetch(curs);
a=curs.Data;
% An empty result is returned as the cell {'No Data'}, so compare its content
if ~isequal('No Data',a{1})
set(handles.listbox1,'String',a(:,2))
set(handles.listbox1,'Value',1)
set(handles.edit_id,'String',a(1,1))
set(handles.edit_name,'String',a(1,2))
set(handles.edit_date,'String',a(1,5))
set(handles.edit_location,'String',a(1,4))
set(handles.edit_text,'String',a(1,3))
% a(1,6) is a 1x1 cell and never empty, so test its content a{1,6}
if isempty(a{1,6})
set(handles.edit_note,'String','There is no note');
else
set(handles.edit_note,'String',a(1,6));
end
end
% Choose default command line output for list_files
handles.output = hObject;
% Update handles structure
guidata(hObject, handles);
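The opening function decides whether to show "There is no note" by testing the fetched note for emptiness. Because `btnOk_Callback` stores `NaN` when the user leaves the placeholder note unchanged, a missing note may come back from SQL Server as a NULL, which the Database Toolbox can report as `NaN` or as the string `'null'` depending on its NULL-handling preferences (set with `setdbprefs`), rather than as an empty value. The sketch below, a hypothetical helper named `note_or_default`, treats all three cases as "no note".

```matlab
function note = note_or_default(value, fallback)
% NOTE_OR_DEFAULT  Turn a fetched 'note' value into text for the form.
%   Hypothetical helper. VALUE is one cell of the fetched data, e.g. a{1,6}.
%   A missing note may arrive as empty, as NaN, or as the string 'null',
%   depending on how the database connection reports NULL values.
if nargin < 2
    fallback = 'There is no note';
end
isMissing = isempty(value) ...
    || (isnumeric(value) && all(isnan(value(:)))) ...
    || (ischar(value) && strcmpi(strtrim(value), 'null'));
if isMissing
    note = fallback;
else
    note = value;
end
end
```

In the opening function the if/else on the note could then become `set(handles.edit_note,'String',note_or_default(a{1,6}))`.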

Open File :

Figure 3.14 : Opening a file in Notepad file format.

% --- Executes on button press in btn_open.


function btn_open_Callback(hObject, eventdata, handles)
% hObject    handle to btn_open (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
id=get(handles.edit_id,'String');
if ~isempty(id)
fname=get(handles.edit_name,'String');
fname=strcat(fname,'.txt');
pathname=get(handles.edit_location,'String');
filepath=fullfile(pathname,fname);
txt=get(handles.edit_text,'String');
ee=exist(filepath{1},'file');
if isequal(ee,2)
winopen(filepath{1})
else
button = questdlg(['The file has been damaged or its location has changed.',char(10),'What do you want to do?'], ...
'Warning','Create','Delete','Cancel','Cancel');
switch button
case 'Create'
% Re-create the file from the text stored in the database
fid = fopen(filepath{1}, 'wt');
nRows = size(txt, 1);
for k=1:nRows
fprintf(fid,'%s\n',txt{k,:}); % write one line to the file
end
fclose(fid);
winopen(filepath{1});
case 'Delete'
% btn_del_Callback asks for its own confirmation before deleting
btn_del_Callback(hObject, eventdata, handles);
otherwise % 'Cancel' or dialog closed
return;
end
end
end

Edit :

Figure 3.15 : Edited text in notepad file.

% --- Executes on button press in pushbutton5.


function btn_edit_Callback(hObject, eventdata, handles)
% hObject    handle to pushbutton5 (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)

id=get(handles.edit_id,'String');
if ~isempty(id)
handles.edit=handles.edit+1;
if handles.edit==1
set(handles.edit_note,'Enable','on')
set(handles.edit_note,'BackgroundColor',[1.0 1.0 1.0]);
else
handles.edit=0;
conn = database('dbFiles','sa','123');
edit_txt=get(handles.edit_note,'String');
if ~isequal(edit_txt,'There is no note')
whereclause=strcat('where id=',id);
update(conn,'File_Data',{'note'},{edit_txt},whereclause)
set(handles.edit_note,'Enable','inactive')
set(handles.edit_note,'BackgroundColor',[0.961 0.976 0.992]);
helpdlg('The note has been updated.','Update')
else
set(handles.edit_note,'Enable','inactive')
set(handles.edit_note,'BackgroundColor',[0.961 0.976 0.992]);
end
close(conn)
end
% Update handles structure
guidata(hObject, handles);
end

Delete From Database :


% --- Executes on button press in btn_del.
function btn_del_Callback(hObject, eventdata, handles)
% hObject    handle to btn_del (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
id=get(handles.edit_id,'String');
if ~isempty(id)
button = questdlg('Are you sure you want to delete?', ...
'Warning','OK','Cancel','Cancel');
switch button
case 'OK'
id=get(handles.edit_id,'String');
query=strcat('delete from File_Data where id=',id);
conn = database('dbFiles','sa','123');
curs = exec(conn,query{1});
curs = exec(conn,'select * from File_Data');
setdbprefs('DataReturnFormat','cellarray')
curs=fetch(curs);
a=curs.Data;
if ~isequal('No Data',a{1})
set(handles.listbox1,'String',a(:,2))
set(handles.listbox1,'Value',1)
set(handles.edit_id,'String',a(1,1))
set(handles.edit_name,'String',a(1,2))
set(handles.edit_date,'String',a(1,5))
set(handles.edit_location,'String',a(1,4))
set(handles.edit_text,'String',a(1,3))
if isempty(a{1,6}) % a(1,6) is a 1x1 cell; test its content
set(handles.edit_note,'String','There is no note');
else
set(handles.edit_note,'String',a(1,6));
end
else

set(handles.listbox1,'String','')
set(handles.edit_id,'String','')
set(handles.edit_name,'String','')
set(handles.edit_date,'String','')
set(handles.edit_location,'String','')
set(handles.edit_text,'String','')
set(handles.edit_note,'String','')
end
close(curs)
close(conn)
helpdlg('The file has been deleted.','Delete')
otherwise % 'Cancel' or dialog closed
return;
end
end
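`btn_del_Callback` builds its DELETE statement by concatenating the text of the `edit_id` field directly into SQL. The sketch below, a hypothetical helper `delete_query_for`, converts the id to a number first and refuses anything that is not a positive integer, so a damaged or edited id field cannot produce an unintended statement. It accepts the three shapes the id can take in this form: a char vector, a 1x1 cell holding one, or a number.

```matlab
function query = delete_query_for(id)
% DELETE_QUERY_FOR  Build the DELETE statement used when removing a file.
%   Hypothetical helper. ID may be what get(handles.edit_id,'String')
%   returns (a char vector or a 1x1 cell holding one) or a plain number.
if iscell(id)
    id = id{1};
end
if ischar(id)
    id = str2double(id);          % non-numeric text becomes NaN
end
if ~isscalar(id) || ~isfinite(id) || id ~= fix(id) || id < 1
    error('delete_query_for:badId', 'File id must be a positive integer.');
end
query = sprintf('delete from File_Data where id=%d', id);
end
```

A call could look like `curs = exec(conn, delete_query_for(get(handles.edit_id,'String')));`. Parameterized statements are the more general remedy; this sketch only shows the validation idea.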

List of files :

Figure 3.16 : List of files.

% --- Executes on button press in btn_speak.


function btn_speak_Callback(hObject, eventdata, handles)
% hObject    handle to btn_speak (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
text=get(handles.edit_text,'String');
if ~isempty(text)
value=get(handles.edit_text,'String');
setappdata(0,'text',value)
close()
ocr_gui()
end

Return to the main form with the text :

Figure 3.17 : Returning to the main form with the text.

function ocr_gui_OpeningFcn(hObject, eventdata, handles, varargin)
% This function has no output args, see OutputFcn.
% hObject    handle to figure
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% varargin   command line arguments to ocr_gui (see VARARGIN)
% Choose default command line output for ocr_gui
handles.output = hObject;
text=getappdata(0,'text');
if ~isempty(text)
set(handles.text2,'String',text)
set(handles.Speak,'Enable','on')
rmappdata(0,'text');
end
% Update handles structure
guidata(hObject, handles);


CHAPTER FOUR

IMPLEMENTATION
4.1 Project Implementation :
1. Load an image in any format (BMP, JPG, PNG, etc.).

Figure 4.1 : Loading an image into the program.


2. The image is loaded and displayed.

Figure 4.2 : Viewing the image in the program.


3. View the image information by clicking the Image Info button.

Figure 4.3 : Viewing the image information.


4. Convert the image to grayscale and binarize it using a threshold value computed with Otsu's algorithm.
5. Page layout analysis. In this step we try to identify the text zones present in the image, so that only those regions are used for recognition and the rest of the image is ignored.
6. Detection and removal of lines.
7. Detection of text lines and words. Here we also need to take care of different font sizes and of the small spaces between words.
8. Recognition of characters. This is the main algorithm of OCR: the image of every character must be converted to the appropriate character code. Sometimes this algorithm produces several character codes for uncertain images. For instance, recognizing the image of the "I" character can produce the codes "I", "|", "1" and "l", and the final character code is selected later.

9. Click Recognize Text to get the text.

Figure 4.4 : Recognizing text.

10. Save the results to the selected output format, for instance a searchable TXT file, and store the name, text, location, path and note of the .txt file directly in the database.

11. Clicking Save to Notepad opens a form in which you enter the name and location of the file (Browse).

Figure 4.5 : Saving text in a notepad file.


12. Click OK to save the file and open it.

If a file with the same name already exists in the selected location, a message asks whether you want to override it or cancel and rename the file.

Figure 4.6 : Warning message for an existing file name.

Figure 4.7 : Opening a file in notepad.


13. Import a text file to be edited and read in the editor and converted into voice (text-to-speech conversion).

Figure 4.8 : The pattern classification process.

When you select the file, its contents are loaded into the edit text control :

Figure 4.9 : Loading the contents of the file into the edit text.
14. Use the database to view the recent documents that have been saved by this program.

Figure 4.10 : Viewing the recent documents using the database.

15. Open a text that you have saved in the database in Notepad.

Figure 4.11 : Opening the text of a Notepad file using the database.

16. You can edit the note.

Figure 4.12 : Editing in the Notepad file.

Figure 4.13 : Updating the editing.


17. Click Speak to load the text into the main form.

18. You can also delete a file from the list.

Figure 4.14 : Warning message when deleting a file from the list.

Figure 4.15 : Delete done message.
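Step 4 of the list above binarizes the grayscale image with a threshold chosen by Otsu's algorithm. In MATLAB this is usually done with `graythresh` from the Image Processing Toolbox; the sketch below, with the hypothetical name `otsu_level`, spells out the computation in base MATLAB so the idea is visible. For every candidate grey level t it computes the between-class variance sigma_B^2(t) = (mu_T * omega(t) - mu(t))^2 / (omega(t) * (1 - omega(t))) from the 256-bin histogram, where omega(t) is the fraction of pixels at or below t, mu(t) their cumulative mean and mu_T the global mean, and keeps the t that maximizes it. It assumes a `uint8` grayscale input and, like `graythresh`, returns the level scaled to [0, 1].

```matlab
function level = otsu_level(gray)
% OTSU_LEVEL  Global threshold by Otsu's method, returned in [0,1].
%   Sketch in base MATLAB (no toolbox). GRAY must be a uint8 image.
%   Pixels <= t form one class and pixels > t the other.
counts = accumarray(double(gray(:)) + 1, 1, [256 1]);  % histogram, levels 0..255
p = counts / sum(counts);                  % grey-level probabilities
omega = cumsum(p);                         % probability of class "<= t"
mu = cumsum(p .* (0:255)');                % cumulative mean up to t
muT = mu(end);                             % global mean
% Between-class variance for every candidate threshold t
sigmaB2 = (muT * omega - mu).^2 ./ (omega .* (1 - omega));
sigmaB2(~isfinite(sigmaB2)) = 0;           % t where one class is empty
[~, idx] = max(sigmaB2);                   % idx - 1 is the best grey level t
level = (idx - 1) / 255;
end
```

A typical use is `level = otsu_level(gray); bw = gray > round(level * 255);`. Whether the text ends up as true or false depends on the scan: dark text on a light background needs `~bw` to make the text pixels true.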
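Step 7 of the implementation list detects text lines and words. The project's own `lines` function is not reproduced in this chapter; a common way to perform this step is with projection profiles, where summing the ink along each row separates text lines and summing down each column of one line separates words. The sketch below, `projection_segments` (a hypothetical name), returns the first and last index of each inked run and bridges gaps narrower than `minGap`, which is one way to keep the small spaces between letters from splitting a word. It assumes a logical image in which text pixels are `true`.

```matlab
function segs = projection_segments(bw, dim, minGap)
% PROJECTION_SEGMENTS  Split a binary text image using a projection profile.
%   Hypothetical helper. BW is a logical image with text pixels = true.
%   DIM = 2 sums across each row (one value per row)     -> text lines.
%   DIM = 1 sums down each column (one value per column) -> words.
%   Gaps narrower than MINGAP empty rows/columns are bridged, so the small
%   spaces between letters of one word do not split it (default 1 = none).
%   SEGS is K-by-2; each row is [first last] index of one segment.
if nargin < 3
    minGap = 1;
end
inked = sum(bw, dim) > 0;               % true where a row/column has ink
inked = double(inked(:))';              % row vector regardless of DIM
d = diff([0, inked, 0]);
starts = find(d == 1);                  % first index of each inked run
stops = find(d == -1) - 1;              % last index of each inked run

segs = zeros(0, 2);
for k = 1:numel(starts)
    if ~isempty(segs) && starts(k) - segs(end, 2) - 1 < minGap
        segs(end, 2) = stops(k);        % gap too narrow: extend previous segment
    else
        segs(end + 1, :) = [starts(k), stops(k)]; %#ok<AGROW>
    end
end
end
```

Lines would be found first with `projection_segments(bw, 2)`, then each line `bw(r1:r2, :)` split into words with `projection_segments(line, 1, minGap)`. Choosing `minGap` relative to the line height is one way to account for the different font sizes mentioned in step 7.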

Conclusion :
In this project, we discussed the topics relevant to the development of TTS systems. We conducted MOS tests to evaluate the performance of the speech synthesizer. This report describes the successful completion of a simple text-to-speech translation using simple matrix operations. Because of this, the system is easy and efficient to implement, unlike other methods that involve many complex algorithms. The next step in improving this system would be to implement machine learning algorithms in order to support generalization.
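The conclusion attributes the recognition to simple matrix operations. One such operation, and a common choice for the character recognition of step 8, is template matching by correlation: each segmented character is compared with a stored image of every known character, and the label of the best-scoring template wins. The sketch below assumes the templates are already available as an H-by-W-by-N array with a matching label string; `match_template` and both inputs are hypothetical. The score is the same correlation coefficient that `corr2` from the Image Processing Toolbox computes, written out so that no toolbox is needed.

```matlab
function [ch, score] = match_template(glyph, templates, labels)
% MATCH_TEMPLATE  Classify one character image by template correlation.
%   Hypothetical helper. GLYPH is an H-by-W image already resized to the
%   template size; TEMPLATES is H-by-W-by-N; LABELS is a 1-by-N char
%   vector naming each template. SCORE is the best correlation in [-1,1].
g = double(glyph(:));
g = g - mean(g);                          % remove the mean, as corr2 does
n = size(templates, 3);
scores = -Inf(1, n);                      % flat templates keep -Inf
for k = 1:n
    t = templates(:, :, k);
    t = double(t(:));
    t = t - mean(t);
    denom = sqrt(sum(g .^ 2) * sum(t .^ 2));
    if denom > 0
        scores(k) = sum(g .* t) / denom;  % correlation coefficient
    end
end
[score, best] = max(scores);
ch = labels(best);
end
```

Returning the score as well as the label makes it possible to flag low-confidence characters, which is where ambiguous shapes such as "I", "|", "1" and "l" would need the later decision described in step 8.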

Suggestions for Future Work :


A number of open problems must be solved to allow the development of a truly complete image recognition and text-to-speech conversion system. These problems suggest a variety of research directions that need to be pursued to make such a system feasible.
First, we will add another feature to our project: speech-to-text conversion. Second, saving audio files in different audio file formats (WAV, MP3, VOX, RAW, etc.) with the help of database programs. Third, opening an audio file and converting its speech to text. Fourth, making the application able to open text in different file formats (PDF, DOCX, etc.). Fifth, saving text files in different file formats (PDF, DOCX, etc.) with the help of database programs.
Finally, we are interested in making our project more efficient, making it useful to different groups of people in the community, and spreading its features globally.


