INTRODUCTION
1.1 Project Overview :
This project demonstrates an editor that combines image, text, and voice
technologies. The user can extract the text contained in an image, or type
text directly in the editor, and have this text read aloud by speech
synthesis. The edited text can also be saved to a file, which is then listed
under the recent documents produced by this editor.
The project explores these ideas by developing Optical Character
Recognition (OCR) software, and then demonstrating that software through
a basic implementation of a text-to-speech conversion system. The system
loads an image in any format, extracts the text found in this image, reads
this text aloud, and stores the edited text in a file. The user can also
write, or copy and paste, text into the editor directly.
1.2 Problem :
Because information technology is advancing so quickly, technology is now
closely connected with the other fields of our lives. Technology, both
software and hardware, is used in many places by different segments of the
community, adults and children alike. The main problem is that one segment
of people has particular difficulty in dealing with technology: blind
people. Our project aims to help this segment of the community by converting
edited text into speech that blind people can listen to.
A second aim of the project comes from the fact that many images contain
text that the user sometimes needs for other purposes. In this case, our
project helps the user to obtain the text contained in an image by using
Optical Character Recognition (OCR).
1.3 Objectives :
A full realization of this concept would involve a few distinct steps :
Recognition System.
To develop the above system to exist on a programmable OCR such
that it operates independently of an external computing source, and
interacts with its software inputs and outputs independently.
Such a system would be integrated into the user's sources, use the
computer's speakers as output, and issue control files to software already
installed on the computer. Several significant factors must be considered
when designing both the Optical Character Recognition and the text-to-speech
systems so that they produce clear text and speech output.
1.4 Introduction To OCR :
The goal of Optical Character Recognition (OCR) is to classify optical
1.5 Text-to-Speech Software :
A Text-To-Speech (TTS) synthesizer is a computer-based system that
should be able to read any text aloud, whether it was entered directly into
the computer by an operator or scanned and submitted to an Optical Character
Recognition system. In the context of TTS synthesis, it is impractical to
record and store all the words of the language.
It is therefore more appropriate to define TTS as the automatic
production of speech through a grapheme-to-phoneme transcription of the
sentences to utter.
1.6 Project Methodologies :
1.6.1 OCR Methodology :
OCR software has been around for as long as computers have, connecting the
printed world with the electronic one. Traditional document imaging methods
use templates and algorithms in a two-dimensional environment to recognize
objects and patterns. OCR methods today recognize a spectrum of colors,
and they can distinguish between the background and the foreground in
documents. They de-skew, de-speckle and use 3-D image correction in order
to work with lower-resolution images taken from media such as faxes, the
internet and cell phone cameras.
OCR software uses two different kinds of optical character recognition:
feature extraction and matrix matching. Feature extraction recognizes
shapes using statistical and mathematical techniques to detect edges,
corners and ridges in a text font to identify the letters in a word, sentence
and paragraph. OCR software achieves the best results when the image has
the following conditions:
However, these conditions cannot always be met. The best OCR techniques
can still read words accurately in less ideal circumstances using matrix
matching.
One example of OCR is shown below. A portion of a scanned image of
text, borrowed from the web, is shown along with the corresponding (human
recognized) characters from that text.
1.6.2
1.7 Speech Synthesis :
Synthesized speech can be created by concatenating part of recorded
1.7.1
1.7.2
1.7.3
This TTS system is able to read any written text. This procedure is
called text normalization, preprocessing and tokenization. In this system, we
have developed a phonetic-based text-to-speech synthesis system. The speech
quality can be improved using the MATLAB language. The following figure
shows the block diagram of the TTS system.
1.8
decades. As we found out in our research, numerous models and theories
exist for the best way of implementing a speech synthesis system. Although
the models seemed intuitive from a high-level perspective, they quickly grew
in complexity as we got closer to implementation.
1.10 History of MATLAB :
1.11
Server 6.5, which also ran on top of Windows NT. SQL Server 7.0 now runs on
Windows NT as well as on Windows 95 and Windows 98.
Although you can run SQL Server 7.0 on a Windows 9x system, you do
not get all the functionality of SQL Server. When running it on the Windows
9x platform, you lose the capability to use multiple processors, Windows NT
security, NTFS (New Technology File System) volumes, and much more. We
strongly urge you to use SQL Server 7.0 on Windows NT rather than on
Windows 9x. Windows NT has other advantages as well. The NT platform is
designed to support multiple users. Windows 9x is not designed this way,
and your SQL Server performance degrades rapidly as you add more users.
SQL Server 7.0 is implemented as a service on either NT Workstation or
NT Server (which makes it run on the server side of Windows NT) and as an
application on Windows 95/98. The included utilities, such as the SQL Server
Enterprise Manager, operate from the client side of Windows NT Server or NT
Workstation. Of course, just like all other applications on Windows 9x, the
tools run as applications.
A service is an application NT can start when booting up that adds
functionality to the server side of NT. Services also have a generic application
programming interface (API) that can be controlled programmatically.
Threads originating from a service are automatically given a higher priority
than threads originating from an application.
CHAPTER TWO
PROJECT ANALYSIS
2.1 The Classification Process :
There are two steps in building a classifier, training and testing. These
steps can be broken down further into sub-steps :
1. Training :
2.2 OCR Pre-processing :
Segmentation: check connectivity of shapes, label, and isolate. MATLAB 6.1's
bwlabel and regionprops functions can be used. Difficulties arise with characters that
aren't connected, e.g. the letter i, a semicolon, or a colon (; or :).
Segmentation is by far the most important aspect of the pre-processing stage. It allows
the recognizer to extract features from each individual character. In the more complicated case of
handwritten text, the segmentation problem becomes much more difficult because letters tend to be
connected to each other.
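The difficulty with disconnected characters can be seen in a minimal sketch. It assumes the Image Processing Toolbox functions bwlabel and regionprops named above; the tiny test image and the overlap rule for merging parts are hypothetical and serve only as illustration.

```matlab
% Binary image of a lower-case "i": a dot above a stem (1 = ink).
BW = logical([0 1 0;
              0 0 0;
              0 1 0;
              0 1 0;
              0 1 0]);

% Label 8-connected components; the dot and the stem get separate labels.
[L, numParts] = bwlabel(BW, 8);

% One bounding box per component, in the form [x y width height].
stats = regionprops(L, 'BoundingBox');

% Column range covered by each component, derived from its bounding box.
colRange = zeros(numParts, 2);
for k = 1:numParts
    bb = stats(k).BoundingBox;
    colRange(k, :) = [ceil(bb(1)), floor(bb(1) + bb(3))];
end

% Hypothetical merge rule: parts whose column ranges overlap are treated
% as belonging to the same character.
sameChar = colRange(1, 1) <= colRange(2, 2) && colRange(2, 1) <= colRange(1, 2);
fprintf('components: %d, merge dot with stem: %d\n', numParts, sameChar);
```

Labeling alone splits the "i" into two components, so some rule of this kind is needed before each character is passed to the recognizer.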
2.3
Given a segmented (isolated) character, the useful features for recognition are :
1. Moment based features :
Think of each character image as a 2-D density function f(x, y). The 2-D moments of the character are:
5. Kurtosis
6. Higher order moments
2. Hough and Chain code transform
3. Fourier transform and series
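As an illustration of moment-based features, the sketch below assumes the standard definition of raw image moments, m_pq = sum over all pixels of x^p * y^q * f(x, y), and derives the mass, the centroid and a kurtosis value from it. The 4 x 4 test image and all variable names are hypothetical.

```matlab
% Small binary character image (1 = ink), used only for illustration.
f = [0 1 1 0;
     1 0 0 1;
     1 1 1 1;
     1 0 0 1];

[rows, cols] = size(f);
[x, y] = meshgrid(1:cols, 1:rows);   % x = column index, y = row index

% Raw moment m_pq = sum over pixels of x^p * y^q * f(x, y).
rawMoment = @(p, q) sum(sum((x .^ p) .* (y .^ q) .* f));

mass = rawMoment(0, 0);              % number of ink pixels
xc = rawMoment(1, 0) / mass;         % centroid, x coordinate
yc = rawMoment(0, 1) / mass;         % centroid, y coordinate

% Central moment mu_pq, taken about the centroid.
centralMoment = @(p, q) sum(sum(((x - xc) .^ p) .* ((y - yc) .^ q) .* f));

% Kurtosis of the ink distribution along x (fourth standardized moment).
kurtX = (centralMoment(4, 0) / mass) / (centralMoment(2, 0) / mass) ^ 2;

fprintf('mass = %d, centroid = (%.2f, %.2f), kurtosis(x) = %.3f\n', ...
    mass, xc, yc, kurtX);
```

Each of these numbers can serve as one entry of the feature vector that describes a segmented character.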
2.4
Given labeled sets of features for many characters, where the labels correspond to the particular
classes that the characters belong to, we wish to estimate a statistical model for each character
class. For example, suppose we compute two features for each realization of the characters 0
through 9. Plotting each character class as a function of the two features gives one group of points per class.
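One very simple statistical model is a mean feature vector per class, with an unseen character assigned to the nearest mean. The sketch below assumes this nearest-mean model; the feature values, the two classes and all names are hypothetical, and a real system would use more features and a richer model.

```matlab
% Hypothetical training data: rows are samples, columns are two features.
feat0 = [1.0 1.2; 0.8 1.0; 1.2 0.9];   % samples labeled '0'
feat1 = [3.0 3.1; 3.2 2.8; 2.9 3.3];   % samples labeled '1'

% Training: estimate one mean vector per class.
classMeans = [mean(feat0, 1); mean(feat1, 1)];
classLabels = '01';

% Testing: assign an unseen feature vector to the nearest class mean.
sample = [2.7 3.0];
diffs = classMeans - repmat(sample, size(classMeans, 1), 1);
[~, idx] = min(sum(diffs .^ 2, 2));
predicted = classLabels(idx);

fprintf('predicted class: %s\n', predicted);
```

The two phases in the code correspond to the training and testing steps of the classification process.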
Optical Character Recognition deals with the recognition of optically processed characters.
Reliably interpreting text from real-world photos is a challenging problem because of variations in
environmental factors, even though it becomes easier with the best open-source OCR engines.
CHAPTER THREE
PROJECT DESIGN
The Project Design with the GUI (Graphical User Interface) :
Load Image :
Recognize Text :
"In Folder " letters_numbers
Create Templates :
%CREATE TEMPLATES
clc;
close all;
folder='letters_numbers';
%Letter (A.bmp ... Z.bmp)
letter=[];
for ch='A':'Z'
    letter=[letter imread(fullfile(folder,[ch '.bmp']))];
end
%Number (1.bmp ... 9.bmp, then 0.bmp)
number=[];
for ch='1234567890'
    number=[number imread(fullfile(folder,[ch '.bmp']))];
end
%lower case letters (a.png ... z.png)
lowercase=[];
for ch='a':'z'
    lowercase=[lowercase imread(fullfile(folder,[ch '.png']))];
end
%*-*-*-*-*-*-*-*-*-*-*
character=[letter number lowercase];
%Each template is 42 x 24 pixels; 26+10+26 = 62 templates in total
templates=mat2cell(character,42,repmat(24,1,62));
save ('templates','templates')
clear all
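How mat2cell cuts the concatenated strip back into individual templates can be seen on a smaller case. The sketch below uses three constant-valued stand-in tiles instead of real character images; the tile contents and variable names are hypothetical.

```matlab
% Three stand-in "templates" of 42 x 24 pixels, filled with 1, 2 and 3.
tileRows = 42;
tileCols = 24;
numTiles = 3;
strip = [ones(tileRows, tileCols), ...
         2 * ones(tileRows, tileCols), ...
         3 * ones(tileRows, tileCols)];

% Split the strip back into one cell per tile: a single block of 42 rows,
% and numTiles blocks of 24 columns each.
tiles = mat2cell(strip, tileRows, repmat(tileCols, 1, numTiles));

disp(size(tiles));      % size of the cell array
disp(size(tiles{2}));   % size of one template
```

The column widths passed to mat2cell must add up to the width of the strip, which is why every template image has to be exactly 24 pixels wide.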
Read Letter :
%function read_letter
function letter=read_letter(imagn,num_letras)
% Computes the correlation between template and input image
% and its output is a string containing the letter.
% Size of 'imagn' must be 42 x 24 pixels
% Example:
% imagn=imread('D.bmp');
% letter=read_letter(imagn,num_letras)
%load templates
global templates
% Characters in the same order as the templates:
% 'A'-'Z' (1-26), '1'-'9' and '0' (27-36), 'a'-'z' (37-62)
charset=['A':'Z' '1':'9' '0' 'a':'z'];
comp=zeros(1,num_letras);
for n=1:num_letras
    comp(n)=corr2(templates{1,n},imagn);
end
%pause(1)
vd=find(comp==max(comp),1); % index of the first best match
%*-*-*-*-*-*-*-*-*-*-*-*-*
if isempty(vd) || vd>numel(charset)
    letter='l'; % fallback when no template matches
else
    letter=charset(vd);
end
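The matching step relies on the 2-D correlation coefficient between the input and each template. The sketch below computes that coefficient from its usual definition without any toolbox, assuming it is the quantity corr2 returns, and picks the best of two toy templates; the 3 x 3 images and the labels are hypothetical.

```matlab
% 2-D correlation coefficient between two equally sized matrices:
% sum of products of mean-removed values, normalized by both energies.
corrCoef2 = @(a, b) sum(sum((a - mean(a(:))) .* (b - mean(b(:))))) / ...
    sqrt(sum(sum((a - mean(a(:))) .^ 2)) * sum(sum((b - mean(b(:))) .^ 2)));

% Two toy 3 x 3 templates and a noisy glyph resembling the first one.
tplBar  = [0 1 0; 0 1 0; 0 1 0];   % vertical bar
tplDash = [0 0 0; 1 1 1; 0 0 0];   % horizontal bar
charset = '|-';                    % label for each template
glyph   = [0 1 0; 0 1 0; 0 1 1];   % vertical bar with one extra pixel

% Score the glyph against every template and keep the best match.
scores = [corrCoef2(tplBar, glyph), corrCoef2(tplDash, glyph)];
[~, best] = max(scores);
recognized = charset(best);

fprintf('scores: %.3f %.3f -> recognized "%s"\n', scores(1), scores(2), recognized);
```

Because the coefficient is normalized, a perfect match scores 1 regardless of image brightness, which is why the input only needs to be resized to the template size before comparison.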
Letter Crop :
%function letter_in_a_line
function [fl re space]=letter_crop(im_texto)
% Divide letters in lines
im_texto=clip(im_texto);
num_filas=size(im_texto,2);
%figure,imshow(im_texto); %title('line sent in the function letter');
for s=1:num_filas
    sum_col = sum(im_texto(:,s));
    if sum_col==0
        nm=im_texto(:,1:s-1); % First letter matrix
        %figure,imshow(nm);
        %title('first letter in the function letter_in_a_line');
        %pause(1);
        rm=im_texto(:,s:end);% Remaining line matrix
        %figure,imshow(rm);
        %title('remaining letters in the function letter_in_a_line');
        %pause(1);
        fl = clip(nm);
        %pause(1);
        re=clip(rm);
        space = size(rm,2)-size(re,2);
        %*-*-*Uncomment lines below to see the result*-*-*
        %subplot(2,1,1);imshow(fl);
        %subplot(2,1,2);imshow(re);
        break
    else
        fl=im_texto;%Only one letter.
        re=[ ];
        space = 0;
    end
end
function img_out=clip(img_in)
[f c]=find(img_in);
img_out=img_in(min(f):max(f),min(c):max(c));
Lines Crop :
function [fl re]=lines(im_texto)
% Divide text in lines
% im_texto->input image; fl->first line; re->remain line
% Example:
% im_texto=imread('TEST_3.jpg');
% [fl re]=lines(im_texto);
% subplot(3,1,1);imshow(im_texto);title('INPUT IMAGE')
% subplot(3,1,2);imshow(fl);title('FIRST LINE')
% subplot(3,1,3);imshow(re);title('REMAIN LINES')
im_texto=clip(im_texto);
num_filas=size(im_texto,1);
for s=1:num_filas
    if sum(im_texto(s,:))==0
        nm=im_texto(1:s-1, :); % First line matrix
        rm=im_texto(s:end, :);% Remain line matrix
        fl = clip(nm);
        re=clip(rm);
        %*-*-*Uncomment lines below to see the result*-*-*
        % subplot(2,1,1);imshow(fl);
        % subplot(2,1,2);imshow(re);
        break
    else
        fl=im_texto;%Only one line.
        re=[ ];
    end
end
function img_out=clip(img_in)
[f c]=find(img_in);
img_out=img_in(min(f):max(f),min(c):max(c));%Crops image
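The line-splitting idea, finding the first image row that contains no ink, can be tried on a toy page. This is a simplified sketch: it skips the blank row directly instead of clipping afterwards, and the 5 x 4 page and variable names are hypothetical.

```matlab
% Toy binary "page": two text lines separated by one blank row (1 = ink).
page = [1 1 0 1;
        1 0 1 1;
        0 0 0 0;
        0 1 1 0;
        1 1 1 1];

rowInk = sum(page, 2);             % ink pixels in each row
blankRow = find(rowInk == 0, 1);   % first completely empty row

if isempty(blankRow)
    firstLine = page;              % only one line on the page
    remaining = [];
else
    firstLine = page(1:blankRow - 1, :);
    remaining = page(blankRow + 1:end, :);
end

disp(firstLine);
disp(remaining);
```

The same projection applied to column sums instead of row sums separates the letters within one line.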
rc = fl;
while 1
    %Fcn 'letter_crop' separate letters in a line
    [fc rc space]=letter_crop(rc); %fc = first letter in the line
                                   %rc = remaining cropped line
                                   %space = space between the letter
                                   %        cropped and the next letter
    %uncomment below line to see letters one by one
    %figure,imshow(fc);pause(0.5)
    img_r = imresize(fc,[42 24]); %resize letter so that correlation
                                  %can be performed
    n = n + 1;
    spacevector(n)=space;
    %Fcn 'read_letter' correlates the cropped letter with the images
    %given in the folder 'letters_numbers'
    letter = read_letter(img_r,num_letras);
    %letter concatenation
    word = [word letter];
    if isempty(rc) %no more characters
        break;
    end
end
%-------------------------------------------------%
max_space = max(spacevector);
no_spaces = 0;
for x= 1:n %loop to introduce space at requisite locations
    if spacevector(x+no_spaces)> (0.75 * max_space)
        no_spaces = no_spaces + 1;
        for m = x:n
            word(n+x-m+no_spaces)=word(n+x-m+no_spaces-1);
        end
        word(x+no_spaces) = ' ';
        spacevector = [0 spacevector];
    end
end
%fprintf(fid,'%s\n',lower(word));%Write 'word' in text file (lower)
%fprintf(fid,'%s\n',word);%Write 'word' in text file (upper)
text = char(text, word);
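The space-insertion rule is easier to follow in a simplified form: a blank is placed after every letter whose following gap exceeds 0.75 times the widest gap in the line. The sketch below builds a new string instead of shifting characters in place; the letters and gap widths are hypothetical.

```matlab
% Recognized letters of one line and the gap (in pixels) after each letter.
letters = 'HELLOWORLD';
gaps = [2 2 3 2 12 2 2 3 2 0];

% A gap counts as a word break when it exceeds 75% of the widest gap.
threshold = 0.75 * max(gaps);

lineText = '';
for k = 1:numel(letters)
    lineText(end + 1) = letters(k);
    if gaps(k) > threshold
        lineText(end + 1) = ' ';
    end
end

disp(lineText);
```

One consequence of a threshold relative to the widest gap is that a line holding a single word with uniform letter spacing would receive a blank after almost every letter, so an absolute minimum gap may be worth adding.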
        case 'Override'
            fid = fopen(filepath, 'wt');
        case 'Cancle'
            return;
    end
else
    fid = fopen(filepath, 'wt');
end
h = waitbar(0,'Please wait...');
steps = 100;
for step = 1:steps
    % computations take place here
    waitbar(step / steps)
end
close(h)
%fprintf(fid,'%s\n',lower(word));%Write 'word' in text file (lower)
txt=getappdata(0,'txt');
rmappdata(0,'txt');
nRows = size(txt, 1) ;
stxt='';
if nRows>1
    for k=1:nRows
        fprintf(fid,'%s\n',txt(k,:));%Write 'word' in text file (upper)
        stxt=strcat(stxt,32,txt(k,:),10);
    end
else
    fprintf(fid,'%s\n',txt);
    stxt=txt;
end
fclose(fid);
date1=date;
decr=get(handles.edit_note,'String');
if strcmp(decr,'Write Note here ...')
    decr=NaN;
end
%data1 = cell(1,6);
columns={'id','name_file','text','path_file','time','note'};
data1={handles.lastid fname stxt pathname date1 decr};
conn = database('dbFiles','sa','123');
insert(conn,'File_Data',columns,data1);
close(conn)
% Update handles structure
guidata(hObject, handles);
%Open 'text.txt' file
winopen(filepath)
close
% hObject    handle to load (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
[filename,pathname] = uigetfile('*.txt;','select txt file');
filepath=fullfile(pathname,filename);
h = waitbar(0,'Please wait...');
steps = 100;
for step = 1:steps
    % computations take place here
    waitbar(step / steps)
end
close(h);
%# preassign txt to some large cell array
txt=cell(10000,1);
sizS = 10000;
lineCt = 1;
fid = fopen(filepath,'r');
tline = fgetl(fid);
while ischar(tline)
    txt{lineCt} = tline;
    lineCt = lineCt + 1;
    %# grow txt if necessary
    if lineCt > sizS
        txt = [txt;cell(10000,1)];
        sizS = sizS + 10000;
    end
    tline = fgetl(fid);
end
%# remove empty entries in txt
txt(lineCt:end) = [];
set(handles.text2,'String',txt)
set(handles.Speak,'Enable','on')
fclose(fid);
Loading file in edit tool :
Figure 3.9 : Loading a text of notepad file format in the edit tool.
Text To Speech
end
for k=1:length(rwtxt)
Speaker.Speak (rwtxt{k});
end
end
catch
warning(['Not working !!']);
end
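A minimal sketch of the speaking step is given below. It assumes that the Speaker object is a Windows Speech API voice created with actxserver('SAPI.SpVoice'), which the listing above does not show; the sample text and variable names are hypothetical. On systems without that interface the sketch only prepares the lines and stays silent.

```matlab
% Text as it might come from the edit box: one cell per line.
rawText = {'Hello world.', '', '   ', 'This is a test.'};

% Keep only lines that contain something other than white space.
keep = ~cellfun(@(s) isempty(strtrim(s)), rawText);
spokenLines = rawText(keep);

% Speak each line through the Windows Speech API, if it is available.
if ispc && exist('actxserver') > 0
    try
        speaker = actxserver('SAPI.SpVoice');   % assumed speech engine
        for k = 1:numel(spokenLines)
            speaker.Speak(spokenLines{k});
        end
    catch err
        warning('Speech output failed: %s', err.message);
    end
end
```

Dropping empty lines before speaking avoids passing blank strings to the speech engine, and reporting the caught error message gives more detail than a fixed warning text.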
Design DataBase (using SQL Srver 2008 R) :
On Opening Form :
Open File :
            for k=1:nRows
                fprintf(fid,'%s\n',txt{k,:});%Write 'word' in text file (upper)
            end
            fclose(fid);
            winopen(filepath{1});
        case 'Delete'
            button = questdlg(['Are you sure you want to delete?'], ...
                'Warning','OK','Cancle','Cancle');
            switch button
                case 'OK'
                    btn_del_Callback(hObject, eventdata, handles);
                case 'Cancle'
                    return;
            end
        case 'Cancle'
            return;
    end
end
end
Edit :
id=get(handles.edit_id,'String');
if ~isempty(id)
    handles.edit=handles.edit+1;
    if handles.edit==1
        set(handles.edit_note,'Enable','on')
        set(handles.edit_note,'BackgroundColor',[1.0 1.0 1.0]);
    else
        handles.edit=0;
        conn = database('dbFiles','sa','123');
        edit_txt=get(handles.edit_note,'String');
        if ~isequal(edit_txt,'There is no note')
            whereclause=strcat('where id=',id);
            update(conn,'File_Data',{'note'},{edit_txt},whereclause)
            set(handles.edit_note,'Enable','inactive')
            set(handles.edit_note,'BackgroundColor',[0.961 0.976 0.992]);
            helpdlg('You are Done update','Update')
        else
            set(handles.edit_note,'Enable','inactive')
            set(handles.edit_note,'BackgroundColor',[0.961 0.976 0.992]);
        end
    end
    % Update handles structure
    guidata(hObject, handles);
end

        set(handles.listbox1,'String','')
        set(handles.edit_id,'String','')
        set(handles.edit_name,'String','')
        set(handles.edit_date,'String','')
        set(handles.edit_location,'String','')
        set(handles.edit_text,'String','')
        set(handles.edit_note,'String','')
        end
        close(curs)
        close(conn)
        helpdlg('Delete if Done','Delete')
    case 'Cancle'
        return;
end
end
List of files :
CHAPTER FOUR
IMPLEMENTATION
4.1 Project Implementation :
1. Loading any image format (bmp, jpg, png etc )
Saving results to selected output format, for instance, searchable TXT file format.
10. Clicking on Save to Notepad will open a form to insert the name and location of the file (Browse),
and store the (name, text, location, path and note) of the txt file in the database directly.
11. If the file name already exists in the location you select, a message will
ask whether you want to override it or cancel to rename the file.
13. When you select the file, its text contents will be loaded into the edit text :
Figure 4.9 : Loading the contents of the file into the edit text.
Using the database to view the recent documents that have been saved by
this program.
14.
15.
16.
18.
Conclusion :
In this project, we discussed the topics relevant to the development of TTS systems. We
conducted MOS tests to evaluate the performance of the speech synthesizer. This report describes the
successful completion of a simple text-to-speech translation by simple matrix operations. The
system is therefore easy and efficient to implement, unlike other methods that involve many
complex algorithms. The next step in improving this system would be
to implement machine learning algorithms in order to support generalization.
text to speech conversion and recognition system. These problems suggest a variety of research
directions that need to be pursued to make such a system feasible.
First, we will add another feature to our project: speech-to-text conversion.
Second, saving audio files in different audio file formats (WAV, MP3, VOX,
RAW, etc.) with the help of database programs. Third, opening an audio file and converting
its speech to text. Fourth, enabling the application to open text in different
text file formats (pdf, docx, etc.). Fifth, saving text files in different text file
formats (pdf, docx, etc.) with the help of database programs.
Finally, we want to make our project more efficient, useful to
different segments of the community, and available globally.