

SEMINAR REPORT
ON
SPEECH RECOGNITION
SESSION (2009-2010)
SUBMITTED TO THE
BOARD OF RAJASTHAN TECHNICAL UNIVERSITY, KOTA
FOR THE B.TECH. OF
ELECTRICAL ENGINEERING

SUBMITTED TO:-                          SUBMITTED BY:-
MR. J. N. MATHUR                        PRAMOD YADAV
HEAD OF ELECTRICAL DEPARTMENT           FINAL YEAR (EE)
SPEECH RECOGNITION
INTRODUCTION
Artificial intelligence (AI) involves two basic ideas. First, it involves studying the thought processes of human beings. Second, it deals with representing those processes via machines (like computers, robots, etc.). AI is the behaviour of a machine which, if performed by a human being, would be called intelligence. It makes machines smarter and more useful, and is less expensive than natural intelligence. Natural language processing (NLP) refers to artificial intelligence methods of communicating with a computer in a natural language like English. The main objective of an NLP program is to understand input and initiate action.
2. The input words are scanned and matched against internally stored known words. Identification of a keyword causes some action to be taken. In this way, one can communicate with the computer in one's language. No special commands or computer language are required. There is no need to enter programs in a special language for creating software. VoiceXML takes speech recognition even further. Instead of talking to your computer, you're essentially talking to a web site, and you're doing this over the phone. OK, you say, well, what exactly is speech recognition? Simply put, it is the process of converting spoken input to text. Speech recognition is thus sometimes referred to as speech-to-text.
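
The keyword-matching idea described above can be pictured with a short Python sketch. The keyword table, the actions, and the sample utterance are all invented purely for illustration; they do not describe any particular product.

    # Minimal sketch of keyword-driven command handling (illustrative only).
    # The keywords and actions are hypothetical; the recognizer is assumed to
    # have already converted the spoken input to text.
    KNOWN_WORDS = {
        "open": lambda: print("Opening the file dialog..."),
        "save": lambda: print("Saving the document..."),
        "exit": lambda: print("Closing the application..."),
    }

    def handle_utterance(text):
        """Scan the recognized text and trigger the first known keyword found."""
        for word in text.lower().split():
            action = KNOWN_WORDS.get(word)
            if action:
                action()
                return word
        print("No known keyword found in:", text)
        return None

    # Example: pretend the recognizer already produced this text.
    handle_utterance("please save my work")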
3. Speech recognition allows you to provide input to an application with your voice. Just like clicking with your mouse, typing on your keyboard, or pressing a key on the phone keypad provides input to an application, speech recognition allows you to provide input by talking. In the desktop world, you need a microphone to be able to do this. In the VoiceXML world, all you need is a telephone.
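
For the desktop case, one way to experiment is the third-party Python package SpeechRecognition together with a microphone. The sketch below assumes that package (and PyAudio for microphone access) is installed and uses its free Google Web Speech backend; treat it as a sketch rather than a recommended setup.

    # Sketch: capture one spoken utterance from a microphone and convert it
    # to text. Assumes the third-party "SpeechRecognition" and "PyAudio"
    # packages are installed; error handling is kept minimal.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)  # calibrate for background noise
        print("Say something...")
        audio = recognizer.listen(source)

    try:
        text = recognizer.recognize_google(audio)    # web-based recognizer
        print("You said:", text)
    except sr.UnknownValueError:
        print("Speech was not understood.")
    except sr.RequestError as err:
        print("Recognition service error:", err)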
4. This paper characterizes the methodology of Artificial Intelligence by looking at research in speech understanding, a field where AI approaches contrast starkly with the alternatives, particularly engineering approaches. Four values of AI stand out as influential: ambitious goals, introspective plausibility, computational elegance, and wide significance. The paper also discusses the utility and larger significance of these values.
5. AI is often defined in terms of the problems it studies. But in fact, AI is not the study of intelligent behavior etc.; it is a way to study it. This is evident from the way AI is done in practice. This paper illustrates this by contrasting AI and alternative approaches to speech understanding. By so doing it brings out some key characteristics of AI methodology. This paper is primarily written for a general AI audience interested in methodological issues, complementing previous work (Cohen 1991; Brooks 1991a). It is also written for any AI researchers who are contemplating starting a project in speech understanding; it is intended to be the paper that, if available earlier, might have saved me four years of wasted effort. This paper does not provide technical discussion of specific issues in speech understanding; for that, see (Ward 1996) and the references given below.
H"#$%&'
6. The first speech recognizer appeared in 1952 and consisted of a device for the recognition of single spoken digits. Another early device was the IBM Shoebox, exhibited at the 1964 New York World's Fair.
7. One of the most notable domains for the commercial application of speech recognition in the United States has been health care, and in particular the work of the medical transcriptionist. According to industry experts, at its inception, speech recognition (SR) was sold as a way to completely eliminate transcription rather than make the transcription process more efficient, hence it was not accepted. It was also the case that SR at that time was often technically deficient. Additionally, to be used effectively, it required changes to the ways physicians worked and documented clinical encounters, which many if not all were reluctant to do. The biggest limitation to speech recognition automating transcription, however, is seen as the software. The nature of narrative dictation is highly interpretive and often requires judgment that may be provided by a real human but not yet by an automated system. Another limitation has been the extensive amount of time required by the user and/or system provider to train the software.
8. A distinction in ASR is often made between "artificial syntax systems", which are usually domain-specific, and "natural language processing", which is usually language-specific. Each of these types of application presents its own particular goals and challenges.
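
The difference can be made concrete with a toy example: an "artificial syntax" system accepts only utterances drawn from a small, hand-written command grammar, while an NLP system tries to handle arbitrary sentences of a language. The grammar and commands below are invented for illustration.

    # Toy "artificial syntax" recognizer: only utterances matching a tiny,
    # domain-specific command grammar are accepted. The grammar is hypothetical.
    import re

    # Grammar: "<verb> the <object>", e.g. "open the valve", "check the pump".
    COMMAND = re.compile(r"^(open|close|check) the (valve|pump|heater)$")

    def parse_command(utterance):
        match = COMMAND.match(utterance.lower().strip())
        if match:
            verb, obj = match.groups()
            return {"action": verb, "target": obj}
        return None  # outside the domain grammar, so the utterance is rejected

    print(parse_command("Open the valve"))      # {'action': 'open', 'target': 'valve'}
    print(parse_command("please open valves"))  # None: not in the artificial syntax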
C()&)*$+&"#$"*# %, $(+ AI A--&%)*(
9. This section discusses the AI approach in terms of four key values of AI: ambitious goals, introspective plausibility, computational elegance, and wide significance. These remarks apply specifically to classical AI in its pure form, but are also more generally relevant, as discussed.
A./"$"%0# G%)1# )23 B%13 L+)-#
10. AI speech research, like AI research more generally, is motivated not by what can be achieved soon, but by a long-term vision. For speech, this has been the idea of a system able to produce an optimal interpretation based on exhaustive processing of the input. Engineers prefer to set goals towards which progress can be measured.

AI speech research tends, since existing systems are nowhere near optimal, to seek breakthroughs and radically new perspectives. Engineers tend to proceed by improving existing systems. AI speech research, like other AI research, tries to solve problems in their most general form, often by trying to construct a single, general-purpose system. Engineering is in large part the creation of solutions for specific problems, and the resulting accumulation of know-how useful for solving other problems.
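
One concrete example of the kind of measurable goal engineers favour (offered here only as an illustration; the example transcripts are invented) is word error rate: the edit distance between a reference transcript and the recognizer's output, divided by the length of the reference.

    # Word error rate (WER): edit distance between reference and hypothesis
    # word sequences, divided by the reference length (standard dynamic
    # programming formulation).
    def word_error_rate(reference, hypothesis):
        ref, hyp = reference.split(), hypothesis.split()
        # d[i][j] = edits needed to turn the first i reference words
        # into the first j hypothesis words
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i
        for j in range(len(hyp) + 1):
            d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution or match
        return d[len(ref)][len(hyp)] / len(ref)

    print(word_error_rate("recognize speech", "wreck a nice beach"))  # prints 2.0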
I2$&%#-+*$"4+ P1)0#"/"1"$'
11. AI speech research, like AI more generally, often makes recourse to introspection about human abilities. This subsection illustrates the use of introspection at four phases: when setting long-term goals, when setting short-term goals, for design, and when debugging. When setting long-term research goals, AI speech research, like AI research in general, aims to duplicate apparent human abilities. In particular, many AI speech researchers are inspired by the apparent near perfection of human speech understanding: the fact that people can understand just about anything you say to them, and even repeat back to you the words you said.

Scientific approaches to human speech understanding, in contrast, find it more productive to focus on what can be learned from the limitations of human speech understanding. (A simple demonstration of these limitations can be had by simply recording a conversation and later listening to it repeatedly; you will discover that you missed a lot when hearing it live.) In general, the feeling that perception and understanding is complete is an illusion of introspection (Brooks 1991a; Dennett 1991). When setting short-term goals, AI speech research, like AI more generally, values systems which do things which seem, introspectively, clever. Such cleverness is often found not in the overall performance but in the details of processing in specific cases.

12. For speech understanding, AI has emphasized the use of reasoning to compensate for failures of the low-level recognition component, often by selecting or creating word hypotheses for which the recognizer had little or no bottom-up evidence. Doing this can involve arbitrarily obscure knowledge and arbitrarily clever inferences, which makes for impressive traces of system operation. Engineers typically design and tune systems to work well on average; they seldom show off cleverness in specific cases. Few engineered speech systems do much explicit reasoning, and none bother to explicitly correct mis-recognitions; rather, they simply barge on to compute the most likely semantic interpretations. For design, AI speech research, like AI more generally, takes inspiration from human "cognitive architecture", as revealed by introspection. For speech, this has led to the use of protocol studies, done with a committee or with one man and a scratchpad. Both suggest a view of speech understanding as problem solving, and suggest a process where diverse knowledge sources cooperate by taking turns; for example, with a partially recognized input leading to a partial understanding, that understanding being used to "figure out" more words, leading to a better recognition result, then a better understanding, and so on.
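
This turn-taking refinement can be caricatured in a few lines of Python. Every "knowledge source", rule, and hypothesis below is invented purely to show the control structure, not to describe how any real system worked.

    # Toy illustration of knowledge sources taking turns refining a shared
    # word hypothesis (all rules and data are invented).
    def lexical_source(hyp):
        # Propose dictionary candidates for any unrecognized slot.
        return [w if w != "???" else ["flight", "fright", "light"] for w in hyp]

    def semantic_source(hyp):
        # Domain knowledge: in "book a ... to <city>" the slot is a travel word.
        resolved = []
        for w in hyp:
            if isinstance(w, list):
                travel = [c for c in w if c in ("flight", "train", "bus")]
                resolved.append(travel[0] if travel else w[0])
            else:
                resolved.append(w)
        return resolved

    hypothesis = ["book", "a", "???", "to", "boston"]
    for source in (lexical_source, semantic_source):  # one refinement cycle
        hypothesis = source(hypothesis)
    print(hypothesis)  # ['book', 'a', 'flight', 'to', 'boston']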
13. This has been fleshed out into, for example, "blackboard models", which include a focus on explicit representation of hypotheses, a focus on the interaction of knowledge sources, a focus on the scheduling of processing, and an image of iterative refinement over many cycles of computation. Scientific approaches, on the other hand, do not consider introspection to be reliable or complete. For example, introspection can easily focus on conscious "figuring out", explicit decisions, and the role of grammar, but probably not on automatic processing, parallel processing of multitudes of hypotheses, and the role of prosody. For development, AI speech research, like AI more generally, values systems which people can understand intuitively. This makes it possible to debug by examining behavior on specific cases and adjusting the system until it works in a way that introspectively seems right. By doing so the developer can see if the system is a proper implementation of his vision. More important, he can get a feel for whether that vision is workable. In other words, the developer can use experience with the internal workings of a program to leverage introspection about how a cognitive process might operate. This technique is perhaps the most distinctive aspect of AI methodology; it could be called "the hacker's path to intelligent systems" or perhaps "computational introspection". Engineers focus on the desired input-output behavior of a system and design algorithms to achieve it. They typically do not care about the introspective plausibility of the intermediate steps of the computation.
C%.-0$)$"%2)1 E1+5)2*+
14. AI speech research, like AI research more generally, postulates that knowledge is good and that more knowledge is better. In order to bring more knowledge to bear on specific decisions, integration of knowledge sources is considered essential. For speech, this means, most typically, wanting to use the full inventory of higher-level knowledge, including knowledge of syntax, semantics, domain, task and current dialog state, at the earliest stages of recognition and understanding.
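
A simple concrete form such integration can take is re-ranking: each recognizer hypothesis carries an acoustic score, and higher-level knowledge contributes a second score before the best hypothesis is chosen. The sketch below uses invented scores and a toy phrase table standing in for real higher-level knowledge.

    # Sketch: re-ranking recognizer hypotheses by combining an acoustic score
    # with a higher-level (language/domain) score. All numbers are invented.
    import math

    hypotheses = [
        ("recognize speech",   -12.0),   # (word string, acoustic log-score)
        ("wreck a nice beach", -11.5),   # acoustically slightly better
    ]

    # Toy "higher-level knowledge": log-probabilities of whole phrases.
    language_model = {
        "recognize speech":   math.log(0.02),
        "wreck a nice beach": math.log(0.0001),
    }

    LM_WEIGHT = 1.0  # how strongly the higher-level knowledge is trusted

    def combined_score(item):
        words, acoustic = item
        return acoustic + LM_WEIGHT * language_model.get(words, math.log(1e-9))

    best = max(hypotheses, key=combined_score)
    print(best[0])  # language knowledge overturns the raw acoustic ranking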
15. Engineers focus on performance, for which more knowledge may or may not be worthwhile, especially when processing time and memory are finite. Whether a specific type of knowledge is worthwhile, and if so when to apply it, are determined by experiment. AI speech research, like other AI research, involves an aesthetic sense of what designs are good. In particular, to build systems that can be extended and scaled up, AI researchers generally feel that elegant designs are required. For speech, the meaning of "elegant" has changed along with broader fashions in computer science; at various times it has included: explicit (declarative) representation of hypotheses and control structures, emergent properties, multiprocessor (distributed) parallelism, fine-grained (connectionist, massive) parallelism, uniformity, heterogeneity, symbolic reasoning, evidential reasoning, and so on. Engineers prefer to actually try to build big systems, rather than just build prototypes and argue that they will scale up.
6"3+ S"52","*)2*+
16. AI speech research, like AI research in general, has a larger purpose: researchers don't just want to solve the problem at hand, they also want their solution to inspire other work. AI speech researchers have generally wanted their work to be relevant to other problems in natural language processing and to linguistics, and have sought out and focused on phenomena of interest to those fields, such as ambiguity. AI speech researchers have also wanted to be relevant for the larger AI community. They have emphasized analogies relating speech understanding to other topics, such as search and planning. They have also emphasized ties to general AI issues, such as the Attention Problem, the Constraint Propagation Problem, the Context Problem, the Disambiguation Problem, the Evidential Reasoning Problem, the Knowledge Integration Problem, the Knowledge Representation Problem, the Noisy Input Problem, the Real-World Problem, the Reasoning with Uncertainty Problem, the Sensor Fusion Problem, the Signal-to-Symbol Problem, and a few others. AI speech researchers have also tried to make their work relevant for computer science more generally. Based on insights from speech understanding they have called for new architectures for computer networks, for software systems, and for computer hardware. Engineers prefer to work on goals rather than "interesting" problems. They also prefer to work on speech for its own sake, rather than for what it reveals about other problems. They tend to think in terms of real problems, not abstract ones.
OUTCOME
17. For speech understanding, and more generally, AI approaches have seemed more promising than traditional science and engineering. This is probably because AI methodology exploits introspection and aesthetic considerations, both of which seem to provide the right answers with relatively little effort. However, AI methodology has not fulfilled this promise for speech. The engineering approach, in contrast, has produced solid and impressive results. The significance of this is not obvious. Some AI researchers believe that this means that the speech community should be "welcomed back" to AI, as argued by several recent editorials. But the values of AI and mainstream (engineering-style) speech research are so different that reconciliation does not seem likely. Other AI researchers take the view that the results of engineering work are not interesting, presumably meaning that they are compelling neither introspectively nor aesthetically. Many further believe that the AI approach to speech will be vindicated in the end. A few strive towards this goal (I was one). However, AI goals conflict with other more important goals, and so it is hard to be optimistic about future attempts to apply AI methodology to the speech understanding problem. The conclusions of this case study are thus in line with the conclusions of Brooks' case study (Brooks 1991a). For both speech understanding and robotics, AI methodology turns out to be of little value. Whether this is also true in other cases is a question of great interest.
L)&5+& S"52","*)2*+
18. The values of AI outlined above actually best characterize the methodology of classical AI, ascendant in the 1970s, but less popular in recent years, with the proliferation of variant approaches to AI, including some, such as connectionism and Brooks-style AI (Brooks 1991a; Brooks 1991b), which reject many of the values of mainstream AI. Nevertheless, classical AI methodology still has wide influence. For one thing, these values were important when the topics and tasks of AI were being established. As a result, those working on knowledge representation, planning, reasoning, distributed intelligence, user modeling, etc. today, even if they do not share the classical values of AI, are working on problems posed by researchers who did. Moreover, despite changes in AI methodology (mostly in the direction of "methodological respectability", in the sense of importing values and techniques from engineering and science), these values are not only still alive, but remain intrinsic to the existence of AI as a field. It is true that, at recent AI conferences, most papers do not explicitly refer to these values, but the fact that the papers appear at AI conferences at all is tribute to them. If AI researchers did not rely on introspection, have grand goals, or value aesthetic considerations, most of them would drift off to conferences on computational psychology, user interface engineering, specific applications, etc. And without the belief that results should have wide significance and utility, groups of AI researchers would drift off to separate fields of speech, vision, motion, etc. In any case, it is clear that AI is a distinctive field of study in many ways. This paper has been an endeavor to pinpoint just what it is that is special about AI.
S-++*( S'2$(+#"#
19. An aspect of sound-processing research that has made faster progress than ASR is speech synthesis. Taking the knowledge from ASR, such as phonemes and the like, speech synthesizers have become relatively successful in generating understandable words and sentences. However, making a computer speak naturally, without the stoic voice reminiscent of old science-fiction robots, remains a greater challenge.
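
For simple experiments with synthesis on a desktop machine, a third-party offline Python package such as pyttsx3 can be used. The sketch below assumes that package is installed and that the operating system provides at least one voice.

    # Sketch: speaking a sentence with the third-party pyttsx3 package, which
    # drives whatever synthesis engine the operating system provides.
    import pyttsx3

    engine = pyttsx3.init()
    engine.setProperty("rate", 150)  # speaking rate in words per minute
    engine.say("Speech synthesis has made faster progress than recognition.")
    engine.runAndWait()              # block until the utterance finishes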
As a branch of AI, the importance of ASR and speech synthesis lies in the development of pattern-recognition programs that understand the bits of data that compose the message. Some of the early technologies from this field have found their way into the applications market, but they still need to be refined in order for a computer to communicate intelligently and naturally like people. Perhaps in the future, a person will be able to carry on an engaging conversation with his personal computer, which would combine the power of ASR, language processing, inferencing through a knowledge base, some other intelligent component that lets the computer be an active conversationalist and steer the discussion in a certain direction, and a speech synthesizer that makes the computer sound like a person.
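
Such a pipeline can be sketched as a simple loop. Every helper function below (recognize_speech, understand, consult_knowledge_base, speak) is a hypothetical placeholder standing in for a real component, not an actual API.

    # Hypothetical skeleton of the conversational loop described above; each
    # helper is a stand-in for a real component (ASR, NLP, knowledge base, TTS).
    def recognize_speech():            # ASR: audio in, text out
        return input("(pretend this came from a microphone) > ")

    def understand(text):              # NLP: text -> a crude "meaning"
        words = text.split()
        return {"topic": words[0] if words else "nothing"}

    def consult_knowledge_base(meaning):   # inference over stored knowledge
        return "Tell me more about " + meaning["topic"] + "."

    def speak(reply):                  # TTS: say the reply (here, just printed)
        print("COMPUTER:", reply)

    def converse(turns=3):
        for _ in range(turns):
            text = recognize_speech()
            meaning = understand(text)
            reply = consult_knowledge_base(meaning)
            speak(reply)

    converse()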
F0&$(+& "2,%&.)$"%2
20. Popular speech recognition conferences held each year or two include ICASSP, Eurospeech/ICSLP (now named Interspeech) and the IEEE ASRU. Conferences in the field of natural language processing, such as ACL, NAACL, EMNLP, and HLT, are beginning to include papers on speech processing. Important journals include the IEEE Transactions on Speech and Audio Processing (now named IEEE Transactions on Audio, Speech and Language Processing), Computer Speech and Language, and Speech Communication. Books like "Fundamentals of Speech Recognition" by Lawrence Rabiner can be useful to acquire basic knowledge but may not be fully up to date (1993). Other good sources are "Statistical Methods for Speech Recognition" by Frederick Jelinek and "Spoken Language Processing (2001)" by Xuedong Huang et al. More up to date is "Computer Speech" by Manfred R. Schroeder, second edition published in 2004. The recently updated textbook "Speech and Language Processing (2008)" by Jurafsky and Martin presents the basics and the state of the art for ASR. A good insight into the techniques used in the best modern systems can be gained by paying attention to government-sponsored evaluations such as those organised by DARPA (the largest speech recognition-related project ongoing as of 2007 is the GALE project, which involves both speech recognition and translation components).
21. In terms of freely available resources, Carnegie Mellon University's SPHINX toolkit is one place to start, both to learn about speech recognition and to begin experimenting. Another resource (free as in free beer, not free software) is the HTK book (and the accompanying HTK toolkit). The AT&T libraries, the GRM library and the DCD library, are also general software libraries for large-vocabulary speech recognition.
C%2*10#"%2
22. The main focus in AI, when it comes to sound processing, is to make a computer that can recognize what a person says to it. The reason why this is done, as opposed to making a computer recognize the sound of a car or the sound of a telephone ring, is because 1) there usually is something meaningful when someone talks and 2) making a computer capable of automated speech recognition (ASR) would be a next step in the man-machine interface (MMI). Like vision, the place where sound is actually analyzed is in the brain, which is precisely what makes it difficult to study, because the brain is not understood very well. That is why speech researchers are more concerned with getting a computer to just recognize speech, as opposed to getting a computer to mimic how people recognize speech (i.e. the top-down approach).
23. Computers of today can store many hours of sounds digitally. However, strict voice-pattern-to-voice-pattern matching is not accurate enough for a computer to realize that the voice it received and the voice it stored in memory come from the same person saying the same thing. This is not the fault of the computer per se; people tend to speak a little differently each time they talk.
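
This variability is why practical recognizers compare utterances with an elastic measure such as dynamic time warping (DTW) rather than sample-by-sample equality. The sketch below uses invented one-dimensional feature tracks to show the idea.

    # Minimal dynamic time warping (DTW) sketch: two "utterances" of the same
    # word, spoken at different speeds, still align with a small distance,
    # whereas strict element-by-element comparison calls them different.
    def dtw_distance(a, b):
        INF = float("inf")
        d = [[INF] * (len(b) + 1) for _ in range(len(a) + 1)]
        d[0][0] = 0.0
        for i in range(1, len(a) + 1):
            for j in range(1, len(b) + 1):
                cost = abs(a[i - 1] - b[j - 1])
                d[i][j] = cost + min(d[i - 1][j],      # stretch sequence a
                                     d[i][j - 1],      # stretch sequence b
                                     d[i - 1][j - 1])  # advance both
        return d[len(a)][len(b)]

    fast = [1, 3, 4, 3, 1]              # invented feature track, spoken quickly
    slow = [1, 1, 3, 3, 4, 4, 3, 3, 1]  # same word, spoken more slowly
    print(dtw_distance(fast, slow))     # 0.0: the shapes match after warping
    print(fast == slow)                 # False: strict matching fails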
