Slide 1 RS1
Ranveer Singh, 9/28/2011
Language is man's most important means of communication, and speech is its primary medium. Speech provides an international forum for communication among researchers in the disciplines that contribute to our understanding of the production, perception, processing, learning and use of speech. As the world has moved toward computers, the style of communication has changed with it, so we now need to communicate with machines. Speech recognition plays a major role in communication between humans and computers.
2 of 23
Speech recognition is the process of converting a speech signal to a sequence of words in the form of digital data, by means of an algorithm implemented as a computer program. Speech recognition applications that have emerged over the last few years include voice dialing (e.g., "Call home"), call routing (e.g., "I would like to make a collect call"), simple data entry (e.g., entering a credit card number), preparation of structured documents (e.g., a radiology report), domotic appliance control, and content-based spoken audio search (e.g., finding a podcast where particular words were spoken).
3 of 23
Speech recognition technologies are of particular interest for their support of direct communication between humans and computers, through a communication mode that humans commonly use among themselves and at which they are highly skilled.
Rudnicky, Hauptmann, and Lee
http://starbase.cs.trincoll.edu/~ram/cpsc352/
4 of 23
Radio Rex, in the 1920s, was the first success story in the field of speech recognition.
www.stanford.edu/class/linguist236/lec1.pdf
5 of 23
1936 - AT&T's Bell Labs started study of speech recognition (funded by DARPA)
1974 - Optical character recognition
1975 - Text-to-speech synthesis (Kurzweil Reading Machine)
1978 - Speak & Spell toy released by Texas Instruments
1980 - Xerox started producing the TextBridge reading machine
1997 - Dragon Systems produces the first continuous speech recognition product
6 of 23
http://starbase.cs.trincoll.edu
IBM ViaVoice 10
Developed by IBM
Marketed (recently) by ScanSoft
Microsoft Speech
Developed by Microsoft
Included with Office XP and Office 2003
Considered to be cumbersome
7 of 23
Play back dictation and hear text read aloud
Switch between applications using voice commands
Manage e-mail
Create additional user voice files
Browse the Web
9 of 23
Ease of use
Robust performance
Automatic learning of new words and sounds
Grammar for spoken language
Control of synthesized voice quality
Integrated learning for speech recognition and synthesis
B. S. Atal. Speech recognition in 2001: New research directions. Proc. Natl. Acad. Sci. USA, Vol. 92, pp. 10046-10051, Oct. 1995.
10 of 23
Paul Martin, Fredrick Crabbe, Stuart Adams, Eric Baatz, Nicole Yankelovich. SpeechActs: A Spoken Language Framework, IEEE Computer, Vol. 29, Number 7, July 1996.
11 of 23
Paul Martin, Fredrick Crabbe, Stuart Adams, Eric Baatz, Nicole Yankelovich. SpeechActs: A Spoken Language Framework, IEEE Computer, Vol. 29, Number 7, July 1996.
12 of 23
Audio server presents raw digitized audio to the speech recognizer
Swiftus parses the word list to produce a set of feature-value pairs
Discourse manager maintains a stack of information about the current conversation
Discourse manager and application respond to the user by sending a text string to the text-to-speech manager
Paul Martin, Fredrick Crabbe, Stuart Adams, Eric Baatz, Nicole Yankelovich. SpeechActs: A Spoken Language Framework, IEEE Computer, Vol. 29, Number 7, July 1996.
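The flow on this slide (word list in, feature-value pairs out, a conversation stack, a text reply for speech synthesis) can be illustrated with a minimal sketch. This is not SpeechActs code: the keyword table, `parse_words`, `DiscourseManager`, and the way the stack is used to fill in a missing object are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the parsing step: a tiny keyword table that maps
# recognized words to feature-value pairs (an assumption, not Swiftus itself).
KEYWORDS = {
    "read": ("action", "read"),
    "delete": ("action", "delete"),
    "mail": ("object", "mail"),
    "next": ("which", "next"),
}


def parse_words(words):
    """Return feature-value pairs for known words; unknown words are ignored."""
    features = {}
    for word in words:
        pair = KEYWORDS.get(word.lower())
        if pair is not None:
            name, value = pair
            features[name] = value
    return features


@dataclass
class DiscourseManager:
    """Keeps a stack of information about the current conversation."""
    stack: list = field(default_factory=list)

    def handle(self, features):
        # Assumed use of the stack: borrow a missing object from the most
        # recent earlier turn that mentioned one.
        if "object" not in features:
            for earlier in reversed(self.stack):
                if "object" in earlier:
                    features = {**features, "object": earlier["object"]}
                    break
        self.stack.append(features)
        return self.respond(features)

    def respond(self, features):
        """Build the text string that would be sent on to text-to-speech."""
        action = features.get("action")
        obj = features.get("object")
        if action is None or obj is None:
            return "Sorry, I did not understand that."
        which = features.get("which", "current")
        return f"OK, I will {action} the {which} {obj}."


dm = DiscourseManager()
print(dm.handle(parse_words("please read the next mail".split())))
print(dm.handle(parse_words(["delete"])))
```

The second call contains no object, so the sketch takes "mail" from the previous turn on the stack. Tracking context this way is one plausible reason to keep a conversation stack.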
13 of 23
Continuous-speech recognizers require grammars that specify every possible utterance a user could say to the application
The recognizer grammar should closely synchronize with the Swiftus semantic grammar
Solved by inventing Unified Grammar
Paul Martin, Fredrick Crabbe, Stuart Adams, Eric Baatz, Nicole Yankelovich. SpeechActs: A Spoken Language Framework, IEEE Computer, Vol. 29, Number 7, July 1996.
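The idea of keeping the two grammars in step can be sketched by deriving both from a single source. The rule format below and the names `RULE` and `compile_rule` are hypothetical and do not reproduce the actual Unified Grammar notation. The sketch only shows why one shared definition prevents the recognizer's utterance list and the semantic mapping from drifting apart.

```python
from itertools import product

# A hypothetical "unified" rule: each slot lists its allowed words together
# with the feature-value pair that word contributes (None means no meaning).
RULE = [
    [("read", ("action", "read")), ("delete", ("action", "delete"))],
    [("the", None)],
    [("next", ("which", "next")), ("previous", ("which", "previous"))],
    [("message", ("object", "message"))],
]


def compile_rule(rule):
    """Expand one rule into a table of {utterance: feature-value pairs}."""
    table = {}
    for choice in product(*rule):
        utterance = " ".join(word for word, _ in choice)
        table[utterance] = dict(pair for _, pair in choice if pair is not None)
    return table


TABLE = compile_rule(RULE)

# Every utterance the recognizer must accept, generated from the same rule
# that supplies the semantics.
recognizer_grammar = sorted(TABLE)

for utterance in recognizer_grammar:
    print(utterance, "->", TABLE[utterance])
```

Adding a word to a slot in `RULE` extends the recognizer's utterance list and the semantic table together, which is the synchronization property the slide describes.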
14 of 23
Semantic representation generated in real time to facilitate conversation
Accurate understanding
Tolerance of misrecognized words
Wide variation among applications
Ease of use
Paul Martin, Fredrick Crabbe, Stuart Adams, Eric Baatz, Nicole Yankelovich. SpeechActs: A Spoken Language Framework, IEEE Computer, Vol. 29, Number 7, July 1996.
15 of 23
[Flowchart: VOICE INPUT, PROCESS, DATABASE, MEMORY, with NO MATCH and NO branches]
Conversational pacing
Explicit error corrections
Define the functional boundaries of an application
Paul Martin, Fredrick Crabbe, Stuart Adams, Eric Baatz, Nicole Yankelovich. SpeechActs: A Spoken Language Framework, IEEE Computer, Vol. 29, Number 7, July 1996.
20 of 23
Voice recognition is the identification or verification of an individual's identity using speech as the identifying characteristic, based on the speaker's auditory and vocal characteristics. An individual's speech spectrum has the form shown here.
Medical transcription, mainly in radiology and pathology
First use of speech recognition in the field of radiology in 1981
Mean accuracy rate for pathology reports: 93.6% using IBM ViaVoice Pro software, compared with 99.6% for human transcription
M. Al-Aynati, K. Chorneyko. Comparison of Voice-automated Transcription and Human Transcription in Generating Pathology Reports. Arch Pathol Lab Med. 2003;127:721-725.
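The two accuracy figures are easier to compare as error rates. The short calculation below uses only the 93.6% and 99.6% values quoted above. The 500-word report length is a hypothetical example, and treating accuracy as a per-word measure is an assumption, since the slide does not say how accuracy was counted.

```python
def error_rate(accuracy_percent):
    """Percentage of errors implied by an accuracy percentage."""
    return 100.0 - accuracy_percent


software = error_rate(93.6)  # IBM ViaVoice Pro figure quoted on the slide
human = error_rate(99.6)     # human transcription figure quoted on the slide

print(f"software error rate: {software:.1f}%")
print(f"human error rate: {human:.1f}%")
print(f"ratio: about {software / human:.0f}x")

# Hypothetical 500-word report, assuming accuracy is counted per word.
words = 500
print(f"expected errors: {words * software / 100:.0f} vs {words * human / 100:.0f}")
```

Under these assumptions, a 6-point gap in accuracy corresponds to roughly sixteen times as many errors to correct by hand.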
22 of 23
13% used voice recognition
16% discontinued using voice recognition
21% believed chairside computer use could be improved with better voice recognition
Automatic speech recognition will be the way to go!
23 of 23
LIMITATIONS OF S.R.S
The task has been viewed as one of de-sensitising recognisers to variability. It is not entirely clear that this idea models adequately the parallel process in human speech perception.
24 of 23
MERITS OF S.R.S
The uses of speech technology are wide-ranging. Most effort at the moment centers on providing voice input and output for information systems, for example over the telephone network. The idea is to make information available to those who do not want to face a keyboard and screen, or cannot.
25 of 23
26 of 23