Introduction
What is speech recognition?
Automatic speech recognition (ASR) is the process by
which a computer maps an acoustic speech signal to text.
APPLICATIONS
Healthcare
Military
Helicopters
Training air traffic controllers
Telephony and other domains
ASR MODELS
Embedded speech recognition
Speech recognition in the cloud
Distributed speech recognition
Shared speech recognition with user-based
adaptation (proposed model of use)
Front-end Process
Involves spectral analysis that derives feature vectors to
capture salient spectral characteristics of speech input.
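The front-end step above can be sketched in a few lines. This is a simplified illustration, not the pipeline of any particular recognizer: it frames the signal, applies a Hamming window, and takes a log-magnitude spectrum per frame. Practical front ends usually go further (for example mel filterbanks and cepstral coefficients). The function names, frame length, and hop size are assumptions chosen for the example.

```python
import cmath
import math

def frame_signal(samples, frame_len, hop):
    """Split a list of samples into overlapping frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def hamming(n):
    """Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def log_spectrum(frame):
    """Log-magnitude spectrum of one windowed frame, via a direct DFT."""
    n = len(frame)
    windowed = [s * w for s, w in zip(frame, hamming(n))]
    feats = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        feats.append(math.log(abs(acc) + 1e-10))  # small floor avoids log(0)
    return feats

def extract_features(samples, frame_len=64, hop=32):
    """Return one feature vector (log spectrum) per frame."""
    return [log_spectrum(f) for f in frame_signal(samples, frame_len, hop)]

# Toy input: a 1 kHz sine sampled at 8 kHz (assumed values for illustration).
sample_rate = 8000
signal = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]
features = extract_features(signal)

# Index of the strongest spectral bin in the first frame.
peak_bin = max(range(len(features[0])), key=lambda k: features[0][k])
```

With a 64-sample frame at 8 kHz, each bin spans 125 Hz, so the 1 kHz tone should dominate bin 8 of every feature vector.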
Back-end Process
Combines word-level matching and sentence-level search
to perform an inverse operation to decode the message
from the speech waveform.
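In the statistical formulation commonly used for ASR, this inverse operation is written as a maximum a posteriori search. Here $Y$ denotes the feature vector sequence produced by the front end and $W$ a candidate word sequence. The acoustic model supplies $P(Y \mid W)$ and the language model supplies $P(W)$:

```latex
\hat{W} = \arg\max_{W} P(W \mid Y)
        = \arg\max_{W} \frac{P(Y \mid W)\, P(W)}{P(Y)}
        = \arg\max_{W} P(Y \mid W)\, P(W)
```

The denominator $P(Y)$ does not depend on $W$, so it can be dropped from the maximization.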
Acoustic model
Provides a method of calculating the likelihood of any
feature vector sequence Y given a word W.
Each phone is represented by a hidden Markov model (HMM).
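As a rough illustration of how an HMM yields a likelihood for an observation sequence, the sketch below runs the forward algorithm on a toy model. It assumes discrete observation symbols for simplicity, whereas practical acoustic models score continuous feature vectors. The two-state model and all of its probabilities are hypothetical.

```python
def forward_likelihood(observations, states, start_p, trans_p, emit_p):
    """Return P(Y | model) for a discrete-observation HMM (forward algorithm)."""
    # alpha[s] = probability of the observations so far, ending in state s.
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            s: emit_p[s][obs] * sum(alpha[prev] * trans_p[prev][s]
                                    for prev in states)
            for s in states
        }
    return sum(alpha.values())

# Hypothetical two-state left-to-right phone model over symbols "a" and "b".
states = ["s1", "s2"]
start_p = {"s1": 1.0, "s2": 0.0}
trans_p = {"s1": {"s1": 0.6, "s2": 0.4},
           "s2": {"s1": 0.0, "s2": 1.0}}
emit_p = {"s1": {"a": 0.9, "b": 0.1},
          "s2": {"a": 0.2, "b": 0.8}}

likelihood = forward_likelihood(["a", "a", "b"], states, start_p, trans_p, emit_p)
```

For this toy model the likelihood of the sequence a, a, b works out to 0.24228.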
Language Model
The purpose of the language model is to take advantage of
linguistic constraints to compute the probability of different
word sequences.
Assuming a sequence of words $W = \{w_1, w_2, \ldots, w_k\}$, the
probability $P(W)$ can be expanded as
$P(W) = P(w_1, w_2, \ldots, w_k) = \prod_{i=1}^{k} P(w_i \mid w_1, \ldots, w_{i-1})$
We generally make the simplifying assumption that any
word depends only on the previous $N-1$ words in the
sequence.
This is known as an N-gram model.
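The N-gram assumption can be made concrete with N = 2 (a bigram model). The sketch below estimates bigram probabilities from counts over a tiny hypothetical corpus and multiplies them to score a sentence. It uses plain maximum-likelihood estimates with no smoothing, so unseen word pairs get probability zero. The `<s>` and `</s>` boundary markers and the function names are assumptions made for the example.

```python
from collections import Counter

def train_bigram(sentences):
    """Count histories and word pairs, with sentence boundary markers."""
    history_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        history_counts.update(words[:-1])          # every word used as a history
        pair_counts.update(zip(words[:-1], words[1:]))
    return history_counts, pair_counts

def bigram_prob(history_counts, pair_counts, prev, word):
    """Maximum-likelihood estimate of P(word | prev)."""
    if history_counts[prev] == 0:
        return 0.0
    return pair_counts[(prev, word)] / history_counts[prev]

def sentence_prob(history_counts, pair_counts, sentence):
    """P(W) under the bigram assumption: product of P(w_i | w_{i-1})."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(words[:-1], words[1:]):
        prob *= bigram_prob(history_counts, pair_counts, prev, word)
    return prob

# Tiny hypothetical training corpus.
corpus = ["call home", "call the office", "call home now"]
history_counts, pair_counts = train_bigram(corpus)

p_call_home = sentence_prob(history_counts, pair_counts, "call home")
p_home_call = sentence_prob(history_counts, pair_counts, "home call")
```

Here "call home" scores 1 x 2/3 x 1/2 = 1/3, while the unseen order "home call" scores zero, which is why practical language models add smoothing.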
Grammars: use context-free grammars represented by
finite state automata (FSA).
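A grammar compiled to a finite state automaton can be pictured as a table of word-labelled transitions. The sketch below is a hypothetical command grammar, not taken from any toolkit. It accepts only word sequences that follow a path from the start state to a final state, and it can list which words the recognizer may consider next.

```python
# Hypothetical command grammar: "call home|office" or "open mail|maps".
GRAMMAR = {
    ("start", "call"): "contact",
    ("start", "open"): "app",
    ("contact", "home"): "end",
    ("contact", "office"): "end",
    ("app", "mail"): "end",
    ("app", "maps"): "end",
}
FINAL_STATES = {"end"}

def accepts(words, transitions=GRAMMAR, start="start", finals=FINAL_STATES):
    """Return True if the word sequence is a complete sentence of the grammar."""
    state = start
    for word in words:
        state = transitions.get((state, word))
        if state is None:          # no arc for this word: reject
            return False
    return state in finals

def allowed_next(state, transitions=GRAMMAR):
    """Words the grammar permits from a given state, in sorted order."""
    return sorted(word for (src, word) in transitions if src == state)
```

For example, `accepts(["call", "home"])` holds, while `accepts(["call", "mail"])` and the incomplete `accepts(["call"])` do not. Restricting the search to `allowed_next(state)` is what keeps the decoder's vocabulary small at each step.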
Decoding algorithm
Asynchronous stack-based decoder: memory efficient but
complex.
Viterbi-based decoder: most efficient.
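The core of a Viterbi-based decoder is a dynamic program that keeps, for each state and time step, only the best-scoring path. The sketch below shows this on a toy discrete HMM. The model and its probabilities are hypothetical, and production decoders typically work in the log domain with beam pruning, which is omitted here.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state path and its probability."""
    # best[t][s] = probability of the best path ending in state s at time t.
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]  # back[t][s] = predecessor of s on that best path
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev_best = max(states, key=lambda p: best[-1][p] * trans_p[p][s])
            scores[s] = best[-1][prev_best] * trans_p[prev_best][s] * emit_p[s][obs]
            pointers[s] = prev_best
        best.append(scores)
        back.append(pointers)

    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]

# Hypothetical two-state left-to-right model over symbols "a" and "b".
states = ["s1", "s2"]
start_p = {"s1": 1.0, "s2": 0.0}
trans_p = {"s1": {"s1": 0.6, "s2": 0.4},
           "s2": {"s1": 0.0, "s2": 1.0}}
emit_p = {"s1": {"a": 0.9, "b": 0.1},
          "s2": {"a": 0.2, "b": 0.8}}

path, prob = viterbi(["a", "a", "b"], states, start_p, trans_p, emit_p)
```

For the sequence a, a, b the best path stays in s1 for two frames and then moves to s2, with probability 0.15552.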
3 types of search implementation
Combination of static graph and static search space
Static graph space with dynamic search space
Dynamic graph
OpenEars
OpenEars is an iOS framework for iPhone voice recognition
and speech synthesis (TTS).
It uses the open source CMU Pocketsphinx, CMU Flite, and
CMUCLMTK libraries.
OpenEars performs recognition on the iPhone itself,
without using the network.
Sphinx
CMU Sphinx is an open-source toolkit for speech recognition
developed by Carnegie Mellon University.
CMU Sphinx is a speaker-independent large vocabulary
continuous speech recognizer.
Pocketsphinx: lightweight recognizer library written in C.
Sphinx4: adjustable, modifiable recognizer written in
Java.
CeedVocal SDK
CeedVocal SDK is an isolated-word speech recognition SDK for
iOS.
It operates locally on the device and supports six languages:
English, French, German, Dutch, Spanish and Italian.
Google Now
Siri
S-Voice
Dragon Search
Dragon Dictation
Trippo-Mondo
Verbally
References
1. Rethinking Speech Recognition on Mobile Devices, Anuj Kumar, Anuj Tewari,
Seth Horrigan, Matthew Kam, Florian Metze and John Canny.
2. Towards large vocabulary ASR on embedded platforms, Miroslav Novak.
3. Speech Recognition: Statistical Methods, L R Rabiner, B-H Juang.
4. http://www.nuancemobiledeveloper.com, 9th April 2013.
5. http://cmusphinx.sourceforge.net, 9th April 2013.
6. http://www.politepix.com/openears.
7. http://www.creaceed.com/ceedvocal