What is Sound?
Physical terms
Amplitude
Frequency
Spectrum
1 Minute of Sound
Sampling rate 44.1k: 2646k, 5292k, 5292k, 10584k
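The four sizes are consistent with uncompressed audio at 44.1 kHz for 60 seconds. The column labels did not survive in the table, so the sketch below assumes the columns are 8-bit mono, 8-bit stereo, 16-bit mono, and 16-bit stereo, and that "k" means 1000 bytes. The function name `bytes_per_minute` is made up for illustration.

```python
def bytes_per_minute(sample_rate_hz, bits_per_sample, channels, seconds=60):
    """Size in bytes of uncompressed PCM audio of the given duration."""
    return sample_rate_hz * (bits_per_sample // 8) * channels * seconds

# Assumed column layout (not labeled in the original table):
# (bits per sample, channels)
configs = [(8, 1), (8, 2), (16, 1), (16, 2)]

sizes_k = [bytes_per_minute(44100, bits, ch) // 1000 for bits, ch in configs]

for (bits, ch), size in zip(configs, sizes_k):
    print(f"{bits}-bit, {ch} channel(s): {size}k")
```

Under these assumptions the computed values match the row above: doubling either the sample width or the channel count doubles the storage.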
Header Information
Magic Cookie
Sampling Rate
Bits/Sample
Channels
Byte Order
Endian
Compression type
Data
NIST_1A
1024
sample_rate -i 16000
channel_count -i 1
sample_n_bytes -i 2
sample_byte_format -s2 10
sample_sig_bits -i 16
sample_count -i 594400
sample_coding -s3 pcm
sample_checksum -i 20129
end_head
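A header like the one above can be read with a few lines of code. This is a minimal sketch, assuming each field line has the form "name -type value", where the type is -i (integer), -r (real), or -sN (string of N characters). The function name `parse_sphere_header` is hypothetical, and the parser works on the header text only, not on a full audio file.

```python
def parse_sphere_header(text):
    """Parse NIST_1A header text into (header size in bytes, dict of fields)."""
    lines = text.strip().splitlines()
    if lines[0].strip() != "NIST_1A":
        raise ValueError("not a NIST_1A header")
    header_bytes = int(lines[1])  # second line: total header size in bytes
    fields = {}
    for line in lines[2:]:
        line = line.strip()
        if line == "end_head":
            break
        if not line:
            continue
        name, ftype, value = line.split(None, 2)
        if ftype == "-i":
            fields[name] = int(value)
        elif ftype == "-r":
            fields[name] = float(value)
        else:  # -sN: keep the value as a string
            fields[name] = value
    return header_bytes, fields

example = """NIST_1A
1024
sample_rate -i 16000
channel_count -i 1
sample_n_bytes -i 2
sample_byte_format -s2 10
sample_sig_bits -i 16
sample_count -i 594400
sample_coding -s3 pcm
sample_checksum -i 20129
end_head"""

header_bytes, fields = parse_sphere_header(example)
duration_s = fields["sample_count"] / fields["sample_rate"]
print(header_bytes, fields["sample_coding"], duration_s)
```

Dividing sample_count by sample_rate gives the length of the recording in seconds.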
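Figure: the utterance "speech lab" segmented into its sounds (s, p, ee, ch, l, a, b), with a close-up of the transition from "l" to "a".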
Graphs from Simon Arnfield's web tutorial on speech, Sheffield:
http://lethe.leeds.ac.uk/research/cogn/speech/tutorial/
Acoustic Modeling
Describes the sounds that make up speech
Speech Recognition
$$P(W \mid O) = \frac{P(O \mid W)\, P(W)}{P(O)}$$
Copyright 2002 Michael G. Christel and Alexander G. Hauptmann, Carnegie Mellon
Mechanism of state-of-the-art speech recognizers
Speech in -> Acoustic analysis -> x1 ... xT
Recognition: maximize P(x1 ... xT | w1 ... wk) over w1 ... wk, using the pronunciation lexicon
-> Recognized sentence
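Because P(O) does not depend on the word sequence W, maximizing P(W|O) amounts to maximizing P(O|W) P(W). The sketch below illustrates that choice over a tiny set of candidate sentences. The candidate sentences and all probabilities are made-up numbers, and `recognize` is a hypothetical helper, not part of any real recognizer.

```python
import math

def recognize(acoustic_logp, language_logp):
    """Return the hypothesis W that maximizes log P(O|W) + log P(W)."""
    return max(acoustic_logp,
               key=lambda w: acoustic_logp[w] + language_logp[w])

# Hypothetical scores for two candidate sentences (made-up numbers).
acoustic_logp = {
    "recognize speech": math.log(0.0020),    # log P(O|W)
    "wreck a nice beach": math.log(0.0025),
}
language_logp = {
    "recognize speech": math.log(0.010),     # log P(W)
    "wreck a nice beach": math.log(0.001),
}

best = recognize(acoustic_logp, language_logp)
print(best)
```

Working in log probabilities turns the product into a sum, which avoids numerical underflow when many small probabilities are multiplied.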
Acoustic Sampling
10 ms frame (ms = millisecond = 1/1000 second)
~25 ms window around each frame to smooth signal processing
Result: acoustic feature vectors a1, a2, a3, ...
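The framing step can be sketched as follows. This is a simplified illustration: it assumes a 16 kHz sample rate, starts each 25 ms window at the frame boundary instead of centering it, and applies a Hamming weighting as one common way to smooth the window edges. The function name `frame_signal` is hypothetical, and the step that turns each window into a feature vector is not shown.

```python
import math

def frame_signal(samples, sample_rate=16000, frame_ms=10, window_ms=25):
    """Split samples into overlapping, Hamming-weighted windows.

    A new window starts every frame_ms; each window spans window_ms.
    """
    hop = sample_rate * frame_ms // 1000      # 160 samples at 16 kHz
    width = sample_rate * window_ms // 1000   # 400 samples at 16 kHz
    hamming = [0.54 - 0.46 * math.cos(2 * math.pi * n / (width - 1))
               for n in range(width)]
    frames = []
    for start in range(0, len(samples) - width + 1, hop):
        window = samples[start:start + width]
        frames.append([s * w for s, w in zip(window, hamming)])
    return frames

# One second of a 440 Hz tone as stand-in input.
signal = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
frames = frame_signal(signal)
print(len(frames), len(frames[0]))
```

Each window overlaps its neighbors by 15 ms, so every sample contributes to more than one frame.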
Chain rule:
$$P(X_1, X_2, X_3, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid X_1, X_2, X_3, \ldots, X_{i-1})$$
Markov assumption:
$$P(X_i \mid X_1, X_2, X_3, \ldots, X_{i-1}) = P(X_i \mid X_{i-1})$$
Therefore:
$$P(X_1, X_2, X_3, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid X_{i-1})$$
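Under the first-order Markov assumption, the probability of a whole sequence is a product of one-step conditional probabilities. The sketch below works this out for a toy two-symbol chain. The symbols and probabilities are invented, and the first factor is taken from a separate initial distribution, which is an assumption about how the i = 1 term is handled.

```python
def sequence_probability(seq, initial, transition):
    """P(X1..Xn) = P(X1) * product over i >= 2 of P(Xi | Xi-1)."""
    p = initial[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        p *= transition[prev][cur]
    return p

# Toy model with two symbols (made-up probabilities).
initial = {"a": 0.7, "b": 0.3}
transition = {
    "a": {"a": 0.8, "b": 0.2},   # P(next | current = "a")
    "b": {"a": 0.4, "b": 0.6},   # P(next | current = "b")
}

p = sequence_probability(["a", "a", "b"], initial, transition)
print(p)
```

For the sequence a, a, b the factors are P(a), P(a | a), and P(b | a).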
State sequence S1 -> S2 -> S3: transition probability p(s2 | s1), output probability q(y1 | s2, s1)
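The product p(s2 | s1) q(y1 | s2, s1) is the contribution of a single transition. Multiplying such terms along a path gives the joint probability of that state path and the outputs emitted on it. The sketch below assumes outputs are emitted on transitions, as the notation q(y | s', s) suggests, and that the path starts in S1 with probability 1. All numbers and the name `path_probability` are illustrative.

```python
def path_probability(states, outputs, p, q):
    """Joint probability of a state path and the outputs on its transitions.

    p[(s, s_next)]    : transition probability p(s_next | s)
    q[(y, s_next, s)] : output probability q(y | s_next, s)
    """
    prob = 1.0
    for s, s_next, y in zip(states, states[1:], outputs):
        prob *= p[(s, s_next)] * q[(y, s_next, s)]
    return prob

# Made-up parameters for the path S1 -> S2 -> S3 emitting y1, y2.
p = {("S1", "S2"): 0.5, ("S2", "S3"): 0.4}
q = {("y1", "S2", "S1"): 0.9, ("y2", "S3", "S2"): 0.3}

prob = path_probability(["S1", "S2", "S3"], ["y1", "y2"], p, q)
print(prob)
```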
Speech data -> Transcribe* -> Train -> Acoustic models
Text data -> Train -> Language models
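As a small illustration of training a language model from text data, the sketch below estimates bigram probabilities by relative frequency. The slides do not say which kind of language model is trained, so the bigram form, the sentence boundary markers, and the tiny corpus are all assumptions made for the example.

```python
from collections import Counter

def train_bigram_model(sentences):
    """Estimate P(word | previous word) by relative frequency."""
    pair_counts = Counter()
    context_counts = Counter()
    for sentence in sentences:
        # <s> and </s> mark the sentence start and end (an assumed convention).
        words = ["<s>"] + sentence.split() + ["</s>"]
        for prev, cur in zip(words, words[1:]):
            pair_counts[(prev, cur)] += 1
            context_counts[prev] += 1
    return {pair: n / context_counts[pair[0]]
            for pair, n in pair_counts.items()}

# Tiny made-up corpus standing in for the text data.
text_data = ["speech lab", "speech recognition", "speech lab"]
model = train_bigram_model(text_data)
print(model[("speech", "lab")])
```

A practical model would also need smoothing so that word pairs never seen in training do not receive probability zero.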
Figure: Word Error Rate (%) by year, 1988 to 1998, on a vertical axis running from 1 to 100. Curves cover read speech and conversational speech (English and non-English), under conditions including standard microphone, noisy environment, and unlimited vocabulary. All results are speaker-independent.
Source: NSA/Wayne/Doddington
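Word error rate is conventionally computed as the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The sketch below implements that with a standard edit-distance table. The example sentences are invented, and `word_error_rate` is an illustrative name.

```python
def word_error_rate(reference, hypothesis):
    """Word error rate in percent, via word-level edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dist[i][j] = edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i                      # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j                      # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dist[i - 1][j] + 1
            insertion = dist[i][j - 1] + 1
            dist[i][j] = min(substitution, deletion, insertion)
    return 100.0 * dist[-1][-1] / len(ref)

wer = word_error_rate("recognize speech in the lab",
                      "recognize peach in lab")
print(wer)
```

Because insertions count as errors, the rate can exceed 100% when the hypothesis is much longer than the reference.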
References
Speech Recognition resource links can be found at:
http://svr-www.eng.cam.ac.uk/comp.speech/Section2/speechlinks.html
An excellent tutorial on speech recognition by Wayne Ward:
http://www-2.cs.cmu.edu/~roni/11761-s01/Presentations/whw%20hmm's%20in%20speech%20recognition%203.0.pdf