HIDDEN MARKOV MODELS
IN SPEECH RECOGNITION
Wayne Ward
Carnegie Mellon University
Pittsburgh, PA
Acknowledgements
Much of this talk is derived from the paper
"An Introduction to Hidden Markov Models" by Rabiner and Juang,
and from the talk
"Hidden Markov Models: Continuous Speech Recognition" by Kai-Fu Lee
Topics
Markov Models and Hidden Markov Models
HMMs applied to speech recognition
Training
Decoding
Speech Recognition

Front End: analog speech → discrete observations O1 O2 … OT
Match / Search: discrete observations → word sequence W1 W2 … WT

  P(W | A) = P(A | W) P(W) / P(A)

P(W): language model
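A tiny numeric sketch of the decomposition (all probabilities below are hypothetical); since P(A) is shared by every candidate, only P(A | W) P(W) matters when comparing word sequences for the same acoustics:

```python
def posterior(p_a_given_w, p_w, p_a):
    """P(W | A) = P(A | W) P(W) / P(A)."""
    return p_a_given_w * p_w / p_a

# Two candidate word sequences for the same acoustics A:
# W1 fits the acoustics better, W2 is more likely a priori.
p1 = posterior(p_a_given_w=0.010, p_w=0.002, p_a=1.0)
p2 = posterior(p_a_given_w=0.004, p_w=0.008, p_a=1.0)
print(p1, p2)  # the shared denominator P(A) never changes the ranking
```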
Markov Models

Elements:
  States: S = S0, S1, …, SN
  Transition probabilities: P(qt = Si | qt-1 = Sj)

[figure: two-state model with states A and B and transitions
P(A | A), P(B | A), P(A | B), P(B | B)]
Markov Assumption:
Transition probability depends only on current state

  P(qt = Si | qt-1 = Sj, qt-2 = Sk, …) = P(qt = Si | qt-1 = Sj) = a_ji

  a_ji ≥ 0   for all j, i
  Σ_{i=0}^{N} a_ji = 1   for all j
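A row-stochastic transition table and a sampler that consults only the previous state make the assumption concrete. The 2-state chain below is hypothetical, not from the slides:

```python
import random

# Hypothetical 2-state Markov chain; row j holds P(next = i | current = j),
# so each row must sum to 1.
A_MAT = {
    "A": {"A": 0.7, "B": 0.3},
    "B": {"A": 0.4, "B": 0.6},
}

def sample_chain(start, steps, rng):
    """Generate a state sequence using only the previous state (Markov assumption)."""
    seq = [start]
    for _ in range(steps):
        probs = A_MAT[seq[-1]]
        r, acc = rng.random(), 0.0
        for state, p in probs.items():
            acc += p
            if r < acc:
                break
        seq.append(state)  # falls through to the last state on float round-off
    return seq

rng = random.Random(0)
seq = sample_chain("A", 10, rng)
print(seq)
```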
[figure: two-state coin model; one state outputs heads only
(P(H) = 1.0, P(T) = 0.0), the other tails only (P(H) = 0.0, P(T) = 1.0);
transition probabilities 0.5, 0.5, 0.5 and 1]
Hidden Markov Models

Elements:
  States
  Transition probabilities
  Output probability distributions (at state j for symbol k):

    P(yt = Ok | qt = Sj) = bj(k)

[figure: two-state model with transitions P(A | A), P(B | A), P(A | B),
P(B | B), and an output distribution P(O1 | ·) … P(OM | ·) attached to
each state]
[figure: three states with output distributions over R, B, Y:
  state 1: P(R) = 0.31, P(B) = 0.50, P(Y) = 0.19
  state 2: P(R) = 0.50, P(B) = 0.25, P(Y) = 0.25
  state 3: P(R) = 0.38, P(B) = 0.12, P(Y) = 0.50]

Observation sequence: R B Y Y R
  not unique to state sequence
[figure: example model for the phone /ih/]

Word Model

[figure: word model]
Evaluation

Probability of observation sequence O = O1 O2 … OT given model λ:

  P(O | λ) = Σ over all Q of P(O, Q | λ)

Direct summation over all N^T state sequences is intractable.

Define the forward variable:

  α_t(j) = P(O1 … Ot, qt = Sj | λ)

Compute recursively:

  α_0(j) = 1 if j is the start state, 0 otherwise
  α_t(j) = [ Σ_{i=0}^{N} α_{t-1}(i) a_ij ] b_j(Ot),   t > 0

  P(O | λ) = α_T(SN)

Computation is O(N²T)
Forward Trellis

Model: state 1 (initial) outputs A: 0.8, B: 0.2, with self-loop 0.6 and
transition to state 2: 0.4; state 2 (final) outputs A: 0.3, B: 0.7, with
self-loop 1.0. Observations: A A B.

             t=0    t=1 (A)   t=2 (A)   t=3 (B)
  state 1    1.0    0.48      0.23      0.03
  state 2    0.0    0.12      0.09      0.13

e.g. α_1(1) = 1.0 · 0.6 · 0.8 = 0.48
     α_3(2) = 0.23 · 0.4 · 0.7 + 0.09 · 1.0 · 0.7 = 0.13 = P(O | λ)
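The forward recursion and trellis can be reproduced in a few lines of Python; the dictionary layout and the state names 1 and 2 are my own:

```python
def forward(obs, states, start, finals, a, b):
    """Forward algorithm: alpha[t][j] = P(O1..Ot, q_t = j | model).
    a[i][j]: transition prob i -> j; b[j][o]: output prob of o in state j."""
    alpha = [{j: (1.0 if j == start else 0.0) for j in states}]
    for o in obs:
        prev = alpha[-1]
        alpha.append({
            j: sum(prev[i] * a[i].get(j, 0.0) for i in states) * b[j][o]
            for j in states
        })
    return alpha, sum(alpha[-1][j] for j in finals)

# The two-state model from the trellis: state 1 loops with 0.6, moves to
# state 2 with 0.4; state 2 loops with 1.0.
a = {1: {1: 0.6, 2: 0.4}, 2: {2: 1.0}}
b = {1: {"A": 0.8, "B": 0.2}, 2: {"A": 0.3, "B": 0.7}}
alpha, p = forward("AAB", [1, 2], start=1, finals=[2], a=a, b=b)
print(round(p, 2))  # 0.13, matching the last trellis column
```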
Define the backward variable:

  β_t(i) = P(Ot+1 … OT | qt = Si, λ)

Compute recursively:

  β_T(i) = 1 if i is the end state, 0 otherwise
  β_t(i) = Σ_{j=0}^{N} a_ij b_j(Ot+1) β_{t+1}(j),   t < T

  P(O | λ) = β_0(S0) = α_T(SN)

Computation is O(N²T)
Backward Trellis

Same model and observations as the forward trellis.

             t=0    t=1 (A)   t=2 (A)   t=3 (B)
  state 1    0.13   0.22      0.28      0.0
  state 2    0.06   0.21      0.7       1.0

e.g. β_2(1) = 0.6 · 0.2 · 0.0 + 0.4 · 0.7 · 1.0 = 0.28
     β_0(1) = 0.6 · 0.8 · 0.22 + 0.4 · 0.3 · 0.21 = 0.13 = P(O | λ)
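A matching backward pass over the same two-state model confirms that β_0 at the start state equals the forward result (again a sketch with my own data layout):

```python
def backward(obs, states, finals, a, b):
    """Backward algorithm: beta[t][i] = P(O_{t+1}..O_T | q_t = i, model)."""
    T = len(obs)
    beta = [dict() for _ in range(T + 1)]
    beta[T] = {i: (1.0 if i in finals else 0.0) for i in states}
    for t in range(T - 1, -1, -1):
        o = obs[t]  # 0-indexed obs[t] is O_{t+1} in the slide's 1-based notation
        beta[t] = {
            i: sum(a[i].get(j, 0.0) * b[j][o] * beta[t + 1][j] for j in states)
            for i in states
        }
    return beta

# Same model and observations as the forward trellis.
a = {1: {1: 0.6, 2: 0.4}, 2: {2: 1.0}}
b = {1: {"A": 0.8, "B": 0.2}, 2: {"A": 0.3, "B": 0.7}}
beta = backward("AAB", [1, 2], finals=[2], a=a, b=b)
print(round(beta[0][1], 2))  # 0.13: beta_0(start) equals the forward result
```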
Viterbi Algorithm

Find the single best state sequence:

  VPt(i) = MAX over q0 … qt-1 of P(O1 O2 … Ot, qt = i | λ)

Recursive computation:

  VP0(j) = 1 if j is the start state, 0 otherwise
  VPt(j) = MAX over i = 0, …, N of [ VPt-1(i) aij ] bj(Ot),   t > 0

  P(O, Q | λ) = VPT(SN)

Save each maximum for backtrace at end
Viterbi Trellis

Same model and observations; each cell takes the MAX over predecessors
instead of the SUM.

             t=0    t=1 (A)   t=2 (A)   t=3 (B)
  state 1    1.0    0.48      0.23      0.03
  state 2    0.0    0.12      0.06      0.06

e.g. VP_2(2) = MAX(0.48 · 0.4, 0.12 · 1.0) · 0.3 = 0.06
     VP_3(2) = MAX(0.23 · 0.4, 0.06 · 1.0) · 0.7 = 0.06
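The Viterbi recursion with backtrace, run on the same two-state model (the data layout is mine):

```python
def viterbi(obs, states, start, finals, a, b):
    """Viterbi: vp[t][j] = max over state prefixes of P(O1..Ot, q_t = j)."""
    vp = [{j: (1.0 if j == start else 0.0) for j in states}]
    back = []
    for o in obs:
        prev, col, ptr = vp[-1], {}, {}
        for j in states:
            best = max(states, key=lambda i: prev[i] * a[i].get(j, 0.0))
            col[j] = prev[best] * a[best].get(j, 0.0) * b[j][o]
            ptr[j] = best  # save each maximum for the backtrace
        vp.append(col)
        back.append(ptr)
    last = max(finals, key=lambda j: vp[-1][j])
    path = [last]
    for ptr in reversed(back):  # follow the saved maxima backwards
        path.append(ptr[path[-1]])
    return vp[-1][last], path[::-1]

a = {1: {1: 0.6, 2: 0.4}, 2: {2: 1.0}}
b = {1: {"A": 0.8, "B": 0.2}, 2: {"A": 0.3, "B": 0.7}}
score, path = viterbi("AAB", [1, 2], start=1, finals=[2], a=a, b=b)
print(round(score, 2), path)  # 0.06 [1, 1, 1, 2]
```

Note the best-path score 0.06 is below the forward probability 0.13, since Viterbi keeps only one state sequence.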
Forward-Backward Algorithm

For each transition Si → Sj at time t, define:

  ξ_t(i, j) = α_t(i) a_ij b_j(Ot+1) β_{t+1}(j)
Baum-Welch Reestimation

  â_ij = expected number of transitions from Si to Sj
         / expected number of transitions from Si

       = Σ_{t=0}^{T-1} ξ_t(i, j) / Σ_{t=0}^{T-1} Σ_{j=0}^{N} ξ_t(i, j)

  b̂_j(k) = Σ_{t: Ot+1 = k} Σ_{i=0}^{N} ξ_t(i, j)
           / Σ_{t=0}^{T-1} Σ_{i=0}^{N} ξ_t(i, j)
Convergence of FB Algorithm

1. Initialize λ = (A, B)
2. Compute α, β, and ξ
3. Estimate λ̂ = (Â, B̂) from ξ
4. Replace λ with λ̂
5. If not converged, go to 2

It can be shown that P(O | λ̂) > P(O | λ) unless λ̂ = λ
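The loop above can be sketched end to end in Python. This version uses an initial state distribution pi in place of explicit start/end states and the standard γ-based output update; the 2-state test model and all its numbers are my own:

```python
def forward_backward(obs, A, B, pi):
    """alpha/beta passes for a discrete-output HMM with initial distribution pi."""
    N, T = len(pi), len(obs)
    alpha = [[0.0] * N for _ in range(T)]
    beta = [[0.0] * N for _ in range(T)]
    for j in range(N):
        alpha[0][j] = pi[j] * B[j][obs[0]]
        beta[T - 1][j] = 1.0
    for t in range(1, T):
        for j in range(N):
            alpha[t][j] = sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * B[j][obs[t]]
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))
    return alpha, beta, sum(alpha[T - 1])

def reestimate(obs, A, B, pi):
    """One Baum-Welch pass: accumulate xi and gamma, renormalize into new A, B."""
    N, T, K = len(pi), len(obs), len(B[0])
    alpha, beta, p = forward_backward(obs, A, B, pi)
    gamma = [[alpha[t][j] * beta[t][j] / p for j in range(N)] for t in range(T)]
    xi = [[0.0] * N for _ in range(N)]  # xi_t(i, j) summed over t
    for t in range(T - 1):
        for i in range(N):
            for j in range(N):
                xi[i][j] += alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / p
    # a_ij: expected transitions i -> j over expected transitions out of i
    newA = [[xi[i][j] / sum(xi[i]) for j in range(N)] for i in range(N)]
    # b_j(k): expected times in j while emitting k over expected times in j
    newB = [[sum(gamma[t][j] for t in range(T) if obs[t] == k) /
             sum(gamma[t][j] for t in range(T)) for k in range(K)] for j in range(N)]
    return newA, newB, p

A = [[0.7, 0.3], [0.2, 0.8]]
B = [[0.9, 0.1], [0.3, 0.7]]
pi = [0.6, 0.4]
obs = [0, 0, 1, 1, 0, 1]
probs = []
for _ in range(5):
    A, B, p = reestimate(obs, A, B, pi)
    probs.append(p)
print([round(x, 4) for x in probs])  # non-decreasing likelihoods
```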
Phone Model

[figure: phone HMM]
Forward-Backward Training
for Continuous Speech

[figure: sentence model for "SHOW ALL ALERTS", with each word expanded
into its phone models (phone labels include SH, OW, AA, AX, ER, TS)]
Recognition Search

[figure: recognition network; "what's" is expanded into /w/ /ah/ /ts/
and "the" into /th/ /ax/, followed by the words display, willamette's
location, kirk's longitude, sterett's latitude]
Viterbi Search

Uses Viterbi decoding
  Takes MAX, not SUM
  Finds optimal state sequence P(O, Q | λ),
  not optimal word sequence P(O | λ)

Time synchronous
  Extends all paths by 1 time step
  All paths have same length (no need to
  normalize to compare scores)
[figure: time-synchronous trellis from time t to time t+1; Word 1 and
Word 2 each contain states S1, S2, S3]
Beam Search

Each entry in the beam holds: Score, BackPtr, ParmPtr

The TO beam at time t+1 is dynamically constructed from the states
active at time t. For each path extended from time t to t+1:
  Within threshold?
  Exist in TO beam?
  Better than existing score in TO beam?
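A per-frame beam step might look like the following sketch; the state names, scoring, and threshold convention (prune paths below a fraction of the best score) are assumptions, not the deck's exact data structures:

```python
def beam_step(from_beam, expand, threshold):
    """from_beam maps state -> (score, backptr); expand(state) yields
    (next_state, transition_score) pairs. Builds the TO beam dynamically,
    keeping only the best score per state, then prunes against the beam
    threshold (a fraction of the best path score)."""
    to_beam = {}
    for state, (score, _) in from_beam.items():
        for nxt, trans in expand(state):
            new = score * trans
            # better than existing score in TO beam?
            if nxt not in to_beam or new > to_beam[nxt][0]:
                to_beam[nxt] = (new, state)
    best = max(s for s, _ in to_beam.values())
    # within threshold?
    return {st: v for st, v in to_beam.items() if v[0] >= best * threshold}

# Hypothetical one-frame expansion over two states.
frm = {"s1": (1.0, None)}
succ = {"s1": [("s1", 0.6), ("s2", 0.4)], "s2": [("s2", 1.0)]}
beam = beam_step(frm, lambda s: succ[s], threshold=0.5)
print(sorted(beam))  # ['s1', 's2']: both survive a 0.5 beam
```

Tightening the threshold to 0.9 would prune s2, since 0.4 falls below 0.9 times the best score 0.6.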
Continuous Observations

  b_j(x) = Σ_{m=1}^{M} c_jm N[x, μ_jm, U_jm]

  Σ_{m=1}^{M} c_jm = 1   and   c_jm ≥ 0,   for j = 1, …, N and m = 1, …, M

where N[x, μ_jm, U_jm] is a Gaussian density with mean vector μ_jm and
covariance matrix U_jm
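For a scalar observation the mixture density can be sketched directly; the weights, means, and variances below are hypothetical, and real systems use vector observations with full or diagonal covariance matrices:

```python
import math

def gaussian(x, mu, var):
    """Scalar Gaussian density N[x, mu, var]."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

def b_j(x, c, mu, var):
    """Mixture output density b_j(x) = sum_m c_jm N[x, mu_jm, U_jm]."""
    assert abs(sum(c) - 1.0) < 1e-9  # mixture weights must sum to 1
    return sum(cm * gaussian(x, m, v) for cm, m, v in zip(c, mu, var))

# Hypothetical 2-component mixture for one state.
val = b_j(0.0, c=[0.6, 0.4], mu=[-1.0, 2.0], var=[1.0, 4.0])
print(round(val, 4))  # 0.1936
```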
Model Topologies

Ergodic: fully connected, each state
has a transition to every other state