
Speech Recognition

Lecture 5: Statistical Language Models - Software Library

Mehryar Mohri
Courant Institute of Mathematical Sciences
mohri@cims.nyu.edu
Software Library
GRM Library: Grammar Library. General software
collection for constructing and modifying weighted
automata and transducers representing grammars
and statistical language models (Allauzen, MM, and
Roark, 2005).

http://www.research.att.com/projects/mohri/grm


This Lecture
Counting
Model creation, shrinking, and conversion
Class-based models


Overview
Generality: to support the representation and use
of the various grammars in dynamic speech
recognition.
Efficiency: to support competitive large-vocabulary
dynamic recognition using automata of several
hundred million states and transitions.
Reliability: to serve as a solid foundation for
research in statistical language modeling.


Content
Statistical Language Models: creating, shrinking,
and converting language models.
Grammar compilation:
• weighted context-dependent rules, weighted
context-free grammars (CFGs).
• regular approximations of CFGs.
Text and grammar processing utilities:
• local grammars, suffix automata.
• local determinization.
• counting and merging.

Language Modeling Tools
Counts: automata (strings or lattices), merging.
Models:
• Backoff or deleted interpolation smoothing.
• Katz or absolute discounting.
• Kneser-Ney models.
Shrinking: weighted difference or relative entropy.
Class-based modeling: straightforward.

Corpus
Input:

Corpus (corpus.txt):
hello.
bye.
hello.
bye bye.

Labels (labels):
<s> 1
</s> 2
<unknown> 3
hello 4
bye 5

Program:
farcompilestrings -i labels corpus.txt > foo.far
or: cat lattice1.fsm ... latticeN.fsm > foo.far
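As a concrete sketch, the two input files above can be created as follows. The file contents come from the table; the reading of the trailing periods as display punctuation rather than corpus tokens (they have no entry in the labels file) is our assumption:

cat > corpus.txt <<'EOF'
hello
bye
hello
bye bye
EOF
cat > labels <<'EOF'
<s> 1
</s> 2
<unknown> 3
hello 4
bye 5
EOF
farcompilestrings -i labels corpus.txt > foo.far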


Counting
Weights:
• use fsmpush to remove the initial weight and create a probabilistic automaton (see the sketch below).
• counting operates on far files.
• counts are produced in the log semiring.
Algorithm:
• applies to all probabilistic automata.
• in particular, the input must contain no cycle with weight zero or less.
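A hedged sketch of the lattice-counting path just described; the fsmpush option shown is an assumption, since exact weight-pushing flags depend on the FSM library version:

# push weights to remove the initial weight and make each lattice
# probabilistic (the -i flag is an assumption; see the fsmpush manual)
fsmpush -i lattice1.fsm > lattice1.prob.fsm
fsmpush -i lattice2.fsm > lattice2.prob.fsm
# collect the probabilistic lattices into a far file and count bigrams
cat lattice1.prob.fsm lattice2.prob.fsm > lattices.far
grmcount -n 2 -s 1 -f 2 lattices.far > lattices.2.counts.fsm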
Mehryar Mohri - Speech Recognition page 8 Courant Institute, NYU
Counting
Program (-n 2 sets the n-gram order; -s 1 and -f 2 give the labels-file indices of the start symbol <s> and end symbol </s>):
grmcount -n 2 -s 1 -f 2 foo.far > foo.2.counts.fsm
grmmerge foo.counts.fsm bar.counts.fsm > foobar.counts.fsm

Graphical representation: [figure: the bigram count automaton for the corpus above, with transitions such as <s>/4, hello/2, bye/2, bye/3, bye/1, </s>/2, and </s>/4 among states 0-4].

This Lecture
Counting
Model creation, shrinking, and conversion
Class-based models


Creating a Back-off Model
Making a backoff model from counts.
Program:
grmmake foo.2.counts.fsm > foo.2.lm.fsm
Graphical representation: [figure: the resulting backoff bigram automaton, with word transitions such as bye/1.108, </s>/0.410, </s>/0.810, hello/1.504, bye/0.698, hello/0.698, </s>/0.005, and epsilon backoff transitions to the unigram state].


Shrinking a Back-off Model
Program:
grmshrink -s 4 foo.2.lm.fsm > foo.2.s4.lm.fsm
Graphical representation: [figure: the shrunken model, with fewer bigram transitions (e.g., </s>/0.810, </s>/0.005, bye/1.098, hello/1.504, hello/0.698, bye/0.698) and epsilon backoff transitions retained].


Back-off Smoothing
Definition: for a bigram model,

$$
\Pr[w_i \mid w_{i-1}] =
\begin{cases}
d_{c(w_{i-1} w_i)} \, \dfrac{c(w_{i-1} w_i)}{c(w_{i-1})} & \text{if } c(w_{i-1} w_i) = k > 0;\\[4pt]
\alpha \, \Pr[w_i] & \text{otherwise;}
\end{cases}
$$

where

$$
d_k = 1 \ \text{if } k > 5; \qquad
d_k \approx \frac{(k+1)\, n_{k+1}}{k\, n_k} \ \text{otherwise},
$$

with $n_k$ the number of distinct bigrams appearing exactly $k$ times.
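As a worked instance with made-up counts-of-counts: if $n_1 = 100$ bigrams occur once and $n_2 = 40$ occur twice, a bigram with $k = 1$ receives discount $d_1 \approx (2 \cdot 40)/(1 \cdot 100) = 0.8$, keeping 80% of its raw relative frequency; the mass removed this way is what funds the backoff term $\alpha \Pr[w_i]$.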


Conversion of Back-off Model
Interpolated model:
grmconvert foo.lm.fsm -m interpolated > foo.int.lm.fsm
Failure function model - failure class:
• efficient representation of backoff models.
• requires on-the-fly composition for decoding.
grmconvert foo.lm.fsm -e failure_transitions > foo.flm.fsm
Failure function model - basic class:
• state-splitting: correct in the tropical semiring.
• pre-optimization: on-the-fly composition not required.
grmconvert foo.lm.fsm -e state_splitting > foo.elm.fsm
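Putting the preceding slides together, a minimal end-to-end sketch; file names are illustrative, and every command and flag is one shown earlier in this lecture:

farcompilestrings -i labels corpus.txt > foo.far             # corpus -> far
grmcount -n 2 -s 1 -f 2 foo.far > foo.2.counts.fsm           # bigram counts
grmmake foo.2.counts.fsm > foo.2.lm.fsm                      # backoff model
grmshrink -s 4 foo.2.lm.fsm > foo.2.s4.lm.fsm                # shrink
grmconvert foo.2.s4.lm.fsm -m interpolated > foo.int.lm.fsm  # convert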


This Lecture
Counting
Model creation, shrinking, and conversion
Class-based models


Class-Based Models
Simple class-based models:
$\Pr[w_i \mid h] = \Pr[w_i \mid C_i] \, \Pr[C_i \mid h].$
Methods in GRM: no special utility needed (a sketch follows the list below).
• create a transducer mapping strings to classes.
• use fsmcompose to map the word corpus to classes.
• build and make the model over classes.
• use fsmcompose to map from classes back to words.
Generality: classes may be defined by weighted automata.
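A rough sketch of that recipe; word2class.fsm (a transducer from words to class symbols) and all file names are hypothetical, and the projections between composition steps are elided:

# map each corpus sentence from words to classes
fsmcompose sent1.fsm word2class.fsm > sent1.cls.fsm
fsmcompose sent2.fsm word2class.fsm > sent2.cls.fsm
# count and build the n-gram model over class sequences
cat sent1.cls.fsm sent2.cls.fsm > cls.far
grmcount -n 2 -s 1 -f 2 cls.far > cls.counts.fsm
grmmake cls.counts.fsm > cls.lm.fsm
# compose the word-to-class transducer with the class model to read
# it at the word level (a projection onto the word side would follow)
fsmcompose word2class.fsm cls.lm.fsm > word.lm.fsm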
Class-Based Model - Example
Example: BYE = {bye, bye bye}.
Graphical representation: [figure: a transducer mapping strings to classes, with a hello:hello/0 self-loop and transitions bye:BYE/0.693 and bye:ε between the two final states 0/0 and 1/0].
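As a sketch of how such a transducer might be compiled with the FSM tools; the textual format, the -t/-i/-o options, the <epsilon> symbol, and the class symbols file are all assumptions to check against your FSM library manual, and the topology is a simplified illustration of the figure above:

cat > w2c.txt <<'EOF'
0 0 hello hello 0
0 1 bye BYE 0.693
1 1 bye <epsilon> 0
0
1
EOF
fsmcompile -t -i labels -o classlabels w2c.txt > word2class.fsm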


Class-Based Model - Counts
Graphical representation: [figures: the original counts (left) vs. the class-based counts (right); the class-based automaton replaces the bye/2, bye/3, and bye/1 transitions with BYE/2 transitions, while <s>/4, hello/2, </s>/2, and </s>/4 are unchanged].



Class-Based Model - Models
Graphical representation: [figures: the original word model (transitions such as bye/1.108, </s>/0.410, </s>/0.810, hello/1.504, bye/0.698, hello/0.698, </s>/0.005, with epsilon backoff arcs) vs. the class-based model (transitions such as BYE/0.698, BYE/1.386, hello/1.386, hello/0.698, </s>/0.693, </s>/0.005, with epsilon backoff arcs)].



Final Class-Based Model
Graphical representation: [figure: the class-based model mapped back to the word level, with transitions such as bye/1.391, bye/2.079, bye/0, hello/0.698, hello/1.386, </s>/0.693, </s>/0.005, and epsilon backoff arcs over states 0-5].



Conclusion
GRM Library: general utilities for text and grammar processing.
• generality: e.g., counts from arbitrary automata or a class-based model.
• efficiency: practical, e.g., counting lattices in 1/50th of real-time.
• testing: statistical tests for reliability.


References
• Cyril Allauzen, Mehryar Mohri, and Brian Roark. Generalized Algorithms for Constructing Statistical Language Models. In Proceedings of the 41st Meeting of the Association for Computational Linguistics (ACL 2003), Sapporo, Japan, July 2003.

• Cyril Allauzen, Mehryar Mohri, and Brian Roark. The Design Principles and Algorithms of a
Weighted Grammar Library. International Journal of Foundations of Computer Science, 16(3):
403-421, 2005.

• Peter F. Brown, Vincent J. Della Pietra, Peter V. deSouza, Jennifer C. Lai, and Robert L. Mercer. Class-based n-gram models of natural language. Computational Linguistics, 18(4):467-479, 1992.

• Stanley Chen and Joshua Goodman. An empirical study of smoothing techniques for language modeling. Technical Report TR-10-98, Harvard University, 1998.

• William Gale and Kenneth W. Church. What's wrong with adding one? In N. Oostdijk and P. de Haan, editors, Corpus-Based Research into Language. Rodopi, Amsterdam, 1994.

• Irving J. Good. The population frequencies of species and the estimation of population parameters. Biometrika, 40:237-264, 1953.

• Frederick Jelinek and Robert L. Mercer. Interpolated estimation of Markov source parameters from sparse data. In Proceedings of the Workshop on Pattern Recognition in Practice, pages 381-397, 1980.

• Slava Katz. Estimation of probabilities from sparse data for the language model component of a speech recognizer. IEEE Transactions on Acoustics, Speech and Signal Processing, 35(3):400-401, 1987.

• Reinhard Kneser and Hermann Ney. Improved backing-off for m-gram language modeling. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, volume 1, pages 181-184, 1995.

• David A. McAllester and Robert E. Schapire. On the Convergence Rate of Good-Turing Estimators. In Proceedings of the Conference on Learning Theory (COLT), pages 1-6, 2000.

• Mehryar Mohri. Weighted Grammar Tools: the GRM Library. In Robustness in Language and Speech Technology, pages 165-186. Kluwer Academic Publishers, The Netherlands, 2001.

• Hermann Ney, Ute Essen, and Reinhard Kneser. On structuring probabilistic dependences in stochastic language modeling. Computer Speech and Language, 8:1-38, 1994.

• Kristie Seymore and Ronald Rosenfeld. Scalable backoff language models. In Proceedings of
the International Conference on Spoken Language Processing (ICSLP), 1996.

• Andreas Stolcke. Entropy-based pruning of back-off language models. In Proceedings of the DARPA Broadcast News Transcription and Understanding Workshop, pages 270-274, 1998.

• Ian H. Witten and Timothy C. Bell. The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression. IEEE Transactions on Information Theory, 37(4):1085-1094, 1991.
