Sadaoki Furui
Tokyo Institute of Technology Department of Computer Science 2-12-1, O-okayama, Meguro-ku, Tokyo, 152-8552 Japan Tel/Fax: +81-3-5734-3480 furui@cs.titech.ac.jp
[Figure: benchmark results by year, 1988-2003, on a logarithmic percentage axis (1%, 10%, 100%). Labeled tasks: Resource Management, Read Speech, Noisy. Courtesy NIST 1999 DARPA HUB-4 Report, Pallett et al.]
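[Figure: speech recognition applications arranged by speaking style (isolated words, connected speech, read speech, fluent speech, spontaneous speech) against a second axis running from 2 through 20 to Unrestricted. Applications shown: word spotting, digit strings, voice commands, name dialing, directory assistance, system-driven dialogue, 2-way dialogue. Year label: 1990.]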
[Figure: structure of a speech recognizer. Speech input passes through acoustic analysis to give the observations x1 ... xT. A global search maximizes P(x1 ... xT | w1 ... wk) P(w1 ... wk) over word sequences w1 ... wk and outputs the recognized word sequence. P(x1 ... xT | w1 ... wk) is supplied by the phoneme inventory and pronunciation lexicon; P(w1 ... wk) is supplied by the language model. Additional labels: SBR, MLLR.]
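The global search in the diagram picks the word sequence that maximizes P(x1 ... xT | w1 ... wk) P(w1 ... wk). A minimal sketch of that decision rule follows, working in the log domain over a small, explicitly enumerated candidate list. The function name `decode`, the candidate sentences, and every score are hypothetical and chosen only for illustration; a real recognizer searches a far larger hypothesis space and obtains these scores from its acoustic and language models.

```python
import math


def decode(acoustic_loglik, lm_logprob):
    """Return (best_words, best_score) maximizing log P(x|w) + log P(w).

    acoustic_loglik: dict mapping a candidate word tuple to log P(x | w).
    lm_logprob:      dict mapping a candidate word tuple to log P(w).
    Candidates missing from the language model get log-probability -inf.
    """
    best_words, best_score = None, -math.inf
    for words, ac_score in acoustic_loglik.items():
        score = ac_score + lm_logprob.get(words, -math.inf)
        if score > best_score:
            best_words, best_score = words, score
    return best_words, best_score


# Hypothetical toy scores: the acoustic model slightly prefers the second
# candidate, but the language model strongly prefers the first.
acoustic = {
    ("recognize", "speech"): math.log(0.20),
    ("wreck", "a", "nice", "beach"): math.log(0.25),
}
language = {
    ("recognize", "speech"): math.log(0.010),
    ("wreck", "a", "nice", "beach"): math.log(0.001),
}

best_words, best_score = decode(acoustic, language)
print(best_words, best_score)
```

With these toy numbers the product P(x | w) P(w) is 0.002 for the first candidate and 0.00025 for the second, so the language model prior outweighs the small acoustic preference for the second candidate.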
[Figure: trend chart by year, 1990-2015. Vertical axis: 0 to 4,000. Data labels: 520M, 1.4G, 64G, 256G.]
[Figure: sales per year, 1940-2005, for three computing eras: mainframe (one computer, many people), PC (one person, one computer), and ubiquitous computing, or ubicomp (one person, many computers).]
[Figure labels: Wearables; Trip (Translator); Car (Navigation); Meeting manager; multiple Recognizer blocks.]
Lack of complete structural representations of speech
Lack of data for understanding non-structural variability
[Figure: sources of variation in speech. Labels: channel; speaker; voice quality; pitch; gender; dialect; speaking style; stress/emotion; speaking rate; Lombard effect.]
[Figure: parameter adaptation. Labels, in order: recognition results, evaluation, discrepancy, parameter adaptation algorithm, parameter modification instruction.]
[Figure: flexible decoder. Labels: input speech; noise; feature sets 1, 2, ..., M; language models 1, 2, ..., N; context; speaker model; world model; decision; recognition results.]
[Figure: speech input feeds Detector 1, Detector 2, and Detector 3, followed by integrated search and confidence evaluation.]
[Figure: overview of the Science and Technology Agency Priority Program "Spontaneous Speech: Corpus and Processing Technology". Labels: spontaneous speech, speech recognition, transcription.]
[Figure: synergy (fusion) of modalities. Labels: audio, gesture, sign, gaze, image processing, speech recognition, information retrieval.]
[Figure: emerging technology. Labels: ubiquitous computing, Internet, mobile computing, image/motion processing, wearable computing, multimedia multimodal communication, human-computer interaction, dialog modeling, contents, speech understanding, information extraction.]