Speech Translation System

Mobile Speech Translation Systems Design for 2020
11/19/2013 INST603 Term Project
MIM, UMD Makoto Asami
Table of Contents
Overview of the Project
Outline of Speech Translation Systems
Automatic Speech Recognition (Speech-to-Text)
Machine Translation
Voice Synthesis (Text-to-Speech)
Current Mobile Speech Translation System - Google Translate (for iOS)
How Mobile Speech Translation Systems Work (Online)
Forecast of Mobile Speech Translation Systems in 2020
My User Interface Design for Future Systems
Conclusion
Overview of the Project
- A System Design Practice for Mobile Speech Translation Systems in 2020 -

Reasons I chose this as the final project:
Advances in raw computing power are expected to significantly improve the
capability of language translation systems in the near future.
Meanwhile, more and more people around the world are expected to
communicate with each other in the future. My home country, Japan, also
expects many foreign visitors for the 2020 Tokyo Olympic Games.
Thus, this project aims to study the current state of speech translation
systems and propose a feasible mobile speech translation system design
that would benefit people while traveling abroad and in their daily lives.
Outline of Speech Translation Systems

A speech translation system typically integrates the following three
technologies: Automatic Speech Recognition (ASR), Machine
Translation (MT) and Voice Synthesis (TTS).
[Figure: Speech, Japanese -> Automatic Speech Recognition (ASR), using speech recognition databases (Japanese) -> Text, Japanese -> Machine Translation (MT), using Japanese-English translation databases -> Text, English ("Can I reserve a room?") -> Voice Synthesis (TTS), using voice synthesis databases (English) -> Speech, English]
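The three-stage flow above can be sketched as a simple pipeline. This is only an illustrative sketch: the class name SpeechTranslator, the stub functions and the one-entry phrasebook are assumptions made for the example, and none of the stubs is a real recognition, translation or synthesis engine.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SpeechTranslator:
    """Chains the three stages: ASR -> MT -> TTS (hypothetical structure)."""
    recognize: Callable[[bytes], str]   # ASR: source speech -> source text
    translate: Callable[[str], str]     # MT: source text -> target text
    synthesize: Callable[[str], bytes]  # TTS: target text -> target speech

    def run(self, source_audio: bytes) -> bytes:
        source_text = self.recognize(source_audio)
        target_text = self.translate(source_text)
        return self.synthesize(target_text)


# Stub stages for illustration only.
def fake_asr(audio: bytes) -> str:
    # Pretend the audio bytes already contain the transcript.
    return audio.decode("utf-8")


def fake_mt(text: str) -> str:
    # One-entry toy phrasebook; unknown input is passed through unchanged.
    phrasebook = {"部屋を予約できますか": "Can I reserve a room?"}
    return phrasebook.get(text, text)


def fake_tts(text: str) -> bytes:
    # Pretend the synthesized audio is just the encoded text.
    return text.encode("utf-8")


pipeline = SpeechTranslator(fake_asr, fake_mt, fake_tts)
english_audio = pipeline.run("部屋を予約できますか".encode("utf-8"))
print(english_audio.decode("utf-8"))
```

Keeping each stage behind a plain function makes it possible to swap one engine (for example the MT stage) without touching the other two.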
Automatic Speech Recognition (ASR/STT)
- Speech to Text (STT) -

Applications include voice user interfaces such as dictation (e.g.
word processors, emails, Google Voice Recognition, medical
transcription) and hands-free computing (e.g. Windows, Siri).
Nuance Dragon NaturallySpeaking ($99.99~): Accuracy rate of 93%
CMUSphinx (Open Source Toolkit For Speech Recognition) by
Carnegie Mellon Univ.
Speaker Dependent (uses training): Large-vocabulary/limited-users
(e.g. Windows Speech Recognition)
Speaker Independent (does not use training): Small-vocabulary/many-users
(e.g. automated telephone answering)
[Figure: speech recognition example, the sounds "ou pu nn / fe isu buk" recognized as "Open Facebook"]
Often processed in the cloud
Requires processing power and storage
Machine Translation (MT)

Research has continued since it began at MIT in 1951.
The human translation process may be described as:
1. Decoding the meaning of the source text; and
2. Re-encoding this meaning in the target language.
To decode the meaning of the source text, the translator must interpret and analyze all
the features of the text. The process requires in-depth knowledge of the grammar, idioms,
etc., of the source language, as well as the culture of its speakers.
Machine Translation Approaches:
Rule-based, Transfer-based, Interlingual, Dictionary-based, Statistical, Example-based,
Hybrid (statistical + rule-based)
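As a rough illustration of the simplest approach in the list, dictionary-based translation, the sketch below looks up each word independently. The romanized toy lexicon and the function name are assumptions made for this example; real systems use far larger dictionaries and additional processing.

```python
# Toy Japanese (romanized) -> English lexicon; entries are illustrative only.
TOY_LEXICON = {
    "watashi": "I",
    "wa": "",          # topic particle, no direct English word
    "gakusei": "student",
    "desu": "am",
}


def dictionary_translate(tokens, lexicon):
    """Translate word by word; unknown words are marked with angle brackets."""
    output = []
    for token in tokens:
        word = lexicon.get(token, f"<{token}>")
        if word:  # skip entries that map to nothing
            output.append(word)
    return " ".join(output)


print(dictionary_translate(["watashi", "wa", "gakusei", "desu"], TOY_LEXICON))
# prints: I student am
```

The output keeps the Japanese word order, which is one reason why rule-based, statistical and hybrid approaches are used for sentences rather than plain word lookup.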
Inside Google Translate
Beginning in the late 1980s, as computational power increased and became less
expensive, more interest was shown in statistical models for machine translation.
Voice / Speech Synthesis
- Text to Speech (TTS) -

Artificial production of human speech from language text

Applied to screen readers as assistive technology for blind or visually
impaired people and others:
Microsoft Narrator: Navigating operations on Windows
NaturalReader (NaturalSoft): Free version available. Converts text
(webpages, PDF files, emails, etc.) to spoken words.
Also applied to entertainment: games and animations
[Figure: the text "Can I reserve a room?" output as synthesized speech]
Current Mobile Speech Translation System
Google Translate (for iOS)

More than 70 languages can be translated.
Free to download and use.
Requires internet connection.
Offline mode is available for Android (2.3+)
Users can speak, type or handwrite text to translate.
Translated results are provided in text and speech.
Transcribes and translates quickly, provided the network speed is sufficient.
Keeps history.
How Mobile Speech Translation Systems Work (Online)

[Figure: online system diagram; network connection of more than 352 Kbps required]
Forecast of Mobile Speech Translation Systems in 2020

Since the 1950s, a number of scholars have questioned the
possibility of achieving fully automatic machine translation of high
quality. Some critics claim that there are in-principle obstacles to
automating the translation process.
A human translator may need a whole workday to translate five
pages: about 10% of an average text requires research, and that part
alone requires six [more] hours of work. Accomplishing this with
machines would require a higher degree of AI than has yet been
attained.
The architecture will improve, but it will still be imperfect.
Forecast of Mobile Speech Translation Systems in 2020

[Figure: factors shown: Network penetration, CPU, Mobile CPU, Server Storage, Mobile Storage, Network Cost. Accuracy: Online = High, Offline = Low]
Online Systems in 2020:

Advances in CPU processing power and server storage would improve the accuracy of speech
recognition and translation (although still not perfect).
Improved network penetration would expand the areas where the systems can be used.
Network connections incur costs.
Offline Systems in 2020:
Advances in mobile CPU processing power and mobile storage would improve the accuracy of
speech recognition and translation (although not to the level of online systems).
No connection cost; can be used anywhere, even without a network.
Both online and offline systems will be used.
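One way to use both kinds of systems together is to prefer the online engine and fall back to the offline one. The sketch below shows that idea only; the function name, the stub engines and the network_available flag are assumptions for illustration, not part of any existing product.

```python
from typing import Callable, Tuple

Translator = Callable[[str], str]


def translate_with_fallback(
    text: str,
    online: Translator,
    offline: Translator,
    network_available: bool,
) -> Tuple[str, str]:
    """Prefer the online engine; fall back to offline when there is no
    network or the online call fails. Returns (result, engine_used)."""
    if network_available:
        try:
            return online(text), "online"
        except ConnectionError:
            pass  # connection dropped mid-request: use the offline engine
    return offline(text), "offline"


# Stub engines for illustration only.
def stub_online(text: str) -> str:
    return f"[online] {text}"


def stub_offline(text: str) -> str:
    return f"[offline] {text}"


def stub_dropped(text: str) -> str:
    raise ConnectionError("network dropped")


print(translate_with_fallback("hello", stub_online, stub_offline, True))
print(translate_with_fallback("hello", stub_online, stub_offline, False))
print(translate_with_fallback("hello", stub_dropped, stub_offline, True))
```

Returning the engine name alongside the result would let the user interface tell the user when the less accurate offline result is being shown.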
My User Interface Design for Future Systems

Users can correct the system's misrecognition by writing or by choosing from alternatives.
Speakers can confirm that what they said was recognized correctly.
Users can see different interpretations and choose one according to the context.
The same sentence could be interpreted differently in different situations.
Corresponding words or phrases are shown in the same color.
Users could better understand the language rather than just receive the result.
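Two of the interface ideas above, choosing from alternatives and showing corresponding phrases in the same color, could be backed by data structures like the following. This is a minimal sketch: the names, the color palette, the candidate sentences and the romanized Japanese phrases are all hypothetical examples.

```python
from dataclasses import dataclass
from typing import List, Tuple

PALETTE = ["red", "blue", "green", "orange", "purple"]


@dataclass
class AlignedPair:
    source: str  # word or phrase in the source language
    target: str  # corresponding word or phrase in the target language


def color_pairs(pairs: List[AlignedPair]) -> List[Tuple[str, str, str]]:
    """Give each aligned pair one color shared by both sides."""
    return [
        (pair.source, pair.target, PALETTE[i % len(PALETTE)])
        for i, pair in enumerate(pairs)
    ]


def choose_alternative(candidates: List[str], index: int) -> str:
    """Return the recognition candidate the user picked from the list."""
    if not 0 <= index < len(candidates):
        raise IndexError("no such alternative")
    return candidates[index]


# Hypothetical recognition alternatives offered to the speaker.
candidates = ["Can I reserve a room?", "Can I deserve a room?"]
print(choose_alternative(candidates, 0))

# Hypothetical alignment between romanized Japanese and English phrases.
pairs = [
    AlignedPair("heya", "a room"),
    AlignedPair("yoyaku", "reserve"),
    AlignedPair("dekimasu ka", "Can I"),
]
for source, target, color in color_pairs(pairs):
    print(f"{color}: {source} <-> {target}")
```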
[Movie] Fail-Proof Speech Translation System User Interface Design
Conclusion
Considering the complex nature of human language communication, Speech
Translation Systems in 2020 will still be imperfect.
It is essential for the systems to have a fail-proof user interface to avoid
critical misunderstandings.
Learning foreign languages will continue to be important, so the
systems should be designed not to be relied on exclusively but to help users
improve their knowledge.
Development of the systems will increase the number of people who
can communicate with foreigners. Thus people's eyes will be more open to
the international community, and we will be mentally closer to each other in 2020.
We need to anticipate the social impact of this.
Reference
Speech Translation
Overcoming the Language Barrier with Speech Translation Technology (April 2009) http://www.nistep.go.jp/achiev/ftx/eng/stfc/stt031e/qr31pdf/STTqr3103.pdf
Google Translate For Android Gets Offline Mode With Support For 50 Languages (Mar 27, 2013) http://techcrunch.com/2013/03/27/google-translate-offline-mode/
[iTunes Preview] Google Translate https://itunes.apple.com/ca/app/google-translate/id414706506
[Toshiba] Research and Development Center, News Release (Japanese) http://www.toshiba.co.jp/rdc/rd/detail_j/0912_03.htm
A Speech Translation System with Mobile Wireless Clients http://aclweb.org/anthology//P/P03/P03-2023.pdf
Automatic Speech Recognition (ASR)
[Wikipedia] Speech recognition http://en.wikipedia.org/wiki/Automatic_speech_recognition
[howstuffworks] How Speech Recognition Works http://electronics.howstuffworks.com/gadgets/high-tech-gadgets/speech-recognition.htm
[Windows] Set up Speech Recognition http://windows.microsoft.com/en-us/windows7/set-up-speech-recognition
[TopTenReviews] Voice Recognition Software Review http://voice-recognition-software-review.toptenreviews.com/
Machine Translation (MT)
[Wikipedia] Machine Translation http://en.wikipedia.org/wiki/Machine_translation
Why your smartphone will NEVER be a universal translator http://www.fluentin3months.com/translator-app/
Speech Synthesis
[Wikipedia] Speech synthesis http://en.wikipedia.org/wiki/Speech_synthesis
[YouTube] Using Narrator the basic screen reading tool built into MS Windows http://www.youtube.com/watch?v=0mACOm0SuhE
