
Mobile Speech Translation Systems Design for 2020
11/19/2013 INST603 Term Project
MIM, UMD Makoto Asami

Table of Contents
Overview of the Project
Outline of Speech Translation Systems
Automatic Speech Recognition (Speech-to-Text)
Machine Translation
Voice Synthesis (Text-to-Speech)
Google Translate (for iOS) - Current Mobile Speech Translation Systems
How Mobile Speech Translation Systems Work (Online)
Forecast of Mobile Speech Translation Systems in 2020
My User Interface Design for Future Systems
Conclusion

Overview of the Project

- A System Design Practice for Mobile Speech Translation Systems in 2020 -

Reasons I chose this as the final project:
Advances in raw computing power are expected to significantly improve the capability of language translation systems in the near future.
Meanwhile, we generally anticipate that more and more people around the world will communicate with each other. In addition, my home country, Japan, expects many foreign visitors for the 2020 Tokyo Olympic Games.
Thus, this project aims to study the current state of speech translation systems and to propose a feasible mobile speech translation system that would benefit people while traveling abroad and in their daily lives.

Outline of Speech Translation Systems

A speech translation system typically integrates the following three technologies: Automatic Speech Recognition (ASR), Machine Translation (MT) and Voice Synthesis (TTS).
[Figure: Speech (Japanese) → Automatic Speech Recognition (ASR), using speech recognition databases (Japanese) → Text (Japanese) → Machine Translation (MT), using Japanese-English translation databases → Text (English): "Can I reserve a room?" → Voice Synthesis (TTS), using voice synthesis databases (English) → Speech (English)]
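The ASR → MT → TTS flow described above can be sketched as a simple composition of three functions. This is a minimal illustration under stated assumptions, not how any particular product is built: the class name `SpeechTranslator` and the lookup-table stages are hypothetical stand-ins for real recognition, translation and synthesis engines and their databases.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SpeechTranslator:
    """Chains ASR -> MT -> TTS (hypothetical sketch, not a real product API)."""
    asr: Callable[[bytes], str]   # speech (source language) -> text
    mt: Callable[[str], str]      # text (source language) -> text (target language)
    tts: Callable[[str], bytes]   # text (target language) -> speech

    def translate_speech(self, audio: bytes) -> bytes:
        source_text = self.asr(audio)        # e.g. Japanese text
        target_text = self.mt(source_text)   # e.g. English text
        return self.tts(target_text)         # English "speech"


# Toy stand-ins: lookup tables play the role of the recognition,
# translation and synthesis databases.
ASR_TABLE = {b"<ja-audio:heya wo yoyaku dekimasu ka>": "部屋を予約できますか？"}
MT_TABLE = {"部屋を予約できますか？": "Can I reserve a room?"}


def toy_tts(text: str) -> bytes:
    # A real engine would return audio samples; here we only tag the text.
    return f"<en-audio:{text}>".encode("utf-8")


translator = SpeechTranslator(asr=ASR_TABLE.__getitem__,
                              mt=MT_TABLE.__getitem__,
                              tts=toy_tts)
print(translator.translate_speech(b"<ja-audio:heya wo yoyaku dekimasu ka>"))
```

Keeping each stage behind its own interface means one stage (for example, cloud ASR versus an on-device model) could be swapped without touching the other two.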

Automatic Speech Recognition (ASR)
- Speech to Text (STT) -

Applications include voice user interfaces such as dictation (e.g. word processors, emails, Google Voice Recognition, medical transcription) and hands-free computing (e.g. Windows, Siri).
Nuance Dragon NaturallySpeaking (from $99.99): accuracy rate of 93%
CMUSphinx (Open Source Toolkit For Speech Recognition) by Carnegie Mellon University
Speaker dependent (uses training): large vocabulary, limited users (e.g. Windows Speech Recognition)
Speaker independent (no training): small vocabulary, many users (e.g. automated telephone answering)

[Figure: the speech "ou pu nn fe isu buk" recognized as the text "Open Facebook"]

Often processed in the cloud
Requires processing power and storage
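The slide does not say how the 93% accuracy figure for Dragon is measured. A common convention, assumed here and not confirmed for that product, is word accuracy = 1 - word error rate (WER), where WER is the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer's output, divided by the number of reference words. A minimal sketch (the function name is hypothetical):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits turning the first i reference words into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)


wer = word_error_rate("can i reserve a room", "can i preserve a room")
print(f"WER = {wer:.0%}, word accuracy = {1 - wer:.0%}")
```

One misrecognized word out of five gives a WER of 20%, i.e. 80% word accuracy under this convention.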

Machine Translation (MT)


Research has continued since it began at MIT in 1951.
The human translation process may be described as:
1. Decoding the meaning of the source text; and
2. Re-encoding this meaning in the target language.
To decode the meaning of the source text, the translator must interpret and analyze all
the features of the text. The process requires in-depth knowledge of the grammar, idioms,
etc., of the source language, as well as the culture of its speakers.
Machine Translation Approaches:
Rule-based, Transfer-based, Interlingual, Dictionary-based, Statistical, Example-based,
Hybrid (statistical + rule-based)
Inside Google Translate
Beginning in the late 1980s, as computational power increased and became less
expensive, more interest was shown in statistical models for machine translation.
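To make "statistical models for machine translation" concrete, here is a toy version of IBM Model 1, a classic textbook word-alignment model trained with expectation-maximization (EM). It illustrates the statistical approach only; the slide does not say which models Google Translate uses, and the three-sentence romanized Japanese-English corpus and helper names are invented for the example.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate word-translation probabilities t(e|f) with EM (IBM Model 1, no NULL word)."""
    src_vocab = {f for src, _ in pairs for f in src.split()}
    tgt_vocab = {e for _, tgt in pairs for e in tgt.split()}
    # Start from a uniform distribution over target words for every source word.
    t = {(e, f): 1.0 / len(tgt_vocab) for f in src_vocab for e in tgt_vocab}
    for _ in range(iterations):
        count = defaultdict(float)  # expected number of times f is linked to e
        total = defaultdict(float)  # expected number of links from f
        for src, tgt in pairs:
            fs, es = src.split(), tgt.split()
            for e in es:
                # E-step: split each target word among the source words of its sentence.
                norm = sum(t[(e, f)] for f in fs)
                for f in fs:
                    frac = t[(e, f)] / norm
                    count[(e, f)] += frac
                    total[f] += frac
        # M-step: renormalise expected counts into probabilities t(e|f).
        t = {(e, f): count[(e, f)] / total[f] for (e, f) in t}
    return t


def best_translation(t, f):
    """Most probable target word for source word f."""
    candidates = {e: p for (e, src), p in t.items() if src == f}
    return max(candidates, key=candidates.get)


# Invented toy corpus: romanized Japanese -> English.
corpus = [("kono heya", "this room"), ("kono hon", "this book"), ("sono hon", "that book")]
t = train_ibm_model1(corpus)
for word in ["kono", "heya", "hon", "sono"]:
    print(word, "->", best_translation(t, word))
```

No sentence pair states that "heya" means "room", yet EM infers it: "kono" keeps co-occurring with "this", which "explains away" that word and leaves "room" for "heya". This prints kono -> this, heya -> room, hon -> book, sono -> that.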

Voice / Speech Synthesis
- Text to Speech (TTS) -

Artificial production of human speech from text
Applied to screen readers as assistive technology for blind and visually impaired people, among others:
Microsoft Narrator: reads screen content aloud to help users navigate Windows
NaturalReader (NaturalSoft): free version available; converts text (web pages, PDF files, emails) to spoken words.
Also applied to entertainment: games and animations
[Figure: the text "Can I reserve a room?" converted to speech]
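One classic way to produce speech from text is concatenative synthesis: look up a stored recording for each unit in a voice synthesis database and join the recordings. The sketch below assumes a hypothetical word-level database `UNIT_DB` whose "recordings" are sine tones, so it runs without audio files; many real systems use smaller units (such as diphones) plus prosody modelling, and the slide does not say which method Narrator or NaturalReader uses.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for the sketch)


def tone(freq_hz, seconds):
    """Stand-in for a recorded unit: a pure sine tone of the given length."""
    n = round(SAMPLE_RATE * seconds)
    return [math.sin(2 * math.pi * freq_hz * k / SAMPLE_RATE) for k in range(n)]


# Hypothetical word-level "voice database" (English).
UNIT_DB = {
    "can": tone(220, 0.20),
    "i": tone(247, 0.15),
    "reserve": tone(262, 0.40),
    "a": tone(294, 0.10),
    "room": tone(330, 0.30),
}


def synthesize(text, pause_s=0.05):
    """Concatenate the stored unit for each word, inserting a short pause between words."""
    words = [w.strip("?.,!").lower() for w in text.split()]
    pause = [0.0] * round(SAMPLE_RATE * pause_s)
    samples = []
    for k, word in enumerate(words):
        if word not in UNIT_DB:
            raise KeyError(f"no recorded unit for {word!r}")
        if k > 0:
            samples.extend(pause)
        samples.extend(UNIT_DB[word])
    return samples


audio = synthesize("Can I reserve a room?")
print(len(audio) / SAMPLE_RATE, "seconds of audio")
```

A word-level database only covers the words it has stored, which is why the sketch raises `KeyError` for anything else; splitting speech into smaller units is one way real systems avoid this limit.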

Current Mobile Speech Translation System

Google Translate (for iOS)


More than 70 languages can be translated.
Free to download and use.
Requires internet connection.
Offline mode is available for Android (2.3+)
Users can speak, type or handwrite text to translate.
Translated results are provided in text and speech.
Transcribes and translates quickly, given sufficient network speed.
Keeps a history of translations.

How Mobile Speech Translation Systems Work (Online)
[Figure: the system works over the network; a connection of more than 352 Kbps is required]
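The 352 Kbps requirement can be related to audio size with simple arithmetic. The slide does not state what audio format is sent, so the sketch assumes uncompressed mono PCM at 16 kHz and 16 bits per sample (256 Kbps); real apps may compress speech and need less bandwidth. The function names are hypothetical, and latency and protocol overhead are ignored.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> float:
    """Bitrate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def upload_seconds(utterance_s: float, audio_kbps: float, link_kbps: float) -> float:
    """Time to send an utterance over a link, ignoring latency and protocol overhead."""
    return utterance_s * audio_kbps / link_kbps


audio_kbps = pcm_bitrate_kbps(16_000, 16)   # 256.0 Kbps for the assumed format
link_kbps = 352.0                           # minimum stated on the slide
print(audio_kbps <= link_kbps)              # True: audio can be streamed as it is spoken
print(round(upload_seconds(3.0, audio_kbps, link_kbps), 2))  # 2.18 s to send a 3 s utterance
```

If the audio bitrate exceeded the link speed, the upload would fall further behind the speaker the longer they talked, which is one reason a minimum connection speed is stated.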

Forecast of Mobile Speech Translation Systems in 2020
Since the 1950s, a number of scholars have questioned whether fully automatic, high-quality machine translation is possible. Some critics claim that there are in-principle obstacles to automating the translation process.
When a human translator needs a whole workday to translate five pages, about 10% of an average text requires him or her to do research, which takes six [more] hours of work. Accomplishing this with machines would require a higher degree of AI than has yet been attained.
The architecture will improve, but it will still be imperfect.

[Figure: online vs. offline systems]
Online: CPU, server storage, network penetration, network cost; accuracy: high
Offline: mobile CPU, mobile storage; accuracy: low

Online Systems in 2020:
Growth in server CPU processing power and storage would improve the accuracy of speech recognition and translation (though it would still not be perfect).
Wider network penetration would expand the areas where the systems can be used.
Network connection costs remain.
Offline Systems in 2020:
Growth in mobile CPU processing power and mobile storage would improve the accuracy of speech recognition and translation (though not to the level of online systems).
No connection cost; can be used anywhere, even without a network.
Both online and offline systems will be used.
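If both kinds of systems are used, a client could pick a mode per request. The sketch below is a hypothetical policy, not a described design: `NetworkStatus`, `choose_mode` and the `metered` flag are invented names, and the threshold reuses the 352 Kbps figure from the network slide.

```python
from dataclasses import dataclass

MIN_ONLINE_KBPS = 352.0  # network requirement from the network slide


@dataclass
class NetworkStatus:
    connected: bool
    bandwidth_kbps: float
    metered: bool  # hypothetical flag: True if the user pays per megabyte (e.g. roaming)


def choose_mode(net: NetworkStatus, avoid_metered: bool = True) -> str:
    """Prefer the more accurate online engine; otherwise fall back to offline."""
    if not net.connected or net.bandwidth_kbps < MIN_ONLINE_KBPS:
        return "offline"  # no usable network: on-device recognition and translation
    if net.metered and avoid_metered:
        return "offline"  # avoid connection costs, accepting lower accuracy
    return "online"       # server CPU and storage give higher accuracy


print(choose_mode(NetworkStatus(connected=True, bandwidth_kbps=1000.0, metered=False)))
print(choose_mode(NetworkStatus(connected=False, bandwidth_kbps=0.0, metered=False)))
```

The trade-off mirrors the comparison above: online mode wins on accuracy, offline mode wins on availability and cost.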

My User Interface Design for Future Systems

Users can correct the system's misrecognitions by writing the correct text or by choosing from alternatives.
Speakers can confirm that what they said was recognized correctly.
Users can see different possible interpretations and choose one according to the context.
The same sentence could be interpreted differently in different situations.
Corresponding words or phrases are shown in the same color.
Users can better understand the language rather than just receive the result.
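Two of the interface ideas above can be sketched in code: letting the speaker confirm or correct the recognized text from a ranked list of alternatives, and giving corresponding words the same color based on a word alignment. Everything here is hypothetical: the function names, the palette, the confidence scores and the alignment (which a real system would take from its translation engine) are invented for illustration.

```python
from typing import Dict, List, Optional, Tuple

PALETTE = ["red", "blue", "green", "orange", "purple"]
Node = Tuple[str, int]  # ("src", word index) or ("tgt", word index)


def confirm_transcript(nbest: List[Tuple[str, float]],
                       choice: Optional[int] = None,
                       written: Optional[str] = None) -> str:
    """Return the text the speaker confirms: a written correction, else the chosen
    alternative (index into the list ranked by score), else the top hypothesis."""
    if written:
        return written
    ranked = sorted(nbest, key=lambda hyp: hyp[1], reverse=True)
    return ranked[choice if choice is not None else 0][0]


def color_alignment(alignment: List[Tuple[int, int]]) -> Dict[Node, str]:
    """Give linked source/target words the same color. Words connected through any
    chain of (source_index, target_index) links form one group (e.g. a phrase)."""
    parent: Dict[Node, Node] = {}

    def find(node: Node) -> Node:
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for i, j in alignment:
        parent[find(("src", i))] = find(("tgt", j))  # union the two groups

    colors: Dict[Node, str] = {}
    group_color: Dict[Node, str] = {}
    for node in sorted(parent):
        root = find(node)
        group_color.setdefault(root, PALETTE[len(group_color) % len(PALETTE)])
        colors[node] = group_color[root]
    return colors


nbest = [("can i preserve a room", 0.40), ("can i reserve a room", 0.55)]
print(confirm_transcript(nbest))            # top hypothesis by score
src = ["部屋", "を", "予約", "できますか"]
tgt = ["Can", "I", "reserve", "a", "room"]
links = [(0, 4), (2, 2), (3, 0), (3, 1)]    # invented alignment
colors = color_alignment(links)
for i, word in enumerate(src):
    print(word, colors.get(("src", i), "uncolored"))
for j, word in enumerate(tgt):
    print(word, colors.get(("tgt", j), "uncolored"))
```

Grouping linked words (rather than coloring each link separately) keeps a one-to-many correspondence such as できますか / "Can I" in a single color.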

[Movie] Fail-Proof Speech Translation System User Interface Design

Conclusion
Given the complex nature of human language communication, speech translation systems in 2020 will still be imperfect.
It is essential for the systems to have a fail-proof user interface to avoid critical misunderstandings.
Learning foreign languages will remain important, so the systems should be designed not to be relied on exclusively but to help users improve their own knowledge.
Development of these systems will increase the number of people who can communicate with foreigners. People's eyes will thus be opened wider to the international community, and we will feel closer to each other in 2020.
We need to anticipate the social impact of this.

References
Speech Translation
[NISTEP] Overcoming the Language Barrier with Speech Translation Technology (April 2009) http://www.nistep.go.jp/achiev/ftx/eng/stfc/stt031e/qr31pdf/STTqr3103.pdf
[TechCrunch] Google Translate For Android Gets Offline Mode With Support For 50 Languages (Mar 27, 2013) http://techcrunch.com/2013/03/27/google-translate-offline-mode/
[iTunes Preview] Google Translate https://itunes.apple.com/ca/app/google-translate/id414706506
[Toshiba] Research and Development Center, News Release (Japanese) http://www.toshiba.co.jp/rdc/rd/detail_j/0912_03.htm
[ACL Anthology] A Speech Translation System with Mobile Wireless Clients http://aclweb.org/anthology//P/P03/P03-2023.pdf
Automatic Speech Recognition (ASR)
[Wikipedia] Speech recognition http://en.wikipedia.org/wiki/Automatic_speech_recognition
[howstuffworks] How Speech Recognition Works http://electronics.howstuffworks.com/gadgets/high-tech-gadgets/speech-recognition.htm
[Windows] Set up Speech Recognition http://windows.microsoft.com/en-us/windows7/set-up-speech-recognition
[TopTenReviews] Voice Recognition Software Review http://voice-recognition-software-review.toptenreviews.com/
Machine Translation (MT)
[Wikipedia] Machine Translation http://en.wikipedia.org/wiki/Machine_translation
[Fluent in 3 Months] Why your smartphone will NEVER be a universal translator http://www.fluentin3months.com/translator-app/
Speech Synthesis
[Wikipedia] Speech synthesis http://en.wikipedia.org/wiki/Speech_synthesis
[YouTube] Using Narrator the basic screen reading tool built into MS Windows http://www.youtube.com/watch?v=0mACOm0SuhE
