
Programming with Microphone and Gloves

Using Speech and Gesture Recognition


Subhajit Sahu

Introduction
Programming can often be a tedious and repetitive task, and when continued for long hours
every day, it may eventually cause Repetitive Strain Injury (RSI). RSI is a serious problem that
causes loss of productivity and pain in the hands. As employment in the software industry
grows, cases of RSI may reach alarming numbers. Besides this, there has been a rise in the
number of chronic lifestyle diseases such as obesity, diabetes, hypertension, and
cardiovascular disease. This project focuses on studying the effect of replacing traditional
computer interfaces (mouse, keyboard, monitor) with emerging technologies: speech
recognition (headset), gesture recognition (motion glove), and a wearable display (smartwatch),
to enable people to be more active at work. This could pave the way to improved productivity
in many industries where a high degree of automation has taken place.

Review of Literature
Speech recognition systems have a long history, beginning in 1952 at Bell Labs. In the 1960s,
Reddy developed a continuous speech recognition system [1]. His research group used the
Hidden Markov Model (HMM), which became the dominant algorithm in the 1980s. Huang [2]
developed a speaker-independent, large-vocabulary, continuous speech recognition system.
Today, much of speech recognition utilizes deep learning with Long Short-Term Memory
(LSTM) networks. Deep feedforward networks for acoustic modeling, introduced by Hinton,
Deng, et al. [3], decreased word error rates by about 30%. Yamamoto [4] implemented a system
for C programming by voice. Rudd [5] used Dragon NaturallySpeaking, Dragonfly, and custom
voice commands to write Python code in the Emacs editor, along with code browsing, selection,
editing, and templates. Williams-King [6] developed the open-source Silvius system using the
Kaldi speech recognition toolkit and the VoxForge and TED-LIUM speech models. Interest in
natural input devices, such as gloves, began in the late 1970s with various methods of
hand-tracking [7]. Rosenberg and Slater introduced a chording glove usable as a replacement
for the keyboard [8]. A low-cost visual motion glove to interpret hand gestures for virtual
reality was presented by Han [9]. KeyGlove by Rowberg [10] is a wearable, wireless,
open-source input device. It uses customizable touch combinations and gestures to enter text
data, control the mouse, switch between applications, perform multiple operations with a
single action, and even play games.
Objective and Scope of Study
While speech-based programming systems provide an excellent hands-free way to program,
they rely entirely on the voice; used continuously, this can fatigue the vocal cords.
Chording gloves can be used for writing text, but they are not fast enough to replace the
keyboard, and they are difficult to learn.

The objective of this project is to find an optimal combination of these emerging technologies
to minimize the chances of RSI and reduce sedentary work. A virtual screen-space wearable
display (using a smartwatch) and text-to-speech audio (using a headset) for communicating the
necessary information will be studied. Programming is one of the activities that requires the
user to make heavy use of the keyboard and mouse. Experiments on the effectiveness of a
particular medium will be carried out with respect to the JavaScript language. Writing,
understanding, and modifying programs will be tested. Voice commands and gestures for
common activities will be added. Experiments will be graded with respect to the amount of
vocal/wrist strain, intuitiveness, and speed in order to find the optimal approach. As mentioned
above, tests will be carried out with a fixed display, a portable display, and no display, in order
to find ways to promote walking while at work. Existing technology, software, or blueprints
will be used in most cases; prototypes may be developed as necessary to complete the
experiments.
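
As a minimal sketch of how such grading might be combined into a single score, the following
assumes hypothetical 0-10 ratings and weights (the metric names, scale, and weights are
illustrative assumptions, not project results):

    // Minimal sketch: combine hypothetical per-session ratings into one grade.
    // Metric names, the 0-10 scale, and the weights are illustrative assumptions.
    function gradeExperiment({ vocalStrain, wristStrain, intuitiveness, speed }) {
      // Strain (0 = none, 10 = severe) counts against the grade; intuitiveness
      // and speed (0 = poor, 10 = excellent) count in its favour.
      return -1.0 * (vocalStrain + wristStrain) / 2
           + 0.5 * intuitiveness
           + 0.5 * speed;
    }

    // A voice-only session with high vocal strain grades lower than a mixed
    // voice+gesture session with moderate strain on both channels.
    console.log(gradeExperiment({ vocalStrain: 8, wristStrain: 0, intuitiveness: 7, speed: 5 }));  // 2
    console.log(gradeExperiment({ vocalStrain: 4, wristStrain: 2, intuitiveness: 7, speed: 6 }));  // 3.5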

Methodology
The study will start with programming using speech recognition via a microphone, together
with a display. A review of open-source speech recognition systems will be carried out,
comparing them with commercial digital assistants on accuracy and speed. Different ways of
dictating keywords, identifiers in different cases (e.g., camelCase, snake_case), templates for
blocks (conditions, loops, functions, classes), commands (scrolling, selecting, modifying), and
control options (saving a file, accessing the terminal) will be tested. To minimize use of the
display, tests will be made to find effective ways of writing and modifying complex programs
without a display. Volunteers will be tested to see how long a session they can manage with
such a system (with/without a display) versus a traditional keyboard-and-mouse system.
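
As a minimal sketch of what dictating templates and identifiers could look like in JavaScript
(the phrase set, template syntax, and case-conversion rules are illustrative assumptions; the
actual command sheets will be an outcome of the experiments):

    // Minimal sketch: a lookup from dictated phrases to JavaScript code
    // templates, where ${n} marks editor tab stops. The phrase set and
    // templates are illustrative assumptions.
    const templates = {
      "if block": "if (${1}) {\n  ${2}\n}",
      "for loop": "for (let i = 0; i < ${1}; i++) {\n  ${2}\n}",
      "function": "function ${1}(${2}) {\n  ${3}\n}",
    };

    // Dictating identifiers in different cases:
    // "camel case read file" -> "readFile", "snake case read file" -> "read_file".
    function dictateIdentifier(phrase) {
      const words = phrase.toLowerCase().split(/\s+/);
      const style = words.slice(0, 2).join(" ");
      const rest = words.slice(2);
      if (style === "camel case") {
        return rest[0] + rest.slice(1)
          .map(w => w[0].toUpperCase() + w.slice(1)).join("");
      }
      if (style === "snake case") return rest.join("_");
      return words.join("");  // no case prefix: plain concatenation
    }

    console.log(dictateIdentifier("camel case read file"));  // "readFile"
    console.log(dictateIdentifier("snake case read file"));  // "read_file"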

Prototypes of the KeyGlove will then be tested, evaluating gesture-detection algorithms for
their sensitivity, specificity, and speed. Useful gesture commands will be added, and writing
complex programs will be tried with and without a display. Tests will be repeated with a
system combining speech and gesture recognition. The interface (voice commands, gestures)
will remain similar, and participants will be encouraged to make use of both features. Results
with and without a display will be compared with those of the individual systems. Different
display technologies, along with a virtual screen-space concept, will be studied. Such a display
would be made by projecting a part of the virtual screen-space onto the physical screen,
allowing a much wider virtual display on a narrow display device. This could be tested on a
monitor, a projector, a hand-strapped mobile phone, and a smartwatch. Programming tests will
be carried out while standing and walking.
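
Two minimal sketches in JavaScript illustrate these ideas; the dimensions, names, and counts
below are illustrative assumptions, not measured values:

    // Sensitivity and specificity of gesture detection, computed from counts of
    // true/false positives/negatives; the counts themselves will come from the
    // experiments (the values below are placeholders).
    function detectionMetrics({ tp, fp, tn, fn }) {
      return {
        sensitivity: tp / (tp + fn),  // fraction of real gestures detected
        specificity: tn / (tn + fp),  // fraction of non-gestures correctly ignored
      };
    }
    console.log(detectionMetrics({ tp: 90, fp: 5, tn: 95, fn: 10 }));
    // { sensitivity: 0.9, specificity: 0.95 }

    // Virtual screen-space: the editor renders into a wide virtual canvas, and
    // the device shows only a movable window into it. Only the device dimensions
    // change between a monitor, projector, phone, or smartwatch.
    const virtualScreen = { width: 3840, height: 2160 };  // assumed virtual canvas
    const device = { width: 320, height: 240 };           // e.g. a smartwatch

    // Clamp the viewport so it stays inside the virtual screen, and return the
    // rectangle of virtual space the device should display.
    function projectViewport(x, y) {
      const vx = Math.max(0, Math.min(x, virtualScreen.width - device.width));
      const vy = Math.max(0, Math.min(y, virtualScreen.height - device.height));
      return { x: vx, y: vy, width: device.width, height: device.height };
    }
    console.log(projectViewport(4000, 100));
    // { x: 3520, y: 100, width: 320, height: 240 }

Panning the viewport by a voice command ("scroll right") or a wrist gesture would then let a
small wearable display stand in for a full monitor.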
Possible Outcome
The results of each experiment, the methods used to grade them, the voice and gesture
command sheets, the methods used to develop those sheets, blueprints of any prototypes
developed, and the software used for the combined system will be published. An open-source
extension for JavaScript to a popular code editor might be developed. A product that proves
useful for programmers could be developed, encouraging people to be active even while
working on tasks that are not physically exerting. Further research would be necessary to
determine more natural human-computer interfaces for other programming languages. There
would be a need for the development of standards, and possibly for the introduction of a
voice-and-gesture-oriented programming language.

References
1. Juang, B. H.; Rabiner, L. R. (2004). "Automatic Speech Recognition – A Brief History
of the Technology Development". Elsevier Encyclopedia of Language and Linguistics.
2. Huang, X.; Baker, J.; Reddy, R. (2014). "A Historical Perspective of Speech
Recognition". Communications of the ACM, vol. 57, no. 1, pp. 94-103.
doi:10.1145/2500887.
3. Hinton, G.; Deng, L.; et al. (2012). "Deep Neural Networks for Acoustic Modeling in
Speech Recognition: The Shared Views of Four Research Groups". IEEE Signal
Processing Magazine, vol. 29, no. 6, pp. 82-97.
4. Yamamoto, M. (2002). "C Program Editor through Voice Input". Proceedings of the
26th Annual International Computer Software and Applications Conference, pp. 290-292.
5. Rudd, T. (2013). "Using Python to Code by Voice". PyCon US 2013.
6. Williams-King, D. (2016). "Coding by Voice with Open Source Speech Recognition".
The Eleventh HOPE.
7. Sturman, D. J.; Zeltzer, D. (1994). "A Survey of Glove-Based Input". IEEE Computer
Graphics and Applications, vol. 14, no. 1, pp. 30-39.
8. Rosenberg, R.; Slater, M. (1999). "The Chording Glove: A Glove-Based Text Input
Device". IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications
and Reviews), vol. 29, no. 2, pp. 186-191.
9. Han, Y. (2010). "A Low-Cost Visual Motion Data Glove as an Input Device to
Interpret Human Hand Gestures". IEEE Transactions on Consumer Electronics, vol. 56,
no. 2, pp. 501-509.
10. Rowberg, J. (2015). "KeyGlove, freedom in the palm of your hand".
http://www.keyglove.net.
