Abstract: Session- and person-independent recognition of hand and finger gestures is of utmost importance for the practicality of gesture-based interfaces. In this paper we evaluate the performance of a wearable gesture recognition system that captures arm, hand, and finger motions by measuring movements of, and muscle activity at, the forearm. We fuse the signals of an Inertial Measurement Unit (IMU) worn at the wrist and the Electromyogram (EMG) of muscles in the forearm to infer hand and finger movements. A set of 12 gestures was defined, motivated by their similarity to actual physical manipulations and to gestures known from the interaction with mobile devices. We recorded performances of our gesture set by five subjects in multiple sessions. The resulting data corpus will be made publicly available to build a common ground for future evaluations and benchmarks. Hidden Markov Models (HMMs) are used as classifiers to discriminate between the defined gesture classes. We achieve a recognition rate of 97.8% in session-independent and of 74.3% in person-independent recognition. Additionally, we give a detailed analysis of error characteristics and of the influence of each modality on the results to underline the benefits of using both modalities together.
Table 1: Comparison of investigated research questions. This table shows what kind of gesture recognition tasks were evaluated in the work related to this paper, and which modalities were used.

                             | session-dependent    | session-independent  | person-independent
                             | EMG  IMU  combined   | EMG  IMU  combined   | EMG  IMU  combined
(Li et al., 2010)            |  -    -      -       |  -    -      x       |  -    -      -
(Wolf et al., 2013)          |  -    -      x       |  -    -      -       |  -    -      -
(Zhang et al., 2011)         |  x    x      x       |  -    -      -       |  -    -      x
(Chen et al., 2007)          |  x    x      x       |  -    -      -       |  -    -      -
(Saponas et al., 2008)       |  -    -      -       |  x    -      -       |  x    -      -
(Kim et al., 2008)           |  x    -      -       |  -    -      -       |  x    -      -
(Samadani and Kulic, 2014)   |  -    -      -       |  -    -      -       |  x    -      -
This paper                   |  -    -      -       |  x    x      x       |  x    x      x
recognition, which is of importance for practical and usable gesture-based interfaces. Session-independent recognition surveys how well the system can be attuned to a user. A system with high session-independent accuracy can be used by one person without training it each time it is used. This makes the system ready to use by just mounting the sensors and starting the recognition. Person-independent recognition is relevant in regard to an envisioned system that can be used without prior training by some new user. Instead, it would use exemplary training data from a selected, representative group, making explicit training sessions superfluous. Such a system could nonetheless still benefit from further adaptation to an individual user.

The evaluations of this paper will be based on gestures that were designed to resemble actual physical manipulations, as well as gestures known from the interaction with mobile devices. They were not designed to be optimally distinguishable, but rather to represent useful gestures for real-world interfaces. The recorded dataset will be made publicly available to be used by others as a common ground for the development and evaluation of gesture recognition systems based on IMUs and EMG.

1.1 Related Work

The simultaneous usage of an IMU and EMG for gesture-based interfaces was also evaluated in a small set of other studies, which are listed and compared in Table 1. Comparing the various approaches to the task of gesture recognition based only on reported recognition accuracies is hard, as different gesture sets were designed and a variety of recording setups and hardware was used.

In (Li et al., 2010) IMUs and EMG are used for the automatic recognition of sign language subwords, specifically of Chinese Sign Language (CSL) subwords. On a vocabulary of 121 subwords a high accuracy of 95.78% is achieved, but only session-independent accuracy was evaluated. Additionally, it is hard to estimate the transferability of these results to other tasks, as a very task-specific recognizer design is used, in contrast to the more general design used in this paper and most other studies.

In (Wolf et al., 2013) two different approaches for gesture recognition are presented. One uses an SVM to distinguish between 17 static gestures with 96.6% accuracy. It discriminates these 17 gesture classes based on EMG and uses an additional orientation estimation based on IMU signals to distinguish whether the gestures were performed with a hanging, raised or neutral arm, increasing the number of classes that can be interpreted by a factor of three. To decode dynamic gestures, they combine the features of both modalities into a single feature vector. Again, they achieve a high accuracy of 99% for a set of nine gestures. However, neither session-independent nor person-independent performance is evaluated for either technique, and the contribution of the individual modalities is not discussed.

Zhang et al. (Zhang et al., 2011) show that it is possible to construct a robust, person-independent, gesture-based interface using an IMU and EMG to manipulate a Rubik's Cube. They report a person-independent accuracy of 90.2% for 18 different gesture classes. One has to note that this gesture set was designed to include only three different static hand postures that were hard to detect with only an IMU, and that, similar to (Wolf et al., 2013), the direction of movement with the whole arm then increased the number of gestures by a factor of six. In comparison, we use a set of twelve different gestures, with dynamic postures and movement. Additionally, they discuss the contribution of the individual modalities IMU and EMG in a CSL recognition task similar to (Li et al., 2010), but do this only for session-dependent recognition.

Chen et al. (Chen et al., 2007) also compare the session-dependent gesture recognition performance for various gesture sets when using 2D-accelerometers and two EMG channels individually
Recognizing Hand and Finger Gestures with IMU based Motion and EMG based Muscle Activity Sensing
as well as in combination. When using both modalities combined, they report accuracy improvements of 5-10% on various gesture sets with up to 12 classes. Again, this evaluation is not done for the session- and person-independent case.

As we evaluate single-modality performance for person-independent recognition, we also evaluate the performance when using only EMG. Only very few studies have reported on this so far (Saponas et al., 2008; Kim et al., 2008; Samadani and Kulic, 2014), often with lacking results, or only on small gesture sets.

Samadani and Kulic (Samadani and Kulic, 2014) use eight dry EMG sensors in a prototypic commercial armband from Thalmic Labs(1) to capture EMG readings from 25 subjects. In person-independent recognition, they achieve 49% accuracy for a gesture set with 25 gestures. Additionally, they report 79%, 85%, and 91% accuracy for select gesture sets with 10, 6, and 4 gestures, respectively.

2 DATASET

2.1 Gesture Set

To get a reliable performance estimate for practical interfaces, we define a set of gestures that is a reasonable choice for gesture-based interfaces. Different studies, e.g. (Alibali, 2005; Hauptmann and McAvinney, 1993), show that spatial gestures can convey unique information not included in linguistic expressions when used to express spatial information. Therefore we use spatial gestures, as an interface based on such gestures could be beneficial for mobile and wearable usage when used complementary to speech recognition.

Hauptmann and McAvinney (Hauptmann and McAvinney, 1993) evaluate what kind of gestures were performed by subjects trying to solve spatial tasks. In a Wizard of Oz experiment they collected statistics on the number of fingers and hands that were used to solve tasks related to the graphical manipulation of objects.

We compared their statistics with gestures commonly occurring during the actual manipulation of real objects or whilst using touchscreens or touchpads. On the one hand, we hope that users can intuitively use these gestures due to both their level of familiarity and their similitude to physical interactions. On the other hand, we assume that interfaces like virtual or augmented reality glasses might benefit from such gestures by allowing a user to manipulate displayed virtual objects similar to real physical objects.

As a result we defined the list of gestures in Table 2 to be the list of gestures to be recognized. These gestures involve movements of the fingers as well as of the whole hand.

2.2 Experimental Procedure

2.2.1 Hardware

Inertial Measurement Unit

A detailed description of the sensor we use to capture acceleration and angular velocity of the forearm during the experiments can be found in (Amma et al., 2014). It consists of a 3D accelerometer as well as a 3D gyroscope. We transmit the sensor values via Bluetooth and sample at 81.92 Hz.

EMG Sensors

To record the electrical activity of the forearm muscles during the movements of the hand, two biosignalsplux devices from Plux(2) are used. These devices allow the recording of up to eight channels simultaneously per device. Both devices are synchronized on hardware level via an additional digital port. One additional port is reserved to connect a reference or ground electrode. Each bipolar channel measures an electrical potential using two self-adhesive, disposable surface electrodes. The devices sample at 1000 Hz and recorded values have a resolution of 12 bit.

2.2.2 Subject Preparation

Each subject that was recorded received an explanation of what was about to happen during the experiments. Afterwards, 32 electrodes were placed in a regular pattern on both the upper and lower side of their forearm. This pattern can be seen in Figure 1. The position of the electrodes near the elbow was chosen, as the flexors and extensors for all fingers apart from the thumb are mainly located in this area (M. extensor digitorum communis, M. extensor digiti minimi, M. extensor pollicis brevis, M. flexor digitorum profundus and M. flexor digitorum superficialis). These muscles are oriented largely parallel to the axis between elbow and wrist. Therefore we applied the electrodes as eight parallel rows around the arm, each row consisting of four electrodes. From each row, the two

(1) Thalmic Labs Inc., www.thalmic.com
(2) PLUX wireless biosignals S.A., www.plux.info
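The two recording streams run at very different rates (EMG at 1000 Hz, IMU at 81.92 Hz), so any feature-level fusion first needs a common timeline. The paper's own preprocessing is not part of this excerpt; the following Python sketch shows one plausible alignment step, linear interpolation of each EMG channel onto the IMU sample times. The function name and the interpolation choice are assumptions of this illustration, not the authors' method.

```python
import numpy as np

def align_streams(emg, imu, emg_rate=1000.0, imu_rate=81.92):
    """Interpolate each EMG channel onto the IMU sample times.

    emg: (n_emg_samples, n_emg_channels) array sampled at emg_rate
    imu: (n_imu_samples, n_imu_channels) array sampled at imu_rate
    Returns an (n_common, n_emg_channels + n_imu_channels) array on the
    IMU timeline. Illustrative sketch; the paper's preprocessing may differ.
    """
    t_emg = np.arange(emg.shape[0]) / emg_rate
    t_imu = np.arange(imu.shape[0]) / imu_rate
    # Keep only IMU samples that fall inside the EMG recording window.
    t_imu = t_imu[t_imu <= t_emg[-1]]
    emg_resampled = np.column_stack(
        [np.interp(t_imu, t_emg, emg[:, c]) for c in range(emg.shape[1])]
    )
    return np.hstack([emg_resampled, imu[: len(t_imu)]])
```

On this common timeline one would then compute windowed features for the recognizer; the actual feature extraction used in the paper is not contained in this excerpt.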
BIOSIGNALS 2015 - International Conference on Bio-inspired Systems and Signal Processing
In our system, continuous density Hidden Markov Models (HMMs) are used to model each of the gesture classes. For an introduction to HMMs refer to (Rabiner, 1989). We chose linear left-to-right topologies to represent the sequential nature of the recorded signals and Gaussian Mixture Models (GMMs) to model the observation probability distribution. Empirically, we found topologies with five states and GMMs with five Gaussian components to deliver the best performance. The Gaussian components have diagonal covariance matrices to avoid overfitting to the data, as the number of free model parameters is then only linearly dependent on the number of features, in contrast to the quadratic dependence for full matrices. Provided a sufficiently large dataset, in future work the usage of unrestricted covariance matrices might further improve the gesture modeling by representing the correlation between the different channels. To fit the models to the training data, we initialize them using kMeans and use Viterbi training afterwards.

For the session-independent evaluation, testing is done using cross-validation individually for each subject. The training set for each validation fold consists of all sessions but one of a subject. The remaining session is used as the test set. We achieve 97.8% (±1.79%) recognition accuracy as an average for all evaluated subjects.

Figure 5 displays the individual recognition performance for each subject. With all of the five subjects achieving more than 94% accuracy, the session-independent recognition yields very satisfying results. Additionally, Figure 5 shows the recognition accuracy when modalities are used individually. This will be discussed later on.

Figure 6 shows the confusion matrix for the different gesture classes. It has a strong diagonal character, which fits the high overall accuracy. A block formation at the crossing between the rows and columns corresponding to single click and double click illustrates that these gestures are occasionally confused with each other. Only one small movement of the
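The classifier described above, one linear left-to-right HMM per gesture class with diagonal-covariance Gaussian emissions, can be illustrated compactly. The sketch below is not the authors' implementation: it uses a single Gaussian per state instead of a five-component GMM, and a single uniform-segmentation estimate instead of kMeans initialization followed by iterated Viterbi training. It does, however, show the two ingredients the text names: the strictly left-to-right topology and the per-class maximum-likelihood decision.

```python
import numpy as np

N_STATES = 5  # the paper empirically chose five-state topologies

def fit_class_hmm(sequences):
    """Estimate per-state diagonal Gaussians from training sequences of
    shape (T, n_features), using one uniform segmentation pass."""
    per_state = [[] for _ in range(N_STATES)]
    for seq in sequences:
        bounds = np.linspace(0, len(seq), N_STATES + 1).astype(int)
        for s in range(N_STATES):
            per_state[s].append(seq[bounds[s]:bounds[s + 1]])
    means, variances = [], []
    for frames in per_state:
        frames = np.vstack(frames)
        means.append(frames.mean(axis=0))
        # Diagonal covariance: one variance per feature (linear in the
        # number of features, unlike a full covariance matrix).
        variances.append(frames.var(axis=0) + 1e-6)  # floor for stability
    return np.array(means), np.array(variances)

def log_gauss(x, mean, var):
    # Diagonal-covariance Gaussian log-density, summed over features.
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var,
                         axis=-1)

def viterbi_score(seq, means, variances):
    """Best-path log-likelihood through the left-to-right topology:
    each state may loop or advance by one (equal transition scores)."""
    emit = np.array([log_gauss(seq, means[s], variances[s])
                     for s in range(N_STATES)])
    score = np.full(N_STATES, -np.inf)
    score[0] = emit[0, 0]                       # must start in state 0
    for t in range(1, seq.shape[0]):
        advance = np.concatenate([[-np.inf], score[:-1]])
        score = np.maximum(score, advance) + emit[:, t]
    return score[-1]                            # must end in the last state

def classify(seq, models):
    """Pick the gesture class whose HMM scores the sequence highest."""
    return max(models, key=lambda label: viterbi_score(seq, *models[label]))
```

The diagonal-covariance choice is visible in `log_gauss`: with F features each state stores F means and F variances, whereas a full covariance matrix would add F*(F-1)/2 further parameters per state.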
Figure 6: Confusion matrix for session-independent gesture recognition. The matrix shows what hypotheses were given for different gestures that were fed to the recognizer. The labels follow Table 2.

Figure 7: Confusion matrices for session-independent gesture recognition. The recognizer used only the features extracted from the IMU signals. The labels follow Table 2.
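The session-independent protocol used for these results, leave-one-session-out cross-validation performed per subject, can be sketched as follows. `train` and `evaluate` are placeholders for the HMM training and recognition steps; they are assumptions of this illustration, not functions from the paper.

```python
def session_independent_accuracy(sessions_by_subject, train, evaluate):
    """Leave-one-session-out cross-validation, run separately per subject.

    sessions_by_subject: {subject: [session_data, ...]}
    train(list_of_sessions) -> model
    evaluate(model, held_out_session) -> accuracy in [0, 1]
    Returns (mean accuracy over subjects, per-subject accuracies).
    """
    per_subject = {}
    for subject, sessions in sessions_by_subject.items():
        fold_accuracies = []
        for held_out in range(len(sessions)):
            # Every session serves once as the test set; the remaining
            # sessions of the same subject form the training set.
            train_set = [s for i, s in enumerate(sessions) if i != held_out]
            model = train(train_set)
            fold_accuracies.append(evaluate(model, sessions[held_out]))
        per_subject[subject] = sum(fold_accuracies) / len(fold_accuracies)
    return sum(per_subject.values()) / len(per_subject), per_subject
```

Averaging first within a subject and then across subjects mirrors the reported "average for all evaluated subjects"; averaging over all folds directly would weight subjects with more sessions more heavily.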
negative count are now rather high.

Figure 11: Confusion matrices for person-independent gesture recognition. The recognizer used only the features extracted from the IMU signals. The labels follow Table 2.

Figure 12: Confusion matrices for person-independent gesture recognition. The recognizer used only the features extracted from the EMG signals. The labels follow Table 2.

The diagonal character of the confusion matrix in Figure 12, representing recognition using only EMG features, is rather weak. Interestingly, no predominant attractor classes are present. Also, no distinct block formations are visible, meaning that there is no pronounced clustering of different gesture groups in feature space. Instead, inter-person variance leads to many confusions scattered across the confusion matrix. The gestures flick upwards and fist both performed exceedingly well using only EMG, thereby explaining why they also perform well using both modalities combined.

We expected person-independent gesture recognition with EMG to be a hard task, as EMG signals vary from person to person and produce patterns over the different channels that are hard to compare.

5 CONCLUSION

In this paper we present a system for recognizing hand and finger gestures with IMU based motion and EMG based muscle activity sensing. We define a set of twelve gestures and record performances of these gestures by five subjects in 25 sessions total. The resulting data corpus will be made publicly available. We built a baseline recognition system using HMMs to evaluate the usability of such a system. We achieve a high accuracy of 97.8% for session-independent and an accuracy of 74.3% for person-independent recognition, which still has to be considered rather high for person-independent gesture recognition on twelve classes. With that, we show both the feasibility of using IMUs and EMG for gesture recognition, and the benefits of using them in combination. We also evaluate the contribution of the individual modalities to discuss their individual strengths and weaknesses for our task.

ACKNOWLEDGEMENTS

This research was partly funded by Google through a Google Faculty Research Award.

REFERENCES

Alibali, M. W. (2005). Gesture in spatial cognition: Expressing, communicating, and thinking about spatial information. Spatial Cognition & Computation, 5(4):307-331.

Amma, C., Georgi, M., and Schultz, T. (2014). Airwriting: a wearable handwriting recognition system. Personal and Ubiquitous Computing, 18(1):191-203.

Benbasat, A. Y. and Paradiso, J. A. (2001). An inertial measurement framework for gesture recognition and applications. In Wachsmuth, I. and Sowa, T., editors, Revised Papers from the International Gesture Workshop on Gesture and Sign Languages in Human-Computer Interaction, volume 2298 of Lecture Notes in Computer Science, pages 9-20. Springer-Verlag.

Chen, X., Zhang, X., Zhao, Z., Yang, J., Lantz, V., and Wang, K. (2007). Hand gesture recognition research based on surface EMG sensors and 2D-accelerometers. In Wearable Computers, 2007 11th IEEE International Symposium on, pages 11-14.
Cho, S.-J., Oh, J.-K., Bang, W.-C., Chang, W., Choi, E., Jing, Y., Cho, J., and Kim, D.-Y. (2004). Magic wand: a hand-drawn gesture input device in 3-D space with inertial sensors. In Frontiers in Handwriting Recognition, 2004. IWFHR-9 2004. Ninth International Workshop on, pages 106-111.

Hartmann, B. and Link, N. (2010). Gesture recognition with inertial sensors and optimized DTW prototypes. In Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, Istanbul, Turkey, 10-13 October 2010, pages 2102-2109. IEEE.

Hauptmann, A. G. and McAvinney, P. (1993). Gestures with speech for graphic manipulation. International Journal of Man-Machine Studies, 38(2):231-249.

Kim, D., Hilliges, O., Izadi, S., Butler, A. D., Chen, J., Oikonomidis, I., and Olivier, P. (2012). Digits: Freehand 3D interactions anywhere using a wrist-worn gloveless sensor. In Proceedings of the 25th Annual ACM Symposium on User Interface Software and Technology, UIST '12, pages 167-176, New York, NY, USA. ACM.

Kim, J., Mastnik, S., and André, E. (2008). EMG-based hand gesture recognition for realtime biosignal interfacing. In Proceedings of the 13th International Conference on Intelligent User Interfaces, IUI '08, pages 30-39, New York, NY, USA. ACM Press.

Li, Y., Chen, X., Tian, J., Zhang, X., Wang, K., and Yang, J. (2010). Automatic recognition of sign language subwords based on portable accelerometer and EMG sensors. In International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction, ICMI-MLMI '10, Article 17, New York, NY, USA. ACM.

Mistry, P., Maes, P., and Chang, L. (2009). WUW - Wear Ur World: a wearable gestural interface. In Proceedings of CHI 2009, pages 4111-4116.

Oviatt, S. (1999). Ten myths of multimodal interaction. Communications of the ACM, 42(11):74-81.

Rabiner, L. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257-286.

Rekimoto, J. (2001). GestureWrist and GesturePad: unobtrusive wearable interaction devices. In Proceedings of the Fifth International Symposium on Wearable Computers.

Samadani, A.-A. and Kulic, D. (2014). Hand gesture recognition based on surface electromyography. In Engineering in Medicine and Biology Society (EMBC), 2014 36th Annual International Conference of the IEEE.

Saponas, T. S., Tan, D. S., Morris, D., and Balakrishnan, R. (2008). Demonstrating the feasibility of using forearm electromyography for muscle-computer interfaces. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '08, pages 515-524, New York, NY, USA. ACM Press.

Wolf, M. T., Assad, C., Stoica, A., You, K., Jethani, H., Vernacchia, M. T., Fromm, J., and Iwashita, Y. (2013). Decoding static and dynamic arm and hand gestures from the JPL BioSleeve. In Aerospace Conference, 2013 IEEE, pages 1-9.

Zhang, X., Chen, X., Li, Y., Lantz, V., Wang, K., and Yang, J. (2011). A framework for hand gesture recognition based on accelerometer and EMG sensors. Systems, Man and Cybernetics, Part A: Systems and Humans, IEEE Transactions on, 41(6):1064-1076.