Soshi Iba, J. Michael Vande Weghe, Christiaan J. J. Paredis, and Pradeep K. Khosla
Carnegie Mellon University, Pittsburgh, PA 15213
[Figure: system architecture and data-processing pipeline. The robot and the host communicate over wireless Ethernet. The feature vector F̂_t = [f_0, ..., f_9]_t is differentiated and concatenated into D̂_t = [d_0, ..., d_19]_t, which vector quantization maps to a codeword c_t for the gesture spotter; the resulting motion vector (^U Ẋ_r', ^U Ẏ_r') steers the robot toward the destination (^U X_r', ^U Y_r').]
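The preprocessing pipeline above (derivative concatenation followed by vector quantization) can be sketched as follows. This is an illustrative reconstruction, not the paper's code: the function names, the sampling interval `dt`, and the use of NumPy are our own assumptions.

```python
import numpy as np

# F_t holds the ten Table-1 features; D_t concatenates F_t with its
# finite-difference derivative to form the 20-dimensional vector that
# is then vector-quantized against the 32 LBG cluster centroids.

def preprocess(F_t, F_prev, dt=0.1):
    """Build D_t by concatenating the features with their time derivative."""
    dF = (F_t - F_prev) / dt
    return np.concatenate([F_t, dF])        # shape (20,)

def quantize(D_t, centroids):
    """Map D_t to the codeword whose cluster centroid is nearest
    in Euclidean distance (centroids: 32 x 20 array)."""
    dists = np.linalg.norm(centroids - D_t, axis=1)
    return int(np.argmin(dists))            # codeword c_t
```

Each glove sample thus becomes a single integer codeword, and a gesture becomes a short sequence of codewords.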
Thumb Curvature          thumb 1st + 2nd
Index Finger Curvature   index 1st + 2nd
Middle Finger Curvature  middle 1st + 2nd
Ring Finger Curvature    ring 1st + 2nd
Pinkie Curvature         pinkie 1st + 2nd
Finger Spread            index ↔ middle, middle ↔ ring, ring ↔ pinkie
Thumb Angle              thumb ↔ index
Thumb Rotation           thumb rotation
Wrist Pitch              wrist pitch
Wrist Yaw                wrist yaw

Table 1: Definition of the feature vector.

positions and velocities that are physiologically feasible. The set D_{0..5000} is then partitioned into 32 clusters. The algorithm by Linde, Buzo and Gray performs the error-minimization clustering by repeatedly estimating the centroid of a cluster and splitting it into two. For each of the final 32 clusters, the associated centroid in the 20-dimensional feature space is recorded. At run time, a given feature vector is mapped to the codeword with the closest centroid (as measured in Euclidean distance). Consequently, each measurement of the hand position is reduced to a codeword, and a gesture becomes a time sequence of codewords.

3.2 Gesture spotting using an HMM

After preprocessing the data, the gesture spotter takes a sequence of codewords and determines which of the six gestures the user is performing, or "none of the above" if no gesture is being executed. The six gestures we chose to recognize consist of:

OPENING: Moving from a closed fist to a flat open hand
OPENED: Flat open hand
CLOSING: Moving from a flat open hand to a closed fist
POINTING: Moving from a flat open hand to index finger pointing, or from a closed fist to index finger pointing
WAVING LEFT: Fingers extended, waving to the left, as if directing someone to the left
WAVING RIGHT: Fingers extended, waving to the right

Being able to reject motions that do not correspond to any of these six gestures is very important. If every stream of motion data were classified as one of the six gestures, unintended motions of the fingers would result in inadvertent robot actions. Including the classification "none of the above" allows the user to perform other tasks while wearing the data glove, without the HMM interpreting the hand motions as robot commands.

The algorithm described in this paper differs from the standard forward-backward technique described in Rabiner and Juang's tutorial [10] in two important respects.

A first modification allows us to classify continuous streams of measurements. The correspondence of an HMM, λ, to an observation sequence O = O_1, O_2, ..., O_T can be quantified as Pr(O | λ), which is the probability that the observation sequence is produced by the model. However, as the length of the sequence grows, Pr(O_1, O_2, ..., O_t | λ) decreases to zero. A solution to this problem is to limit the observation sequence to the n most recent observations, O = O_{t-n}, ..., O_t. The length of this sequence needs to be determined by the user.

The second modification allows us to reject hand motions that do not match any of the six modeled gestures. In the standard HMM recognition algorithm, the gesture l with the largest confidence level Pr(O | λ_l) is selected, where λ_l is the model corresponding to the gesture l. This means that one of the six gestures will always be selected, unless a threshold is set to reject all six. However, a threshold that is large enough to reject all non-gestures may also exclude gestures that are performed slightly differently from the training gestures. This is unacceptable, because different users tend to execute gestures differently.

To overcome these problems, we used the HMM gesture spotter illustrated in figure 5. It contains a "wait state" as the first node, which transitions to all the gesture models and to itself with equal probability. These transition probabilities are all equal to make the transition independent of the observation.

When a new observation is applied to the gesture spotter, the probability of being in each state is updated for all existing states. In other words, we update all the Pr(i_t = q_i | O, λ_l), the probability of being in state q_i at time t of the state sequence I (= i_1 i_2 ... i_t), given the observation sequence O and the model λ_l of gesture l. Next, the state probabilities are normalized so that they sum to one. If the observation sequence corresponds to any of the gestures, the probability of being in the final state of the model for this gesture will be higher than that for the other gestures. On the other hand, if the observation sequence does not belong to any gesture, the probability of being in the
"wait state" will be the highest due to the normalization. If the recent sequence of observations represents a gesture, the earlier part of the sequence is trapped in the "wait state" while subsequent observations raise the correct gesture's probability. Therefore, the gesture can still be spotted within a continuous stream of observations.

Figure 5: An HMM with a shared initial state

Figure 6: A sample 3-state HMM for the POINTING gesture, shown with transition probabilities and observation probabilities
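The state-probability update described above can be sketched as follows. This is a minimal reconstruction of ours, not the authors' modified C++ code: a single stochastic matrix A covers the "wait state" plus all gesture-model states, and B[s, c] is the probability of observing codeword c in state s.

```python
import numpy as np

def spot_step(p, A, B, c):
    """One observation step: propagate the state probabilities through the
    transition matrix, weight each state by the probability of observing
    codeword c there, then renormalize so the probabilities sum to one."""
    p = (p @ A) * B[:, c]
    return p / p.sum()
```

A gesture would be reported when the probability of being in a model's final state dominates; for non-gestures, the renormalization keeps most of the probability mass in the wait state.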
Our implementation of the gesture spotter is based on public domain HMM C++ code written by Myers and Whitson [8]. We modified this code to calculate the probability for multiple models simultaneously, and to handle a continuous data stream.

Figure 7: An HMM with a variable time window

Before an HMM gesture spotter is able to classify a sequence, it must be trained by the user. Training by multiple users is desirable to accommodate different ways in which users execute gestures. For example, one user may execute WAVING RIGHT much faster than another. For each of the six gestures, a 3-state HMM was trained separately using the Baum-Welch algorithm. Because of its static nature, the OPENED gesture has been modeled with only two states. Thirty sample sequences were used to train each gesture. Figure 6 shows the 3-state HMM used to model the POINTING gesture.

To verify the proposed HMM, the system was evaluated on a continuous stream of test data containing both gestures and non-gestures. For comparison, we evaluated the HMM spotter against basic left-right configurations with variable windows of the n most recent observations, as illustrated in Figure 7. Unlike the proposed spotter, this HMM has no "wait state": the HMM for each gesture takes a sequence of the n most recent data samples to classify a gesture. The system then evaluates Pr(O_{t-n}..O_t | λ_l) for all n to find the gesture l that produces the maximum score. The system was set to reject a sequence as "none of the above" if Pr(O_{t-n}..O_t | λ_l) for all gestures is less than 0.5.

On the test sequence of codewords mixed with gestures and non-gestures, the HMM with "wait state" recognized gestures with 96% accuracy (48 out of 50 gesture sequences recognized correctly), whereas the HMM with variable window recognized all gestures. However, the false-positive rate for the HMM with variable window was 20.2 observations per 1000 samples, whereas the HMM with "wait state" had a false-positive rate of only 1.6 observations per 1000. It is important to keep in mind that this number does not apply directly to the real data, since there may be other factors contributing to the errors, such as vector quantization.
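The variable-window baseline can be sketched as follows. This is an illustrative reconstruction: the model storage, function names, and per-gesture (pi, A, B) parameterization are our assumptions, not the paper's code.

```python
import numpy as np

# Each left-right gesture HMM scores the n most recent codewords with the
# standard forward algorithm; the best-scoring gesture wins unless every
# score falls below the rejection threshold of 0.5 used in the paper.

def forward_likelihood(obs, pi, A, B):
    """Pr(obs | model) via the forward algorithm."""
    alpha = pi * B[:, obs[0]]
    for c in obs[1:]:
        alpha = (alpha @ A) * B[:, c]
    return alpha.sum()

def classify(window, models, threshold=0.5):
    """Best gesture over all window lengths n, or None ('none of the above')."""
    best, best_score = None, threshold
    for name, (pi, A, B) in models.items():
        for n in range(1, len(window) + 1):
            score = forward_likelihood(window[-n:], pi, A, B)
            if score > best_score:
                best, best_score = name, score
    return best
```

Because raw forward likelihoods shrink quickly with sequence length, a fixed absolute threshold like this one favors very short windows, which is consistent with the higher false-positive rate reported for this baseline.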
4 Gesture interpretation

Once a particular gesture has been recognized, it needs to be mapped to a corresponding robot action. We have implemented two different modes of interaction with the robot: local control and global control. In the local interaction mode, the user's gestures are interpreted in the robot's local frame of reference. For instance, pointing makes the robot move forward regardless of whether the robot is facing towards or away from the user. In the global interaction mode, the position of the hand is measured in the universal coordinate frame, allowing us to interpret the gestures with respect to the global frame of reference.

4.1 Local robot control

In the local robot control mode, the six gestures have the following meaning:

CLOSING: decelerates and eventually stops the robot
OPENING, OPENED: maintains the current state of the robot
POINTING: accelerates the robot
WAVING LEFT/RIGHT: increases the rotational velocity to move left/right

The purpose of the local control mode is to allow the user to control the robot in a remote operation scenario. The user watches the video image transmitted from the robot's camera and controls the robot as if he or she were sitting on it.

A CLOSING gesture is implemented by decreasing both the linear and angular velocity of the robot. Continuous execution of a CLOSING gesture eventually stops the robot's movement. The choice to make the CLOSING hand a stopping gesture was based on people's natural reaction and the fact that it has the best accuracy in gesture spotting (a CLOSING gesture has 100% accuracy in our recognition system). A POINTING gesture accelerates the robot by increasing ^U v_r by a fixed value. Successive pointing gestures cause the robot to accelerate in the forward direction.

The WAVING LEFT and WAVING RIGHT gestures are implemented as simple discrete steering commands. When the gesture spotter detects WAVING LEFT (or RIGHT), the gesture server tells the robot controller to increase (or decrease) ^U ω_r by a fixed value. Successive waving gestures cause the robot to turn faster.

4.2 Global robot control

In the global robot control mode, the six gestures have the following meanings:

CLOSING: decelerates and eventually stops the robot
OPENING, OPENED: maintains the current state of the robot
POINTING: "go there"
WAVING LEFT/RIGHT: directs the robot towards the direction in which the hand is waving

The global robot control mode provides close interaction between the user and the robot. The robot is in sight of the user, and the user can point his or her finger to specify the destination for the robot, or give it an additional bias by waving hands to guide the robot.

The CLOSING gesture is implemented in a manner similar to the local robot control mode. It cancels the robot's destination if one exists, and brings (^U Ẋ_r', ^U Ẏ_r') to zero to stop the robot.

The POINTING gesture defines the destination for the robot as the location where the straight line extending the index finger intersects with the floor. The robot controller generates the motion vector, (^U Ẋ_r', ^U Ẏ_r'), based on the difference between its current position and the destination.

The interpretation of the WAVING LEFT and WAVING RIGHT gestures also depends on the motion vector extracted from the Polhemus 6-DOF positioning system placed on the hand. When a WAVING LEFT (or RIGHT) gesture is recognized, the robot is commanded in the direction perpendicular to the palm of the hand at the moment the hand is waving at its maximum speed. When the robot encounters an obstacle while heading towards its destination, the user can wave in a particular direction to dodge the obstacle.

5 Summary and Future Work

In this paper, we have introduced and demonstrated an architecture for gesture-based control of mobile robots. An HMM-based gesture spotter recognizes six dynamic hand gestures from a continuous data sequence from a data glove. The HMM differs from most traditional HMM recognizers because of the inclusion of a "wait state", which allows it to effectively reject hand motions that do not correspond to any of the modeled gestures. This is very important for avoiding inadvertent robot commands while wearing the data glove. In our experiments, we verified that the HMM with a "wait state" produced an order
of magnitude fewer false positives than a traditional HMM recognizer.

We envision gesture-based control to be of particular value for interacting with teams of robots. Controlling individual robots can still be achieved using a mouse or a joystick; there is a one-to-one mapping between the joystick position and the robot velocity. Multi-robot systems require a much richer form of interaction. Instead of providing low-level control for individual robots, the user needs to provide high-level directives for the team as a whole. We will address control of multi-robot systems in our future research.

Acknowledgments

This research is funded in part by the Distributed Robotics program of DARPA/ETO under contract DABT63-97-1-0003, and by the Institute for Complex Engineered Systems at Carnegie Mellon University.

References

[1] H.-J. Boehme, A. Brakensiek, U.-D. Braumann, M. Krabbes, and H.-M. Gross, "Neural Architecture for Gesture-Based Human-Machine-Interaction," In Proc. of the Gesture and Sign Language in Human-Computer Interaction. International Gesture Workshop, pp. 219-232, Bielefeld, Germany, Sept. 1997.

[2] M. W. Gertz and P. K. Khosla, "Onika: A Multilevel Human-Machine Interface for Real-Time Sensor-Based Systems," In Proc. of the 4th Intl. Conf. and Expo on Engineering, Construction and Operations in Space, Albuquerque, NM, Feb. 1994.

[3] K. Dixon, J. Dolan, W. Huang, C. Paredis, and P. Khosla, "RAVE: A Real and Virtual Environment for Multiple Mobile Robot Systems," In Proc. Intl. Conf. on Intelligent Robots and Systems (IROS'99), Korea, 1999.

[4] S. B. Kang and K. Ikeuchi, "Robot task programming by human demonstration," In Proc. Image Understanding Workshop, pp. 1125-1134, Monterey, CA, November 1994.

[5] C. Lee and Y. Xu, "Online, Interactive Learning of Gestures for Human/Robot Interfaces," In Proc. Intl. Conf. on Robotics and Automation, vol. 4, pp. 2982-2987, 1996.

[6] Y. Linde, A. Buzo, and R. M. Gray, "An Algorithm for Vector Quantizer Design," IEEE Transactions on Communications, Vol. COM-28, No. 1, pp. 84-95, January 1980.

[7] J. D. Morrow and P. K. Khosla, "Manipulation Task Primitives for Composing Robot Skills," In Proc. of the IEEE Intl. Conference on Robotics and Automation, Albuquerque, NM, v. 4, pp. 3354-3359, 1997.

[8] R. Myers and J. Whitson, "Hidden Markov Model for automatic speech recognition," ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/, University of California at Irvine, 1994.

[9] A. Nakamura, S. Kakita, T. Arai, J. Beltran-Escavy, and J. Ota, "Multiple Mobile Robot Operation by Human," In Proc. Intl. Conf. on Robotics and Automation, Leuven, Belgium, v. 4, pp. 2852-2857, May 1998.

[10] L. R. Rabiner and B. H. Juang, "An introduction to hidden Markov models," IEEE ASSP Magazine, pp. 4-16, 1986.

[11] G. Rigoll, A. Kosmala, and S. Eickeler, "High Performance Real-Time Gesture Recognition Using Hidden Markov Models," In Proc. Gesture and Sign Language in Human-Computer Interaction. International Gesture Workshop, pp. 69-80, Bielefeld, Germany, Sept. 1997.

[12] D. B. Stewart, R. A. Volpe, and P. K. Khosla, "Design of dynamically reconfigurable real-time software using port-based objects," IEEE Trans. on Software Engineering, 23(12), pp. 759-776, December 1997.

[13] R. M. Voyles, A. Agah, P. K. Khosla, and G. A. Bekey, "Tropism-Based Cognition for the Interpretation of Context-Dependent Gestures," In Proc. Intl. Conf. on Robotics and Automation, Albuquerque, NM, v. 4, pp. 3481-3486, Apr. 1997.

[14] R. M. Voyles and P. K. Khosla, "Gesture-Based Programming, Part 1: A Multi-Agent Approach," in Intelligent Engineering Systems through Artificial Neural Networks, Volume 6: Smart Engineering Systems: Neural Networks, Fuzzy Logic and Evolutionary Programming, ASME Press, 1996.

[15] S. Waldherr, S. Thrun, and R. Romero, "A neural-network based approach for recognition of pose and motion gestures on a mobile robot," In Proc. of the 5th Brazilian Symposium on Neural Networks, pp. 79-84, Belo Horizonte, Brazil, Dec. 1998.