
IEEE TRANSACTIONS ON ROBOTICS, VOL. 23, NO. 5, OCTOBER 2007

Affective State Estimation for Human–Robot Interaction

Dana Kulić, Member, IEEE, and Elizabeth A. Croft, Member, IEEE

Abstract—In order for humans and robots to interact in an effective and intuitive manner, robots must obtain information about the human affective state in response to the robot's actions. This secondary mode of interactive communication is hypothesized to permit a more natural collaboration, similar to the "body language" interaction between two cooperating humans. This paper describes the implementation and validation of a hidden Markov model (HMM) for estimating human affective state in real time, using robot motions as the stimulus. Inputs to the system are physiological signals such as heart rate, perspiration rate, and facial muscle contraction. Affective state was estimated using a two-dimensional valence-arousal representation. A robot manipulator was used to generate motions expected during human–robot interaction, and human subjects were asked to report their response to these motions. The human physiological response was also measured. Robot motions were generated using both a nominal potential field planner and a recently reported safe motion planner that minimizes the potential collision forces along the path. The robot motions were tested with 36 subjects. This data was used to train and validate the HMM model. The results of the HMM affective estimation are also compared to a previously implemented fuzzy inference engine.

Index Terms—Affective state estimation, human–robot interaction, physiological signals.

Manuscript received October 14, 2006; revised May 28, 2007. This paper was recommended for publication by Associate Editor Y. Nakauchi and Editor H. Arai upon evaluation of the reviewers' comments. This work was supported by the Canada Natural Science and Engineering Research Council. An earlier version of this work was presented at the IEEE International Workshop on Robot and Human Interactive Communication, 2006.

D. Kulić was with the University of British Columbia, Vancouver, BC V6T 1Z4, Canada. She is now with the Nakamura and Yamane Laboratory, Department of Mechano-Informatics, University of Tokyo, Tokyo 113-8656, Japan (e-mail: dana@ynl.t.u-tokyo.ac.jp).

E. A. Croft is with the University of British Columbia, Vancouver, BC V6T 1Z4, Canada (e-mail: ecroft@mech.ubc.ca).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TRO.2007.904899

1552-3098/$25.00 © 2007 IEEE

I. INTRODUCTION

AS ROBOT manipulators move from isolated work cells to unstructured and interactive environments, they will need to become better at acquiring and interpreting information about their environment [1]. In particular, in cases where human–robot interaction is planned, human monitoring can enhance the safety of the interaction by providing additional information to robot planning and control systems [2], [3]. Example applications include robots that perform home-care/daily living tasks¹ [4], such as dish clearing [5], cooperative load carrying [6], [7], and feeding [8], [9]. The focus of our research is on larger robots needed to perform service tasks, that can, due to their size and power, pose physical danger to the user, as opposed to entertainment and companion robots, which elicit emotional response through social interaction, but generally do not elicit anxiety or fear.

¹The five activities of daily living (ADL) are: 1) transferring to and from bed; 2) dressing; 3) feeding; 4) bathing; and 5) toileting.

During human–human interaction, nonverbal communication signals are frequently exchanged in order to assess each participant's affective state, focus of attention, and intent. Many of these signals are indirect, i.e., they occur outside of conscious control. By monitoring and interpreting indirect signals during an interaction, significant cues about the affective state of each participant can be recognized [10]. Recently, research has focused on using nonverbal communication, such as eye gaze [2], [3], facial expressions [11]–[13], and physiological signals [14]–[17] for human–robot and human–computer interaction. Although not used during interpersonal interaction, physiological signals are particularly well suited for human–robot interaction, as they are relatively easy to obtain using wireless devices, measure, and interpret using online signal processing methods [15]–[17]. By using nonverbal information, such as physiological signals, the robot can estimate user approval of its performance without requiring the user to continuously issue explicit feedback [2], [3]. In addition, changes in some nonverbal signals precede a verbal signal from the user [18]. Observation of physiological information can allow the robot control system to anticipate command changes, creating a more responsive and intuitive human–robot interface.

A. System Overview

Our research is focused on using affective state estimation during real-time human–robot interaction, to improve the safety and perceived safety of the interaction by improving the robot responsiveness to implicit communication messages from the human subject. In our previous work, we developed a fuzzy inference engine for online estimation of affective state during human–robot interaction with a small-scale industrial robot [19]. The valence/arousal representation was used to represent the affective state. This estimate of affective state was then used to modify robot behavior when operating close to the user [20]. The robot control system calculated an online estimate of the current level of danger based on factors affecting the impact force during a potential collision. The danger index is formulated based on the distance between potential contact points between the robot and the human, the relative velocity between these two points, and the effective inertia of the robot at the potential collision point [20]. The estimate of the current level of the danger index is used to modify the robot velocity along the path, as well as to modify the robot path, if the danger index becomes large. If additional information about the user is available (such as awareness or affective state information), it is used to modulate the danger index.
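The danger-index computation described above can be sketched as a small calculation. The product form, gains, and distance/velocity/inertia limits below are illustrative assumptions (the actual formulation is given in [20]), as is the rule that user status only ever scales the index upward:

```python
import numpy as np

def danger_index(distance, closing_velocity, effective_inertia,
                 d_min=0.1, d_max=2.0, v_max=2.0, m_max=10.0):
    """Illustrative danger index combining the three factors named above:
    distance to the nearest potential contact point, the closing velocity
    between robot and human, and the effective robot inertia at that point.
    The constants and the product form are assumptions for illustration."""
    # Distance factor: 0 when far (>= d_max), saturating at the minimum distance.
    f_d = np.clip((d_max - distance) / (d_max - d_min), 0.0, 1.0)
    # Velocity factor: only motion toward the human contributes.
    f_v = np.clip(closing_velocity / v_max, 0.0, 1.0)
    # Inertia factor: larger effective inertia implies larger impact force.
    f_m = np.clip(effective_inertia / m_max, 0.0, 1.0)
    return f_d * f_v * f_m

def modulated_danger_index(base, user_anxious=False, user_aware=True,
                           gain=1.5):
    """User-status modulation: the index is only ever increased (it is never
    used to speed the robot up), and scaling has no effect when no kinematic
    hazard exists (base == 0)."""
    if user_anxious:
        base *= gain
    if not user_aware:
        base *= gain
    return base
```

Because the modulation is multiplicative, an anxious or unaware user makes the controller react sooner, but cannot create a hazard indication where the kinematic factors report none.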

Fig. 1. Affective state test case. (a) 7.2 s. (b) 7.8 s. (c) 8.27 s. (d) 8.93 s. (e) 9.4 s. (f) 10.47 s. (g) 11.2 s. (h) 16.87 s.

Fig. 2. Affective state test case danger index.

Fig. 3. Affective state test case joint trajectory.

The danger index based on physical factors is then scaled using information about the user. The user status components only affect robot behavior if the potential for a hazard already exists, based on the kinematic danger factors [20]. In the current implementation, user status information is used only to increase the danger index if the user is not aware of the robot, or if the user appears anxious due to the robot motion. Increasing the danger index causes the control system to take corrective action (slowing down or stopping the robot, or modifying the robot trajectory to evade a collision) sooner, as compared to a controller based only on physical factors. The user status information is not used to increase robot speed, which could result in unpredictable robot behavior and be perceived as more dangerous than a nonadaptive approach.

Fig. 4. Affective state test case estimated arousal.

An example scenario is shown in Fig. 1. The robot's task is to approach a position above the table directly in front of the user, simulating a pick-up task. The robot begins executing the task, initially at the maximum normalized velocity, in this case set to 0.85, in order to elicit a strong response from the user. Fig. 1 shows sample frames from the video sequence taken during the test.² Figs. 2 and 3 show the integrated danger index and the resulting joint trajectory, respectively. Fig. 4 shows the level of arousal estimated during the test case.

²The video can be viewed at www.mech.ubc.ca/∼caris/Videos/physio2.wmv.

The robot is initially moving at the maximum specified velocity [Fig. 1(a)–(d)]. Following the user affective reaction, as shown in Fig. 4, the robot is slowed down and then stopped [Fig. 1(e)–(g)]. We note that there is approximately a 2 s delay

between the start of the robot motion and the user affective response. This is due to the fact that the estimated affective state is based on the skin conductance (SC) response and heart-rate changes, physiological processes that exhibit a 1–3 s delay in response from stimulus onset [21], [22]. Once the reaction of the user subsides, the robot completes its mission at a lowered velocity, as shown in Fig. 1(h).

In these experiments, the estimated level of arousal was used as the input to the control system. The fuzzy inference engine developed to estimate affective state [19] was developed as a multiuser rule-based estimator, based on rules derived from psychophysiological research [18], [21]–[25]. The multiuser fuzzy inference engine achieved a 69% recognition rate at medium and high levels of arousal, but was not able to successfully estimate valence. The valence estimation rules were based on the corrugator muscle electromyogram response, which was reported to be strongly correlated with valence in the psychophysiological research [18], [23], [25]. In these psychophysiological studies, picture viewing is most commonly used as the stimulus to elicit the affective state response. However, this response was not observed when robot motion was used as the stimulus. Therefore, the fuzzy inference engine was moderately successful at estimating user arousal, but was unable to correctly estimate valence.

In this paper, we describe an improved user-specific affective state estimator, which is based on machine-learning techniques. The user-specific estimator is able to estimate both valence and arousal elicited by viewing robot motions, which would be observed during human–robot interaction, based on physiological data such as heart rate, perspiration rate, and corrugator muscle response.

B. Related Work

A large body of psychophysiological research [18], [21]–[25] that examines the relationship between physiological signals and affective state exists. In this type of research, the goal is to discover the relationships between physiological responses and affective state, by eliciting various emotions from human subjects, and measuring changes in physiological state. Since the goal is not to estimate or predict affective state, machine-learning techniques have generally not been used in this type of research.

Physiological monitoring systems have also been developed to estimate and predict the user's reaction, both for human–computer interaction and human–robot interaction. Signals proposed for use in human–computer interfaces include SC, heart rate, pupil dilation, and brain and muscle neural activity. Bien et al. [26] advocate that soft computing methods are the most suitable methods for interpreting and classifying these types of signals, because these methods can deal with imprecise and incomplete data.

Sarkar proposes using multiple physiological signals to estimate affective state, and using this estimate to modify robotic actions to make the user more comfortable [17]. Rani et al. [16], [27] use heart-rate analysis and multiple physiological signals to estimate human stress levels. In Rani et al. [16], the stress information is used by an autonomous mobile robot to return to the human if the human is in distress. However, in this paper, the robot does not directly interact with the human; instead, prerecorded physiological information is used to allow the robot to assess the human's condition in a simulated rescue situation. In these studies, video game playing, and not robot motion, is used to elicit the physiological response.

Wada et al. [28] and Saito et al. [29] have used a small robotic seal to measure the physiological effects on elderly patients in a nursing home. In their work, 23 patients were tested using both subjective responses to a questionnaire, and measuring physiological changes in stress level through urinary tests. The effects of the robot on the nursing staff were also examined. In this case, a physical robot is used to elicit the psychological and physiological response; however, the physiological effects of the seal robots were not estimated online, but offline, through the subsequent use of questionnaires and urine samples.

Kanda et al. [30] studied human responses to robot motion during a human–robot interaction study with a humanoid mobile robot. In this study, 26 subjects were asked to interact with the robot. Their reactions to the robot were elicited via a postexperiment questionnaire. The relationship between the subjective evaluations, eye contact, and robot motion was then analyzed. It was found that well-coordinated robot behaviours correlate with a positive subjective evaluation. In this study, only positive subjective evaluation was analyzed; fear and anxiety were not reported. In addition, human response was not measured online.

Nonaka et al. [31] describe a set of experiments where human response to pick-and-place motions of a virtual humanoid robot is evaluated. In their experiment, a virtual reality display is used to depict the robot. Human response is measured through heart-rate measurements and subjective responses. No relationship was found between the heart rate and robot motion, but a correlation was reported between the robot velocity and the subject's rating of "fear" and "surprise."

Koay et al. [32] describe an early study where human reaction to robot motions was measured online. In this study, 28 subjects interacted with a robot in a simulated living room environment. The robot motion was controlled by the experimenters in a "Wizard of Oz" setup. The subjects were asked to indicate their level of comfort with the robot with a handheld device. The device consisted of a single slider control to indicate comfort level and a radio signal data link. The data from only seven subjects were considered reliable, and included in subsequent analysis. Analysis of the device data with the experiment video found that the subjects indicated discomfort when the robot was blocking their path, the robot was moving behind them, or the robot was on a collision course with them.

Picard [15] and Kim et al. [33] used support vector machines to estimate user affective state for human–computer interaction. Liu et al. [34] and Rani et al. [35] compared the effectiveness of several machine-learning methods for estimating affective state, using PC-based cognitive tasks to elicit the physiological response. Most of these approaches consider the instantaneous value of various signal features (e.g., the rate of change of the SC) for estimating the affective state. However, the physiological response is not characterized by instantaneous changes in value, but rather by a signal sequence or waveform. For example, for SC, a stimulus will be followed by a rise in conductance, followed by a slow decay (see Fig. 5).
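This characteristic rise-and-slow-decay shape can be illustrated with a synthetic signal. The latency, rise time, and decay constant below are assumed values chosen only for illustration, not measurements from the study:

```python
import numpy as np

def synthetic_scr(fs=256, onset_s=2.0, rise_s=1.5, decay_tau_s=4.0,
                  amplitude=1.0, duration_s=10.0):
    """Illustrative skin conductance response: after a latency from stimulus
    onset, conductance rises to a peak and then decays slowly back toward
    baseline. All constants are assumptions for illustration only."""
    t = np.arange(0.0, duration_s, 1.0 / fs)
    scr = np.zeros_like(t)
    rise = (t >= onset_s) & (t < onset_s + rise_s)
    decay = t >= onset_s + rise_s
    # Linear ramp up to the peak amplitude...
    scr[rise] = amplitude * (t[rise] - onset_s) / rise_s
    # ...followed by a slow exponential decay.
    scr[decay] = amplitude * np.exp(-(t[decay] - onset_s - rise_s) / decay_tau_s)
    return t, scr
```

A feature built from one instantaneous sample (e.g., the slope at a single point) cannot distinguish this waveform from a transient artifact; recognizing the whole rise-and-decay sequence motivates a time-domain sequence model.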

Fig. 5. Example skin conductance response.

A good model of the physiological data should capture this time-domain behavior, as also discussed by Villon and Lisetti [36].

A hidden Markov model (HMM) is a stochastic model for representing time-domain observation sequences. An HMM represents time-domain data by modeling the progress of the data through time as transitions through a sequence of states. The output of the system is modeled as the observations arising from each state. The model is "learned" by optimizing the state transition probabilities and the observation probabilities for each state for a given set of training data. HMMs have been used extensively for speech recognition [37], as well as human motion pattern recognition [38]. Scheirer et al. [39] developed an HMM to estimate user frustration based on physiological signals during human–computer interaction. HMMs have been well researched in areas such as speech recognition, where online training data is easily obtained, and authors have also used HMMs for facial expression or prosody-based affect recognition [40]–[42]; however, little work has been done using HMMs for physiologically based affect recognition [39]. The challenges of measuring and interpreting physiological affect data to provide online training data may be the reason for the lower level of interest in this area.

In this paper, an HMM model is developed for modeling affective state during robot motions that would be typically encountered during human–robot interaction. The model is trained and tested using the data obtained during a user trial, using 36 subjects. The performance of the HMM is discussed, and compared to the previously developed [43] fuzzy inference engine. The paper is organized as follows. In Section II, our approach is described. Section III describes the experimental setup, Section IV discusses the results, and Section V presents the conclusions and directions for future work.

II. APPROACH

A. Physiological Signals as Indicators of Affective State

The affective state is estimated based on measured physiological signals such as heart rate, SC, and facial muscle contraction. An important question when estimating human affective response is how to represent the affective state. Two different representations are commonly used in emotion and emotion detection research: one using discrete emotion categories (anger, happiness, fear, etc.), and the other using a two-dimensional representation of valence and arousal [23]. Valence measures the degree to which the emotion is positive or negative, and arousal measures the strength of the emotion. The arousal dimension encodes the intensity of the emotion; at a neutral valence level, low arousal represents "calm," and high arousal represents "excited." The valence/arousal representation adopted herein appears adequate for the purposes of robotic control, and is easier to convert to a measure of user approval. This representation system has also been favored for use with physiological signals and in psychophysiological research [10], [18], [23], [25].

Three physiological signals were selected for measurement: 1) SC; 2) heart rate; and 3) corrugator muscle activity. These three signals have been shown to be the most reliable indicators of affective state in psychophysiological research [10], [18], [25]. The respiration rate was also considered in an early study [14], but was rejected as unsuitable for online interaction applications due to the slow physiological response of the signal.

SC is a strong indicator of affective arousal. Several studies [18], [25], [44] have shown that skin conductance is positively correlated with arousal. Bradley and Lang [18] report that 74% of subjects exhibit this correlation.

Corrugator muscle activity measured via electromyogram (EMG) is negatively correlated with valence. The corrugator muscle, located just above each eyebrow close to the bridge of the nose, is responsible for the lowering and contraction of the brows, i.e., frowning, which is intuitively associated with negative valence. Bradley and Lang [18] reported corrugator muscle activity levels that were well above the baseline level for negative valence stimuli, slightly above baseline level for neutral valence stimuli, and slightly below baseline level for positive stimuli. In their study, more than 80% of the subjects showed this correlation.

Unlike the SC and corrugator EMG response, heart activity is governed by many variables including physical fitness, posture, and activity level, as well as the affective state. The heart muscle, unlike the electrodermal system, is innervated by both the parasympathetic and sympathetic nervous systems [23]. The heart muscle also has homeostatic and metabolic functions besides emotional perception, unlike facial muscle EMG [23]. The correlation between heart activity and affective state is, therefore, more modest [23], and conflicting results have been reported in the psychophysiological research. In tests using external stimuli to generate the affective response (such as picture viewing), the heart-rate response is initially decelerative, followed by a subsequent acceleration, while tests using internal stimulus

Fig. 6. Path PP-PF (Pick and place task planned with the potential field planner).

(recalling emotional imagery) showed an initial accelerative response [18]. These results indicate that heart-rate deceleration is associated with the orienting response (i.e., increased arousal). Heart rate at the baseline, with no heart-rate acceleration or deceleration, is associated with low arousal, while high heart rate and heart-rate acceleration/deceleration are associated with high arousal.

Another key finding from psychophysiological research is that physiological responses can be highly variable between individuals, as well as vary for the same individual, depending on the context of the response [18], [21], [22].

B. Signal Selection and Processing

The SC, heart rate, and corrugator muscle data were used as the input signals for the HMM. Two features were extracted from the heart activity data: the heart rate and the heart-rate acceleration. In a typical ECG signal, the most clearly identifiable feature of the time-domain signal is the QRS complex, corresponding to the electrical current that causes contraction of the left and right ventricles of the heart. The heart-rate data was low-pass filtered before applying a peak detection algorithm to detect the R-waves (peaks) of each QRS signal. The peak-to-peak time was used to calculate the heart rate. The heart rate was smoothed using a three-sample averaging filter, and normalized based on the baseline heart rate, such that the signal ranged between [−1, 1]. The heart-rate acceleration was calculated by differentiating the smoothed heart-rate signal. The heart-rate acceleration was normalized to range between [−1, 1].

Two features were extracted from the SC: the level of SC and the rate of change of the SC. The SC data was low-pass filtered and smoothed using a 1 s averaging window. The data was then normalized to range between [0, 1], using the minimum and maximum values in the preceding 30 s. The rate of change of the SC is calculated by differentiating the smoothed SC data and normalizing so that the data ranges from [−1, 1].

One feature was extracted from the corrugator muscle EMG data: the level of response. The EMG data was low-pass filtered and smoothed using a 1 s averaging window. The data was normalized to range between [0, 1].

C. HMM Structure

HMM training consists of learning the state transition probabilities and the observation probabilities. In addition to the learned parameters of the HMM, parameters defining the structure of the HMM need to be specified. These include the HMM type, the number of states, and the observation data type. Since the data to be modeled is temporal, a left–right (Bakis) type of HMM was used [37], with the number of states ranging from 4 to 10. Constraints were placed on large state changes, such that the maximum jump across states was 2.

Both a discrete and a continuous observation HMM were developed. For the discrete output HMM, the data were discretized using five levels for the SC, and three levels for the remaining signals. For the continuous HMM, a single Gaussian was used to represent each element of the output observation vector. For the continuous HMM, to avoid numerical problems during training, a minimum covariance matrix was imposed on each Gaussian. The observation vector consisted of five elements: 1) the SC; 2) the rate of change of SC; 3) the heart rate; 4) the rate of change of heart rate; and 5) the electromyogram response of the corrugator muscle; each of these elements was normalized as described in Section II-B. For both the discrete and the continuous models, each observation was generated by averaging over 25 samples of data, so that a new observation was generated at approximately 10 Hz.

III. DATA COLLECTION

A human–robot interaction trial was used to collect physiological data for training the HMM. The experiment was designed to generate various robot motions and to evaluate both the human subjective response and physiological response to the motions. The affective state was also estimated online during the experiment, using the fuzzy inference engine described in Kulić and Croft [43].

A. Experimental Method

The experiment was performed using the CRS A460 six degree of freedom (DoF) manipulator, as shown in Figs. 6–9.³

³A video showing one example of each task can be viewed at http://www.mech.ubc.ca/∼caris/Videos/SamplePaths.wmv.

Fig. 7. Path PP-S (Pick and place task planned with the safe planner).

Fig. 8. Path RR-PF (Reach and retract task planned with the potential field planner).

Fig. 9. Path RR-S (Reach and retract task planned with the safe planner).

The CRS A460 is a typical laboratory scale robot with a payload of 1 kg, which is suitable for performing table-top assistive activities. Two sets of experiments were performed. In the first set of experiments, a group of 36 human subjects were tested; 16 were female and 20 were male. The age of the subjects ranged from 19 to 56, with an average age of 29.2. Approximately half of the subjects were recruited from the students and staff of the Mechanical Engineering Department at the University of British Columbia, and the other half were recruited from off campus. The subjects were also asked to rate their familiarity with robots on the Likert scale, with 1 indicating no familiarity and 5 indicating excellent familiarity. Of the 36 subjects, 17 had little or no familiarity with robots (response of 1 or 2), 11 had moderate familiarity (response of 3), and 7 had high familiarity (response of 4 or 5). Each subject was tested once over a contiguous time period of approximately 25 min. In this set of experiments, each subject was shown each motion trajectory once, for a total of 12 trajectories shown. This generally resulted in at least 2–3 exemplars for each affect category.

In the second set of experiments, a smaller group of subjects was tested with a larger number of trajectories, to confirm the repeatability of the results across trials, and to determine whether 2–3 exemplars was enough data to train the HMM model, or whether more data would improve the classification performance. In this set of experiments, six subjects were tested, recruited from the students and staff of the Mechanical Engineering Department at the University of British Columbia. In this set of experiments, the same trajectories were used, as described in Section III-B; however, each trajectory was shown three times, for a total of 36 trajectories per subject.

B. Trajectory Generation

Two different tasks were used for the experiment: a pick-and-place motion (PP), similar to the trajectory displayed to subjects in Nonaka et al. [31], and a reach and retract motion (RR). These tasks were chosen to represent the typical motions that an articulated robot manipulator could be asked to perform during human–robot interaction, e.g., during hand-over tasks.

Two planning strategies were used to plan the path of the robot for each task: a conventional potential field (PF) method with obstacle avoidance and goal attraction [45], and a safe path method (S) reported in Kulić and Croft [46].
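A conventional attractive/repulsive potential field step can be sketched as follows. This is a minimal point-configuration illustration; the gains, influence distance, and step size are assumed values, and the planner cited as [45] operates on the full manipulator rather than a point:

```python
import numpy as np

def potential_field_step(q, q_goal, obstacles, k_att=1.0, k_rep=0.5,
                         d0=0.3, step=0.01):
    """One gradient-descent step of a basic potential field planner:
    quadratic attraction toward the goal, with repulsion that is active
    only within the influence distance d0 of an obstacle."""
    # Attractive force pulls the configuration toward the goal.
    force = -k_att * (q - q_goal)
    # Repulsive force pushes away from each nearby obstacle point.
    for obs in obstacles:
        diff = q - obs
        d = np.linalg.norm(diff)
        if 0 < d < d0:
            force += k_rep * (1.0 / d - 1.0 / d0) / d**2 * (diff / d)
    return q + step * force
```

Iterating this step traces a path that descends the combined potential; the safe planner described next adds a danger criterion to this kind of formulation.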

The safe path planner is similar to the potential field method, with the addition of a danger criterion, comprising factors that affect the impact force during a collision between the robot and the human, which is minimized along the path. The same pick and place, and reach and retract, end effector targets were used for both planners. Figs. 6–9 show frames of video data depicting each motion type.

Given the path points generated for each task by the two planners, a velocity and acceleration profile for the motion was generated using a minimum-time cubic trajectory planner, planning in configuration space. The trajectory was planned for the maximum robot acceleration and velocity; however, the trajectory was parametrized to allow velocity and acceleration scaling along the path. For each path, trajectories at three different speeds were generated (slow, medium, fast), resulting in 12 trajectories. Details about the path selection and trajectory planning are reported in Kulić [47].

C. Physiological Sensing

The ProComp Infinity system from Thought Technology [48] was used to gather the physiological data. This system has been used for several physiological studies in human–robot and human–computer interaction [15], [16], [27], and is used by therapists for biofeedback applications [48]. As discussed in Section II, heart muscle activity, SC, and corrugator muscle activity were measured.

The heart muscle activity was measured via electrocardiogram (ECG) measurement, using the EKG Flex/Pro sensor. The SC was measured using the SCFlex-Pro sensor. The corrugator muscle activity was measured with the Myoscan Pro electromyography (EMG) sensor. All sensor data was collected at 256 Hz. This rate is sufficient for capturing physiological signal events. The robot controller and the physiological sensing computer were connected with a serial link so that the trajectory and physiological data could be synchronized.

D. Experimental Procedure

For each experiment, the subject was asked to read a description of the experiment and sign a consent form. After signing the consent form, the experimental protocol was explained to the subject, and physiological sensors were attached. The human subject was seated facing the robot. The robot was initially held motionless for a minimum of 90 s to collect baseline physiological data for each subject. The use of physically intrusive sensors and the laboratory environment itself may also introduce some anxiety in the subjects. To ensure that only physiological responses related to the robot motion are captured, the collected baseline data is used to offset the data (i.e., only changes from the baseline are recorded).

The robot then executed the 12 trajectories described earlier. In the first set of experiments, each subject was shown each trajectory once. In the second set of experiments, each subject

response to the motion in the following affective response categories: anxiety, calm, and surprise. The Likert scale (from 1 to 5) was used to characterize the response, with 5 representing "extremely" or "completely" and 1 representing "not at all." The subject was also asked to rate whether the robot attracted and/or held their attention during the motion, on the same Likert scale, with 5 representing "full attention," and 1 representing "not attentive at all." The rating of each trajectory took approximately 30 s to complete. After the subjective response was collected, a 1 min rest period was enforced before presenting the next trajectory, to ensure that the physiological data returns to the baseline. This experimental protocol was approved by the Behavioural Research Ethics Board of the University of British Columbia.

IV. RESULTS

Three HMMs each were used to represent valence and arousal, representing the low, medium, and high levels of each dimension of affective state. The level of valence and arousal to be used as the ground truth during training is extracted from the subjective responses. This approach to classifying the responses is susceptible to subjective misclassification, but the authors are unaware of a more accurate method for ground-truth labeling, as the affective state of the subject is usually not visible to an observer.

Arousal is extracted from the calm subjective response. The inverse of the calm rating was considered the level of arousal. A value less than 2 was classified as low, between 2 and 4 as medium, and above 4 as high. The anxiety response was used to represent valence. The same classification scheme was used for the valence, as described earlier.

The physiological data following each trajectory presentation was processed and normalized as described in Section II. Ten seconds of data following the start of each trajectory was used for HMM training and classification.

The HMMs for both arousal and valence were trained using the Baum–Welch algorithm [37]. The Baum–Welch algorithm computes the maximum likelihood estimate of the state transition probabilities, and the Gaussian mixture weights and parameters (the mean and covariance matrix), based on the training data. Two types of training were performed: one where each set of HMMs was trained for each individual subject, and one where a single set of HMMs was trained for all subjects.

For the individualized HMMs, three examples from each level were used for training, and the remaining data were used as "new" examples, to test the generalization of the trained HMM to new data. The data from the second set of experiments were used to confirm the validity of this approach. Using the larger data set, HMMs were trained with a larger set of examples (up to eight), but it was found that using additional data beyond three did not improve performance, as additional data caused the HMM to overfit and, therefore, deteriorate in classification performance when used with new data.
was shown each trajectory three times. The trajectories were Data from the second set of experiments were also used to
presented to each subject in randomized order. After each tra- analyze the effect of habituation. It was found that subjects
jectory had been executed, the subject was asked to rate their tended to habituate quickly to the medium velocity motions; less
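As a concrete illustration, the rating-to-label mapping described above can be sketched as follows. This is a minimal sketch in Python: the `6 - rating` inversion for the calm response and the handling of values exactly at the 2 and 4 boundaries are assumptions, since the text does not specify them.

```python
# Hypothetical sketch of the ground-truth labeling scheme: calm ratings are
# inverted to obtain arousal, anxiety ratings are used directly for valence,
# and both are discretized into the low/medium/high classes used to train
# the per-level HMMs.

def discretize(level):
    """Map a rating to the low/medium/high class used for HMM training."""
    if level < 2:
        return "low"
    if level <= 4:          # boundary handling is an assumption
        return "medium"
    return "high"

def ground_truth(calm_rating, anxiety_rating):
    arousal = 6 - calm_rating   # "inverse of the calm" (assumed 1-5 Likert scale)
    valence = anxiety_rating    # anxiety represents (negative) valence
    return discretize(arousal), discretize(valence)

# A very calm, non-anxious subject yields low arousal and low valence labels.
print(ground_truth(5, 1))  # -> ('low', 'low')
```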
Habituation was observed both in the subjective responses and in the physiological responses: a medium-speed motion would be rated as medium arousal (or valence) after the first viewing, with a commensurate physiological response, and as low arousal on subsequent viewings, with a lower physiological response, so that HMM performance was not affected by the presentation order. However, the habituation response made it difficult to collect many training examples at the high arousal or valence levels for each user.

Individualized HMMs were trained using a random selection of three sample responses at each arousal and valence level. The best results were obtained with an HMM model consisting of six states with a maximum state transition of two states. For the discrete-output individualized HMMs, the average recognition rate over all subjects was 72% on trained data, for both arousal and valence. The discrete-output HMMs are fairly insensitive to model initialization: the same results were obtained regardless of whether systematic or random initialization was used to set the HMM parameters at the start of training.

The HMMs were also tested on unknown data to test the generalization of the model. However, due to the quick habituation of subjects to robot motion, it was difficult to obtain enough high-arousal and high-valence data for both training and test sets, since the majority of the subjects did not report more than three high arousal and valence trajectories in either data set. For this reason, the new data contained only low and medium affective state data. Using this data set as the untrained data, a 60% recognition rate was achieved for both arousal and valence.

The continuous-output individual HMMs were also trained using the same sample responses at each arousal and valence level. The continuous-output HMM model is quite sensitive to model initialization, because the training procedure (the Baum–Welch algorithm) is a local optimization method [37]. For the continuous model, the best results were obtained when the trained discrete model parameters were used for model initialization. In this case, the average recognition rate for all subjects was 74% for arousal and 70% for valence on the training data. For new data, the recognition rate was 61% for arousal and 62% for valence. The continuous-output HMM model did not provide significantly better performance than the discrete case, and was significantly more difficult to train, as well as more susceptible to numerical issues, due to the need to invert the Gaussian output covariance matrices (which are themselves being trained).

For both discrete and continuous HMMs, using EMG data did not improve performance, supporting earlier findings [43] that EMG is not a good indicator of affective state for this type of external stimulus. This result underlines the importance of the relationship between the type of stimulus (e.g., picture viewing versus physical interaction) and the physiological response.

A significant improvement in recognition could be achieved if only those motions for which the subjects rated their attention as high (4 or 5 on the Likert scale) were included in the training data. In this case, using the discrete model, the average recognition rate for all subjects increased to 83% for arousal and 80% for valence on training data. For new data, consisting of only low and medium data points as described earlier, the recognition rate was 66% for both arousal and valence. This result suggests that attention is an important factor for interpreting physiological data during human–robot interaction, as it indicates that the robot motion, rather than other stimuli in the environment, is the source of the physiological response. In addition to estimating affective state, it may therefore be necessary to estimate the user's current focus of attention, e.g., by monitoring the user's eye-gaze parameters or head orientation [47].

The resulting HMM models capture the time-domain behavior of the physiological signals in response to the robot motion stimulus; however, the state transitions encoded by the models differed by subject. In general, for the arousal models, the SC features were the most important. For example, a large peak and subsequent decline in SC was encoded for the high-arousal model; however, the peak magnitude and temporal characteristics differed between users. For the valence models, both SC and heart-rate information were encoded, but the importance of the heart-rate information again depended on the subject. For example, for high (negative) valence models, heart-rate deceleration followed by acceleration was encoded.

Tables I and II show the confusion matrices for arousal and valence on the training data, respectively. These results are generated using the discrete HMM models, with only high-attention exemplars in the training data. As can be seen from these results, both the valence and arousal estimation accuracy are highest when the reported valence or arousal is highest.

TABLE I
CONFUSION MATRIX FOR AROUSAL

TABLE II
CONFUSION MATRIX FOR VALENCE

A single set of HMMs was also trained for all users; however, the multiuser HMMs did not achieve good classification results.
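To make the per-level classification concrete: a sequence is assigned to the level whose HMM gives it the highest likelihood, computed with the (scaled) forward algorithm. The following is an illustrative sketch with toy two-state, two-symbol discrete models, not the trained six-state models from the experiments; all parameter values are invented for the example.

```python
import numpy as np

def log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | model) for a discrete-output HMM.

    pi: (N,) initial state distribution
    A:  (N, N) transition matrix, A[i, j] = P(state j | state i)
    B:  (N, M) emission matrix, B[i, k] = P(symbol k | state i)
    """
    alpha = pi * B[:, obs[0]]
    log_p = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        s = alpha.sum()        # rescale at each step to avoid underflow
        log_p += np.log(s)
        alpha = alpha / s
    return log_p

def classify(obs, models):
    """Pick the level whose HMM assigns the sequence the highest likelihood."""
    return max(models, key=lambda level: log_likelihood(obs, *models[level]))

# Toy models: the "high" model prefers symbol 1, the "low" model symbol 0.
models = {
    "low":  (np.array([0.5, 0.5]),
             np.array([[0.9, 0.1], [0.1, 0.9]]),
             np.array([[0.9, 0.1], [0.8, 0.2]])),
    "high": (np.array([0.5, 0.5]),
             np.array([[0.9, 0.1], [0.1, 0.9]]),
             np.array([[0.2, 0.8], [0.1, 0.9]])),
}

print(classify([1, 1, 0, 1, 1], models))  # -> high
```

Because the forward recursion is incremental, the same log-likelihood can be evaluated on any prefix of the observation sequence, which is what permits classification before the full data window becomes available.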
Even though the physiological data were normalized prior to model training to facilitate generalization across users, this result seems to indicate that there is considerable variability in signal amplitude and length between subjects, such that the HMM was unable to generalize a single set of models that describes the time-domain signal for all users.

The aforementioned results report the recognition rate after the HMM has viewed a full 10 s of physiological response data. However, for online human–robot interaction, it would be desirable if recognition results could be achieved before all of the data become available. To test the capability of the HMM to recognize human affective state when only partial data are available, the forward algorithm was used on fully trained HMMs to classify both the training data and the novel data with only partial data provided. The HMM is able to provide good recognition results after 4 s of data become available, with a recognition rate of 70% for training data and 64% for novel data.

Compared to the fuzzy inference engine developed in [43], individualized HMMs achieve improved performance in classifying arousal, especially at the low reported arousal level, where the fuzzy inference engine was not successful. While there is very little work reported in the literature on affective state response to robot motion, the results achieved are comparable to affective state recognition rates for other stimuli [15], [33], [35], [39]. The HMM performance can be further improved when only data from trial runs where subjects reported a high level of attention are used. A clear advantage of the HMM model in the individual-user case is that it is able to learn a model for classifying valence, which could not be classified using the fuzzy inference engine. As discussed in Section I-A, the fuzzy inference engine was developed based on previous research, which showed a strong correlation between valence and corrugator muscle response. This relationship was not consistently observed in response to robot motion in our experiments [19]. However, despite this lack of a specific relationship, the HMM was able to identify a relationship between the aggregate physiological data collected and valence.

V. CONCLUSIONS AND FUTURE WORK

This paper describes the development of an HMM-based classifier for estimating affective state from physiological data during human–robot interaction. Training data are obtained from a user trial in which physiological data and subjective ratings of robot motions are collected. Using individualized HMM models, the classifier achieves better results for arousal than a previously reported fuzzy inference engine based on psychophysiological research results [43], and significantly better results for estimating valence. Furthermore, the HMM shows potential as an online human affective state classifier, with good recognition rates after only 4 s of data. Further improvements are possible by using an independent measure of user attention to selectively weight input data.

Results using the HMM classification confirm earlier findings [43] that corrugator muscle EMG is not a good indicator of valence for a stimulus consisting of robot motions. Further work is required to identify the characteristics of the stimulus that cause the differing EMG response in the specific context of robot motion.

Future research will focus on combining the inference engine with the HMM model to further improve the classification results. An analysis of the inference engine results [47] indicated that subjects vary not only in terms of response amplitude and duration, but also in that, for some modalities, a number of subjects show no response at all (e.g., only a subset of subjects exhibit a heart-rate response). Improvements to the HMM model could be achieved by using the information from the fuzzy inference engine to customize not only the HMM model parameters but also the structure of the HMM for each individual subject.

ACKNOWLEDGMENT

The authors wish to thank all the subjects for volunteering to participate in the experiments.

REFERENCES

[1] A. Pentland, "Perceptual intelligence," Commun. ACM, vol. 43, no. 3, pp. 35–44, 2000.
[2] Y. Matsumoto, J. Heinzmann, and A. Zelinsky, "The essential components of human-friendly robot systems," in Proc. Int. Conf. Field Service Robot., 1999, pp. 43–51.
[3] V. J. Traver, A. P. del Pobil, and M. Perez-Francisco, "Making service robots human-safe," in Proc. IEEE RSJ Int. Conf. Intell. Robots Syst., 2000, vol. 1, pp. 696–701.
[4] J. M. Weiner, R. J. Hanley, R. Clark, and J. F. Van Nostrand, "Measuring the activities of daily living: Comparisons across national surveys," J. Gerontology: Social Sci., vol. 45, no. 6, pp. 229–237, 1990.
[5] A. J. Bearveldt, "Cooperation between man and robot: Interface and safety," in Proc. IEEE Int. Workshop Robot Human Commun., 1993, pp. 183–187.
[6] H. Arai, T. Takubo, Y. Hayashibara, and K. Tanie, "Human–robot cooperative manipulation using a virtual nonholonomic constraint," in Proc. IEEE Int. Conf. Robot. Autom., 2000, pp. 4063–4069.
[7] V. Fernandez, C. Balaguer, D. Blanco, and M. A. Salichs, "Active human–mobile manipulator cooperation through intention recognition," in Proc. IEEE Int. Conf. Robot. Autom., 2001, pp. 2668–2673.
[8] E. Guglielmelli, P. Dario, C. Laschi, and R. Fontanelli, "Humans and technologies at home: From friendly appliances to robotic interfaces," in Proc. IEEE Int. Workshop Robot Human Commun., 1996, pp. 71–79.
[9] K. Kawamura, S. Bagchi, M. Iskarous, and M. Bishay, "Intelligent robotic systems in service of the disabled," IEEE Trans. Rehabil. Eng., vol. 3, no. 1, pp. 14–21, Mar. 1995.
[10] R. Picard, Affective Computing. Cambridge, MA: MIT Press, 1997.
[11] P. Ekman, W. V. Friesen, and P. Ellsworth, Emotion in the Human Face. New York: Pergamon, 1972.
[12] I. Essa and A. Pentland, "Coding, analysis, interpretation, and recognition of facial expressions," IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 7, pp. 757–763, Jul. 1997.
[13] M. Pantic and L. J. M. Rothkrantz, "Automatic analysis of facial expressions: The state of the art," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 12, pp. 1424–1445, Dec. 2000.
[14] D. Kulić and E. Croft, "Estimating intent for human–robot interaction," in Proc. IEEE Int. Conf. Adv. Robot., 2003, pp. 810–815.
[15] R. Picard, "Toward machine emotional intelligence: Analysis of affective physiological state," IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 10, pp. 1175–1191, Oct. 2001.
[16] P. Rani, N. Sarkar, C. A. Smith, and L. D. Kirby, "Anxiety detecting robotic system—Towards implicit human–robot collaboration," Robotica, vol. 22, pp. 85–95, 2004.
[17] N. Sarkar, "Psychophysiological control architecture for human–robot coordination—Concepts and initial experiments," in Proc. IEEE Int. Conf. Robot. Autom., 2002, vol. 4, pp. 3719–3724.
[18] M. M. Bradley and P. J. Lang, "Measuring emotion: Behavior, feeling and physiology," in Cognitive Neuroscience of Emotion, R. D. Lane and L. Nadel, Eds. New York, NY: Oxford University Press, 2000.
[19] D. Kulić and E. Croft, "Physiological and subjective responses to articulated robot motion," Robotica, vol. 25, no. 1, pp. 13–27, 2007.
[20] D. Kulić and E. Croft, "Pre-collision safety strategies for human–robot interaction," Auton. Robots, vol. 22, no. 2, pp. 149–164, 2007.
[21] K. A. Brownley, "Cardiovascular psychophysiology," in Handbook of Psychophysiology, J. T. Cacioppo, L. G. Tassinary, and G. G. Berntson, Eds. Cambridge, U.K.: Cambridge University Press, 2000.
[22] M. E. Dawson, "The electrodermal system," in Handbook of Psychophysiology, J. T. Cacioppo, L. G. Tassinary, and G. G. Berntson, Eds. Cambridge, U.K.: Cambridge University Press, 2000.
[23] M. M. Bradley, "Emotion and motivation," in Handbook of Psychophysiology, J. T. Cacioppo, L. G. Tassinary, and G. G. Berntson, Eds. Cambridge, U.K.: Cambridge University Press, 2000, pp. 602–642.
[24] J. T. Cacioppo and L. G. Tassinary, "Inferring psychological significance from physiological signals," Amer. Psychol., vol. 45, no. 1, pp. 16–28, 1990.
[25] P. J. Lang, "The emotion probe: Studies of motivation and attention," Amer. Psychol., vol. 50, no. 5, pp. 372–385, 1995.
[26] Z. Z. Bien, J. B. Kim, D. J. Kim, J. S. Han, and J. H. Do, "Soft computing based emotion/intention reading for service robot," Lecture Notes Comput. Sci., vol. 2275, pp. 121–128, 2002.
[27] P. Rani, J. Sims, R. Brackin, and N. Sarkar, "Online stress detection using psychophysiological signals for implicit human–robot cooperation," Robotica, vol. 20, pp. 673–685, 2002.
[28] K. Wada, T. Shibata, T. Saito, and K. Tanie, "Effects of robot-assisted activity for elderly people and nurses at a day service center," Proc. IEEE, vol. 92, no. 11, pp. 1780–1788, 2004.
[29] T. Saito, T. Shibata, K. Wada, and H. Tanie, "Examination of change of stress reaction by urinary tests of elderly before and after introduction of mental commit robot to an elderly institution," in Proc. 7th Int. Symp. Artif. Life Robot., 2002, pp. 316–319.
[30] T. Kanda, H. Ishiguro, M. Imai, and T. Ono, "Development and evaluation of interactive humanoid robots," Proc. IEEE, vol. 92, no. 11, pp. 1839–1850, Nov. 2004.
[31] S. Nonaka, K. Inoue, T. Arai, and Y. Mae, "Evaluation of human sense of security for coexisting robots using virtual reality," in Proc. IEEE Int. Conf. Robot. Autom., 2004, vol. 3, pp. 2770–2775.
[32] K. L. Koay, M. L. Walters, and K. Dautenhahn, "Methodological issues using a comfort level device in human–robot interactions," in Proc. IEEE Int. Workshop Robot Human Interactive Commun., Aug. 2005, pp. 359–364.
[33] K. H. Kim, S. W. Bang, and S. R. Kim, "Emotion recognition system using short-term monitoring of physiological signals," Med. Biol. Eng. Comput., vol. 42, pp. 419–427, 2004.
[34] C. Liu, P. Rani, and N. Sarkar, "An empirical study of machine learning techniques for affect recognition in human–robot interaction," in Proc. IEEE Conf. Intell. Robots Syst., 2005, pp. 2051–2056.
[35] P. Rani, C. Liu, N. Sarkar, and E. Vanman, "An empirical study of machine learning techniques for affect recognition in human–robot interaction," Pattern Anal. Appl., vol. 9, pp. 58–69, 2006.
[36] O. Villon and C. Lisetti, "A user-modeling approach to build user's psycho-physiological maps of emotions using bio-sensors," in Proc. IEEE Int. Workshop Robot Human Interactive Commun., 2006, pp. 269–276.
[37] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proc. IEEE, vol. 77, no. 2, pp. 257–286, Feb. 1989.
[38] T. Inamura, I. Toshima, H. Tanie, and Y. Nakamura, "Embodied symbol emergence based on mimesis theory," Int. J. Robot. Res., vol. 23, no. 4–5, pp. 363–377, 2004.
[39] J. Scheirer, R. Fernandez, J. Klein, and R. Picard, "Frustrating the user on purpose: A step toward building an affective computer," Interacting Comput., vol. 14, pp. 93–118, 2002.
[40] A. Kapoor and R. W. Picard, "Multimodal affect recognition in learning environments," in Proc. ACM Int. Conf. Multimedia, 2005, pp. 677–682.
[41] B. Schuller, G. Rigoll, and M. Lang, "Hidden Markov model-based speech emotion recognition," in Proc. Int. Conf. Multimedia Expo, 2003, pp. 401–404.
[42] J. J. Lien, T. Kanade, J. F. Cohn, and L. Ching-Chung, "Automated facial expression recognition based on FACS action units," in Proc. IEEE Int. Conf. Autom. Face Gesture Recog., 1998, pp. 390–395.
[43] D. Kulić and E. Croft, "Anxiety detection during human–robot interaction," in Proc. IEEE Int. Conf. Intell. Robots Syst., 2005, pp. 389–394.
[44] P. Ekman, R. W. Levenson, and W. V. Friesen, "Autonomic nervous system activity distinguishes among emotions," Science, vol. 221, pp. 1208–1210, 1983.
[45] O. Khatib, "Real-time obstacle avoidance for manipulators and mobile robots," Int. J. Robot. Res., vol. 5, no. 1, pp. 90–98, 1986.
[46] D. Kulić and E. Croft, "Safe planning for human–robot interaction," in Proc. IEEE Int. Conf. Robot. Autom., 2004, vol. 2, pp. 1882–1887.
[47] D. Kulić, "Safety for human–robot interaction," Ph.D. dissertation, Dept. Mech. Eng., Univ. British Columbia, Vancouver, BC, Canada, 2005.
[48] Thought Technology Ltd. [Online]. Available: www.thoughttechnology.com

Dana Kulić (M'98) received the combined B.A.Sc. and M.Eng. degrees in electromechanical engineering and the Ph.D. degree in mechanical engineering from the University of British Columbia, Vancouver, BC, Canada, in 1998 and 2005, respectively.

She is currently a Postdoctoral Fellow with the Nakamura-Yamane Laboratory, Department of Mechano-Informatics, University of Tokyo, Tokyo, Japan. Her research interests include human–robot interaction, robot learning, humanoid robotics, and mechatronics.

Elizabeth A. Croft (M'95) received the B.A.Sc. degree from the University of British Columbia, Vancouver, BC, Canada, in 1988, the M.A.Sc. degree from the University of Waterloo, Waterloo, ON, Canada, in 1992, and the Ph.D. degree from the University of Toronto, Toronto, ON, in 1995, all in mechanical engineering.

She is currently an Associate Professor in mechanical engineering with the University of British Columbia. Her research interests include human–robot interaction, industrial robotics, and mechatronics.