
A Companion Robot with Facial Expressions and Face Recognition

Hsuan-Kuan Huang, Hung-Hsiu Yu


Mechanical Systems Research Laboratories (MSL), Industrial Technology Research Institute (ITRI), Hsinchu, Taiwan, R.O.C.
harry.huang@itri.org.tw, godo@itri.org.tw
Abstract: The purpose of this paper is to develop a companion robot which can display facial expressions and recognize human faces. With 12 degrees of freedom, the robot can generate various facial expressions. In addition, a face-recognition method is proposed that combines two complementary matching algorithms (a single-image matching algorithm and a sequential-image matching algorithm). We have conducted several experiments to test the developed robot. The experiments showed that the facial expressions generated by the robot can be identified well. In addition, the accuracy of the face recognition is higher than 90%. The developed robot shows great potential for social interaction with humans.

Keywords: companion robot; facial expression; face recognition

Yea-Shuan Huang
Computer Science & Information Engineering, Chung Hua University, Hsinchu, Taiwan, R.O.C.
yeashuan@chu.edu.tw

I. INTRODUCTION

To serve as a good companion and interact socially with humans, a robot must be able to do more than simply gather information about its surroundings. It must also be able to express its state, so that humans will believe that the robot has beliefs, desires and intentions of its own [1]. In addition, as more than 60% of human communication is conducted nonverbally by facial expressions and gestures [2], one important research topic for human-robot interaction (HRI) is clearly to build a robot that can interact with humans by generating different kinds of facial expressions. To achieve this objective, several robot faces and robot heads have been designed and built so far. Researchers at the Science University of Tokyo have developed a human-like robot face that has silicone skin, teeth, hair and a large number of control points [3]. The robot can express the six basic facial expressions (happiness, anger, sadness, surprise, disgust, and fear) defined by Ekman and Friesen [4]. It can also recognize human facial expressions using CCD cameras and reciprocate the same expression back. Breazeal at MIT developed an expressive anthropomorphic robot called Kismet [1]. It can engage people in expressive face-to-face interaction, and it has a total of 15 degrees of freedom (DOF) to move its facial features (e.g. eyelids, eyebrows, lips, and ears) and generate various facial expressions. After developing Kismet, Breazeal's group at MIT further developed the robot Leonardo [5]. The robot has 61 DOF, 32 of which are located in the face. This enables Leonardo to show near-human facial expressions, and it is considered by many researchers to be the most expressive and complex face robot to date. The researchers at Takanishi's laboratory developed a robot called WE-4 (Waseda Eye No. 4). WE-4 simulated four of the five human senses (the visual, auditory, cutaneous and olfactory senses) and realized facial expressions using eyebrows, eyelids, facial color, lips and voice [6]. The robot is able to interact bilaterally with humans. Sosnowski et al. [7] developed a flexible low-cost expression display, named EDDIE, with 23 DOF, whose actuators are assigned to particular actions of the facial action coding system (FACS). The display of the six basic facial expressions was evaluated in a user study, and the results showed that the expressions generated by EDDIE can be recognized well. Berns and Hirth [2] constructed a humanoid robot head, ROMAN, for facial expressions. They adopted a behavior-based control architecture to realize the six basic facial expressions; every expression is related to one behavior node, and the control of these six behaviors is done similarly to Kismet [1].

Another important area of research on HRI is human face recognition. A lot of research effort has been devoted to this field, and many face recognition approaches based on a variety of machine learning theories have already been developed. For example, subspace methods such as Principal Component Analysis (PCA) [8,9] and Linear Discriminant Analysis (LDA) [10,11] are commonly used; they project high-dimensional features to low-dimensional features so that not only faster but also better recognition can be achieved. It is well known that face images easily change in color and in shape with variations of environmental lighting, facial expression and pose. Therefore, recognition is apt to be unreliable if it is performed on just a single input image. To obtain more robust recognition, Yamaguchi proposed the Mutual Subspace Method (MSM) [12] and the Constrained Mutual Subspace Method (CMSM) [13-15], and both methods perform recognition using multiple sequential images. In MSM, similarity is defined by the minimum angle between the input subspace and a reference subspace. CMSM further projects each individual subspace, including the input and the reference subspaces, onto a constrained subspace. From the projections onto the same constrained subspace, it obtains features which have good discrimination ability among classes and are insensitive to face poses and expressions.



Basically, the feature derived from a single image denotes the location of that image in a high-dimensional feature space. In this space, the locations of two highly similar images are in general close to each other, while the locations of two distinct images are far apart. Therefore, recognition based on a single image mainly measures the distance (or similarity) between the features of the input pattern and those of the reference patterns. In contrast, the feature derived from a set of sequential images of the same person can represent the unique variation model of that person; recognition based on sequential face images can therefore compare the specific variation pattern of the unknown person with that of each enrolled user. The two kinds of recognition methods are complementary in nature, so it is very useful to combine them.

In this paper, a companion robot with 12 DOF for facial expressions and face recognition is presented. Section 2 gives a system overview of the robot, including its mechanics and system architecture. Section 3 describes the two recognition models: the first is based on a single image and the second is based on sequential images; a linear mechanism is also proposed to integrate both recognition scores. The experiments on facial expressions and face recognition are described in Section 4, and conclusions are given in Section 5.

II. SYSTEM OVERVIEW

Figure 1. The companion robot. Size: 30 x 30 x 50 cm3; labeled components: microphone, eyes (4 DOF each), hands, hand contact points, cheek LEDs, neck (2 DOF), and base.

A. Mechanics
The robot has a size of 30 x 30 x 50 cm3 (see Figure 1) and is composed of two main parts: the movable head and the stationary body base. The eye module contains three parts: upper eyelid, eyeball and eye socket. The upper eyelid can move 40 degrees upwards and 40 degrees downwards. The eyeballs are able to move 67.5 degrees in the horizontal direction and 40 degrees in the vertical direction. Both eyeballs can also rotate eccentrically along the perimeter of the eye sockets by 45 degrees. The neck has 2 DOF. The first degree of freedom is the rotation about the vertical axis, with a range of 30 degrees. The second degree of freedom is the inclination of the neck about the horizontal axis, which nods the head down by 15 degrees.

B. System Architecture
The control system contains an x86-based all-in-one motherboard, with our application software running on a custom-built Microsoft Windows XP Embedded (XPe) operating system. All peripherals, comprising a webcam, motor drivers, digital I/O, a speaker and a microphone, are connected to the host through standard ports such as USB, serial and audio/video (see Figure 2). In total, 12 AI motors actuate the robot: 2 for the hands (up/down), 2 for the head (up/down and left/right), and 4 for each eye (eyeball up/down and left/right, upper eyelid up/down, and eccentric rotation left/right). With the aid of the 4 DOF of each eyeball and its upper eyelid, the robot can present various facial expressions vividly, and even some expressions which are funny but difficult for humans to perform due to the restriction of their eye movements (see Figures 3 and 4). Two LED lights are mounted inside the cheeks of the robot, which are used to indicate a flushing expression or to signal to someone in front of it. A camera and a microphone are encapsulated in the eyes and the horn respectively, and serve as the input devices for face recognition and voice recognition. The power of the companion robot can be supplied either from a DC 12 V battery or from an AC adapter, selectable by a switch on the back panel.

Figure 2. System architecture of the companion robot
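As a rough illustration of how such an architecture can drive the 12 actuators, the sketch below maps a named expression to target positions for each servo channel and streams them over a serial bus with pyserial. The channel assignment, frame format and numeric targets are purely illustrative assumptions, not the actual ITRI controller protocol.

```python
# Hypothetical sketch: mapping facial expressions to the 12 servo channels.
# Channel numbers, target positions and the serial frame layout are placeholders.
import serial  # pyserial

# 12 actuator channels: 2 hand, 2 neck, 4 per eye (left/right).
EXPRESSIONS = {
    #            handL handR neckP neckY  left eye (4)          right eye (4)
    "happiness": [512,  512,  560,  512,  520, 512, 600, 512,  520, 512, 600, 512],
    "sadness":   [400,  400,  470,  512,  450, 512, 420, 512,  450, 512, 420, 512],
    "surprise":  [600,  600,  540,  512,  512, 512, 700, 512,  512, 512, 700, 512],
}

def send_position(port: serial.Serial, channel: int, position: int) -> None:
    """Send one position command; this 4-byte frame is an assumed, not real, format."""
    frame = bytes([0xFF, channel & 0xFF, (position >> 8) & 0xFF, position & 0xFF])
    port.write(frame)

def show_expression(port: serial.Serial, name: str) -> None:
    """Drive all 12 channels to the target pose of the requested expression."""
    for channel, position in enumerate(EXPRESSIONS[name]):
        send_position(port, channel, position)

if __name__ == "__main__":
    with serial.Serial("/dev/ttyUSB0", 115200, timeout=0.1) as bus:
        show_expression(bus, "happiness")
```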

III. FACE RECOGNITION

In this section, we describe the proposed face recognition framework, which integrates a single-image matching module and a sequential-image matching module. For pattern matching, the single-image matching module uses a Euclidean distance metric, and the sequential-image matching module uses a CMSM (Constrained Mutual Subspace Method) metric. For the final decision, a weighted sum is used to combine the two matching scores. This section consists of four subsections. The first introduces the concept of the canonical angle, which reflects the similarity of two subspaces. The second explains how the constrained subspace is generated. The third describes matching in the constructed constrained subspace. The fourth states the matching of a single image, which adopts a Euclidean distance in an LDA-transformed reduced feature space.

Figure 3. The six basic facial expressions generated by the companion robot: (a) happiness, (b) anger, (c) sadness, (d) surprise, (e) disgust, (f) fear.

Figure 4. The other facial expressions generated by the companion robot: (a) sleepiness, (b) innocence, (c) disregard, (d) nervousness, (e) dizziness, (f) doubtfulness.

A. Concept of Canonical Angle
In linear algebra, the similarity between two subspaces is calculated from the angle between them. Suppose {R_1, ..., R_r} is a set of r reference patterns, {I_1, ..., I_s} is a set of s input patterns, and each pattern is represented by an f-dimensional feature vector. With PCA, an r_no-dimensional reference subspace P can be constructed from {R_1, ..., R_r}, and an s_no-dimensional input subspace Q can be constructed from {I_1, ..., I_s}. The orthonormal basis vectors of P and Q form an r_no x f matrix \Phi and an s_no x f matrix \Psi respectively. In general, the relations of r, s, r_no and s_no are chosen to be r_no <= r, s_no <= s and r_no <= s_no. We can further obtain r_no canonical angles \theta_1, ..., \theta_{r_no} between subspace P and subspace Q from the following equations:

X c = \lambda c                                                          (1)

X = (x_{ij}),  x_{ij} = \sum_{k=1}^{r_no} (\Phi_i \cdot \Psi_k)(\Psi_k \cdot \Phi_j)      (2)

where \Phi_i and \Psi_j denote respectively the i-th and j-th f-dimensional orthonormal basis vectors of subspaces P and Q, \lambda is an eigenvalue of X, c is the corresponding eigenvector, and X is an r_no x r_no matrix. The value cos^2(\theta_i) of the i-th smallest canonical angle equals the i-th largest eigenvalue of X. The largest eigenvalue, i.e. cos^2(\theta_1), is taken to denote the similarity between subspaces P and Q. Figure 5 shows a schematic diagram of the canonical angle between two subspaces.

Figure 5. Canonical angle between two subspaces.
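As a rough numerical sketch of subsection A (not the authors' implementation), the snippet below builds orthonormal subspace bases with PCA and takes the largest eigenvalue of the matrix X as the similarity. It follows the standard mutual-subspace formulation, summing over all basis vectors of the input subspace; function and variable names are illustrative.

```python
# Sketch of canonical-angle similarity between a reference and an input subspace.
# Each pattern set is a (num_patterns, f) array; r_no and s_no are free parameters.
import numpy as np

def subspace_basis(patterns: np.ndarray, dim: int) -> np.ndarray:
    """Return a (dim, f) matrix of orthonormal basis vectors via PCA (SVD)."""
    centered = patterns - patterns.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:dim]                      # rows are orthonormal basis vectors

def canonical_angle_similarity(ref_basis: np.ndarray, in_basis: np.ndarray) -> float:
    """Largest eigenvalue of X = (Phi Psi^T)(Psi Phi^T), i.e. cos^2 of the smallest angle."""
    m = ref_basis @ in_basis.T           # r_no x s_no matrix of inner products
    x = m @ m.T                          # x_ij = sum_k (Phi_i . Psi_k)(Psi_k . Phi_j)
    eigvals = np.linalg.eigvalsh(x)      # symmetric matrix, ascending eigenvalues
    return float(eigvals[-1])            # cos^2(theta_1), a value in [0, 1]

# Toy usage with random data (f = 36*36 would match the paper's image size).
rng = np.random.default_rng(0)
ref = rng.normal(size=(20, 64))          # 20 reference patterns, f = 64
inp = rng.normal(size=(30, 64))          # 30 input patterns
sim = canonical_angle_similarity(subspace_basis(ref, 5), subspace_basis(inp, 7))
print(f"cos^2(theta_1) = {sim:.3f}")
```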

B. Generation of Constrained Subspace
In CMSM, it is essential to generate a proper constrained subspace CS which contains the effective matching components while eliminating the unnecessary ones. By projecting the input subspace and the reference subspaces onto a constrained subspace, discriminating features for recognizing the pattern classes can be extracted. Suppose there are in total N_p reference subspaces. To generate a constrained subspace, we compute the projection matrix P_j of the j-th reference subspace using

P_j = \sum_{k=1}^{r_no} \Phi_k^j (\Phi_k^j)^T                              (3)

where r_no is the number of eigenvectors of a reference subspace, \Phi_k^j is the k-th orthonormal basis vector of the j-th reference subspace, and each P_j is an f x f matrix. Then, we calculate the eigenvectors of the summation matrix S = P_1 + P_2 + ... + P_{N_p}, that is, SA = \Lambda A, where \Lambda and A denote the eigenvalues and the eigenvectors of S respectively. Finally, the t eigenvectors [A_1, ..., A_t] corresponding to the t smallest eigenvalues are selected to construct the constrained subspace CS (that is, CS = [A_1, ..., A_t], a t x f matrix). For a more detailed description of CMSM, please see [13].

C. Matching on Constrained Subspace
Suppose there are in total K recognition classes. \Omega denotes the input subspace derived from the input sequence samples, and \Omega_i (1 <= i <= K) denotes the subspace derived from the training sequence samples of class i. Five steps are performed for pattern matching:
1. Project each \Omega_i onto CS and generate an r_no x t projection matrix P_i;
2. Normalize each P_i and, with a Gram-Schmidt algorithm, derive a reference subspace \hat{\Omega}_i;
3. Project \Omega onto CS and generate an s_no x t projection matrix Q;
4. Normalize Q and, with a Gram-Schmidt algorithm, derive the input subspace \hat{\Omega};
5. Compute the similarity between \hat{\Omega} and \hat{\Omega}_i using the canonical angle computation described in subsection A.

D. Matching in LDA-transformed Space
Besides the matching by CMSM with sequential images, there is another matching based on a single image. The image is first smoothed by the Anisotropic Smoothing Transform (AST) algorithm proposed in [16] and normalized to a fixed size (36 x 36 in this paper); it then forms a feature vector. To speed up the feature matching and obtain better recognition accuracy, the Linear Discriminant Analysis (LDA) algorithm is adopted, which computes a linear transformation W* by maximizing the following criterion:

W* = arg max_W  det(W^T S_b W) / det(W^T S_w W)                            (4)

where S_b is the between-class scatter matrix and S_w is the within-class scatter matrix. If S_w is non-singular, the criterion is maximized by taking the leading eigenvectors of S_w^{-1} S_b as the columns of W*. Once W* is determined, the original feature vectors are multiplied by the transpose of W* to generate the projection coefficients, which form the transformed feature vectors with a much smaller feature dimension. Let I be the transformed input feature vector, R be a transformed reference feature vector, and d be the transformed feature dimension; then the distance sim_distance between I and R is defined as

sim_distance = \sum_{i=1}^{d} (I_i - R_i)^2                                (5)
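The single-image branch can be sketched as follows; the AST smoothing step is omitted, the LDA projection is obtained from the eigenvectors of S_w^{-1} S_b, and the distance is the squared Euclidean distance of equation (5). Names and the dimension d are illustrative assumptions.

```python
# Sketch of the single-image matching branch: LDA projection + squared Euclidean distance.
# x is (n_samples, f) with class labels y; d should be at most (number of classes - 1).
import numpy as np

def lda_transform(x: np.ndarray, y: np.ndarray, d: int) -> np.ndarray:
    """Return W (f, d) maximizing det(W^T Sb W) / det(W^T Sw W)."""
    overall_mean = x.mean(axis=0)
    f = x.shape[1]
    sw = np.zeros((f, f))
    sb = np.zeros((f, f))
    for c in np.unique(y):
        xc = x[y == c]
        mc = xc.mean(axis=0)
        sw += (xc - mc).T @ (xc - mc)                    # within-class scatter
        diff = (mc - overall_mean)[:, None]
        sb += xc.shape[0] * (diff @ diff.T)              # between-class scatter
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(sw, sb))   # assumes Sw non-singular
    order = np.argsort(-eigvals.real)
    return eigvecs[:, order[:d]].real                    # leading d eigenvectors as columns

def sim_distance(i_vec: np.ndarray, r_vec: np.ndarray) -> float:
    """Equation (5): squared Euclidean distance in the LDA-transformed space."""
    return float(np.sum((i_vec - r_vec) ** 2))
```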

In the proposed recognition method there are two matching modules, and the final decision is made by combining their matching results with a weighted-sum scheme. The similarity of the image-sequence matching module is given by cos^2(\theta_1) of the smallest canonical angle, and the similarity of the single-image matching module is derived from the Euclidean distance. Let sim_angle and sim_distance denote the canonical-angle metric and the Euclidean-distance metric respectively. The integrated similarity is calculated as

similarity = \omega_1 \cdot sim_angle + \omega_2 \cdot (1 - sim_distance / \gamma)        (6)

where \omega_1 and \omega_2 are the combining weights of the two matching scores and \gamma is a normalization parameter.
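A one-line realization of the fusion in equation (6) could look like the sketch below; the weights and the normalization constant are free parameters (their values are not reported in the text), so the defaults here are placeholders.

```python
def combined_similarity(sim_angle: float, sim_dist: float,
                        w1: float = 0.5, w2: float = 0.5, gamma: float = 1.0) -> float:
    """Weighted-sum fusion of eq. (6): the distance is converted into a similarity."""
    return w1 * sim_angle + w2 * (1.0 - sim_dist / gamma)
```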

IV. EXPERIMENT


A. Facial Expressions
The experiment was designed similarly to the Kismet project [1] and the ROMAN project [2], and its set-up was as follows: we presented 6 pictures with facial expressions of the companion robot to 32 persons (16 men and 16 women) aged 25 to 57 years. Every person had to rate the correlation between the presented expression and each of the 6 basic facial expressions on a scale from 1 to 5 (1 means a weak correlation and 5 means a strong correlation). For example, if a person thinks the picture is closely related to anger, he/she can give a high rating (4 or 5) in that category. The results of the experiment help us obtain more information about the recognition and demonstration of the facial expressions, and the facial expressions of the robot can be modified based on these results. The results are shown in Table I. The left column contains the shown facial expressions, and the right column contains the average values of the detected correlation (1 to 5) between the 6 basic facial expressions and the presented pictures. The average values of the detected correlation were evaluated by statistical tests. The results of the evaluation show that the correct recognition of the facial expressions for happiness, anger, sadness, surprise and disgust is statistically significant (p-value < 0.05), which means that these expressions generated by the robot can be differentiated well. However, the facial expression for fear is not identified clearly. Compared to Ekman's experiment [4] on the recognition of human facial expressions, the results are quite similar in the evaluation of basic facial expressions. To further improve the perception of the facial expressions, it is planned to add eyebrows and mouth movement generated by the LEDs.
TABLE I. THE RESULTS OF THE EXPERIMENTAL EVALUATION FOR THE PICTURES OF THE SIX BASIC FACIAL EXPRESSIONS

Presented expression | Detected strength (average, 1 to 5)
Happiness            | Happiness: 3.5, Anger: 1.3, Sadness: 1.0, Surprise: 2.3, Disgust: 1.1, Fear: 1.4
Anger                | Happiness: 1.0, Anger: 4.1, Sadness: 1.4, Surprise: 1.2, Disgust: 3.0, Fear: 1.3
Sadness              | Happiness: 1.0, Anger: 1.3, Sadness: 4.0, Surprise: 1.2, Disgust: 1.4, Fear: 2.0
Surprise             | Happiness: 2.4, Anger: 1.3, Sadness: 1.1, Surprise: 3.4, Disgust: 1.2, Fear: 1.8
Disgust              | Happiness: 1.0, Anger: 2.8, Sadness: 1.4, Surprise: 1.3, Disgust: 3.5, Fear: 1.1
Fear                 | Happiness: 1.0, Anger: 1.2, Sadness: 2.3, Surprise: 1.5, Disgust: 1.8, Fear: 2.5

B. Face Recognition
We used the well-known BANCA face database to evaluate the performance of the proposed recognition method. The BANCA database contains 52 individuals, and each individual has 12 image sequences that were taken at different times, at different locations and by different cameras. Each image sequence consists of 10 face images with various facial poses and facial expressions. To simplify the problem, only 4 image sequences of each individual, taken at different times but at the same location and by the same camera, are used in this experiment. Among the 4 image sequences, only one is used in the training stage, and the other three are used in the testing stage. Among the 52 individuals, the image samples of the first 12 are used to construct a constrained subspace, and the image samples of the other 40 individuals are used to generate the reference models and to evaluate the recognition performance. According to the manually marked eye positions, face images are extracted. Each extracted face image is first processed by the AST algorithm [16] and then resized to 36 x 36 pixels. In the experiment, the constrained subspace was constructed with 36 training subspaces, r_no is set to 9 and t is set to 1000. From the 40 persons, we randomly selected 35 persons for training and used all 40 persons for testing. In order to obtain an unbiased investigation, we performed the face recognition experiment one hundred times and report the average performance. The experimental results are evaluated by the False Rejection Rate (FRR) and the False Acceptance Rate (FAR). Figure 6 shows the recognition results of the proposed method and those of CMSM and the single-image classifier. The recognition rate of the proposed method with no rejection is 99.1%, and with a 7.9% false rejection rate the recognition rate is 90.1%. Figure 7 shows the performance of FAR vs. recognition rate. Let C_no denote the number of decisive recognitions which are correct and belong to the enrolled persons, and D_no denote the number of decisive recognitions which belong to the enrolled persons. Then

Recognition rate = (C_no / D_no) x 100%.

The experimental results show clearly that the proposed method is superior to the other two methods.

Figure 6. Performance

Figure 7. FAR vs. Recognition rate
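As a small illustration of how the reported figures can be computed, the sketch below derives the recognition rate (C_no / D_no) and a false acceptance rate from a list of per-trial decisions; the trial tuple layout is an assumption for illustration, not the authors' evaluation code.

```python
# Sketch: recognition rate and FAR from per-trial decisions.
# Each trial is (true_identity, predicted_identity_or_None, is_enrolled).
def evaluate(trials):
    decisive_enrolled = correct = impostor_total = impostor_accepted = 0
    for true_id, predicted_id, is_enrolled in trials:
        if is_enrolled:
            if predicted_id is not None:          # a decisive (non-rejected) recognition
                decisive_enrolled += 1
                correct += int(predicted_id == true_id)
        else:
            impostor_total += 1
            impostor_accepted += int(predicted_id is not None)
    recognition_rate = 100.0 * correct / max(decisive_enrolled, 1)   # C_no / D_no
    far = 100.0 * impostor_accepted / max(impostor_total, 1)
    return recognition_rate, far
```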

V. CONCLUSION

In this paper, the development of a companion robot for facial expressions and face recognition is presented. In the experiments with a group of people, it is shown that the generated facial expressions are in general classified correctly. In addition, a face recognition method that integrates single-image and image-sequence matching modules is proposed. Experiments have shown that the proposed method can achieve a very promising recognition accuracy (99.1%) on the well-known BANCA face database. Thus, the developed robot has great potential to be used for social interaction with humans. In the future, a broader variety of expressions (as shown in Figure 4) beyond Ekman's six basic facial expressions will also be studied in order to cover most of the important expressions in daily life.

ACKNOWLEDGMENT
The authors would like to thank both the Ministry of Economic Affairs, R.O.C., for the funding support of the project 8453UXQ100, and the National Science Council, R.O.C., for the funding support of the project NSC 98-2221-E-216-029.

REFERENCES
[1] C. L. Breazeal, "Emotion and sociable humanoid robots," International Journal of Human-Computer Studies, vol. 59, no. 1-2, pp. 119-155, 2003.
[2] K. Berns and J. Hirth, "Control of facial expressions of the humanoid robot head ROMAN," Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Beijing, China, 2006, pp. 3119-3124.
[3] F. Hara, H. Akazawa and H. Kobayashi, "Realistic facial expressions by SMA driven face robot," Proceedings of the IEEE International Workshop on Robot and Human Interactive Communication, 2001, pp. 504-511.
[4] P. Ekman and W. Friesen, Facial Action Coding System, Consulting Psychologists Press, Inc., 1978.
[5] http://robotic.media.mit.edu/projects/leonardo/leo-intro.html
[6] H. Miwa, K. Itoh, M. Matsumoto, M. Zecca, H. Takanobu, S. Roccella, M. C. Carrozza, P. Dario, and A. Takanishi, "Effective emotional expressions with emotion expression humanoid robot WE-4RII," Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2004, pp. 2203-2208.
[7] S. Sosnowski, A. Bittermann, K. Kuhnlenz, and M. Buss, "Design and evaluation of emotion-display EDDIE," Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Beijing, China, 2006, pp. 3113-3118.
[8] M. A. Turk and A. P. Pentland, "Eigenfaces for recognition," Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71-86, 1991.
[9] L. I. Smith, "A tutorial on Principal Components Analysis," http://www.cs.otago.ac.nz/student.tutorials/principal_component.pdf.
[10] L.-F. Chen, H.-Y. M. Liao, M.-T. Ko, J.-C. Lin, and G.-J. Yu, "A new LDA-based face recognition system which can solve the small sample size problem," Pattern Recognition, vol. 33, pp. 1713-1726, 2000.
[11] H. Yu and J. Yang, "A direct LDA algorithm for high-dimensional data with application to face recognition," Pattern Recognition, vol. 34, pp. 2067-2070, 2001.
[12] O. Yamaguchi, K. Fukui and K. Maeda, "Face recognition using temporal image sequence," Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition, 1998, pp. 318-323.
[13] K. Fukui and O. Yamaguchi, "Face recognition using multi-viewpoint patterns for robot vision," 11th Symposium of Robotics Research, 2003, pp. 192-201.
[14] M. Nishiyama, O. Yamaguchi, and K. Fukui, "Face recognition with the multiple constrained mutual subspace method," Audio- and Video-Based Biometric Person Authentication, 5th International Conference, 2005, pp. 71-80.
[15] K. Fukui, B. Stenger, and O. Yamaguchi, "A framework for 3D object recognition using the kernel constrained mutual subspace method," Asian Conference on Computer Vision (ACCV), 2006, pp. 315-324.
[16] Y.-S. Huang, W.-C. Liu, and F.-H. Cheng, "Face recognition by combining complementary matchings of single image and sequential images," IAPR Conference on Machine Vision Applications, 2009, pp. 253-256.


