
2010 Sixth International Conference on Intelligent Information Hiding and Multimedia Signal Processing

Vision-Based Hand Gesture Recognition Using Combinational Features


Chenglong Yu, Member, IEEE, Xuan Wang, Member, IEEE, Hejiao Huang, Member, IEEE, Jianping Shen, Kun Wu
Computer Application Research Center, Shenzhen Graduate School, Harbin Institute of Technology Shenzhen, China
{ycl, xuanwang, hjhuang, sjp, wk}@cs.hitsz.edu.cn

Abstract. This paper presents a feature extraction method for hand gesture recognition based on a multi-layer perceptron (MLP). The hand skin color feature in the YCbCr color space is used to detect the hand gesture. The hand silhouette and features are extracted accurately by binarizing the hand image and enhancing its contrast, and median and smoothing filters are combined to remove noise. Hu invariant moments, hand gesture region parameters, and Fourier descriptors are combined into a new feature vector used to recognize hand gestures. To confirm the robustness of the proposed method, a dataset of 3500 images is built. Experimental results demonstrate that our system recognizes hand gestures with a recognition rate of 97.4%.

Keywords: hand gesture recognition; combinational feature extraction; multi-layer perceptron

I. INTRODUCTION

Hand gestures are one of the essential and natural means of human communication, and hand gesture recognition is one of the most important techniques in human-computer interaction. It can be applied in many fields, such as entertainment, teleoperated control, and virtual human animation. Depending on the acquisition device, hand gesture recognition systems can generally be classified into data glove-based and vision-based systems. Callejas Bedregal et al. [1] introduce an interval fuzzy rule-based method for recognizing hand gestures acquired from a data glove and use it to recognize hand gestures of the Brazilian Sign Language. Reference [2] investigates using a data glove to transmit signals in the form of hand gestures. In [3], a system based on an artificial neural network is designed and implemented to recognize ASL letter and number gestures captured with Cyber gloves.

Compared with data glove-based recognition and many other human-computer interaction methods, vision-based hand gesture recognition has the advantages of being intuitive, user-friendly, and easy to use. It has been extensively developed in recent years, and several methods have been proposed for hand gesture model reconstruction. Two types of methods, 3D modeling and appearance modeling, are usually used to identify hand gestures. References [4, 5] use 3D models to recognize hand gestures. However, their methods are difficult to implement because of their high computational cost. Furthermore, the model parameter estimation is unreliable when the parameters are extracted through a number of similar processes. Appearance-based models are also used for hand gesture model reconstruction. Bicego et al. [6] propose a new appearance-based 3D object classification method, and reference [7] presents a hand motion recognition method that recognizes 28 different hand signs.

This paper focuses on the hand gesture feature extraction problem, which is a major factor affecting the quality of hand gesture recognition. We propose a hand gesture recognition method that uses a combination of features. First, the hand gesture is segmented from the video sequences. The hand silhouette is segmented using the skin color feature in the YCbCr color space, which helps classify the hand image into skin-color and non-skin-color clusters. The hand gesture image is then processed with filters to remove noise and enhance contrast. Second, hand gesture features are extracted and combined from three sources: Hu invariant moments, hand gesture region parameters, and Fourier descriptors. Finally, an MLP is used to recognize the hand gesture. Two experiments are conducted; the results show a recognition rate of 97.4%, which indicates that the proposed feature extraction method is feasible.

The remainder of this paper is organized as follows. Section II describes how the hand images are preprocessed to obtain the appearance-based hand gesture model. Section III briefly presents the methods for extracting hand gesture features. Section IV outlines the hand gesture recognition procedure. Section V gives the experimental results, and Section VI concludes the paper.

978-0-7695-4222-5/10 $26.00 © 2010 IEEE DOI 10.1109/IIHMSP.2010.138

II. HAND GESTURE IMAGE PREPROCESSING

After the hand gesture is detected and the hand silhouette is obtained, we preprocess the hand images so that the hand features can be extracted effectively.

A. Hand Gesture Detection

The first step of hand gesture identification is detecting the hand silhouette. In our implementation, the hand silhouette is segmented by means of the skin color feature, which helps classify the hand image into skin-color and non-skin-color clusters. In general, five color spaces are commonly used for object segmentation: RGB, HSI, HSL, YCbCr, and YUV. The authors of [8] studied the skin color distributions of the same human hand under different lighting conditions in four of the color spaces mentioned above. The results show that the YCbCr and HSI color spaces are more suitable than RGB for hand gesture detection and segmentation. We therefore use the common YCbCr color space to segment the hand gesture in our system. RGB values are transformed to the YCbCr color space by

$$\begin{bmatrix} Y \\ C_b \\ C_r \end{bmatrix} = \begin{bmatrix} 16 \\ 128 \\ 128 \end{bmatrix} + \begin{bmatrix} 65.481 & 128.553 & 24.966 \\ -37.797 & -74.203 & 112 \\ 112 & -93.786 & -18.214 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix} \tag{1}$$

where R, G, and B are normalized to [0, 1]. Figure 1 shows the original RGB image and the histograms of its Y, Cb, and Cr components. From these histograms we conclude that YCbCr is a suitable color space for skin color detection and classification in the hand gesture recognition system.
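A minimal pure-Python sketch of Eq. (1) and a pixel-wise skin test follows. It assumes 8-bit RGB input, which is scaled to [0, 1] before the coefficients of Eq. (1) are applied. The paper does not state its skin thresholds, so the Cb/Cr ranges `CB_RANGE` and `CR_RANGE` are an assumption (a commonly cited chrominance box), and the function names are hypothetical.

```python
# Sketch of Eq. (1) plus a simple chrominance-box skin test.
# CB_RANGE and CR_RANGE are assumed, commonly cited values, not from the paper.

CB_RANGE = (77.0, 127.0)
CR_RANGE = (133.0, 173.0)


def rgb_to_ycbcr(r, g, b):
    """Apply Eq. (1) to 8-bit R, G, B values (scaled to [0, 1] first)."""
    r, g, b = r / 255.0, g / 255.0, b / 255.0
    y = 16.0 + 65.481 * r + 128.553 * g + 24.966 * b
    cb = 128.0 - 37.797 * r - 74.203 * g + 112.0 * b
    cr = 128.0 + 112.0 * r - 93.786 * g - 18.214 * b
    return y, cb, cr


def is_skin(r, g, b):
    """Classify one pixel as skin if (Cb, Cr) lies inside the assumed box."""
    _, cb, cr = rgb_to_ycbcr(r, g, b)
    return CB_RANGE[0] <= cb <= CB_RANGE[1] and CR_RANGE[0] <= cr <= CR_RANGE[1]


def skin_mask(image):
    """image: rows of (R, G, B) tuples -> rows of 1 (skin) / 0 (non-skin)."""
    return [[1 if is_skin(*px) else 0 for px in row] for row in image]


if __name__ == "__main__":
    print(rgb_to_ycbcr(255, 255, 255))  # white maps to roughly (235, 128, 128)
    print(skin_mask([[(224, 172, 138), (0, 0, 255)]]))  # [[1, 0]]
```

Testing only Cb and Cr, and ignoring Y, is the usual reason for choosing YCbCr here: it makes the skin decision less sensitive to overall brightness.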

Figure 1. RGB Image and YCbCr Component Histograms: (a) RGB image; (b) Y component histogram; (c) Cb component histogram; (d) Cr component histogram.

B. Hand Gesture Image Preprocessing

We binarize the hand images and enhance their contrast so that the hand silhouettes and features can be extracted accurately, and at the same time we remove the noise introduced during image transmission and binarization. The hand image binarization algorithm is simple. In the first step, the Hue and Saturation values of each pixel are computed from its RGB values. In the second step, a pixel is set to black if its Hue and Saturation values fall within the human skin color range; otherwise it is set to white.

Our proposed denoising method fuses median and smoothing filters. The median filter removes impulsive noise from the hand images while preserving sharp edges. The smoothing filter can use a reduced neighborhood radius while still maintaining good smoothing quality. We use the Gauss-Laplace edge detection method to obtain the hand edge, and experimental results show that the Gauss-Laplace algorithm detects hand edges effectively. The Gauss-Laplace operator template we use is shown in Figure 2.

Figure 2. Gauss-Laplace Operator

Figure 3 shows the hand edges obtained after the hand images are processed with the Gauss-Laplace algorithm.

Figure 3. Original hand gesture images and images processed by filters: (a) original hand gesture images; (b) images processed using filters; (c) the hand edge images.
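The median-filter and Gauss-Laplace stages of Section II.B can be sketched in pure Python as follows. This is a sketch under stated assumptions, not the authors' implementation: the input is an already binarized mask with 1 for hand pixels (the reverse of the black-on-white convention described in Section II.B), the smoothing filter is omitted, and `LOG_KERNEL` is a widely used 5x5 Gauss-Laplace approximation rather than the template of Figure 2, which is not reproduced here. All function names are hypothetical.

```python
# Sketch of the Section II.B filtering stages on a binary hand mask.
# Assumptions: 1 = hand, 0 = background; LOG_KERNEL is a common 5x5
# Gauss-Laplace approximation, not the paper's Figure 2 template.

LOG_KERNEL = [
    [0, 0, -1, 0, 0],
    [0, -1, -2, -1, 0],
    [-1, -2, 16, -2, -1],
    [0, -1, -2, -1, 0],
    [0, 0, -1, 0, 0],
]


def _clamp(v, lo, hi):
    return max(lo, min(v, hi))


def median_filter(img, radius=1):
    """Median over a (2*radius+1)^2 window with replicated borders."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = sorted(
                img[_clamp(y + dy, 0, h - 1)][_clamp(x + dx, 0, w - 1)]
                for dy in range(-radius, radius + 1)
                for dx in range(-radius, radius + 1)
            )
            row.append(window[len(window) // 2])
        out.append(row)
    return out


def apply_kernel(img, kernel):
    """Correlate img with a square kernel (replicated borders). The LoG
    kernel is symmetric, so this equals convolution."""
    h, w = len(img), len(img[0])
    k = len(kernel) // 2
    return [
        [
            sum(
                kernel[j + k][i + k]
                * img[_clamp(y + j, 0, h - 1)][_clamp(x + i, 0, w - 1)]
                for j in range(-k, k + 1)
                for i in range(-k, k + 1)
            )
            for x in range(w)
        ]
        for y in range(h)
    ]


def hand_edges(mask):
    """Median-filter the mask, then keep hand pixels with a positive LoG
    response, i.e. those with background under a negative kernel tap."""
    clean = median_filter(mask)
    resp = apply_kernel(clean, LOG_KERNEL)
    return [
        [1 if clean[y][x] == 1 and resp[y][x] > 0 else 0 for x in range(len(mask[0]))]
        for y in range(len(mask))
    ]
```

Because the kernel weights sum to zero, uniform regions (inside or outside the hand) give a zero response, so only pixels near the silhouette boundary survive; with a 5x5 kernel the resulting edge band is up to two pixels wide.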

III. HAND GESTURE FEATURE EXTRACTION

In this stage, the Hu invariant moments, hand gesture region parameters, and Fourier descriptors are combined into a single feature vector. The hand region feature vector is (arearatio, aspectratio, baryratio, handarea, handperimeter). The arearatio value is the ratio of the hand region area to the area of the hand gesture's bounding rectangle, so both areas must be calculated. The former can be computed by counting the white pixels of the binary image, and the latter can be obtained by clipping the binary image.

Hu invariant moments were proposed by Hu and are derived from the theory of algebraic invariants. Hu proved that these moments are invariant to translation, scaling, and rotation. The Hu invariant moment set is composed of second-order and third-order normalized central moments $\eta_{pq} = \mu_{pq} / \mu_{00}^{1+(p+q)/2}$, as follows:

$$M_1 = \eta_{20} + \eta_{02} \tag{2}$$

$$M_2 = (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2 \tag{3}$$

$$M_3 = (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2 \tag{4}$$

$$M_4 = (\eta_{30} + \eta_{12})^2 + (\eta_{21} + \eta_{03})^2 \tag{5}$$

$$M_5 = (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] \tag{6}$$

$$M_6 = (\eta_{20} - \eta_{02})\left[(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03}) \tag{7}$$

$$M_7 = (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] - (\eta_{30} - 3\eta_{12})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] \tag{8}$$

Low-order moments of the hand gesture are used to determine the rotation radius and directional angle, and high-order moments describe the details of the hand gesture.

The third part of the hand gesture feature is extracted with the Discrete Fourier Transform. The Fourier descriptor consists of the Fourier transform coefficients of the shape boundary curve and is used to analyze the frequency-domain content of the object boundary. For n boundary points p(l), l = 0, 1, ..., n - 1, represented as complex numbers, the discrete Fourier coefficients are

$$z(k) = \frac{1}{n}\sum_{l=0}^{n-1} p(l)\exp\left(-j\frac{2\pi l k}{n}\right), \quad k = 0, 1, 2, \ldots, n-1 \tag{9}$$

We take the first ten coefficients, excluding z(0), to construct a ten-dimensional vector. If the starting point of the curve is shifted by $\tau$, the curve is rotated by angle $\theta$, scaled by a factor $\alpha$, and translated by $(x_0, y_0)$, the new Fourier coefficients are

$$z'(k) = \alpha\, e^{j\theta}\, e^{j 2\pi k \tau / n}\, z(k) + (x_0 + j y_0)\,\delta(k) \tag{10}$$

where $\delta(k)$ equals 1 for k = 0 and 0 otherwise, so translation affects only z(0). Since $|z'(k)| = \alpha |z(k)|$ for $k \geq 1$, normalized Fourier coefficients can be defined from the moduli $|z(k)|$ and $|z(1)|$:

$$d(k) = \frac{|z(k)|}{|z(1)|}, \quad k = 1, 2, \ldots, n-1 \tag{11}$$

These normalized coefficients are invariant to translation, rotation, scaling, and the choice of starting point.

IV. HAND GESTURE RECOGNITION

The multi-layer perceptron is a common feed-forward neural network classifier, and we use it to classify hand gestures in our application system. The sigmoid (S) function is commonly used as the activation function of multi-layer perceptrons because it simplifies computation. Its general form is

$$S_c(x) = \frac{1}{1 + e^{-cx}} \tag{12}$$

The constant c can be chosen freely; the expression simplifies when c = 1:

$$S(x) = \frac{1}{1 + e^{-x}} \tag{13}$$

Its first-order derivative is easily obtained:

$$\frac{d}{dx}S(x) = \frac{e^{-x}}{(1 + e^{-x})^2} = S(x)\bigl(1 - S(x)\bigr) \tag{14}$$

so once S(x) has been computed, its derivative requires very little additional computation.

V. EXPERIMENTAL RESULTS

The dataset of 3500 hand gesture images was divided into a testing set of 2800 images and a training set of 700 images. The images show the hand gestures of 50 people against different backgrounds and are divided into fourteen classes.

A. Experiment 1

In Experiment 1, only the five hand region feature parameters are extracted. Table I shows the hand gesture recognition results using the MLP.

Table I. Gesture Recognition Rates Based on Region Features

| Class | Total number | Correct number | Correct rate |
|---|---|---|---|
| 0 | 200 | 184 | 0.920 |
| 1 | 200 | 193 | 0.965 |
| 2 | 200 | 173 | 0.865 |
| 3 | 200 | 140 | 0.700 |
| 4 | 200 | 102 | 0.510 |
| 5 | 200 | 121 | 0.605 |
| 6 | 200 | 113 | 0.565 |
| 7 | 200 | 169 | 0.845 |
| 8 | 200 | 191 | 0.955 |
| 9 | 200 | 110 | 0.550 |
| SoundUp | 200 | 182 | 0.910 |
| SoundDown | 200 | 187 | 0.935 |
| TurnOff | 200 | 194 | 0.970 |
| TurnOn | 200 | 128 | 0.640 |
| Totals | 2800 | 2187 | 0.781 |
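The region features and Hu invariant moments of Section III can be sketched for a binary mask as follows. Several assumptions apply: 1 marks hand pixels, aspectratio is taken as bounding-box width over height, handperimeter counts hand pixels with a 4-connected background neighbour, and baryratio is omitted because the text does not define it. The moments follow Eqs. (2)-(8) with the normalized central moments defined in Section III; the function names are hypothetical.

```python
# Sketch of the region features and Hu moments of Section III for a 0/1
# mask (1 = hand). The aspect-ratio and perimeter definitions are
# assumptions; baryratio is not defined in the text and is omitted.


def hand_pixels(mask):
    """(x, y) coordinates of all hand pixels; x is the column index."""
    return [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]


def region_features(mask):
    pts = hand_pixels(mask)
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    width = max(xs) - min(xs) + 1
    height = max(ys) - min(ys) + 1
    area = len(pts)
    h, w = len(mask), len(mask[0])

    def is_bg(x, y):
        # Pixels outside the image count as background.
        return not (0 <= x < w and 0 <= y < h) or not mask[y][x]

    # Perimeter (assumed): hand pixels with at least one 4-neighbour in the background.
    perimeter = sum(
        1 for x, y in pts
        if any(is_bg(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)))
    )
    return {
        "arearatio": area / (width * height),  # hand area / bounding-box area
        "aspectratio": width / height,          # assumed definition
        "handarea": area,
        "handperimeter": perimeter,
    }


def hu_moments(mask):
    """Hu invariants M1..M7 of Eqs. (2)-(8) from normalized central moments."""
    pts = hand_pixels(mask)
    m00 = len(pts)  # for a binary image, mu_00 = m_00 = pixel count
    xc = sum(x for x, _ in pts) / m00
    yc = sum(y for _, y in pts) / m00

    def eta(p, q):
        mu = sum((x - xc) ** p * (y - yc) ** q for x, y in pts)
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    return [
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2,
        a ** 2 + b ** 2,
        (n30 - 3 * n12) * a * (a ** 2 - 3 * b ** 2)
        + (3 * n21 - n03) * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        (3 * n21 - n03) * a * (a ** 2 - 3 * b ** 2)
        - (n30 - 3 * n12) * b * (3 * a ** 2 - b ** 2),
    ]
```

Passing a translated or 90-degree-rotated copy of the same mask to `hu_moments` returns the same seven values up to floating-point error, which is the invariance the section relies on.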

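The Fourier descriptor of Eqs. (9)-(11) in Section III can be sketched as below, assuming the hand boundary has already been traced into an ordered list of (x, y) points (boundary tracing is not covered here). The function name and its `count` parameter are hypothetical; `count=10` mirrors the ten coefficients z(1) to z(10) used in Section III.

```python
# Sketch of Eqs. (9)-(11): normalized Fourier descriptors of a closed
# boundary given as ordered (x, y) points. The function name is hypothetical.
import cmath
import math


def fourier_descriptors(boundary, count=10):
    """Return d(1..count) = |z(k)| / |z(1)| for the ordered boundary points."""
    n = len(boundary)
    if n <= count:
        raise ValueError("need more boundary points than descriptors")
    p = [complex(x, y) for x, y in boundary]  # p(l) = x(l) + j*y(l)
    # Eq. (9): z(k) = (1/n) * sum_l p(l) * exp(-j*2*pi*l*k/n); z(0) is skipped.
    z = [
        sum(p[l] * cmath.exp(-2j * math.pi * l * k / n) for l in range(n)) / n
        for k in range(1, count + 1)
    ]
    # Eq. (11): dividing by |z(1)| cancels scale; taking moduli cancels rotation
    # and start point; skipping z(0) cancels translation.
    return [abs(zk) / abs(z[0]) for zk in z]
```

The direct sum is used for clarity and costs O(n * count); a fast Fourier transform would produce all n coefficients in O(n log n).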

B. Experiment 2

In Experiment 2, the feature parameters are combined into a single feature vector. Table II shows the hand gesture recognition results using the MLP with the combinational features.

Table II. Gesture Recognition Rates Based on Combinational Features

| Class | Total number | Correct number | Correct rate |
|---|---|---|---|
| 0 | 200 | 195 | 0.975 |
| 1 | 200 | 195 | 0.975 |
| 2 | 200 | 197 | 0.985 |
| 3 | 200 | 197 | 0.985 |
| 4 | 200 | 193 | 0.965 |
| 5 | 200 | 196 | 0.980 |
| 6 | 200 | 193 | 0.965 |
| 7 | 200 | 196 | 0.980 |
| 8 | 200 | 200 | 1.000 |
| 9 | 200 | 187 | 0.935 |
| SoundUp | 200 | 198 | 0.990 |
| SoundDown | 200 | 199 | 0.995 |
| TurnOff | 200 | 198 | 0.990 |
| TurnOn | 200 | 183 | 0.915 |
| Totals | 2800 | 2727 | 0.974 |
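As a consistency check on Tables I and II, each correct rate is the number of correctly recognized test images divided by the 200 test images per class, and the overall rate divides the total by 2800. The snippet below recomputes these values from the table counts; the constant and function names are illustrative.

```python
# Recompute the "Correct rate" columns of Tables I and II from their counts
# (200 test images per class, 14 classes, 2800 test images in total).

CLASSES = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9",
           "SoundUp", "SoundDown", "TurnOff", "TurnOn"]
REGION_CORRECT = [184, 193, 173, 140, 102, 121, 113, 169, 191, 110, 182, 187, 194, 128]
COMBINED_CORRECT = [195, 195, 197, 197, 193, 196, 193, 196, 200, 187, 198, 199, 198, 183]
PER_CLASS = 200


def rates(correct, per_class=PER_CLASS):
    """Per-class rates and the overall rate (total correct / total tested)."""
    per = [c / per_class for c in correct]
    overall = sum(correct) / (per_class * len(correct))
    return per, overall


if __name__ == "__main__":
    for name, counts in (("Table I", REGION_CORRECT), ("Table II", COMBINED_CORRECT)):
        per, overall = rates(counts)
        print(name, sum(counts), f"{overall:.3f}")
    # prints:
    # Table I 2187 0.781
    # Table II 2727 0.974
```

The overall rate rises from 0.781 with region features alone to 0.974 with the combinational features.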

Figure 4 compares the two feature extraction methods; the recognition rate obtained with our proposed combinational features is higher (0.974 overall, versus 0.781 with region features alone, as in Tables I and II).

Figure 4. Recognition Correct Rates of Two Feature Methods

VI. CONCLUSION

This paper proposes a new technique for real-time hand gesture recognition based on combinational features and an MLP. First, the hand gesture region is separated at the preprocessing stage using the skin color feature in the YCbCr color space; meanwhile, the hand image is binarized and its noise is removed so that the hand gesture silhouette and features can be extracted. Second, parameters from three sources, namely Hu invariant moments, hand gesture region, and Fourier descriptors, are combined to create a new feature vector. Finally, the MLP-based hand gesture recognition takes the characteristics of all three features into account. The experiments show that the hand gesture recognition rate is very promising, reaching 97.4%.

ACKNOWLEDGMENT

This work is supported by the National High-tech R&D Program of China (863 Program, No. 2007AA01Z194).

REFERENCES

[1] B. R. Callejas Bedregal, G. P. Dimuro, and A. C. Rocha Costa, "Interval fuzzy rule-based hand gesture recognition," in 12th GAMM-IMACS International Symposium on Scientific Computing, Computer Arithmetic and Validated Numerics (SCAN 2006), Duisburg, Germany, Sept. 26-29, 2006.
[2] M. G. Ceruti, V. V. Dinh, N. X. Tran, H. Van Phan, L. T. Duffy, T. Ton, G. Leonard, E. Medina, O. Amezcua, S. Fugate, G. J. Rogers, R. Luna, and J. Ellen, "Wireless communication glove apparatus for motion tracking, gesture recognition, data transmission, and reception in extreme environments," in 24th Annual ACM Symposium on Applied Computing (SAC 2009), Honolulu, HI, USA, Mar. 8-12, 2009, pp. 172-176.
[3] C. Oz and M. C. Leu, "Human-computer interaction system with artificial neural network using motion tracker and data glove," in 1st International Conference on Pattern Recognition and Machine Intelligence (PReMI 2005), Kolkata, India, Dec. 20-22, 2005, pp. 280-286.
[4] M. Kato, Y. Chen, and G. Xu, "Articulated hand tracking by PCA-ICA approach," in 7th International Conference on Automatic Face and Gesture Recognition (FGR 2006), Southampton, UK, Apr. 10-12, 2006, pp. 329-334.
[5] H. Guan, R. S. Feris, and M. Turk, "The isometric self-organizing map for 3D hand pose estimation," in 7th International Conference on Automatic Face and Gesture Recognition (FGR 2006), Southampton, UK, Apr. 10-12, 2006, pp. 263-268.
[6] M. Bicego, U. Castellani, and V. Murino, "A hidden Markov model approach for appearance-based 3D object recognition," Pattern Recognition Letters, vol. 26, pp. 2588-2599, 2005.
[7] Y. Cui and J. Weng, "Appearance-based hand sign recognition from intensity image sequences," Computer Vision and Image Understanding, vol. 78, pp. 157-176, 2000.
[8] P. Kakumanu, S. Makrogiannis, and N. Bourbakis, "A survey of skin-color modeling and detection methods," Pattern Recognition, vol. 40, pp. 1106-1122, 2007.
