A PROJECT REPORT ON: INDIAN SIGN LANGUAGE GESTURE RECOGNITION
ALLAHABAD
CERTIFICATE
GESTURE RECOGNITION is the bonafide work of SAURAV KUMAR, HIMANSHU, and VIVEK GUPTA, who carried out the project work under my supervision.
SIGNATURE
Index
1. Abstract
2. Introduction
   2.1 Problem definition
3. Literature survey
4. Methodology
5. Work done till mid sem
6. Proposed work till end sem
7. References
ABSTRACT
This project considers vision-based gesture recognition for imitation-based learning. It illustrates a new approach towards the understanding of Indian Sign Language (ISL) gestures. The technique used is sufficiently robust to deal with dynamic hand gestures, as provided by standard ISL. Dynamic gestures profoundly improve the communication ability of hearing- and speech-impaired persons but, at the same time, increase the computational complexity: a video database has to be considered instead of a static image database. The complexity is further increased when both hands are used extensively. The orientation histogram has been chosen as an important feature since it is invariant to scene illumination. A simple and effective algorithm has been developed to calculate edge orientations in sequences of ISL gesture images. Gestures are classified using the Euclidean distance metric and K-nearest-neighbour methods, and the behaviour of the classified ISL gestures is learned with the Hidden Markov Model (HMM) technique.
INTRODUCTION
Sign language provides a complete means of communication among deaf communities all over the world. It emphasizes movements of the hands, arms, head and body performed in a conceptually predetermined way, so that a gesture language is constructed systematically. Each sign language carries its own syntax and grammar, since sign language is not a universal language. Sign language is one of the major means for speech- and hearing-impaired people to enhance their communication capabilities. It offers two types of gesture: static and dynamic. ISL contains both kinds, but the more challenging part is dealing with dynamic gestures. A static gesture is a particular configuration and position of the hand, represented by a single image. A dynamic gesture is a moving gesture, represented by a sequence of images. Recognizing it requires a comprehensive study of the complexity of classifying the highly structured nature of sign language and a full analysis of hand gestures.
1.1 Problem Definition: To develop a good classifier that can recognize hand gestures in real time. To accomplish this, the project has been divided into five modules according to their functional requirements:
1) Module 1: Sensing or capturing the image input.
2) Module 2: Using orientation contrast and edge orientation as feature vectors.
3) Module 3: Implementing SIFT (Scale Invariant Feature Transform) to make the operation robust.
4) Module 4: Using the LIBSVM tool for classification.
5) Module 5: Using the classified result of a dynamic ISL gesture to provide text or speech output.
LITERATURE SURVEY
Gesture recognition is one of the most challenging research areas, with a huge range of applications. This literature survey explores two different approaches to hand gesture recognition: the data-glove-based approach and the computer-vision-based approach.
The data glove is intended to measure the bending of each joint with a significant level of accuracy. Its weak point is that the sideways motion of the fingers is extremely difficult to measure. Figure 2.1(a) represents the structure of the data glove.
The weakness of the data-glove-based technology has been addressed by the Cyber Glove approach [01], in which finger abduction is measured accurately. It was developed by Kramer (Kramer 89) and is instrumented with strain gauges. The sensors are placed between the fingers in order to measure sideways motion more accurately while continuing to sense the bending of the fingers. The following figure gives an overview of the Cyber Glove device.
Fig 2.1(b) represents the structure of the Cyber Glove. The Cyber Glove is used to capture the motion and position of the fingers and the wrist. It contains up to 22 sensors: three bend sensors mounted on each finger, four abduction sensors, and further sensors that measure wrist abduction and flexion, thumb crossover, and palm arch. The KHU-1 data glove [] has been used for 3D hand motion tracking and hand gesture recognition. This glove consists of three tri-axis accelerometer sensors, one controller, and one Bluetooth module, and it transmits motion signals to the system wirelessly via Bluetooth. Kinematic theory is applied to construct a 3D hand gesture model in digital space, and a rule-based algorithm is used to recognize hand gestures from the KHU-1 data glove and the 3D hand gesture model.
gesture recognition system. It demonstrates accurate hand and body tracking, although this technique is not applicable to all applications. Rehg and Kanade (1994) proposed a computer-vision-based approach that builds a model from cylindrical components using a stereoscopic camera; Etoh, Tomono and Kishino (1991) introduced similar work. The image-based system attempts to segment the hand from the background by extracting features such as edges, shape, fingertips and hand orientation. Starner and Pentland (1995) proposed an HMM-based model of hand shape that recognizes 42 ASL gestures with 99% accuracy.
[2] human-computer interaction with application to virtual reality. One approach involves a 3D model, where 3D kinematics is used to formulate a hand model with a considerable number of DOFs. This model is used to estimate the hand parameters by comparing the input images with the 2D appearance produced by the 3D hand model. Another, appearance-based, approach extracts features from the images to model the visual appearance of the hand, and then compares them against the features extracted from the video input. This approach improves real-time performance, since it deals only with 2D image features. In recent research, the vision-based approach to gesture recognition has made remarkable progress and continues to grow in the areas of feature extraction, classification methods and gesture representation.
METHODOLOGY
The SIFT algorithm proceeds in four stages:
1. Scale-space extrema detection.
2. Keypoint localization.
3. Orientation assignment.
4. Generation of keypoint descriptors.
1. Scale-space extrema detection
This is the stage where the interest points, called keypoints in the SIFT framework, are detected. The image is convolved with Gaussian filters at different scales, and then the difference of successive Gaussian-blurred images is taken. Keypoints are then taken as the maxima/minima of the Difference of Gaussians [5] (DoG) that occur at multiple scales. Specifically, a DoG image D(x, y, σ) is given by

    D(x, y, σ) = L(x, y, k_i σ) − L(x, y, k_j σ),

where L(x, y, kσ) = G(x, y, kσ) * I(x, y) is the original image I(x, y) convolved with the Gaussian blur G(x, y, kσ) at scale kσ.
Hence a DoG image between scales k_i σ and k_j σ is just the difference of the Gaussian-blurred images at those two scales. For scale-space extrema detection in the SIFT algorithm, the image is first convolved with Gaussian blurs at different scales. The convolved images are grouped by octave (an octave corresponds to doubling the value of σ), and the value of k_i is selected so that we obtain a fixed number of convolved images per octave. The Difference-of-Gaussian images are then taken from adjacent Gaussian-blurred images within each octave. Once DoG images have been obtained, keypoints are identified as local minima/maxima of the DoG images across scales. This is done by comparing each pixel in the DoG images to its eight neighbours at the same scale and the nine corresponding neighbouring pixels in each of the two neighbouring scales. If the pixel value is the maximum or minimum among all compared pixels, it is selected as a candidate keypoint.
2. Keypoint localization
Scale-space extrema detection produces too many keypoint candidates, some of which are unstable. The next step in the algorithm is to perform a detailed fit to the nearby data (interpolation) for accurate location and scale. This information allows us to reject points that have low contrast (and are therefore more sensitive to noise) or are poorly localized along an edge.
3. Orientation assignment
In this step, each keypoint is assigned one or more orientations based on local image gradient directions. This is the key step in achieving invariance to rotation, as the keypoint descriptor can be represented relative to this orientation and thereby achieve invariance to image rotation. First, the Gaussian-smoothed image L(x, y, σ) at the keypoint's scale σ is taken, so that all computations are performed in a scale-invariant manner. For an image sample L(x, y) at scale σ, the gradient magnitude m(x, y) and orientation θ(x, y) are computed using pixel differences:

    m(x, y) = sqrt( (L(x+1, y) − L(x−1, y))^2 + (L(x, y+1) − L(x, y−1))^2 )

    θ(x, y) = atan2( L(x, y+1) − L(x, y−1), L(x+1, y) − L(x−1, y) )
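As a check on the gradient formulas, the central differences can be computed over a whole image with numpy. This is a sketch under the assumption of a float-valued image array; border pixels are simply left at zero:

```python
import numpy as np

def gradient_mag_ori(L):
    """Gradient magnitude m(x, y) and orientation theta(x, y) from
    central differences on the Gaussian-blurred image L."""
    dx = np.zeros_like(L, dtype=float)
    dy = np.zeros_like(L, dtype=float)
    dx[:, 1:-1] = L[:, 2:] - L[:, :-2]   # L(x+1, y) - L(x-1, y)
    dy[1:-1, :] = L[2:, :] - L[:-2, :]   # L(x, y+1) - L(x, y-1)
    m = np.sqrt(dx ** 2 + dy ** 2)
    theta = np.arctan2(dy, dx)           # radians in (-pi, pi]
    return m, theta
```

For a horizontal intensity ramp the interior gradient points along x with magnitude 2, as the central-difference formula predicts.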
The magnitude and direction calculations for the gradient are done for every pixel in a neighbouring region around the keypoint in the Gaussian-blurred image L. An orientation histogram with 36 bins is formed, with each bin covering 10 degrees. Each sample in the neighbouring window added to a histogram bin is weighted by its gradient magnitude and by a Gaussian-weighted circular window with a σ that is 1.5 times the scale of the keypoint.
The peaks in this histogram correspond to dominant orientations. Once the histogram is filled, the orientations corresponding to the highest peak and local peaks that are within 80% of the highest peaks are assigned to the keypoint. In the case of multiple orientations being assigned, an additional keypoint is created having the same location and scale as the original keypoint for each additional orientation.
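The 36-bin histogram and the 80% peak rule above can be sketched as follows (helper names are ours; the Gaussian circular window is passed in as a precomputed weight array rather than recomputed here):

```python
import numpy as np

def orientation_histogram(m, theta, weight):
    """36-bin orientation histogram (10 degrees per bin); each sample
    is weighted by its gradient magnitude and the Gaussian window."""
    deg = np.degrees(theta) % 360.0
    bins = (deg // 10.0).astype(int) % 36
    hist = np.zeros(36)
    np.add.at(hist, bins.ravel(), (m * weight).ravel())
    return hist

def dominant_orientations(hist):
    """Bin centres (degrees) of local peaks within 80% of the highest peak;
    each returned orientation yields one keypoint."""
    thresh = 0.8 * hist.max()
    return [b * 10 + 5 for b in range(36)
            if hist[b] >= thresh
            and hist[b] >= hist[(b - 1) % 36]
            and hist[b] >= hist[(b + 1) % 36]]
```

`np.add.at` accumulates weighted votes even when several samples fall in the same bin, which a plain fancy-indexed assignment would silently drop.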
Fig. Orientation histogram for a keypoint

4. Generation of keypoint descriptors
The previous steps found keypoint locations at particular scales and assigned orientations to them, ensuring invariance to image location, scale and rotation. Now we want to compute a descriptor vector for each keypoint such that the descriptor is highly distinctive and partially invariant to the remaining variations, such as illumination and 3D viewpoint. This step is performed on the image closest in scale to the keypoint's scale. First, a set of orientation histograms is created on 4x4 pixel neighbourhoods with 8 bins each. These histograms are computed from the magnitude and orientation values of samples in a 16x16 region around the keypoint, such that each histogram contains the samples from one 4x4 subregion of the original neighbourhood. The magnitudes are further weighted by a Gaussian function with σ equal to one half the width of the descriptor window. The descriptor then becomes a vector of all the values of these histograms. Since there are 4x4 = 16 histograms, each with 8 bins, the vector has 128 elements. This vector is then normalized to unit length in order to enhance invariance to affine changes in illumination.
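A simplified sketch of the descriptor assembly described above (our own code; for brevity it omits the Gaussian weighting of magnitudes and the rotation of the patch to the keypoint's dominant orientation):

```python
import numpy as np

def sift_descriptor(m, theta):
    """128-element descriptor from a 16x16 patch of gradient magnitudes m
    and orientations theta: one 8-bin histogram per 4x4 cell,
    concatenated and normalized to unit length."""
    assert m.shape == theta.shape == (16, 16)
    deg = np.degrees(theta) % 360.0
    bins = (deg // 45.0).astype(int) % 8     # 8 bins of 45 degrees
    hists = np.zeros((4, 4, 8))
    for i in range(16):
        for j in range(16):
            hists[i // 4, j // 4, bins[i, j]] += m[i, j]
    vec = hists.ravel()                      # 4 * 4 * 8 = 128 elements
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec
```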
Fig. A keypoint descriptor is obtained from the image gradients in an 8x8 neighbourhood
Recognition System:
3. Subsample the image (to make it real time).
4. Find the orientation histogram.
5. Save it as a training pattern.

Classification
In machine learning, pattern recognition is the assignment of a label to a given input value. An example of pattern recognition is classification, which attempts to assign each input value to one of a given set of classes. Image classification is a complex process that may be affected by many factors. We intend to use one of the following algorithms for image classification.
1. K-nearest neighbour algorithm (K-NN)
In pattern recognition, the k-nearest neighbour algorithm [6] (k-NN) is a method for classifying objects based on the closest training examples in the feature space. It is among the simplest of all machine learning algorithms: an object is classified by a majority vote of its neighbours, the object being assigned to the class most common among its k nearest neighbours. If k = 1, the object is simply assigned to the class of its nearest neighbour.
Algorithm: The training examples are vectors in a multidimensional feature space, each with a class label. Given a testing example, we first calculate the Euclidean distance between each training sample and the testing sample.
Euclidean distance calculation: Let x_i be an input sample with p features (x_i1, x_i2, x_i3, ..., x_ip), and let n be the total number of input samples (i = 1, 2, 3, ..., n). The Euclidean distance between samples x_i and x_j is defined as

    d(x_i, x_j) = { (x_i1 − x_j1)^2 + (x_i2 − x_j2)^2 + (x_i3 − x_j3)^2 + ... + (x_ip − x_jp)^2 }^(1/2)

After calculating the n distances d_l (l = 1, 2, ..., n) between the training samples and the new sample, we select the k minimum values among all d_l and list the corresponding samples.
Assignment of class: By observing these samples, find the class (group) to which most of them belong and assign the testing sample to that class (majority voting). Here we have used Euclidean distance as the metric; absolute (city-block) distance can also be used.
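The distance computation and majority vote above amount to the following sketch (function and variable names are ours, not the project's):

```python
import numpy as np
from collections import Counter

def knn_classify(train_X, train_y, x, k=3):
    """Assign x the most common label among its k nearest training
    samples, using the Euclidean distances d_l defined above."""
    train_X = np.asarray(train_X, dtype=float)
    d = np.sqrt(((train_X - np.asarray(x, dtype=float)) ** 2).sum(axis=1))
    nearest = np.argsort(d)[:k]          # indices of the k smallest d_l
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]
```

For example, with training points [0,0] and [0,1] labelled 'A' and [5,5] and [6,5] labelled 'B', a query near the origin gets 'A' by majority vote.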
2. Support Vector Machine
The standard SVM takes a set of input data and predicts, for each given input, which of two possible classes the input belongs to, making the SVM a non-probabilistic binary linear classifier [7]. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other. More formally, a support vector machine constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification or regression. Intuitively, a good separation is achieved by the hyperplane that has the largest distance to the nearest training data point of any class (the so-called functional margin), since in general the larger the margin, the lower the generalization error of the classifier.
A hyperplane is a concept in geometry: a generalization of the plane to a different number of dimensions. A hyperplane of an n-dimensional space is a flat subset with dimension n − 1, and by its nature it separates the space into two half-spaces. In a vector space, a vector hyperplane is a linear subspace of codimension 1; such a hyperplane is the solution of a single homogeneous linear equation [8].
Support Vector Machines are based on the concept of decision planes that define decision boundaries. A decision plane separates sets of objects having different class memberships. A schematic example is shown in the illustration below: the objects belong either to class GREEN or class RED, and the separating line defines a boundary to the right of which all objects are GREEN and to the left of which all objects are RED. Any new object (white circle) falling to the right is classified as GREEN (or as RED should it fall to the left of the separating line) [9].
The above is a classic example of a linear classifier, i.e., a classifier that separates a set of objects into their respective groups (GREEN and RED in this case) with a line. Most classification tasks, however, are not that simple, and often more complex structures are needed in order to make an optimal separation, i.e., to correctly classify new objects (test cases) on the basis of the examples that are available (training cases). This situation is depicted in the illustration below. Compared to the previous schematic, it is clear that a full separation of the GREEN and RED objects would require a curve (which is more complex than a line). Classification tasks based on drawing separating lines to distinguish between objects of different class memberships are known as hyperplane classifiers. Support Vector Machines are particularly suited to such tasks.
SVMs start from the goal of separating the data with a hyperplane, and extend this to non-linear decision boundaries using the kernel trick. The equation of a general hyperplane is w·x + b = 0, with x a point (a vector) and w the weights (also a vector). The hyperplane should separate the data, so that w·x_k + b > 0 for all x_k of one class and w·x_j + b < 0 for all x_j of the other class.
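The separating condition can be illustrated directly. This is a toy hyperplane chosen by hand to show the geometry, not a trained SVM:

```python
import numpy as np

def classify(w, b, x):
    """Side of the hyperplane w.x + b = 0 on which x lies (+1 or -1)."""
    return 1 if np.dot(w, x) + b > 0 else -1

def distance_to_hyperplane(w, b, x):
    """Geometric distance from x to the hyperplane w.x + b = 0;
    SVM training picks w, b maximizing the smallest such distance."""
    return abs(np.dot(w, x) + b) / np.linalg.norm(w)
```

With w = (1, 0) and b = −1, the hyperplane is the vertical line x1 = 1; points with x1 > 1 fall on the +1 side.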
ISL acquisition technique: The prime focus of this project is to create a repository with a large number of image sequences for several ISL classes/words. Dynamic ISL gestures were first recorded at different frame rates; the ISL video was captured by recording several dynamic gestures (i.e. sequences of frames) in real time using a handycam.
Obtaining SIFT feature descriptors
We split each gesture into a sequence of frames and converted each frame into a grayscale image. To make gesture recognition work reliably in an uncontrolled environment, we applied the SIFT algorithm to obtain the feature descriptors for each frame.
For each frame we obtained feature descriptors: 128-element (16 x 8) vectors describing the location and orientation of each keypoint in that frame.
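The frame-preparation step can be sketched as follows, converting each RGB frame to grayscale with the standard luminance weights. (In practice the SIFT descriptors themselves could then be obtained with a library such as OpenCV via `cv2.SIFT_create()`; we do not assume that dependency here.)

```python
import numpy as np

def to_gray(frame):
    """RGB frame of shape (H, W, 3) -> grayscale image, using the
    standard luminance weights 0.299 R + 0.587 G + 0.114 B."""
    return frame[..., :3] @ np.array([0.299, 0.587, 0.114])

def gesture_to_gray_frames(frames):
    """Convert every frame of a gesture sequence to grayscale."""
    return [to_gray(f) for f in frames]
```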
Our proposed classification approach involves several stages to reach its target effectively. It works by choosing appropriate features from the gathered ISL gestures; feature selection and extraction are carried out algorithmically. This leads to a classification module based on statistical techniques, namely the Euclidean distance metric and the K-nearest-neighbour method. The entire process has been segmented into the following sections:
Feature selection for ISL videos
Recognition system
REFERENCES
[01] http://www.billbuxton.com/input14.Gesture.pdf
[02] Pragati Garg, Naveen Aggarwal, Sanjeev Sofat, "Vision Based Hand Gesture Recognition", World Academy of Science, Engineering and Technology, Vol. 49, January 2009.
[03] www.aishack.in/2010/05/sift-scale-invariant-feature-transform, as on Jan-15-2012.
[04] http://en.wikipedia.org/wiki/Scale-invariant_feature_transform, as on Jan-18-2012.
[05] http://en.wikipedia.org/wiki/Feature_(computer_vision), as on Feb-17-2012.
[06] http://homepages.inf.ed.ac.uk/rbf/HIPR2/log.htm, as on Jan-24-2012.
[07] http://www.theopavlidis.com/technology/CBIR/PaperE/AnSIFT1.htm, as on Jan-19-2012.
[08] http://en.wikipedia.org/wiki/Support_vector_machine, as on Jan-26-2012.
[09] http://www.statsoft.com/textbook/support-vector-machines, as on Feb-23-2012.