
HAND GESTURE RECOGNITION

Gradient orientation histograms and eigenvectors methods
Bertrand de BOISSET
bertranddeboisset@hotmail.com
FRAUNHOFER INSTITUT
INSTITUT GRAPHISCHE DATENVERARBEITUNG
Fraunhoferstraße 5
D-64283 Darmstadt
Supervisor:
Didier Stricker
Examiner:
Didier Stricker
Declaration
I hereby declare that this dissertation and the work described in it are my own work, except where otherwise stated, done only with the indicated sources. All the parts which were taken from the sources are marked as such. It has not been submitted before for any degree or examination at any other university.
DARMSTADT, June 15th 2006
Ehrenwörtliche Erklärung
Hiermit versichere ich, die vorliegende Diplomarbeit ohne Hilfe Dritter und nur mit den angegebenen Quellen und Hilfsmitteln angefertigt zu haben. Alle Stellen, die aus den Quellen entnommen wurden, sind als solche kenntlich gemacht worden. Diese Arbeit hat in gleicher oder ähnlicher Form noch keiner Prüfungsbehörde vorgelegen.
DARMSTADT, 15. Juni 2006
Déclaration
Je déclare que le rapport réalisé ainsi que le travail décrit dans ce document est un travail personnel, sauf contre-indication, réalisé avec l'aide des sources citées dans la bibliographie. Toutes les parties qui sont reprises sont indiquées en tant que telles. Ce projet n'a jamais été présenté auparavant pour aucun autre examen dans aucune autre université.
DARMSTADT, 15 juin 2006
Abstract
The aim of this work is to implement different methods for gesture recognition. The main parts of my work were:
First, the analysis of the different ways to realize gesture recognition.
Then, the implementation of the gradient histogram recognition. This method consists in calculating the gradients in a picture and then constructing histograms of the gradient orientations.
We also took a closer look at the algebraical analysis of an image, by searching for the principal components that define a set of pictures (eigenvectors in the space of the data set). This second method is called PCA (Principal Component Analysis).
Then, to finish the project, we had to analyze the different methods implemented, by performing different tests. After that, we could define the strong and weak points of each method. We also realized a small application to illustrate our work.
Acknowledgments
I would like to thank my supervisor, Alain Pagani, for his enthusiasm, help and guidance throughout this project. I would also like to thank Didier Stricker, who supervised my work during this period. And, I will not forget:
F. Merienne, C. Père, M. Moll, H. Wuest, F. Vial... They all helped me to finish this project in time and gave me some pieces of advice when I needed them.
All the members of the Department for Virtual and Augmented Reality (A4) of the Fraunhofer IGD, for providing an interesting and stimulating working environment.
Contents
1 Project Aims
2 Theory and backgrounds
  2.1 The database
  2.2 The simple subtraction method
  2.3 The Gradient based method
    2.3.1 The goal of this method
    2.3.2 Main steps of the method
  2.4 The Principal Component Analysis -PCA- method
    2.4.1 The goal of this method
    2.4.2 Mathematical Backgrounds
    2.4.3 Main steps of the method
3 Implementation and explanation
  3.1 Simple subtraction method
    3.1.1 Realization of the method
    3.1.2 Conclusion on the method
  3.2 Histograms of oriented gradients method
    3.2.1 Step 1: Gradient magnitude calculation
    3.2.2 Step 2: Gradient orientation calculation and magnitude threshold
    3.2.3 Step 3: Gaussian filter operator
    3.2.4 Step 4: Euclidian distance comparison
    3.2.5 Step 5: Establish a comparison matrix
    3.2.6 Problems encountered
    3.2.7 Conclusion on the method
  3.3 PCA or Eigenfaces method
    3.3.1 Step 1: Realize the database
    3.3.2 Step 2: Subtract the mean
    3.3.3 Step 3: Calculate the covariance matrix
    3.3.4 Step 4: Eigenvectors and eigenvalues calculation, and choice of the good eigenvectors
    3.3.5 Step 5: Realize the new data set and compare
    3.3.6 Conclusion on this method
4 Tests, results and analysis
  4.1 The application: Rock Paper Scissors Game!
  4.2 Tests and choices of the parameters
    4.2.1 Choice of the size of the derivative filter and the number of bins for the gradients method
    4.2.2 Choice of the number of pictures and the size of images for the data set for both methods
  4.3 Last tests to explain the efficiency of each method
    4.3.1 First tests: Recognition percentage of each method in general
    4.3.2 Second tests: Recognition percentage of each method in different conditions
5 Conclusion: advantages and drawbacks
A Tables of general tests
B Tables of specific tests
C Script of the game
List of figures
Bibliography
Chapter 1
Project Aims
We can define the goal with a simple question: how could we command different applications just by a single hand gesture? The aim of my final year project is to answer this question by studying different methods that allow you to realize hand gesture recognition. Moreover, the recognition has to be done with one camera and in real time, so that you can operate as fast as you want to.
To begin, we had the idea to realize a simple subtraction between two images, pixel per pixel, to compare them. We will see the results of that in the second chapter.
Then we studied a method that uses gradients. The aim is to build the orientation histograms of the different pictures and to compare them. We will take a closer look at this method in chapter 3.
After that, we implemented a method called PCA (Principal Component Analysis), or Eigenface. The goal is to calculate, find and study the eigenvectors of the different pictures and then to express each image with its principal components (eigenvectors). The difficult part was to find a way to compare the images through their expression with the eigenvectors (as it is done in Eigenface face recognition).
Last, we created a small application to illustrate the different working methods.
Chapter 2
Theory and backgrounds
Before explaining the theory of the different methods, we will just present their main idea.
In fact, we realized a database of different hand gestures and we labeled all the data set pictures so that each picture is classified. The aim is then to compare an unknown image with the images of the database and to identify it by taking back the label of the nearest image.
Therefore, we will see in the first part how we chose our database and how we defined it.
2.1 The database
In this section, we will take a closer look at the database.
At the beginning, we had two main questions about its creation:
Which hand gestures should we choose?
How many pictures of each gesture should we take?
With these questions, we could answer with two different kinds of database for the same chosen gestures:
Take lots of pictures of the different hand gestures to realize a huge database, so that the recognition will be better (it is a way to reduce the limits of the different methods: we will have many more chances to find an image in the database that looks like the gesture to analyze). The problem is that it will take longer to look for the pictures in the database during the comparison.
Take few pictures of the different gestures to realize the database quickly. Then it will be easier for the user to create the database, and it will be quicker to look for a picture in the database (during the comparison). The problem is that the recognition will be harder if the gestures are similar.
Therefore, during the project, we began by taking 5 positions of the hand (1, 2, 3, 4 and five fingers) and lots of pictures in the database. We had good results, but the calculation time was huge. Therefore, we decided to change the database creation by taking a minimum of pictures of new hand gestures (3 positions that were really different: scissors, paper and rock).
We will study the returned results later. We will see in detail why we changed the positions and the database. What is important now is to understand with which kind of database we realized this work.
2.2 The simple subtraction method
The aim of this method was to try a simple way to compare images and then to explain and justify why we had to implement other methods. We will not take a deeper look at this method here. The theory is really simple: subtract the different images pixel per pixel, then compare the results and show the closest one.
We will see the different tests in the last chapter (4) and a short summary of this method in the next chapter, about implementation (3).
2.3 The Gradient based method
In order to study hand gesture recognition, we will study the theory of the gradient orientation histogram method. In this section, we will take a closer look to define the aim of this method and the main steps to implement it.
2.3.1 The goal of this method
First of all, the aim of this method is to recognize different hand gestures (without sensors). These hand gestures must be clearly identified in order to command any kind of application.
The theory of this method is to study the gradients in the image and to analyze them to realize an orientation histogram. Then the goal is to compare the histograms to return the label of the nearest image.
2.3.2 Main steps of the method
In order to implement this method, we began by reading different articles on the subject, which are really interesting and useful to know the directions to take: [25] and [24].
We can then split up the method into its main parts:
First of all, we had to implement the gradient magnitude calculation. The aim is to define where the biggest gradient magnitudes are in the picture. Then, it will be easy to apply a threshold on the gradients in order to keep the really interesting ones and to cut all the background noise. To realize this part, the theory is to calculate the magnitude with the formula:

magnitude = \sqrt{dx^2 + dy^2}

Therefore, we have to calculate the derivatives of the image in x and y to obtain the magnitude. We will have to choose a size for the derivative filter (in any case, we will choose a circular derivative filter).
Then, we implemented the gradient orientation calculation. The goal is to realize a histogram cut into 36 bins (one every 10 degrees) or more (we will study this influence later, in chapter 4). To realize this histogram, we will have to calculate the gradient orientation defined by the formula:

orientation = \arctan(dy/dx)

Therefore, with this formula, it will be possible to know the orientation of the gradients in the image. We can see that for both magnitude and orientation we will need the derivatives of the image.
With this histogram, we then have a vector of gradient orientations, which defines the picture quite well. So, this second step is the part that will allow us to compare the images between them. It is a way to define the form with an appropriate precision.
Also, we had to realize a Gaussian filter to blur the image and have a homogeneous picture. It will permit us to obtain better results in the gradient magnitude and orientation calculation. The goal of this filter is to erase the background defects. We can say that for this method, it is really important to have a uniform background to avoid noise. To make the background more uniform and to erase white pixels, we realized this filter.
We also created a gradient magnitude threshold which has to erase the lower level gradients in order to keep the really interesting ones. That will cut all the noise and regularize the background. This part is complementary with the Gaussian filter: the Gaussian filter blurs the big defects (but they will still be there), and the threshold cuts the lowest magnitudes. Then the noise is quite well removed.
Then, the next step was to calculate the Euclidian distance between the vectors of the different images analyzed (this distance is written out just after this list). This part is made to compare the different pictures, by comparing the different histograms. This is the final step. With it, we are able to recognize the different gestures.
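For completeness, the distance used in this last comparison step is the plain Euclidian norm between two orientation histograms, which can be written in one MATLAB line (our own illustration, with assumed variable names H1 and H2; the database picture whose histogram gives the smallest d is the recognized one):

d = sqrt(sum((H1 - H2).^2));   % Euclidian distance between two 36-bin histograms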
To conclude, we can say that this method does not require special mathematical background. Therefore, once we understood the main way to realize it, we just had to implement it (chapter 3).
2.4 The Principal Component Analysis -PCA- method
In this section, we will still study hand gesture recognition, but we will need some mathematical background to understand what we made. This method is called PCA, or Eigenfaces.
So, we will take a deeper look to understand the mathematical background, the aim of this method, and the principal parts of its realization.
2.4.1 The goal of this method
The Principal Component Analysis (PCA) will also be used for our gesture recognition. It is a useful statistical technique that has found application in different fields (such as face recognition and image compression). It is also a common technique for finding patterns in data of high dimension. Before describing this method, we will first introduce the mathematical concepts that will be used in PCA. Here, we will speak about standard deviation, covariance, eigenvectors and eigenvalues. This background knowledge is meant to make the PCA section easier to understand, but can be skipped if the concepts are already known. There are examples all the way through this kind of lesson to illustrate the concepts explained.
2.4.2 Mathematical Backgrounds
This section will attempt to give the elementary mathematical background required to understand what Principal Component Analysis is. We will try to give a kind of summary of the principal knowledge used in the PCA method. Each part is independent from the others. We can notice that the goal is to understand the principal lines of the method and especially to understand why this method is used and what the returned results signify. We will not use all the background knowledge described here, but the different sections provide the grounding of the main skills required.
Therefore, we will first take a quick look at statistics, and especially at the spread of data and at the distribution measurements. Then, the other section is about matrix algebra and looks at eigenvectors and eigenvalues (important properties of matrices that are more than fundamental to PCA).
Statistics
What we will see about statistics is how to analyze a big set of data and how to find and understand the relationships that exist between the elements of the data set. In this section, we will take a look at the measurements we can perform on a data set and what they tell us about the data.
Standard deviation   First of all, we will take a closer look at what the standard deviation is. In statistics, we generally use samples of a population to realize the measurements. The results returned on this sample give an overview of the possible and most likely results that we would get if we made the same test on the entire population. Therefore, we just extend the sample results to the entire population. To explain it clearly, we will create a data set and assume that it is just a sample of a larger one (it is not used in our project, but it will help us to understand the concept easily).
Here is an example set:
X = [1 2 4 6 12 15 25 45 68 67 65 98]
For the notation, we will use the symbol X to refer to the entire sample and the symbol X_n to indicate a specific element of the sample. Therefore, X_3 refers to the 3rd number in X (we can notice that X_1 is the first element and not X_0). With this kind of sample, we can realize many calculations that give us information about the set. For example, we can first calculate the mean of the set. As it is really simple, we will just give its formula without describing it further:

\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}

It is important to note that we will call \bar{X} the mean of the set X. The mean of the data set does not give us many indications, apart from the middle point.
For example, we can have the same mean for two really different data sets. Therefore, we will see below what is important to better define the data sets:
[0 8 12 20] and [8 9 11 12]
Here, what is really different between the two sets is the standard deviation. This is a way to measure the spread of the data in a set. Here is the definition of the standard deviation: it is the square root of the sum of the squared distances from each point of the set to the mean, divided by n - 1, where n is the number of points in the set. Here is the formula:

s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}

where s is the usual symbol for the standard deviation of a sample.
We can wonder why we are dividing by n - 1 and not by n. We will not give any explanation of that here, because it would be too long to explain, and it is not important for our project. But what is important to remember is that when we use a sample of a population and we want an approximate result for the entire population, then we have to use n - 1. But if we calculate the standard deviation on the entire population directly, then we have to use n instead of n - 1. We can find further information on the web site http://mathcentral.uregina.ca/RR/database/RR.09.95/weston2.html
This page explains a bit more about the standard deviation and about the choice of denominator. It also gives interesting experiments which describe well the difference between using a sample or the whole population, and therefore the choice of denominator.
We will draw tables of the standard deviation calculation for the 2 sets written above.
Set 1:
X      (X - X̄)   (X - X̄)²
0      -10        100
8      -2         4
12     2          4
20     10         100
Total                        208
Divided by (n - 1)           69.333
Square root                  8.3266
Set 2:
X_i    (X_i - X̄)   (X_i - X̄)²
8      -2            4
9      -1            1
11     1             1
12     2             4
Total                        10
Divided by (n - 1)           3.333
Square root                  1.8257
As expected, the first set has a much bigger standard deviation than the second one. Indeed, the first data set has really spread out data, contrary to the second one.
We can quickly look at another set, which will have a standard deviation of zero:
[10 10 10 10]
Here, the standard deviation is equal to zero, although the mean is still 10. This is because all the points are the same, so the data are not spread out. None of them deviates from the mean.
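We can check these numbers directly in MATLAB, whose std function uses the same n - 1 denominator for a sample (a small verification of the tables above, not part of the thesis scripts):

s1 = std([0 8 12 20]);      % returns 8.3266, as in the first table
s2 = std([8 9 11 12]);      % returns 1.8257, as in the second table
s3 = std([10 10 10 10]);    % returns 0: the data are not spread out at all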
Variance   Variance is another measure of the spread of data in a set. In fact, it is almost the same as the standard deviation. We can take a look at the formula:

s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}

We can notice that this is just the square of the standard deviation (that is why the symbol s^2 is used).
Usually, we use the symbol s^2 for the variance of a sample. The variance is just another way of measuring the spread of data in a sample. We can say that the variance is less used than the standard deviation. In fact, the variance will be useful for the next section, which is about the covariance.
Covariance   The covariance differs from the two first measurements explained in the sections above in one principal way: the covariance is a 2-dimensional measurement. The covariance is a really important notion for the PCA method, because we will need this calculation later.
So, the calculation of the standard deviation or of the variance is useful in the case of a one-dimensional data set, like the set of the marks obtained by all the ENSAM students for their FYP (Final Year Project). But for the PCA method, which deals with more dimensions, we will need the covariance and not the variance.
The covariance will allow us to see if there is any relationship between the different dimensions of the data set. For example, we could realize a 2-dimensional set of the marks obtained by the ENSAM students and their age. Then, we could see if the age has an effect on the mark received by the student. It is exactly the kind of test that we could perform with the covariance (we can already imagine where we want to go with that in our project: check whether our different pictures are related or not).
The covariance formula is really close to the variance formula. We can write the variance formula like this, to better understand the covariance formula:

var(X) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})}{n - 1}

Now we can take a look at the covariance formula:

cov(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}
We can just notice that if we calculate the covariance between a dimension and itself, we get the variance. In fact, we just replace the second factor of the formula with the second dimension to analyze, and we obtain the covariance formula!
We can also say that it is possible to calculate the covariance between more than two dimensions. We can calculate covariances on three dimensions, for example. The only thing to know is that we will calculate 9 covariances between dimensions (2 by 2) and then create a matrix (called the covariance matrix), which will be 3 x 3 in the case of three dimensions. In fact, the diagonal will contain the variance of each dimension and the other terms will be the covariances between dimensions (for example, line 2 column 1 will be the covariance between the y and the x dimensions). By the way, we can notice that the covariance is commutative (we can swap the two dimensions without changing the result). Therefore the covariance matrix will be symmetrical.
Then, we can get lots of really important information with the covariance calculation. In any case, it is important to notice that the value returned is not as important as its sign.
Indeed, if the result is positive, it means that the two dimensions increase together (for our example on the ENSAM students -marks received and age-, this would mean that the mark increases when the age increases).
And if the result is negative, it means that when one dimension is increasing, the other is decreasing.
Last case, the result returned is null. That just means that our 2 dimensions do not have any kind of relation between them. They are independent.
Therefore, the covariance calculation can bring us really important indications on the set of data we are studying. With it, we can then represent the covariance between 2 dimensions in a graph to get an idea of the relation that exists between them.
Of course, it will not be possible to represent the covariance when our data set has more than 3 dimensions.
Although the covariance can just be calculated between two dimensions, and although it is not possible to represent the relationship between the data when we have more than 3 dimensions, the covariance is often used for big data sets with many dimensions. Indeed, we can calculate the relationship between the dimensions and have exploitable results. Moreover, it would be pretty hard to visualize the relationship between dimensions in a huge data set with many dimensions without the calculation of the covariance.
Therefore, the calculation of the covariance brings us lots of help to see the relationships between dimensions in a data set like the one we have in our project.
The covariance matrix   Recall that covariance is always measured between 2 dimensions. If we have a data set with more than 2 dimensions, there is more than one covariance measurement that can be calculated. For example, from a 3-dimensional data set (dimensions x, y, z) we could calculate cov(x, z), cov(y, z)...
In fact, for an n-dimensional data set, we can calculate \frac{n!}{2(n-2)!} different covariance values. The other entries are the variances on the diagonal.
A useful way to get all the possible covariance values between all the different dimensions is to calculate them all and put them in a matrix. Let us have a quick overview of the definition of the covariance matrix for a set of data with n dimensions:

C^{n \times n} = (c_{i,j}), \quad c_{i,j} = cov(Dim_i, Dim_j)

where C^{n \times n} is an n by n matrix (n rows and n columns), and Dim_x is the x-th dimension.
We can notice here that the covariance matrix will be square in any case, and that each entry of the matrix is the result of a covariance calculation between two dimensions (except for the diagonal, as said before).
For example, we will build the covariance matrix for a 3-dimensional data set, using the usual dimensions x, y and z. Then, as the matrix is square, we will have the values below:

C = \begin{pmatrix} cov(x, x) & cov(x, y) & cov(x, z) \\ cov(y, x) & cov(y, y) & cov(y, z) \\ cov(z, x) & cov(z, y) & cov(z, z) \end{pmatrix}

As said earlier, the matrix is symmetrical, and the diagonal contains the variances. Therefore, we can say that the matrix has this form:

C = \begin{pmatrix} var(x) & cov(x, y) & cov(x, z) \\ cov(x, y) & var(y) & cov(y, z) \\ cov(x, z) & cov(y, z) & var(z) \end{pmatrix}

Therefore, we only have 6 terms to calculate out of the 9.
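As a small illustration (with our own example numbers, not data from the project), the covariance matrix of a 3-dimensional data set can be computed in MATLAB either by hand with the formula above or directly with the built-in cov function:

% Example 3-dimensional data set: one observation per row, one dimension per column.
X = [2.5 2.4 1.0;
     0.5 0.7 1.2;
     2.2 2.9 0.4;
     1.9 2.2 0.8;
     3.1 3.0 0.1];

n  = size(X, 1);
Xc = X - repmat(mean(X, 1), n, 1);   % subtract the mean of each dimension
C  = (Xc' * Xc) / (n - 1);           % 3 x 3 covariance matrix, variances on the diagonal
C2 = cov(X);                         % the built-in function returns the same symmetric matrix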
Matrix Algebra
This section is made to provide a background for the matrix algebra required in PCA. We will especially take a closer look at the eigenvectors and eigenvalues of a given matrix.
Let us see an example:

\begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix} \times \begin{pmatrix} 3 \\ 2 \end{pmatrix} = 4 \times \begin{pmatrix} 3 \\ 2 \end{pmatrix}

Here, 4 is an eigenvalue of the matrix.
Eigenvectors   First of all we will give the Wikipedia definition of an eigenvector:
In linear algebra, the eigenvectors (from the German "eigen" meaning "inherent, characteristic") of a linear operator are non-zero vectors which, when operated on by the operator, result in a scalar multiple of themselves. The scalar is then called the eigenvalue associated with the eigenvector.
As we can see in the example above, the multiplication between the matrix and the vector returns exactly 4 times the starting vector. We have here an example of an eigenvector. We will try to explain this example to better understand eigenvectors.
The vector is a 2-dimensional one: the vector (3, 2)^T represents an arrow going from the origin (0, 0) to the point (3, 2). The matrix can be seen as a transformation matrix. Therefore, if we multiply this matrix by a vector, the result returned is another, transformed vector. If this transformed vector is just a multiplication of the original one by a scalar, then it is an eigenvector and the scalar is the eigenvalue associated with this eigenvector.
Now we will look at the different properties of these eigenvectors:
First of all, we can only find eigenvectors for square matrices. We can also say that not every square matrix has eigenvectors. In the case they do, they cannot have more eigenvectors than their dimension (for a 3 x 3 matrix, the maximum number of eigenvectors is 3).
You can multiply an eigenvector by a scalar and it will still be an eigenvector (because we just change its length and not its direction).
For a symmetric matrix, such as the covariance matrix used in PCA, all the eigenvectors are orthogonal to each other, no matter the number of dimensions.
Most of the time, the returned eigenvectors are normalized (norm = 1). They are then easier to exploit.
We can find further information on eigenvectors on the web site:
http://www.mathphysics.com/calc/eigen.html.
Eigenvalues   Each eigenvector is associated with an eigenvalue. The eigenvalue gives us some information about the importance of the eigenvector. The eigenvalues are really important in the PCA method, because they permit us to apply a threshold that filters out the non-significant eigenvectors, so that we keep just the principal ones.
MATLAB will return the eigenvalues and the eigenvectors of the covariance matrix without any problem.
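We can verify the small 2 x 2 example from the beginning of this section with the eig function (a quick check, with our own variable names):

A = [2 3;
     2 1];
[V, D] = eig(A);            % columns of V are the eigenvectors, diag(D) the eigenvalues
[kmax, idx] = max(diag(D)); % the largest eigenvalue is 4
v = V(:, idx);              % associated eigenvector: a normalized multiple of (3, 2)
check = A*v - kmax*v;       % numerically zero, since A*v = 4*v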
2.4.3 Main steps of the method
Finally we arrive at Principal Component Analysis (PCA), the interesting part of our project. We can first answer a question: what is it exactly? We can answer that it is an algebraical way to compare images by compressing the set of data and highlighting the principal components of the set.
The main advantage of PCA is that once we have found the principal components of the set, which express the data pretty well, we can recover the original data (images in our case) with a low loss, even if the compression is really high!
In this section, we will try to explain how we went through the problems to realize this method to make gesture recognition. Therefore, we will describe the work made step by step, to understand each part of the work.
We can then split up the method into its main parts (a MATLAB sketch of the whole procedure is given just after this list):
First of all, we had to create the data set. Indeed, we had to take some pictures of the hand that could form the database for the PCA recognition. The aim is to choose a good number of pictures and a good resolution in order to have the best recognition with the smallest database. Then, the aim is to make the database. To create it, the theory is to transform each picture into a simple vector, whose dimension is the number of pixels. Then, we create a matrix where each line is an image-vector. The result for 12 pictures and a 640 x 480 resolution is a 12 x 307200 matrix.
Then, the next step is to subtract the mean from each of the data dimensions. The mean subtracted is simply the average across each dimension. For example, for three dimensions x, y and z, we subtract the mean of x from all the x values, the mean of y from all the y values and the mean of z from all the z values. The aim is to center our set in the space of all the dimensions (we will see further explanations of the different spaces used later, but what is important to remember here is that we have to subtract the mean to center our set of data).
The third step is to calculate the covariance matrix of the database. It will be quite difficult in our case, because the data set is really huge! So we have found a method to simplify this calculation. We will explain the method:
Indeed, we cannot calculate the covariance matrix of the first matrix, because it would be too huge. So we had to find a way to find out the principal eigenvectors without calculating the big covariance matrix.
I have found the solution in a paper written by M. Turk and A. Pentland [23].
The method consists in choosing a new covariance matrix.
Indeed, we will call A our second matrix: all the images with the mean subtracted, stored as columns, so that A has size 307200 x 12 in our example. Our training set of images is B_1, B_2, B_3, ..., B_12, each of dimensions l x c, and M is the average of the whole set of pictures. As seen earlier, we transform each image into a vector of l x c dimensions. So, we can say that each picture is a point in an (l x c)-dimensional space. Therefore, our 12 images represent 12 points in this space. But, as we centered the set (by subtracting the mean), each picture is not so far from the others in this space (because they are quite similar in the end). Therefore, it is possible to express our data set with fewer dimensions.
The covariance matrix of A will be called C, and C is defined by:

C = A A^T

Then, the eigenvectors and the eigenvalues of C will be the principal components of our data set. But as explained before, we cannot calculate C directly.
The idea is to say that when we have 12 points in a huge space, the meaningful eigenvectors will be fewer than the dimension: the number of meaningful ones will be the number of points minus 1. So in our case, we can say that we will have 11 meaningful eigenvectors. The remaining eigenvectors will have an eigenvalue around zero.
Fortunately, it is much easier to calculate the eigenvectors of a 12 x 12 matrix than of a 307200 x 307200 matrix!
We will call v_i the eigenvectors of the matrix A^T A and k_i its eigenvalues. We can then write:

A^T A v_i = k_i v_i

Then we multiply both sides by A:

A A^T A v_i = A k_i v_i = k_i A v_i

We can see that the vectors A v_i are the eigenvectors of C = A A^T. Now, we construct the new matrix L = A^T A and find the eigenvectors v_i of L. These vectors determine linear combinations of the 12 training set images, which form the eigenpictures of our set.
So, with this subtlety, we have a small covariance matrix to calculate: 12 x 12 instead of 307200 x 307200! The calculation is also much faster and the eigenvectors returned are the principal ones.
Then, we calculate the eigenvectors and the eigenvalues of this covariance matrix. This gives us the principal orientations of the data. MATLAB does it easily.
After that, we have to choose the good components and form the feature vector. This is the principal step. We have to choose the principal (most important) eigenvectors, with which we can express our data with the lowest information loss. We also have to choose a precise number of eigenvectors to have the lowest calculation time but the best recognition. Here, the theory says that we will normally have 11 meaningful eigenvectors.
Last, the final step is to make a new data set (that we will call the eigenset). Then, it is possible to realize the last script, which compares the different pictures and sorts them by resemblance order. To compare the different pictures, we have to express each image of the data set with these principal eigenvectors. The last thing to do is to compare them, by calculating the Euclidian distance between the coefficients that multiply each eigenvector.
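The whole procedure can be sketched in a few MATLAB lines. This is only an illustration of the trick described above, under our own assumptions (gray-scale images of identical size, hypothetical file names, a simple eigenvalue threshold); it is not the thesis script itself.

% Minimal PCA / eigenpictures sketch (assumed file names).
files = {'rock1.png', 'rock2.png', 'paper1.png', 'paper2.png', 'scissors1.png'};
M = numel(files);
im1 = double(imread(files{1}));           % assumed gray-scale image
A = zeros(numel(im1), M);                 % one image per column
A(:, 1) = im1(:);
for i = 2:M
    im = double(imread(files{i}));
    A(:, i) = im(:);
end

m = mean(A, 2);                           % mean image
A = A - repmat(m, 1, M);                  % subtract the mean: center the data set

L = A' * A;                               % small M x M matrix instead of the huge A*A'
[V, D] = eig(L);                          % eigenvectors v_i and eigenvalues of L
[vals, order] = sort(diag(D), 'descend'); % biggest eigenvalues first
keep = order(vals > 1e-6 * max(vals));    % keep the meaningful ones (at most M-1)
U = A * V(:, keep);                       % the A*v_i are the eigenpictures of C = A*A'
for i = 1:size(U, 2)
    U(:, i) = U(:, i) / norm(U(:, i));    % normalize each eigenpicture
end

W = U' * A;                               % coordinates of each training image in the eigenspace

% To classify an unknown picture: project it and take the nearest training image.
x = double(imread('unknown.png'));        % hypothetical test picture
w = U' * (x(:) - m);
dists = zeros(1, M);
for i = 1:M
    dists(i) = norm(w - W(:, i));         % Euclidian distance between the coefficient vectors
end
[dmin, best] = min(dists);                % index of the most resembling database image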
To conclude, we can say that we need more mathematical background for this method. Then, once the theory is well understood, we can implement this method in MATLAB too. In the next chapter, we will take a closer look at the implementation of the different methods.
Chapter 3
Implementation and explanation
3.1 Simple subtraction method
3.1.1 Realization of the method
We will not take too much time to explain how we realized this comparison, because it is really simple to create and the results are really bad. Therefore, what is important to notice here is (a minimal sketch of the comparison is given after these points):
Before doing the subtraction, we first applied some adjustments on the contrast, and then we applied a blurring filter to erase the background imperfections.
We performed some tests on this method, which can be seen in the last chapter (4). Figure 4.14 shows the efficiency of this method.
This method confirms the idea that we should implement other methods, because the results returned are really bad and we can say that it does not work properly.
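A minimal sketch of this pixel-per-pixel comparison (our own illustration, with hypothetical file names; the contrast adjustment and the blurring filter mentioned above are left out):

% Pixel-per-pixel subtraction: the database image with the smallest absolute
% difference to the unknown picture is returned as the recognized one.
files = {'rock.png', 'paper.png', 'scissors.png'};   % hypothetical database images
x = double(imread('unknown.png'));                   % gray-scale picture to recognize
best = 1; bestdist = Inf;
for i = 1:numel(files)
    ref = double(imread(files{i}));
    d = sum(sum(abs(x - ref)));                      % one number per database image
    if d < bestdist
        bestdist = d; best = i;
    end
end
recognized = files{best};                            % label of the closest database picture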
3.1.2 Conclusion on the method
In any case, it is a good thing to know what the results of this method are. It was our first idea and it confirmed that we had to look further into image analysis, by implementing other methods.
3.2 Histograms of oriented gradients method
In this section, we will explain how we implemented this method, and the problems encountered. We will try to understand each part of the method and why it works or does not work in the different cases.
3.2.1 Step 1: Gradient magnitude calculation
After a first approach of the MATLAB software, we realized the first script, which had to calculate the magnitude of each gradient in the image.
Gradient magnitude definition:
If dx and dy are the outputs of the x and y derivative operators, then the gradient magnitude is calculated by:

mg = \sqrt{dx^2 + dy^2}    (3.1)
Therefore, in order to calculate the gradient magnitude, we first had to calculate the derivatives dx and dy of the image.
The script of the derivative operator has been found on the internet, but we can see what it looks like to understand the following steps:
X-derivative operator script (explanation of how to use it):

function d = xDeriv(im, xRadius, yRadius, shape)
% XDERIV  Returns the X-derivative of image im.
%   D = XDERIV(IM, XRADIUS, YRADIUS, SHAPE)
%   IM      - Input image.
%   XRADIUS - half the width of the vicinity in which the derivative is calculated.
%   YRADIUS - half the height of the vicinity in which the derivative is calculated
%             (default: equal to XRADIUS).
%   SHAPE   - Either of:
%             'full'  - (default) returns the full 2-D convolution,
%             'same'  - returns the central part of the convolution
%                       that is the same size as A,
%             'valid' - returns only those parts of the convolution that are computed
%                       without the zero-padded edges, size(C) = [ma-mb+1, na-nb+1]
%                       when size(A) > size(B).
%
%   Written by Ariel Tankus, 19.9.96.

Therefore, this script can calculate the derivative matrix, given the image, the xRadius, the yRadius and the returned shape you want.
So, with the same script for the y derivative, we could have the gradient magnitude really fast. We just had to care about the speed of the running process.
We also implemented the gradient magnitude script as described below:
Gradient magnitude operator script:

function [mag, dx, dy] = grad(im, xRadius, yRadius, shape)
% GRAD  Return the gradient magnitude of the given image.
%   [MAG, DX, DY] = GRAD(IM, XRADIUS, YRADIUS, SHAPE)
%   IM      - image
%   XRADIUS - half width of derivation vicinity.
%   YRADIUS - half height of derivation vicinity.
%   SHAPE   - either of 'same', 'valid', 'full'. See xderiv.
%   MAG     - Gradient magnitude.
%   DX      - X-derivative (optional).
%   DY      - Y-derivative (optional).

The MAG output is:

MAG = [ min(min(\sqrt{dx^2 + dy^2})), max(max(\sqrt{dx^2 + dy^2})) ]    (3.2)

The MAG output is therefore a two-dimensional vector which contains the minimal and the maximal terms of the gradient magnitude matrix. The aim is to lighten the calculation with a smaller output.
The DX and DY outputs return two matrices, which both have the size of the image matrix.
What we needed as outputs is just the minimum and the maximum of the gradient magnitude, in order to realize an efficient threshold (to cut the lowest gradient magnitudes). We will come back to this part later (3.4).
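A minimal version of such an operator can be written with conv2 and a simple central-difference kernel (this is our own sketch, not the xDeriv script found on the internet, and it ignores the xRadius/yRadius vicinity parameters):

function [mag, dx, dy] = gradsketch(im)
% GRADSKETCH  Gradient magnitude of a gray-scale image (minimal sketch).
%   MAG - [min, max] of the gradient magnitude, as in equation (3.2).
%   DX  - X-derivative, same size as the image.
%   DY  - Y-derivative, same size as the image.
im = double(im);
dx = conv2(im, [-1 0 1],  'same');   % central difference in x
dy = conv2(im, [-1 0 1]', 'same');   % central difference in y
m  = sqrt(dx.^2 + dy.^2);            % gradient magnitude at every pixel
mag = [min(m(:)), max(m(:))];        % only the extrema are returned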
Then, having the minimum and the maximum gradient magnitude of the picture, we
could go through the second part: The gradient orientation calculation.
3.2.2 Step 2: Gradient orientation calculation and magnitude threshold
Once we had made the gradient magnitude calculation, we implemented the second script, which had to calculate the orientation of each gradient (whose magnitude is important enough) in the image.
Gradient orientation definition:
If dx and dy are the outputs of the x and y derivative operators, then the gradient direction is calculated by:

dir = \arctan(dy/dx)    (3.3)

Therefore, in order to calculate the gradient direction, we just had to use the derivatives dx and dy of the image, which we had already calculated for the magnitude.
Then, after having applied a threshold on the gradient magnitude, we had to sort all the measurements into a 36-dimensional vector. We made 36 bins (10 degrees each). And after having the vector, we plot it in polar and cartesian coordinates, just to have an overview of the orientation.
We will describe the script written to realize this implementation:
Gradient orientation operator script:

function Z = grador2(im, xRadius, yRadius, shape)
% GRADOR2  Returns the 36-dimensional gradient orientation vector of the given image.
%   IM      - image
%   XRADIUS - half width of derivation vicinity.
%   YRADIUS - half height of derivation vicinity.
%   SHAPE   - either of 'same', 'valid', 'full'. See xderiv.
%   Z       - 36-bin gradient orientation vector.

[gm, dx, dy] = grad(im, xRadius, yRadius, shape);

We call the gradient magnitude operator. It returns the X-derivative, the Y-derivative and the 2-dimensional magnitude vector (the minimum and the maximum of all the gradients).

gm = ((gm(2) - gm(1)) * 0.1) + gm(1)    (3.4)

This is the threshold value. It is relative to the image and fixed at 10 percent of the scale (between the minimum and the maximum gradient magnitude).
Then, we defined that when we have fewer than 3 inputs, we consider the shape as 'same' and the yRadius equal to the xRadius.
The next step was to create a 36-dimensional vector filled with zeros. Then, we just had to increment each bin when an orientation is found. We had to take care that the arctan function only covers a range of pi (half a circle).
After that, we had to create the gradient direction matrix, with the gradient directions of all the pixels. We had to take the threshold into consideration, so as to keep only the orientations of the main gradient magnitudes.
We will see in section 3.2.6 that we have some problems with the borders of the pictures. Therefore, we have to take this problem into consideration and cut the border of the new image. The crop is relative to the input and avoids the high gradient magnitudes which are calculated on the different borders of the pictures.
So after having realized the gradient orientation vector of each image, it is possible to realize comparisons between the different images.
Then, to see the histograms, we display both the cartesian and the polar representation of the orientation vector. It is good to have the two histograms, to see accurately where the orientation peaks are.
So, we have seen how to calculate the gradient orientation vector of a picture. With the different gradient orientation vectors of the data set, it will be possible to compare the images between them and so to sort them out.
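A minimal sketch of this orientation operator, reusing the gradsketch function above, is given here (our own names; the border cropping width of the real script is not documented, so a fixed 5-pixel crop is used for illustration):

function Z = gradorsketch(im)
% GRADORSKETCH  36-bin gradient orientation histogram (minimal sketch).
[mag, dx, dy] = gradsketch(im);
crop = 5;                                        % cut the picture borders (see section 3.2.6)
dx = dx(1+crop:end-crop, 1+crop:end-crop);
dy = dy(1+crop:end-crop, 1+crop:end-crop);
m   = sqrt(dx.^2 + dy.^2);
thr = mag(1) + 0.1*(mag(2) - mag(1));            % 10 percent of the magnitude scale, as in (3.4)
ori = atan2(dy(m > thr), dx(m > thr));           % orientations of the strong gradients only
bin = floor(mod(ori, 2*pi) / (2*pi) * 36) + 1;   % 36 bins of 10 degrees each
bin(bin > 36) = 36;
Z = zeros(1, 36);
for k = 1:numel(bin)
    Z(bin(k)) = Z(bin(k)) + 1;                   % increment the bin of each found orientation
end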
We can take a look at the shape of the histograms in figure 3.1.
Figure 3.1: Representation of the orientation histograms for each new position. On this graph, each histogram is drawn under its picture.
3.2.3 Step 3: Gaussian filter operator
After having lots of problems in our orientation vectors (too much noise), as described in paragraph 3.2.6, we decided to realize and apply a Gaussian filter on the pictures. We will now detail the way we realized a Gaussian filter in MATLAB, in order to blur the image and to erase unwanted high level contrasts. We can notice that a Gaussian filter function also exists in the Image Processing Toolbox. Therefore, our own script was useful at the beginning of the project (we did not have this toolbox yet), but later we used the direct MATLAB function.
Gaussian filter operator script:
This script just returns the filtered version of the given image.
The aim of this script is to balance each pixel as a function of its neighbours. Therefore, we just have to put a weight on each pixel around the one to be blurred. After that, a white pixel on a black background becomes dark gray. We decided to make a circular filter of three pixels around, and we chose the weights shown in figure 3.2.
Figure 3.2: Weights of the Gaussian filter kernel.
In order to have a good filter, we can change the values of the filter. But with these values, the picture is well blurred and a large part of the background noise is erased without the filter being too strong.
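The exact weights of figure 3.2 are not reproduced in this copy; as an illustration, a similar blurring kernel of radius 3 pixels can be built and applied with plain conv2 (our own sketch, with an assumed Gaussian width; with the Image Processing Toolbox, fspecial('gaussian') and imfilter do the same job). Here im is assumed to hold the gray-scale image:

radius = 3;
sigma  = 1.5;                                  % assumed width of the Gaussian
[x, y] = meshgrid(-radius:radius, -radius:radius);
g = exp(-(x.^2 + y.^2) / (2*sigma^2));         % weights decrease with the distance to the center
g = g / sum(g(:));                             % normalize so the overall brightness is kept
blurred = conv2(double(im), g, 'same');        % blurred image: background noise is softened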
3.2.4 Step 4: Euclidian distance comparison
In this section, we will see how we realized the comparison between images, in order to have a realistic and efficient gesture recognition.
We have used the Euclidian distance on the gradient orientation vectors (36-dimensional vectors) to calculate the difference between two pictures.
The Euclidian distance comparison:
In order to obtain an efficient comparison between images, we decided to calculate the Euclidian norm between the orientation vector of the picture that has to be recognized and those of the data set pictures (we just calculated the Euclidian distance between the gradient orientation vector of the analyzed picture and that of each database picture; we then sorted the distances and selected the four smallest). Therefore, we implemented a script that does this calculation between all the gradient orientation vectors.
Here is the script we made:
The vector-vector comparison script:

function disteuclid = fini2(b, Im2, xRadius, yRadius, shape)

Returns the 4 nearest database pictures of the analyzed image.
First we created a 25-dimensional Euclidian distance vector, where each dimension is the result of the comparison with one database picture.
Then, we had to use the gradient orientation script to calculate the orientation vector of the picture to analyze.
After that, we returned the indexes of the 4 smallest terms of the 25-dimensional Euclidian distance vector. With these indexes it is easy to find the corresponding images, take back their labels and display them.
The script did work really well, but the execution time was really too high. It needed 180 seconds to give the 4 nearest pictures. So, the new goal was to considerably reduce this time in order to have a quick answer for a given picture. We must not forget that the final aim is to have a real time application!
The idea was to realize a script that could create a matrix (MATLAB works faster with matrices) of all the gradient orientation vectors of the database and save it in a text file. Then, it is easier to compare the gradient orientation vector of the analyzed picture with each line of this database matrix, which we will call MATDIST (see C). That cuts all the calculation time spent on the database pictures: indeed, we no longer need to recalculate all the database pictures each time.
We can now see how we made the code to create the MATDIST:
The database matrix operator:

function matdist = matdist(Im2, xRadius, yRadius, shape)

This function returns the matrix of all the gradient orientation vectors of the pictures stored in the database.
We decided to use the tic MATLAB function to launch a chronometer. The goal is to know the calculation time for the matrix creation (we took 26 images for the data set). It is a good function to measure the efficiency of the algorithm we made. Moreover, we can then easily know the time gained with the different changes we made.
We just had to add a matrix creation in the loop (26 lines for the 26 images and 36 columns for the 36 orientation bins). This new matrix is just the combination of all the orientation vectors: each line is the gradient orientation vector of a database picture.
Once the matrix is created, we just had to save it in a specified folder so that we can load it whenever we want.
The time needed to execute this script and create the database matrix is around 70 seconds, and it only needs to be run once. With this script, we gained the time we wanted. Now, to operate and recognize a picture, we run another script that compares the analyzed image with this matrix.
We can now explain and comment the new script:
The vector-matrix comparison operator:

function disteuclid = fini3(b, xRadius, yRadius, shape)

This script returns a window displaying the 4 nearest images, with the Euclidian distance between each of them and the compared one.
First of all, we begin by loading the MATDIST. Then we just have to calculate the orientation vector of the image to analyze and compare it with each line of the database matrix. After that, we sort the distances and take back the database pictures with their labels: the class is recognized.
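A minimal sketch of these two operators, reusing the gradorsketch function above (hypothetical file names; the display of the four returned images and of their labels is left out):

% Build the database matrix once: one orientation vector per line.
files = {'rock1.png', 'rock2.png', 'paper1.png', 'paper2.png', 'scissors1.png'};
MATDIST = zeros(numel(files), 36);
for i = 1:numel(files)
    MATDIST(i, :) = gradorsketch(double(imread(files{i})));
end
save('matdist.mat', 'MATDIST', 'files');          % saved once, reloaded for every comparison

% Compare an unknown picture with every line of the database matrix.
load('matdist.mat');                              % loads MATDIST and files
h = gradorsketch(double(imread('unknown.png')));  % orientation vector of the analyzed image
d = sqrt(sum((MATDIST - repmat(h, size(MATDIST, 1), 1)).^2, 2));
[ds, idx] = sort(d);                              % ascending Euclidian distances
nearest4 = files(idx(1:4));                       % labels of the 4 nearest database pictures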
With this script, the time needed to compare the images is about 3 seconds. We gained around 177 seconds in execution time (we needed 180 seconds to make a comparison before)! To immediately test our results, we just had to take a database picture and compare it with the database. If everything works, it should return the same image first, with a Euclidian distance equal to zero. This is the result we get in every case when we use a database picture. It verifies that in the case of two identical pictures, the script returns a logical result.
We can see it on picture 3.3.
On picture 3.3, we can see that the image returned as the nearest is the image given as input. This is a way to check the algorithm. Here, the algorithm works well: the Euclidian distance gives true results. Now, we have to check that the method is good at recognizing a picture with similar pictures in the database.
Now we can have a look at the 4 nearest pictures returned for the "1" position in figure 3.4.
In figure 3.4, we can see that the first two pictures returned are the same gesture, but the third one is not. Therefore, we had to try with other hand gestures and see the results. We will test the algorithm further in chapter 4.
3.2.5 Step 5: Establish a comparison matrix
In this part, we will go further than just realizing the script. We will try to understand why it sometimes does not work as expected and what kind of solutions we could bring to have better results.
As we have seen above in figure 3.4, the results are not always as good as expected. For example, if we ask for the result of another, more complicated hand gesture recognition, we can see the returned answer in figure 3.5.
Figure 3.3: Results of the first answer of the vector-matrix comparison script for a database picture. We can check that the first image has a Euclidian distance equal to zero.
We can see that the expected results are clearly not the results given. This problem comes from the quality and size of the database, or from the different positions we took. We have 26 pictures in the database and we took all of them different, to see the problems that we could have...
We also changed the orientation of the hand and the spacing of the fingers during the hand shooting, to recognize more positions.
What we can notice is that we have much better results for the position "1". It is just because the spacing does not influence the results; only the orientation of the finger really acts upon them. That is why we have better results with the position "1" picture.
Figure 3.4: Results of the vector-matrix comparison script for a database picture ("1").
Therefore, a solution would be to widen the database. We can take lots of pictures for each hand position and then apply the script again. The new problem expected is that the running process time would become excessive.
The second solution is to change our gesture positions and to choose new ones that are really different. We will try this possibility afterwards.
To define which way to go, we tried to identify the problem clearly. Therefore, we decided to realize a matrix which could show on a gray scale whether the different database pictures are close or not (all the pictures were taken under the same lighting), in terms of the Euclidian distance between their gradient orientation vectors. Black means that the pictures are really close and white that they are really different.
To realize this matrix, we just had to use the database matrix already created and to compare each line with the others. Then, MATLAB displays the new matrix on a gray scale to show the results.
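A minimal sketch of this gray matrix, assuming the MATDIST matrix has already been built as above (imagesc and the gray colormap are standard MATLAB; the scaling to [0, 1] is our own choice):

n = size(MATDIST, 1);
G = zeros(n, n);
for i = 1:n
    for j = 1:n
        G(i, j) = norm(MATDIST(i, :) - MATDIST(j, :));   % 0 on the diagonal
    end
end
G = G / max(G(:));          % black = identical pictures, white = very different pictures
imagesc(G);                 % display the comparison matrix on a gray scale
colormap(gray);
axis square;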
We can see the image of this matrix in figure 3.6.
Figure 3.5: Results of the vector-matrix comparison script for a database picture (position: 5).
We can see the database used for this gray matrix in picture 3.7.
The best case is to have white everywhere and black only on the diagonal. Indeed, white means that our gestures are really far from each other and black means it is the same picture. Therefore, we will have black on the diagonal in any case, because it is the comparison between two identical pictures.
To verify that our gestures are good between classes and within a class, we can plot this gray matrix: when we have black within the same class and white between the classes, it means that our gestures are perfectly chosen.
Here, we can see that our gestures are too close. There is definitely too much dark gray in the matrix. This shows us that the real problem is our gesture positions. Indeed, the positions are too close and then the recognition is too hard to realize.
Therefore, we decided to reduce the number of positions and to take just three really different positions: rock, paper and scissors. It will also allow us to realize an application (the well-known little game) to obtain a concrete comparison application.
Figure 3.6: Returns of the graymat function for a 5-picture database (1 image per class). We can see that the diagonal is black: that shows that the matrix is well calculated. On the diagonal is the Euclidian distance between two identical vectors.
We can see the gray matrix of our new gestures in figure 3.8.
With the observation of this new gray matrix, we can confirm that the newly chosen gestures are much better than the others. We have white between the different gestures (which means that the positions are far from each other) and black on the diagonal, as expected. We will see in the results (chapter 4) that the recognition also works far better.
3.2.6 Problems encountered
During the realization of the different steps, we came across different problems that we will explain in this part. We will not list all the problems we had, but only the ones where we lost some time.
Figure 3.7: The database pictures used for this gray matrix.
First problem:
The first problem we had was about the different classes in MATLAB. Indeed, MATLAB automatically defines the class of its terms, and all the functions used depend on the class of each element called in the process.
When we loaded the image (MATLAB imread function), the class of the resulting matrix was a colorful uint8 (three matrices), and to calculate the gradient we needed a gray, double-class equivalent matrix. Therefore, we had to write a small script which could convert a uint8 MATLAB class into a double one (you can find a direct function if you have the Image Processing Toolbox, but we did not have it at the beginning of the period, as said before; that is why we implemented this small script).
Here is the script we wrote to make this transformation:
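The original listing is not reproduced in this copy; a minimal equivalent of that conversion (our own reconstruction, using the core double() function, with a hypothetical file name) would be:

A = imread('hand.png');    % hypothetical picture, class uint8 (three matrices for R, G and B)
A = double(A);             % same values, class double, usable by the gradient scripts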
Then we had to transform it into gray scales. We used these coefficients to have a good gray scaled image, where A is the colored image and A1 its gray equivalent:

A1 = A(:,:,1)*0.3 + A(:,:,2)*0.59 + A(:,:,3)*0.11;

So we changed the three RGB matrices into one gray equivalent matrix. The coefficients are chosen to respect the different contrasts.
Figure 3.8: Returns of the graymat function for our new gesture database (1 image per class). We can see that the diagonal is black and that the other entries are much whiter.
After that, we could easily calculate the gradient of the image, but with some loss (we transformed a colorful picture into a gray equivalent picture: three matrices to one).
Later, we bought the Image Processing Toolbox, so we could just use the corresponding MATLAB function.
Second problem:
The second problem was the image border. When we calculated the gradient magnitude of the picture, the borders were included in the calculation with a very high level, due to the consideration of the XRadius and the YRadius. We can see it clearly on picture 3.9, where we calculated the gradient magnitude of a simple form (white triangle on black background).
Figure 3.9: Gradient magnitude of a triangle, with noise on the borders.
In this figure (3.9), we can really see the noise of the borders in the picture. The gradient magnitude operation calculates the borders as a part of the image. That would bring lots of problems later, in the second step, the gradient orientation calculation (3.2.2). Therefore we decided to cut the borders in the gradient magnitude calculation, otherwise all of our histograms would be similar.
Third problem:
When the background is not completely black and dark, the reflections and the contrasts disturb the gradient magnitude calculation: a lot of noise ends up in the gradient orientation vector. In order to reduce this noise, we can apply a Gaussian filter to the image, which blurs it and softens the contrast, so that there is less high gradient magnitude noise. A complementary way to avoid this kind of noise is to take the picture on a really black background. The pictures below (3.10 and 3.11) show the differences between the gradient magnitudes of each image and the different histograms returned.
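As an illustration of this preprocessing, here is a minimal sketch of the smoothing and of the gradient magnitude calculation. It uses the standard Image Processing Toolbox functions fspecial and imfilter and the built-in gradient instead of our own circular derivative filter, and the filter sizes are assumed values chosen for the example only:

I = double(imread('one_finger.png'));                      % hypothetical input picture
I = I(:,:,1)*0.3 + I(:,:,2)*0.59 + I(:,:,3)*0.11;          % grayscale conversion, as above

h  = fspecial('gaussian', [9 9], 2);                       % Gaussian filter to soften reflections and contrast
Is = imfilter(I, h, 'replicate');                          % blurred image

[Gx, Gy] = gradient(Is);                                   % horizontal and vertical derivatives
Gmag = sqrt(Gx.^2 + Gy.^2);                                % gradient magnitude

r = 6;                                                     % crop the borders (about one filter radius)
Gmag = Gmag(1+r:end-r, 1+r:end-r);                         % avoids the border noise described above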
We can also look at the gradient magnitude images of these two pictures on figure 3.11.
Figure 3.10: One-finger picture with black and gray backgrounds. The noise of the gray background (a) causes many problems in the gradient calculation; with a good black background (b), the problem is greatly simplified.
Figure 3.11: Gradients of the one-finger picture with black and gray backgrounds. Even though a strong Gaussian filter was applied, the white reflections of the gray background are clearly visible after zooming (a), compared to the black background (b). The histogram is degraded accordingly.
We can notice that with a good black background we have no trouble: all the background noise is removed. We will see later how this recognition problem can be handled. Now we compare the two histograms on figure 3.12.
To conclude from these few pictures, a homogeneous black background makes the work easier: the histograms are much more precise, and the recognition results are far better.
Figure 3.12: Histograms of oriented gradients of the one-finger picture with black and gray backgrounds. The histogram obtained with the black background (b) is much sharper than the one obtained with the gray background (a), and therefore easier to process; precise histograms are needed for a good gesture recognition.
3.2.7 Conclusion on the method
After having implemented this method, we understand much better what an image is and what lies scientifically behind it. The results obtained are good, but one could have expected them to be even better. Implementing the next method will surely give us new ideas to make this one more efficient. The method is tested in chapter 4.
3.3 PCA or Eigenfaces method
We will now explain each part and detail how we built the different algorithms.
3.3.1 Step 1: Realize the database
First of all, we had to choose how to make our database and what kind of database would be best for the recognition. We chose to take the minimum number of pictures that still gives the best recognition.
It is important to note that for the Eigenface method we work with the entire pictures at the beginning and only then reduce the data (our aim is to express the data set with fewer factors). Therefore, the data set must be chosen carefully to keep the calculation time down. Two parameters define the database:
The number of pictures.
The size of each picture, which determines the size of the first matrix to reduce.
Both are really important: when we create the first matrix to reduce, its size is the number of pictures by the number of pixels. Therefore, if there are too many pictures or too many pixels per picture, the calculation time grows fast.
For example, 10 images with a resolution of 640 x 480 give a 10 x 307200 matrix, and 10 images with a resolution of 1280 x 960 give a 10 x 1228800 matrix (close to 100 MB in double precision).
As you can see, the matrix can quickly become huge, and the calculation time strongly depends on it. Moreover, we must not forget that MATLAB cannot handle arbitrarily large matrices either.
Therefore, the choice of the database is a really important question, because it determines the efficiency of the method (and its calculation time).
At the beginning, we could not know how many pictures we had to take and what size to choose. So we decided to make a database of 12 pictures (4 of each position) with a resolution of 640 x 480.
To decide what kind of database gives the best recognition, we ran some tests (after having implemented the method) with different numbers and sizes of input pictures.
These tests are described in chapter 4: Tests, results and analysis.
The implementation of the first matrix of the data set is really easy. We read each image and transform the 640 x 480 picture into a 1 x 307200 vector, so that each line is a kind of deployed image. The resulting matrix expressing the data set is therefore a 12 x 307200 matrix in our case.
Now we can explain how we implemented this code.
First matrix script:
First of all, we read all the data set pictures, adjust the contrast (to normalize the pictures), apply a Gaussian filter and reshape each picture into a vector.
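Since the original listing does not appear in this extract, the following is only a sketch of this step. The file names are hypothetical, and the use of imadjust for the contrast and of a small Gaussian filter are assumptions based on the description above:

files = {'rock.png', 'paper.png', 'scissors.png'};   % hypothetical data set pictures
npix  = 640 * 480;                                   % number of pixels per picture
X     = zeros(numel(files), npix);                   % one deployed image per row

g = fspecial('gaussian', [5 5], 1);                  % assumed smoothing filter
for i = 1:numel(files)
    I = im2double(rgb2gray(imread(files{i})));       % read and convert to grayscale
    I = imadjust(I);                                 % contrast normalization
    I = imfilter(I, g, 'replicate');                 % Gaussian filtering
    I = imresize(I, [480 640]);                      % make sure every picture is 640 x 480
    X(i, :) = I(:)';                                 % deploy the picture as one 1 x 307200 vector
end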
We can see the data set we took for the PCA recognition on the picture 3.13.
Figure 3.13: Example of a data set used for the PCA recognition method
3.3.2 Step 2: Subtract the mean
The next step is to calculate the mean in each direction. It is a fast step: we take the first matrix of all the images, ask MATLAB to compute its mean, and subtract it from the first matrix. There is not much to say about this trivial step, but we must not forget that it is really important in order to center the data set pictures in the space.
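A minimal sketch of this step, with X the data matrix from step 1 (the variable names are ours):

m  = mean(X, 1);                        % mean image: one value per pixel (per column)
Xc = X - repmat(m, size(X, 1), 1);      % centered data set: the mean is subtracted from every row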
3.3.3 Step 3: Calculate the covariance matrix
This step was a bit more difficult than the first two (we had to understand the theory well to carry out the calculation precisely). But once we understood the subtlety described in chapter 2, the calculation becomes fast and easy to implement. Picture 3.14 shows the different eigenpictures returned by this covariance matrix.
Figure 3.14: Example of the eigenpictures of the data set used for the PCA recogni-
tion method
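As a sketch, and assuming that the subtlety in question is the usual small-matrix formulation of the Eigenface approach (working with the small picture-by-picture matrix instead of the huge pixel-by-pixel covariance matrix), this step can be written as:

n = size(Xc, 1);              % number of pictures in the data set
L = (Xc * Xc') / (n - 1);     % small n x n matrix; it has the same non-zero eigenvalues
                              % as the full covariance matrix (Xc' * Xc) / (n - 1)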
3.3.4 Step 4: Eigenvectors and eigenvalues calculation, and choice of the good eigenvectors
In this step, we take a closer look at the calculation of the eigenvectors and eigenvalues of the covariance matrix, and at how to choose the good ones.
It is really important to choose the good eigenvectors, so that the data set is expressed in the best basis: the number of eigenvectors kept is directly related to the results we get. The magnitude of each eigenvalue (always non-negative for a covariance matrix) determines whether the corresponding eigenvector is important or not in the expression of the data set in the new space.
Therefore, we expected to apply a threshold on the eigenvalues in order to keep the most important eigenvectors; an efficient threshold is essential to get the best results. At the beginning, we decided to keep the first 11 eigenvectors (as described in the theory: 11 for 12 database pictures). We then planned tests to find which distribution is best, i.e. a threshold value that gives good results while efficiently decreasing the calculation time.
In the end, however, we decided to keep just three images in the data set (after the tests performed in chapter 4). We therefore kept all the eigenvectors, because three eigenvectors is a really small number in any case.
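A sketch of this step, still under the small-matrix assumption above (L and Xc come from the previous sketches); the eigenvectors of L are sorted by decreasing eigenvalue and mapped back to pixel space to obtain the eigenpictures:

[V, D] = eig(L);                              % eigenvectors (columns of V) and eigenvalues of L
[lambda, order] = sort(diag(D), 'descend');   % sort the eigenvalues in decreasing order
V = V(:, order);

keep = lambda > 1e-10;                        % drop the numerically zero eigenvalues
E = Xc' * V(:, keep);                         % eigenpictures: one column per kept eigenvector
for j = 1:size(E, 2)
    E(:, j) = E(:, j) / norm(E(:, j));        % normalize each eigenpicture
end
% With 12 database pictures this gives at most 11 useful eigenvectors; a threshold on
% lambda could be applied here, but with only 3 pictures we simply keep them all.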
3.3.5 Step 5: Realize the new data set and compare
In this step, we build the new data set by saving the new matrix of eigenpictures and expressing each image of the database with the principal eigenvectors (a simple scalar product between each kept eigenvector and the image). For each database image we save the coefficients in front of each eigenvector, so we have as many coefficients as eigenvectors.
In the end, this is a way to express each image with the calculated eigenvectors. The image to analyze is expressed with these eigenvectors too. With these coefficients, we can compare the images with each other by comparing the coefficients (we compute the Euclidean distance between the coefficient vectors). The results returned are quite good, as we can see in chapter 4.
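A sketch of the projection and of the comparison; E, Xc and m come from the previous sketches, and new_img is a hypothetical new picture already deployed as a 1 x npix row vector:

W = Xc * E;                            % database coefficients: one row of weights per database picture
w = (new_img - m) * E;                 % coefficients of the picture to analyze

d = zeros(size(W, 1), 1);
for i = 1:size(W, 1)
    d(i) = norm(W(i, :) - w);          % Euclidean distance between the coefficient vectors
end
[dmin, best] = min(d);                 % 'best' is the index of the nearest database picture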
3.3.6 Conclusion on this method
After having implemented this method, we understand how a huge set of images can be strongly compressed with little loss. Moreover, we have seen another view of image analysis, and it is really interesting to compare the two kinds of methods, which is what we do in the next chapter (4): Tests, results and analysis. The results returned are quite good, but we can easily imagine that this method would work better with centered images, because the position of the gesture in the picture really matters. We can understand that the second name of this method, Eigenface, was not chosen by chance: it should work better with faces than with hands, because a face is easier to center in the picture (by centering the mouth and the eyes).
Chapter 4
Tests, results and analysis
In this chapter, we explain the different tests made and the results returned. It gives a kind of tutorial for each method and can help people choose one or the other method depending on the application they want to create. We also explain the drawbacks of each method and the technical reasons behind them.
In a first part, we present the application realized and give its complete script, and we explain how to use it. After that, we present the different choices made for each method and justify them with tests. Then, we make a simple comparison of the different methods and draw graphs of the results of each method in different conditions.
4.1 The application: Rock Paper Scissors Game!
After having implemented the different methods for gesture recognition, we decided to implement a small application using them.
The first idea that came to our mind was to realize a simple game that everybody knows: the Rock-Paper-Scissors game! It was the best way to test our gesture recognition scripts with some fun. Moreover, as everybody knows the game, it is really easy and comfortable for other people to test the scripts and the recognition level of each method. The application has a GUI for a better interface with the player. We tried to make an easy-to-use application, with very few actions needed to build the database or to play the recognition game.
The GUI is shown on figure 4.1.
Figure 4.1: Photo of the application realized
This screen shot shows the GUI of the application and an overview of the different options proposed by the game.
Next, we show a photo of the environment we built (PC and web camera) to take good pictures for a better analysis. It is important to know which environment we used to take these pictures, because it is directly related to the results returned. The black background, for example, is really important. The environment looks quite basic, so it is pretty easy for someone to build their own working space and use this script.
We then explain each button and what the application can do. But first, the working space is shown on figure 4.2:
Figure 4.2: Photo of the working space realized for the gesture recognition applica-
tion
This photo shows the working space built for the gesture recognition. We used a Philips camera on a tripod. We cut a wooden board and painted it black to obtain a better background. Paint is surely not the best way to get a uniform background, but it was what we had: even though we took a matt paint, the different lighting sets reflect directly on the paint and therefore influence the results. That is why we do some processing on the pictures before the analysis. The best background would be a piece of fabric, which is much more matt.
So far, everything is easy to set up. What is a bit more hazardous is to have a camera recognized by MATLAB; we were lucky and had one.
Now we can look at the application in detail on figure 4.3, where we explain each function and why we realized it like that.
Figure 4.3: Explanation of the application realized
So, we will explain more precisely the different buttons and their purpose:
The Start/Stop button: as indicated, it starts or stops the camera. The aim is to free memory by stopping the camera when the application is not used. It also enables the preview of your gestures. The Start button must have been pressed before starting any comparison.
The preview window: this window is only used for the preview of your gesture. You just have to click on the Start button to get the preview.
The nearest database picture window: this small window indicates which image of the data set is recognized. It is really useful when you have several pictures of each gesture: you know which picture is recognized, which helps to understand how the recognition works.
The text field: it gives the different indications to the user, which is really useful for the data set creation. It tells directly which gesture to make, how long to wait, or whether you won or lost.
The confirmation-of-recognition buttons: these buttons are especially made for testing. After having launched a method and seen the result, you can click on yes or no to say whether your gesture was recognized or not. The application then automatically counts how many gestures were found, which is really convenient for long test series.
The player's and computer's position windows: these two windows show the pictures analyzed for the recognition. They display the picture of the gesture you just made and the random computer gesture, giving a quick overview of the results and making the game more attractive.
The text field for the score: here you can see the score between the user and the computer, and the user can also read the gesture recognized. When "Scissors against Rock" is written, it means that the application recognized a Scissors position for the user and that the (random) computer gesture is Rock. The first gesture written is the position of the nearest data set picture recognized.
The Reset Score button: it simply resets the score.
The Load Eigenface Matrix button: when you launch the application, you can load these huge matrices whenever you want, so that you do not have to wait for them if you only want to use the gradient histogram method. Just remember that the Eigenface method will not work before these matrices are loaded. This button exists to give quicker access to the GUI.
The Game Database Creation button: it creates a new database. It takes pictures of the user and calculates all the matrices automatically. The aim of this button is to build a new set of pictures (database) for each user; the results are better when users make their own set.
The Compare with Eigenface button: it simply launches the script that analyzes the new picture with the Eigenface method. You cannot use it before the Eigenface matrices are loaded. Once the matrices are loaded, the method runs really fast.
The Compare with Gradients button: it launches the gradient histogram method. This method takes a bit longer than the Eigenface one, but you do not have to wait for any matrix loading: you can use it directly once the GUI is opened.
The Compare with simple sub button: this button launches the first method implemented, the simple subtraction one. It is in the game only to show that it does not recognize well; you can test it to get an overview of its results.
So, we have explained the different buttons of the application and what they do. We have also seen the working space built for this project. Now, it is important to see which results we obtained for each method.
If you are interested in the script itself, take a look at Appendix C.
4.2 Tests and choices of the parameters
In this section, we take a closer look at the different tests made to explain the choices in the different methods' scripts.
During this project, we had to make multiple choices that influenced the results. We ran tests to validate these choices, so that we are sure the different paths we took go in favor of better results. For both methods, we drew some diagrams to explain the results.
4.2.1 Choice of the size of the derivative filter and of the number of boxes for the gradients method
It is important to note that all the pictures had the same size before doing any comparison: we only used 640 x 480 pictures. We fixed the image size because it influences the results of the tests (the size of the derivative filter is in pixels, so a circle of 3 pixels on a 50 x 30 image does not have the same effect as the same circle on a 640 x 480 picture).
In this part, we show how we chose the size of the derivative filter for the gradient method, as well as the number of bins (or boxes) used to count the different orientations in the histogram. Note that in all our cases we chose a circular derivative filter.
Before looking at the graphs, we must say that these tests were made with different positions: we ran them at the beginning of the project, when the positions to recognize were the 5 hand positions from 1 to 5. It is a good thing that the tests were made on these positions, because they give more information (the positions are more precise and more difficult to recognize, so the influence of the number of boxes or of the derivative filter size has more impact).
We can first look at the graphs of the Euclidean distance between gestures of the same class (1 to 5) on figure 4.4, as a function of the number of bins and of the derivative filter size.
Figure 4.4: Graph of the Euclidean distance between the "1" gestures themselves. This graph plots the Euclidean distance on the y-axis; the x-axis enumerates the number of boxes combined with the size of the derivative filter (18 boxes/filter 3, 18 boxes/filter 6, 18 boxes/filter 12, 36 boxes/filter 3, ..., up to 72 boxes/filter 12).
As we can see on this figure, the lowest Euclidean distance is obtained with 36 bins. If we then average the different intra-class distances, the best choice to obtain a minimum Euclidean distance is a derivative filter of 6.
To confirm this, we have to look at figures 4.5 and 4.6, which show the distances of the other positions between themselves.
Figure 4.5: Graphs of the Euclidean distance between the "2" gestures and between the "3" gestures themselves. These two graphs plot the Euclidean distance on the y-axis and the number of boxes combined with the derivative filter size on the x-axis, as in the first graph.
Figure 4.6: Graphs of the Euclidean distance between the "4" gestures and between the "5" gestures themselves. Same layout as the first graph.
With these graphs we can definitely say that, to obtain the lowest Euclidean distance between positions of the same class, we have to choose 36 bins, and that is what we choose for the application. Now we look at the graphs of the Euclidean distance between one class and another. The "1" class against the other classes is shown on figure 4.7.
Figure 4.7: Graph of the Euclidean distance between the "1" class and the other classes. This graph plots the Euclidean distance on the y-axis and the other classes on the x-axis; each curve corresponds to a number of boxes and a derivative filter size.
What is important to notice here is not the highest Euclidean distance, but how high the Euclidean distance within a class (intra-class) is compared with the Euclidean distance between the classes.
We only show one graph, because all the graphs look the same; we chose the first one (class "1" against the others).
Between the "1" position and the "2" position, the mean Euclidean distance is around 0.4 (all the distances are normalized, so the maximum distance between two classes is 2, reached when the two normalized vectors are opposite: for unit vectors u and v, ||u - v||^2 = 2 - 2(u.v), which is at most 4). Within the "1" class (intra-class), the Euclidean distance is also around 0.4. Moreover, for the other positions, the inter-class and intra-class distances are similar as well.
What does this mean? It means that our set of pictures is too close: the distances between the classes are the same as the distances within the classes. Therefore, we changed the data set and took other pictures that are much farther apart when the class changes. We found that the positions Rock, Paper and Scissors corresponded to what we wanted, while also making a good application.
In any case, we can still say that 36 bins is best, because it gives better results in every situation. With 18 bins the different histograms are too close, because too many orientations fall into the same bin; with 72 bins the orientations are too spread out. Neither of these orientation histograms is good: in one case we get big peaks, in the other a histogram that is too flat.
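As a sketch of how such a histogram of oriented gradients can be filled (Gx, Gy and Gmag are the gradient components and magnitude, for example computed as in the sketch of section 3.2.6; the magnitude threshold rule follows the description given for the lighting tests in section 4.3.2, but the 0.2 factor and the variable names here are our own assumptions):

nbins = 36;                                       % number of orientation boxes
theta = atan2(Gy, Gx);                            % gradient orientation in [-pi, pi]
bin = floor((theta + pi) / (2*pi) * nbins) + 1;   % box index for each pixel
bin(bin > nbins) = nbins;                         % keep the theta = pi case in the last box

t = min(Gmag(:)) + 0.2 * (max(Gmag(:)) - min(Gmag(:)));   % threshold between min and max magnitude
H = zeros(1, nbins);
for k = 1:numel(Gmag)
    if Gmag(k) > t
        H(bin(k)) = H(bin(k)) + 1;                % add 1 to the box of that orientation
    end
end
H = H / norm(H);                                  % normalized histogram, so class distances stay in [0, 2]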
Now, to confirm that a circular derivative filter of 6 is the best, we made other tests with the new data set of the three positions Rock, Paper and Scissors. They are shown on figure 4.8.
Figure 4.8: Percentage of recognition with different derivative filters (all circular). This graph shows a histogram of the recognition percentage for different derivative filters. See also A.1, A.2 and A.3 in Appendix A.
On this graph, we can see that the circular derivative filter of 6 gives the best recognition. So we are sure that we have to choose 36 bins and a derivative filter of 6 to obtain the best results.
4.2.2 Choice of the number of pictures and of the image size for the data set, for both methods
In this part, we show how we chose the number and the resolution of the pictures for the data set. We made different tests to justify our choices and studied the resulting diagrams.
First we can look at the calculation times for the Eigenface method and for the Gradients method on figure 4.9.
Figure 4.9: Creation and loading time as a function of the data set size and of the image resolution. This graph shows a histogram of the time needed to create the database and to load the matrices in the application. See also B.1, B.2, B.3 and B.4 in Appendix B.
On this graph, we can clearly see that the calculation time depends on the resolution of the images and also on the number of pictures in the data set. What is really interesting is the influence of increasing the resolution versus increasing the number of images.
When we double the resolution, the calculation time increases much more than twofold, whereas when we double the number of pictures the growth is linear. From a low resolution of 320 x 240 with three pictures to a higher resolution of 640 x 480 with twelve pictures, the total calculation time is about 13.5 times larger. Therefore, we have to find the best compromise to avoid losing time.
To make our choice, we have to look at the recognition results for each of these cases, shown on the recognition test graph, figure 4.10.
Figure 4.10: Recognition of each position as a function of the data set size and of the image resolution. This graph shows a histogram of the recognition of each position; it depends on the image resolution and on the number of images in the data set. See also B.1, B.2, B.3 and B.4 in Appendix B.
What catches the eye when we look at this graph is the small difference in recognition between large data sets (in terms of number of pictures or resolution) and small ones.
First of all, we can see that the Eigenface method always works better at the lower resolution. This is easy to explain: the Eigenface method is based on the pixel positions. As we have seen before, we only compress the dimensions of the original space (whose dimension equals the number of pixels in the images). Therefore, the position of the hand in the picture is really important, and with a lower resolution the lighting set and the position of the hand have less influence, because the resolution is less precise. That is why the Eigenface method gives much better results at a lower resolution, and we conclude that the best choice for the Eigenface method is the lower resolution.
Then, we can see that the recognition is almost the same whatever the number of pictures we take. Therefore, we chose the configuration with the lowest calculation time, which is 3 pictures.
Now, we have to study the same influence on the Gradients method. In fact, the application uses the same data set for both methods (we do not create two different databases, for calculation time reasons but also for the user's convenience). Figure 4.11 shows the differences in the time needed to perform the comparison with the gradients method.
Figure 4.11: Creation and comparison time as a function of the data set size and of the image resolution, for the gradients method. This graph shows a histogram of the time needed to create the database and to compare the user's gesture. See also B.3 and B.4 in Appendix B.
On this figure, we can immediately see that the comparison time is almost always the same, whatever the number of pictures or the resolution: it only increases by one second from 3 pictures at 320 x 240 to 12 pictures at 640 x 480. The real increase in calculation time happens during the database creation: it goes from around 1.5 seconds to 15 seconds, so about 10 times more.
Now, we have to check the recognition results to decide which number of pictures and which resolution is best. These results are shown on figure 4.12.
At first sight, this graph is really surprising: all the pictures were recognized in every case. Does it mean that the size or the number of the pictures in the set does not influence the results?
In fact, it does. But two reasons explain why the recognition works so well here, whatever the number or the resolution of the pictures you take.
The first one is that our different gestures are now really different, and we only have 3 of them, which is not many. As our pictures are really different, the orientation histograms are also really far apart. Therefore, when you take a new photo of a gesture, the application easily recognizes which move you made. A look at the gradient orientation histogram of each position on figure 4.13 makes this clearer.
On this figure, we can clearly see the differences, and the histograms are pretty easy to read. For the Rock position, there are gradients in every direction, with a peak in the horizontal direction (due to the arm). In the Scissors position, we can clearly see the orientation of the fingers (a peak in the direction of the fingers, around 210 degrees), plus the peak in the horizontal direction due to the other finger. And for the Paper position, the principal direction is clearly horizontal.
Indeed, as the gradients indirectly represent the contrast in the pictures, these histograms are really easy to interpret.
Figure 4.12: Recognition of each position as a function of the data set size and of the image resolution, for the gradients method. This graph shows a histogram of the recognition of each position; here the image resolution and the number of images in the data set do not influence the results. See also B.3 and B.4 in Appendix B.
The second reason is that, to perform these tests, we took pictures that were close to the data set pictures (our aim was to test the influence of the number and size of the pictures, not of other factors). Therefore, the histograms are bound to be really close too. But we will see later that when the orientation of the hand changes, the method does not work so well (see the next section, 4.3, for further tests on each method).
To conclude, when the hand position is good (well centered), the recognition works pretty well for both methods, whatever the number of pictures in the data set. Concerning the resolution of the pictures, the results are better with the lowest one (for the Eigenface method).
Figure 4.13: Orientation histograms for each new position. On this graph, each histogram is drawn under its picture; the differences between them are huge. Each histogram is made from a picture with a resolution of 320 x 240.
Therefore, we will not slow the application down (through calculation time) for nothing: we take a data set of three pictures at the lower resolution. In the end, we obtain a quite similar efficiency, and the application runs much faster.
4.3 Last tests to explain the efficiency of each method
In this section, we try to determine the best recognition conditions for each method and to find its limits. We first run and explain general basic tests on each method to say which one is the best in our case, then perform exhaustive tests to find the limits of each method. To conclude, we draw a summary table saying which method to use in which conditions.
4.3.1 First tests: recognition percentage of each method in general
In this part, we perform further tests on the efficiency of each method in our case: recognition between three hand positions (Rock, Scissors, Paper).
First of all, we look at a summary graph on figure 4.14, which displays the results of each implemented method under normal use (gestures centered but not perfect).
On this graph, we can easily compare the different methods and see which one gives the best results. "First recognition" means that the gesture was directly recognized; "second recognition" means that the second picture returned by the application is the good position (while the first position returned is wrong). We can therefore rank the methods by general efficiency:
The first one is the gradients method with a derivative filter of 6, which is clearly the expected result. This method reaches a success rate of 88 percent in general use, which is pretty good.
The second one is the Eigenface method, which was expected too. We can say here that the Eigenface method is more efficient than a badly calibrated gradients method. This is really interesting to know, because it means that to be really efficient you have to choose a good derivative filter for the gradients method. The Eigenface method reaches 80 percent of good recognitions in general use.
Then, the two badly calibrated gradients methods give the same results (76 percent of good recognitions for both). So even with a bad calibration, the results of the gradients method are not so bad.
The last one is the simple subtraction. This is quite normal, because a small translation has a big influence on the results. It reaches 44 percent, which is not as bad as one could have expected; we explain these results more precisely below.
Now that we have seen the results of each method in general use, it is useful to know which gestures are difficult to recognize for which method. We also noted the recognition percentage for each gesture, which gives further explanations.
Figure 4.14: Results for each method (out of 35 pictures). The gradient method with a filter of 6 has the best efficiency. See also A.1, A.2, A.3, A.4 and A.5 in Appendix A.
Indeed, the second graph of figure 4.14 shows the recognition percentages for each position.
At first glance, the simple subtraction has a recognition percentage of zero for the Rock position but of one hundred for the Paper position. This means that almost all the pictures are recognized as Paper (see table A.5 in Appendix A), because it is the position that covers the pixels of all the other positions. We can now understand much better why the first-recognition percentage of this method reaches 44 percent. In this particular case, this graph is really informative.
Moreover, in most cases the most difficult position to recognize is the Scissors one, which is understandable since it lies between the two other gestures. So, when the user does not make a good Scissors position (with the fingers around 210 degrees), it gets very close to the Paper position and Paper is recognized instead of Scissors (for all the methods). That is why the recognition works well when the user makes clean positions.
4.3.2 Second tests: recognition percentage of each method in different conditions
In this section, we run different tests in extreme conditions to see how robust the recognition scripts are. We perform four different tests to compare their robustness:
Test with a different lighting set.
Test with a rotation of the hand position.
Test with a translation of the gesture in the picture.
Test with different hands: to see whether the results are still satisfying with a user who did not make the data set.
Different lighting set:
We performed some tests on the two principal scripts (Eigenface and Gradients) under a different lighting set, to determine whether the scripts are robust or not and whether they depend on the external lighting.
Given the way we implemented the gradients method (a bin is incremented by 1 only when the gradient magnitude is above a threshold, and the threshold depends on the maximum and minimum magnitudes in the image), we can suppose that the lighting set will not influence the results. On the other hand, for the Eigenface method, which depends on the pixel values, a different lighting set could be critical.
The two different lighting sets used for these tests are shown on figure 4.15.
On this picture, we can see that the chosen lighting set was really dark. The aim was to test the algorithms in extreme conditions, to know whether they are really robust or not. It is important to note that we can still see the position of the user and that there is still some contrast between the hand and the background, even if it is quite reduced.
The results for the two main methods are shown on figure 4.16.
We notice that the gradients method still gives really good results under an extreme lighting set: it reaches 93 percent of good recognition, which is better than expected. Since the gradients method was designed not to take external lighting changes into account, some contrast is enough to obtain good results. The user still has to make clear positions to help the recognition: when the position is not clearly defined, the method does not work as well.
What we also noticed is that the Eigenface method does not work anymore: an extreme lighting change is too hard for it. It does not recognize any Rock position, and it only reaches 50 percent of good recognition for the Scissors and Paper positions.
It is not surprising that the Paper position gives the best results, because it is the easiest position to recognize (the hand takes up a good part of the picture). But we can wonder why the Rock position is not recognized at all. An explanation is that the pixels lit under this lighting are the ones closest to the Scissors position: since we take the principal components in the Eigenface method, the lit principal vectors are really similar to the principal ones of the Scissors database picture.
We applied a normalization to the data set before doing any comparison (we used the contrast adjustment of MATLAB), but the results are still bad.
Figure 4.15: Conditions of the different-lighting-set test. We can clearly see on this screen shot that the lighting set of the database is really different from the lighting set used for the tests.
To conclude on these tests, the Gradients method is really robust to different lighting sets and is the method to use in this kind of conditions, whereas the Eigenface method should be excluded.
Different rotations of the gesture:
Figure 4.16: Results of the different-lighting-set test. This graph shows that the Gradients method still works well in extreme lighting conditions. See also B.5 in Appendix B.
In this series of tests, we tried to determine the strength of the scripts when the hand is rotated. We made two kinds of rotation: the gesture rotated 45 degrees to the left and 45 degrees to the right. We performed 10 tests of each position.
Figure 4.17 shows how we rotated the hand.
On these pictures, we can see the two orientations used for the tests. We can immediately see that the orientation can be a real problem, especially for the gradients method: since it works on the gradient orientations, rotating the hand produces a completely different histogram and the comparison becomes pretty hard. For the Eigenface method, we can also expect difficulties, because the pixels encountered will not be the same between the data set and the user's position.
Figure 4.17: Screen shots of the rotation tests. Here we can see two examples of how we rotated the gesture to perform the test.
The results are shown on the graphs of figure 4.18. Both methods reach 50 percent of good recognition, but not for the same positions and not for the same reasons. We will try to explain the results for both methods:
Eigenface method: here, the Rock position is always recognized. This is not surprising, because it is the position that always covers the same pixels: rotating the Rock position does not change its place in the picture, it stays centered and most of the pixels are the same as for the normal position.
Then, the Scissors position is never recognized (the Rock position is always returned instead). Most of the pixels in common belong to the middle of the hand (which is always centered), but the two fingers are somewhere else, so the closest position, the Rock one, is recognized.
Finally, the Paper position is recognized quite well. During the test execution, we noticed that when the rotation is not too strong there are enough pixels in common, but when the rotation is too strong the pixels of the fingers are no longer in the same place as in the data set, and the position is not recognized anymore.
Figure 4.18: Results of the rotation test. This graph shows that both methods have big difficulties with rotation changes. See also B.5 in Appendix B.
To conclude, the Eigenface method works well as long as most of the pixels stay in common; otherwise it does not work. Whatever the position (rotated or not), if there are enough pixels in common, the recognition script works.
Gradients method: with the gradients method, the situation is quite different. The method takes the orientation of the gradients into account, which reflects the contrast in the picture, so when we rotate the gesture the histogram is completely different.
However, in the case of the Rock position, the method works well, simply because the Rock position is the only one whose histogram has no big peaks. This position is almost a circle, so the orientations are well distributed even when the gesture is rotated. That is why the Rock position is always recognized.
For the other gestures, it is really different. The Paper position is not recognized at all. Why? When we rotate the hand 45 degrees to the left, we get an orientation peak in that direction, and this is exactly what characterizes the Scissors position; the recognized gesture is therefore Scissors. When we rotate 45 degrees to the right, none of our positions has an orientation peak in that direction, so the closest one is taken, namely the one without any big peaks: the Rock one. We can now understand why the Paper position is never recognized.
We can also understand why we get 5 good recognitions in the Scissors case: when this position is rotated 45 degrees to the right, the Scissors position, i.e. the good one, is recognized; but when it is rotated in the other direction, the Rock position is recognized, as for the Paper gesture.
Therefore, the conclusion is that neither of the methods works with rotation changes; it is a weak point of both of them.
Two questions may come to mind:
Why not take 4 pictures of each position in different orientations, so that the recognition goes through?
The answer is that if we take more pictures and rotate the positions, the histograms within the same class become really different, and instead of 3 well-defined classes of positions we end up with the equivalent of 12 classes that are much closer together. The results could then suffer, because it becomes much more difficult to tell the difference between the Scissors position and the Paper rotated 45 degrees to the right. The method would therefore work worse in normal use; that is why we do not do this. We prefer good results in normal use to approximately good results in any use. Moreover, if we started doing things like that, we would have to take more and more pictures (because it would not work so well), and so on. In the end, we would just have a huge database, tiresome for the user to create and much longer to compare against.
Why not take the principal axis of orientation and compute the orientations relative to this axis?
This would be the next step of the work (it is something we would like to do, but we did not have enough time). The main difficulty is that there is not always a single principal axis (there can be two or three that are really close), so it would be hard to find the exact orientation of the gesture. But we are sure that there is a lot to do in this direction.
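As a possible starting point for this idea (it was not implemented in this work, so this is only a hedged sketch): one could estimate a dominant orientation from the histogram itself and circularly shift the bins so that this orientation always falls in the first box before comparing.

% H: normalized orientation histogram with nbins boxes, as in the sketch of section 4.2.1
[hmax, kmax] = max(H);                 % box of the strongest orientation
Ha = circshift(H, [0, 1 - kmax]);      % rotate the histogram so that this box becomes the first one
% Comparing Ha instead of H makes the descriptor insensitive to a global rotation of
% the hand, at the price of being sensitive to which orientation peak is the strongest.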
Translations of the gesture in the picture:
In this part we performed some tests about translation. Are the results affected when the hand is translated in the picture? Which method is the most robust to translation? These are the questions we want to answer in this paragraph.
To perform this test, we decided to take 5 positions per picture, as shown on figure 4.19.
Figure 4.19: Positions of the hand in the picture used to perform the tests
As we can see on the picture, we simply translated the position towards each part of the image, then performed the tests. The results are shown on figure 4.20.
On these results, we can see that the Gradients method is really robust to translation. This is not surprising: as we have seen above, this method is based on the contrast and the gradient orientations, so the position of the gesture in the picture matters little (the gradient orientations stay the same). We obtain more than 90 percent of good recognition in the translation cases for the Gradients method, against less than 50 percent for the Eigenface one.
Indeed, the Eigenface method, based on the position in the image, cannot recognize a position that has no pixels in common with the database, even if it is the same position. There are no big surprises here, but it was important to perform these tests to confirm our hypotheses.
To conclude on these tests, the Gradients method must be chosen when the position to recognize is not fixed and the user may translate it.
Figure 4.20: Results of the translation tests. This graph shows that the Gradients method works much better under translation changes. See also B.1, B.2, B.3 and B.4 in Appendix B.
Tests with different hands:
In this part, the questions to answer were: which of the two methods is more robust to a change of user? Is the user really important for the recognition or not? Could we avoid making each user build a new database adapted to himself?
We implemented tests that could allow us to answer these questions. First of all, figure 4.21 shows the two different hands that were compared; then we take a closer look at the results.
Figure 4.21: Screen shots of the other hands used to perform the tests. Here we can see the two hands used for the test: one is much bigger, with a different shape (especially in the Scissors position), and the other is much smaller.
To perform the tests, we used two kinds of hands: one much bigger and one much smaller. The results are shown on figure 4.22.
Figure 4.22: Results of the different-hands tests. This graph shows the results returned by the application when the user's hand changes without changing the data set. See also B.5 in Appendix B.
On this graph, what catches the eye is the really good results of the Gradients method and the lack of good results for the Scissors position with the Eigenface method. We can explain these results in three points:
First of all, the Gradients method normalizes the histogram. Therefore, the size of the hand in the picture does not influence the results, as long as the proportions of the hand are kept, and in most cases users do not have fingers twice as long as normal ones. This explains the pretty good results of the gradients method. Moreover, as translation is not a problem for this method, the returned position is almost always the good one (we have more than 93 percent of good results).
Then, we have good results for the Eigenface method for the two easiest positions, Paper and Rock. This is quite easy to understand, because these positions do not really depend on the shape of the hand: it is the same overall position for all hands. Moreover, the size of the hand is not so important, because there are enough pixels in common to find the position. That is why this method recognizes these positions quite well.
However, we have big problems with the Scissors position for the Eigenface method. For this gesture, the hand itself becomes really important: its shape and size clearly influence the results. The pixels in common are not the same when the hands are different, so the recognition does not work anymore (if the fingers make another position, or are just slightly translated, the Rock position becomes closer). In any case, the method more easily recognizes the Rock position, which gets closer when the hand is too different.
To conclude on these tests, the Eigenface method is not adapted to a change of hands, whereas the Gradients method returns pretty good results.
Chapter 5
Conclusion: advantages and drawbacks
In this report we have studied the theory and the implementation of the two main methods studied during my final project. We have seen how the implementation could be improved and which problems we encountered.
In this conclusion, we try to sum up the advantages of each method and to give the results of all the tests in a table. The aim is to have a quick overview of what each method can and cannot do. It can then help people who want to do some recognition to choose one or the other method depending on the application they want to build.
We draw this table with the following ratings:
- - - : really bad results
- - : bad results
- : not so bad, but not good
+ : not so good, but OK
+ + : good results
+ + + : excellent results
The criteria are:
Robustness to the lighting set.
Robustness to rotation.
Robustness to translation.
The calculation time.
The general recognition.
The robustness to different users.
Other applications (not studied here) with centered pictures but without good contrasts.
Let us draw the summary table:
Criterion / Gradients method / Eigenface method
Lighting set + + +
Rotation strength
Translation strength + + +
Calculation time (creation/loading) + +
Calculation time (compare) + + + +
General recognition + + + + +
Different users + + + +
Centered applications without good contrasts + + +
To conclude on the tests, the Eigenface method was not adapted to the application we wanted to realize. The Eigenface method needs centered pictures with more information in them. It would, for example, work much better on faces, as its name says, because before making any comparison the picture is normalized and centered so that comparable places in the image are compared. For a face it is pretty easy to re-center on the eyes and the mouth; it is then feasible to compare the pictures and to obtain good results, because all the faces are in the same position in the picture.
The Gradients method is more efficient for other kinds of applications. It needs no centering, because the place of what we want to analyze is not important. However, it is really hard to use this method with noise in the picture or without good contrast. This method is based on the contrasts in the picture and is therefore really efficient at recognizing objects or gestures in an image. That is why it works much better in our case.
If we had had more time, we would have developed these points:
The robustness to rotation.
The detection on any kind of background (which is much more difficult).
I hope that this project will be continued and that these points will be added to the first script I created during these four months.
To conclude, the work done here gave me real knowledge in the image domain. I implemented two different methods that allowed me to better understand the world of image analysis, and I had the chance to work on two different approaches to the image: the first one about gradient calculation and contrast analysis in an image, the second one more about algebraic solutions to compress the data and then compare images.
This work also gave me a real overview of how an engineer should tackle a problem: by researching the solution and learning in an area I did not know, and then by building a working application that concretely shows the work done.
Appendix A
Tables of general tests
In these tables we can see the general recognition tests made on all the methods. We can also read the distance to the next recognized class.
Figure A.1: Results of the general recognition test for the gradients method with a derivative filter of 3. This table shows, for each gesture, whether it was recognized and, if not, which position was recognized instead. We can also read the Euclidean distance between the returned picture and the analyzed one.
Figure A.2: Results of the general recognition test for the gradients method with a derivative filter of 6. Same table as above.
Figure A.3: Results of the general recognition test for the gradients method with a derivative filter of 12. Same table as above.
Figure A.4: Results of the general recognition test for the Eigenface method. Same table as above, for the Eigenface method.
Figure A.5: Results of the general recognition test for the simple subtraction method. Same table as above, for the simple subtraction method.
Figure A.6: Summary of the general recognition test results, in percentages. It gives a good overview of the different results.
Appendix B
Tables of specific tests
In this section, we show the tables of results for the specific tests: lighting set, rotation changes, translation changes, calculation time, and the different users' hands.
Figure B.1: Table of the results for the time tests and for the position tests as a function of the number and quality of the pictures, for the Eigenface method. It gives a good overview of the different results of the specific tests made on the two main methods.
Figure B.2: Table of the results for the time tests and for the position tests as a function of the number and quality of the pictures, for the Eigenface method.
Figure B.3: Table of the results for the time tests and for the position tests as a function of the number and quality of the pictures, for the Gradients method.
Figure B.4: Table of the results for the time tests and for the position tests as a function of the number and quality of the pictures, for the Gradients method.
Figure B.5: Table of the results for the lighting set, rotation and users' hand tests for both methods. Here we can see the results of these specific tests: lighting set, rotation, and a different user's hand.
Appendix C
Script of the game
The GAME script:
function varargout = Camprev2(varargin)
% CAMPREV2 M-file for Camprev2.fig
%   CAMPREV2, by itself, creates a new CAMPREV2 or raises the existing
%   singleton*.
%
%   H = CAMPREV2 returns the handle to a new CAMPREV2 or the handle to the
%   existing singleton*.
%
%   CAMPREV2('CALLBACK',hObject,eventData,handles,...) calls the local
%   function named CALLBACK in CAMPREV2.M with the given input arguments.
%
%   CAMPREV2('Property','Value',...) creates a new CAMPREV2 or raises the
%   existing singleton*. Starting from the left, property value pairs are
%   applied to the GUI before Camprev2_OpeningFcn gets called. An
%   unrecognized property name or invalid value makes property application
%   stop. All inputs are passed to Camprev2_OpeningFcn via varargin.
%
%   *See GUI Options on GUIDE's Tools menu. Choose "GUI allows only one
%   instance to run (singleton)".
%
%   See also: GUIDE, GUIDATA, GUIHANDLES

% Edit the above text to modify the response to help Camprev2

% Last Modified by GUIDE v2.5 29-May-2006 18:16:25

% Begin initialization code - DO NOT EDIT
gui_Singleton = 1;
gui_State = struct('gui_Name',       mfilename, ...
                   'gui_Singleton',  gui_Singleton, ...
                   'gui_OpeningFcn', @Camprev2_OpeningFcn, ...
                   'gui_OutputFcn',  @Camprev2_OutputFcn, ...
                   'gui_LayoutFcn',  [] , ...
                   'gui_Callback',   []);
if nargin && ischar(varargin{1})
    gui_State.gui_Callback = str2func(varargin{1});
end

if nargout
    [varargout{1:nargout}] = gui_mainfcn(gui_State, varargin{:});
else
    gui_mainfcn(gui_State, varargin{:});
end
% End initialization code - DO NOT EDIT
% --- Executes just before Camprev2 is made visible.
function Camprev2_OpeningFcn(hObject, eventdata, handles, varargin)
% This function has no output args, see OutputFcn.
% hObject    handle to figure
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% varargin   command line arguments to Camprev2 (see VARARGIN)

% Choose default command line output for Camprev2

% Create video object
% Putting the object into manual trigger mode and then
% starting the object will make GETSNAPSHOT return faster
% since the connection to the camera will already have
% been established.
handles.video = videoinput('winvideo', 1, 'I420_640x480');
set(handles.video,'ReturnedColorSpace','rgb');
set(handles.video,'TimerPeriod', 0.05, ...
    'TimerFcn',['if(~isempty(gco)),'...
        'handles=guidata(gcf);'...                            % Update handles
        'image(getsnapshot(handles.video));'...               % Get picture using GETSNAPSHOT and put it into axes using IMAGE
        'set(handles.CamAxes,''ytick'',[],''xtick'',[]),'...  % Remove tickmarks and labels that are inserted when using IMAGE
    'else '...
        'delete(imaqfind);'...                                % Clean up - delete any image acquisition objects
    'end']);
triggerconfig(handles.video,'manual');
handles.Coefmat = 0; handles.Eig_FacT = 0;

% Update handles structure
guidata(hObject, handles);

% UIWAIT makes Camprev2 wait for user response (see UIRESUME)
uiwait(handles.Camprev2);

% --- Outputs from this function are returned to the command line.
function varargout = Camprev2_OutputFcn(hObject, eventdata, handles)
% varargout  cell array for returning output args (see VARARGOUT);
% hObject    handle to figure
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)

% Get default command line output from handles structure
handles.output = hObject; varargout{1} = handles.output;
% --- Executes on button press in Eigbutton.
function Eigbutton_Callback(hObject, eventdata, handles)
% hObject    handle to Eigbutton (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
set(handles.Textedit2,'String','');
if strcmp(get(handles.Eigbutton,'String'),'Compare with Eigenfaces')
    % Compare is off. Change button string and start to compare.
    set(handles.Eigbutton,'String','Lets go!')
    axes(handles.PlayerPos)
    im1 = getsnapshot(handles.video);
    stop(handles.video)
    Coefmat  = handles.Coefmat;
    Eig_FacT = handles.Eig_FacT;
    load score_comp.txt -ascii;
    load score_player.txt -ascii;
    comp_pos  = floor(11.9*rand)+1;
    comp_pos2 = imread(sprintf('D://moi//La_Totale//Images//jeu//base//%d.jpg',comp_pos));
    image_index = input('Enter the number of the image to identify?');

    % Load the hand gesture image set
    ima1 = im1;
    image_data = im1;
    image_data = rgb2gray(image_data);
    image_data = imadjust(image_data);
    h = ones(5,5)/25;
    image_data = imfilter(image_data,h);
    xim = image_data(:);

    % total number of images and number of pixels in image
    [nImages,nPixels] = size(Coefmat);
    % size of image (they all should have the same size)
    imsize = size(image_data);
    % convert to double and normalize
    xim = double(xim)/255;

    % Find similar hand gestures, variable image_index defines the hand
    % gesture used in the comparison
    ximt = xim';
    New_Coef = ximt*Eig_FacT;
    for i = 1:nImages
        Y = New_Coef;
        Z = Coefmat(i,:);
        Result_comp(i) = disteuc3(Y,Z);
    end
    [euc_dis,sorted_index] = sort(Result_comp);

    axes(handles.PlayerPos)
    imshow(ima1);
    axis image;
    axis off;
    axes(handles.CompPos)
    imshow(comp_pos2);
    axis image;
    axis off;
    if sorted_index(1)>=1 && sorted_index(1)<=4 && comp_pos>=1 && comp_pos<=4
        text(-450,550,'')
        text(-450,590,'Rock against Rock Ex-aequo')
        text(-450,630,'')
        text(-450,650,'')
        score_comp = score_comp+0;     text(-450,690,strcat('The computer score is : ', num2str(score_comp)))
        score_player = score_player+0; text(-450,760,strcat('The Player score is : ', num2str(score_player)))
        text(-450,800,'')
        set(handles.Textedit2,'String','EX-AEQUO');
    elseif sorted_index(1)>=1 && sorted_index(1)<=4 && comp_pos>=5 && comp_pos<=8
        text(-450,550,'')
        text(-450,590,'Rock against Scisors Player wins')
        text(-450,630,'')
        text(-450,650,'')
        score_comp = score_comp+0;     text(-450,690,strcat('The computer score is : ', num2str(score_comp)))
        score_player = score_player+1; text(-450,760,strcat('The Player score is : ', num2str(score_player)))
        text(-450,800,'')
        set(handles.Textedit2,'String','YOU WIN');
    elseif sorted_index(1)>=1 && sorted_index(1)<=4 && comp_pos>=9 && comp_pos<=12
        text(-450,550,'')
        text(-450,590,'Rock against Paper Computer wins')
        text(-450,630,'')
        text(-450,650,'')
        score_comp = score_comp+1;     text(-450,690,strcat('The computer score is : ', num2str(score_comp)))
        score_player = score_player+0; text(-450,760,strcat('The Player score is : ', num2str(score_player)))
        text(-450,800,'')
        set(handles.Textedit2,'String','YOU LOOSE');
    elseif sorted_index(1)>=5 && sorted_index(1)<=8 && comp_pos>=1 && comp_pos<=4
        text(-450,550,'')
        text(-450,590,'Scisors against Rock Computer wins')
        text(-450,630,'')
        text(-450,650,'')
        score_comp = score_comp+1;     text(-450,690,strcat('The computer score is : ', num2str(score_comp)))
        score_player = score_player+0; text(-450,760,strcat('The Player score is : ', num2str(score_player)))
        text(-450,800,'')
        set(handles.Textedit2,'String','YOU LOOSE');
    elseif sorted_index(1)>=5 && sorted_index(1)<=8 && comp_pos>=5 && comp_pos<=8
        text(-450,550,'')
        text(-450,590,'Scisors against Scisors Ex-aequo')
        text(-450,630,'')
        text(-450,650,'')
        score_comp = score_comp+0;     text(-450,690,strcat('The computer score is : ', num2str(score_comp)))
        score_player = score_player+0; text(-450,760,strcat('The Player score is : ', num2str(score_player)))
        text(-450,800,'')
        set(handles.Textedit2,'String','EX-AEQUO');
    elseif sorted_index(1)>=5 && sorted_index(1)<=8 && comp_pos>=9 && comp_pos<=12
        text(-450,550,'')
        text(-450,590,'Scisors against Paper Player wins')
        text(-450,630,'')
        text(-450,650,'')
        score_comp = score_comp+0;     text(-450,690,strcat('The computer score is : ', num2str(score_comp)))
        score_player = score_player+1; text(-450,760,strcat('The Player score is : ', num2str(score_player)))
        text(-450,800,'')
        set(handles.Textedit2,'String','YOU WIN');
    elseif sorted_index(1)>=9 && sorted_index(1)<=12 && comp_pos>=1 && comp_pos<=4
        text(-450,550,'')
        text(-450,590,'Paper against Rock Player wins')
        text(-450,630,'')
        text(-450,650,'')
        score_comp = score_comp+0;     text(-450,690,strcat('The computer score is : ', num2str(score_comp)))
        score_player = score_player+1; text(-450,760,strcat('The Player score is : ', num2str(score_player)))
        text(-450,800,'')
        set(handles.Textedit2,'String','YOU WIN');
    elseif sorted_index(1)>=9 && sorted_index(1)<=12 && comp_pos>=5 && comp_pos<=8
        text(-450,550,'')
        text(-450,590,'Paper against Scisors Computer wins')
        text(-450,630,'')
        text(-450,650,'')
        score_comp = score_comp+1;     text(-450,690,strcat('The computer score is : ', num2str(score_comp)))
        score_player = score_player+0; text(-450,760,strcat('The Player score is : ', num2str(score_player)))
        text(-450,800,'')
        set(handles.Textedit2,'String','YOU LOOSE');
    elseif sorted_index(1)>=9 && sorted_index(1)<=12 && comp_pos>=9 && comp_pos<=12
        text(-450,550,'')
        text(-450,590,'Paper against Paper Ex-aequo')
        text(-450,630,'')
        text(-450,650,'')
        score_comp = score_comp+0;     text(-450,690,strcat('The computer score is : ', num2str(score_comp)))
        score_player = score_player+0; text(-450,760,strcat('The Player score is : ', num2str(score_player)))
        text(-450,800,'')
        set(handles.Textedit2,'String','EX-AEQUO');
    else
    end
    save('D:/moi/La_Totale/jeu/Gradients/36box/score_comp.txt','score_comp','-ASCII');
    save('D:/moi/La_Totale/jeu/Gradients/36box/score_player.txt','score_player','-ASCII');
    set(handles.No,'String','No')
    set(handles.Yes,'String','Yes')
    axes(handles.Picrec)
    imshow(imread(strcat('D://moi//La_Totale//Images//jeu//base//',num2str(sorted_index(1)),'.jpg')))
    set(handles.Eigbutton,'String','Compare with Eigenfaces')
    axes(handles.CamAxes)
    start(handles.video)
    return
else
    % Compare is on. Wait.
end
% --- Executes on button press in Gradbutton.
function Gradbutton_Callback(hObject, eventdata, handles)
% hObject    handle to Gradbutton (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
set(handles.Textedit2,'String','');
% Compare getsnapshot
if strcmp(get(handles.Gradbutton,'String'),'Compare with Gradients')
    % Compare is off. Change button string and start to compare.
    set(handles.Gradbutton,'String','Lets go!')
    tic
    axes(handles.PlayerPos)
    Im1 = getsnapshot(handles.video);
    ima1 = Im1;
    stop(handles.video)
    load xRadius.txt -ascii;
    load score_comp.txt -ascii;
    load score_player.txt -ascii;
    MATDIS = load(strcat('D:/moi/La_Totale/Data/',num2str(xRadius),'-',num2str(xRadius),'/MATDIS',num2str(xRadius),'.txt'),'-ASCII');
    comp_pos  = floor(11.9*rand)+1;
    comp_pos2 = imread(sprintf('D://moi//La_Totale//Images//jeu//base//%d.jpg',comp_pos));
    b = input('Enter the number of the image to identify?');
    b1 = 1;
    DIS = zeros(1,12);
    Im1 = im2double(Im1);
    Im1 = rgb2gray(Im1);
    Im1 = imadjust(Im1);
    Im1 = log(Im1);
    h = ones(5,5)/25;
    Im1 = imfilter(Im1,h);
    X = grador2(Im1, xRadius, xRadius, 'same');
    for a = 1:12
        Y = MATDIS(a,:);
        D = disteuc3(X,Y);
        % Build the vector of Euclidean distances
        DIS(b1) = D;
        b1 = b1+1;
    end
    DIS;
    [DAS,sorted_index] = sort(DIS);

    axes(handles.PlayerPos)
    imshow(ima1);
    axis image;
    axis off;
    axes(handles.CompPos)
    imshow(comp_pos2);
    axis image;
    axis off;

    % Then follows the same win/lose display logic as in the Eigenface method above.

    save('D:/moi/La_Totale/jeu/Gradients/36box/score_comp.txt','score_comp','-ASCII');
    save('D:/moi/La_Totale/jeu/Gradients/36box/score_player.txt','score_player','-ASCII');
    set(handles.No,'String','No')
    set(handles.Yes,'String','Yes')
    axes(handles.Picrec)
    imshow(imread(strcat('D://moi//La_Totale//Images//jeu//base//',num2str(sorted_index(1)),'.jpg')))
    set(handles.Gradbutton,'String','Compare with Gradients')
    axes(handles.CamAxes)
    start(handles.video)
    t4 = toc
    set(handles.Textedit3,'String',num2str(t4));
    return
else
    % Compare is on. Wait.
end
% --- Executes on button press in StartStopCamera.
function StartStopCamera_Callback(hObject, eventdata, handles)
% hObject    handle to StartStopCamera (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)

% Start/Stop Camera
if strcmp(get(handles.StartStopCamera,'String'),'Start Camera')
    % Camera is off. Change button string and start camera.
    axes(handles.CamAxes)
    set(handles.StartStopCamera,'String','Stop Camera')
    start(handles.video)
else
    % Camera is on. Stop camera and change button string.
    set(handles.StartStopCamera,'String','Start Camera')
    stop(handles.video)
end
% --- Executes on button press in Datbutton.
function Datbutton_Callback(hObject, eventdata, handles)
% hObject    handle to Datbutton (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
if strcmp(get(handles.StartStopCamera,'String'),'Stop Camera')
    stop(handles.video)
end
set(handles.Datbutton,'String','In process...')
axes(handles.CamAxes)
start(handles.video)
set(handles.StartStopCamera,'String','Stop Camera')
set(handles.Textedit1,'String','We are going to realize the database (4 pics of each position).');
set(handles.Textedit2,'String','Lets take the Rock position pics.');
pause(5);
for i = 1:4
    while 1
        % Database of Rock
        set(handles.Textedit3,'String',strcat('Rock: Pic ',num2str(i),'. Center your hand on the blackboard (ROCK position) and wait 3s.'));
        pause(1);
        im1 = getsnapshot(handles.video);
        if i==1
            axes(handles.Pic1);
        elseif i==2
            axes(handles.Pic2);
        elseif i==3
            axes(handles.Pic3);
        else
            axes(handles.Pic4);
        end
        imshow(im1);
        stop(handles.video)
        choix = menu('Is the pic well centered?','Yes','No');
        if choix==1
            imwrite(im1,strcat('D://moi//La_Totale//Images//jeu//base//',num2str(i),'.jpg'));
            axes(handles.CamAxes)
            start(handles.video)
            break
        else
            axes(handles.CamAxes)
            start(handles.video)
        end
    end
end
set(handles.Textedit2,'String','Lets take the Scisors position pics.');
set(handles.Textedit3,'String','');
pause(1);
for k = 5:8
    while 1
        % Database of Scisors
        set(handles.Textedit3,'String',strcat('Scisors: Pic ',num2str(k),'. Center your hand on the blackboard (SCISORS position) and wait 3s.'));
        pause(0.5);
        im1 = getsnapshot(handles.video);
        if k==5
            axes(handles.Pic5);
        elseif k==6
            axes(handles.Pic6);
        elseif k==7
            axes(handles.Pic7);
        else
            axes(handles.Pic8);
        end
        imshow(im1);
        stop(handles.video)
        choix2 = menu('Is the pic well centered?','Yes','No');
        if choix2==1
            imwrite(im1, strcat('D://moi//La_Totale//Images//jeu//base//',num2str(k),'.jpg'));
            axes(handles.CamAxes)
            start(handles.video)
            break
        else
            axes(handles.CamAxes)
            start(handles.video)
        end
    end
end
set(handles.Textedit2,'String','Lets take the Paper position pics.');
set(handles.Textedit3,'String','');
pause(1);
for l = 9:12
    while 1
        % Database of Paper
        set(handles.Textedit3,'String',strcat('Paper: Pic ',num2str(l),'. Center your hand on the blackboard (PAPER position) and wait 3s.'));
        pause(0.5);
        im1 = getsnapshot(handles.video);
        if l==9
            axes(handles.Pic9);
        elseif l==10
            axes(handles.Pic10);
        elseif l==11
            axes(handles.Pic11);
        else
            axes(handles.Pic12);
        end
        imshow(im1);
        stop(handles.video)
        choix3 = menu('Is the pic well centered?','Yes','No');
        if choix3==1
            imwrite(im1, strcat('D://moi//La_Totale//Images//jeu//base//',num2str(l),'.jpg'));
            axes(handles.CamAxes)
            start(handles.video)
            break
        else
            axes(handles.CamAxes)
            start(handles.video)
        end
    end
end
set(handles.Textedit1,'String','');
set(handles.Textedit2,'String','');
set(handles.Textedit3,'String','');
set(handles.Loadbutton,'String','Load Eigenfaces Matrix')
stop(handles.video);
set(handles.StartStopCamera,'String','Start Camera')
set(handles.Datbutton,'String','Game database creation')
msgbox('Next we will create the gradient database.','Database creation','help');
xRadius = 6;
save('D:/moi/La_Totale/recognition/Gradients/36box/xRadius.txt','xRadius','-ASCII');
run('D://moi//La_Totale//recognition//Gradients//36box//matdist.m');
msgbox('Last but not least, we create the eigenface database.','Database creation','help');
run('D://moi//La_Totale//recognition//PCA//Eigenface//eigenbase.m');
figure;
for i = 1:12
    subplot(3,4,i)
    filename = sprintf('D://moi//La_Totale//Images//jeu//base//%d.jpg',i);
    imshow(imread(filename))
end
msgbox('Thanks for creating the database. See you soon in the game. Goodbye!','Database creation','help');
% --- Executes on button press in Subbutton.
function Subbutton_Callback(hObject, eventdata, handles)
% hObject    handle to Subbutton (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
set(handles.Textedit2,'String','');
% Compare getsnapshot
if strcmp(get(handles.Subbutton,'String'),'Compare with simple sub')
    % Compare is off. Change button string and start to compare.
    set(handles.Subbutton,'String','Lets go!')
    axes(handles.PlayerPos)
    Im1 = getsnapshot(handles.video);
    ima1 = Im1;
    stop(handles.video)
    load score_comp.txt -ascii;
    load score_player.txt -ascii;
    comp_pos  = floor(11.9*rand)+1;
    comp_pos2 = imread(sprintf('D://moi//La_Totale//Images//jeu//base//%d.jpg',comp_pos));

    % Load the hand gesture image set
    WA = waitbar(0,'Running...');
    k = 0;
    for i = 1:12
        filename = sprintf('D://moi//La_Totale//Images//jeu//base//%d.jpg',i);
        image_data = imread(filename);
        image_data = rgb2gray(image_data);
        image_data = imadjust(image_data);
        k = k + 1;
        waitbar((i-1)/24);
        x(:,k) = image_data(:);
    end

    % Load the hand gesture image to analyze
    image_data = ima1;
    image_data = rgb2gray(image_data);
    image_data = imadjust(image_data);
    xim = image_data(:);

    % Find similar hand gestures: compare the current picture with each
    % database picture
    for l = 1:k
        Diff = xim - x(:,l);
        Diff2 = sum(Diff);
        Diff2 = Diff2/(length(xim));
        mat_sum(l) = Diff2;
        waitbar((i+l-1)/24);
    end
    [Class_diff,sorted_index] = sort(mat_sum);
    waitbar(1);
    close(WA);

    axes(handles.PlayerPos)
    imshow(ima1);
    axis image;
    axis off;
    axes(handles.CompPos)
    imshow(comp_pos2);
    axis image;
    axis off;

    % Then follows the same win/lose display logic as in the Eigenface method above.

    save('D:/moi/La_Totale/jeu/Gradients/36box/score_comp.txt','score_comp','-ASCII');
    save('D:/moi/La_Totale/jeu/Gradients/36box/score_player.txt','score_player','-ASCII');
    set(handles.No,'String','No')
    set(handles.Yes,'String','Yes')
    axes(handles.Picrec)
    imshow(imread(strcat('D://moi//La_Totale//Images//jeu//base//',num2str(sorted_index(1)),'.jpg')))
    set(handles.Subbutton,'String','Compare with simple sub')
    axes(handles.CamAxes)
    start(handles.video)
    return
else
    % Comparison is on.
end
% --- Executes on button press in Razbutton.
function Razbutton_Callback(hObject, eventdata, handles)
% hObject    handle to Razbutton (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
if strcmp(get(handles.Razbutton,'String'),'Reset Score')
    % Change button string and reset score.
    set(handles.Razbutton,'String','In progress...')
    score_comp = 0;
    score_player = 0;
    save('D:/moi/La_Totale/jeu/Gradients/36box/score_comp.txt','score_comp','-ASCII');
    save('D:/moi/La_Totale/jeu/Gradients/36box/score_player.txt','score_player','-ASCII');
    set(handles.Razbutton,'String','Reset Score')
    set(handles.Textedit2,'String','SCORE RESET');
else
    % Score is resetting.
end
% --- Executes on button press in Loadbutton.
function Loadbutton_Callback(hObject, eventdata, handles)
% hObject    handle to Loadbutton (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
if strcmp(get(handles.Loadbutton,'String'),'Load Eigenfaces Matrix')
    % Change button string and load the matrices.
    tic
    if strcmp(get(handles.StartStopCamera,'String'),'Stop Camera')
        stop(handles.video)
        set(handles.StartStopCamera,'String','Start Camera')
    end
    msgbox('The game is loading the Eigenfaces matrix for the Eigenface recognition method. Please wait...','Matrix loading','help');
    handles.Coefmat  = load(strcat('D:/moi/La_Totale/Data/Coefmat/Coefmat.txt'),'-ASCII');
    handles.Eig_FacT = load(strcat('D:/moi/La_Totale/Data/Eigfact/Eig_FacT.txt'),'-ASCII');
    guidata(hObject,handles);
    set(handles.Loadbutton,'String','Eigenfaces matrix loaded!')
    set(handles.Textedit2,'String','ENJOY THE GAME !');
else
    % Loading matrices.
end
t3 = toc
function Textedit1_Callback(hObject, eventdata, handles)
% hObject    handle to Textedit1 (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)

% Hints: get(hObject,'String') returns contents of Textedit1 as text
%        str2double(get(hObject,'String')) returns contents of Textedit1 as a double

% --- Executes during object creation, after setting all properties.
function Textedit1_CreateFcn(hObject, eventdata, handles)
% hObject    handle to Textedit1 (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    empty - handles not created until after all CreateFcns called

% Hint: edit controls usually have a white background on Windows.
%       See ISPC and COMPUTER.
if ispc && isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end

function Textedit2_Callback(hObject, eventdata, handles)
% hObject    handle to Textedit2 (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)

% Hints: get(hObject,'String') returns contents of Textedit2 as text
%        str2double(get(hObject,'String')) returns contents of Textedit2 as a double

% --- Executes during object creation, after setting all properties.
function Textedit2_CreateFcn(hObject, eventdata, handles)
% hObject    handle to Textedit2 (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    empty - handles not created until after all CreateFcns called

% Hint: edit controls usually have a white background on Windows.
%       See ISPC and COMPUTER.
if ispc && isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end

function Textedit3_Callback(hObject, eventdata, handles)
% hObject    handle to Textedit3 (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)

% Hints: get(hObject,'String') returns contents of Textedit3 as text
%        str2double(get(hObject,'String')) returns contents of Textedit3 as a double

% --- Executes during object creation, after setting all properties.
function Textedit3_CreateFcn(hObject, eventdata, handles)
% hObject    handle to Textedit3 (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    empty - handles not created until after all CreateFcns called

% Hint: edit controls usually have a white background on Windows.
%       See ISPC and COMPUTER.
if ispc && isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end
% --- Executes on button press in Yes.
function Yes_Callback(hObject, eventdata, handles)
% hObject    handle to Yes (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
if strcmp(get(handles.Yes,'String'),'Yes')
    yes_count = load('D:/moi/La_Totale/Stat/yes_count.txt','-ASCII');
    set(handles.Yes,'String','Made')
    set(handles.No,'String','Made')
    yes_count = yes_count+1;
    save('D:/moi/La_Totale/Stat/yes_count.txt','yes_count','-ASCII');
end

% --- Executes on button press in No.
function No_Callback(hObject, eventdata, handles)
% hObject    handle to No (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
if strcmp(get(handles.No,'String'),'No')
    no_count = load('D:/moi/La_Totale/Stat/no_count.txt','-ASCII');
    set(handles.No,'String','Made')
    set(handles.Yes,'String','Made')
    no_count = no_count+1;
    save('D:/moi/La_Totale/Stat/no_count.txt','no_count','-ASCII');
end
END OF THE SCRIPT
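The listing above calls two helper functions that are not reproduced in this report: grador2, which presumably computes the gradient orientation histogram of an image, and disteuc3, which returns the Euclidean distance between two feature vectors. As an illustration only, here is a minimal sketch of what the distance helper could look like; it is an assumption about its behaviour, not the original code.

function d = disteuc3(Y, Z)
% DISTEUC3  Euclidean (L2) distance between two feature vectors (sketch only).
%   Y and Z are vectors of the same length (orientation-histogram bins or
%   PCA coefficients); d is the Euclidean distance between them.
diff = Y(:) - Z(:);         % compare element by element, as column vectors
d = sqrt(sum(diff.^2));     % L2 norm of the difference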
List of Figures
3.1 Representation of the orientation histograms for each new position . . . . 26
3.2 Form of the Gaussian filter . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.3 Results of the 1st answer of vector-matrix comparison script for a database
picture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.4 Results of the vector-matrix comparison script for a database picture(1) . 32
3.5 Results of the vector-matrix comparison script for a database picture(position:5) 33
3.6 Returns of the graymat function for a 5 pictures database (1, 2, 3, 4 and 5
fingers) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.7 Database of the 5 principle classes . . . . . . . . . . . . . . . . . . . . . 35
3.8 Returns of the graymat function for our new gesture database . . . . . . . 36
3.9 Gradient magnitude of a triangle with noise on borders . . . . . . . . . . 37
3.10 1 finger picture with black and gray backgrounds . . . . . . . . . . . . . 38
3.11 Gradients of 1 finger picture with black and gray backgrounds . . . . . . 38
3.12 Histograms of oriented gradients of 1 finger picture with black and gray
backgrounds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.13 Example of a data set used for the PCA recognition method . . . . . . . . 41
3.14 Example of the eigenpictures of the data set used for the PCA recognition
method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.1 Photo of the application realized . . . . . . . . . . . . . . . . . . . . . . 46
4.2 Photo of the working space realized for the gesture recognition application 47
4.3 Explanation of the application realized . . . . . . . . . . . . . . . . . . . 48
4.4 Graphics of the Euclidean distance between the "1" themselves . . . . . . 51
4.5 Graphics of the Euclidean distance between the "2" and the "3" themselves 52
4.6 Graphics of the Euclidean distance between the "4" and the "5" themselves 52
4.7 Graphics of the Euclidean distance between the "1" class and the other classes 53
4.8 Graphics of the percentage of recognition with different derivative filters
-all circle- . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.9 Graphics of the creation and loading time as a function of the data set size
and the resolution of the images . . . . . . . . . . . . . . . . . . . . . . . 55
4.10 Graphics of the recognition of each position as a function of the data set
size and the resolution of the images . . . . . . . . . . . . . . . . . . . . . 56
4.11 Graphics of the creation and comparison time as a function of the data set
size and the resolution of the images for the gradients method . . . . . . . 58
4.12 Graphics of the recognition of each position as a function of the data set
size and the resolution of the images for the gradients method . . . . . . . 59
4.13 Representation of the orientation histograms for each new position . . . . 60
4.14 Graphic of the results for each method in percentage (test made on 35
pictures) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.15 Conditions of the different lighting set test realization . . . . . . . . . . . 65
4.16 Results of the different lighting set test . . . . . . . . . . . . . . . . . . . 66
4.17 Screen shots of the rotation tests performing . . . . . . . . . . . . . . . . 67
4.18 Results of the different rotation test . . . . . . . . . . . . . . . . . . . . . 68
4.19 Position of the hand in the picture to perform the tests . . . . . . . . . . . 70
4.20 Results of the different translation tests . . . . . . . . . . . . . . . . . . . 71
4.21 Screen shots of the other hands to perform the tests . . . . . . . . . . . . 72
4.22 Results of the different hands tests . . . . . . . . . . . . . . . . . . . . . 73
A.1 Results of the general recognition test for the gradients method with a
derivative filter of 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
A.2 Results of the general recognition test for the gradients method with a
derivative filter of 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
A.3 Results of the general recognition test for the gradients method with a
derivative filter of 12 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
A.4 Results of the general recognition test for the Eigenface method . . . . . 82
A.5 Results of the general recognition test for the simple subtraction method . 83
A.6 Summary of the results of the general recognition test in percentages . . . 84
B.1 Table of the results for the time tests and for the position tests as a function
of the number of pictures and of the quality of the pictures: here 320*240
for the Eigenfaces method . . . . . . . . . . . . . . . . . . . . . . . . . . 85
B.2 Table of the results for the time tests and for the position tests as a function
of the number of pictures and of the quality of the pictures: here 640*480
for the Eigenfaces method . . . . . . . . . . . . . . . . . . . . . . . . . . 86
B.3 Table of the results for the time tests and for the position tests as a function
of the number of pictures and of the quality of the pictures: here 320*240
for the Gradients method . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
B.4 Table of the results for the time tests and for the position tests as a function
of the number of pictures and of the quality of the pictures: here 640*480
for the Gradients method . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
B.5 Table of the results for the lighting set, rotation and users' hand tests for
both methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89