\[
\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}
\]
It is important to note that we will call \( \bar{X} \) the mean of the set X. The mean of a data set does not tell us much on its own, apart from its middle point. For example, two really different data sets can have the same mean. Therefore, we will look at what else is needed to better define a data set, using the two sets below:
[0 8 12 20] and [8 9 11 12]
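As a quick illustration, here is a minimal MATLAB sketch (the variable names are ours):

    A = [0 8 12 20];
    B = [8 9 11 12];
    mean(A)   % returns 10
    mean(B)   % returns 10 as well: same mean, very different sets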
Here, what is really different between the two sets is the standard deviation. This is a way to measure how spread out the data in a set are. Here is the definition of the standard deviation: it is the square root of the sum of the squared distances from each point to the mean of the set, divided by n - 1, where n is the number of points in the set. Here is the formula:
\[
s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}
\]
where \( s \) is the usual symbol for the standard deviation of a sample.
We can wonder why we divide by n - 1 and not by n. We will not give a full explanation of that here, because it would be too long, and it is not important for our project. What is important to remember is that when we use a sample of a population and we want an approximate result for the entire population, we have to use n - 1. But if we calculate the standard deviation on the entire population directly, then we have to use n instead of n - 1. Further information can be found on the web site http://mathcentral.uregina.ca/RR/database/RR.09.95/weston2.html
This page explains a bit more about the standard deviation and about the choice of denominator. It also gives interesting experiments which describe well how using a sample or the whole population affects the choice of denominator.
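As an illustration, here is a minimal MATLAB sketch of the two denominator choices (MATLAB's std divides by n - 1 by default; passing 1 as the second argument selects n):

    X = [0 8 12 20];
    std(X)      % sample standard deviation, divides by n - 1: about 8.3267
    std(X, 1)   % population standard deviation, divides by n: about 7.2111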
We will now draw tables of the standard deviation calculation for the two sets written above.
Set 1:
X_i                  (X_i - \bar{X})      (X_i - \bar{X})^2
0                    -10                  100
8                    -2                   4
12                   2                    4
20                   10                   100
Total                                     208
Divided by (n - 1)                        69.333
Square root                               8.3266
Set 2:
X_i                  (X_i - \bar{X})      (X_i - \bar{X})^2
8                    -2                   4
9                    -1                   1
11                   1                    1
12                   2                    4
Total                                     10
Divided by (n - 1)                        3.333
Square root                               1.8257
As expected, the first set has a much bigger standard deviation than the second one. Indeed, the data in the first set are much more spread out than in the second.
We can quickly look at another set, which has a standard deviation of zero:
[10 10 10 10]
Here, the standard deviation is equal to zero, although the mean is still 10. This is because all the points are the same, so the data are not spread out at all. None of them deviate from the mean.
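We can check all three hand calculations with a short MATLAB sketch:

    std([0 8 12 20])     % about 8.3266: widely spread (set 1)
    std([8 9 11 12])     % about 1.8257: tightly grouped (set 2)
    std([10 10 10 10])   % exactly 0: no spread at all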
Variance Variance is another measure of the spread of the data in a set. In fact, it is almost the same thing as the standard deviation. We can take a look at the formula:
\[
s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}
\]
We can notice that this is just the square of the standard deviation (that is why the symbol \( s^2 \) is used). Usually, we use the symbol \( s^2 \) for the variance of a sample. The variance is just another way of measuring the spread of the data in a sample. We can say that the variance is less used than the standard deviation. However, the variance will be useful for the next section, which deals with the covariance.
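A two-line MATLAB check of the relationship between the two measures:

    X = [0 8 12 20];
    var(X)      % 69.333, divides by n - 1 by default
    std(X)^2    % same value: the variance is the squared standard deviation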
Covariance The covariance differs from the first two measures explained in the sections above in one principal way: the covariance is a 2-dimensional measurement. The covariance is really important for the PCA method, because we will need this calculation later.
So, the calculation of the standard deviation or of the variance is useful in the case of a one-dimensional data set, like the set of the marks obtained by all the ENSAM students for their FYP (Final Year Project). But for the PCA method, which deals with more dimensions, we will need the covariance and not just the variance.
The covariance will allow us to see if there is any relationship between the different dimensions of the data set. For example, we could build a 2-dimensional set of the marks obtained by the ENSAM students and their age. Then, we could see if the age has an effect on the mark received by the student. This is exactly the kind of test that we can perform with the covariance (we can already guess where we want to go with this in our project: checking whether our different pictures are related or not).
The covariance formula is really close to the variance formula. We can write the variance formula like this, to better understand the covariance formula:
\[
\mathrm{var}(X) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})}{n - 1}
\]
Now we can take a look at the covariance formula:
\[
\mathrm{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}
\]
We can notice that if we calculate the covariance between a dimension and itself, we get the variance. In fact, we just replace the second factor of the formula with the second dimension to analyze in order to obtain the covariance formula!
We can also say that it is possible to work with more than two dimensions. We can calculate covariances on three dimensions, for example. The only thing to know is that we will then calculate 9 values between dimensions (two by two) and put them in a matrix (called the covariance matrix), which will be 3 × 3 in the case of three dimensions. The diagonal will hold the variance of each dimension, and the other terms will be the covariances between pairs of dimensions (for example, line 2, column 1 will be the covariance between the y and the x dimensions). By the way, we can notice that the covariance is commutative (we can swap the two dimensions without changing the result). Therefore, the covariance matrix will be symmetrical.
So we can get lots of really important information from the covariance calculation. In any case, it is important to notice that the sign of the returned value matters more than its magnitude.
Indeed, if the result is positive, it means that the two dimensions increase together (for our example on the ENSAM students, marks received and age, this means that the mark increases when the age increases). If the result is negative, it means that when one dimension increases, the other decreases. In the last case, the result returned is zero. This means that our two dimensions have no linear relationship between them: they are uncorrelated.
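To illustrate, here is a minimal MATLAB sketch with invented numbers (the age and mark values are made up for the example; when given two vectors, cov returns the 2 × 2 covariance matrix, whose off-diagonal term is the covariance):

    age   = [21 22 23 24 25];
    marks = [12 13 15 14 17];   % invented data
    c = cov(age, marks);
    c(1, 2)   % positive here, so age and marks tend to increase together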
Therefore, the covariance calculation can give us really important indications about the set of data we are studying. With it, we can plot two dimensions against each other in a graph to get an idea of the relationship that exists between them.
Of course, such a representation will not be possible when our data set has more than 3 dimensions.
Although the covariance can only be calculated between two dimensions at a time, and it is not possible to visualize the relationships when we have more than 3 dimensions, the covariance is often used for big data sets with many dimensions. Indeed, we can calculate the relationship between the dimensions pair by pair and obtain exploitable results. Moreover, it would be pretty hard to see the relationships between dimensions in a huge data set with many dimensions without the covariance calculation. Therefore, the calculation of the covariance will help us a lot to see the relationships between dimensions in a data set like the one we have in our project.
The covariance matrix Recall that covariance is always measured between 2 dimensions. If we have a data set with more than 2 dimensions, there is more than one covariance measurement that can be calculated. For example, from a 3-dimensional data set (dimensions x, y, z) we could calculate cov(x, z), cov(y, z)...
In fact, for an n-dimensional data set, we can calculate
\[
\frac{n!}{2\,(n - 2)!}
\]
different covariance values (for example, with n = 3 we get \( 3!/(2 \cdot 1!) = 3 \) of them). The remaining terms, on the diagonal, will be the variances.
A useful way to get all the possible covariance values between all the different dimensions is to calculate them all and put them in a matrix. Let's have a quick overview of the definition of the covariance matrix for a set of data with n dimensions:
\[
C^{n \times n} = \left( c_{i,j},\; c_{i,j} = \mathrm{cov}(\mathrm{Dim}_i, \mathrm{Dim}_j) \right)
\]
where \( C^{n \times n} \) is an n by n matrix (n rows and n columns), and \( \mathrm{Dim}_x \) is the x-th dimension.
We can notice here that the covariance matrix will always be square, and that each entry of the matrix is the result of a covariance calculation between two dimensions (except for the diagonal, as said before).
As an example, we will build the covariance matrix for a 3-dimensional data set, using the usual dimensions x, y and z. Then, as the matrix is square, we will have the entries below:
\[
C = \begin{pmatrix}
\mathrm{cov}(x, x) & \mathrm{cov}(x, y) & \mathrm{cov}(x, z) \\
\mathrm{cov}(y, x) & \mathrm{cov}(y, y) & \mathrm{cov}(y, z) \\
\mathrm{cov}(z, x) & \mathrm{cov}(z, y) & \mathrm{cov}(z, z)
\end{pmatrix}
\]
As said earlier, the matrix will be symmetrical, and the diagonal will hold the variances. Therefore, we can say that the matrix will have this form:
\[
C = \begin{pmatrix}
\mathrm{var}(x) & \mathrm{cov}(x, y) & \mathrm{cov}(x, z) \\
\mathrm{cov}(x, y) & \mathrm{var}(y) & \mathrm{cov}(y, z) \\
\mathrm{cov}(x, z) & \mathrm{cov}(y, z) & \mathrm{var}(z)
\end{pmatrix}
\]
Therefore, we only have 6 of the 9 terms to calculate.
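In MATLAB, cov builds this matrix directly when each column of the input holds one dimension. A small sketch with invented data:

    % each row is one observation, each column one dimension (x, y, z)
    data = [1 2 1;
            2 4 0;
            3 5 1;
            4 8 0];
    C = cov(data)   % 3 x 3 and symmetrical; diag(C) gives var(x), var(y), var(z)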
Matrix Algebra
This section provides a background on the matrix algebra required for PCA. We will especially take a closer look at the eigenvectors and eigenvalues of a given matrix.
Let's look at an example first:
\[
\begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix}
\begin{pmatrix} 3 \\ 2 \end{pmatrix}
= 4 \begin{pmatrix} 3 \\ 2 \end{pmatrix}
\]
For example, 4 is an eigenvalue of the matrix.
Eigenvectors First of all, we will give the Wikipedia definition of an eigenvector:
"In linear algebra, the eigenvectors (from the German eigen meaning inherent, characteristic) of a linear operator are non-zero vectors which, when operated on by the operator, result in a scalar multiple of themselves. The scalar is then called the eigenvalue associated with the eigenvector."
As we can see in the example above, multiplying the matrix by the vector returns exactly 4 times the original vector. We have here an example of an eigenvector. We will try to explain this example to better understand eigenvectors.
The vector is a 2-dimensional one. The vector \( \binom{3}{2} \) represents an arrow going from the origin (0, 0) to the point (3, 2). The matrix \( \begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix} \) can be seen as a transformation matrix. Therefore, if we multiply this matrix by a vector, the result will be another, transformed vector. If this transformed vector is just the original vector multiplied by a scalar, then that vector is an eigenvector and the scalar is the eigenvalue associated with it.
Now let us look at the different properties of these eigenvectors:

- First of all, we can only find eigenvectors for square matrices. Moreover, not every square matrix has (real) eigenvectors. When a matrix does have them, an n × n matrix has at most n linearly independent eigenvectors (for a 3 × 3 matrix, the maximum number is 3).

- We can multiply an eigenvector by a scalar, and it will still be an eigenvector (because we only change its length, not its direction).

- For a symmetric matrix, such as a covariance matrix, the eigenvectors are orthogonal to each other, no matter the number of dimensions.

- Most of the time, the returned eigenvectors are normalized (norm = 1). They are then easier to work with.
Further information on eigenvectors can be found on the web site:
http://www.mathphysics.com/calc/eigen.html.
Eigenvalues Each eigenvector is associated with an eigenvalue. The eigenvalue gives us some information about the importance of the eigenvector. The eigenvalues are really important in the PCA method, because they allow us to apply a threshold to filter out the non-significant eigenvectors, so that we keep just the principal ones.
MATLAB will return the eigenvalues and the eigenvectors of the covariance matrix without any problem.
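For instance, a minimal sketch with the 2 × 2 matrix from the example above:

    A = [2 3; 2 1];
    [V, D] = eig(A)   % columns of V: normalized eigenvectors; diagonal of D: eigenvalues
    % one eigenvalue is 4, and its eigenvector points in the direction (3, 2)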
2.4.3 Main steps of the method
Finally, we arrive at Principal Components Analysis (PCA), the interesting part of our project. Let us first answer a question: what is it exactly? We can answer that it is an algebraic way to compare images by compressing the data set and highlighting the principal components of the set.
The main advantage of PCA is that once we have found the principal components of the set, which express the data pretty well, we can recover the original data (images in our case) with a low loss, even if the compression is really high!
In this section, we will try to explain how we worked through the problems of implementing this method for gesture recognition. Therefore, we will describe the work step by step, so that each part of it can be understood.
We can then split up the method into its main parts:
- First of all, we had to create the data set. Indeed, we had to take some pictures of the hand that would form the database for the PCA recognition. The aim is to choose a good number of pictures and a good resolution for them, in order to get the best recognition with the smallest database. Then, the aim is to build the database. To create it, the idea is to transform each picture into a single vector, whose dimension is the number of pixels. Then, we create a matrix where each line is an image-vector... The result for 12 pictures at a 640 × 480 resolution will be a 12 × 307200 matrix.
- Then, the next step is to subtract the mean from each of the data dimensions. The mean subtracted is simply the average across each dimension. For example, for three dimensions x, y and z, we have to subtract \( \bar{x} \) from x, \( \bar{y} \) from y and \( \bar{z} \) from z. The aim is to center our set in the space of all the dimensions (we will see further explanations of the different spaces used later, but what is important to remember here is that we have to subtract the mean to center our set of data).
- The third step is to calculate the covariance matrix of the database. This will be quite difficult in our case, because the data set is really huge! So we found a method to simplify this calculation, which we will now explain.
Indeed, we cannot calculate the covariance matrix of the first matrix, because it would be too huge. So we had to find a way to obtain the principal eigenvectors without calculating the big covariance matrix.
We found the solution in a paper written by M. Turk and A. Pentland [23].
The method consists in choosing a new covariance matrix.
Indeed, we will call our second matrix (all the images with the mean subtracted), of size 12 × 307200: