IMPORTANT - This algorithm was written in MATLAB!

Not the best language to write this algorithm in, but I do not think anybody else has done it in MATLAB, and probably for a reason!
This is my first try at machine learning. I knew very little about it when I started, but I feel I know enough now to give you a tutorial on what I did and how I think this algorithm should be implemented.
I decided to implement my own version of the Viola and Jones method for face detection, using Haar-like features and adaptive boosting... blah blah, we don't care about the technicals, right? We just want to understand how this is done. So here is my video of my implementation. I still need to tweak it some more to make improvements.
Yes, that is me! I should have hired a beautiful model instead of putting my face here; then maybe this page would get more hits.
Anyways, let's get started.

Training
Step 1
To do face detection, we first need a dataset of face and non-face images. You need as many as you can get (in the thousands or tens of thousands). Because this is supervised training, meaning you tell the computer up front whether each picture is a face or a non-face, you need labeled images of both kinds.
In my case, I obtained a database from MIT composed of 2429 faces and 4547 non-faces. The images are 19 x 19 (later I realized I should have looked for something around 25 x 25, which seems to capture features better). They are all black-and-white images. Here are some of them:

Once you have a nice database, you need to understand Haar features. They are rectangles that map over an image and help tell you whether it is a face or a non-face. I will explain how this is accomplished later; here I will just focus on what took me a long time to figure out: why are there so many Haar features in a 19 x 19 image? Here is what they look like:

There are many more variations, but these are the only ones I used. So let's take the first one in the list: a white rectangle above a black rectangle. If you see your image as a 19 x 19 matrix (that is, from x1 to x19 and y1 to y19), and you start with this feature being 1 x 2 in size at position x1, y1, then you have your first feature:

You would do all your calculations based on this classifier, then move on to the next size, which would be a 1 x 4 at position x1, y1, your second feature:

And you get the point; it would keep increasing in size: 1 x 8, then 1 x 16, and it cannot go farther than that. Then the classifier would start again at 2 x 2.

And it would continue 2 x 6, 2 x 8, and so on. Eventually, you will come to the end of this classifier. The code for this would be:

The feature matrix defines the starting size of each of the 5 classifiers.

- Each feature (5 total): this is i.
- They all must start at 1 x 2: this is sizeX and sizeY.
- They cannot go over the size of 19 x 19: this is x and y.
- winLength and winWidth are the amounts by which each feature grows as it moves through the image.

CalcBestThresh() is the function called to see whether the feature being tested is a good one or not.
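If you want to see why such a tiny 19 x 19 window produces so many features, here is a sketch of that enumeration (in Python just for illustration; the base shapes and the growth step are my assumptions, and my actual MATLAB loops step through the sizes a little differently, as described above):

```python
# The base (height, width) of each of the five feature types; this exact
# list is an assumption for illustration.
BASE_SHAPES = [(1, 2), (2, 1), (1, 3), (3, 1), (2, 2)]

def enumerate_features(img_size=19):
    """List every (y, x, height, width) at which each scaled version
    of each base shape still fits inside the img_size window."""
    features = []
    for bh, bw in BASE_SHAPES:                        # each of the 5 feature types
        h = bh
        while h <= img_size:                          # grow the height...
            w = bw
            while w <= img_size:                      # ...and the width
                for y in range(img_size - h + 1):     # every row it fits in
                    for x in range(img_size - w + 1): # every column it fits in
                        features.append((y, x, h, w))
                w += bw                               # next multiple of the base width
            h += bh                                   # next multiple of the base height
    return features

feats = enumerate_features(19)
print(len(feats))  # tens of thousands of features, even at 19 x 19
```

Run it and you get tens of thousands of features from just five types, which is exactly why training takes so long.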

Step 2
Now that all the features to be used have been set up for each classifier, it is time to actually do something with each feature; this is CalcBestThresh (calculate best threshold, obviously).
Each feature passed to CalcBestThresh has to be evaluated on every single image in the database. In my case, I have 2429 face images and 4547 non-face images. Here is how it starts:

threshPos and threshNeg are the numbers received from calculating the integral image. weightsFile is just a file in which I will keep the weight of each classifier (for adaptive boosting).

Integral Image
The integral image is a summed-area matrix that allows for fast computation of pixel sums. The best way to explain it is with an example:
The first classifier covers 1 x 2 pixels: the white area is 1 x 1 and the black area is 1 x 1. To calculate the Haar feature, I subtract one from the other. So if
WpixelVal = 81
BpixelVal = 76
Haar feature threshold = 81 - 76 = 5
This will go into my threshPos(i) or threshNeg(i). These thresholds are what determine whether the classifier is a good one or a bad one.

Step 2 continued
Below weightsFile = 'weights.mat' is:

The first two lines with adaWeights open the file so that it can be modified. What we want to focus on in this code is the for loop below adaWeights. As the comments say, the Haar feature calculation happens for every single face image, using the function HaarFeatureCalc.

HaarFeatureCalc
All this is doing is parsing the information given to it (x, y, winWidth, winLength, classifier) in order to calculate which pixels need to be summed for the integral image (with winWidth and winLength), starting from which pixels (x and y), and for which classifier. Here is the whole code for HaarFeatureCalc:
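For the first feature type (white above black), a stripped-down version of what HaarFeatureCalc computes might look like this (Python for illustration; the argument layout, and summing pixels directly instead of through the integral image, are simplifications of mine):

```python
import numpy as np

def haar_two_rect(img, x, y, win_width, win_length):
    """White-above-black two-rectangle feature with its top-left corner
    at (y, x); each half is win_length rows by win_width columns."""
    white = img[y:y + win_length, x:x + win_width].sum()
    black = img[y + win_length:y + 2 * win_length, x:x + win_width].sum()
    return white - black  # white area minus black area, as in the example above

img = np.arange(19 * 19).reshape(19, 19)  # a fake 19 x 19 "image"
print(haar_two_rect(img, 0, 0, 2, 1))     # top row pixels minus the row below them
```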

There are two extra functions here: IntImg and CalcIntRec.

IntImg

These are MATLAB's commands to sum up all the pixels within the 19 x 19 image and store the running totals in a matrix, accumulated left to right and top to bottom.
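The same trick outside MATLAB is two cumulative sums, one along each axis (a Python sketch, not my actual code):

```python
import numpy as np

def integral_image(img):
    """Summed-area table: entry (r, c) is the sum of every pixel in the
    rectangle from (0, 0) down to (r, c), inclusive."""
    return img.cumsum(axis=0).cumsum(axis=1)

img = np.ones((19, 19))
ii = integral_image(img)
print(ii[-1, -1])  # 361.0, the sum of the whole 19 x 19 image
```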

CalcIntRec

My comment's explanation says it all: I take four reference points and calculate the sum of the pixels inside the rectangle they define. This is just how Viola and Jones' method of face detection does it. It is pretty basic, but if you do not understand it, I suggest a quick Google search. Google is your friend!
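Here is the four-reference-point calculation as a sketch (Python for illustration; my corner handling is an assumption, checked against a direct pixel sum):

```python
import numpy as np

def rect_sum(ii, y0, x0, y1, x1):
    """Sum of pixels in the rectangle [y0..y1] x [x0..x1] (inclusive),
    using only four look-ups into the integral image ii."""
    total = ii[y1, x1]                 # everything above and left of the bottom-right
    if y0 > 0:
        total -= ii[y0 - 1, x1]        # strip above the rectangle
    if x0 > 0:
        total -= ii[y1, x0 - 1]        # strip left of the rectangle
    if y0 > 0 and x0 > 0:
        total += ii[y0 - 1, x0 - 1]    # corner removed twice, add it back
    return total

img = np.arange(16).reshape(4, 4)
ii = img.cumsum(axis=0).cumsum(axis=1)
# check against summing the pixels directly
print(rect_sum(ii, 1, 1, 2, 2), img[1:3, 1:3].sum())
```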

Back to Step 2
Hopefully you have understood what I have been doing so far. Once the Haar feature calculation has been completed for the FACE images, the mean, standard deviation, min, and max are calculated over all the values the images give me. WHY?
As I did some research on how to categorize a classifier as good or bad, I thought: why not make it look like a Gaussian distribution? I start from the mean value and expand outward by the standard deviation. The more I expand, the more faces are recognized, but the greater the error on non-faces becomes. Let me explain this better:
I get all my image Haar features and find that the mean value is 100. This means most images are near 100, but some are 95 or 87, or even 67 (the outliers). Still, I know that most of my images have a value around 100 for this particular classifier.
So instead of saying, "Anything that is 100 is a face, and anything else is a non-face," I say, "Well, 100 is a face for sure for this classifier, but 105 can be too, or 95." I widen this range on both sides without going past my MAX and MIN values. The standard deviation is the rate at which I widen or narrow the range for each classifier. This is what I call CALCULATING THE BEST THRESHOLD for a classifier.
In order to do all of the above, I must calculate the same statistics for the non-face images. So below the code for getting the mean, max, min, and std, I write:

Great!!!

Step 3
CalcBestThresh is not done yet. I just feel this is the part for step 3, since I will now actually start looking for the best threshold.
I do this by expanding the range explained in the previous step, up to 50 times. I start at the mean (average) value, expand little by little, and see how many faces fall inside the range. When the range meets three conditions, I stop and decide whether this is a STRONG CLASSIFIER or not:
1. A face recognition rate of almost 100% (almost all face images are detected as faces).
2. A really low non-face recognition rate (most non-face images are not detected as faces).
3. A total error for this classifier of less than 50%.
If and only if these three conditions are met, I will send it to my AdaBoost function. Here is the code, which goes below the code above; then I will explain adaptive boosting.
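Here is the whole search, boiled down (a Python sketch, not my MATLAB; the step size and the exact cutoffs for "almost 100%" and "really low" are assumptions, and the data is fake):

```python
import numpy as np

def best_threshold(face_vals, nonface_vals, steps=50):
    """Widen a window around the mean of the face feature values, up to
    50 times, and return the first window that meets all three rules.
    The 99% / 30% cutoffs here are illustrative assumptions."""
    mu, sigma = face_vals.mean(), face_vals.std()
    n_f, n_nf = len(face_vals), len(nonface_vals)
    for t in range(1, steps + 1):
        half = 3 * sigma * t / steps               # expand little by little
        lo, hi = mu - half, mu + half
        face_rate = np.mean((face_vals >= lo) & (face_vals <= hi))
        nonface_rate = np.mean((nonface_vals >= lo) & (nonface_vals <= hi))
        error = ((1 - face_rate) * n_f + nonface_rate * n_nf) / (n_f + n_nf)
        if face_rate >= 0.99 and nonface_rate <= 0.30 and error < 0.5:
            return lo, hi                          # STRONG CLASSIFIER: send to AdaBoost
    return None                                    # this feature never qualified

rng = np.random.default_rng(0)
faces = rng.normal(100, 5, 2429)      # face values cluster around a mean of 100
nonfaces = rng.normal(60, 30, 4547)   # non-face values are spread all over
print(best_threshold(faces, nonfaces))
```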

AdaBoost
Here is where adaptive boosting takes effect. I am assuming that if you are reading this article, it is because you have seen the adaptive boosting equations and probably do not understand them. Here is me trying to explain them.

I believe I have already explained setting the weights, which are the first two lines of the image above. Next is:
For t = 1, ..., T:
Each t is a classifier. At this point in my program, we really do not know how many classifiers we will end up with, but as we keep training, we learn which ones are strong enough to be put through AdaBoost. Finding ht is exactly what I do when checking the 3 conditions to say my classifier is a good one, as described above. So the AdaBoost function really starts here:
Set alpha of t (third diamond):
Set alpha of t (third diamond):
Alphas are calculated with this equation: take the error of the classifier and run it through a log, thus minimizing the convex loss function (got this off Wikipedia; no idea what it means). The alpha value tells me how strong this particular classifier is. The higher the number, the better. This will come in handy when doing the cascade of classifiers.
Finally, the weights for each image are updated accordingly. Remember: images that were recognized correctly are reduced in weight, because they are easy to recognize, while images that were classified incorrectly are increased in weight, because if another classifier recognizes these hard-to-recognize images, then it must be a good one. (Sorry for all the "recognize" words in here; I hope it didn't confuse you.)
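Both equations in one place, as a textbook-style AdaBoost round (a Python sketch with labels and predictions in {-1, +1}; my MATLAB stores things differently):

```python
import numpy as np

def adaboost_round(weights, y, h):
    """One AdaBoost round: weighted error, alpha, and the updated,
    renormalized image weights (labels and predictions are +/-1)."""
    err = np.sum(weights * (h != y))            # weighted error of this classifier
    alpha = 0.5 * np.log((1 - err) / err)       # higher alpha = stronger classifier
    new_w = weights * np.exp(-alpha * y * h)    # correct images shrink, wrong ones grow
    return alpha, new_w / new_w.sum()           # renormalize so weights sum to 1

y = np.array([1, 1, -1, -1])                    # true labels: face / non-face
h = np.array([1, 1, 1, -1])                     # this classifier missed one non-face
w = np.full(4, 0.25)                            # weights start out equal
alpha, w = adaboost_round(w, y, h)
print(round(alpha, 3), w.round(3))
```

Notice how the one misclassified image ends up carrying half of the total weight, which is exactly the "hard images matter more" behavior described above.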
In my code, I save all this information in a file for future use. If you did not understand everything 100%, hopefully my code will help, since it is fully commented.

Evaluating
After 3 weeks of training, I finally got my classifiers: 122 strong classifiers in total, under the rules I decided were best (remember? they were described above). My main function (I really should have been calling them modules from the beginning) is called TestCascade. Let's see the first lines of code for this:

Pretty much, here I am opening my weights file, which contains the already trained classifiers with their alpha weights, etc. This is what we worked so hard on for three weeks (well, our computer did).
The important line here is AlphaSort(Weights). This function sorts the weights from the highest alpha weights (classifiers that are most important in classification) to the lowest. Why?
This is done in order to create a cascade of classifiers.
Now let's do some face detection using the trained classifiers. MATLAB is nice because it allows us to access the laptop's internal camera with a few lines of code. Here is the code for AlphaSort:
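AlphaSort really is just a sort by alpha, strongest first (a Python sketch; I'm assuming each trained classifier record carries its alpha):

```python
def alpha_sort(classifiers):
    """Order classifiers by alpha, strongest first, so the cascade
    tries the most discriminative ones on each subwindow first."""
    return sorted(classifiers, key=lambda c: c["alpha"], reverse=True)

trained = [{"id": 0, "alpha": 0.3}, {"id": 1, "alpha": 1.2}, {"id": 2, "alpha": 0.7}]
print([c["id"] for c in alpha_sort(trained)])  # [1, 2, 0]
```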

Simple, right? The rest of the code sets up the program to capture video. The last part of the code:

I am grabbing, frame by frame, each 19 x 19 subimage from my 480 x 640 image and running it through the function cascadeClass(), which is the following:

As you can see, each subimage is tested with the weights from the file and compared to a threshold, which I manually set for each environment until I get good recognition (I know an adaptive filter would be great for this, but I didn't get around to it). So for CalcHaarCascade:

Finally, if a part of the image goes through all the classifiers and passes the thresholds, then that part of the image is seen as a face and a green rectangle is drawn around it.
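The whole per-window check, boiled down (a Python sketch with a hypothetical feature helper standing in for CalcHaarCascade; the early exit is what makes the cascade fast):

```python
import numpy as np

def is_face(window, classifiers):
    """Run one 19 x 19 subwindow through the sorted cascade and reject
    it the moment any stage's value falls outside that stage's range."""
    for clf in classifiers:
        value = clf["feature"](window)          # stand-in for CalcHaarCascade
        lo, hi = clf["range"]
        if not (lo <= value <= hi):
            return False                        # early exit: most windows fail fast
    return True                                 # survived every stage: draw the green box

# toy one-stage cascade: accept windows whose mean pixel value is bright
cascade = [{"feature": lambda win: win.mean(), "range": (0.5, 1.0)}]
print(is_face(np.ones((19, 19)), cascade), is_face(np.zeros((19, 19)), cascade))
```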
That is it, I hope you understood what I tried to explain.
