
Face Detection in Video

November 10, 2010

Face Detection in Video Using a Cascade of Histograms of Oriented Gradients
By

Amol Damare

Introduction to Computer Vision Page 1



Abstract
I have implemented face detection using a cascade of rejectors that uses histograms of
oriented gradients (HOG) as features. The cascade speeds up detection compared with a single
classifier applied to HOG features. The idea is based on the paper by Q. Zhu, S. Avidan,
M-C Yeh, and K-W Cheng, “Fast Human Detection Using a Cascade of Histograms of Oriented
Gradients”.

Previous Work
Navneet Dalal and Bill Triggs introduced human detection using histograms of oriented
gradients. Their paper proposed a new method for detecting humans in images that proves robust
to changes in illumination, camera angle and color. Its main contribution was the introduction
of grids of densely computed, locally normalized histograms of oriented gradients. The HOG
descriptors are calculated over a dense, overlapping grid of spatial blocks. Gradient
information is extracted at a fixed resolution and gathered into a high-dimensional feature
vector. Histograms are built by letting each pixel's gradient vote into equally spaced
orientation bins. Overlapping blocks, strong local normalization and spatial binning all
contribute to the high performance of the system. The resulting system is very accurate, but
its main disadvantage is low speed. An improvement over this approach was suggested by Q. Zhu,
S. Avidan, M-C Yeh, and K-W Cheng in their paper “Fast Human Detection Using a Cascade of
Histograms of Oriented Gradients”.

Their system uses a cascade of rejectors with HOG features to speed up detection. It
also uses a larger number of detector windows than the Dalal and Triggs approach, and it
suggests that varying the size of the “block” in the Dalal and Triggs approach improves
detection accuracy significantly. Because the block size varies, a large number of descriptors
are possible, so the system uses the AdaBoost algorithm to select the blocks best suited for
detection. The system achieves accuracy similar to that of Dalal and Triggs while
outperforming it in speed.

The next section gives some background on face detection, introduces what I did in the
project, and relates it to what was done in class.


Background
There appear to be three approaches to face detection. The first approach, as we did in
class, uses the color information of the face to detect it. This approach is fragile because
skin color varies from person to person, and changes in background and illumination can also
affect detection. The second approach is to build a template of the face and use it in a
template matching process to detect faces. Eigenfaces, Fisherfaces and active appearance
models (AAM) are some of the popular methods in this approach. The third approach uses pattern
classification methods to detect faces and relies extensively on machine learning. It creates
a classifier by training it on features extracted from training images and uses this
classifier to detect faces in images. Usually this is done with a sliding detector window: the
window is slid across the image, features are extracted from it and provided as input to the
classifier, and the classifier decides whether the window contains a face or not.
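
The sliding detector window can be sketched roughly as follows. This is an illustration in
Python, not the code used in this project; the names sliding_windows, detect,
extract_features and classify are hypothetical, and the image is assumed to be a grayscale
grid stored as a list of rows.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Image = Sequence[Sequence[float]]  # grayscale image stored as rows of pixel values


def sliding_windows(height: int, width: int, win_h: int, win_w: int,
                    stride: int) -> Iterator[Tuple[int, int]]:
    """Yield the (row, col) of the top-left corner of every window position."""
    for row in range(0, height - win_h + 1, stride):
        for col in range(0, width - win_w + 1, stride):
            yield row, col


def detect(image: Image, win_h: int, win_w: int, stride: int,
           extract_features: Callable[[List[Sequence[float]]], object],
           classify: Callable[[object], bool]) -> List[Tuple[int, int]]:
    """Return the top-left corners of all windows the classifier accepts."""
    hits = []
    height, width = len(image), len(image[0])
    for row, col in sliding_windows(height, width, win_h, win_w, stride):
        # Crop the window, describe it, and ask the classifier for a decision.
        window = [r[col:col + win_w] for r in image[row:row + win_h]]
        if classify(extract_features(window)):
            hits.append((row, col))
    return hits
```

In a real detector extract_features would compute the HOG descriptor of the window and
classify would be the trained classifier; the image is usually also rescaled several times so
that faces of different sizes fit the fixed window.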

I have used the machine learning approach in my system. The system uses HOG as the
feature to separate faces from non-faces, and a cascade of rejectors as the classifier. I have
calculated HOG over dense, overlapping blocks of cells for a detector window and collected
these HOG features into a single descriptor vector; this vector is then provided as input to
an SVM classifier to get the face/non-face decision. HOG features are calculated from gradient
magnitudes and orientations. A simple edge detection filter is applied to the image to get the
gradients.

Histogram of oriented gradients


I will start with an introduction to the histogram of oriented gradients algorithm
presented by Dalal and Triggs, who used images encoded as histograms of oriented gradients to
detect humans. I have used rectangular HOG (R-HOG) for my system. It is calculated over
densely and uniformly sampled grids, and each block is normalized independently. An R-HOG
block consists of C × C cells; each cell in turn consists of N × N pixels. Histograms are
calculated over B orientation bins. The following figure shows a 3 × 3 R-HOG block.

(Ref: Navneet Dalal, PhD thesis)


Overall Detection Process

The following figure depicts the overall detection process of the system:

Input Image -> HOG Encoding -> Collect HOG over Detection Window -> Linear SVM -> Face / non-face decision

Here the input image is in RGB color space. Gradients are calculated by applying the
simple mask [1 0 -1] in the x and y directions. Histograms are calculated for each block by
binning gradient orientations into orientation bins. For a detector window, the HOGs are
collected into a single descriptor vector. This descriptor vector is provided as input to a
trained SVM, which decides whether the detector window contains a face or not.
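
As a rough illustration of the gradient step, the sketch below applies the [1 0 -1] mask as
a centered difference along x and y. It is not the project code. It assumes a single-channel
(grayscale) image stored as a list of rows, clamps indices at the image border, and takes
"right minus left" as the sign convention; for unsigned orientations, flipping both signs
gives the same angle.

```python
import math


def gradients(image):
    """Return (magnitude, orientation) grids for a grayscale image.

    Orientation is unsigned and folded into 0 to 180 degrees. Border pixels
    reuse the nearest valid neighbour, which is an assumption of this sketch.
    """
    height, width = len(image), len(image[0])
    magnitude = [[0.0] * width for _ in range(height)]
    orientation = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Centered difference: the [1 0 -1] mask along x, then along y.
            gx = image[y][min(x + 1, width - 1)] - image[y][max(x - 1, 0)]
            gy = image[min(y + 1, height - 1)][x] - image[max(y - 1, 0)][x]
            magnitude[y][x] = math.hypot(gx, gy)
            orientation[y][x] = math.degrees(math.atan2(gy, gx)) % 180.0
    return magnitude, orientation
```

A vertical edge gives a horizontal gradient (orientation near 0 degrees) and a horizontal
edge gives a vertical gradient (orientation near 90 degrees).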

Algorithm for calculating HOG:

Input: Scaled input image

Output: Encoded feature vector

Common Initial Steps:

• Gamma normalize the image

• Apply the mask [1 0 -1] to calculate the x and y gradients, and from them the gradient
magnitude and orientation

Descriptor Calculation:

• Densely and uniformly sample the image; for each sampled point:

• Take the CN × CN square pixel region around the sampled point and divide it into C × C cells

• Create a C × C × B histogram

• Using each pixel's gradient magnitude as the weight, add a vote to the bin corresponding to
its orientation

Final Steps:

• Apply normalization to each block

• Collect all histograms into one big descriptor

Reference: PhD thesis of Navneet Dalal
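
The descriptor calculation for one block can be sketched as follows. This is a simplified
illustration rather than the project code: it assumes the gradient magnitude and unsigned
orientation (in degrees, 0 to 180) are already available as grids, it votes each pixel into a
single bin without the interpolation described in Dalal's thesis, and it assumes L2
normalization, since the normalization scheme is not stated here. The function and parameter
names are hypothetical.

```python
import math


def cell_histogram(mag, ori, top, left, cell_size, num_bins):
    """Magnitude-weighted orientation histogram of one N x N cell."""
    bin_width = 180.0 / num_bins
    hist = [0.0] * num_bins
    for y in range(top, top + cell_size):
        for x in range(left, left + cell_size):
            # Hard assignment: each pixel votes into exactly one bin.
            b = int(ori[y][x] // bin_width) % num_bins
            hist[b] += mag[y][x]
    return hist


def block_descriptor(mag, ori, top, left, cells_per_side, cell_size,
                     num_bins, eps=1e-6):
    """Concatenate the C x C cell histograms of one block and L2-normalize."""
    vec = []
    for cy in range(cells_per_side):
        for cx in range(cells_per_side):
            vec.extend(cell_histogram(mag, ori,
                                      top + cy * cell_size,
                                      left + cx * cell_size,
                                      cell_size, num_bins))
    # eps keeps the division defined for blocks with no gradient at all.
    norm = math.sqrt(sum(v * v for v in vec) + eps * eps)
    return [v / norm for v in vec]
```

The descriptor of a detector window is then the concatenation of block_descriptor over all
(overlapping) block positions inside the window, giving C × C × B values per block.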

The input image is first gamma normalized; this step is optional. After the HOGs are
calculated over a block, the block is normalized, which makes the descriptor robust to
illumination changes. The resulting HOG descriptor vector is then used to train a support
vector machine. The following figure shows a sample HOG with 8 × 8 pixels per block voted
over 9 bins:


Support Vector Machines:

A support vector machine is a supervised learning technique for classifying data. In its
linear form it finds the hyperplane that separates the two classes with the largest margin;
with a kernel, it implicitly maps the data into a higher-dimensional space and finds a
separating hyperplane there.

I have used MATLAB’s built-in functions for support vector machines.


SVMSTRUCT = SVMTRAIN(TRAIN, GROUP):

This function takes the data in TRAIN and the category of each data item in GROUP, and
returns the trained classifier SVMSTRUCT.

GROUP = SVMCLASSIFY(SVMSTRUCT, TEST):

This function uses the classifier trained with SVMTRAIN to assign the input data TEST to
the groups given during training.
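
For readers without MATLAB, the sketch below shows roughly what training and applying a
linear SVM involves. It is not MATLAB's solver and not the project code: it swaps in plain
subgradient descent on the regularized hinge loss, and the function names, learning rate,
regularization constant and epoch count are all illustrative assumptions. Labels are assumed
to be +1 (face) and -1 (non-face).

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def train_linear_svm(samples, labels, epochs=200, learning_rate=0.1, reg=0.01):
    """Fit weights and bias by subgradient descent on the hinge loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (dot(weights, x) + bias)
            # L2 regularization shrinks the weights a little on every step.
            weights = [w * (1.0 - learning_rate * reg) for w in weights]
            if margin < 1.0:
                # Sample is inside the margin or misclassified: push it out.
                weights = [w + learning_rate * y * xi
                           for w, xi in zip(weights, x)]
                bias += learning_rate * y
    return weights, bias


def classify(weights, bias, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if dot(weights, x) + bias >= 0.0 else -1
```

Here train_linear_svm plays the role of SVMTRAIN and classify the role of SVMCLASSIFY, with
each sample being one HOG descriptor vector.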

Disadvantages of HOG

The approach presented by Dalal and Triggs is accurate and provides good detection, but
it is too slow for processing video. It also uses blocks of a fixed size, which are not
informative enough to allow fast rejection. To address these deficiencies, another approach
was presented by Q. Zhu, S. Avidan, M-C Yeh, and K-W Cheng in their paper “Fast Human
Detection Using a Cascade of Histograms of Oriented Gradients”. They suggested a cascade of
rejectors, which increases detection speed, together with variable-sized blocks, which allow
fast rejection in the initial stages.

Cascade of Histograms of Oriented Gradients

This approach combines a cascade of rejectors with HOG to achieve a fast and accurate
face/human detection system. It also uses variable-size blocks to capture features from the
image. The following image shows the basic structure of the cascade:


A cascade of rejectors uses classifiers with decreasing detection rates, i.e. in the above
figure classifier 1 has the highest detection rate, classifier 2 has a detection rate lower
than classifier 1 but higher than classifier 3, and so on. This lets windows be rejected in
the early stages, which boosts the speed of the system.
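
Evaluating a window with such a cascade can be sketched as below. This is only an
illustration of the early-exit idea, not the project code: run_cascade is a hypothetical
name, and each stage is modelled as an arbitrary function returning True (pass on) or False
(reject), whereas in the referenced paper each stage is a boosted combination of classifiers
over selected HOG blocks.

```python
from typing import Callable, Sequence, Tuple

Stage = Callable[[Sequence[float]], bool]  # True = pass to next stage


def run_cascade(stages: Sequence[Stage],
                window: Sequence[float]) -> Tuple[bool, int]:
    """Return (accepted, number_of_stages_evaluated) for one window."""
    for index, stage in enumerate(stages, start=1):
        if not stage(window):
            # Early exit: later, more expensive stages are never evaluated.
            return False, index
    return True, len(stages)
```

A window is reported as a face only if every stage accepts it, so the cost of the later
stages is paid only for the few windows that survive the earlier ones.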

The following is the algorithm for training the cascade:

(Reference: Q. Zhu, S. Avidan, M-C Yeh, K-W Cheng, “Fast Human Detection Using a
Cascade of Histograms of Oriented Gradients”)


Image Database
I have used images from the CMU database available at
http://www.cs.cmu.edu/~cil/v-images.html. I cropped and resized frontal views of faces from
these images, as shown below:

Results
The results of my experiment were satisfactory, but they were not as accurate as those
reported in the referenced paper. For example:

In the above image the same face has been detected multiple times.


Also, in the case of video there were too many false detections and multiple detections,
and some faces went undetected.

Conclusions and Observations


The main reason for such results could be insufficient training on the images. With more
training, the classifiers would be able to reduce false detection rates considerably. Also, I
have not implemented any post-processing of the detections. Post-processing would be
significant in resolving multiple detections and suppressing false detections.
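
One common form of such post-processing is non-maximum suppression, sketched below. It was
not implemented in this project and is shown only as an assumption about what the
post-processing could look like. Boxes are assumed to be (x1, y1, x2, y2) tuples, each
detection is assumed to carry a confidence score (for example the classifier output of the
last stage), and the 0.5 overlap threshold is illustrative.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        # Keep a box only if it does not overlap a stronger box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept
```

Overlapping detections of the same face collapse to the single highest-scoring box, while
detections elsewhere in the image are left untouched.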

References
Source of Data:

CMU/VASC database: http://www.cs.cmu.edu/~cil/v-images.html

References:

1. N. Dalal and B. Triggs, “Histograms of Oriented Gradients for Human Detection”.
2. N. Dalal, PhD thesis, “Finding People in Images and Video Sequences”.
3. Q. Zhu, S. Avidan, M-C Yeh, and K-W Cheng, “Fast Human Detection Using a Cascade of
Histograms of Oriented Gradients”.
4. T. Cootes, G. Edwards, and C. Taylor, “Active Appearance Models”.
5. M. Turk and A. Pentland, “Eigenfaces for Recognition”.
