Amol Damare
Abstract
I have implemented face detection using a cascade of rejecters that uses histograms of oriented gradients (HOG) as features. This speeds up detection compared with using a single classifier on histograms of oriented gradients. The idea is based on the paper by Q. Zhu, S. Avidan, M-C Yeh and K-W Cheng, “Fast Human Detection Using a Cascade of Histograms of Oriented Gradients”.
Previous Work
Navneet Dalal and Bill Triggs presented a paper on human detection using histograms of oriented gradients. In it they proposed a new method for detecting humans in images that uses histograms of oriented gradients, and this method proved robust to changes in illumination, camera angle and color. The main contribution of the paper was the introduction of grids of densely computed, locally normalized histograms of oriented gradients. The HOG descriptors are calculated over a dense, overlapping grid of spatial blocks. Gradient information is extracted at a fixed resolution and gathered into a high-dimensional feature vector. The histograms themselves are built by having each pixel vote, weighted by its gradient magnitude, into equally spaced orientation bins. Overlapping blocks, strong local normalization and spatial binning are all used to obtain high performance. The resulting system is very accurate, but its main disadvantage is its low speed. An improvement over this approach is proposed by Q. Zhu, S. Avidan, M-C Yeh and K-W Cheng in their paper “Fast Human Detection Using a Cascade of Histograms of Oriented Gradients”.
This system uses a cascade of rejecters, with HOG acting as the features, to speed up the process. It also uses a larger number of detector windows than the Dalal and Triggs approach. It further suggests that varying the size of the “block” in the Dalal-Triggs approach improves detection accuracy significantly. Because the system uses blocks of varying size, the number of possible descriptors is very large, so the system uses the AdaBoost algorithm to select the blocks best suited for detection. The resulting system is about as accurate as that of Dalal and Triggs, while outperforming it in speed.
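The block-selection step can be illustrated with a minimal discrete AdaBoost sketch. In each round it picks the single feature with the lowest weighted error, then re-weights the training samples so that the next round concentrates on the mistakes. The referenced paper trains a linear SVM on each candidate block as its weak learner. To keep the example short, this sketch swaps in one-dimensional decision stumps and treats each feature column as a stand-in for one block. The function names (`train_stump`, `adaboost_select`, `predict`) are hypothetical.

```python
import math

def train_stump(features, labels, weights, j):
    """Best (weighted error, threshold, polarity) decision stump on feature column j."""
    best = (float("inf"), 0.0, 1)
    for thr in sorted(set(x[j] for x in features)):
        for pol in (1, -1):
            err = 0.0
            for x, y, w in zip(features, labels, weights):
                pred = 1 if pol * (x[j] - thr) >= 0 else -1
                if pred != y:
                    err += w
            if err < best[0]:
                best = (err, thr, pol)
    return best

def adaboost_select(features, labels, rounds):
    """Discrete AdaBoost: each round selects the feature ("block") whose stump has the lowest weighted error."""
    n = len(features)
    weights = [1.0 / n] * n
    chosen = []
    for _ in range(rounds):
        candidates = [(train_stump(features, labels, weights, j), j)
                      for j in range(len(features[0]))]
        (err, thr, pol), j = min(candidates)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep log() finite for perfect stumps
        alpha = 0.5 * math.log((1 - err) / err)
        chosen.append((j, thr, pol, alpha))
        # Misclassified samples gain weight, correctly classified ones lose weight.
        for i, (x, y) in enumerate(zip(features, labels)):
            pred = 1 if pol * (x[j] - thr) >= 0 else -1
            weights[i] *= math.exp(-alpha * y * pred)
        total = sum(weights)
        weights = [w / total for w in weights]
    return chosen

def predict(chosen, x):
    """Weighted vote of the selected stumps: +1 (face) or -1 (non-face)."""
    score = sum(alpha * (1 if pol * (x[j] - thr) >= 0 else -1)
                for j, thr, pol, alpha in chosen)
    return 1 if score >= 0 else -1
```

In this toy setting only the selected columns are ever evaluated at detection time. That is the point of selecting a small set of informative blocks out of a very large candidate pool.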
The next section gives some background on face detection, describes what I did in the project, and explains how it relates to what was done in class.
Background
There are three main approaches to face detection. The first, which we used in class, relies on the color of the face and uses this information to detect faces. This approach is fragile: skin color varies from person to person, and changes in background and illumination can also disturb detection. The second approach builds a template of the face and uses it in a template-matching process to detect faces. Eigenfaces, Fisherfaces and Active Appearance Models (AAM) are popular methods of this kind. The third approach uses pattern classification methods to detect faces and relies heavily on machine learning. A classifier is trained on features extracted from training images and is then used to detect faces in new images. This is usually done with a sliding detector window. The window is slid across the image, features are extracted at each window position and given to the classifier, and the classifier decides whether the window contains a face.
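The sliding-window scan can be sketched as follows. The sketch assumes a grayscale image stored as a list of rows, and `classify` is any function that returns True for a face window (for example, HOG extraction followed by a trained classifier). The window size and step are hypothetical parameters.

```python
def sliding_windows(img_h, img_w, win_h, win_w, step):
    """Yield (top, left) for every detector-window position that fits inside the image."""
    for top in range(0, img_h - win_h + 1, step):
        for left in range(0, img_w - win_w + 1, step):
            yield top, left

def detect(image, win_h, win_w, step, classify):
    """Slide a window over a 2-D image and return the positions the classifier accepts."""
    h, w = len(image), len(image[0])
    hits = []
    for top, left in sliding_windows(h, w, win_h, win_w, step):
        window = [row[left:left + win_w] for row in image[top:top + win_h]]
        if classify(window):  # face / non-face decision for this window
            hits.append((top, left))
    return hits
```

A real detector would also rescan the image at several scales so that faces larger than the window can be found.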
I have used the machine learning approach in my system. The system uses HOG as features to classify faces and non-faces, with a cascade of rejecters as the classifier. I calculate HOG over dense, overlapping blocks of cells in a detector window and collect these HOG features into a single descriptor vector. This vector is then given as input to an SVM classifier, which makes the face/non-face decision. The HOGs are computed from gradient magnitudes and orientations, and the gradients are obtained by applying a simple edge-detection filter to the image.
Figure: system pipeline (Input Image → HOG Encoding → Linear SVM)
Here the input image is an RGB color image. Gradients are calculated by applying the simple mask [1 0 -1] in the x and y directions. For each block, a histogram is calculated by binning the gradient orientations into orientation bins. For a detector window, the HOGs are collected into a single descriptor vector, which is given as input to a trained SVM. The SVM then decides whether the detector window contains a face.
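The gradient step can be sketched as follows. The sketch assumes the RGB image has already been converted to a grayscale list of rows. It applies the [1 0 -1] mask as a centered difference and folds orientations into the unsigned range [0, 180), which matches the 9-bin histograms used later. Leaving border pixels at zero is a simplifying assumption.

```python
import math

def gradients(img):
    """Per-pixel gradient magnitude and unsigned orientation in degrees, [0, 180).

    gx = I(x+1) - I(x-1), i.e. the [1 0 -1] mask applied as a convolution; the sign
    convention does not matter once orientations are folded into [0, 180).
    Border pixels are left at zero.
    """
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]  # horizontal centered difference
            gy = img[y + 1][x] - img[y - 1][x]  # vertical centered difference
            mag[y][x] = math.hypot(gx, gy)
            ang[y][x] = math.degrees(math.atan2(gy, gx)) % 180.0
    return mag, ang
```

A vertical edge gives orientation 0 and a horizontal edge gives 90, so opposite-contrast edges fall into the same bin.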
Descriptor Calculation:
• Densely and uniformly sample the image; for each sampled point:
• Create a C × C × B histogram (C × C spatial cells, B orientation bins)
• Add each pixel's gradient magnitude as a vote into the corresponding bin of the histogram
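The voting steps above can be sketched as a function that builds the C × C × B histogram for one block. It assumes magnitude and orientation maps like those from the gradient step. Each pixel's vote goes to its nearest lower bin (hard assignment, no interpolation between bins or cells). The defaults of C = 2 cells of 8 × 8 pixels and B = 9 bins are illustrative assumptions.

```python
def block_histogram(mag, ang, top, left, cells=2, cell_size=8, bins=9):
    """Flat C x C x B histogram for the block whose top-left pixel is (top, left).

    Each pixel votes its gradient magnitude into one of `bins` equal-width
    orientation bins over [0, 180) (hard assignment, no interpolation).
    """
    bin_width = 180.0 / bins
    hist = [0.0] * (cells * cells * bins)
    for cy in range(cells):
        for cx in range(cells):
            base = (cy * cells + cx) * bins  # offset of this cell's B bins
            for y in range(top + cy * cell_size, top + (cy + 1) * cell_size):
                for x in range(left + cx * cell_size, left + (cx + 1) * cell_size):
                    b = int(ang[y][x] // bin_width) % bins
                    hist[base + b] += mag[y][x]
    return hist
```

With 2 × 2 cells and 9 bins, each block contributes a 36-value histogram to the detector-window descriptor.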
Final Steps:
The input image is first gamma-normalized; this step is optional. After the HOG of a block has been calculated, the block is normalized, which makes the descriptor largely invariant to illumination changes. The resulting HOG descriptor vectors are then used to train a support vector machine.
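The normalization steps can be sketched as follows. Plain L2 normalization per block is an assumption, since the text does not say which block norm was used. The square-root gamma (0.5) is also only a common choice, not necessarily the one used here. The sketch also shows the normalized block histograms being concatenated into one window descriptor.

```python
import math

def gamma_normalize(img, gamma=0.5):
    """Optional power-law (gamma) compression of a 2-D image with values in [0, 255]."""
    return [[255.0 * (v / 255.0) ** gamma for v in row] for row in img]

def l2_normalize(vec, eps=1e-6):
    """Scale one block histogram to (approximately) unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in vec) + eps * eps)
    return [v / norm for v in vec]

def window_descriptor(block_histograms):
    """Normalize every block histogram and concatenate them into one descriptor vector."""
    desc = []
    for h in block_histograms:
        desc.extend(l2_normalize(h))
    return desc
```

Multiplying a block's gradients by a constant (for example, a brighter exposure) leaves its normalized histogram essentially unchanged, which is where the illumination invariance comes from.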
The following figure shows a sample HOG with 8 × 8 pixels per block, voted over 9 bins:
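A support vector machine (SVM) is a classifier trained with supervised learning. Given labeled training data, it finds the hyperplane that separates the two classes with the largest margin. With a kernel, an SVM implicitly maps the data into a higher-dimensional space and finds a separating hyperplane there. A linear SVM, as used here, finds the hyperplane directly in the space of the HOG descriptors.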
The function SVMTRAIN takes the data in TRAIN, together with the category of each item in TRAIN given in GROUP, and returns the trained classifier SVMSTRUCT.
The classifier SVMSTRUCT trained by SVMTRAIN is then used to classify the input data TEST into the groups specified during training.
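Training a linear SVM on HOG descriptors can be sketched without any toolbox. The sketch uses Pegasos-style stochastic sub-gradient descent on the regularized hinge loss. This is a swapped-in training method, not the solver behind SVMTRAIN. The bias is handled by appending a constant 1 to every feature vector. The values of `lam` and `epochs` are illustrative assumptions, and the function names are hypothetical.

```python
import random

def _augment(x):
    # Append a constant 1 so the bias is learned as the last weight.
    return list(x) + [1.0]

def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Pegasos-style SGD on the L2-regularized hinge loss; labels y are +1 / -1."""
    rng = random.Random(seed)
    data = [_augment(x) for x in X]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularizer shrinks w
            if margin < 1.0:  # hinge loss active: move toward classifying sample i correctly
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w

def svm_classify(w, x):
    """+1 (face) if the descriptor lies on the positive side of the hyperplane, else -1."""
    score = sum(wj * xj for wj, xj in zip(w, _augment(x)))
    return 1 if score >= 0 else -1
```

In practice X would hold the HOG descriptors of cropped face and non-face windows.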
Disadvantages of HOG
The approach presented by Dalal and Triggs is accurate and gives good detection, but it is very slow when processing video. It also uses blocks of a fixed size, which are not informative enough to allow fast rejection. To address these deficiencies, Q. Zhu, S. Avidan, M-C Yeh and K-W Cheng proposed another approach in their paper “Fast Human Detection Using a Cascade of Histograms of Oriented Gradients”. They use a cascade of rejecters, which increases the speed of detection, and allow variable-sized blocks, which enable fast rejection in the initial stages.
This approach combines a cascade of rejecters with HOG to achieve a fast and accurate face/human detection system. It also uses blocks of variable size to capture features from the image. The following image shows the basic structure of the cascade:
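In a cascade of rejecters, a window is passed on to the next classifier only if the current classifier accepts it, so the cumulative detection rate decreases from stage to stage. In the above figure, the detection rate is highest after classifier 1. After classifier 2 it is lower than after classifier 1 but higher than after classifier 3, and so on. Because most non-face windows are rejected in the early stages, the later stages run on only a few windows, which boosts the speed of the system.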
(Reference: Q. Zhu, S. Avidan, M-C Yeh, K-W Cheng, “Fast Human Detection Using a
Cascade of Histograms of Oriented Gradients”)
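Early rejection, and the way the detection rate falls along the cascade, can be sketched as follows. Each stage is modeled as a hypothetical (score function, threshold) pair. A window is dropped at the first stage it fails, so later stages are never evaluated on it. The second function shows that per-stage rates multiply. For example, 10 hypothetical stages with a 0.99 detection rate and a 0.5 false-positive rate each give an overall detection rate of about 0.90 and an overall false-positive rate of about 0.001.

```python
def cascade_classify(window, stages):
    """Run a window through a cascade of rejecters.

    stages: list of (score_fn, threshold) pairs. Returns (accepted, stages_evaluated);
    the window is rejected as soon as one stage scores below its threshold.
    """
    for k, (score_fn, threshold) in enumerate(stages):
        if score_fn(window) < threshold:
            return False, k + 1  # rejected here; later stages are skipped
    return True, len(stages)     # accepted by every stage

def cumulative_rates(stage_detection, stage_false_pos, n_stages):
    """Overall detection and false-positive rates of n identical stages (rates multiply)."""
    return stage_detection ** n_stages, stage_false_pos ** n_stages
```

The speed-up comes from the first return statement: a cheap early stage that rejects most background windows saves the cost of every stage after it.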
Image Database
I have used images from the CMU image database available at http://www.cs.cmu.edu/~cil/v-images.html. From these images I cropped and resized frontal views of faces, as shown below:
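Preparing training windows from the database images can be sketched as follows. The sketch crops a face box and resizes it to a fixed window size. The box coordinates and target size are hypothetical inputs. Nearest-neighbor resizing is an assumption, since the text does not say which interpolation was used.

```python
def crop(img, top, left, height, width):
    """Cut a rectangular region out of a 2-D image (list of rows)."""
    return [row[left:left + width] for row in img[top:top + height]]

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbor resize of a 2-D image to out_h x out_w."""
    in_h, in_w = len(img), len(img[0])
    return [[img[(y * in_h) // out_h][(x * in_w) // out_w] for x in range(out_w)]
            for y in range(out_h)]
```

Every cropped face then has the same size as the detector window, so all training descriptors have the same length.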
Results
The results of my experiment were satisfactory, but they were not as accurate as those reported in the referenced paper, for example:
In the case of video, there were also too many false detections and multiple detections, and some faces were not detected at all.
References
Source of Data:
References: