CHAPTER 1
INTRODUCTION
In this paper we apply a detection-style approach using information about motion as well as intensity information. The implementation described is very efficient, detects pedestrians at very small scales (as small as 18x36 pixels), and has a very low false positive rate. The system is trained on full human figures and does not currently detect occluded or partial human figures.
My approach builds on the detection work of Viola and Jones [10]. Novel contributions of this paper include the development of a representation of image motion which is extremely efficient, and the implementation of a state-of-the-art pedestrian detection system which operates using AdaBoost and Haar-like features.
1.1 OBJECTIVE
The main goal of this project is to develop a system which is capable of detecting pedestrians. The project covers the study and implementation of AdaBoost pedestrian detection with Haar-like features, and an analysis of the performance of the AdaBoost system on datasets of different sizes. It includes the development of a representation of image motion which is extremely efficient, and the implementation of a state-of-the-art pedestrian detection system which operates using AdaBoost and Haar-like features.
In this work, pictorial structures that represent a simple model of a person are used
for this purpose, as was done in previous works [9] [10] [3]. However, rather than using
a simple rectangle segment template, constant colored rectangle, or learned appearance
models specific to individuals for part detection, I train a discriminative classifier for
each body segment to detect candidate segments in images.
An overview of the algorithm is shown in figure 1.2.1. A large number of training
examples, both positive and negative, are used to train binary classifiers for each body
segment using the Adaboost algorithm. After training, the classifiers are scanned over
new input images. The detections from the input image for each segment are passed to
an algorithm that determines the best configurations consisting of one of each segment.
The configuration cost is computed efficiently as a function of the segment cost from
the classifier and the kinematic cost of combining pairs of segments as specified by
some deformable model.
c) Choose the classifier h_t with the lowest error ε_t.
d) Update the weights:
   w(t+1,i) = w(t,i) β_t^(1−e_i)
   where e_i = 0 if x_i is correctly classified, e_i = 1 otherwise, and β_t = ε_t / (1 − ε_t).
Pictorial structure configurations are considered valid if their cost is below a
predetermined threshold. I examine the ability of an AdaBoost detector to find body
segments as well as the utility of enforcing kinematic constraints on pedestrian
detections. The following sections describe the details of each component of the
detection framework. Many of the ideas used in this work have been presented
previously. I cite the original authors of each algorithm, but reproduce many equations
and algorithms for completeness.
Detecting and tracking a moving object in a dynamic video sequence has been a
vital aspect of motion analysis. This detecting and tracking system has become
increasingly important due to its application in various areas, including communication
(video conferencing), transportation (traffic monitoring and autonomous driving
vehicle), security (premise surveillance) and industries (dynamic robot vision and
navigation).
1.4 THESIS OUTLINE
Chapter 1 provides a first glimpse of the basic aspects of the research undertaken, such as the objectives, scope, problem formulation and the applications targeted by the developed moving object detection and tracking system.
Chapter 2 gives an insight into the existing vision-based moving object detection and tracking algorithms developed by various researchers, and subjectively classifies them into four categories.
Chapter 5 deals with the summary and conclusions of the research. A number of research findings obtained from the empirical results of the implemented detection and tracking system are also discussed. Lastly, some realistic extensions as well as possible enhancements to the research are provided.
CHAPTER 2
LITERATURE REVIEW
The choice of an algorithm that will perform well depends upon many considerations. Of primary concern is the selection of an efficient model that is well suited to the requirements of the particular desired application.
AdaBoost is used both to select the features and to train the classifiers. The AdaBoost learning algorithm is used to boost the classification performance of a simple learning algorithm (e.g., it might be used to boost the performance of a simple perceptron). It does this by combining a collection of weak classification functions to form stronger classifiers.
AdaBoost calls a weak classifier repeatedly in a series of rounds t = 1, …, T. For each call, a distribution of weights D_t is updated to indicate the importance of each example in the data set for the classification. On each round, the weights of incorrectly classified examples are increased (or, alternatively, the weights of correctly classified examples are decreased), so that the new classifier focuses more on those examples. Below is the boosting algorithm for learning a query online. T hypotheses are constructed, each using a single feature. The final hypothesis is a weighted linear combination of the T hypotheses, where the weights are inversely proportional to the training errors.
- Initialize weights w(1,i) = 1/(2m) for y_i = 0 and w(1,i) = 1/(2l) for y_i = 1, where m and l are the number of negative and positive examples respectively.
- For t = 1, …, T:
  1. Normalize the weights:
     w(t,i) ← w(t,i) / Σ_{j=1..n} w(t,j)
  2. For each feature j, train a classifier h_j which is restricted to using a single feature, and evaluate its weighted error:
     ε_j = Σ_i w_i |h_j(x_i) − y_i|
  3. Choose the classifier h_t with the lowest error ε_t.
  4. Update the weights:
     w(t+1,i) = w(t,i) β_t^(1−e_i)
     where e_i = 0 if example x_i is classified correctly, e_i = 1 otherwise, and β_t = ε_t / (1 − ε_t).
- The final strong classifier is:
  h(x) = 1 if Σ_{t=1..T} α_t h_t(x) ≥ (1/2) Σ_{t=1..T} α_t, and h(x) = 0 otherwise, where α_t = log(1/β_t).
Historically, for the task of object recognition, working with only image intensities (i.e., the RGB pixel values at each and every pixel of the image) made the task computationally expensive. A publication by Papageorgiou et al. discussed working with an alternate feature set instead of the usual image intensities. This feature set considers rectangular regions of the image and sums up the pixels in each region. The value thus obtained is used to categorize images. For example, suppose we have an image database with human faces and buildings. It is possible that if the eye and hair regions of the faces are considered, the sum of the pixels in this region would be quite high for the human faces and arbitrarily high or low for the buildings.
The value for the latter would depend on the structure of the building and its environment, while the values for the former will be more or less the same. We could thus categorize all images whose Haar-like feature in this rectangular region falls in a certain range of values as one category, and those falling outside this range as another. This might roughly divide the set of images into one group having mostly faces and a few buildings and another having mostly buildings and a few faces. This procedure could be carried out iteratively to further divide the image clusters [8][11][12].
A slightly more complicated feature breaks the rectangular image region into either a left half and a right half, or a top half and a bottom half, and traces the difference in the sum of pixels across these halves. This modified feature set is called the two-rectangle feature set.
The idea of this approach is to partition an image I_t into segmented regions and to search, for each of the segmented regions, for the corresponding regions in the successive image I_{t+1} at time t+1.
From this finding, with the feature-based matching technique, nearly 90% of the image area can be matched in an efficient and accurate way. The correlation-based matching technique, which requires more elaborate processing, is thus only applied to a small fraction of the image area. It yields very accurate displacement vectors and most of the time finds the corresponding color regions [7].
2.4 Gradient-Based Technique
The technique chosen for this research is AdaBoost classification with Haar-like features. Basically, this technique needs less computation time and is easier to implement compared to other techniques. Although the correlation-based technique is less affected by noise and illumination changes, it suffers from high computational complexity due to the summations over the entire template. Some researchers use a gradient-based technique for detecting a moving object, but this technique is relatively sensitive to local or global illumination changes.
CHAPTER 3
METHODOLOGY
The proposed moving object detection system has been implemented on an Intel® Pentium® M processor 725 2.1 GHz PC with 2 GB RAM running Windows XP. The off-line images were acquired through a Sony color CCD camera. Each frame of the acquired images is converted to 8-bit grayscale format and finally to 1-bit binary format before undergoing further processing. Each frame is fixed to a size of 256 x 256 pixels.
The EuroCard Picolo frame grabber is utilized in the project setup. The task of the
frame grabber is to convert the electrical signal from the camera into a digital image
that can be processed by a computer. The frame grabber can be programmed to send the
image data without intervention of the central processing unit (CPU) to the memory
(RAM) of the PC where the images are processed.
The entire motion detection and tracking program has been developed using Microsoft Visual C++ version 6.0, including the GUI (Graphical User Interface), together with OpenCV. Visual C++ is a fully integrated editor, compiler and debugger, and hence can be used to create a complex software system.
The table in Table 3.2.1 gives an overview of the main time schedule for the developed detection system:

No | Week | Task | Assigned To
1 | Week 4: 4-10 Feb 2008 | Research about the image processing process | Muzammil
2 | Week 5: 11-17 Feb 2008 | Research about the features of image processing: 1) differencing technique, 2) feature-based matching technique | Muzammil
3 | Week 6: 18-24 Feb 2008 | Research about the features of image processing: 1) AdaBoost technique, 2) Haar-like features | Muzammil
4 | Week 7: 25-29 Feb 2008 | Selection technique | Muzammil
5 | Week 8: 10-16 March 2008 | Submit summary | Muzammil
6 | Week 9: 17-23 March 2008 | Matching operation: first stage (research) | Muzammil
7 | Week 10: 24-30 March 2008 | Research: matching operation of motion detection and tracking | Muzammil
8 | Week 11: 1-6 April 2008 | Preparation of paperwork | Muzammil
9 | Week 12: 7-13 April 2008 | Submit paperwork and presentation | Muzammil
10 | Week 13: 14-20 April 2008 | Find and identify the hardware: first stage (camera) | Muzammil
11 | Week 14: 21-27 April 2008 | Find and identify the hardware: second stage (software) | Muzammil
12 | Break | Semester break | -
13 | Week 1: 14-20 July 2008 | Matching operation: second stage (hardware) | Muzammil
14 | Week 2: 21-27 July 2008 | Setup software: first stage (OpenCV and C++) | Muzammil
15 | Week 3: 28-31 July 2008 | Setup software: second stage (OpenCV and C++) | Muzammil
16 | Week 4: 4-10 August 2008 | Matching operation: third stage (software and hardware) | Muzammil
17 | Week 5: 11-17 August 2008 | Testing project: first stage (warm up) | Muzammil
18 | Week 6: 18-24 August 2008 | Matching operation: fourth stage (upgrade) | Muzammil
19 | Week 7: 25-29 August 2008 | Testing project: second stage (in situation) | Muzammil
20 | Week 8: 8-14 Sept 2008 | Matching operation: fifth stage (upgrade) | Muzammil
21 | Week 9: 15-21 Sept 2008 | Testing project: third stage (final); result analysis | Muzammil
22 | Week 10: 22-28 Sept 2008 | Matching operation: final stage; result analysis | Muzammil
23 | Week 11: 1-5 Oct 2008 | Preparation of report | Muzammil
24 | Week 12: 6-12 Oct 2008 | Preparation of presentation | Muzammil
25 | Week 13: 13-19 Oct 2008 | Presentation | Muzammil
26 | Week 14: 20-26 Oct 2008 | Submit report | Muzammil

Table 3.2.1
3.3. Gantt Chart.
The diagrams in Figures 3.3.1, 3.3.2, 3.3.3, 3.3.4, 3.3.5, 3.3.6 and 3.3.7 give an overview of the Gantt chart for the project:
Figure 3.3.1
Figure 3.3.2
Figure 3.3.3
Figure 3.3.4
Figure 3.3.5
Figure 3.3.6
Figure 3.3.7
3.4 Cost Planning
The table in Table 3.4.1 gives an overview of the cost of the developed detection system:

No | Item | Description | Quantity | Unit | Unit Price | Amount
Table 3.4.1
3.5 System Overview
The block diagram in Figure 3.5.1 gives an overview of the main stages in the developed detection system. This overview is an example of a detection system for pedestrians.
Figure 3.5.1
3.6 Haar-like Prototypes
Figure 3.6.1
3.7 Integral Image
Rectangle features can be computed very rapidly using an intermediate representation of the image which we call the integral image. The integral image at location (x, y) contains the sum of the pixels above and to the left of (x, y), inclusive:
ii(x, y) = Σ_{x′ ≤ x, y′ ≤ y} i(x′, y′)
where ii(x, y) is the integral image and i(x, y) is the original image (see Figure 3.6.1). Using the following pair of recurrences:

s(x, y) = s(x, y − 1) + i(x, y)
ii(x, y) = ii(x − 1, y) + s(x, y)

(where s(x, y) is the cumulative row sum, s(x, −1) = 0, and ii(−1, y) = 0), the integral image can be computed in one pass over the original image. Using the integral image, any rectangular sum can be computed in four array references (see Figure 3.6.2). Clearly the difference between two rectangular sums can be computed in eight references. Since the two-rectangle features defined above involve adjacent rectangular sums, they can be computed in six array references, eight in the case of the three-rectangle features, and nine for the four-rectangle features.
One alternative motivation for the integral image comes from the “boxlets” work of Simard et al. [6][7][8]. The authors point out that, in the case of linear operations (e.g., f · g), any invertible linear operation can be applied to f or g if its inverse is applied to the result. For example, in the case of convolution, if the derivative operator is applied both to the image and the kernel, the result must then be double integrated:

f ∗ g = ∬ (f′ ∗ g′)
The authors go on to show that convolution can be significantly accelerated if the derivatives of f and g are sparse (or can be made so). A similar insight is that an invertible linear operation can be applied to f if its inverse is applied to g:

i · r = (∬ i) · r″

The integral image is in fact the double integral of the image (first along rows and then along columns). The second derivative of the rectangle (first in the row and then in the column direction) yields four delta functions at the corners of the rectangle. Evaluation of the second dot product is accomplished with four array accesses.
Each stage was trained using the AdaBoost algorithm. AdaBoost is a powerful machine learning algorithm that can learn a strong classifier based on a (large) set of weak classifiers by re-weighting the training samples. The feature-based classifier that best classifies the weighted training samples is added at each round of boosting. As the stage number increases, the number of weak classifiers needed to achieve the desired false alarm rate at the given hit rate also increases.
The cascade training process involves two types of tradeoff. In most cases, classifiers with more features will achieve higher detection rates and lower false positive rates. At the same time, classifiers with more features require more time to compute. In principle one could define an optimization framework in which: i) the number of classifier stages, ii) the number of features in each stage, and iii) the threshold of each stage are traded off in order to minimize the expected number of evaluated features. Unfortunately, finding this optimum is a tremendously difficult problem [6][7][8][9].
A target is selected for the minimum reduction in false positive rate and the maximum allowable decrease in detection rate. Each stage is trained by adding features until the target detection and false positive rates are met (these rates are determined by testing the detector on a validation set). Stages are added until the overall target for the false positive and detection rates is met.
3.10 Training Process
The training process uses AdaBoost to select a subset of features and construct the
classifier. In each round the learning algorithm chooses from a heterogenous set of
filters, including the appearance filters, the motion direction filters, the motion shear
filters, and the motion magnitude filters.
The AdaBoost algorithm also picks the optimal threshold for each feature as well as the α and β votes of each feature. The output of the AdaBoost learning algorithm is a classifier that consists of a linear combination of the selected features. For details on AdaBoost, the reader is referred to [12, 5].
The important aspect of the resulting classifier to note is that it mixes motion and appearance features. Each round of AdaBoost chooses, from the total set of motion and appearance features, the feature with the lowest weighted error on the training examples. The resulting classifier balances intensity and motion information in order to maximize detection rates.
Figure 3.10.1: Cascade architecture. Input is passed to the first classifier, which decides true or false (pedestrian or not pedestrian). A false determination halts further computation and causes the detector to return false. A true determination passes the input along to the next classifier in the cascade. If all classifiers vote true then the input is classified as a true example. If any classifier votes false then computation halts and the input is classified as false. The cascade architecture is very efficient because the classifiers with the fewest features are placed at the beginning of the cascade, minimizing the total required computation.
Viola and Jones [14] showed that a single classifier for face detection would require too many features and thus be too slow for real-time operation. They proposed a cascade architecture to make the detector extremely efficient (see Figure 3.10.1). We use the same cascade idea for pedestrian detection. Each classifier in the cascade is trained to achieve very high detection rates and modest false positive rates. Simpler detectors (with a small number of features) are placed earlier in the cascade, while complex detectors (with a large number of features) are placed later in the cascade.
Detection in the cascade proceeds from simple to complex. Each stage of the
cascade consists of a classifier trained by the AdaBoost algorithm on the true and false
positives of the previous stage. Given the structure of the cascade, each stage acts to
reduce both the false positive rate and the detection rate of the previous stage. The key
is to reduce the false positive rate more rapidly than the detection rate.
A target is selected for the minimum reduction in false positive rate and the
maximum allowable decrease in detection rate. Each stage is trained by adding features
until the target detection and false positives rates are met on a validation set. Stages of
the cascade are added until the overall target for false positive and detection rate is met.
Figure 3.10.2: A small sample of positive training examples. A pair of image patterns comprises
a single example for training.
CHAPTER 4
PROTOTYPE/PRODUCT DEVELOPMENT
This chapter gives an overview of the stages of prototype development and the technical material used in developing the entire pedestrian detection system. Subsequently, the pedestrian detection modules (coding, diagrams, flow charts) are elaborated. The explanation of the pedestrian detection development focuses on the Haar-like feature prototypes, the integral image, and the classification function with the cascade of classifiers.
Figure 4.1.1: Sample of Training Dataset of Pedestrian
I used six of the sequences to create a training set from which I learned both a
dynamic pedestrian detector and a static pedestrian detector. The other two sequences
were used to test the detectors. The dynamic detector was trained on consecutive frame
pairs and the static detector was trained on static patterns and so only uses the
appearance filters described above. The static pedestrian detector uses the same basic
architecture as the face detector described in [14].
Figure 4.3.1: Sample frames from each of the 6 sequences we used for training. The manually
marked boxes over pedestrians are also shown.
Each stage of the cascade is a boosted classifier trained using a set of 5000
positive examples and 5000 negative examples. Each positive training example is a pair
of 18 x 36 pedestrian images taken from two consecutive frames of a video sequence.
Negative examples are similar image pairs which do not contain pedestrians. Positive
examples are shown in figure 4.3.2. During training, all example images are variance
normalized to reduce contrast variations. The same variance normalization computation
is performed during testing.
Each classifier in the cascade is trained using the original 5000 positive
examples plus 5000 false positives from the previous stages of the cascade. The
resulting classifier is added to the current cascade to construct a new cascade with a
lower false positive rate. The detection threshold of the newly added classifier is
adjusted so that the false negative rate is very low.
The cascade training algorithm also requires a large set of image pairs to scan
for false positives. These false positives form the negative training examples for the
subsequent stages of the cascade. We use a set of 5000 full image pairs which do not
contain pedestrians for this purpose. Since each full image contains about 50,000
patches of 20x15 pixels, the effective pool of negative patches is larger than 20 million.
The static pedestrian detector is trained in the same way on the same set of images. The
only difference in the training process is the absence of motion information. Instead of
image pairs, training examples consist of static image patches.
CHAPTER 5
RESULT
This section demonstrates some of the tested image sequences that are able to
highlight the effectiveness of the proposed detection system. These experimental results
are obtained using the proposed detection algorithm that has been discussed in Chapter
3 and Chapter 4. The proposed algorithm includes Haar-like features and Adaboost
Classification. The system can detect moving objects a 300 x 500 pixel image at 17
frames/s[7][8][9] on 2.0 GHz PC with 2 Giga Byte RAM running on Windows XP. The
output of the system is represented by yellow rectangle, meaning that in the search sub-
window corresponding to the red rectangle the output of the cascade of the classifier
was true, in other words, it as detected the object. Figure 4.1 and 4.2 are the examples of
output.
5.1 Stage
Because the computation time of a classifier rises with its number of features, the images are first classified by classifiers with a small size (Figure 6.4). In this figure the efficiency of the AdaBoost algorithm is demonstrated. Due to the arrangement of the boosted classifiers according to their size, only a reduced number of samples reach the higher classifiers in the cascade, which are more expensive to compute.
The result in Figure 5.2.1 shows that the number of weak classifiers differs with, and follows, the increment of the dataset. The reason for using datasets of different sizes is to demonstrate that an AdaBoost system trained on more data produces a larger number of weak classifiers, and that an AdaBoost system with more weak classifiers gives better performance.
The training and test datasets were obtained by running our pedestrian detector on several hours of video and extracting sequences of images for each tracked pedestrian; the learned features depend on the dataset used.
Figure 5.3.1 shows the accuracy for each dataset. The 100-example data collection produces the lowest performance, with a hit percentage of only 11.89% and the highest missed percentage, 85.11%. At this level, the result obtained by the AdaBoost system is not superior to the cascade classifier with respect to accuracy, and it has the lowest machine-learning performance.
In comparison, with the 4500-example data collection, the classifier achieves 89.36% hits and 10.64% misses in detection and tracking respectively, preserving the higher performance.
We can see in Figure 5.4.1 that all the data collections produced a high percentage of false alarms. The difference in false alarm percentage between the 100-example and 4000-example data collections is only 5.03%. This value shows that the classifier has a problem.
One problem with the classifier returned by AdaBoost is that the threshold is much too
low. This is because the prior probability for a pedestrian is much lower than the prior
probability for a non-pedestrian, but the AdaBoost algorithm assumes they have equal
priors.
Attempts were made to adjust the threshold automatically based on a holdout set, but
not enough negative examples were present to accurately do this. Instead, the threshold
was manually adjusted until the false positive rate was qualitatively low enough. While
the holdout set had a dismal detection rate, the performance on actual images was much
better because many windows overlap each pedestrian, giving the classifier multiple
chances to detect a given pedestrian. The thresholds for the cascade were manually chosen so that as many negative examples as possible were rejected while still allowing almost all positive examples through.
Figure 5.4.1: Percentage of False Alarm
Figure 5.5.2: Correct detection with false alarm
The boosted sum-of-pixel feature technique introduced by Viola and Jones has many
potential uses. One such use would be to introduce it into a particle filter context where
the sum-of-pixel classifier could be used to estimate a likelihood. This would enable an
extremely simple parameterization of pixel coordinates, scale, and velocity. This would
also increase the robustness of the Viola and Jones algorithm and make it faster since
full image searches would no longer be necessary. The simplicity of the
parameterization also would have a huge benefit over more complex contour based
parameterizations. Another possible application of this algorithm would be for behavior
classification. For this to work, however, longer-term motion analysis might be
necessary. Perhaps looking at N successive frames instead of 2 would improve
classification performance.
CHAPTER 6
CONCLUSION
I have presented an approach for object detection which minimizes computation time
while achieving high detection accuracy. Preliminary experiments, which will be
described elsewhere, show that highly efficient detectors for other objects, such as
pedestrians, can also be constructed in this way.
This paper brings together new algorithms, representations, and insights which are quite
generic and may well have broader application in computer vision and image
processing. The first contribution is a new a technique for computing a rich set of image
features using the integral image. In order to achieve true scale invariance, almost all
object detection systems must operate on multiple image scales.
The integral image, by eliminating the need to compute a multi-scale image pyramid,
reduces the initial image processing required for object detection significantly. In the
domain of pedestrian detection the advantage is quite dramatic. Using the integral
image, pedestrian detection is completed before an image pyramid can be computed.
While the integral image should also have immediate use for other systems which have used Haar-like features (such as Papageorgiou et al. [12]), it can foreseeably have an impact on any task where Haar-like features may be of value. Initial experiments have
shown that a similar feature set is also effective for the task of parameter estimation,
where the expression of a pedestrian, the position of a pedestrian, or the pose of a
pedestrian is determined.
The second contribution of this paper is a technique for feature selection based on
AdaBoost. An aggressive and effective technique for feature selection will have impact
on a wide variety of learning tasks. Given an effective tool for feature selection, the
system designer is free to define a very large and very complex set of features as input
for the learning process. The resulting classifier is nevertheless computationally
efficient, since only a small number of features need to be evaluated during run time.
Frequently the resulting classifier is also quite simple; within a large set of complex
features it is more likely that a few critical features can be found which capture the
structure of the classification problem in a straightforward fashion.
REFERENCES
[1] Syed Abdul Rahman, Phd Thesis: “Moving Object Feature Detector &
Extractor Using a Novel Hybrid Technique”, University of Bradford, Bradford,
UK.1997.
[2] Wan Ayub Bin Wan Ahmad, Master Thesis: “Menjejaki Objek Yang Bergerak
Dalam Satu Jujukan Imej”, Universiti Teknologi Malaysia, Skudai,
Malaysia.2002.
[3] Yeoh Phaik Yong, Master Draft Thesis: “Integration of Projection Histograms
For Real Time Tracking Of Moving Object”, Universiti Teknologi Malaysia,
Skudai, Malaysia.2003.
[4] Bernd Heisele, W.Ritter and U.Krebel, “Tracking Non-Rigid, Moving Objects
Based on Color Cluster Flow”. Proc. Computer Vision and Pattern Recognition,
pages 253-257, San Juan, 1997
[5] Bernd Heisele, “Motion-Based Object Detection and Tracking in Color Image
Sequences”. Fourth Asian Conference on Computer Vision, pages 1028-1033,
Taipei, 2000
[7] Yoav Rosenberg and Michael Werman, “Real-Time Object Tracking from a
Moving Video Camera: A Software Approach on a PC”. 4th IEEE Workshop
on Applications of Computer Vision (WACV'98), New Jersey, 1998
[8] Paul Viola and Michael Jones, “Rapid Object Detection using a Boosted Cascade
of Simple Features”, Accepted Conference on Computer Vision and Pattern
Recognition, 2001.
[9] Goncalo Monteiro, Paulo Peixoto, and Urbano Nunes, “Vision-Based
Pedestrian Detection using Haar-like Features”. Institute of Systems and Robotics,
University of Coimbra, Polo II, Portugal, 2002.
[10] Paul Viola and Michael Jones, “Robust Real-Time Object Detection”, Second
International Workshop on Statistical and Computational Theories of Vision -
Modeling, Learning, Computing and Sampling, Canada, July 2001.
[12] Papageorgiou, Oren and Poggio, “A General Framework for Object Detection”,
International Conference on Computer Vision, 1998.
[13] L. Itti, C. Koch, and E. Niebur. A model of saliency-based visual attention for
rapid scene analysis. IEEE Patt. Anal. Mach. Intell., 20(11):1254–1259,
November 1998.
[14] Y. Amit, D. Geman, and K. Wilder. Joint induction of shape features and tree
classifiers,1997.
[17] Geoffrey W. Gates. The reduced nearest neighbor rule. IEEE Transactions on
Information Theory, pages 431–433, 1972.
[18] Peter E. Hart. The condensed nearest neighbor rule. IEEE Transactions on
Information Theory, IT-14:515–516, May 1968.
[19] Robert C. Holte. Very simple classification rules perform well on most
commonly used datasets. Machine Learning, 11(1):63–91, 1993.
[21] Michael Kearns and Yishay Mansour. On the boosting ability of top-down
decision tree learning algorithms. In Proceedings of the Twenty-Eighth Annual
ACM Symposium on the Theory of Computing, 1996.
[22] Eun Bae Kong and Thomas G. Dietterich. Error-correcting output coding
corrects bias and variance. In Proceedings of the Twelfth
International Conference on Machine Learning, pages 313–321, 1995.
[23] J. Ross Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann,
1993.
National Conference on Artificial Intelligence, 1996.
[26] Patrice Simard, Yann Le Cun, and John Denker. Efficient pattern recognition
using a new transformation distance. In Advances in Neural Information
Processing Systems, volume 5, pages 50–58, 1993.
APPENDIX
A-1: Coding
#ifndef _EiC
#include "cv.h"
#include "highgui.h"
//#include "highguid.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>
#include <math.h>
#include <float.h>
#include <limits.h>
#include <time.h>
#include <ctype.h>
#endif
#ifdef _EiC
#define WIN32
#endif
/* The globals and the opening of main() were lost from this listing; the
   lines below are a minimal reconstruction following the structure of the
   OpenCV facedetect.c sample. The cascade file name is an assumption. */
static CvMemStorage* storage = 0;
static CvHaarClassifierCascade* cascade = 0;
static const char* cascade_name = "cascade.xml";
static const char* input_name = 0;

void detect_and_draw( IplImage* img );

int main( int argc, char** argv )
{
CvCapture* capture = 0;
IplImage *frame, *frame_copy = 0;

input_name = argc > 1 ? argv[1] : 0;
cascade = (CvHaarClassifierCascade*)cvLoad( cascade_name, 0, 0, 0 );

/* a lone digit selects a camera; anything else is treated as a file */
if( !input_name || (isdigit(input_name[0]) && input_name[1] == '\0') )
capture = cvCaptureFromCAM( !input_name ? 0 : input_name[0] - '0' );
else
capture = cvCaptureFromAVI( input_name );
if( !cascade )
{
fprintf( stderr, "ERROR: Could not load classifier cascade\n" );
return -1;
}
storage = cvCreateMemStorage(0);
cvNamedWindow( "result", 1 );
if( capture )
{
for(;;)
{
if( !cvGrabFrame( capture ))
break;
frame = cvRetrieveFrame( capture );
if( !frame )
break;
if( !frame_copy )
frame_copy = cvCreateImage( cvSize(frame->width,frame->height),
IPL_DEPTH_8U, frame->nChannels );
if( frame->origin == IPL_ORIGIN_TL )
cvCopy( frame, frame_copy, 0 );
else
cvFlip( frame, frame_copy, 0 );
detect_and_draw( frame_copy );
if( cvWaitKey( 10 ) >= 0 )
break;
}
cvReleaseImage( &frame_copy );
cvReleaseCapture( &capture );
}
else
{
const char* filename = input_name ? input_name : (char*)"lena.jpg";
IplImage* image = cvLoadImage( filename, 1 );
if( image )
{
detect_and_draw( image );
cvWaitKey(0);
cvReleaseImage( &image );
}
else
{
/* assume it is a text file containing the
list of the image filenames to be processed - one per line */
FILE* f = fopen( filename, "rt" );
if( f )
{
char buf[1000+1];
while( fgets( buf, 1000, f ) )
{
int len = (int)strlen(buf);
while( len > 0 && isspace(buf[len-1]) )
len--;
buf[len] = '\0';
image = cvLoadImage( buf, 1 );
if( image )
{
detect_and_draw( image );
cvWaitKey(0);
cvReleaseImage( &image );
}
}
fclose(f);
}
}
cvDestroyWindow("result");
return 0;
}
void detect_and_draw( IplImage* img )
{
int scale = 1;
IplImage* temp = cvCreateImage( cvSize(img->width/scale,img->height/scale), 8,
3 );
CvPoint pt1, pt2;
int i;
if( cascade )
{
cvClearMemStorage( storage );
/* scan the image at multiple scales: 1.1 scale step, minimum window 40x40 */
CvSeq* faces = cvHaarDetectObjects( img, cascade, storage,
1.1, 2, CV_HAAR_DO_CANNY_PRUNING,
cvSize(40, 40) );
/* draw a yellow rectangle around every detection */
for( i = 0; i < (faces ? faces->total : 0); i++ )
{
CvRect* r = (CvRect*)cvGetSeqElem( faces, i );
pt1.x = r->x * scale;
pt1.y = r->y * scale;
pt2.x = (r->x + r->width) * scale;
pt2.y = (r->y + r->height) * scale;
cvRectangle( img, pt1, pt2, CV_RGB(255,255,0), 3, 8, 0 );
}
}
cvShowImage( "result", img );
cvReleaseImage( &temp );
}

#ifdef _EiC
main(1,"facedetect.c");
#endif