
Vehicle detection combining gradient analysis and AdaBoost classification
Ayoub Khammari, Etienne Lacroix, Fawzi Nashashibi, Claude Laurgeau
Robotics Center
Ecole des Mines de Paris
60, Bd. Saint-Michel, 75272 Paris Cedex 06, France
{khammari,lacroix,nashashibi,laurgeau}@ensmp.fr

Abstract – This paper presents a real-time vision-based vehicle detection system using gradient-based methods and AdaBoost classification. Our detection algorithm consists of two main steps: gradient-driven hypothesis generation and appearance-based hypothesis verification. In the hypothesis generation step, possible target locations are hypothesized. This step uses an adaptive, range-dependent threshold and symmetry for gradient maxima localization. Appearance-based hypothesis validation then verifies those hypotheses using AdaBoost classification with illumination-independent classifiers. The monocular system was tested under different traffic scenarios (e.g., simply structured highways, complex urban streets, varying lighting conditions), illustrating good performance.

Index Terms – intelligent vehicles, vehicle detection and tracking, gradient, AdaBoost classification.

I. INTRODUCTION

Intelligent driver assistance is an area of active research among automotive manufacturers, suppliers and universities, with the aim of reducing injury and accident severity. The ability to process sensing data from multiple sources (radar, camera, and wireless communication) and to determine the appropriate actions (belt pretensioning, airbag deployment, brake assistance, …) forms the basis of this research and is essential in the development of active and passive safety systems. Monocular vision-based vehicle detection systems are particularly interesting for their low cost and for the high-fidelity information they give about the driver's environment. Detection is a two-step process. In the first step, all regions in the image plane that potentially contain a vehicle are identified; this is what we call "target generation". In the next step, the selected regions are validated and tracked over time, which we designate "target validation".

Various monocular target generation approaches have been suggested in the literature and can be divided into two categories: (1) knowledge-based and (2) motion-based. Knowledge-based methods employ information about vehicle shape and color as well as general information about streets, roads and freeways. A good synthesis of the different clues, such as shadow, edges, entropy and symmetry, is given in [1]. Motion-based methods detect vehicles and obstacles using optical flow [2]. Generating a displacement vector for each pixel, however, is time-consuming and thus impractical for a real-time system. Moreover, it works well only in situations of significant relative motion, such as passing vehicles. Okada et al. [3] proposed another motion-based approach that uses projective invariants and vanishing lines to derive the motion constraint of the ground plane and the surface plane of the vehicles.

Target validation approaches can be classified mainly into two categories: (1) template-based and (2) appearance-based. Template-based methods use predefined patterns of the vehicle class and perform a correlation between an input image and the template. Betke et al. [4] proposed a multiple-vehicle detection approach using deformable gray-scale template matching. In [5], a deformable model is formed from manually sampled data using Principal Component Analysis (PCA); both the structure and pose of a vehicle can be recovered by fitting the PCA model to the image. Appearance-based methods acquire the characteristics of the vehicle class from a set of training images which capture the variability in vehicle appearance. Usually, the variability of the non-vehicle class is also modeled to improve performance. First, each training image is represented by a set of local or global features. Then, the decision boundary between the vehicle and non-vehicle classes is learned either by training a classifier (e.g., a Neural Network (NN)) or by modeling the probability distribution of the features in each class (e.g., using the Bayes rule assuming Gaussian distributions). In Matthews et al. [6], feature extraction is based on PCA. Goerick et al. [7] used a method called Local Orientation Coding (LOC) to extract edge information; the histogram of LOC within the area of interest was then fed to a NN for classification. In [8], a wavelet transform was used.

The challenges of a monocular visual system are twofold. First, the system lacks the depth cues used for target segmentation, so pattern recognition techniques must be relied on heavily to compensate. The question that arises, therefore, is whether pattern recognition can be robust enough to meet the stringent detection accuracy requirements of a series production product; hence the focus of this paper. Moreover, once the target is detected, the system must achieve the accuracy required by ACC (Adaptive Cruise Control) applications when measuring the obstacle vehicle's distance and velocity.

We have built a monocular visual processing system that follows the two-step paradigm described above. The paper is organized as follows: after a description of the gradient-driven hypothesis generation phase, we detail the appearance-based hypothesis verification using AdaBoost. We then present some experimental results and issues, followed by a conclusion and future work.
II. GRADIENT-BASED TARGET GENERATION

The goal of this module is to initialize the detection system with possible locations of vehicles in the scene. It does not matter if it generates false detections. However, this module must run very quickly, since it is executed at each frame and explores a relatively large region of interest. In addition, it must detect a vehicle, at least once, as soon as it appears in the scene. To this end, the shadow underneath a vehicle is considered one of the most significant clues indicating the presence of an obstacle. In our approach, we sought a more general clue based on the negative horizontal gradient caused by the shadows, wheels and bumpers found in the bottom part of a vehicle's rear view.

To reduce the computation time of the scene investigation, we start by applying a 3-level Gaussian pyramid filter as described in [8]. The hypotheses are then generated at the lowest level of detail (the 3rd level: 128 x 96 pixels), which is very useful since this level contains only the most salient structural features, making candidate vehicle locations easier to find.
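As a rough illustration, here is a minimal sketch of such a pyramid reduction, assuming OpenCV, a grayscale 1024 x 768 input, and our own function names (not from the paper):

```python
import cv2

def build_pyramid(gray, levels=3):
    """Gaussian pyramid; each cv2.pyrDown halves the resolution,
    so a 1024x768 input yields 128x96 at level 3, the paper's
    lowest level of detail."""
    pyramid = [gray]
    for _ in range(levels):
        pyramid.append(cv2.pyrDown(pyramid[-1]))
    return pyramid  # pyramid[3] is the coarsest level
```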
A. Local gradient maxima detection

After applying a horizontal Sobel filter to the 3rd level of the Gaussian pyramid (see Fig. 1.a), we have to detect the local gradient maxima which will help us locate vehicle candidates. This step is the most important one of the target generation stage: all subsequent operations depend on its results, and a maximum missed here will be difficult to recover in the next steps. Therefore, particular attention must be given to this point. We have developed a special adaptive threshold that operates on the gradient image. Three different filter sizes were used to deal with the different vehicle widths in the image (near, mid-range and distant cars); the size is approximately twice the width of a vehicle at a known v-position in the image. For each pixel of the image, we take the maximum and minimum values within the filter window and calculate their mean. Pixels with a gradient intensity higher than this threshold are then retained as gradient maxima, as shown in Fig. 1.b.
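The following sketch shows one plausible reading of this adaptive threshold, assuming OpenCV and SciPy; it uses a single window size where the paper uses three v-dependent ones, and it operates on the rectified negative horizontal gradient:

```python
import cv2
import numpy as np
from scipy.ndimage import maximum_filter, minimum_filter

def gradient_maxima(level3, win=9):
    """Binary map of local gradient maxima (simplified sketch)."""
    grad = cv2.Sobel(level3, cv2.CV_32F, 0, 1, ksize=3)  # horizontal edges
    neg = np.maximum(-grad, 0.0)     # keep the negative horizontal gradient
    hi = maximum_filter(neg, size=(1, win))
    lo = minimum_filter(neg, size=(1, win))
    thresh = (hi + lo) / 2.0         # per-pixel mean of the local extrema
    return neg > thresh
```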
The binary image is then labeled. For each object, we extract the longest horizontal segment. It should be noted that, due to the complexity of the scenes, some false maxima are to be expected. We suppress them using heuristics based on perspective projection constraints under the assumption of a flat road.

Fig. 1 (a) Gradient image, (b) Maxima detection

B. Temporal filtering

The idea behind this step is to eliminate some basic false detections caused by road irregularities or static shadows. This makes the work of the validation module easier and less time-consuming. To this end, we evaluate the temporal presence of each binary object, as described in Fig. 2.

The temporal presence of each binary object of the intersection image is incremented. As long as this duration is less than 200 ms, which represents 1/10 of the 2 s safety-distance rule, the binary object is not considered in the next steps.

Fig. 2 Temporal filtering process
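A minimal sketch of this temporal filter, under our own assumptions: per-frame bounding boxes and an IoU-based association standing in for the paper's intersection image (all names are ours):

```python
MIN_PRESENCE_S = 0.2   # 200 ms = 1/10 of the 2 s safety-distance rule

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
    return inter / union if union > 0 else 0.0

def temporal_filter(ages, detections, dt, iou_min=0.3):
    """Age each binary object across frames; only objects present for
    at least 200 ms are passed on to the next steps."""
    next_ages, kept = [], []
    for det in detections:
        age = max((a + dt for box, a in ages if iou(box, det) >= iou_min),
                  default=0.0)
        next_ages.append((det, age))
        if age >= MIN_PRESENCE_S:
            kept.append(det)
    return next_ages, kept
```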
C. Improving the obstacle's bounding box

The previous steps give us an accurate v-localization of the vehicle candidates. However, the u-localization needs to be improved. In fact, the solid-drop shadow is not always located directly under the obstacle vehicle; its position depends on the time of day and the corresponding sun position. To deal with this, we use two well-known clues: vertical edges and symmetry.

For each binary object, we define a ROI (Region Of Interest) which is supposed to include the vehicle candidate. The height of this ROI is defined according to a standard predefined vehicle height-to-width ratio. The cumulated vertical gradient histogram is then computed for this ROI. The two most significant peaks that respect the perspective projection constraints, under the assumption of a flat road, are taken as the two vertical edges of the vehicle. At the same time, we compute the intensity-based symmetry coefficient and axis for the original ROI using the method described in [1], but on the first level of the pyramid, where the resolution is the highest. This helps us improve the obstacle's bounding box in at least two cases:
- If the horizontal gradient analysis gives us only part of a vehicle candidate, the symmetry axis helps us retrieve the missing part.
- If the yaw angle of the target, estimated in the host-vehicle reference frame, is significant (curves, …), the width of the vehicle candidate will often be over-estimated, as can be seen in Fig. 3.

Fig. 3 Bounding box improvement using symmetry; the bounding box given by the gradient analysis is shown in blue, the symmetry-based one in red
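A minimal sketch of both clues under our assumptions (OpenCV, a grayscale ROI; the perspective-width check on the peaks is omitted for brevity):

```python
import cv2
import numpy as np

def vertical_edges(roi_gray):
    """Left/right vehicle edges from the two strongest peaks of the
    column-accumulated vertical-edge energy (one peak per ROI half)."""
    grad = np.abs(cv2.Sobel(roi_gray, cv2.CV_32F, 1, 0, ksize=3))
    hist = grad.sum(axis=0)          # cumulated vertical gradient histogram
    mid = len(hist) // 2
    return int(np.argmax(hist[:mid])), mid + int(np.argmax(hist[mid:]))

def symmetry_axis(roi_gray):
    """Intensity-based symmetry: the column about which mirroring the
    ROI minimizes the mean absolute difference (cf. [1])."""
    h, w = roi_gray.shape
    best_u, best_err = w // 2, np.inf
    for u in range(w // 4, 3 * w // 4):   # candidate axis positions
        half = min(u, w - u)
        left = roi_gray[:, u - half:u].astype(np.float32)
        right = roi_gray[:, u:u + half][:, ::-1].astype(np.float32)
        err = float(np.mean(np.abs(left - right)))
        if err < best_err:
            best_u, best_err = u, err
    return best_u
```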
III. APPEARANCE-BASED TARGET VALIDATION AND TRACKING

Verifying a hypothesis is essentially a two-class pattern classification problem: vehicle vs. non-vehicle. The most common method found in the literature for this problem is based on a wavelet transform for feature extraction and an SVM for classification [8], [9]. Another classification method, AdaBoost, described in [10], has shown satisfactory results for pedestrian detection [11] and classification [12]. The major advantage of this method lies in the fact that the classification criteria are generated automatically. That is why we decided to adapt this technique to vehicle classification. We use AdaBoost/GA [10] with the illumination-independent classifiers of [13]. This approach is detailed in the following sections.

A. Classification Algorithm

Boosting consists of linearly combining a set of weak classifiers to obtain a strong one.
In our case, we use the weak classifiers described in [9], composed of two sets of control points, x_1…x_N and y_1…y_M, a threshold T, and the scale on which the classifier operates. An example is shown in Fig. 4. The classifier answers "yes" if the following condition is verified at the given scale:

∀ n ∈ [1..N], ∀ m ∈ [1..M]: ‖val(x_n) − val(y_m)‖ > T,  with T ∈ ℝ+
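Read literally, the condition can be evaluated as below (a sketch with our own names; img is the subimage at the classifier's scale, and the strong score is the usual normalized weighted vote):

```python
import numpy as np

def weak_classifier(img, xs, ys, T):
    """True iff every X control point differs from every Y control point
    by more than T; xs and ys are lists of (row, col) positions."""
    vx = np.array([float(img[p]) for p in xs])
    vy = np.array([float(img[p]) for p in ys])
    return bool(np.all(np.abs(vx[:, None] - vy[None, :]) > T))

def adaboost_score(img, classifiers):
    """Normalized strong-classifier score in [0, 1]; 'classifiers' holds
    (xs, ys, T, alpha) tuples learned off-line."""
    total = sum(alpha for *_, alpha in classifiers)
    vote = sum(alpha for xs, ys, T, alpha in classifiers
               if weak_classifier(img, xs, ys, T))
    return vote / total
```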
At each step, AdaBoost takes the best simple classifier it can find and adds it to its final set of classifiers while calculating its coefficient. Note that at each round of the algorithm, it tries to find the best classifier for the learning examples that have been handled worst so far. Obviously, choosing the best simple classifier at each step of AdaBoost cannot be done by testing all the possibilities. Therefore, a genetic-like algorithm is used [12], which starts with a set of random simple classifiers and iteratively improves them using the following mutations: changing the number and positions of the control points, the threshold, and the scale.

The genetic-like algorithm maintains a set of simple classifiers, initialised at random. During each step of the algorithm, a new "generation" of simple classifiers is produced by applying the 4 types of mutation to each of the simple classifiers. All 4 mutations are tested, and the one with the lowest error may replace the "parent" if its error is lower. In addition, some random simple classifiers are added at each step.
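A self-contained sketch of one generation of this search, under our assumptions (48x36 subimages; the concrete mutation choices are ours, and error_of is the weighted error on the current AdaBoost example weights):

```python
import random

IMG_SHAPE = (36, 48)  # rows, cols of the training subimages

def _pt():
    return (random.randrange(IMG_SHAPE[0]), random.randrange(IMG_SHAPE[1]))

def random_classifier(n_pts=3):
    """A random control-point classifier: (xs, ys, T, scale)."""
    return ([_pt() for _ in range(n_pts)], [_pt() for _ in range(n_pts)],
            random.uniform(1, 64), random.choice((0, 1, 2)))

def mutate(c):
    """One of the 4 mutation types: point count, point positions,
    threshold, or scale."""
    xs, ys, T, scale = c
    kind = random.choice(("count", "position", "threshold", "scale"))
    if kind == "count":
        xs = xs + [_pt()] if len(xs) < 8 else xs[1:]
    elif kind == "position":
        xs = xs[:-1] + [_pt()]
    elif kind == "threshold":
        T = max(1.0, T * random.uniform(0.5, 2.0))
    else:
        scale = random.choice((0, 1, 2))
    return (list(xs), list(ys), T, scale)

def evolve(population, error_of, n_random=5):
    """Each parent competes with 4 mutated children; the best survives,
    and a few fresh random classifiers are injected."""
    survivors = []
    for parent in population:
        child = min((mutate(parent) for _ in range(4)), key=error_of)
        survivors.append(child if error_of(child) < error_of(parent)
                         else parent)
    return survivors + [random_classifier() for _ in range(n_random)]
```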
B. Off-line learning process

The performance of the real-time target validation module depends on the off-line AdaBoost learning process. To ensure a good variety of data in each session, the images used for off-line training were taken at different times of day, on different highways and in urban scenes. The training set contains 48x36-pixel subimages of rear vehicle views and of non-vehicles, extracted semi-automatically using the SEVILLE* (SEmi-automatic VIsuaL LEarning) software developed in our laboratory by Y. Abramson, based on his research in collaboration with Y. Freund from Columbia University. This software offers a method for the fast collection of high-quality training data for visual object detection.

* For more information about SEVILLE, visit http://caor.ensmp.fr/~abramson/seville

Fig. 4 Examples of control points of a weak classifier.

We collected a total of 1500 vehicle subimages (positive samples) and 11000 non-vehicle subimages (negative samples), divided into a training set and a validation set in a (2/3, 1/3) proportion. We ran AdaBoost/GA with 2000 weak classifiers. As one can see in Fig. 5, the classification error on the training set decreases considerably once 150 classifiers are reached, while about 1000 classifiers are needed to bring the error below 10^-2 on the validation set.

With 1000 classifiers, the false detection rate is about 0.01; that is, only 1 non-vehicle subimage out of 100 is incorrectly accepted as a vehicle. The non-detection rate, however, is about 0.4, which seems high. This is due to the strictness of the classification algorithm, which is sensitive to the vehicle's position in the subimage: the vehicle must be centered and have the right proportions. This difficulty is overcome in the target validation phase.

Fig. 5 Classification error as a function of the number of weak classifiers.

Fig. 6 shows the classification score histograms over the validation set for different numbers of classifiers. One can notice that the higher the number of classifiers, the larger the gap between negatives and positives, so the two classes can be dissociated more easily. However, the computation time increases with the number of classifiers, which is why their number must be chosen carefully. We can see that 500 classifiers offer a good compromise. The logical score threshold to use is 0.5. The next section describes the validation process using this threshold.

Fig. 6 Classification score histograms for different numbers of classifiers

C. Target validation

Once the off-line training is done, we obtain a set of satisfactory classifiers that we use to assess the ROIs given by the hypothesis generation step. Each ROI is resized to match the training subimages, then its AdaBoost score is calculated. If the score is above the threshold, the ROI is validated. Otherwise, we search its vicinity for a subimage reaching the required score; if none is found, we conclude that it is a false detection.

To speed up this process, we take the temporal factor into consideration, as described in the diagram of Fig. 7. Ideally, the AdaBoost score of a detected vehicle is computed only once, as soon as it appears in the scene. Then, as long as it remains present, it is simply added to the final targets without re-validation.

Fig. 7 Target validation diagram
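A minimal sketch of this validation step, reusing the adaboost_score sketch above (the extent and step of the local search are our assumptions):

```python
import cv2

SCORE_THRESHOLD = 0.5
TRAIN_SIZE = (48, 36)   # (width, height) of the training subimages

def validate_roi(frame_gray, roi, classifiers, search=4):
    """Score the ROI; on failure, probe small shifts around it before
    declaring a false detection."""
    x1, y1, x2, y2 = roi
    h, w = y2 - y1, x2 - x1
    for dy in range(-search, search + 1, 2):
        for dx in range(-search, search + 1, 2):
            ys, xs = max(0, y1 + dy), max(0, x1 + dx)
            patch = frame_gray[ys:ys + h, xs:xs + w]
            if patch.shape != (h, w):
                continue
            sub = cv2.resize(patch, TRAIN_SIZE)
            if adaboost_score(sub, classifiers) > SCORE_THRESHOLD:
                return True
    return False
```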
D. Tracking

The goal of this step is, first, to eliminate non-detections that can occur when the hypothesis generation module fails and, second, to identify the same vehicle over time so as to evaluate its distance and relative speed if needed. We developed an approach similar to SVT (Support Vector Tracking), described in [14], but with the SVM replaced by AdaBoost. The operations for a given ROI are described in Fig. 8.

For each ROI of frame t−1, we check whether it has already been associated with a ROI of frame t according to similarity criterion S1 (see Fig. 9). If not, we check whether it is similar to a ROI of frame t using the S2 criterion (see Fig. 9). If neither holds, we search its surroundings for a subimage giving a good score, in case the target was lost by the hypothesis generation stage. If we find nothing and the target lies at a location of low vanishing probability in the scene (in the middle of the road, just in front of the host vehicle), it is simply reinserted at the place giving the best score.

Fig. 8 Tracking diagram

Fig. 9 Similarity criteria between 2 given ROIs
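The association logic could look as follows (a sketch; s1 and s2 stand for the similarity predicates of Fig. 9, whose exact definitions are carried by the diagram):

```python
def associate(prev_rois, curr_rois, s1, s2):
    """Match each ROI of frame t-1 to a ROI of frame t, trying the strict
    criterion S1 first and the looser S2 second; unmatched ROIs are
    returned for the local AdaBoost search described above."""
    matches, lost = {}, []
    for i, prev in enumerate(prev_rois):
        cand = next((c for c in curr_rois if s1(prev, c)), None)
        if cand is None:
            cand = next((c for c in curr_rois if s2(prev, c)), None)
        if cand is not None:
            matches[i] = cand
        else:
            lost.append(prev)   # re-searched around its last position
    return matches, lost
```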
IV. EXPERIMENTS AND RESULTS

The prototype vehicle we used for this application is equipped with a long-range radar, a 2D laser scanner, 4 digital color CCD cameras, a Trimble DGPS receiver, a Crossbow FOG inertial sensor and odometers. Vehicle information is transmitted via a CAN bus. We also use a Navtech GIS for map matching and geo-localization. All sensor information is synchronized using the RTMAPS† system, a real-time framework for prototyping multi-sensor automotive applications [15]. This system was developed in our laboratory and is currently installed in the prototype vehicle.

† RTM@ps is a product of Intempora Inc.

The video stream was acquired from the frontal camera, mounted near the rear-view mirror, with a 50° field of view and a dynamic range of 50 dB.

In order to evaluate the performance of our detection system, tests were conducted under both simulated and real-world conditions. Using RTMAPS, we recorded different scenarios including highway, rural and urban scenes at different times of day. Fig. 10 shows some representative results under various conditions. We also installed the system on our host vehicle and conducted real-time tests. At a speed of 100 km/h, we were able to achieve a frame rate of approximately 10 frames per second on a standard PC (Bi-Xeon 1 GHz), without specific software optimisations. The system was also tested in the context of the ARCOS project on different ACC scenarios and showed good results. It sometimes failed to detect vehicles situated beyond 60 m; this is to be expected, since the hypothesis generation is performed at low resolution.

There are no parameters to tune in our system. It is completely autonomous, which makes it suitable for a hardware implementation that could make it less time-consuming and suitable for a series production product.
Fig. 10 Vehicle detection examples in different situations: (a) a highway scene with low traffic, (b) a highway scene with high traffic, (c) an urban scene with low traffic, (d) an urban scene with low traffic, (e) bad lighting conditions, (f) a tunnel scene

V. CONCLUSIONS AND FUTURE WORK


We have presented a system that uses a single frontal camera for vehicle detection. Experimental results show that the system is capable of detecting vehicles, except very distant ones. It works under real-time conditions and achieves highly reliable target detection with a low false-positive rate in demanding situations such as complex urban environments.
For future work, we plan to explore in detail the
influence of camera characteristics on detection results to
find the ideal “camera” to use. We will also continue
increasing our training set to improve the AdaBoost
classification. Furthermore, we will focus on ACC
application needs such as precisely evaluating obstacle
distance, velocity and TTC (Time To Collision).
ACKNOWLEDGMENT
This work is sponsored by the ARCOS French research project, which aims at improving road safety and involves automotive manufacturers and ITS research laboratories in France. For further information, refer to www.arcos2004.com. The authors would like to thank Y.
Abramson for his useful comments and help on the
AdaBoost classification.

REFERENCES

[1] M. B. Van Leeuwen, "Vehicle detection with a mobile camera," Technical report, Computer Science Institute, University of Amsterdam, The Netherlands, October 2001.
[2] A. Giachetti, M. Campani, and V. Torre, "The use of optical flow for road navigation," 1998.
[3] R. Okada et al., "Obstacle detection using projective invariant and vanishing lines," Proceedings of the 9th ICCV, 2003.
[4] M. Betke, E. Haritaoglu, and L. Davis, "Multiple vehicle detection and tracking in hard real time," IEEE Intelligent Vehicles Symposium, pp. 351–356, 1996.
[5] J. Ferryman, A. Worrall, G. Sullivan, and K. Baker, "A generic deformable model for vehicle recognition," Proceedings of the British Machine Vision Conference, pp. 127–136, 1995.
[6] N. Matthews, P. An, D. Charnley, and C. Harris, "Vehicle detection and recognition in greyscale imagery," Control Engineering Practice, vol. 4, pp. 473–479, 1996.
[7] C. Goerick, N. Detlev, and M. Werner, "Artificial neural networks in real-time car detection and tracking applications," Pattern Recognition Letters, vol. 17, pp. 335–343, 1996.
[8] Z. Sun, R. Miller, G. Bebis, and D. DiMeo, "A real-time precrash vehicle detection system," IEEE Intelligent Vehicles Symposium, Dearborn, MI, USA, 2000.
[9] S. Avidan, "Subset selection for efficient SVM tracking," Computer Vision and Pattern Recognition, June 2003.
[10] Y. Freund and R. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," Journal of Computer and System Sciences, vol. 55, no. 1, pp. 119–139, 1997.
[11] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," Conf. CVPR, 2001.
[12] Y. Abramson and B. Steux, "Hardware-friendly detection of pedestrians from an on-board camera," IEEE Intelligent Vehicles Symposium, Parma, Italy, June 2004.
[13] Y. Abramson and B. Steux, "Illumination-independent pedestrian detection in real-time," unpublished, submitted to Conf. CVPR 2005.
[14] S. Avidan, "Support Vector Tracking," Computer Vision and Pattern Recognition, Dec. 2001.
[15] F. Nashashibi et al., "RT-MAPS: a framework for prototyping automotive multi-sensor applications," Mobile Robots, vol. 8, no. 2, pp. 520–531, March 2001.
