Вы находитесь на странице: 1из 6

2013 IEEE Intelligent Vehicles Symposium (IV) June 23-26, 2013, Gold Coast, Australia

Design and implementation of a high performance pedestrian detection


Antonio Prioletti1 , Paolo Grisleri2 , Mohan M. Trivedi3 and Alberto Broggi4

Abstract Research on pedestrian detection system still presents a lot of space for improvements, both on speed and detection accuracy. This paper presents a full implementation of a pedestrian detection system, using a part-based classication for the candidates identication and a feature based tracking for increasing the result robustness. The novelty of this approach relies on the use of part-based approach with a combination of Haar-cascade and HOG-SVM. Tests have been conducted using standard datasets showing results aligned with those of the other state-of-the-art systems available in literature. Real world tests also show high speed performance.

I. INTRODUCTION Pedestrian detection is a research eld where signicant improvements are still possible despite its importance in the safety systems arena. Due to the extreme variability of the targets in terms of poses, trajectories, dresses, and carried objects the recognition cannot succeeded in every weather and illumination condition. Vehicle ego motion also complicates things, especially at high speeds, where the benet of the tracking are reduced. Also cluttered backgrounds contribute to degrade the recognition due to higher probability of getting a false positive. High level functions such as warnings or emergency braking systems needs to be triggered as soon as dangerous situations occur. Therefore pedestrian detection shall occur in the shortest possible time. Pedestrian trajectory also shall be taken into account, this is typically done by the high level and tracking function, since a dangerous situation. Pedestrian detection systems certainly deserve a special place among the obstacle detectors, being able to detect pedestrians allows to activate additional protection systems for the vulnerable users outside the vehicle, such as additional airbags or warnings. This paper is laid out like this: an overview of related works will be provide in section II. The implemented system will be described in section III, with a detailed description of the algorithms stages in sections III-A, III-C, III-D and III-E. A thorough set of experiments to build the nal system follows in section IV, investigating the impact of parameters involved such as the braking time. Section V presents a nal evaluation of the performance, compared to the state-of-the-art of vision based detectors. II. RELATED WORKS Pedestrian safety embraces various techniques as described in [1]: from passive solution such as car design, to active
1 A. Prioletti is a post-graduate student at Vislab, University of Parma, Italy. E-mail: antonio.prioletti@studenti.unipr.it 2 P. Grisleri is a researcher at Vislab at University of Parma, Italy. 3 M. M. Trivedi is head of the CVRR lab at University of California, San Diego 4 A. Broggi is head of Vislab at University of Parma, Italy.

solutions based on pedestrian detection. Furthermore, several kind of sensors can be used, ranging from ordinary and infrared cameras to RADAR and LIDAR [2], [3]. A combination of T.O.F. sensor with a camera is used in [4], where a scenario-driven is used to search approach looking only for pedestrian in relevant areas. A detailed description on protection system can be found in [5][9]. A fast pedestrian detector is based on a Haar-cascade [10] allowing to delete in the rst stage the most false positive but, the strong dependence on the pedestrians appearance, makes the algorithm not very robust. Other approaches [11], [12] use the HOG-SVM provides a very robust system in terms of false positives but, with a performance degradation, in terms of detection rate. Combining them, it is possible to obtain a fast system without renouncing robustness. Many other features and learning algorithms can be also used, such as edgelets [13], variations of gradient maps, or simple intensity images. A new approach, gaining interest from the research community, is to decompose the pedestrians shape in multiple parts to increase the robustness of the system (mainly for tolerance to occlusions). Various works explain the bodypart approach, using various types of features and different kinds of environments: from interfering object detection used in [14] to Bayesian combination of edgelets part detectors by Nevatia et al. [13] or a deformable part model by Felzenszwalb et al. [15]. Several dataset for pedestrian detection system are publicly available. In addiction to the well-known MIT [10] and INRIA [11] dataset a much larger and comprehensive dataset as Daimler Detection Bechmark (DaimlerDB) [6] and Calthech [16] have been put forth. III. SYSTEM OVERVIEW A system including a novel part based pedestrian detection high performance approach with a combination of Haarcascade and HOG-SVM, together with a features-based tracking, will be described in the following of this paper. Considering a simple on board-camera already present in many new high-end cars, a monocular vision system has been implemented. The system has been congured by providing the possibility to vary the image size and change, consequently, the processing time; the parameters related to image size will be automatically adjusted to ensure the correct behavior of the algorithm. The system is built on a part-based, stage detection approach, which was rst put forth in [17] decomposing the system in a detection stage, verication stage (where it was introduce a part-based approach) and, a nal combined verication stage.

978-1-4673-2754-1/13/$31.00 2013 Crown

1398

Part verication stage

Full-body verication Detection stage

Tracking stage

Haar-cascade for detection

Lower-body verication

Combined verication

Features calculation

Output pedestrians Features matcher

Past features Upper-body verication

Fig. 1. The owchart of the part based pedestrian detection high performance system described in this paper. Output of detection stage and following stages are bounding boxes.

Since missing detections are more likely than false positives due to the high robustness of the part-based system, a feature based tracking was introduced with the aim of counteracting a possible missing detection in two subsequent frames and of increasing the number of true positives and eliminate false positives appearing in single frame. A owchart of the system is shown in Fig.1. The system has been developed on the latest version of the GOLD [18] software, a prototyping software platform, allowing its optimization in the laboratory and the following porting on a real hardware platform: BRAiVE [19]. The platform is equipped with 10 cameras, and one of these, looking forward, has been physically connected to the application. The camera has a rewire interface which is connected to an adapter in an industrial PC located in the trunk. The PC is equipped with a Intel(R) Core(TM) i7-2670QM CPU @ 2.20GHz, and 8GB of DDR2 RAM. A. Detection stage The weak classier used is Adaboost. The relationship used between between k and the weak classier is reported [20]. The structure of a cascade classier allows to speedup considerably the system ltering the most of pedestrians candidates in the rst stage. The variance of false positives is closely linked to the classiers stages, here denoted as k . The choice of the stages number is driven by the trade-off between the number of detection windows and, the quality of these windows. Noteworthy is attributable to the type of data sets used: using the old INRIA dataset, containing human in general and not specically pedestrians, the system results less robust than when it was used the more accurate DaimlerDB dataset. A detailed analysis of the impact of this factor will be illustrated in chapter IV. The cut-off performance criteria used for each cascade stage is correct detections versus false positives. B. Candidate ltering The bounding boxes of candidate pedestrians obtained in the detection stage (an example can be seen in Fig. 2) are ltered and then passed to the verication stage. Filtering is applied to eliminate pedestrians candidate whose dimensions do not respect the human aspect ratio, removing people

Fig. 2. Detection stage output. Several false positives are contained but these will be removed in the verication stage.

Fig. 3.

Example of part boundaries for the 2 and 3-part verication.

with size outside of a selected range [1.45 2.20]m. Using the Inverse Perspective mappings(IPM) technique [21], [22] it is possible to know the pedestrian height in the world knowing its height in image coordinates, using the at road assumption. C. Part verication stage As previously described, a part-based approach has been introduced in this phase, evaluating two different compositions of body parts (shown in Fig.3): full body, upper body and lower body; full body, head, torso and legs.

1399

Dense HOG descriptors, by integral images to speed up the system, are calculated for each part of the body and passed to the corresponding SVM. Two different types of SVM were tested, a linear SVM and a non-linear SVM. Each one have two variants: binary, providing only the classication (pedestrian or non-pedestrian) of the element and, regression, [23] that provides the estimated function value. The use of different datasets has also been analyzed at this stage. D. Combined verication stage A combination of the three values returned by the verication stage is need to classify the detection window; two different approaches have been implemented: a majority vote approach, where at least two out of three classier must label the window as pedestrian. a regression output classication approach, where the three oat values coming from the detection stage are used to train a new classier. Several types of classiers were tested: a linear SVM, a non-linear SVM, and a Bayesian classier [24]; in the results section the different performances of each one will be shown. The majority vote approach allows, unlike regression output classication approach, to detect pedestrians partially occluded but, with an higher false positive rate. Occluded pedestrians are only detected using tracking. This decision has been taken in order to keep a low false detection rate. Training the last SVM including occluded pedestrians in the training dataset could provide similar results but decreasing the number of false positives. E. Tracking stage To counteract the high selectivity of part-based approach, a feature-based tracking system (Fig.1) has been introduced exploiting the features translation and light invariance. A set of features described in [25] based on multiple local convolution, key point and descriptor, are extracted from two different images: horizontal and vertical gradients. One of the features-based tracking system downsides is the strict dependence on the vehicle ego-motion. High vehicle speed, in conjunction with a low frame-rate, means large differences between the same pedestrian framed in consecutive frames and, therefore, higher complexity for the matching phase. The matching is exploited using the functions offered by the feature library. To cope with this problem a higher frame rate can be used, reducing the variability among consecutive frames. The tracker considers a pedestrian candidate as true pedestrian and, inserts him in the tracking system, if it is detected for at least 250 milliseconds by the SVM. The pedestrian will be matched with new pedestrian candidates in the following frames using the features extracted from the images and then updating its position. In the scenario of a detection missed by the SVM, a feature matching with the entire image will be done introducing a ghost pedestrian when a match is found. It will be updated up to 0.5 s. When the position is not matched for 0.5s the target is removed from the tracking system. A further drawback could be the impact of the background

pixels in the matching: if their incidence is greater than that of the foreground pixels, the update of ghost pedestrians would be wrong. However, we can omit this scenario since in the real-world experiments the incidence was very low and decreased with the movement of pedestrians. When running on hardware the system runs at 15Hz. IV. EXPERIMENTAL ANALYSIS A thorough evaluation of the different setups parameters is done in this section, searching the best combination for the nal system. The training set has been choosed selecting 24000 images from the Dimler database Training Set, which comprehensively counts 26000 images; the remaining 2000 images of this database has been used for the parameters optimization. The nal test has been performed using the Dimler database Test Set, which is not included in the Training Set. The experiments are laid out like this: 1) the best training dataset is determined and, will be searched the optimal value of k in the detection stage; 2) the signicance of each body part and their combination is evaluated comparing it, also, with a full body system of Geismann at al. [12]; the comparisono between this paper, which corresponds to our one based on one part, and our system is shown in gure 5. 3) the various approaches described in III-C are analyzed; 4) nally, the system speed is tested and the time is broken down into individual stages. To compare each result with the ground truth will be used the PASCAL VOC: a detection is consider correct if the overlap between the area of ground truth bounding box and the area of the bounding box obtained is more than 50%. The results are shown using an ROC curves, created by plotting the fraction of true positives out of the positives (TPR = true positive rate) vs. the fraction of false positives out of the negatives (FPR = false positive rate), at various threshold settings. A. Detection stage The best training dataset and the best choice of k are determined. Several dataset for the Haar-cascade training has been tested: 2400 DaimlerDB images, 2400 INRIA images, 6000 images composed of 2400 INRIA and 3600 DaimlerDB imaFges and, nally, 10000 DaimlerDB images. Fig.4 shows the large difference between the system trained with the INRIA dataset and the DaimlerDB dataset, crediting the excessive generality of INRIA at the expense of specicity of DaimlerDB. The choice of k is driven by the tradeoff between false positives and true positives: the most of false positive will be removed in the part verication stage whereas here is fundamental do not remove any true positive. Fig.4 shows, also, how a smaller value of k means an higher detection rate but, at some point, the bounding boxes quality provided by the detection stages degrade to a level where the verication stage only verify a few candidates. This denes a lower limit

1400

Fig. 4. Comparison of different training sets for the detection stage. As described in IV-A, 13 is the best number of stages. Data have been generated using the validation dataset.

Fig. 6. Detection performance with single parts, showing the reliability of each part type. The graph conrms the assumptions regarding the difculty to detect the head.

Fig. 5. Detection performance with varying numbers of parts. Note that two parts out perform, while three parts are just as bad as one since the low quality of the images and the high difculty in identifying small areas such as the head.

Fig. 7. Comparison of different methods for combined part-verication. With a low false positive rate, the radial approach performs better.

C. Combined verication stage on the number of stages in the Haar-cascade, after which the performance degrades. B. Part verication stage An SVM has been used in this stage. One of the most widely discussed topic about the part-based approach regards in how many parts divide the shape of pedestrian. A performance analysis with 1, 2, and 3-part verication is shown in Fig.5, where it is possible to see how 2 parts performs better than 1 and 3 part. In order to justify these results, the relevance of each part is explained in Fig.6: due to the low images resolution, it is very difcult to detect the head and, this causes the consequent bad result of the three parts approach. Furthermore, observing the low detectability of the legs and the high detectability of the upper body, the combination of these two parts is the best possible combination to manage the trade-off between true and false positives. In order to obtain the classication of the detection window is need to merge the three results from the verication stage. Four options have been investigated, as shown in Fig.7: linear SVM, radial SVM, bayesian classication and a majority vote approach. The bayesian approach [24] seems to provide the best results, but it provides also an high false positive rate. These results are justiable by the linear separation on set of non-linear data and, the resulting misclassication of many pedestrians. Assuming an appropriate trade-off between true positives and false positives, the radial approach can be considered the best, also justied by the non linear data returned from the verication stage. D. Speed evaluation An evaluation of system speed performance on a given hardware is shown in Fig.8. The largest contribution in processing time is the Haar-features calculation; however, the variance in computational time is related to the classication of the body parts. Incrementing k , the detection stage will

1401

TABLE I F INAL SYSTEM PERFORMANCE Detection rate Basic system without tracking With resized image (320x240) 0.673 0.55 False positive rate 0.046 0.02

Frame rate Our system with tracking Fastest system in [5] 15.5 FPS 30 FPS 2.6 FPS

Image size 640x480 320x240 640x480

Fig. 8. Speed versus detection rate and false positive rate. The time has been measured for each stage, denoted as k. We see the reduction of false positives and the increase of true positive rising the number of stages.

TABLE II C OMPARISON OF OUR PART BASED PEDESTRIAN DETECTION HIGH PERFORMANCE SYSTEM WITH STATE - OF - THE - ART PEDESTRIAN DETECTION SYSTEMS AT 0.1 FPPF ON THE DAIMLER DB DATASET, ET. AL . [5]. T HE PAPER CONTAINS AN COURTESY OF D OLL AR
EXPLANATION OF EACH OF THE SYSTEMS . ALSO LISTED , ALTHOUGH DETECTION RATES FOR THE DATASET ARE UNKNOWN .

T HE FASTEST SYSTEM IS DAIMLER DB

A BBREVIATIONS ARE THE SAME DESCRIBED IN [5] . Average detection rate 0.72 0.71 0.62 0.61 0.51 0.45 0.43 0.42 0.40 0.06 0.05 N/A

Algorithm Our Algorithm MultiFtr+Motion LatSvm-V2 MultiFtr+CSS HogLbp HikSvm MultiFtr LatSvm-V1 HOG Shapelet VJ FPDW

Part-based yes no yes no no no no yes no no no no

Speed 15.5 FPS 0.004 0.164 0.005 0.014 0.036 0.017 0.098 0.054 0.010 0.089 FPS FPS FPS FPS FPS FPS FPS FPS FPS FPS

Fig. 9. Final test detection. Performance evaluation on the last part of Daimler-DB images.

2.670 FPS

provide more candidate pedestrians having a large impact on the system speed. A good trade-off between true positives, false positives and speed is given by the 13 stages, obtained a processing time of 64.363 ms, corresponding to about 15.5 frame per seconds at a resolution of 640x480. V. FINAL PERFORMANCE EVALUATION Following the results explained in section IV, the nal system was built using the Daimler-DB dataset with 10000 images to train the classier, choosing 13 as number for the cascade stages, a 2-part approach and a radial SVM for the nal stage. Perfomance on the DaimlerDB test set and our dataset are shown in Table I. A. Final test without tracking Performance on Daimler-DB dataset are shown in Fig.9: an ROC curves is plotted reporting several results varying the number of cascade stages (k ). Considering the 13th stages as the best, a detection rate of about 0.673 with a false positive rate of 0.046 is obtained. To provide more consistency to the results obtained, a comparison of our algorithm with the

main pedestrian detectors of state of the arts as described in [5] is shown in table II. As it is possible to see, our system performs better of each other algorithm and, furthermore, with a huge speedup. The reported comparison does not include some more recent works such as [26][28], however they perform on different architecture and on different test set. Moreover, where it was possible made some comparison, it has been seen how our performance in terms of speed and detection rate are similar or better. B. Tracking improvements As described in III-E, to appreciate the tracking approach benets sequences are required with an high frame rate. In this case, DaimlerDB has a low frame rate (10 Hz), too small to ensure a good matching between features. In order to test the improvements achievable with the tracking system, a new test set has been rec (two sequences of 5182 and 11490 frames respectively) captured from the real prototype described in section III.

1402

An increase of 27% and 22% of true positives on the two dataset was obtained with a reduction of 5% and 10% of false positives with respect to the same system without tracking. Tracking system provides more robust results that allows to handle missed detections of pedestrians in intermediate frames and avoid possible single frame misclassication. VI. CONCLUDING REMARKS AND FUTURE WORKS The implementation of a novel high-performance pedestrian detector on a real prototype has been described in this work. A combination of Haar-cascade and HOG-SVM has been used introducing a novel combination of HOG features with a part-based approach; a features-based tracking was also developed to counteract the close bonds of a partbased approach. Exploiting the CPU multicore features, it was obtained an high-speed system working at 15.5 Hz that outperforms all the others state-of-the-arts classiers with a detection rate of 0.72 considering a false positive rate of 0.1 . Despite these achievements, signicant improvements are still possible in order to enhance the pedestrian detector. A partial use of processors (about 70 %) suggests the presence of parts of OpenCV code not yet parallelized: a low-level reimplementation of the system. An high-level processing could be also included, using the Kalman lter to compute the future trajectory of the pedestrian and avoiding dangerous situations. Further improvements are achievable considering the vehicle ego-motion in the tracking phase and the pitch of the vehicle in the candidate ltering stage. VII. ACKNOWLEDGMENTS The authors would like to thank our colleagues in the LISA-CVRR lab and the important contributions of Andreas Mogelmose in developing the initial version of the part based algorithm. The authors would also like to acknowledgment Cassa di Risparmio di Parma e Piacenza for funding the test platform used for this work. The work described in this paper has been developed in the framework of the Open intelligent systems for Future Autonomous Vehicles (OFAV) Project funded by the European Research Council (ERC) within an Advanced Investigators Grant. R EFERENCES
[1] T. Gandhi and M. Trivedi, Pedestrian protection systems: Issues, survey, and challenges, Intelligent Transportation Systems, IEEE Transactions on, vol. 8, no. 3, pp. 413430, 2007. [2] K. Fuerstenberg, Pedestrian protection using laserscanners, in Intelligent Transportation Systems, 2005. Proceedings. 2005 IEEE, sept. 2005, pp. 437442. [3] A. Yoshizawa, M. Yamamoto, and J. Ogata, Pedestrian detection with convolutional nerual networks, Intelligent Vehicles Symposium, Proceedings IEEE, pp. 224229, 2005. [4] A. Broggi, P. Cerri, S. Ghidoni, P. Grisleri, and H. Jung, A new approach to urban pedestrian detection for automatic braking, Intelligent Transportation Systems, IEEE Transactions on, vol. 10, no. 4, pp. 594605, 2009. [5] P. Dollar, C. Wojek, B. Schiele, and P. Perona, Pedestrian Detection: An Evaluation of the State of the Art, Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 34, no. 4, pp. 743761, april 2012.

[6] M. Enzweiler and D. Gavrila, Monocular pedestrian detection: Survey and experiments, Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 31, no. 12, pp. 21792195, 2009. [7] T. Gandhi and M. Trivedi, Image based estimation of pedestrian orientation for improving path prediction, IEEE Intelligent Vehicles Symposium, pp. 506511, 2008. [8] S. Krotosky and M. Trivedi, On color-, infrared-, and multimodalstereo approaches to pedestrian detection. IEEE Transactions on Intelligent Transportation Systems, vol. 8, pp. 619629, 2007. [9] T. Gandhi and M. Trivedi, Parametric ego-motion estimation for vehicle surround analysis using an omnidirectional camera, Machine Vision and Applications, vol. 16, pp. 8595, 2005. [10] C. Papageorgiou and T. Poggio, A trainable system for object detection, International Journal of Computer Vision, vol. 38, pp. 15 33, 2000. [11] N. Dalal and B. Triggs, Histograms of oriented gradients for human detection, in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, vol. 1. Ieee, 2005, pp. 886893. [12] P. Geismann and G. Schneider, A two-staged approach to visionbased pedestrian recognition using Haar and HOG features, in Intelligent Vehicles Symposium, 2008 IEEE. IEEE, 2008, pp. 554559. [13] B. Wu and R. Nevatia, Detection and tracking of multiple, partially occluded humans by bayesian combination of edgelet based part detectors, International Journal of Computer Vision, vol. 75, no. 2, pp. 247266, 2007. [14] X. Mao, F. Qi, and W. Zhu, Multiple-part based Pedestrian Detection using Interfering Object Detection, in Natural Computation, 2007. ICNC 2007. Third International Conference on, vol. 2. IEEE, 2007, pp. 165169. [15] P. Felzenszwalb, D. McAllester, and D. Ramanan, A discriminatively trained, multiscale, deformable part model, Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, pp. 1 8, june 2008. [16] P. Dollar, C. Wojek, B. Schiele, and P. Perona, Pedestrian detection: A benchmark, in Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, june 2009, pp. 304311. [17] A. Mgelmose, A. Prioletti, M. M. Trivedi, A. Broggi, and T. B. Moeslund, A Two-stage Part-Based Pedestrian Detection System Using Monocular Vision, 15th IEEE International Conference on Intelligent Transportation Systems, pp. 7377, sept 2012. [18] M. Bertozzi, L. Bombini, A. Broggi, P. Cerri, P. Grisleri, and P. Zani, GOLD: A framework for developing intelligent-vehicle vision applications, IEEE Intelligent Systems, vol. 23, no. 1, pp. 6971, Jan.Feb. 2008. [19] P. Grisleri and I. Fedriga, The BRAiVE platform, in Procs. 7th IFAC Symposium on Intelligent Autonomous Vehicles, Lecce, Italy, Sept. 2010. [20] P. Viola and M. Jones, Robust real-time object detection, International Journal of Computer Vision, vol. 57, no. 2, pp. 137154, 2001. [21] H. A. Mallot, H. H. B ulthoff, J. J. Little, and S. Bohrer, Inverse perspective mapping simplies optical ow computation and obstacle detection. Biol Cybern, vol. 64, no. 3, pp. 177 85, 1991. [Online]. Available: http://www.biomedsearch.com/nih/ Inverse-perspective-mapping-simplies-optical/2004128.html [22] M. Bertozzi, A. Broggi, A. Fascioli, and R. Fascioli, Stereo inverse perspective mapping: Theory and applications, Image and Vision Computing Journal, vol. 8, pp. 585590, 1998. [23] C. Chang and C. Lin, LIBSVM: a library for support vector machines, ACM Transactions on Intelligent Systems and Technology, vol. 2, pp. 127, 2011. [24] K. Fukunaga, Introduction to Statistical Pattern Recognition. New York: Academic Press, 1990. [25] A. Geiger, J. Ziegler, and C. Stiller, Stereoscan: Dense 3d reconstruction in real-time, Intelligent Vehicles Symposium (IV), IEEE, pp. 963968, 2011. [26] D. Mitzel, P. Sudowe, and B. Leibe, Real-Time Multi-Person Tracking with Time-Constrained Detection, British Machine Vision Conference, 2011. [27] R. Benenson, M. Mathias, R. Timofte, and L. Van Gool, Pedestrian detection at 100 frames per second, Computer Vision and Pattern Recognition, pp. 29032910, jun 2012. [28] R. Doll ar, P. Appel and W. Kienzle, Crosstalk Cascades for FrameRate Pedestrian Detection , European Conference on Computer Vision, 2012.

1403

Вам также может понравиться