
Deep learning for object detection
Wenjing Chen

*Created in March 2017; may be outdated by the time you read this.

Slide credit: CS231n


Outline
1. Introduction
2. Common methods
   Region proposal based methods: R-CNN, Fast R-CNN, Faster R-CNN, R-FCN, Mask R-CNN
   Single shot based methods: YOLO, YOLOv2, SSD
3. Comparison
Introduction

Classification: one image -> one label
Detection: one image -> labels + bounding boxes
Region based methods - R-CNN

Girshick, Ross, et al. "Rich feature hierarchies for accurate object detection and semantic segmentation." Proceedings of the IEEE conference on computer
vision and pattern recognition. 2014.
Region based methods - Fast R-CNN

Girshick, Ross. "Fast r-cnn." Proceedings of the IEEE International Conference on Computer Vision. 2015.
Region based methods - Faster R-CNN

Ren, Shaoqing, et al. "Faster r-cnn: Towards real-time object detection with region proposal networks." Advances in neural information processing systems.
2015.
Region based methods - R-FCN

[Figure label: average pooling]

Li, Yi, Kaiming He, and Jian Sun. "R-fcn: Object detection via region-based fully convolutional networks." Advances in Neural Information Processing Systems.
2016.
Region based methods - Mask R-CNN
Object instance segmentation:
- Extends Faster R-CNN by adding a branch for predicting segmentation masks on each RoI
- Runs at 5 fps
- Without tricks, outperforms all existing single-model entries on every task in all three tracks of the COCO suite of challenges: instance segmentation, bounding-box object detection, and person keypoint detection

He, Kaiming, et al. "Mask R-CNN." arXiv preprint arXiv:1703.06870 (2017).


Single shot based method - YOLO
1. Resize the input image to 448*448.

2. Run a single convolutional network.
For each cell of an S*S grid, it predicts B bounding boxes
(4 coordinates + confidence each) and C class probabilities,
encoded as an S*S*(B*5+C) tensor.

3. Apply non-maximum suppression.
This filters the S*S*B bounding boxes per image, with C class
probabilities for each box.
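
The three steps above can be sketched in plain Python. This is a minimal illustration, not the authors' code: S = 7, B = 2, C = 20 are the values the YOLO paper uses for PASCAL VOC (they are not stated on this slide), and the helper names `yolo_output_shape`, `iou`, and `nms`, the corner-format boxes, and the 0.5 IoU threshold are assumptions made for the example.

```python
def yolo_output_shape(S, B, C):
    """Shape of the prediction tensor: S*S cells, each holding B boxes of
    (x, y, w, h, confidence) plus C class probabilities."""
    return (S, S, B * 5 + C)


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression.

    Returns the indices of the kept boxes, highest score first. A box is
    dropped if it overlaps an already kept box by more than iou_threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


S, B, C = 7, 2, 20  # assumed: PASCAL VOC settings from the YOLO paper
shape = yolo_output_shape(S, B, C)
num_boxes = S * S * B  # candidate boxes per image before suppression

# Toy example: two heavily overlapping boxes and one separate box.
boxes = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 200, 260, 260)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)

print("output tensor shape:", shape)
print("boxes per image:", num_boxes)
print("kept after NMS:", kept)
```

In practice, suppression is usually run separately for each class, so that overlapping objects of different classes do not suppress each other; the sketch ignores classes for brevity.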

Redmon, Joseph, et al. "You only look once: Unified, real-time object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
2016.
Single shot based method - YOLOv2
YOLO problems:
1. Significant number of localization errors.
2. Low recall compared to region proposal based methods.
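
Both problems are usually quantified by comparing predictions with ground-truth boxes via intersection over union (IoU). The sketch below is only an illustration: the function names and toy boxes are hypothetical, the matching is deliberately simplified, and the 0.5 IoU threshold is the PASCAL VOC convention rather than something stated on this slide.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall(ground_truth, detections, iou_threshold=0.5):
    """Fraction of ground-truth boxes covered by at least one detection
    with IoU >= iou_threshold.

    Simplified: ignores class labels and does not enforce one-to-one
    matching between detections and ground-truth boxes.
    """
    if not ground_truth:
        return 0.0
    hits = sum(
        any(iou(gt, det) >= iou_threshold for det in detections)
        for gt in ground_truth
    )
    return hits / len(ground_truth)


ground_truth = [(0, 0, 100, 100), (200, 200, 300, 300)]
detections = [
    (5, 5, 105, 105),      # well localized: counts as a hit
    (240, 240, 340, 340),  # right object, poorly localized: IoU below 0.5
]
r = recall(ground_truth, detections)
print("recall:", r)
```

The second detection shows how the two problems interact: a box that finds the object but localizes it poorly falls below the IoU threshold, so the object counts as missed and recall drops.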

Improvements:

Redmon, Joseph, and Ali Farhadi. "YOLO9000: Better, Faster, Stronger." arXiv preprint arXiv:1612.08242 (2016).
Single shot based method - SSD

Improvements:
1. Use small convolutional filters to predict object categories and bounding box offsets.
2. Use multiple layers for prediction at different scales.
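
To make the two points concrete, the sketch below counts the predictions such a design produces. The feature-map sizes and boxes-per-location values are those of the SSD300 configuration in the SSD paper, not something given on this slide, and the function name `prediction_summary` is hypothetical. Each prediction layer is assumed to be a small convolution whose output has k * (num_classes + 4) channels for k default boxes per location.

```python
# Assumed SSD300 configuration: (feature map side, default boxes per location)
SSD300_LAYERS = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]


def prediction_summary(layers, num_classes):
    """For each scale, return (side, predictor output channels, box count)."""
    summary = []
    for side, k in layers:
        channels = k * (num_classes + 4)  # class scores + 4 box offsets
        summary.append((side, channels, side * side * k))
    return summary


# 21 classes = 20 PASCAL VOC categories + background
summary = prediction_summary(SSD300_LAYERS, num_classes=21)
total_boxes = sum(n for _, _, n in summary)

for side, channels, n in summary:
    print(f"{side:>2}x{side:<2} map: {channels:>3} output channels, {n:>4} boxes")
print("total default boxes:", total_boxes)
```

Large feature maps contribute many small boxes and small feature maps contribute a few large ones, which is how predicting from multiple layers covers different object scales.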

Liu, Wei, et al. "SSD: Single shot multibox detector." European Conference on Computer Vision. Springer International Publishing, 2016.
Comparison
R-FCN: 83.6% mAP, 5.8 fps

[Figures: results from the YOLOv2 paper and from the SSD paper]


PASCAL VOC 2012

http://host.robots.ox.ac.uk:8080/leaderboard/displaylb.php?challengeid=11&compid=4
Comparison

Speed
single shot > region based
Accuracy
region based > single shot
Complexity
YOLO < SSD ≤ Faster R-CNN < R-FCN < YOLOv2(?)
