Object Detection Using Faster R-CNN Deep Learning

This example shows how to train an object detector using a deep learning technique named Faster R-CNN
(Regions with Convolutional Neural Networks).

Overview
Deep learning is a powerful machine learning technique that automatically learns the image features required for
training robust object detectors. There are several techniques for object detection using deep learning, such as
Faster R-CNN and you only look once (YOLO) v2. This example trains a Faster R-CNN detector using the
trainFasterRCNNObjectDetector function.

Learn more about object detection using deep learning.

Note: This example requires Computer Vision Toolbox™ and Deep Learning Toolbox™. Parallel Computing
Toolbox™ is recommended to train the detector using a CUDA-capable NVIDIA™ GPU with compute capability
3.0 or higher.

Load Dataset
This example uses a small labeled data set with a single object class (DOYMA). Each image contains one or two
labeled instances of the object. A small data set is useful for exploring the Faster R-CNN training procedure, but in
practice, more labeled images are needed to train a robust detector.

clear; clc; close all;


% Load the DOYMA data set ground truth.
data = load('DOyMADatasetGroundTruth.mat');
DOYMADataset = data.DOyMADatagTruth;

The ground truth data is stored in a table. The first column contains the path to the image files. The remaining
columns contain the ROI labels for the DOYMA class.

% Display first few rows of the data set.


DOYMADataset(1:4,:)

ans = 4×2 table


imageFileName DOYMA

1 'C:\Users\IPN\... [960,58...
2 'C:\Users\IPN\... [602,55...
3 'C:\Users\IPN\... [473,55...
4 'C:\Users\IPN\... [563,55...

Display one of the images from the data set to understand the type of images it contains.

% Optionally prepend the full path to the local data folder.
% DOYMADataset.imageFileName = fullfile(pwd, DOYMADataset.imageFileName);

% Read one of the images.

I = imread(DOYMADataset.imageFileName{10});

% Insert the ROI labels.


I = insertShape(I, 'Rectangle', DOYMADataset.DOYMA{10});

% Resize and display image.


I = imresize(I,3);
figure
imshow(I)

Split the data set into a training set for training the detector, and a test set for evaluating the detector. Select
60% of the data for training. Use the rest for evaluation.

% Set random seed to ensure example training reproducibility.


rng(0);

% Randomly split data into a training and test set.


shuffledIdx = randperm(height(DOYMADataset));
idx = floor(0.6 * height(DOYMADataset));
trainingData = DOYMADataset(shuffledIdx(1:idx),:);
testData = DOYMADataset(shuffledIdx(idx+1:end),:);

Configure Training Options

trainFasterRCNNObjectDetector trains the detector in four steps. The first two steps train the region
proposal and detection networks used in Faster R-CNN. The final two steps combine the networks from the
first two steps such that a single network is created for detection [1]. Specify the network training options for all
steps using trainingOptions.

% Training options, used for all four training steps.


options = trainingOptions('sgdm', ...
'MaxEpochs', 5, ...
'MiniBatchSize', 1, ...
'InitialLearnRate', 1e-3, ...
'CheckpointPath', tempdir);

The 'MiniBatchSize' property is set to 1 because the data set contains images of different sizes. This
prevents them from being batched together for processing. If all the training images are the same size, choose a
MiniBatchSize greater than 1 to reduce training time.
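If you want a mini-batch size greater than 1, one option is to resize the images to a common size before training
and scale the box coordinates to match. The following is a minimal sketch of that preprocessing, assuming a
hypothetical 224-by-224 target size and a hypothetical 'resized' output folder; adapt both to your data.

% Sketch: resize training images to a common (hypothetical) size so
% that 'MiniBatchSize' > 1 becomes possible. Boxes are [x y w h].
targetSize = [224 224];
resizedDir = fullfile(pwd, 'resized');   % hypothetical output folder
if ~exist(resizedDir, 'dir'), mkdir(resizedDir); end

for i = 1:height(trainingData)
    I = imread(trainingData.imageFileName{i});
    [h, w, ~] = size(I);
    J = imresize(I, targetSize);

    % Scale box x/width by the column ratio and y/height by the row ratio.
    boxes = trainingData.DOYMA{i};
    boxes(:, [1 3]) = boxes(:, [1 3]) * (targetSize(2) / w);
    boxes(:, [2 4]) = boxes(:, [2 4]) * (targetSize(1) / h);
    trainingData.DOYMA{i} = boxes;

    % Write the resized copy and point the table at it.
    [~, name, ext] = fileparts(trainingData.imageFileName{i});
    trainingData.imageFileName{i} = fullfile(resizedDir, [name ext]);
    imwrite(J, trainingData.imageFileName{i});
end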

The 'CheckpointPath' property is set to a temporary location. This name-value pair enables saving
partially trained detectors during the training process. If training is interrupted, such as by a power outage or
system failure, you can resume training from the saved checkpoint.
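For example, a sketch like the one below can resume an interrupted run. The file pattern and the assumption
that each checkpoint MAT-file stores a detector variable named detector follow the documented checkpoint
behavior, but verify the actual file names produced by your own run in tempdir.

% Sketch: resume training from the most recent checkpoint in tempdir.
% The '*checkpoint*.mat' pattern and the 'detector' variable name are
% assumptions; inspect the files saved during your run.
checkpointFiles = dir(fullfile(tempdir, '*checkpoint*.mat'));
[~, newest] = max([checkpointFiles.datenum]);
data = load(fullfile(tempdir, checkpointFiles(newest).name));

% Pass the checkpoint in place of the network name to continue training.
[detector, info] = trainFasterRCNNObjectDetector(trainingData, ...
    data.detector, options);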

Train Faster R-CNN


The Faster R-CNN object detection network is composed of a feature extraction network followed by two sub-
networks. The feature extraction network is typically a pretrained CNN such as ResNet-50 or Inception v3. The
first sub-network following the feature extraction network is a region proposal network (RPN) trained to generate
object proposals (object or background). The second sub-network is trained to predict the actual class of each
proposal (car or person).

This example uses a pretrained AlexNet for feature extraction, as specified in the training call below. Other
pretrained networks such as ResNet-50, MobileNet v2, or ResNet-18 can also be used depending on application
requirements. The trainFasterRCNNObjectDetector function automatically adds the sub-networks required for
object detection. You may also create a custom Faster R-CNN network.
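For instance, if your Computer Vision Toolbox release provides the fasterRCNNLayers function, a custom
network can be assembled as in the sketch below; the input size and anchor boxes are illustrative guesses,
not tuned values.

% Sketch: assemble a custom Faster R-CNN layer graph (requires a
% release that provides fasterRCNNLayers). Anchor boxes are guesses;
% in practice, estimate them from the training boxes.
inputSize   = [224 224 3];
numClasses  = 1;                         % one class: DOYMA
anchorBoxes = [32 32; 64 64; 128 96];    % [height width] per anchor
lgraph = fasterRCNNLayers(inputSize, numClasses, anchorBoxes, 'resnet50');

% The layer graph can then replace the network name when calling
% trainFasterRCNNObjectDetector.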

Train the Faster R-CNN object detector. Training takes several minutes, depending on your hardware.

% Train Faster R-CNN detector.
% * Use 'alexnet' as the feature extraction network.
% * Adjust the NegativeOverlapRange and PositiveOverlapRange to ensure
%   training samples tightly overlap with ground truth.
[detector, info] = trainFasterRCNNObjectDetector(trainingData, 'alexnet', options, ...
    'NegativeOverlapRange', [0 0.3], ...
    'PositiveOverlapRange', [0.6 1]);

*************************************************************************
Training a Faster R-CNN Object Detector for the following object classes:

* DOYMA

Step 1 of 4: Training a Region Proposal Network (RPN).

Warning: Invalid bounding boxes from 1 out of 252 training images were removed. The following rows in
trainingData have invalid bounding box data:

Invalid Rows
____________

52

Bounding boxes must be fully contained within their associated image and must have positive width and height.
Training on single GPU.
|=======================================================================================================|
| Epoch | Iteration | Time Elapsed | Mini-batch | Mini-batch | Mini-batch | Base Learning |
| | | (hh:mm:ss) | Loss | Accuracy | RMSE | Rate |
|=======================================================================================================|
| 1 | 1 | 00:00:02 | 1.3094 | 67.19% | 0.86 | 0.0010 |
| 1 | 50 | 00:00:13 | 0.5515 | 91.41% | 0.48 | 0.0010 |
| 1 | 100 | 00:00:24 | 0.7343 | 89.76% | 0.56 | 0.0010 |
| 1 | 150 | 00:00:34 | 0.8839 | 54.69% | 0.66 | 0.0010 |
| 1 | 200 | 00:00:45 | 0.8803 | 72.44% | 0.68 | 0.0010 |
| 1 | 250 | 00:00:55 | 0.6347 | 86.72% | 0.55 | 0.0010 |
| 2 | 300 | 00:01:14 | 0.7146 | 85.83% | 0.77 | 0.0010 |
| 2 | 350 | 00:01:25 | 1.5682 | 61.42% | 1.01 | 0.0010 |
| 2 | 400 | 00:01:36 | 0.6590 | 83.59% | 1.16 | 0.0010 |
| 2 | 450 | 00:01:46 | 0.7015 | 79.69% | 1.00 | 0.0010 |
| 2 | 500 | 00:01:57 | 0.8008 | 86.72% | 0.77 | 0.0010 |
| 3 | 550 | 00:02:15 | 0.3830 | 96.09% | 0.53 | 0.0010 |
| 3 | 600 | 00:02:26 | 1.0613 | 73.23% | 1.27 | 0.0010 |
| 3 | 650 | 00:02:36 | 0.5723 | 82.03% | 0.54 | 0.0010 |
| 3 | 700 | 00:02:47 | 0.5809 | 89.06% | 0.68 | 0.0010 |
| 3 | 750 | 00:02:58 | 0.7283 | 92.19% | 0.74 | 0.0010 |
| 4 | 800 | 00:03:16 | 0.3657 | 99.22% | 0.58 | 0.0010 |
| 4 | 850 | 00:03:27 | 0.3398 | 90.63% | 0.70 | 0.0010 |
| 4 | 900 | 00:03:38 | 0.3847 | 90.63% | 0.39 | 0.0010 |
| 4 | 950 | 00:03:49 | 0.4135 | 91.34% | 0.52 | 0.0010 |
| 4 | 1000 | 00:04:00 | 0.5336 | 90.48% | 0.52 | 0.0010 |
| 5 | 1050 | 00:04:18 | 0.4953 | 71.65% | 0.49 | 0.0010 |
| 5 | 1100 | 00:04:29 | 1.3643 | 92.19% | 1.20 | 0.0010 |
| 5 | 1150 | 00:04:40 | 0.4657 | 94.53% | 0.53 | 0.0010 |
| 5 | 1200 | 00:04:51 | 0.3389 | 93.75% | 0.39 | 0.0010 |
| 5 | 1250 | 00:05:02 | 0.6125 | 89.76% | 0.66 | 0.0010 |
| 5 | 1255 | 00:05:03 | 0.6255 | 73.44% | 0.65 | 0.0010 |
|=======================================================================================================|

Step 2 of 4: Training a Fast R-CNN Network using the RPN from step 1.
--> Extracting region proposals from 251 training images...done.

Training on single GPU.


|=======================================================================================================|
| Epoch | Iteration | Time Elapsed | Mini-batch | Mini-batch | Mini-batch | Base Learning |
| | | (hh:mm:ss) | Loss | Accuracy | RMSE | Rate |
|=======================================================================================================|
| 1 | 1 | 00:00:00 | 0.6188 | 75.78% | 0.92 | 0.0010 |
| 1 | 50 | 00:00:12 | 0.1467 | 97.66% | 0.89 | 0.0010 |
| 1 | 100 | 00:00:25 | 0.1958 | 96.09% | 0.97 | 0.0010 |
| 1 | 150 | 00:00:37 | 0.1143 | 97.66% | 0.86 | 0.0010 |
| 1 | 200 | 00:00:49 | 0.0761 | 99.22% | 0.73 | 0.0010 |
| 2 | 250 | 00:01:08 | 0.0700 | 100.00% | 0.82 | 0.0010 |
| 2 | 300 | 00:01:20 | 0.0664 | 100.00% | 1.29 | 0.0010 |
| 2 | 350 | 00:01:32 | 0.0967 | 97.66% | 0.75 | 0.0010 |
| 2 | 400 | 00:01:44 | 0.0596 | 99.22% | 0.76 | 0.0010 |
| 2 | 450 | 00:01:56 | 0.1952 | 96.88% | 0.93 | 0.0010 |
| 3 | 500 | 00:02:16 | 0.1255 | 97.66% | 1.02 | 0.0010 |
| 3 | 550 | 00:02:28 | 0.0619 | 100.00% | 0.90 | 0.0010 |
| 3 | 600 | 00:02:40 | 0.0730 | 99.22% | 1.12 | 0.0010 |
| 3 | 650 | 00:02:52 | 0.1263 | 98.44% | 0.81 | 0.0010 |
| 3 | 700 | 00:03:04 | 0.0395 | 100.00% | 0.79 | 0.0010 |
| 4 | 750 | 00:03:23 | 0.0729 | 99.22% | 1.01 | 0.0010 |
| 4 | 800 | 00:03:35 | 0.0518 | 100.00% | 0.88 | 0.0010 |
| 4 | 850 | 00:03:47 | 0.1180 | 100.00% | 1.06 | 0.0010 |
| 4 | 900 | 00:03:59 | 0.0634 | 99.22% | 0.77 | 0.0010 |
| 4 | 950 | 00:04:11 | 0.0068 | 100.00% | 0.44 | 0.0010 |
| 5 | 1000 | 00:04:30 | 0.1325 | 96.88% | 0.91 | 0.0010 |
| 5 | 1050 | 00:04:43 | 0.0575 | 99.22% | 1.37 | 0.0010 |
| 5 | 1100 | 00:04:55 | 0.0316 | 100.00% | 0.81 | 0.0010 |
| 5 | 1150 | 00:05:07 | 0.0884 | 100.00% | 0.73 | 0.0010 |
| 5 | 1200 | 00:05:19 | 0.0935 | 98.44% | 0.81 | 0.0010 |
| 5 | 1230 | 00:05:26 | 0.0574 | 99.22% | 0.64 | 0.0010 |
|=======================================================================================================|

Step 3 of 4: Re-training RPN using weight sharing with Fast R-CNN.


Training on single GPU.
|=======================================================================================================|
| Epoch | Iteration | Time Elapsed | Mini-batch | Mini-batch | Mini-batch | Base Learning |
| | | (hh:mm:ss) | Loss | Accuracy | RMSE | Rate |
|=======================================================================================================|
| 1 | 1 | 00:00:00 | 0.3564 | 92.19% | 0.37 | 0.0010 |
| 1 | 50 | 00:00:09 | 0.4212 | 91.41% | 0.54 | 0.0010 |
| 1 | 100 | 00:00:19 | 0.4631 | 87.50% | 0.68 | 0.0010 |
| 1 | 150 | 00:00:28 | 0.8544 | 89.84% | 2.85 | 0.0010 |
| 1 | 200 | 00:00:38 | 0.3388 | 96.06% | 0.51 | 0.0010 |
| 1 | 250 | 00:00:47 | 0.2194 | 96.09% | 1.30 | 0.0010 |
| 2 | 300 | 00:01:03 | 0.5693 | 92.19% | 0.62 | 0.0010 |
| 2 | 350 | 00:01:13 | 0.8334 | 77.17% | 0.76 | 0.0010 |
| 2 | 400 | 00:01:22 | 0.3694 | 85.04% | 0.43 | 0.0010 |
| 2 | 450 | 00:01:32 | 0.3427 | 85.94% | 1.24 | 0.0010 |
| 2 | 500 | 00:01:41 | 0.4397 | 89.84% | 0.53 | 0.0010 |
| 3 | 550 | 00:01:58 | 0.5842 | 88.28% | 0.67 | 0.0010 |
| 3 | 600 | 00:02:07 | 0.3209 | 92.19% | 0.83 | 0.0010 |
| 3 | 650 | 00:02:17 | 0.2096 | 96.83% | 0.31 | 0.0010 |
| 3 | 700 | 00:02:26 | 0.4169 | 88.28% | 0.55 | 0.0010 |
| 3 | 750 | 00:02:36 | 0.3184 | 93.75% | 0.44 | 0.0010 |
| 4 | 800 | 00:02:52 | 0.2236 | 96.88% | 0.49 | 0.0010 |
| 4 | 850 | 00:03:01 | 0.4299 | 89.06% | 0.48 | 0.0010 |
| 4 | 900 | 00:03:11 | 0.2265 | 96.09% | 0.37 | 0.0010 |
| 4 | 950 | 00:03:21 | 0.5541 | 85.16% | 0.56 | 0.0010 |
| 4 | 1000 | 00:03:30 | 0.6608 | 82.81% | 0.85 | 0.0010 |
| 5 | 1050 | 00:03:46 | 0.5002 | 88.19% | 0.47 | 0.0010 |
| 5 | 1100 | 00:03:56 | 0.5473 | 85.94% | 0.53 | 0.0010 |
| 5 | 1150 | 00:04:05 | 0.3383 | 87.50% | 0.41 | 0.0010 |
| 5 | 1200 | 00:04:15 | 0.1642 | 96.09% | 0.73 | 0.0010 |
| 5 | 1250 | 00:04:24 | 0.4395 | 97.64% | 0.51 | 0.0010 |
| 5 | 1255 | 00:04:25 | 0.1770 | 94.53% | 0.50 | 0.0010 |
|=======================================================================================================|

Step 4 of 4: Re-training Fast R-CNN using updated RPN.


--> Extracting region proposals from 251 training images...done.

Training on single GPU.


|=======================================================================================================|
| Epoch | Iteration | Time Elapsed | Mini-batch | Mini-batch | Mini-batch | Base Learning |
| | | (hh:mm:ss) | Loss | Accuracy | RMSE | Rate |
|=======================================================================================================|
| 1 | 1 | 00:00:00 | 0.1555 | 99.22% | 0.97 | 0.0010 |
| 1 | 50 | 00:00:10 | 0.0321 | 100.00% | 0.84 | 0.0010 |
| 1 | 100 | 00:00:21 | 0.0825 | 99.22% | 0.85 | 0.0010 |
| 1 | 150 | 00:00:32 | 0.1182 | 98.44% | 0.89 | 0.0010 |
| 1 | 200 | 00:00:42 | 0.0694 | 99.22% | 0.69 | 0.0010 |
| 1 | 250 | 00:00:53 | 0.0518 | 100.00% | 0.68 | 0.0010 |
| 2 | 300 | 00:01:11 | 0.0273 | 100.00% | 0.81 | 0.0010 |
| 2 | 350 | 00:01:21 | 0.0589 | 100.00% | 0.78 | 0.0010 |
| 2 | 400 | 00:01:32 | 0.1029 | 99.22% | 0.89 | 0.0010 |
| 2 | 450 | 00:01:43 | 0.0625 | 100.00% | 0.72 | 0.0010 |
| 2 | 500 | 00:01:54 | 0.0536 | 100.00% | 0.68 | 0.0010 |
| 3 | 550 | 00:02:11 | 0.0309 | 100.00% | 0.85 | 0.0010 |
| 3 | 600 | 00:02:22 | 0.0919 | 97.66% | 0.71 | 0.0010 |
| 3 | 650 | 00:02:33 | 0.0935 | 100.00% | 0.87 | 0.0010 |
| 3 | 700 | 00:02:43 | 0.0589 | 98.44% | 0.62 | 0.0010 |
| 3 | 750 | 00:02:54 | 0.0687 | 97.66% | 0.60 | 0.0010 |
| 4 | 800 | 00:03:12 | 0.0310 | 100.00% | 0.83 | 0.0010 |
| 4 | 850 | 00:03:22 | 0.0684 | 99.22% | 0.72 | 0.0010 |
| 4 | 900 | 00:03:33 | 0.1031 | 99.22% | 0.86 | 0.0010 |
| 4 | 950 | 00:03:44 | 0.0451 | 99.22% | 0.60 | 0.0010 |
| 4 | 1000 | 00:03:55 | 0.0327 | 100.00% | 0.56 | 0.0010 |
| 5 | 1050 | 00:04:12 | 0.0273 | 100.00% | 0.76 | 0.0010 |
| 5 | 1100 | 00:04:23 | 0.0457 | 100.00% | 0.70 | 0.0010 |
| 5 | 1150 | 00:04:34 | 0.0908 | 100.00% | 0.90 | 0.0010 |
| 5 | 1200 | 00:04:44 | 0.0418 | 100.00% | 0.62 | 0.0010 |
| 5 | 1250 | 00:04:55 | 0.0363 | 100.00% | 0.59 | 0.0010 |
|=======================================================================================================|

Detector training complete (with warnings):


Warning: Invalid bounding boxes from 1 out of 252 training images were removed. The following rows in
trainingData have invalid bounding box data:

Invalid Rows
____________

52

Bounding boxes must be fully contained within their associated image and must have positive width and height.
*******************************************************************

% Note: This example was verified on an NVIDIA(TM) Titan X with 12 GB of GPU
% memory. Training this network took approximately 10 minutes using this setup.
% Training time varies depending on the hardware you use.
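Before training on your own machine, you can confirm that MATLAB sees a supported GPU; the sketch below
assumes Parallel Computing Toolbox is installed.

% Sketch: check GPU availability (requires Parallel Computing Toolbox).
try
    g = gpuDevice;
    fprintf('GPU: %s (compute capability %s)\n', g.Name, g.ComputeCapability);
catch
    warning('No supported GPU detected; training will run on the CPU.');
end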

As a quick sanity check, run the detector on one test image.

% Read a test image.


I = imread(testData.imageFileName{1});

% Run the detector.


[bboxes,scores] = detect(detector,I);

% Annotate detections in the image.


I = insertObjectAnnotation(I,'rectangle',bboxes,scores);
figure
imshow(I)
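By default, detect returns every detection whose confidence score exceeds the detector's built-in threshold.
To keep only high-confidence detections, you can raise the threshold at detection time, as in the sketch
below; the 0.7 value is an arbitrary choice to tune for your data.

% Sketch: re-run detection, keeping only confident detections.
I = imread(testData.imageFileName{1});
[bboxes, scores] = detect(detector, I, 'Threshold', 0.7);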

Evaluate Detector Using Test Set
Evaluate the detector on a large set of images to measure the trained detector's performance. Computer
Vision Toolbox™ provides object detector evaluation functions to measure common metrics such as average
precision (evaluateDetectionPrecision) and log-average miss rates (evaluateDetectionMissRate). Here,
the average precision metric is used. The average precision provides a single number that incorporates the
ability of the detector to make correct classifications (precision) and the ability of the detector to find all relevant
objects (recall).
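Concretely, precision and recall are ratios built from true positives (TP), false positives (FP), and false
negatives (FN); the short sketch below illustrates the arithmetic with made-up counts.

% Sketch: precision/recall arithmetic with made-up counts.
TP = 40;   % detections that match a ground truth box
FP = 10;   % detections with no matching ground truth box
FN = 5;    % ground truth boxes the detector missed

precision = TP / (TP + FP)   % 0.80: fraction of detections that are correct
recall    = TP / (TP + FN)   % ~0.89: fraction of objects that are found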

The first step for detector evaluation is to collect the detection results by running the detector on the test set.

numImages = height(testData);
results = table('Size',[numImages 3],...
'VariableTypes',{'cell','cell','cell'},...
'VariableNames',{'Boxes','Scores','Labels'});

% Run detector on each image in the test set and collect results.
for i = 1:numImages

    % Read the image.
    I = imread(testData.imageFileName{i});

    % Run the detector.
    [bboxes, scores, labels] = detect(detector, I);

    % Collect the results.
    results.Boxes{i} = bboxes;
    results.Scores{i} = scores;
    results.Labels{i} = labels;
end
% Extract expected bounding box locations from test data.
expectedResults = testData(:, 2:end);

% Evaluate the object detector using Average Precision metric.


[ap, recall, precision] = evaluateDetectionPrecision(results, expectedResults);

The precision/recall (PR) curve highlights how precise a detector is at varying levels of recall. Ideally, the
precision would be 1 at all recall levels. The use of additional layers in the network can help improve the
average precision, but might require additional training data and longer training time.

% Plot precision/recall curve


figure
plot(recall,precision)
xlabel('Recall')
ylabel('Precision')
grid on
title(sprintf('Average Precision = %.2f', ap))
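The log-average miss rate offers a complementary view of the same detection results; the sketch below reuses
the results table collected above.

% Sketch: compute and plot the log-average miss rate for comparison.
[am, fppi, missRate] = evaluateDetectionMissRate(results, expectedResults);

figure
loglog(fppi, missRate)
grid on
title(sprintf('Log-Average Miss Rate = %.2f', am))
xlabel('False Positives Per Image')
ylabel('Miss Rate')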

% Optionally, run the detector on each image in the test set and show results.
% for i = 1:numImages
%
%     % Read the image.
%     I = imread(testData.imageFileName{i});
%
%     % Run the detector.
%     [bboxes, scores, labels] = detect(detector, I);
%
%     % Annotate and display, waiting for a key press between images.
%     I = insertObjectAnnotation(I,'rectangle',bboxes,scores);
%     figure
%     imshow(I)
%     input('')
% end

Summary
This example showed how to train an object detector using Faster R-CNN. You can follow similar steps to train
detectors for traffic signs, pedestrians, vehicles, or other objects.

References
[1] Ren, S., K. He, R. Girshick, and J. Sun. "Faster R-CNN: Towards Real-Time Object Detection with Region
Proposal Networks." IEEE Transactions on Pattern Analysis and Machine Intelligence. Vol. 39, Issue 6, June
2017, pp. 1137-1149.

[2] Girshick, R., J. Donahue, T. Darrell, and J. Malik. "Rich Feature Hierarchies for Accurate Object Detection
and Semantic Segmentation." Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern
Recognition. Columbus, OH, June 2014, pp. 580-587.

[3] Girshick, R. "Fast R-CNN." Proceedings of the 2015 IEEE International Conference on Computer Vision.
Santiago, Chile, Dec. 2015, pp. 1440-1448.

[4] Zitnick, C. L., and P. Dollar. "Edge Boxes: Locating Object Proposals from Edges." European Conference on
Computer Vision. Zurich, Switzerland, Sept. 2014, pp. 391-405.

[5] Uijlings, J. R. R., K. E. A. van de Sande, T. Gevers, and A. W. M. Smeulders. "Selective Search for Object
Recognition." International Journal of Computer Vision. Vol. 104, Number 2, Sept. 2013, pp. 154-171.

Copyright 2016-2019 The MathWorks, Inc.
