
Chapter 4

Figure 4.1. Conceptual Design

4.1 Vehicle Detection Module

The vehicle detection module has two parts: a Gaussian Mixture Model and a CNN vehicle detector. To
detect the moving vehicles in the video, the Gaussian Mixture Model is used to detect moving blobs.
Each blob is passed to the CNN vehicle detector to filter out moving objects that are not vehicles.
Detection is done on a set interval of frames. In this system, the interval is set to the frame rate divided
by 6, so on a video with 60 frames per second, the system runs detection every 10 frames. Although the
interval could be set to a larger value (e.g., once every second), the detection is responsible for
producing the blobs used by the perspective transform module, so a lower value helps speed up the
calibration process.
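
The interval logic reduces to a modulus check on a frame counter. The system itself builds on MATLAB's toolboxes, so the following is only a minimal Python/OpenCV sketch; the input file name and the detection step are placeholders.

    import cv2

    cap = cv2.VideoCapture("traffic.mp4")        # placeholder input video
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0      # fall back if metadata is missing
    interval = max(1, int(fps / 6))              # e.g., 60 fps -> every 10 frames

    frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % interval == 0:
            pass  # run blob detection and CNN filtering on this frame
        # tracking (Section 4.2) runs on every frame
        frame_idx += 1
    cap.release()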

The Gaussian Mixture Model is pre-trained on an initial set of frames for better results. The minimum
blob size is set by measuring bounding boxes from the datasets and choosing a value based on those
measurements. A lower threshold is used to make the model more sensitive to movement. As a
consequence, the detection is more prone to noise, but this problem is handled later by the tracking
module. Morphological operations are also applied to remove noise, fill holes, and improve the overall
shape of the blobs. Bounding boxes are created after blob analysis, and these are passed on to the
tracking module.
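
As a hedged illustration of this pipeline, the sketch below uses OpenCV's MOG2 Gaussian-mixture background subtractor in place of the system's GMM; the variance threshold and minimum blob area are assumed values, not the ones measured from the datasets.

    import cv2

    # Lower varThreshold -> more sensitive to movement (and to noise).
    subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=12)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    MIN_BLOB_AREA = 900                          # assumed minimum blob size

    def detect_blobs(frame):
        mask = subtractor.apply(frame)
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)   # remove speckle noise
        mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)  # fill holes in blobs
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        # Blob analysis: keep sufficiently large blobs, return their boxes.
        return [cv2.boundingRect(c) for c in contours
                if cv2.contourArea(c) >= MIN_BLOB_AREA]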

To filter out non-vehicle objects, a CNN vehicle detector is used. In the system, two CNN detectors are
used: Matlab's default Faster R-CNN vehicle detector from the Automated Driving Toolbox, and the CNN
classifier from Vemon (). Nevertheless, the quality of detection can be negatively affected when the CNN
detectors give erroneous results. In such cases, turning off the CNN filter may produce better results.
This issue is discussed further in Chapter 5.

4.2 Tracking Module

Once the blobs and bounding boxes are detected, the system starts tracking the objects. For every
region of interest (i.e., bounding box), strong points are detected and tracked through frames using the
KLT tracking algorithm.
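
In OpenCV terms, this corresponds to Shi-Tomasi corner detection followed by pyramidal Lucas-Kanade optical flow, the standard KLT formulation; the corner-detector parameters below are assumptions.

    import cv2
    import numpy as np

    def init_points(gray, box):
        # Detect strong (Shi-Tomasi) corners inside a bounding box.
        x, y, w, h = box
        mask = np.zeros_like(gray)
        mask[y:y + h, x:x + w] = 255
        return cv2.goodFeaturesToTrack(gray, maxCorners=50, qualityLevel=0.01,
                                       minDistance=5, mask=mask)

    def track_points(prev_gray, gray, points):
        # Pyramidal Lucas-Kanade step: follow the points into the next frame.
        new_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, points, None)
        return new_pts[status.flatten() == 1]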

When tracking, some points may be incorrectly identified as part of the actual vehicle, so some noisy
points can remain in the scene. Noisy points can also appear when the detection module sends a non-
vehicle blob. In that regard, the tracking module attempts to remove noise and unwanted boxes by
setting a minimum and maximum bounding box area. Existing boxes are also filtered using this method.
In some cases, points that are not part of the vehicle are incorrectly detected; in this scenario, as the
vehicle moves, the wrong points are likely to diverge from the object. To mitigate this, outlier points are
identified using the standard deviation and the median point.
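
A minimal sketch of that outlier rule, assuming the cutoff is a multiple k of the standard deviation of the distances (the text does not state the exact cutoff):

    import numpy as np

    def remove_outliers(points, k=2.0):
        # points: (N, 2) array of tracked point positions.
        median = np.median(points, axis=0)               # median point
        dists = np.linalg.norm(points - median, axis=1)  # distance from it
        # Points farther than k standard deviations are treated as noise
        # that has diverged from the vehicle.
        return points[dists <= k * dists.std()]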

Because the tracker has access to the moving points of the objects, it is also responsible for two further
tasks: creating the motion lines and computing the speed.

To determine the motion line of each object, the centroid of the points is computed. On a given interval
of frames, the centroid is tracked and a line is drawn from its previous position to its current position.
A similar approach is used to compute the speed: instantaneous speed is determined by tracking the
centroid, computing the distance it moved, and dividing by the time elapsed.
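
A sketch of both tasks, assuming point positions have already been mapped to world units or that a scalar meters_per_pixel approximation is available:

    import numpy as np

    def update_motion(prev_centroid, points, frames_elapsed, fps, meters_per_pixel):
        # prev_centroid: (2,) array; points: (N, 2) array of tracked points.
        centroid = points.mean(axis=0)
        # Motion line segment: previous centroid -> current centroid.
        segment = (tuple(prev_centroid), tuple(centroid))
        # Instantaneous speed = distance moved / time elapsed.
        pixels = np.linalg.norm(centroid - prev_centroid)
        seconds = frames_elapsed / fps
        speed_mps = pixels * meters_per_pixel / seconds
        return segment, speed_mps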

4.3 Vanishing Point Detection Module

Vanishing point detection is done on an initial frame, which can be the first frame of the video or a
background model. In this step, either a single vanishing point or two vanishing points will be detected,
depending on the type of scene. First, the main vanishing point of the scene is detected through the
method described in Chapter 3. The detected lines are sampled to remove redundant lines and improve
the accuracy and efficiency of the detection. If the main vanishing point is located horizontally in the
middle of the scene, a second vanishing point will not be detected. The middle portion is determined
using a given percentage of the image width. When the vanishing point lies in the middle portion, the
scene is assumed to have a single vanishing point perspective.

Figure 2. Two vanishing points detected (left), single vanishing point (right)
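
The single-versus-two vanishing point decision is a simple width check; the middle percentage below is an assumed parameter.

    def is_single_vp_scene(vp_x, frame_width, middle_pct=0.2):
        # Single-vanishing-point perspective is assumed when the main
        # vanishing point lies within the middle portion of the frame,
        # defined as a given percentage of the image width.
        half = middle_pct * frame_width / 2.0
        return abs(vp_x - frame_width / 2.0) <= half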

When the vanishing point is not horizontally located in the middle, a second vanishing point is detected.
While this could be achieved using lines that intersect at another point, in most cases such lines are not
present in the scene. Even when they are present, the detected set of lines can be noisy, so no reliable
vanishing point can be found. Hence, the second vanishing point is estimated under certain
assumptions:

• First, the vanishing line, the line that connects the two vanishing points, is assumed to be purely
horizontal. This means that the camera is assumed to have minimal to no rotation on the Z-axis.
• Second, the horizon line is always above the vertical center of the scene and above the objects
of interest (i.e., the vehicles). This assumes that the camera is always angled downward toward the
road.
• Lastly, the distance between the camera and the scene is assumed to be similar across the
different videos.

Figure 3. Example of a horizon line


Figure 4. Variables used for multiple linear regression

Given these assumptions, the second vanishing point can be estimated through a simple method.
Because the line connecting the two vanishing points is horizontal, the second vanishing point must
have the same vertical position as the first; in other words, the y-values of the two vanishing points are
the same. With this, only the x-value of the second vanishing point is needed, which requires knowing
the horizontal distance between the two vanishing points, i.e., the length of the vanishing line. The
perspective information available from the first vanishing point is the following: its 'y-distance' from the
center, its 'x-distance' from the center, and the width and height of the video. To adjust to different
video sizes, the 'y-distance' and 'x-distance' are divided by the height and width of the video
respectively, creating two variables called 'x-ratio' and 'y-ratio'.

With only the 'x-ratio' and 'y-ratio' as the available information, a relationship must be established
between these two variables and the horizontal distance to the second vanishing point. A multiple
linear regression is fit using ground-truthed scenes. In the regression, apart from the variables 'x-ratio'
and 'y-ratio', a third independent variable is included: the change in 'y' between the two vanishing
points. This accounts for non-horizontal horizon lines caused by camera rotation. The dependent
variable is the distance between the two vanishing points divided by the width; it is essentially the
multiplier that recovers the horizontal distance from the width of the video, and for this reason it is
called the 'multiplier'. Using the ground truth of the three independent variables 'x-ratio', 'y-ratio', and
'change-in-y', and the dependent variable 'multiplier', a multiple linear regression fit is solved. To
estimate the horizontal position, the multiplier computed from the regression is multiplied by the width
of the video; this value is taken as the horizontal distance between the two vanishing points. Therefore,
the x-position of the second vanishing point is the x-position of the first vanishing point offset by the
horizontal distance to the left or right, toward the opposite side of the frame.
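
A least-squares sketch of this regression and the resulting estimate; the ground-truth arrays and the zero change-in-y at prediction time are assumptions consistent with the description above.

    import numpy as np

    def fit_multiplier_model(X, y):
        # X: (n_scenes, 3) ground truth [x-ratio, y-ratio, change-in-y];
        # y: (n_scenes,) ground-truth multipliers (vp distance / width).
        A = np.hstack([X, np.ones((len(X), 1))])        # add intercept column
        coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
        return coeffs

    def estimate_second_vp(vp1, width, height, coeffs):
        x_ratio = abs(vp1[0] - width / 2.0) / width
        y_ratio = abs(vp1[1] - height / 2.0) / height
        change_in_y = 0.0                    # horizon line assumed horizontal
        multiplier = np.array([x_ratio, y_ratio, change_in_y, 1.0]) @ coeffs
        dx = multiplier * width              # horizontal distance between the points
        # Offset toward the opposite side of the frame; same y as the first point.
        x2 = vp1[0] - dx if vp1[0] > width / 2.0 else vp1[0] + dx
        return (x2, vp1[1])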

Once the two vanishing points have been determined, the vertical vanishing point is chosen: the
vanishing point closer to the horizontal center is taken as the vertical vanishing point. In addition, as the
video is processed, the vanishing point is re-detected on a set interval so the system can adjust to
camera movement if needed.

4.4 Perspective Transform Module

4.4.1 Perspective Bounding Box

To estimate a perspective transform, a four-point correspondence is created between the four corners
of a vehicle in the scene and a rectangle representing the real-world dimensions of the vehicle. This
means that the box should effectively show the floor space occupied by the vehicle, which makes
traditional bounding boxes unusable. Hence, a perspective bounding box is created using the detected
vanishing points. The system creates the boxes in the following manner, similar to Dubska et al.'s (2013)
approach; a visualization can be found in Figure 5. Using the first vanishing point, a line that passes
through the upper edge of the vehicle, line A, is drawn. From the same vanishing point, line B is drawn
through the bottom of the vehicle. The same steps are done for the second vanishing point if it exists; if
there is no second vanishing point, horizontal lines are drawn through the top and bottom of the
vehicle. This produces lines C and D. Intersection points 1 and 2 result from the intersections of lines E
and F with lines C and D. The intersection points are used to obtain lines A2 and C2. Lines A2, B, C2 and
D represent the correct perspective bounding for the vehicle. Essentially, the lines that cover the top
portions are adjusted, since they represent not the floor but the roof of the vehicle.
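
The construction reduces to intersecting lines through the vanishing points with the vehicle's edges, which is convenient in homogeneous coordinates; the example coordinates below are invented for illustration.

    import numpy as np

    def line_through(p, q):
        # Homogeneous line through two points (cross product).
        return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])

    def intersect(l1, l2):
        # Intersection point of two homogeneous lines.
        p = np.cross(l1, l2)
        return p[:2] / p[2]

    # Example: line A from the first vanishing point through the upper edge
    # of the vehicle, intersected with a horizontal line D at y = 250.
    vp1, top_corner = (640.0, 120.0), (300.0, 200.0)   # assumed coordinates
    line_a = line_through(vp1, top_corner)
    line_d = np.array([0.0, 1.0, -250.0])              # 0*x + 1*y - 250 = 0
    point_1 = intersect(line_a, line_d)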

Figure 5

However, this approach only works when the vehicles are aligned with the perspective. Figure ()
illustrates cases where the approach produces incorrect boxes. This also happens when the vanishing
point is found horizontally at the center of the vehicle, as seen in Figure (). The perspective bounding is
created differently for such cases. Because two vanishing points are used, the vehicle can be aligned
with either perspective; for this reason, the worst alignment is at 45 degrees, where the vehicle is
aligned with neither perspective. A vehicle is treated as aligned when its angle is less than 30 degrees or
greater than 60 degrees. As described earlier, the top lines are adjusted to pass through the bottom
corners, but in this case there is no reliable way of determining this adjustment. To create a better
bounding box, the adjustment is estimated based on the horizon's distance from the vertical center: a
larger distance results in a smaller adjustment, and a smaller distance in a larger one. In Figure (), lines A
and B are adjusted by this estimate, creating the correct floor bounding.

4.4.2 Perspective Transform and Scaling

The perspective bounding boxes from the detections are used to create the perspective transform
matrix. The bounding boxes are adjusted based on the alignment of the vehicles: geometric calculations
retrieve the true rectangular space occupied by the vehicle when it is oriented diagonally. A
correspondence is then created between the bounding points and a rectangle defined by the given
vehicle dimensions, and transform matrices are created from the given bounding boxes. Scores are
given based on the alignment of the vehicle, the ratio of the length and width, and the area of the
bounding box.
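
The correspondence step maps the four floor corners to a rectangle of real-world vehicle dimensions; the dimensions below (an average passenger car) are assumed, and OpenCV's getPerspectiveTransform solves the matrix.

    import cv2
    import numpy as np

    def transform_from_box(box_corners, length_m=4.5, width_m=1.8):
        # box_corners: four image-space corners of the perspective bounding
        # box (the vehicle's floor space), ordered front-left, front-right,
        # rear-right, rear-left.
        src = np.float32(box_corners)
        # Rectangle with the assumed real-world vehicle dimensions (meters).
        dst = np.float32([[0, 0], [width_m, 0],
                          [width_m, length_m], [0, length_m]])
        return cv2.getPerspectiveTransform(src, dst)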

The ratio is used to reject boxes that do not follow the general shape of the bounding boxes. In the
simplest sense, the ratio should follow the real-world dimensions of vehicles. However, pixel distances
are not representative of the true ratio unless the view is orthogonal. In perspective scenes, the ratio is
affected by the angle at which the camera is pointed at the road. When the camera points directly down
at the road, the ratio corresponds to the true value; generally, the length should be greater than the
width (i.e., a ratio of more than 1). On the opposite extreme, when the camera is angled parallel to the
road, the length will appear to be zero. From this observation, the apparent length is longer when the
camera is tilted further down toward the road. While the tilt or angle of the camera is unknown, it is
reflected by the distance of the horizon line from the center. Thus, the expected ratio is determined
through a regression in which the ratio increases as the distance of the horizon from the center
increases.
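
One plausible reading of this, sketched below, is a one-dimensional linear fit from horizon distance to expected ratio, with a score that decays as a box's ratio deviates from the expectation; the scoring form is an assumption, as the text does not specify one.

    import numpy as np

    def fit_ratio_model(horizon_dists, ratios):
        # Linear fit: expected length/width ratio as a function of the
        # horizon line's distance from the vertical center.
        return np.polyfit(horizon_dists, ratios, deg=1)

    def ratio_score(box_ratio, horizon_dist, model):
        expected = np.polyval(model, horizon_dist)
        # Score decays with relative deviation from the expected ratio.
        return max(0.0, 1.0 - abs(box_ratio - expected) / expected)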

Even with a good ratio, a bounding box may not have the expected size. To prioritize boxes with correct
sizes, the area of the bounding box is also considered. When a new bounding box is created, its area is
compared to the mean area of the previous bounding boxes. During the first transform computation,
the areas are assumed to be correct, but as the system detects more boxes throughout the video, the
mean area is recomputed and the bounding box area scores are updated. The last heuristic for
computing the score is the alignment: because a more systematic bounding approach is performed on
aligned vehicles, their bounding boxes are assumed to be more reliable.

The system computes the transform to be used every time it gathers a set number of new perspective
bounding boxes. When a new batch of transforms is added and assigned scores, the system selects all
transforms with scores above a set threshold. The median of the selected matrices is used as the final
transform matrix.
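
A sketch of that final step; the score threshold is an assumed value, and the median is taken element-wise over the selected 3x3 matrices.

    import numpy as np

    def final_transform(matrices, scores, threshold=0.7):
        # Keep transforms whose score exceeds the threshold, then take the
        # element-wise median as the final perspective transform matrix.
        selected = [m for m, s in zip(matrices, scores) if s > threshold]
        if not selected:
            raise ValueError("no transform scored above the threshold")
        return np.median(np.stack(selected), axis=0)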

[Figure 4.1 flowchart: the main processing loop. On the initial frame, the vanishing point is detected; on
each subsequent frame, existing objects are tracked, moving blobs are detected and non-vehicles
filtered out on the detection interval, points are detected and tracked for new objects, motion lines are
created, noisy points and invalid objects are cleaned, perspective bounding boxes are created, transform
matrices are built from the available boxes and assigned scores, the final transform matrix is computed
from matrices with high scores, and vehicle speed is computed for tracked objects on the speed
interval. The vanishing point is re-detected on its own interval.]
