
ECE/CS 181B
COMPUTER VISION
SIFT
Date: 8 Feb 2010


Introduction:
Scale-Invariant Feature Transform (SIFT) is an algorithm for detecting and
describing local features in images that is robust to changes in image scale,
noise, illumination, and local geometric distortion. The SIFT algorithm was
first published by David Lowe in 1999 and is described in detail in his 2004
paper "Distinctive image features from scale-invariant keypoints". One of
the main uses of the SIFT algorithm is object recognition: for any object in
an image, there are many interesting points, or 'features', on the object
that can be extracted to provide a "feature" description of the object. This
description, extracted from a training image, can then be used to identify
the object when attempting to locate it in a test image containing many
other objects.

The Algorithm:
Lowe's method for image feature generation, the Scale Invariant Feature
Transform (SIFT), transforms an image into a large collection of feature
vectors. The first step in the algorithm is to identify interest points by
locating scale-space extrema. Essentially, the image is blurred by
convolving it with Gaussian filters at different scales, and the differences
of successive Gaussian-blurred images are taken. Keypoints are then taken
as the maxima/minima of the Difference of Gaussians (DoG) that occur
across multiple scales. The next step is to filter out unwanted points by
removing those that have low contrast or are poorly localized along an
edge (determined by examining the principal curvatures). All remaining
keypoints are then assigned an orientation and a magnitude. Finally, a
descriptor is calculated for each point based on the image closest in scale
to the keypoint's scale.

Method:
1. Scale-space extrema detection:
The starting point of the algorithm is to identify a set of features that are
invariant to scaling transforms by choosing 'keypoints' in regions of interest
in the image. Each of these features is described by a multidimensional
feature descriptor vector, detailed later. To achieve invariance at different
image scales, the given image is repeatedly smoothed using Gaussian
lowpass filters with varying values of variance.
We choose a set of images smoothed by variance values ranging from \sigma to 2\sigma
within an octave and evaluate the difference of Gaussian smoothed images
(DoG) as follows:

    D(x, y, \sigma) = ( G(x, y, k\sigma) - G(x, y, \sigma) ) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma)

where G(x, y, k\sigma) and G(x, y, \sigma) are two reference kernels of
different variances, I(x, y) is the original image under consideration, and
L(x, y, \sigma) = G(x, y, \sigma) * I(x, y) denotes the Gaussian smoothed image.
We extend this idea to further octaves ranging from 2\sigma to 4\sigma, and so on,
and construct sets of Difference of Gaussian images in which to evaluate
local extrema.
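
To make this concrete, here is a minimal Python sketch of how one octave of
blurred images and their DoG differences might be computed. The helper name
dog_octave, the base sigma of 1.6, and the number of scales are illustrative
assumptions, not values taken from this report:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def dog_octave(image, base_sigma=1.6, num_scales=5):
        """Build one octave of Difference-of-Gaussian images.

        Smooths the image with Gaussians whose sigma grows by a constant
        factor k per level, then subtracts successive blurred images:
        D_i = L_{i+1} - L_i.
        """
        # Choose k so that sigma reaches 2 * base_sigma within the octave.
        k = 2.0 ** (1.0 / (num_scales - 2))
        blurred = [gaussian_filter(image.astype(np.float64), base_sigma * k ** i)
                   for i in range(num_scales)]
        return [blurred[i + 1] - blurred[i] for i in range(num_scales - 1)]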

Local Extrema:
We seek pixels in each set which are local maxima or minima with respect
to their 26 neighbours: the 8 neighbours in the same plane and the 9
neighbours in each of the planes above and below the image under
consideration.

These form the initial set of keypoints of interest in the smoothed images
of each octave.
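
A minimal sketch of this 26-neighbour test, assuming the DoG images of one
octave are stacked into a 3-D NumPy array (border pixels are simply skipped):

    import numpy as np

    def find_extrema(dog_stack, threshold=0.0):
        """Return (scale, row, col) of pixels that are strict maxima or
        minima over their 26 neighbours in the DoG stack.

        dog_stack: array of shape (num_dog_images, height, width).
        """
        keypoints = []
        s_max, h, w = dog_stack.shape
        for s in range(1, s_max - 1):
            for y in range(1, h - 1):
                for x in range(1, w - 1):
                    center = dog_stack[s, y, x]
                    if abs(center) <= threshold:
                        continue
                    cube = dog_stack[s-1:s+2, y-1:y+2, x-1:x+2]
                    # The center must strictly beat all 26 neighbours.
                    if center == cube.max() and (cube == center).sum() == 1:
                        keypoints.append((s, y, x))
                    elif center == cube.min() and (cube == center).sum() == 1:
                        keypoints.append((s, y, x))
        return keypoints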


2. Keypoint Localization and Filtering:

Once a set of keypoints is identified in each octave, we need to filter out
inconsistent pixels and retain only those which are truly significant for
detection and remain invariant to scale-space transformations. The goal is
to filter out low-contrast keypoints as well as keypoints found near edges,
as they are not stable in further steps of processing. To achieve this, the
procedure adopted is the following:
Low contrast keypoint elimination:
The scale-space function D(x, y, \sigma) is expanded about each sample point,
and the location of the extremum is obtained using the relation:

    \hat{x} = -\left( \frac{\partial^2 D}{\partial x^2} \right)^{-1} \frac{\partial D}{\partial x}

where each of the partial derivatives is evaluated at the location of the
sample point x = (x, y, \sigma)^T. The value of the function at this offset,
D(\hat{x}) = D + \tfrac{1}{2} \frac{\partial D}{\partial x}^T \hat{x}, can be used to filter out keypoints with low
contrast, e.g. those with |D(\hat{x})| below a threshold such as 0.03 (as
suggested in Lowe's paper).
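
As a sketch, this test can be written with central finite differences over the
(scale, row, col) stack. The function below is illustrative, assumes image
intensities normalized to [0, 1] and a point at least one step from every
border, and uses the 0.03 threshold from the text:

    import numpy as np

    def passes_contrast(D, s, y, x, thresh=0.03):
        """Fit a quadratic around (s, y, x) and reject low-contrast extrema."""
        # Gradient of D over (scale, row, col) by central differences.
        grad = 0.5 * np.array([D[s+1, y, x] - D[s-1, y, x],
                               D[s, y+1, x] - D[s, y-1, x],
                               D[s, y, x+1] - D[s, y, x-1]])
        # Symmetric 3x3 Hessian by central differences.
        dss = D[s+1, y, x] - 2*D[s, y, x] + D[s-1, y, x]
        dyy = D[s, y+1, x] - 2*D[s, y, x] + D[s, y-1, x]
        dxx = D[s, y, x+1] - 2*D[s, y, x] + D[s, y, x-1]
        dsy = 0.25 * (D[s+1, y+1, x] - D[s+1, y-1, x] - D[s-1, y+1, x] + D[s-1, y-1, x])
        dsx = 0.25 * (D[s+1, y, x+1] - D[s+1, y, x-1] - D[s-1, y, x+1] + D[s-1, y, x-1])
        dyx = 0.25 * (D[s, y+1, x+1] - D[s, y+1, x-1] - D[s, y-1, x+1] + D[s, y-1, x-1])
        H = np.array([[dss, dsy, dsx], [dsy, dyy, dyx], [dsx, dyx, dxx]])
        # x_hat = -H^{-1} grad; a singular H would raise here (ignored in
        # this sketch), and D(x_hat) = D + 0.5 * grad . x_hat.
        offset = -np.linalg.solve(H, grad)
        d_hat = D[s, y, x] + 0.5 * grad @ offset
        return abs(d_hat) >= thresh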
Edge Response Elimination:
Edges have a large principal curvature in one direction but a low curvature
in the perpendicular direction. Therefore, to evaluate these curvatures and
filter out keypoints lying on edges, we consider a 2x2 Hessian matrix of the
form:

    H = \begin{pmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{pmatrix}

where the D_{ij} are the respective second-order partial derivatives of D
mentioned earlier. Rather than computing the eigenvalues of H explicitly, a
little linear algebra lets us test the ratio of the principal curvatures
through the trace and determinant:

    \frac{\mathrm{Tr}(H)^2}{\mathrm{Det}(H)} < \frac{(r+1)^2}{r}

where r is the threshold on the ratio of the larger to the smaller
eigenvalue. As suggested in the paper, we choose a value of 10 for r to
eliminate keypoints along edges.
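
A minimal Python sketch of this trace/determinant test on a single DoG image
D, with r = 10 as in the text (derivatives estimated by finite differences):

    def passes_edge_test(D, y, x, r=10.0):
        """Reject keypoints whose principal-curvature ratio exceeds r."""
        dxx = D[y, x+1] - 2*D[y, x] + D[y, x-1]
        dyy = D[y+1, x] - 2*D[y, x] + D[y-1, x]
        dxy = 0.25 * (D[y+1, x+1] - D[y+1, x-1] - D[y-1, x+1] + D[y-1, x-1])
        trace = dxx + dyy
        det = dxx * dyy - dxy * dxy
        if det <= 0:  # curvatures of opposite sign: not a stable extremum
            return False
        return trace ** 2 / det < (r + 1) ** 2 / r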
3. Dominant orientation: one or more orientations are assigned to each
keypoint, and the data are normalized with respect to orientation.
After localizing the keypoints, we need to determine their magnitude and
direction. These keypoints can be treated like vectors, since they need both
a magnitude and a direction for their complete description.

The magnitude and orientation are computed as follows. The scale of the
keypoint is used to select the Gaussian smoothed image L with the closest
scale, so that all computations are performed in a scale-invariant manner.
For each image sample L(x, y) at this scale, the gradient magnitude m(x, y)
and orientation \theta(x, y) are precomputed using pixel differences:

    m(x, y) = \sqrt{ (L(x+1, y) - L(x-1, y))^2 + (L(x, y+1) - L(x, y-1))^2 }

    \theta(x, y) = \tan^{-1} \frac{ L(x, y+1) - L(x, y-1) }{ L(x+1, y) - L(x-1, y) }
An orientation histogram is formed from the gradient orientations of sample
points within a region around the keypoint. The orientation histogram has
36 bins covering the 360 degree range of orientations. Each sample added
to the histogram is weighted by its gradient magnitude and by a
Gaussian-weighted circular window with a \sigma that is 1.5 times the scale of
the keypoint. Peaks in the orientation histogram correspond to dominant
directions of local gradients. The highest peak in the histogram is detected,
and then any other local peak that is within 80% of the highest peak is also
used to create a keypoint with that orientation.
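
The histogram construction might be sketched in Python as follows. The window
radius is an illustrative assumption, and for simplicity every bin within 80%
of the highest is kept rather than only local peaks:

    import numpy as np

    def dominant_orientations(L, y, x, scale, radius=8):
        """Return dominant gradient orientations (radians) around (y, x).

        Builds a 36-bin histogram of gradient orientations, each sample
        weighted by its magnitude and by a Gaussian of sigma = 1.5 * scale.
        """
        sigma = 1.5 * scale
        hist = np.zeros(36)
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                yy, xx = y + dy, x + dx
                if not (1 <= yy < L.shape[0] - 1 and 1 <= xx < L.shape[1] - 1):
                    continue
                gx = L[yy, xx + 1] - L[yy, xx - 1]
                gy = L[yy + 1, xx] - L[yy - 1, xx]
                mag = np.hypot(gx, gy)
                theta = np.arctan2(gy, gx) % (2 * np.pi)
                weight = np.exp(-(dx * dx + dy * dy) / (2 * sigma ** 2))
                hist[int(theta / (2 * np.pi) * 36) % 36] += weight * mag
        # Keep every bin within 80% of the highest peak (a simplification
        # of the local-peak rule described in the text).
        bin_width = 2 * np.pi / 36
        return [(i + 0.5) * bin_width
                for i in range(36) if hist[i] >= 0.8 * hist.max()]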

4. Keypoint descriptor: a representation of the local region around each
detected keypoint, based on histograms of gradient orientations.

So far we have obtained the location of the keypoints and their magnitude
and orientation. The next step is to compute a descriptor for the local
image region that is highly distinctive yet as invariant as possible to the
remaining variations, such as changes in illumination or 3D viewpoint.

First the image gradient magnitudes and orientations are sampled around
the keypoint location, using the scale of the keypoint to select the level of
Gaussian blur for the image. In order to achieve orientation invariance, the
coordinates of the descriptor and the gradient orientations are rotated
relative to the keypoint orientation. A Gaussian weighting function with \sigma
equal to one half the width of the descriptor window is used to assign a
weight to the magnitude of each sample point. The descriptor itself is built
by accumulating orientation histograms over 4x4 sample regions, which
allows for significant shift in gradient positions: a gradient sample can
shift up to 4 sample positions while still contributing to the same
histogram, thereby achieving the objective of allowing for larger local
positional shifts. Each histogram has eight direction bins, and the length
of each entry corresponds to the sum of the weighted gradient magnitudes
falling into it. The descriptor is formed from a vector containing the values
of all the orientation histogram entries, giving a 4x4x8 = 128 element
feature vector for each keypoint.

Finally, the feature vector is modified to reduce the effects of illumination
change. First, the vector is normalized to unit length. A change in image
contrast in which each pixel value is multiplied by a constant will multiply
gradients by the same constant, so this contrast change will be canceled by
vector normalization. A brightness change in which a constant is added to
each image pixel will not affect the gradient values, as they are computed
from pixel differences.
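
A short sketch of this final normalization step, assuming the 128-element
vector has already been accumulated as described above:

    import numpy as np

    def normalize_descriptor(vec):
        """Normalize a 128-d descriptor to unit length.

        Multiplying the image by a constant scales all gradient magnitudes
        by that constant, so unit-length normalization cancels contrast
        changes; additive brightness changes never enter the gradients.
        """
        vec = np.asarray(vec, dtype=np.float64)
        norm = np.linalg.norm(vec)
        return vec / norm if norm > 0 else vec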

Results:
SIFT was implemented using MATLAB. The keypoints, with their magnitude
and direction, are plotted over the original image. Each arrow points in the
dominant direction of a keypoint, and the size of the arrow indicates its
magnitude. For matching, the angle given by the dot product between a
keypoint descriptor of one image and all keypoint descriptors of the other
image is computed, and the pairs corresponding to the minimum angle are
connected with lines.
The results below show the Lena image with noise added, a subsampled Lena
image, a Gaussian smoothed Lena image, and a rotated Lena image, each
matched against the original Lena image.
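
The minimum-angle matching described above might be sketched in Python as
follows, assuming the descriptors are already unit-normalized so the angle is
the arccos of the dot product:

    import numpy as np

    def match_keypoints(desc_a, desc_b):
        """Match each descriptor in desc_a to its minimum-angle partner in desc_b.

        desc_a: (num_a, 128), desc_b: (num_b, 128), both unit-normalized.
        Returns a list of (index_in_a, index_in_b, angle_in_radians).
        """
        dots = np.clip(desc_a @ desc_b.T, -1.0, 1.0)  # cosines of angles
        angles = np.arccos(dots)
        best = angles.argmin(axis=1)
        return [(i, j, angles[i, j]) for i, j in enumerate(best)]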


Lena image with arrows indicating the dominant angle


Arrows showing dominant angle in a noisy Lena image


Matching of keypoints of Lena image with rotated Lena image

Matching of keypoints of Lena image with noisy Lena image


Matching of keypoints of Lena image with subsampled Lena image

Matching of keypoints of Lena image with Gaussian smoothed Lena image


References:
[1] http://en.wikipedia.org/wiki/Scale-invariant_feature_transform
[2] http://people.cs.ubc.ca/~lowe/keypoints/
[3] Test images: http://www.ece.ucsb.edu/~manj/ece178/lena.gif
