14-Featurebased Image Matching PDF

Feature-based methods for image matching
n Bag of Visual Words approach

n Feature descriptors
l SIFT descriptor
l SURF descriptor
n Geometric consistency check
n Aggregation of local descriptors into global descriptors
l Vocabulary trees
l Fisher vectors
n Image-based retrieval
l MPEG CDVS standard
l Mobile visual search
l Augmented reality
Digital Image Processing: Bernd Girod, © 2013-2018 Stanford University -- Image Matching 1
A Bag of Words
[Figure: word cloud containing “self-evident”, “Liberty”, “truths”, “happiness”, “endowed”, “inalienable”, “Creator”, “pursuit”, “Life”]
Representing a Text
as a “Bag of Words”
We hold these truths to be self-evident, that all men are created equal, that
they are endowed by their Creator with certain unalienable Rights, that
among these are Life, Liberty and the pursuit of Happiness. That to secure
these rights, Governments are instituted among Men, deriving their just
powers from the consent of the governed, That whenever any Form of
Government becomes destructive of these ends, it is the Right of the People
to alter or to abolish it, and to institute new Government, laying its foundation
on such principles and organizing its powers in such form, as to them shall
seem most likely to effect their Safety and Happiness. Prudence, indeed, will
dictate that Governments long established should not be changed for light
and transient causes; and accordingly all experience hath shewn, that mankind
are more disposed to suffer, while evils are sufferable, than to right themselves
by abolishing the forms to which they are accustomed. But when a long train
of abuses and usurpations, pursuing invariably the same Object evinces a
design to reduce them under absolute Despotism, it is their right, it is their
duty, to throw off such Government, and to provide new Guards for their
future security.
[Figure: the text above reduced to a word cloud: “self-evident”, “Liberty”, “truths”, “happiness”, “endowed”, “inalienable”, “Creator”, “pursuit”, “Life”]
Representing an Image
as a “Bag of Visual Words”
Feature descriptors
n Represent local pattern around a keypoint by a vector (“feature descriptor”)

n Establish feature correspondences by finding the nearest neighbor in
descriptor space
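The nearest-neighbor search in descriptor space can be sketched as follows. This is a minimal brute-force illustration, assuming descriptors are stored as rows of NumPy arrays and compared with Euclidean distance; the function name `match_nearest_neighbor` and the toy data are made up for this example.

```python
import numpy as np

def match_nearest_neighbor(query_desc, ref_desc):
    """For each query descriptor, return the index of the nearest reference
    descriptor and the corresponding Euclidean distance."""
    # Pairwise squared distances, shape (num_query, num_ref).
    diff = query_desc[:, None, :] - ref_desc[None, :, :]
    dist2 = np.sum(diff ** 2, axis=2)
    nearest = np.argmin(dist2, axis=1)
    distances = np.sqrt(dist2[np.arange(len(query_desc)), nearest])
    return nearest, distances

# Toy 2-dimensional "descriptors" (real descriptors have 64 or 128 dimensions).
ref = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
query = np.array([[0.9, 0.1], [0.1, 0.8]])
idx, dist = match_nearest_neighbor(query, ref)
print(idx, dist)
```

Brute-force search costs one distance computation per descriptor pair, which is fine for a single image pair but becomes expensive for large databases.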
Scale/rotation invariant feature descriptors
[Figure: example keypoints annotated with orientations of 72 deg, 144 deg, and 180 deg]
n Scale invariance: extract features at scale provided by keypoint detection

n Rotation invariance:
l Detect dominant orientation by finding peak in orientation histogram
l Rotate coordinate system to dominant orientation
l Multiple strong orientation peaks: generate second feature point
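The orientation step can be sketched as below. This is an illustrative sketch only: the 36-bin histogram and the 0.8 peak ratio are assumptions borrowed from common SIFT practice, not values stated on the slide, and the function name is hypothetical.

```python
import numpy as np

def dominant_orientations(patch, num_bins=36, peak_ratio=0.8):
    """Return the orientation(s), in degrees, of the strong peaks of the
    magnitude-weighted gradient orientation histogram of a patch."""
    gy, gx = np.gradient(np.asarray(patch, dtype=float))
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx)) % 360.0
    hist, edges = np.histogram(ang, bins=num_bins, range=(0.0, 360.0), weights=mag)
    centers = 0.5 * (edges[:-1] + edges[1:])
    # Circular local maxima of the histogram.
    is_peak = (hist > np.roll(hist, 1)) & (hist >= np.roll(hist, -1))
    # Every peak close to the global maximum spawns its own feature point.
    strong = is_peak & (hist >= peak_ratio * hist.max())
    return centers[strong]

# Intensity ramp along x: all gradients point in the +x direction (0 degrees).
ramp = np.tile(np.arange(16.0), (16, 1))
angles = dominant_orientations(ramp)
print(angles)
```

The returned angle is the center of the winning histogram bin, so its resolution is limited by the bin width (10 degrees here). Each returned angle would define one rotated local coordinate system, and therefore one descriptor.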
SIFT descriptors
n SIFT - Scale-Invariant Feature Transform [Lowe, 1999, 2004]
n Sample thresholded image gradients at 16x16 locations in scale space (in local coordinate system for rotation and scale invariance)
n For each of the 4x4 subregions, generate an orientation histogram with 8 directions; each observation is weighted with the magnitude of the image gradient and a window function
n 128-dimensional feature vector (4 x 4 subregions x 8 directions)
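A simplified sketch of how the 128 dimensions arise is given below. It assumes the 16x16 patch has already been resampled in the keypoint's local coordinate system. The Gaussian window width, the clipping threshold of 0.2, and the function name are assumptions for illustration, and the interpolation between neighboring bins used in full SIFT implementations is omitted.

```python
import numpy as np

def sift_like_descriptor(patch, clip=0.2, window_sigma=8.0):
    """Build a 4x4x8 = 128-dimensional orientation-histogram descriptor
    from a 16x16 patch given in the keypoint's local frame."""
    patch = np.asarray(patch, dtype=float)
    assert patch.shape == (16, 16)
    gy, gx = np.gradient(patch)
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2.0 * np.pi)
    # Gaussian window centered on the patch (width is an assumption).
    coords = np.arange(16) - 7.5
    yy, xx = np.meshgrid(coords, coords, indexing="ij")
    weight = mag * np.exp(-(xx ** 2 + yy ** 2) / (2.0 * window_sigma ** 2))
    # Quantize each gradient direction into one of 8 bins.
    bins = np.minimum((ang / (2.0 * np.pi) * 8).astype(int), 7)
    rows, cols = np.indices((16, 16))
    desc = np.zeros((4, 4, 8))
    np.add.at(desc, (rows // 4, cols // 4, bins), weight)
    desc = desc.ravel()
    desc /= np.linalg.norm(desc) + 1e-12
    desc = np.minimum(desc, clip)          # threshold large entries
    desc /= np.linalg.norm(desc) + 1e-12   # renormalize
    return desc

rng = np.random.default_rng(0)
desc = sift_like_descriptor(rng.random((16, 16)))
print(desc.shape)
```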
SURF descriptors
n SURF – Speeded Up Robust Features [Bay et al. 2006]

n Compute horizontal and vertical pixel differences, dx, dy (in local coordinate system for rotation and scale invariance, window size 20σ x 20σ, where σ² is feature scale)
n Sum dx, dy, and |dx|, |dy| over 4x4 subregions (SURF-64) or 3x3 subregions (SURF-36)
n Normalize vector for gain invariance, but distinguish bright blobs and dark blobs based on sign of Laplacian (trace of Hessian matrix)
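The summation over subregions can be sketched as follows. This sketch uses plain pixel differences as described above, whereas the published SURF method uses Haar wavelet responses with Gaussian weighting. The patch is assumed to be already resampled to the local frame, and the function name and patch size are illustrative choices.

```python
import numpy as np

def surf_like_descriptor(patch, grid=4):
    """Sum dx, dy, |dx|, |dy| over grid x grid subregions and normalize.
    grid=4 gives 64 dimensions, grid=3 gives 36 dimensions."""
    patch = np.asarray(patch, dtype=float)
    # Horizontal and vertical pixel differences on a common support.
    dx = patch[:-1, 1:] - patch[:-1, :-1]
    dy = patch[1:, :-1] - patch[:-1, :-1]
    cell = dx.shape[0] // grid
    feats = []
    for r in range(grid):
        for c in range(grid):
            bx = dx[r * cell:(r + 1) * cell, c * cell:(c + 1) * cell]
            by = dy[r * cell:(r + 1) * cell, c * cell:(c + 1) * cell]
            feats += [bx.sum(), by.sum(), np.abs(bx).sum(), np.abs(by).sum()]
    v = np.array(feats)
    # Unit-length normalization makes the vector invariant to gain changes.
    return v / (np.linalg.norm(v) + 1e-12)

rng = np.random.default_rng(0)
patch = rng.random((21, 21))   # 21x21 samples give 20x20 differences
d64 = surf_like_descriptor(patch, grid=4)
d36 = surf_like_descriptor(patch, grid=3)
print(d64.shape, d36.shape)
```

Scaling the patch intensities by a constant factor leaves the descriptor unchanged, which is the gain invariance mentioned above.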
Computing feature descriptors
[Figure: pipeline for computing feature descriptors. Recoverable labels: Color, Gray; Filters Dxx, Dyy, Dxy; Blob Response DxxDyy-(0.9Dxy)²; Maxima over x, y, and scale; Orient along dominant gradient; Oriented Patch; Gradient Field; SIFT Descriptor; SURF Descriptor built from Σ dx, Σ dy, Σ|dx|, Σ|dy| per subregion]
“Bag of Visual Words” Matching
[Figure: pairwise comparison]
Geometric mapping
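n Notation:
l Homogeneous coordinates; reference image $\mathbf{x} = (x \;\; y \;\; 1)^T$
l Inhomogeneous coordinates; target image $\mathbf{x}' = (x' \;\; y')^T$
n Translation
$$\mathbf{x}' = \mathbf{x} + \mathbf{t} \quad \text{or} \quad \mathbf{x}' = \begin{bmatrix} \mathbf{I} & \mathbf{t} \end{bmatrix} \mathbf{x}$$
n Euclidean transformation (rotation and translation)
$$\mathbf{x}' = \begin{bmatrix} \cos\theta & -\sin\theta & t_x \\ \sin\theta & \cos\theta & t_y \end{bmatrix} \mathbf{x}$$
n Scaled rotation (similarity transform)
$$\mathbf{x}' = \begin{bmatrix} s\cos\theta & -s\sin\theta & t_x \\ s\sin\theta & s\cos\theta & t_y \end{bmatrix} \mathbf{x}$$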
Geometric mapping
n Affine transformation
$$\mathbf{x}' = \begin{bmatrix} a_{00} & a_{01} & a_{02} \\ a_{10} & a_{11} & a_{12} \end{bmatrix} \mathbf{x}$$
n Motion of planar surface in 3d under orthographic projection
n Parallel lines are preserved
Geometric mapping
n Motion of planar surface in 3d under perspective projection
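n Homography
$$\mathbf{x}' \sim \begin{pmatrix} h_{00} & h_{01} & h_{02} \\ h_{10} & h_{11} & h_{12} \\ h_{20} & h_{21} & h_{22} \end{pmatrix} \mathbf{x}$$
n Inhomogeneous coordinates (after normalization)
$$x' = \frac{h_{00} x + h_{01} y + h_{02}}{h_{20} x + h_{21} y + h_{22}}, \qquad y' = \frac{h_{10} x + h_{11} y + h_{12}}{h_{20} x + h_{21} y + h_{22}}$$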
n Straight lines are preserved
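Applying a homography to keypoint coordinates, including the normalization step, might look like the sketch below. The matrix values and the function name are arbitrary choices for illustration.

```python
import numpy as np

def apply_homography(H, points):
    """Map an (N, 2) array of (x, y) points through the 3x3 homography H
    and return inhomogeneous (x', y') coordinates."""
    pts_h = np.hstack([points, np.ones((len(points), 1))])  # homogeneous (N, 3)
    mapped = pts_h @ H.T
    # Divide by the third coordinate to normalize.
    return mapped[:, :2] / mapped[:, 2:3]

H = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 3.0],
              [0.001, 0.0, 1.0]])
pts = np.array([[10.0, 20.0], [0.0, 0.0]])
out = apply_homography(H, pts)
print(out)
```

When the last row of H is (0, 0, 1), the denominator equals 1 and the mapping reduces to the affine case.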
RANSAC
n RANdom Sample Consensus [Fischler, Bolles, 1981]
n Randomly select subset of k correspondences
n Compute geometric mapping parameters by linear regression
n Apply geometric mapping to all keypoints
n Count no. of inliers (closer than ε from the corresponding keypoint, typical ε = 1…3 pixels)
n Repeat process S times, keep geometric mapping with largest no. of inliers
n Required number of trials
$$S = \frac{\log(1 - P)}{\log(1 - q^k)}$$
l P: total probability of success
l q: probability of valid correspondence
l Example with P=0.99, q=0.3: k=3 -> S=168, k=4 -> S=566
n Use small number of correspondences
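The RANSAC loop and the trial-count formula S = log(1 - P) / log(1 - q^k) can be sketched together as below. This is an illustrative sketch with an affine model fitted by least squares; the function names, the default threshold of 2 pixels, the fixed trial count, and the synthetic data are all assumptions made for the example.

```python
import math
import numpy as np

def required_trials(P, q, k):
    """Trials needed so that, with probability P, at least one sample of k
    correspondences is outlier-free (q = probability of a valid correspondence)."""
    return math.log(1.0 - P) / math.log(1.0 - q ** k)

def fit_affine(src, dst):
    """Least-squares 2x3 affine matrix A such that dst ~ A @ [x, y, 1]."""
    X = np.hstack([src, np.ones((len(src), 1))])
    solution, *_ = np.linalg.lstsq(X, dst, rcond=None)  # shape (3, 2)
    return solution.T

def ransac_affine(src, dst, k=3, eps=2.0, trials=200, seed=0):
    """Keep the affine mapping with the largest number of inliers."""
    rng = np.random.default_rng(seed)
    X = np.hstack([src, np.ones((len(src), 1))])
    best_A, best_inliers = None, np.zeros(len(src), dtype=bool)
    for _ in range(trials):
        sample = rng.choice(len(src), size=k, replace=False)
        A = fit_affine(src[sample], dst[sample])
        # Apply the mapping to all keypoints and measure the residuals.
        err = np.linalg.norm(X @ A.T - dst, axis=1)
        inliers = err < eps
        if inliers.sum() > best_inliers.sum():
            best_A, best_inliers = A, inliers
    return best_A, best_inliers

# Synthetic demo: 28 correct correspondences followed by 12 wrong ones.
rng = np.random.default_rng(1)
A_true = np.array([[0.9, -0.2, 5.0], [0.2, 0.9, -3.0]])
src = rng.uniform(0, 100, size=(40, 2))
dst = np.hstack([src, np.ones((40, 1))]) @ A_true.T
dst[28:] = rng.uniform(0, 100, size=(12, 2))
A_est, inliers = ransac_affine(src, dst)
print(required_trials(0.99, 0.3, 3), required_trials(0.99, 0.3, 4))
print(int(inliers.sum()))
```

Because q^k shrinks quickly with k, the required number of trials grows steeply with the sample size, which is why a small number of correspondences per sample is preferred.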
RANSAC with Affine Model
RANSAC with Homography
SURF features & affine RANSAC
[Figure: pairwise comparison]
Local Feature Descriptor Aggregation
n Nearest-neighbor matching of variable-size sets of local features is costly
n Compare images based on a global binary signature of constant size (“hash”) instead
n Simple: vector quantization (VQ) of feature vectors to generate histogram, compare non-empty histogram bins (“bag of features,” “bag of visual words”)
n Better: binarize gradient of log likelihood w.r.t. parameter vector (“Fisher vector”)
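The simple variant, vector quantization followed by histogram counting, can be sketched as below. The codebook here is a hand-picked toy example; in practice it would be learned from training descriptors, for instance by k-means clustering. The function name is hypothetical.

```python
import numpy as np

def bovw_histogram(descriptors, codebook):
    """Quantize each descriptor to its nearest codeword ("visual word")
    and count how often each word occurs."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    return np.bincount(words, minlength=len(codebook))

codebook = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
descriptors = np.array([[1.0, 1.0], [9.0, 1.0], [0.5, -1.0], [1.0, 9.0]])
hist = bovw_histogram(descriptors, codebook)
print(hist)
```

The histogram has one bin per codeword, so its size is constant regardless of how many local features the image contains.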
Comparing Feature Histograms
n Speed up by comparing histograms of features:
pairwise image comparison only for similar histograms
n Histogram intersection [Swain, Ballard 1991]
$$\rho = \frac{\sum_{i=1}^{n} \min(Q_i, D_i)}{\sum_{i=1}^{n} D_i}$$
l Q: query histogram
l D: histogram of database entry
n Equivalent to mean absolute difference, if both histograms contain same number of samples
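The equivalence follows from the identity min(a, b) = (a + b - |a - b|) / 2: if both histograms contain N samples, the intersection score becomes 1 - Σ|Q_i - D_i| / (2N). The sketch below checks this numerically on a toy pair of histograms; the function name and the data are illustrative.

```python
import numpy as np

def histogram_intersection(Q, D):
    """Histogram intersection score: sum of bin-wise minima divided by
    the number of samples in the database histogram."""
    Q = np.asarray(Q, dtype=float)
    D = np.asarray(D, dtype=float)
    return np.minimum(Q, D).sum() / D.sum()

Q = np.array([2, 1, 1, 0])   # query histogram, 4 samples
D = np.array([1, 1, 0, 2])   # database histogram, 4 samples
rho = histogram_intersection(Q, D)
# Same score expressed through the sum of absolute differences.
rho_from_l1 = 1.0 - np.abs(Q - D).sum() / (2.0 * D.sum())
print(rho, rho_from_l1)
```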
Growing Vocabulary Tree
[Nistér and Stewenius, 2006]



[Figure: growing the vocabulary tree with branching factor k=3]

Querying Vocabulary Tree
Query
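Growing and querying a vocabulary tree can be sketched with a small hierarchical k-means, as below. This is a toy sketch, assuming a plain Lloyd iteration with random initialization and a dictionary-based tree; the function names, the depth, and the random data are illustrative and not taken from the cited paper.

```python
import numpy as np

def kmeans(data, k, iters=10, seed=0):
    """Plain Lloyd's algorithm; returns cluster centers and labels."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = ((data[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = data[labels == j].mean(axis=0)
    labels = ((data[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
    return centers, labels

def grow_tree(data, k=3, depth=2):
    """Recursively split the descriptors into k clusters per level.
    Returns None where no further split is made."""
    if depth == 0 or len(data) < k:
        return None
    centers, labels = kmeans(data, k)
    children = [grow_tree(data[labels == j], k, depth - 1) for j in range(k)]
    return {"centers": centers, "children": children}

def query_tree(tree, desc):
    """Descend the tree by choosing the nearest center at each level;
    the resulting path of branch indices identifies the visual word."""
    path, node = [], tree
    while node is not None:
        j = int(((node["centers"] - desc) ** 2).sum(axis=1).argmin())
        path.append(j)
        node = node["children"][j]
    return tuple(path)

rng = np.random.default_rng(0)
train = rng.normal(size=(300, 2))
tree = grow_tree(train, k=3, depth=2)
path = query_tree(tree, train[0])
print(path)
```

A query needs only k distance computations per level, so the cost grows with the depth of the tree rather than with the number of leaf nodes.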
Hard Binning vs. Soft Binning
[Figure: a query feature and a database feature located between node 1, node 2, and node 3; $d_i^{q}$ and $d_i^{db}$ are the distances of the query feature and the database feature to node i]
n Hard Binning
$$w_1^{db} = 0, \; w_2^{db} = 1, \; w_3^{db} = 0, \qquad w_1^{q} = 1, \; w_2^{q} = 0, \; w_3^{q} = 0$$
n Soft Binning
$$w_i^{db} \sim \exp\left(-\frac{(d_i^{db})^2}{\sigma^2}\right), \qquad w_i^{q} \sim \exp\left(-\frac{(d_i^{q})^2}{\sigma^2}\right), \qquad i = 1, 2, 3$$
$$w_1^{db} + w_2^{db} + w_3^{db} = 1, \qquad w_1^{q} + w_2^{q} + w_3^{q} = 1$$

[Nistér and Stewenius, CVPR 2006] [Philbin et al., CVPR 2008]
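The difference between the two binning schemes can be sketched as follows. The node positions, the feature position, and σ are arbitrary illustrative values, and the function names are hypothetical.

```python
import numpy as np

def hard_bin_weights(desc, node_centers):
    """Weight 1 for the nearest node, 0 for all others."""
    d2 = ((node_centers - desc) ** 2).sum(axis=1)
    w = np.zeros(len(node_centers))
    w[d2.argmin()] = 1.0
    return w

def soft_bin_weights(desc, node_centers, sigma):
    """Weights proportional to exp(-d^2 / sigma^2), normalized to sum to 1."""
    d2 = ((node_centers - desc) ** 2).sum(axis=1)
    # Subtracting the minimum avoids underflow and cancels in the normalization.
    w = np.exp(-(d2 - d2.min()) / sigma ** 2)
    return w / w.sum()

nodes = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
feature = np.array([0.5, 0.2])
w_hard = hard_bin_weights(feature, nodes)
w_soft = soft_bin_weights(feature, nodes, sigma=1.0)
print(w_hard, w_soft)
```

With soft binning, a query feature and a database feature that fall on opposite sides of a cell boundary still share weight in common nodes, whereas hard binning gives them no overlap.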
Stanford Mobile Visual Search Dataset
Querying: Hard Binning vs. Soft Binning
[Figure: query results with hard binning vs. soft binning. Setup: SURF features; 6-level vocab tree with 1M leaf nodes; affine RANSAC for 100 top tree results, 25 inliers min. Precision ~ 97%]
Fisher Vector
n Discriminative score function U(X): gradient of the log likelihood of the k-dimensional feature vector X w.r.t. the d parameters, giving a d-dimensional vector (d≫k)
n Typically, we use a Gaussian mixture model (GMM) for the likelihood
n Parameters: mean (and variance) of Gaussian clusters
n For GMM, feature scores U(X) are soft-assigned distance vectors (and squared distance vectors) relative to cluster centers
n Sums of feature scores of an image are the “Fisher vector” that can be used to compare images
n Binarization & Hamming distance comparison results in only minor performance loss (“Binarized Fisher vector”)
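A reduced sketch of the idea is shown below. It assumes a GMM with diagonal covariances whose parameters are already known, keeps only the part of the score related to the cluster means (the squared-distance terms and the usual normalization factors are omitted), and binarizes by sign. The function names and the toy GMM are assumptions made for illustration.

```python
import numpy as np

def fisher_vector_means(X, weights, means, sigmas):
    """Sum over all local features of the soft-assigned, variance-normalized
    distance vectors to each cluster center. X has shape (N, k); the GMM has
    K components with diagonal standard deviations sigmas of shape (K, k)."""
    diff = (X[:, None, :] - means[None, :, :]) / sigmas[None, :, :]
    # Log of weighted Gaussian densities up to a constant, shape (N, K).
    log_p = (np.log(weights)[None, :]
             - 0.5 * (diff ** 2).sum(axis=2)
             - np.log(sigmas).sum(axis=1)[None, :])
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)    # soft assignments
    G = (gamma[:, :, None] * diff).sum(axis=0)   # shape (K, k)
    return G.ravel()

def binarize(fv):
    """Keep only the sign of each component."""
    return fv > 0

def hamming(a, b):
    """Number of positions in which two binary signatures differ."""
    return int(np.count_nonzero(a != b))

weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0], [10.0, 10.0]])
sigmas = np.ones((2, 2))
X = np.array([[1.0, 1.0], [0.5, 2.0]])
fv = fisher_vector_means(X, weights, means, sigmas)
sig = binarize(fv)
print(fv.shape, hamming(sig, sig))
```

The output dimension is K times k, independent of the number of local features N, so images with different feature counts yield signatures of the same size.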
MPEG standard “Compact Descriptors for Visual Search” (CDVS)
[Figure: CDVS query pipeline. Recoverable labels: LoG peaks; statistically optimized, based on peak response, scale, location, …; SIFT descriptor; non-orthogonal transform + quantization; xy-location needed for object location (and geometric verification); Fisher vector based on GMM; size labels 304, 384, 404, 1117, 1117, 1117 bytes; query sizes 512, 1K, 2K, 4K, 8K, 16K bytes]
CDVS Evaluation Framework
Graphics
Paintings
Video Frames
Landmarks
Common Objects
1M Distractor Images
MPEG CDVS Performance
On-Device Image Matching Demo
Demo Video
Database of 100K Images
Samsung Galaxy S3 Smartphone
On-Device Timing Measurements
Samsung Galaxy S3 Smartphone, 1.4 GHz Processor, 1 GB RAM
Database of 100K Images, 400 queries
[Figure: histogram of query time; x-axis: Time (sec), 0.5 to 1; y-axis: Frequency, 0 to 100]
[Figure: breakdown of processing time into feature extraction, global signature database search, and geometric verification, with shares labeled 54%, 32%, and 14%]
Augmented Reality Glasses
[Figure: augmented reality glasses with right-eye LCD, left-eye LCD, camera, and Android controller]
