
Available online at www.sciencedirect.com

Physics Procedia 33 (2012) 811 – 818

2012 International Conference on Medical Physics and Biomedical Engineering

Panorama Stitching Based on SIFT Algorithm and Levenberg-Marquardt Optimization
ZHONG Min, ZENG Jiguo, XIE Xusheng
College of Computer Information Engineering
Jiangxi Normal University
Nanchang, China
zhong_min@hotmail.com, jgzeng@qq.com, xushengxie@163.com

Abstract

A panorama stitching algorithm based on the scale-invariant feature transform (SIFT) and Levenberg-Marquardt optimization is proposed. First, the source planar images are mapped onto a cylindrical surface. Then, correctly matched feature pairs between two images are found with the SIFT and RANSAC algorithms. Finally, starting from initial estimates computed from the matched pairs, the Levenberg-Marquardt algorithm is used to calculate the motion parameters. Experimental results show that the method tolerates hand-held camera motion with small rotation around the optical axis and small horizontal and vertical movement.

© 2012 Published by Elsevier B.V. Selection and/or peer review under responsibility of ICMPBE International Committee.
Open access under CC BY-NC-ND license.
Keywords: panorama; cylindrical projection; SIFT algorithm; Levenberg-Marquardt optimization

Introduction

A panorama is a popular image-based rendering technique and a simple way to simulate a real scene. To build a panorama, we mosaic images that show different regions of one scene. Generally speaking, all images must be registered to the same coordinate system, their overlapping regions determined, and the images then fused into a panorama.
The most frequently used traditional method of generating a panorama is the 8-parameter method described in [1]. This method uses nonlinear least squares to estimate the motion parameters between two images and then calculates the camera focal length from these parameters. The camera rotation matrix between the two images is then solved to obtain the relation between the camera coordinate system and the standard coordinate system. Finally, the planar images are mapped onto a cylindrical or spherical surface to obtain the panorama. However, the initial estimates of the motion parameters must be given by the user, and the overlapping area cannot be too small. Zhong and Zhang [2] used a tripod so that the camera performs only a planar rotary motion, and then projected the planar images onto a cylindrical surface to preserve the geometric relationship.

1875-3892 © 2012 Published by Elsevier B.V. Selection and/or peer review under responsibility of ICMPBE International Committee.
Open access under CC BY-NC-ND license. doi:10.1016/j.phpro.2012.05.139

They then extracted some pixels in the overlap region of the first image to construct a feature template and searched the nearby region of the second image to match the two images. This method only calculates the horizontal and vertical displacement between two images, so a high-quality tripod is needed to prevent rotation around the optical axis and small horizontal and vertical movement. The result is a cylindrical panorama.
Most panorama generation methods in current use are based on these two works. They usually replace the matching step with a more robust feature-based image matching algorithm (feature points [3][4], feature lines [5] and so on). After a sufficient number of correctly matched points or lines has been extracted, parameter estimation is used to determine the spatial relationship between the images, which are then mosaicked into a panorama.
To relax the conditions under which the source images must be captured, this paper proposes a method that combines the accurate matches of the SIFT algorithm with the fast-converging Levenberg-Marquardt optimization method. After the images are projected onto a cylindrical surface, matching feature point pairs are obtained with the SIFT algorithm and mismatches are removed with the RANSAC algorithm. The correct matching point pairs are then used to compute initial estimates of the motion parameters, which the Levenberg-Marquardt method refines. A simplified motion model is used when solving for the motion parameters. Finally, a panorama is obtained.

Cylindrical projection transformation

When the source images are taken, the camera must be turned to shoot the scene in different directions, so each image has its own coordinate system. Stitching the input images directly does not agree with the cylindrical environment model. To keep that model valid, each planar image must be mapped onto a cylindrical surface whose radius equals the focal length of the camera.
Figure 1 shows the projection. I is the planar (input) image and K is the cylindrical surface. O is the optical center of the camera. The goal is to obtain the projection J of image I onto the cylindrical surface K as observed from the point O. The relation between a point P(x, y) on image I and the corresponding point Q(x', y') on image J can be described as:

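\[
\begin{aligned}
x' &= r \arctan\!\left(\frac{x - \frac{W}{2}}{r}\right) + r \arctan\!\left(\frac{W}{2r}\right), \\
y' &= \frac{H}{2} + \frac{r\left(y - \frac{H}{2}\right)}{\sqrt{r^2 + \left(x - \frac{W}{2}\right)^2}},
\end{aligned}
\tag{1}
\]

where

\[
r = \frac{W}{2 \tan\frac{\theta}{2}}. \tag{2}
\]

Here W and H are the width and height of image I, and θ is the horizontal angle of view of the camera.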

The coordinates obtained after cylindrical projection are floating-point numbers, so bilinear interpolation is used to calculate the RGB value of every pixel of image J.

Figure 1. Cylindrical Projection Transformation
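The projection and the interpolation step can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation. It assumes that (1) reads x' = r·arctan((x − W/2)/r) + r·arctan(W/(2r)) and y' = H/2 + r(y − H/2)/sqrt(r² + (x − W/2)²), that the angle in (2) is the horizontal angle of view (the parameter `fov`, in radians), and that the image is a floating-point H × W × C array. All function names are hypothetical. The warp runs backwards (from each cylinder pixel to the plane) so that every output pixel receives a bilinearly interpolated value; pixels that fall outside the input are set to zero.

```python
import numpy as np

def cylinder_radius(W, fov):
    # Equation (2): radius of the cylinder (focal length in pixels).
    return W / (2.0 * np.tan(fov / 2.0))

def plane_to_cylinder(x, y, W, H, r):
    # Equation (1) under the sign convention stated in the lead-in.
    xc = r * np.arctan((x - W / 2.0) / r) + r * np.arctan(W / (2.0 * r))
    yc = H / 2.0 + r * (y - H / 2.0) / np.sqrt(r**2 + (x - W / 2.0) ** 2)
    return xc, yc

def cylinder_to_plane(xc, yc, W, H, r):
    # Algebraic inverse of plane_to_cylinder.
    x = W / 2.0 + r * np.tan(xc / r - np.arctan(W / (2.0 * r)))
    y = H / 2.0 + (yc - H / 2.0) * np.sqrt(r**2 + (x - W / 2.0) ** 2) / r
    return x, y

def bilinear_sample(img, x, y):
    # img: H x W x C float array; x, y: arrays of sampling coordinates.
    H, W = img.shape[:2]
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    fx = np.clip(x - x0, 0.0, 1.0)[..., None]
    fy = np.clip(y - y0, 0.0, 1.0)[..., None]
    top = img[y0, x0] * (1 - fx) + img[y0, x0 + 1] * fx
    bottom = img[y0 + 1, x0] * (1 - fx) + img[y0 + 1, x0 + 1] * fx
    return top * (1 - fy) + bottom * fy

def warp_to_cylinder(img, fov):
    # Build image J by sampling image I at the inverse-mapped positions.
    H, W = img.shape[:2]
    r = cylinder_radius(W, fov)
    out_w = int(np.floor(2.0 * r * np.arctan(W / (2.0 * r))))
    yc, xc = np.mgrid[0:H, 0:out_w].astype(float)
    x, y = cylinder_to_plane(xc, yc, W, H, r)
    out = bilinear_sample(img.astype(float), x, y)
    inside = (x >= 0) & (x <= W - 1) & (y >= 0) & (y <= H - 1)
    out[~inside] = 0.0  # no source pixel available here
    return out
```

Sampling backwards avoids the holes that a forward mapping of individual pixels would leave in J.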

Feature matching

The key to image mosaicking is determining the spatial relationship between two images. The well-known SIFT (Scale-Invariant Feature Transform) algorithm [6], published by David Lowe in 2004, can extract a large number of feature points from two images and find the matching feature points.
A. Feature detection
Under reasonable assumptions the only possible scale-space kernel is the Gaussian function, so the scale space of an image is defined as a function L(x, y, σ), which is the convolution of a variable-scale Gaussian G(x, y, σ) with the input image I(x, y):

\[
L(x, y, \sigma) = G(x, y, \sigma) * I(x, y), \tag{3}
\]

where * is the convolution operation, and

\[
G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} e^{-(x^2 + y^2)/(2\sigma^2)}, \tag{4}
\]

where σ is the scale-space factor. The smaller its value, the smaller the scale of the image. Large scales correspond to the overall outline of the image, while small scales correspond to image details.

To efficiently detect stable keypoint locations in scale space, Lowe uses the scale-space extrema of the difference-of-Gaussian (DOG) function convolved with the image, D(x, y, σ), which can be computed from the difference of two nearby scales separated by a constant multiplicative factor k:

\[
\begin{aligned}
D(x, y, \sigma) &= \bigl(G(x, y, k\sigma) - G(x, y, \sigma)\bigr) * I(x, y) \\
&= L(x, y, k\sigma) - L(x, y, \sigma).
\end{aligned}
\tag{5}
\]

Figure 2 shows the construction of the image pyramid [7]. The left side of the figure shows two octaves of a Gaussian scale-space image pyramid with 2 intervals. The first image of the second octave is created by down-sampling the previous octave by a factor of 2. The right side shows that the difference of two adjacent intervals in the Gaussian scale-space pyramid creates one interval in the difference-of-Gaussian pyramid.

Figure 2. Gaussian scale-space image pyramid and DOG pyramid
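The pyramid construction can be illustrated with a small NumPy sketch. It is not the implementation used in the paper. The base scale `sigma0 = 1.6` and the layout of `intervals + 3` Gaussian images per octave follow Lowe's paper [6] rather than this text, the kernel truncation at 3σ is an assumption, the input is assumed to be an unblurred grayscale array, and all function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(sigma):
    # 1-D Gaussian truncated at 3 sigma and normalised to sum to 1.
    radius = max(1, int(np.ceil(3.0 * sigma)))
    t = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-t**2 / (2.0 * sigma**2))
    return k / k.sum()

def gaussian_blur(img, sigma):
    # Separable convolution with edge padding; output has the input shape.
    k = gaussian_kernel(sigma)
    pad = len(k) // 2
    padded = np.pad(img, pad, mode="edge")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def dog_octave(base, sigma0, intervals):
    # base is assumed to already carry a blur of sigma0.
    k = 2.0 ** (1.0 / intervals)
    gauss = [base]
    for i in range(1, intervals + 3):
        prev, cur = sigma0 * k ** (i - 1), sigma0 * k**i
        # Incremental blur that takes the image from scale prev to scale cur.
        gauss.append(gaussian_blur(gauss[-1], np.sqrt(cur**2 - prev**2)))
    # Equation (5): difference of adjacent Gaussian images.
    dog = [g1 - g0 for g0, g1 in zip(gauss[:-1], gauss[1:])]
    return np.stack(gauss), np.stack(dog)

def dog_pyramid(img, octaves=2, sigma0=1.6, intervals=2):
    pyramid = []
    base = gaussian_blur(img.astype(float), sigma0)
    for _ in range(octaves):
        gauss, dog = dog_octave(base, sigma0, intervals)
        pyramid.append(dog)
        # The image with twice the base scale, down-sampled by a factor of 2,
        # starts the next octave.
        base = gauss[intervals][::2, ::2]
    return pyramid
```

With `intervals=2`, each octave holds five Gaussian images and therefore four difference-of-Gaussian images.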

Figure 3 illustrates how the maxima and minima of the difference-of-Gaussian images are detected: a pixel (in the center) is compared with its 26 neighbors in the 3 × 3 regions at the current and adjacent scales (9 + 9 + 8 = 26). Keypoints with low contrast or located on edges are then discarded, leaving only stable keypoints.

Figure 3. Detection of scale-space extrema in DOG images
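The 26-neighbor comparison can be written directly. The sketch below assumes the difference-of-Gaussian images of one octave are stacked in an array of shape (S, H, W). The function name and the `contrast_threshold` parameter are hypothetical, and the rejection of edge responses is left out.

```python
import numpy as np

def scale_space_extrema(dog, contrast_threshold=0.0):
    """Return (s, y, x) positions that are strict extrema among 26 neighbors."""
    S, H, W = dog.shape
    keypoints = []
    for s in range(1, S - 1):          # needs a scale above and below
        for y in range(1, H - 1):
            for x in range(1, W - 1):
                value = dog[s, y, x]
                if abs(value) <= contrast_threshold:
                    continue           # discard low-contrast candidates
                cube = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2].ravel()
                neighbours = np.delete(cube, 13)  # index 13 is the centre pixel
                if value > neighbours.max() or value < neighbours.min():
                    keypoints.append((s, y, x))
    return keypoints
```

The triple loop is written for clarity; a practical implementation would vectorize the comparison.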

B. The local image descriptor


To obtain a descriptor that is invariant to image rotation, a consistent orientation is assigned to each keypoint based on local image properties. From the pixel differences around each sample point, the gradient magnitude m(x, y) and orientation θ(x, y) are calculated:

\[
m(x, y) = \sqrt{\bigl(L(x+1, y) - L(x-1, y)\bigr)^2 + \bigl(L(x, y+1) - L(x, y-1)\bigr)^2} \tag{6}
\]

\[
\theta(x, y) = \tan^{-1}\frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)} \tag{7}
\]

where L is the Gaussian-smoothed image at the scale of the keypoint. An orientation histogram is formed from the gradient orientations of sample points within a region around the keypoint. The histogram has 36 bins, each covering 10 degrees. Each sample is weighted by its gradient magnitude and by a Gaussian-weighted circular window with a σ that is 1.5 times the scale of the keypoint. Peaks in the histogram correspond to dominant directions of the local gradients. The highest peak, and any other local peak that is within 80% of the highest peak, is used to create a keypoint with that orientation. Once the position, scale and orientation of the keypoints are determined, the gradient samples around each keypoint are accumulated into orientation histograms summarizing the contents of 4 × 4 subregions. Each histogram has 8 orientation bins, so a 4 × 4 × 8 = 128 element vector forms the keypoint descriptor.
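The orientation assignment can be sketched as follows, assuming `L` is the smoothed grayscale image at the keypoint scale stored as a 2-D array indexed `[y, x]`. The function names and the window radius of 3 × 1.5σ are assumptions. `np.arctan2` is used in place of the plain inverse tangent of (7) so that the orientation covers the full 360 degrees required by a 36-bin histogram.

```python
import numpy as np

def gradient_magnitude_orientation(L):
    # Central differences as in (6) and (7); border pixels are left at zero.
    dx = np.zeros_like(L, dtype=float)
    dy = np.zeros_like(L, dtype=float)
    dx[:, 1:-1] = L[:, 2:] - L[:, :-2]
    dy[1:-1, :] = L[2:, :] - L[:-2, :]
    return np.sqrt(dx**2 + dy**2), np.arctan2(dy, dx)

def orientation_histogram(L, x, y, sigma, bins=36):
    # Histogram of gradient orientations around keypoint (x, y) of scale sigma.
    m, theta = gradient_magnitude_orientation(L)
    weight_sigma = 1.5 * sigma
    radius = int(round(3 * weight_sigma))  # assumed window size
    H, W = L.shape
    hist = np.zeros(bins)
    for j in range(max(1, y - radius), min(H - 1, y + radius + 1)):
        for i in range(max(1, x - radius), min(W - 1, x + radius + 1)):
            d2 = (i - x) ** 2 + (j - y) ** 2
            if d2 > radius**2:
                continue  # keep the window circular
            weight = np.exp(-d2 / (2.0 * weight_sigma**2))
            angle = np.degrees(theta[j, i]) % 360.0
            hist[int(angle // (360.0 / bins)) % bins] += weight * m[j, i]
    return hist

def dominant_orientations(hist, ratio=0.8):
    # Bin centres (degrees) of local peaks within `ratio` of the highest peak.
    bins = len(hist)
    peak = hist.max()
    result = []
    for b in range(bins):
        left, right = hist[(b - 1) % bins], hist[(b + 1) % bins]
        if hist[b] > left and hist[b] > right and hist[b] >= ratio * peak:
            result.append((b + 0.5) * 360.0 / bins)
    return result
```

For an image whose intensity increases equally in x and y, every gradient points at 45 degrees, so only the bin covering 40 to 50 degrees is filled.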
C. Feature point matching
Because the keypoints extracted by the SIFT algorithm are invariant to image scaling, rotation and translation, the matching is robust. Correspondences are obtained by comparing the distance to the nearest neighbor with the distance to the second-nearest neighbor. The Best-Bin-First (BBF) algorithm [8] is generally used to find the nearest and second-nearest neighbors. Because some mismatches still remain, the RANSAC algorithm is applied to purify the matching set after the matching process.
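The two stages can be sketched as below. Several choices here are assumptions rather than details from the text: the nearest neighbors are found by brute force instead of BBF, the ratio threshold of 0.8 is the value suggested in Lowe's paper [6], and RANSAC fits a pure translation because the text does not state which model it uses. The function names, the tolerance `tol` (in pixels) and the iteration count are hypothetical.

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.8):
    # Ratio test: accept a match only if the nearest neighbor is clearly
    # closer than the second-nearest one. desc_b needs at least two rows.
    matches = []
    for i, d in enumerate(desc_a):
        dist = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dist)
        nearest, second = order[0], order[1]
        if dist[nearest] < ratio * dist[second]:
            matches.append((i, int(nearest)))
    return matches

def ransac_translation(pts_a, pts_b, tol=3.0, iters=200, seed=0):
    # pts_a[i] and pts_b[i] are the two ends of putative match i.
    # Returns a boolean mask marking the matches kept as inliers.
    rng = np.random.default_rng(seed)
    best = np.zeros(len(pts_a), dtype=bool)
    for _ in range(iters):
        i = rng.integers(len(pts_a))          # one pair fixes a translation
        shift = pts_b[i] - pts_a[i]
        err = np.linalg.norm(pts_a + shift - pts_b, axis=1)
        inliers = err < tol
        if inliers.sum() > best.sum():
            best = inliers
    return best
```

The pairs kept by the mask are the "correct matching point pairs" used in the registration step.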

Image registration

D. Registration model selection


The main camera movement during shooting is horizontal rotation, but holding the camera by hand also causes a small rotation around the optical axis. As long as the horizontal movement, the tilt and the rotation around the optical axis are small, experiments show that an affine transformation model can be used to estimate the spatial relationship between the two images.
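After the affine transform, point P(x, y) becomes Q(x', y'), so

\[
\begin{pmatrix} x' \\ y' \end{pmatrix}
= r \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
+ \begin{pmatrix} u \\ v \end{pmatrix}
\tag{8}
\]

where r is the scaling factor, θ is the angle of rotation, and u and v denote the motion in the x-direction and y-direction. Here θ is very small and r ≈ 1, so the model can be simplified to

\[
\begin{pmatrix} x' \\ y' \end{pmatrix}
= \begin{pmatrix} 1 & -\theta \\ \theta & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
+ \begin{pmatrix} u \\ v \end{pmatrix}
\tag{9}
\]

Only the parameters θ, u and v need to be calculated to obtain the motion information accurately.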
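The step from (8) to (9) is the first-order small-angle approximation. As a brief sketch, writing θ for the rotation angle (the symbol is assumed here), taking r = 1 and using the counter-clockwise sign convention:

\[
\cos\theta = 1 - \frac{\theta^2}{2} + O(\theta^4) \approx 1, \qquad
\sin\theta = \theta - \frac{\theta^3}{6} + O(\theta^5) \approx \theta,
\]

so that

\[
\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}
\approx
\begin{pmatrix} 1 & -\theta \\ \theta & 1 \end{pmatrix}.
\]

As an illustration with assumed numbers, a rotation of 2° (θ ≈ 0.0349 rad) gives 1 − cos θ ≈ 6.1 × 10⁻⁴ and θ − sin θ ≈ 7.1 × 10⁻⁶, so for a point 500 pixels from the origin the neglected displacement is about 0.3 pixel.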
E. Refine the motion parameters by Levenberg-Marquardt algorithm
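To recover these 3 parameters, we directly solve the nonlinear least squares problem

\[
\begin{aligned}
\min E(\theta, u, v) &= \min \frac{1}{2} \sum_{i=1}^{n} \left[ (x'_i - a_i)^2 + (y'_i - b_i)^2 \right] \\
&= \min \frac{1}{2} \sum_{i=1}^{n} r_i^2(\theta, u, v) = \min \frac{1}{2} R^T R
\end{aligned}
\tag{10}
\]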

where (x'_i, y'_i) is calculated from (x_i, y_i) using (9), and (x_i, y_i) and (a_i, b_i) are corresponding points in the two images. In general a_i ≠ x'_i and b_i ≠ y'_i. Here r_i is the distance between (x'_i, y'_i) and (a_i, b_i), R = (r_1, ..., r_n)^T, and n is the number of correct corresponding point pairs. The motion parameters (θ, u, v) are updated iteratively.
We use the Levenberg-Marquardt nonlinear minimization algorithm. Defining X = (θ, u, v)^T, the iterative solution has the form

\[
X_{k+1} = X_k + P_k, \tag{11}
\]

where

\[
P_k = -\left[ J(X_k)^T J(X_k) + \lambda_k I \right]^{-1} J(X_k)^T r(X_k) \tag{12}
\]

J(X_k) is the Jacobian, where



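\[
J(X_k) =
\begin{pmatrix}
\dfrac{\partial r_1(\theta, u, v)}{\partial \theta} & \dfrac{\partial r_1(\theta, u, v)}{\partial u} & \dfrac{\partial r_1(\theta, u, v)}{\partial v} \\
\dfrac{\partial r_2(\theta, u, v)}{\partial \theta} & \dfrac{\partial r_2(\theta, u, v)}{\partial u} & \dfrac{\partial r_2(\theta, u, v)}{\partial v} \\
\vdots & \vdots & \vdots \\
\dfrac{\partial r_n(\theta, u, v)}{\partial \theta} & \dfrac{\partial r_n(\theta, u, v)}{\partial u} & \dfrac{\partial r_n(\theta, u, v)}{\partial v}
\end{pmatrix},
\tag{13}
\]

and

\[
\begin{aligned}
\frac{\partial r_i(\theta, u, v)}{\partial \theta} &= \frac{(y'_i - b_i)\, x_i - (x'_i - a_i)\, y_i}{r_i(\theta, u, v)}, \\
\frac{\partial r_i(\theta, u, v)}{\partial u} &= \frac{x'_i - a_i}{r_i(\theta, u, v)}, \\
\frac{\partial r_i(\theta, u, v)}{\partial v} &= \frac{y'_i - b_i}{r_i(\theta, u, v)}.
\end{aligned}
\tag{14}
\]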
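A quick way to check the partial derivatives in (14) is to compare them with central finite differences. The sketch below assumes the simplified model (9) in the form x' = x − θy + u, y' = θx + y + v and takes r_i as the distance between (x'_i, y'_i) and (a_i, b_i). The function names are hypothetical, `src` holds the points (x_i, y_i) and `dst` the points (a_i, b_i), each as an n × 2 array.

```python
import numpy as np

def point_residuals(params, src, dst):
    # r_i = distance between the transformed point and its match.
    theta, u, v = params
    xp = src[:, 0] - theta * src[:, 1] + u
    yp = theta * src[:, 0] + src[:, 1] + v
    return np.hypot(xp - dst[:, 0], yp - dst[:, 1])

def analytic_jacobian(params, src, dst):
    # Rows follow (14): derivatives with respect to theta, u and v.
    theta, u, v = params
    x, y = src[:, 0], src[:, 1]
    ex = x - theta * y + u - dst[:, 0]   # x'_i - a_i
    ey = theta * x + y + v - dst[:, 1]   # y'_i - b_i
    r = np.hypot(ex, ey)                 # undefined where r_i = 0
    return np.stack([(ey * x - ex * y) / r, ex / r, ey / r], axis=1)

def numeric_jacobian(params, src, dst, h=1e-6):
    # Central differences, one parameter at a time.
    params = np.asarray(params, dtype=float)
    J = np.zeros((len(src), 3))
    for j in range(3):
        step = np.zeros(3)
        step[j] = h
        J[:, j] = (point_residuals(params + step, src, dst)
                   - point_residuals(params - step, src, dst)) / (2.0 * h)
    return J
```

Note that the derivatives divide by r_i, so they are undefined for a pair that is matched exactly.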

For the initial values of the iteration, the parameter θ can be set to 0. The initial values of the parameters u and v can be determined from the matching point pairs as follows:

\[
u_{init} = \frac{1}{n} \sum_{i=1}^{n} (a_i - x_i), \qquad
v_{init} = \frac{1}{n} \sum_{i=1}^{n} (b_i - y_i). \tag{15}
\]

The complete iterative calculation consists of the following steps; the damping parameter λ can be updated as described in [9].
1) Set k = 0, choose an initial value of λ, and set a threshold ε = 1.0e-10.
2) For each point i at location (x, y), compute its corresponding position (x', y') in the other image using (9), with the current parameters X_k. Then compute the total error E(X_k) by (10).
3) Calculate the Jacobian J(X_k).
4) Compute P_k using (12).
5) Use (11) to get the parameter estimate X_{k+1} and compute E(X_{k+1}).
6) Check whether the error has decreased. If it has not, increase λ and go to 4). If it has decreased and the difference between E(X_k) and E(X_{k+1}) is less than ε, go to 7). Otherwise set k = k + 1 (updating the parameters to X_{k+1}), reduce λ and go to 2).
7) Return the final motion parameters.
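The steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code. It assumes model (9) in the form x' = x − θy + u, y' = θx + y + v and takes the initial values from (15). It also swaps the per-point distance residual for the stacked coordinate differences (x'_i − a_i, y'_i − b_i), which gives the same objective E in (10) while keeping the Jacobian free of divisions. Multiplying or dividing λ by 10 is a common simplification and is not the update rule of [9]. The function names, the starting λ and the iteration cap are hypothetical.

```python
import numpy as np

def residuals(X, src, dst):
    # Stacked coordinate differences; 0.5 * r @ r equals E in (10).
    theta, u, v = X
    xp = src[:, 0] - theta * src[:, 1] + u
    yp = theta * src[:, 0] + src[:, 1] + v
    return np.concatenate([xp - dst[:, 0], yp - dst[:, 1]])

def jacobian(src):
    # Derivatives of the stacked residuals with respect to (theta, u, v).
    x, y = src[:, 0], src[:, 1]
    ones, zeros = np.ones(len(src)), np.zeros(len(src))
    top = np.stack([-y, ones, zeros], axis=1)     # rows for x'_i - a_i
    bottom = np.stack([x, zeros, ones], axis=1)   # rows for y'_i - b_i
    return np.vstack([top, bottom])

def estimate_motion(src, dst, lam=1e-3, eps=1e-10, max_iter=100):
    # Step 1: initial values, theta = 0 and (u, v) from (15).
    X = np.array([0.0,
                  np.mean(dst[:, 0] - src[:, 0]),
                  np.mean(dst[:, 1] - src[:, 1])])
    r = residuals(X, src, dst)                    # step 2
    E = 0.5 * r @ r
    J = jacobian(src)                             # step 3 (constant here)
    for _ in range(max_iter):
        # Step 4: damped normal equations, as in (12).
        P = -np.linalg.solve(J.T @ J + lam * np.eye(3), J.T @ r)
        X_new = X + P                             # step 5, as in (11)
        r_new = residuals(X_new, src, dst)
        E_new = 0.5 * r_new @ r_new
        if E_new < E:                             # step 6: error decreased
            converged = (E - E_new) < eps
            X, r, E = X_new, r_new, E_new
            lam /= 10.0
            if converged:
                break
        else:
            lam *= 10.0                           # reject the step, damp more
    return X                                      # step 7
```

Because model (9) is linear in θ, u and v, the Jacobian of the stacked residuals does not depend on the current estimate, so it is computed once outside the loop.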

Result and conclusion

Panoramas were created with the method proposed in this paper. Although all images were taken with a hand-held camera, the movement and the rotation around the optical axis must not be too large. Figure 4 shows a resulting panoramic mosaic. The result is satisfactory.

Figure 4. Panorama built using method in this paper

This paper proposes a new method of panoramic image generation. Taking the source images does not require a tripod or any other precise positioning equipment. A simplified motion model is used. The method first projects all images onto the same cylindrical coordinate system, then obtains correct matching feature points with the SIFT and RANSAC algorithms, and finally calculates the exact motion parameters with the Levenberg-Marquardt optimization method. The initial values of the motion parameters are obtained from the matching feature point pairs. Experimental results show that the method produces panoramas reliably. Because the SIFT algorithm is computationally expensive and requires more memory, the process needs more running time.

Acknowledgment

The authors would particularly like to acknowledge the following individuals: WANG Fei, for help with code optimization; HE Biqin, for her teaching of mathematical theory. This research was supported by the Education Office Youth Fund of Jiangxi Province.

References

[1] R. Szeliski and H. Shum, “Creating Full View Panoramic Image Mosaics and Environment Maps,” Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques, 1997, pp. 251-258.
[2] L. Zhong, M. Zhang, L. Sun and Y. Li, “Algorithm and Implementation for 360-Degree Cylindrical Panoramic Image,” Mini-Micro Systems, vol. 20, pp. 899-903, December 1999.
[3] J. Cao, J. Feng and Z. Su, “A Panoramic Image Mosaic Algorithm,” Journal of Dalian University of Technology, vol. 43, pp. 180-182, October 2003.
[4] X. Zhang, “Research on The Key Technology of Panorama Mosaic,” Dissertation, Harbin Engineering University, March 2009.
[5] H. Zhang and D. Cui, “Study and Implementation of Algorithm in Creating Panoramic Image,” Computer Engineering, vol. 28, pp. 95-96, April 2003.
[6] D. Lowe, “Distinctive Image Features from Scale-Invariant Keypoints,” International Journal of Computer Vision, vol. 60, pp. 91-110, 2004.
[7] Y. Wang, Y. Wang and Y. Liu, “Image Stitch Algorithm Based on SIFT and Wavelet Transform,” Transactions of Beijing Institute of Technology, vol. 29, pp. 423-426, May 2009.
[8] J. Beis and D. Lowe, “Shape indexing using approximate nearest-neighbour search in high-dimensional spaces,” Proceedings of IEEE 1997 Computer Society Conference on Computer Vision and Pattern Recognition, 1997, pp. 1000-1006.
[9] K. Madsen, H. B. Nielsen and O. Tingleff, Methods for Non-linear Least Squares Problems (2nd Edition), IMM, DTU, 2004. Available: http://www2.imm.dtu.dk/pubdb/views/edoc_download.php/3215/pdf/imm3215.pdf.
