
Machine Vision and Applications (2011) 22:563–579
DOI 10.1007/s00138-010-0248-1

ORIGINAL PAPER

Hand-held 3D scanning based on coarse and fine registration of multiple range images


Soon-Yong Park · Jaewon Baek · Jaekyoung Moon

Received: 10 March 2009 / Revised: 25 November 2009 / Accepted: 14 January 2010 / Published online: 9 February 2010
© Springer-Verlag 2010

Abstract A hand-held 3D scanning technique is proposed to reconstruct 3D models of real objects. A sequence of range images captured with a hand-held stereo camera is automatically registered to a reference coordinate system. The automated scanning process consists of two states, coarse and fine registration. Scanning starts in the fine registration state, where a fast and accurate registration refinement technique aligns range images in a pair-wise manner. If the refinement technique fails, the process switches to the coarse registration state. A feature-based coarse registration technique is proposed to find correspondences between the last successfully registered frame and the current frame. If the coarse registration succeeds, the process returns to the fine registration state. A fast point-to-plane refinement technique is employed for shape-based registration. After the shape-based alignment, a texture-based refinement technique matches texture features to enhance the visual appearance of the reconstructed models. Through a graphic and video display, a human operator adjusts the pose of the camera to change the view of the next acquisition. Experimental results show that 3D models of real objects are reconstructed from sequences of range images.

Keywords Hand-held 3D scanning · Multi-view range images · Registration refinement · Coarse registration

1 Introduction

Three-dimensional (3D) range data of an object can be acquired by several sensing techniques such as laser ranging, structured light, and stereo vision. Due to recent advances in such sensing techniques, multi-view 3D reconstruction of real objects is of great interest in computer vision and computer graphics research. To reconstruct a complete 3D model of an object, multiple range images have to be acquired from different viewing directions to obtain the partial shapes of the object. The range images then need to be merged to obtain the 3D model. In general, multi-view 3D reconstruction refers to this complete process of 3D model generation from multiple range data.

Multi-view reconstruction, however, is a very time-consuming task. One reason is the acquisition time of multiple range images. Even though some real-time 3D acquisition systems have been introduced, general 3D ranging techniques require a lot of time to obtain multiple range images. For example, the laser ranging technique requires many images of laser stripes even for a single-view reconstruction, so reconstruction of a large scene usually takes several hours. Similarly, the structured light technique needs a few or many pictures of specially designed patterns to obtain a single range image. The stereo vision technique has inherent matching problems which are complex and time-consuming to solve. Another reason is the alignment problem of multi-view range images. Each range image captured by a range sensor is represented in an independent coordinate system regardless of the sensor pose.

S.-Y. Park (B) Department of Computer Engineering, Kyungpook National University, Daegu, 702-701, Korea e-mail: sypark@knu.ac.kr J. Baek NeoMtel Incorporation, Seoul, 135-080, Korea J. Moon School of Electronics, Electrical and Computer Science, Kyungpook National University, Daegu, 702-701, Korea




Therefore, range images in different coordinate systems should be aligned with respect to a common coordinate system before being merged into a single 3D model. This process is called registration. Sometimes the initial poses of the range images are known, for example if they are captured by a calibrated range sensor. However, the initial poses should be refined to reconstruct accurate 3D models. Registration refinement is a very important task in multi-view 3D reconstruction: the accuracy of a reconstructed 3D model is directly affected by the accuracy of the refinement. For these reasons, many registration refinement techniques have been investigated in recent years. Besl and McKay [2] propose the ICP (Iterative Closest Point) technique, in which registration refinement minimizes the transformation error between matching conjugates of different range images (actually range surfaces); ICP takes the closest points between them as the matching conjugates. Based on the ICP technique, various extensions have been investigated. Some of them are as follows. Johnson and Kang [7] modify the ICP technique to combine color information in registration. Levoy et al. [9] introduce a voxel-based registration technique; their technique needs triangulation of range images to measure the signed distance from a voxel to overlapping range images. Huber and Hebert [5] introduce a graph-based registration technique which can work without initial poses of the range images. Urfalioglu et al. [22] use a global optimization technique for uncalibrated registration. ICP can also be used for large-scale city modeling: Akbarzadeh et al. [1] combine INS and GPS data to register multi-view images for city modeling.

Recently, there have been some investigations into reducing the time of multi-view 3D reconstruction by on-line or interactive methods. These methods fall into two scanning types. The first type uses a hand-held sensor to scan a fixed object; the other uses a fixed sensor to scan a hand-held object. Figure 1 shows the two different types of on-line scanning.

Fig. 1 Registration problem of a hand-held 3D sensor

In the left of the figure, an object is held by hand and a fixed sensor is used to capture range images. Suppose the object in the hand is rotated by a small angle and two range images are obtained by the sensor before and after the rotation. Then the two poses of the object in the images are close enough to run a refinement algorithm to align the images; this is due to the fixed camera coordinate system. On the contrary, suppose a hand-held sensor is rotated by the same angle, as shown in the right of the figure. Then the displacement of the object between the two range images becomes much larger than for the fixed sensor, due to the rotation of the camera coordinate system. Because each range image of a hand-held sensor is represented in an independent camera coordinate system, unstable scanning motion can cause registration failure. Figure 2 shows an example. Here, we scan an object with the same distance and direction but different scanning speeds. In Fig. 2a, registration of the range images fails; the figure therefore shows incorrect positions and orientations of the sensor (the small white dots are camera centers and the short red lines are their orientations). With a slower scanning speed, aligned range images and correct camera poses are obtained, as shown in Fig. 2b.

On-line 3D modeling using a hand-held sensor has the inherent problems mentioned in the previous paragraphs. For this reason, only a few investigations address on-line 3D scanning or modeling. Liu and Heidrich [10] propose a stereo-based on-line registration system based on real-time processing hardware. Jaeggli and Koninckx [6] acquire multi-view range images using a pattern projector and register them on-line; their approach uses a fixed sensor and a turntable to rotate the object, so the initial poses of the range images are known in advance. Popescu et al. [15,16] propose a real-time modeling system using a scanning rig which consists of calibrated laser dots and a video camera. The 3D positions of the projected laser dots are recorded and registered at about five frames per second, and their approach reconstructs 3D surface models by triangulation of sparse 3D points. Rusinkiewicz et al. [17] propose a real-time registration system using a pattern projector and a video camera; the projector projects coded patterns onto a hand-held object, and 3D point clouds of the object are registered in near real time using a point-to-projection refinement technique. Se and Jasiobedzki [19] introduce a 3D modeling system using a hand-held stereo camera; they use texture features to register a sequence of range images for crime scene reconstruction. Yun et al. [24] use a hand-held stereo camera to acquire range images of indoor scenes; their off-line process of 3D reconstruction and model registration uses the well-known SIFT algorithm. Hilton and Illingworth [4] use a laser sensor attached to the end of an articulated robotic arm. The arm has six degrees of freedom and the pose of the sensor is measured with respect to a global coordinate system.


Fig. 2 a Erroneous registration, b successful registration


Therefore, their approach does not need to consider the registration problem. Matabosch et al. [12] propose an on-line registration technique which minimizes the propagation error that is inherent in pair-wise registration; they use a point-to-plane registration technique similar to the one used in our method.

In this paper, we propose an on-line 3D scanning technique based on a hand-held stereo sensor. A sequence of range images captured from the sensor is registered on-line. The on-line scanning process consists of two states, refinement and coarse registration. The process begins in the refinement state, where a registration refinement technique continuously registers range images in a pair-wise manner. To overcome refinement failure due to unstable hand motion, a fast coarse registration technique is proposed. If the refinement technique fails, the scanning process changes to the coarse registration state. In this state, we match shape features between two different range images and register them using the matching results. To reduce matching time, we sample depth edges as shape features, and the matching is done in a hierarchical manner for fast processing. The coarse registration estimates the initial pose between a pair of range images so that the refinement technique can resume. Once the initial pose is close enough, the refinement step registers subsequent frames again. The refinement technique uses both 3D shape and texture information for accurate 3D model reconstruction. Texture features sampled by the KLT tracker are used in both shape-based and texture-based registration [21]. Using a graphic and video display, a human operator can see the registered range images and plan the next view interactively; such interaction enhances the 3D scanning performance. Experimental results show that the proposed technique can register a long sequence of range images. Using the stereo sensor, we register range images at 1.2 frames per second. Error analysis of the 3D reconstruction results in a 1.2 mm average error with a 1.8 mm standard deviation. 3D models of real indoor and outdoor objects are also shown.

2 System overview

Figure 3 shows a flow diagram of the proposed 3D scanning technique. Acquisition of multi-view range images is done with a stereo vision camera, a BumbleBee from Point Grey [26]. Both range and texture images are acquired simultaneously from the camera. To separate foreground objects from the background, we remove the portions of the range images that are farther or closer than the working range of 0.3 to 2 m. A range image captured from the camera is registered to the previously aligned range image, so all acquired range images are registered in a pair-wise manner. The first frame from the camera becomes the reference frame and the others are registered to it.

For pair-wise registration, 3D points from two range images are sampled. The samples are chosen by the feature selection routine of the KLT tracker [20,21]. Because the KLT features are usually sampled from high-contrast textures, they are good features to track in a 2D sequence with small motion; later in the modeling process we therefore combine them with the shape-based registration refinement technique. To use the 3D features associated with the 2D features, we remove KLT features which have no depth values. Because background ranges are removed beforehand, 2D features in the background are removed as well. The 3D features on the range images are registered iteratively by shape-based registration followed by texture-based registration. From the given 3D features in the current range image, their correspondences are determined by a point-to-plane registration technique [13]. The transformation between the current and the previous range images is derived in a least-squares manner.
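As a concrete illustration of the working-range cut and of dropping features without depth, the following Python sketch assumes a dense depth map in metres and feature positions in pixel coordinates; the function names and the NaN convention for invalid stereo matches are our own assumptions, not part of the paper.

```python
import numpy as np

def foreground_mask(depth, near=0.3, far=2.0):
    """Keep only pixels whose depth lies inside the working range [near, far] metres."""
    return (depth > near) & (depth < far)

def filter_features(features, depth, near=0.3, far=2.0):
    """Drop 2D features that fall on the background or on pixels without valid depth."""
    kept = []
    for (u, v) in features:                      # features are (col, row) pixel coordinates
        d = depth[int(round(v)), int(round(u))]
        if np.isfinite(d) and near < d < far:    # invalid stereo matches are assumed to be NaN
            kept.append((u, v, d))
    return kept
```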


Fig. 3 Flow diagram of the proposed 3D registration technique


After shape-based registration, a modified KLT tracker refines the registration result again. The 2D features are used in the texture-based registration step. Owing to the shape-based registration refinement, the projections of the 2D features in one texture image are close to their correspondences in the other image, so the processing time of feature matching is reduced. We modified the original KLT tracking algorithm to make the correspondence search fast and accurate [14].

The 3D scanning scheme consists of two states. As shown in Fig. 4, the scanning scheme starts in the fine registration state. In each registration refinement step, we measure the registration error to determine whether the alignment is successful. If the current registration is determined to have failed, we change the scanning state to coarse registration. Based on our experiments, once a human operator fails to register a range image, it is difficult to bring the sensor back very close to the last successful pose. Even though a graphical user interface is displayed on-line during acquisition, using only a refinement algorithm is not enough to recover from the failure. Instead, we introduce a coarse registration technique. After a successful coarse registration, we resume registration refinement. The scanning process continues until enough range images are obtained and registered.
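The two-state control flow of Figs. 3 and 4 can be summarised by the loop below. It is a minimal sketch: acquire_frame, fine_register, coarse_register, apply_pose, and error_ok stand for the steps described in Sects. 3–5 and are passed in as placeholders rather than taken from the paper.

```python
def scan(acquire_frame, fine_register, coarse_register, apply_pose,
         error_ok, max_frames=60):
    """Two-state scanning loop of Fig. 4: pair-wise refinement with a coarse fallback."""
    aligned = [acquire_frame()]            # the first frame defines the reference coordinates
    fine_mode = True
    while len(aligned) < max_frames:
        frame = acquire_frame()
        if fine_mode:
            pose, error = fine_register(frame, aligned[-1])      # IPP + texture refinement (Sect. 3)
            if error_ok(error):
                aligned.append(apply_pose(frame, pose))
            else:
                fine_mode = False                                # refinement failed: switch state
        else:
            pose, success = coarse_register(frame, aligned[-1])  # SDBM matching (Sect. 5)
            if success:
                aligned.append(apply_pose(frame, pose))
                fine_mode = True                                 # refinement can resume
    return aligned
```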


Fig. 4 State diagram of the proposed 3D scanning process




3 Registration refinement

3.1 Shape-based range image refinement

In this section, we describe a shape-based registration refinement technique. In general, there are three main categories of shape-based registration techniques; Fig. 5 shows simple diagrams of them. In point-to-point registration, the conjugate of a control point p on the source surface is determined as the closest point q on the destination surface, and the error metric d_s is the distance between the two control points. Point-to-plane registration is another common technique. It searches for the intersection of the normal vector of the source point with the destination surface. As shown in Fig. 5b, the destination control point q′ is the projection of p onto the tangent plane at q, where q is the intersection of the normal of p with the destination surface. The point-to-projection approach is known to be a fast registration technique. As shown in Fig. 5c, this approach determines the point q that is the conjugate of a source point p by forward-projecting p from the point of view of the destination, O_Q. To determine the projection point, p is first backward-projected to a 2D point p_Q on the range image plane of the destination surface, and then p_Q is forward-projected to the destination surface to get q. This algorithm is very fast because it does not include any search step to find the correspondence. However, one of its disadvantages is that the registration result is not as accurate as those of the other approaches [17].

In this paper, we use a point-to-plane technique called the IPP (iterative projection point) method for shape-based registration [13]. This technique is combined with the point-to-projection technique to reduce processing time. Let us briefly explain the IPP method. To align two surfaces S and D, shown in Fig. 6, we need to find correspondences between the two surfaces. For example, P0 in S corresponds to Q in surface D, which is the intersection of the normal vector at P0 with D. First, we project P0 into I_D, the 2D image of surface D, and find the coordinate p_D. Then we can find the point Q_r on surface D that corresponds to p_D using the range image of D. A 3D point P1 is found by projecting Q_r onto the normal of P0. Applying this iteratively n times, we can find Q as the convergence point of P_n. The correspondence is then determined by the projection of P0 onto the tangent plane at Q.

Fig. 6 Iterative projection point algorithm
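A minimal sketch of the iterative projection step of Fig. 6. It assumes a pinhole calibration matrix K, a destination range image queried through an illustrative backproject helper, a unit normal n at P0, and that P0 is already expressed in the destination camera frame; none of these helpers are the authors' code.

```python
import numpy as np

def project(K, X):
    """Pinhole projection of a 3D point X (destination camera frame) to pixel coordinates."""
    x = K @ X
    return np.array([x[0] / x[2], x[1] / x[2]])

def backproject(K, depth, uv):
    """3D point of the destination surface at pixel uv, read from its range image."""
    u, v = int(round(uv[0])), int(round(uv[1]))
    d = depth[v, u]                                 # assumes uv stays inside the image
    return d * (np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0]))

def ipp_correspondence(P0, n, K, depth, iters=10):
    """Walk along the normal of P0 until it meets the destination surface (Fig. 6)."""
    P = P0.copy()
    for _ in range(iters):
        uv = project(K, P)                     # project the current guess into I_D
        Qr = backproject(K, depth, uv)         # destination surface point at that pixel
        P = P0 + np.dot(Qr - P0, n) * n        # snap back onto the normal line of P0
    return P                                   # approximate intersection point Q
```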

Using K corresponding pairs P_0^k and Q^k, we find the transformation matrix T = [R|t] that minimizes the registration error between the two point sets, given by Eq. (1). We use the SVD (singular value decomposition) method to solve this equation.

  ε = Σ_{k=1..K} ‖ Q^k − (R P_0^k + t) ‖²    (1)
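Eq. (1) has the standard closed-form least-squares solution via SVD (the Kabsch/Umeyama procedure); the sketch below illustrates that step and is not taken from the authors' implementation.

```python
import numpy as np

def rigid_transform(P, Q):
    """Least-squares R, t minimising sum_k ||Q_k - (R P_k + t)||^2 (Eq. 1), via SVD."""
    P_mean, Q_mean = P.mean(axis=0), Q.mean(axis=0)
    H = (P - P_mean).T @ (Q - Q_mean)          # 3x3 cross-covariance of centred point sets
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])   # guard against reflections
    R = Vt.T @ D @ U.T
    t = Q_mean - R @ P_mean
    return R, t
```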

This process calculates the Euclidean transformation matrix T between the two surfaces repeatedly. The registration error is measured using the rotation R and translation t of the transformation. We find that, in general, the registration error no longer decreases after about 30 iterations. The source control point set P can be selected by sampling the source surface randomly or uniformly and filtering it with some constraints to delete unreliable control points. The selection of IPP control points is described in the next section.

3.2 Selection of IPP control points

To run IPP, some 3D points must be sampled as control points. The sampling of control points usually affects registration performance [18]. In this paper, a couple of constraints are applied to select the control points, as follows. Texture features of the KLT tracker are used as the control points. Invalid features are removed if they are in the background or have no 3D correspondences. If the distance between 3D correspondences is too large, they are regarded as invalid correspondences. As shown on the left of Fig. 7, only 2D features in the object area are used in both the shape-based and texture-based refinement steps. As shown on the right of the figure, the normal vector of each point is compared with the viewing direction to remove unreliable features; we remove a control point if the dot product of the two vectors is greater than 0.5 [3,23,24].

Fig. 5 Three categories of registration techniques. a Point-to-point, b point-to-plane, c point-to-projection




Fig. 7 Feature point selection. Reliable KLT features are sampled and out-of-range points are removed

Fig. 8 a Pair-wise registration problem, b initialization using the previous transformation

Fig. 9 Comparison of KLT search range a before and b after shape-based registration

3.3 Pair-wise registration

When consecutive range images are registered to a reference coordinate system, the initial registration error needs to be small. Figure 8a shows a simple case. After shape S_b is registered to S_a, the next shape S_c should be registered to the already aligned S_b. However, there is a large initial error between S_b and S_c, which makes it difficult to align S_c to S_b. If the initial registration error between two range images is large, the refinement step is likely to fail. A simple solution to this problem is to transform the current range image using the previous alignment result. This brings the current range image close to the coordinates of the previous range image, which is already in the common coordinate system. In Fig. 8b, the transformation T_ab, which brings shape S_b to S_a, is applied to surface S_c to initialize the registration.

3.4 Texture-based registration method

Texture-based registration techniques employ 2D feature correspondences to estimate the 3D pose between range images. When the pose difference is large, tracking 2D texture features is not easy due to the wide baseline between views. As a consequence, tracking time increases and the success rate decreases. One may filter out unsuccessful features using a RANSAC algorithm, but that also takes extra time for pose estimation. For accurate tracking, the sizes of the search region and the tracking window must be traded off against tracking performance and time. In this paper, we improve the performance of correspondence tracking by reducing the search range. The original KLT tracking algorithm is modified to employ the results of shape-based registration. In the original KLT tracker, the coordinates of the starting point of the correspondence search in the current image are always the same as those in the previous image. In Fig. 9a, a feature point p_S in image plane I_S is shown together with the other image plane I_D. The original KLT tracker starts searching for its correspondence q_D from the coordinates of p_S, so the existing tracker needs a wide search area to find the correspondence if the images are obtained from very different viewing directions. Figure 9b shows the modified KLT method used in our experiments. When the range of a 2D feature p_S is denoted as r_S(p_S), we can compute p′_S from the projection of r_S(p_S) onto I_D.


Fig. 10 Feature tracking comparison. a Modified KLT, b original KLT


When the range image r_S is registered by transformation T_S, it is projected into image I_D by multiplying with the perspective projection matrix M. The projection matrix is obtained from the calibrated camera in advance:

  p′_S = M T_S r_S(p_S)    (2)

When the shape-based registration result is considered, the starting coordinates for the KLT tracker become very close to the true correspondence. This assumes, however, that the shape-based registration has a small residual error. In most cases, the registration result is accurate enough to bring the 2D correspondences very close together. Thus, using a small search region, we can track texture features very fast to further refine the pose. Figure 10 shows an example of texture feature tracking. In the figure, the green dots represent texture features and the white lines represent tracking results between two consecutive images. Using a smaller tracking window than that of the original KLT, almost all features are successfully tracked by the modified KLT. Because most features are successfully tracked, we can estimate the transformation without any RANSAC algorithm.
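A rough equivalent of this modified tracker can be built with OpenCV's pyramidal LK by seeding the tracked positions with the projections of Eq. (2) and setting OPTFLOW_USE_INITIAL_FLOW. The sketch below assumes per-feature 3D points from the previous range image, a 4 × 4 transform T and a calibration matrix K; it is an approximation of the idea, not the authors' modified KLT.

```python
import cv2
import numpy as np

def track_with_shape_prior(img_prev, img_cur, pts_prev, pts3d_prev, T, K):
    """Seed KLT with the projection p'_S = M T_S r_S(p_S) of Eq. (2), then refine locally."""
    R, t = T[:3, :3], T[:3, 3]
    proj = (K @ (R @ pts3d_prev.T + t[:, None])).T         # project the transformed 3D features
    guess = (proj[:, :2] / proj[:, 2:3]).astype(np.float32)
    pts_prev = pts_prev.astype(np.float32).reshape(-1, 1, 2)
    guess = guess.reshape(-1, 1, 2)
    pts_cur, status, _ = cv2.calcOpticalFlowPyrLK(
        img_prev, img_cur, pts_prev, guess,
        winSize=(15, 15), maxLevel=2,
        flags=cv2.OPTFLOW_USE_INITIAL_FLOW)                # start the search at the seeded guess
    ok = status.ravel() == 1
    return pts_prev[ok].reshape(-1, 2), pts_cur[ok].reshape(-1, 2)
```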

4 Handling registration failure

Hand-held 3D scanning is a convenient way to reconstruct 3D models of real objects. However, one drawback of this method is that the view of the 3D sensor is controlled by hand. One may think that hand motion is easily controlled by a human operator. However, in terms of 3D registration, even a small motion of the hand yields a large displacement between consecutive images. For this reason, conventional 3D scanning is done with fixed, off-the-shelf systems. To reconstruct the 3D shapes of real objects, we need to acquire as many range and texture images as possible, so the hand motion during scanning has to be handled.

During the registration refinement state, we check whether the alignment of the current image is successful. If not, we move to the coarse registration state. In this state, a range image is captured and matched with the previously aligned range image. If matching is not successful, the capturing and matching steps are repeated until it succeeds. If the coarse registration is successful, the scanning state returns to the refinement step.

Whether a registration has succeeded is decided by measuring the pose error between the correspondences. Let T_{n−1} and T_n be the transformation matrices of the (n−1)th and nth range images. Then the transformation

  T_{n−1,n} = T_{n−1} T_n^{−1}    (3)

can be considered as an error between them. To measure the registration error between the (n−1)th and nth range images, we multiply T_{n−1} and T_n^{−1} with the corresponding range images and compute the rotation and translation error between them. From the matrix T_{n−1,n} = [r_ij | t_i], the rotation and translation errors are computed as

  ε_R = (1/3) Σ_{i=0..2} (I_ii − r_ii)²    (4)

  ε_t = (1/K) Σ_{k=1..K} ‖ Q^k − P_0^k ‖    (5)

where K is the number of correspondences between the images. The rotation error is measured by the difference between the identity matrix and the rotation matrix. Rather than using the translation vector in T_{n−1,n}, we use the average distance between the correspondences as the measure of translation. As mentioned earlier, we provide the human operator with a graphical user interface to plan the view of the camera and check the status of on-line scanning.
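The failure test of Eqs. (3)–(5) can be written compactly as below; the tolerance values are illustrative placeholders, not settings reported in the paper.

```python
import numpy as np

def registration_failed(T_prev, T_cur, P0, Q, rot_tol=0.05, trans_tol=10.0):
    """Decide failure from T_{n-1,n} = T_{n-1} T_n^{-1} (Eq. 3) and the errors of Eqs. (4)-(5).

    P0, Q: Nx3 arrays of corresponding control points; trans_tol is in the range units (mm).
    """
    T_rel = T_prev @ np.linalg.inv(T_cur)                     # Eq. (3)
    R = T_rel[:3, :3]
    rot_err = np.sum((1.0 - np.diag(R)) ** 2) / 3.0           # Eq. (4): deviation from identity
    trans_err = np.mean(np.linalg.norm(Q - P0, axis=1))       # Eq. (5): mean correspondence distance
    return rot_err > rot_tol or trans_err > trans_tol
```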

5 Coarse registration

5.1 Sampled depth-edge block matching

In this section, we propose a coarse registration technique to match two wide-baseline range images. We call our coarse registration method SDBM (Sampled Depth-edge Block Matching).


Fig. 11 Matching strategy of SDBM


Fig. 12 Sampling of depth edge points. From left, original depth edge, uniform sampling, and complete depth points

Fig. 13 An example of SDBM. a Matched depth points, b before registration, c after registration

Suppose there are two range images, source and destination, as in Fig. 11. The source range image can be considered the current image, and the destination image the last range image already aligned in the reference coordinate system.

From the source image, we pick shape features that lie on the depth edges of the range image. Edge features of the range image are chosen because they are independent of texture changes, and they can be acquired directly from the range image. In this paper, we apply the Sobel filter to find edge features, and a fixed threshold value is used to determine them. The choice of the threshold value is not critical here because the edge features are sampled later to reduce their number.


Fig. 14 Ten test pairs for coarse registration. From left, destination and source frames, initial poses, and depth and features of the two frames. From top, BT1, BT2, SB1, SB2, SM1, SM2, SC1, SC2, SL1, and SL2


Given a sample point s_i in the source, we find a matching point in the destination range image.

To find the matching point, we define two search regions, R1 and R2. A square region R1 is defined by placing its center at the same coordinates as s_i. Let d_i denote one of the depth edge points in R1. Another square region R2 is defined similarly, with its center at d_i.


Fig. 15 Results of coarse registration. From left to right, point clouds and textured results of KLT, SPIN, and SDBM. The point clouds show the poses before applying IPP, whereas the textured results are shown after applying IPP


In R2, a matching window W_D is defined at each point and its cost is measured with respect to W_S, the matching window of s_i. The second search region R2 is defined at every destination edge point in R1.

Therefore, the matching pair (s_i, d_i) is determined as the one that yields the least cost between W_S and W_D. Let n_w, n_R2, and n_R1 be the number of depth points in the matching window, the number of valid points in R2, and the number of depth edge points in R1, respectively.


Table 1 Comparison of registration error (mm) Object BT1 Method KLT SPIN SDBM BT2 KLT SPIN SDBM SB1 KLT SPIN SDBM SB2 KLT SPIN SDBM SM1 KLT SPIN SDBM SM2 KLT SPIN SDBM SC1 KLT SPIN SDBM SC2 KLT SPIN SDBM SL1 KLT SPIN SDBM SL2 KLT SPIN SDBM Initial 48.3 63.7 40.3 43.7 74.5 37.1 115.4 54.2 48.4 84.9 48.5 55.5 62.8 57.3 87.8 138.8 72.5 42.4 54.1 42.4 24.4 44.7 24.1 28.8 38.2 38.1 30.6 103.0 61.1 Coarse 19.1 42.3 10.6 30.9 64.8 2.7 96.9 43.8 25.9 79.3 33.9 21.6 40.5 6.6 59.2 130.9 64.8 8.7 24.3 0.9 8.7 43.7 1.2 14.4 10.0 0.8 17.9 88.2 6.2 Coarse + IPP 0.9 1.3 1.2 0.8 1.1 0.7 7.9 2.6 3.9 1.5 3.1 2.8 0.5 2.7 0.5 5.9 130.9 6.9 2.6 2.2 0.8 0.7 2.6 0.7 4.3 0.4 0.3 3.4 4.0 0.6 S(uccess)/F(ail) S S S S S S F S S S S S S S S S F F S S S S F S F S S F F S


Fig. 16 Graphic user interface to assist a user adjusting the sensor pose

The computational complexity of finding the matching point is then O(n_w · n_R2 · n_R1). The cost C(s_i, d_i) between two matching windows is measured by the mean-normalized SSD (sum of squared differences):

  W̄_S = (1/n_W) Σ_{(i,j)∈W_S} r_S(i,j),   W̄_D = (1/n_W) Σ_{(i,j)∈W_D} r_D(i,j)    (6)

  C(s_i, d_i) = (1/n_W) Σ_{(i,j)∈W_S} ((r_S(i,j) − W̄_S) − (r_D(i,j) − W̄_D))²    (7)

where r_S(i,j) and r_D(i,j) are the depths at pixel (i,j) of the matching windows.

The depth edge points used in the matching algorithm are sampled from the original range images. As mentioned before, only depth edge points are used because they capture the high curvature of the edges. To make the matching process fast and reliable, we reject some depth edge points as follows. First, depth edge points are extracted as shown in Fig. 12; in the left image, the red-colored (dark-grey) points are edge points obtained from a range image. Second, we sample them uniformly to reduce the number of edge points, as shown in the center of the figure. Finally, we remove incomplete points: a point is regarded as incomplete if any of its neighbors inside the matching window W is a hole. This depth-point sampling is done in both range images.

The local matching point in R2 is determined by

  local(d_i) = argmin_j cost(s_i, d_j ∈ R2)    (8)

The final matching point is the global minimum over R1, computed as

  global(s_i) = argmin_j cost(local(d_j) ∈ R1)    (9)

Even if a destination point is found to be the global minimum, its cost C(s_i, d_i) must also be less than a threshold value; otherwise the point is rejected. Figure 13 shows an example of coarse matching. In Fig. 13a, the solid lines show matching pairs of features between two range images. In Fig. 13b and c, the two 3D shapes are displayed together to compare them before and after coarse registration. In this case, 10 matching pairs are used for registration. The matching window size is 15 × 15, and the sizes of R1 and R2 are 100 and 50, respectively. The transformation matrix is computed from the matching pairs by the same method as in fine registration. The registration errors before and after coarse registration are 38.54 and 3.58 mm, respectively. The initial registration error is reduced enough for the refinement step to resume.
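A simplified sketch of the SDBM search with the cost of Eqs. (6)–(7) and the minima of Eqs. (8)–(9). It assumes the depth-edge points have already been extracted (e.g. by Sobel filtering and uniform subsampling as described above) and are given as integer pixel coordinates; the half-sizes below mirror the 15 × 15 window and the R1/R2 sizes quoted in the text, while the cost threshold is an arbitrary placeholder. For brevity the local and global minima are folded into one pass, which returns the same global minimum.

```python
import numpy as np

def mn_ssd(win_s, win_d):
    """Mean-normalised SSD between two depth windows (Eqs. 6-7)."""
    return np.mean(((win_s - win_s.mean()) - (win_d - win_d.mean())) ** 2)

def window(depth, p, half):
    """Square depth window of size (2*half+1)^2 centred on pixel p = (row, col)."""
    r, c = p
    return depth[r - half:r + half + 1, c - half:c + half + 1]

def sdbm_match(src_depth, dst_depth, src_edges, dst_edges,
               r1=50, r2=25, half_win=7, cost_thresh=100.0):
    """For each sampled source edge point, find its SDBM match in the destination."""
    full = (2 * half_win + 1, 2 * half_win + 1)
    dst_edges = np.asarray(dst_edges)
    matches = []
    for s in src_edges:                                    # s = (row, col) of a sampled edge point
        win_s = window(src_depth, s, half_win)
        if win_s.shape != full:                            # source point too close to the border
            continue
        # candidates d_i inside R1, a square centred on the same coordinates as s
        in_r1 = dst_edges[np.all(np.abs(dst_edges - s) <= r1, axis=1)]
        best, best_cost = None, np.inf
        for d_i in in_r1:
            # local search inside R2, a square centred on d_i (Eq. 8)
            in_r2 = dst_edges[np.all(np.abs(dst_edges - d_i) <= r2, axis=1)]
            for d_j in in_r2:
                win_d = window(dst_depth, tuple(d_j), half_win)
                if win_d.shape != full:
                    continue
                c = mn_ssd(win_s, win_d)
                if c < best_cost:                          # global minimum over R1 (Eq. 9)
                    best, best_cost = tuple(d_j), c
        if best is not None and best_cost < cost_thresh:   # reject weak matches
            matches.append((tuple(s), best))
    return matches
```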


Fig. 17 Results of Beethoven. a Original images, b range images, c selected features, d tracked features, e camera orientations, f reconstructed model


5.2 Comparison of coarse matching

The proposed coarse registration technique can be considered a 3D shape matching technique. To match 3D shapes obtained from different views, other conventional shape matching techniques can also be used.

The spin image is one example. Conventional 2D matching techniques can be used as well, because we have a texture image associated with every range image. In this section, we compare the performance of our 3D matching technique with two other techniques, the spin image [8] and KLT [21]. The spin image is a 3D shape matching technique: a spin image is a 2D space of α and β, which are a mapping of the 3D measure of a point with respect to its neighboring surface points.


Fig. 18 Results of Sacheonwang. a Original images, b range images, c selected features, d tracked features, e camera orientations, f reconstructed model


To run the spin image technique, the image size is set to 40 × 40 and the length of each bin of the image is set to 8 mm, which covers 160 mm from the measured point. To run KLT, the original KLT algorithm is used and the size of the matching window is set to 15 × 15, the same as that of our matching block W.

For a fair comparison, we use the same number of features, which are extracted by SDBM. Five objects are used for this comparison, as shown in Fig. 14. The two test frames shown in the first and second columns of the figure are sampled from the video sequence of each object. The corresponding range images are shown in the last two columns.


Fig. 19 Results of Natural scene. a Original images, b range images, c selected features, d tracked features, e camera orientations, f reconstructed model


The test range images are sampled so that the initial pose becomes too wide to register them with our refinement algorithm. In the range images, the depth features extracted by SDBM are overlaid. In the third column, the two range images are overlapped to show the initial pose between the two frames. Figure 15 shows the coarse registration results: the first three columns show the point clouds for KLT, SPIN, and SDBM, and the second three columns show the same results with textured points.

KLT and SDBM show very similar results, while SPIN sometimes yields erroneous results. The main reason is that SPIN has more degrees of freedom than KLT and SDBM due to its rotation and scale invariance. The first three columns of the figure show the pose right after applying the SDBM technique; the last three columns show textured range images after additionally applying IPP, which follows SDBM. Table 1 shows the registration error measured between two range images after applying each matching technique. In the table, the decision of success or failure is made by visual inspection of the results.




The registration error is measured as the average distance between matching pairs.

Table 2 Registration and integration time

Object        | Registration (s/frame) | Integration, 20 frames | Integration, 40 frames | Integration, 60 frames
Beethoven     | 0.75                   | 330                    | 705                    | N.A.
Sacheonwang   | 0.87                   | 220                    | N.A.                   | 650
Natural scene | 0.91                   | 390                    | 1105                   | N.A.

6 Experimental results

The stereo vision camera generates 15 range images per second, and the resolution of a range image is 320 × 240. The proposed 3D scanning technique is applied to two real objects and one natural scene. A computer with a 3.4 GHz Pentium CPU is used. It takes about 800 ms to register each range image when both the shape-based and texture-based registration refinements are applied; when only the shape-based registration is used, about 600 ms is needed.

6.1 Online graphic interaction

To register range images on-line, it is better for the human operator to view the status of scanning through a graphic display, so that the operator can adjust the view for the next image frames. During 3D scanning, we provide user interaction based on graphic models and image display. The graphic interaction system presents the current registration status to the user by rendering all registered range images. The system also shows the position and orientation of each camera by displaying a graphic box and a line, as in Fig. 16. Through the display, the user gets visual feedback to adjust the speed and direction of the camera motion. When registration of a range image fails, the user can adjust the position and orientation of the camera to resume the registration process. In addition, if there are holes in the registered range images, the user can acquire new range images to fill them. In one corner of the display, two images are shown in real time: the current texture image and the previous texture image. The graphical user interface is developed with OpenGL.

6.2 Reconstruction results

The first 3D reconstruction experiment is performed using a plaster model of Beethoven. A total of 40 range images are acquired continuously. In this experiment, the sensor rotates around the object through about 90°. The object is placed in front of a random-dot background, and the background ranges are removed before registration. Figure 17 shows some reconstruction results. Figure 17a–d shows the input images, the range images of the object areas, the selected features, and the feature motions; frames 0, 13, 29, and 39 are shown. In Fig. 17e, the registered point clouds are shown together with all camera positions. In Fig. 17f and g, the front view of the registered range images and an integration result are shown.


Table 3 Average registration error

Object        | Translation (mm) | Rotation
Beethoven     | 0.75             | 0.00052
Sacheonwang   | 0.82             | 0.00028
Natural scene | 4.55             | 0.00032

To integrate the registered range images into a 3D mesh model, we use the Marching Cubes algorithm [11]. For this reconstruction, the voxel size is set to 3 mm, and a total of 309,520 triangles are generated.

Figure 18 shows the results for another object, called Sacheonwang. The object is in the museum on our campus and its height is about 1 m. An operator holds the range camera in hand and moves it around the object. Because of hand shaking, it was not easy to take continuous range images without error; nevertheless, a total of 60 range images were acquired and registered. Due to the illumination conditions in the museum, there are more range errors than for Beethoven, but the experimental results show that its 3D model is still reasonably reconstructed. Figure 19 shows the experimental results for Natural scene. Due to the inherent noise of natural objects, some parts of the reconstruction show blur patterns. However, the reconstruction from 40 range images yields the natural 3D shape of the scene.

Table 2 presents the registration and integration times of the three experiments. Registration of a pair of range images takes less than 1 s on average, which is an acceptable speed for on-line registration because the operator can move the range camera carefully to acquire accurate range images. Table 3 shows the average translation and rotation errors, measured as explained in Sect. 4. The table shows that the registration error is very small after registration refinement. The translation error of Natural scene is considerably higher than the others, which is due to the noisy background of the scene.

6.3 Reconstruction error analysis

To analyze the reconstruction error of our 3D scanning, a reconstructed model of Beethoven is compared with a ground-truth model.


Fig. 20 Reconstruction error analysis. a Ground-truth model of Beethoven, b registration of the reconstructed model (green, light grey) to the ground truth


To generate the ground-truth model, a NextEngine desktop 3D scanner, which is based on the laser ranging technique, is used to scan the same object. Figure 20a shows the reconstructed ground-truth model of Beethoven. The error analysis is done as follows. First, the reconstructed 3D model from our method is manually overlapped with the ground-truth model. Second, using a simple ICP technique, we register our model to the ground truth; we use ICP for this refinement because the ground-truth model has no image plane, which is required to run IPP. Figure 20b shows the registered models. Third, by uniformly sampling 3D points in our model, we measure the distance to the closest point in the ground truth. As a result, there is about 1.2 mm average error with a 1.8 mm standard deviation.
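The error analysis amounts to sampling the reconstructed model and measuring closest-point distances to the ground-truth scan. The sketch below uses a k-d tree and random (rather than strictly uniform) sampling, and assumes the two models are already aligned and given as Nx3 point arrays; it is an illustration, not the authors' evaluation code.

```python
import numpy as np
from scipy.spatial import cKDTree

def reconstruction_error(model_pts, ground_truth_pts, n_samples=10000, seed=0):
    """Mean and standard deviation of closest-point distances from the model to the ground truth."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(model_pts), size=min(n_samples, len(model_pts)), replace=False)
    tree = cKDTree(ground_truth_pts)              # ground-truth surface points
    dists, _ = tree.query(model_pts[idx])         # closest-point distance for each sampled point
    return dists.mean(), dists.std()
```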

7 Conclusion

This paper proposes a new hand-held 3D scanning technique. To date, few investigations have addressed the problem of hand-held 3D scanning. Due to the unstable motion of the human hand and the processing time of pose estimation, conventional 3D shape registration or matching techniques may fail to automatically align a sequence of range images. In this paper, we combine fine and coarse registration of multiple range images to overcome this problem. A sequence of range images obtained by a stereo vision sensor is registered automatically to reconstruct 3D models of real objects. A fast registration refinement technique aligns consecutive range images in a pair-wise manner. If the refinement step fails, a coarse registration technique finds the initial pose between wide-baseline range images so that the refinement step can resume. A graphic interface displaying the status of registration on-line helps the human operator plan the next view of the sensor. Using the proposed technique, we show 3D reconstruction results for three real objects.

In this paper, we have shown only partial reconstruction results. For complete 3D reconstruction, closing the surfaces of an object is needed. Currently, we need to walk around an object to scan all of its visible surfaces. However, scanning and registering all surfaces on-line while walking around the object is still a difficult problem; the two main difficulties are sensor vibration due to human gait and error propagation. In the future, we will consider the complete 3D modeling problem using a hand-held 3D sensor.

Acknowledgements This work was supported by the Korea Research Foundation Grant funded by the Korean Government (KRF-2007-331-D00423).

References

1. Akbarzadeh, A., Frahm, J.M., Mordohai, P., Clipp, B., Engels, C., Gallup, D., Merrell, P., Phelps, M., Sinha, S., Talton, B., Wang, L., Yang, Q., Stewenius, H., Yang, R., Welch, G., Towles, H., Nister, D., Pollefeys, M.: Towards urban 3D reconstruction from video. In: Proceedings of 3DPVT'06 (2006)
2. Besl, P.J., McKay, N.D.: A method for registration of 3-D shapes. IEEE Trans. Pattern Anal. Mach. Intell. 14(2), 239–256 (1992)
3. Dias, P., Sequeira, V., Vaz, F., Goncalves, J.G.M.: Registration and fusion of intensity and range data for 3D modelling of real world scenes. In: Fourth International Conference on 3-D Digital Imaging and Modeling, pp. 418–421 (2003)
4. Hilton, A., Illingworth, J.: Geometric fusion for a hand-held 3D sensor. Mach. Vis. Appl. 12(1), 44–51 (2000)
5. Huber, D., Hebert, M.: 3-D modeling using a statistical sensor model and stochastic search. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 858–865 (2003)
6. Jaeggli, T., Koninckx, T.P., Van Gool, L.: Online 3D acquisition and model integration. In: IEEE International Workshop on Projector-Camera Systems (ICCV 2003), CD-ROM proceedings (2003)
7. Johnson, A.E., Kang, S.B.: Registration and integration of textured 3D data. Image Vis. Comput. 17(2), 135–147 (1999)
8. Johnson, A.: Spin-images: a representation for 3-D surface matching. Technical Report CMU-RI-TR-97-47, Carnegie Mellon University (1997)
9. Levoy, M., Pulli, K., Curless, B., Rusinkiewicz, S., Koller, D., Pereira, L., Ginzton, M., Anderson, S., Davis, J., Ginsberg, J., Shade, J., Fulk, D.: The Digital Michelangelo Project: 3D scanning of large statues. In: SIGGRAPH, pp. 131–144 (2000)
10. Liu, Y., Heidrich, W.: Interactive 3D model acquisition and registration. In: Proceedings of the 11th Pacific Conference on Computer Graphics and Applications, pp. 115–122 (2003)
11. Lorensen, W.E., Cline, H.E.: Marching cubes: a high resolution 3D surface construction algorithm. ACM SIGGRAPH Comput. Graph. 21(4), 163–169 (1987)
12. Matabosch, C., Fofi, D., Salvi, J., Batlle, E.: Registration of surfaces minimizing error propagation for a one-shot multi-slit hand-held scanner. Pattern Recogn. 41(6), 2055–2067 (2008)
13. Park, S.Y., Subbarao, M.: An accurate and fast point-to-plane registration technique. Pattern Recogn. Lett. 24(16), 2967–2976 (2003)
14. Park, S.Y., Baek, J.: Online registration of multi-view range images using geometric and photometric feature tracking. In: The 6th International Conference on 3-D Digital Imaging and Modeling (3DIM) (2007)
15. Popescu, V., Sacks, E., Bahmutov, G.: The model camera: a hand-held device for interactive modeling. In: Proceedings of 3DIM'03, pp. 285–292 (2003)
16. Popescu, V., Sacks, E., Bahmutov, G.: Interactive modeling from dense color and sparse depth. In: Symposium on 3D Data Processing, Visualization, and Transmission (3DPVT) (2004)
17. Rusinkiewicz, S., Hall-Holt, O., Levoy, M.: Real-time 3D model acquisition. Proc. SIGGRAPH 21(3), 438–446 (2002)
18. Rusinkiewicz, S., Levoy, M.: Efficient variants of the ICP algorithm. In: Third International Conference on 3-D Digital Imaging and Modeling, pp. 145–152 (2001)
19. Se, S., Jasiobedzki, P.: Instant scene modeler for crime scene reconstruction. IEEE Conf. Comput. Vis. Pattern Recogn. 3, 123–123 (2005)
20. Shi, J., Tomasi, C.: Good features to track. IEEE Conf. Comput. Vis. Pattern Recogn., pp. 593–600 (1994)
21. Tomasi, C., Kanade, T.: Detection and tracking of point features. Technical Report CMU-CS-91-132, Carnegie Mellon University (1991)
22. Urfalioglu, O., Mikulastik, P., Stegmann, I.: Scale invariant robust registration of 3D-point data and a triangle mesh by global optimization. In: Proceedings of the 8th International Conference on Advanced Concepts for Intelligent Vision Systems (ACIVS 2006), LNCS, vol. 127, pp. 1059–1070 (2006)
23. Yoshida, K., Saito, H.: Registration of range images using texture of high-resolution color images. In: Proceedings of the IAPR Workshop on Machine Vision Applications (MVA2002) (2002)
24. Yun, S.U., Min, D., Sohn, K.: 3D scene reconstruction system with hand-held stereo cameras. In: 3DTV Conference, pp. 1–4 (2007)
25. http://www.ces.clemson.edu/~stb/klt/
26. http://www.ptgrey.com

