
ISSN 1673-9418  CODEN JKYTA8
Journal of Frontiers of Computer Science and Technology, 1673-9418/2011/05(06)-0481-12
DOI: 10.3778/j.issn.1673-9418.2011.06.001
E-mail: fcst@vip.163.com  http://www.ceaj.org  Tel: +86-10-51616056

Depth Camera in Computer Vision and Computer Graphics: An Overview*


XIANG Xueqin, PAN Zhigeng+, TONG Jing
State Key Lab of Computer Aided Design and Computer Graphics, Zhejiang University, Hangzhou 310058, China
+ Corresponding author: E-mail: zgpan@cad.zju.edu.cn

Document code: A    CLC number: TP391.41
XIANG Xueqin, PAN Zhigeng, TONG Jing. Depth camera in computer vision and computer graphics: an overview. Journal of Frontiers of Computer Science and Technology, 2011, 5(6): 481-492.

Abstract: An increasing number of applications depend on accurate and fast 3D scene analysis, such as geometry reconstruction, collision prevention, mixed reality, and gesture recognition. The acquisition of a range map by image-based analysis or laser scan techniques is still time-consuming and expensive. Having emerged as an alternative device for measuring distance, the depth camera offers advantages, e.g., lower price and higher acquisition speed, that have not been achieved in traditional 3D measuring systems.
*The National Natural Science Foundation of China under Grant No. 60970076; the National High-Tech Research and Development Plan of China (863 Program) under Grant No. 2009AA062704. Received 2011-02, Accepted 2011-04.


Recently, significant improvements have been made in order to achieve low-cost and compact depth camera devices that have the potential to revolutionize many fields of research, including computer vision, computer graphics and human computer interaction (HCI). These technologies are also starting to attract many researchers working for academic or commercial purposes. This paper gives an overview of recent developments in depth camera technology and discusses the current state of the integration of this technology into various related applications in computer vision and computer graphics.

Key words: depth camera; computer vision; computer graphics

1 Introduction

Acquiring 3D geometric information from real environments is an essential task for many applications in computer vision and computer graphics. Numerous tasks, such as cultural heritage preservation, augmented reality and human computer interaction, obviously favor simple and accurate devices for real-time range image acquisition. Unfortunately, even for static scenes, there exists no low-priced off-the-shelf system that can provide good quality, high resolution distance information in real time. Laser scanning techniques, which sample a scene row by row with a single laser device, are rather time-consuming and therefore infeasible for dynamic scenes. Stereo vision systems are also rather limited: they are known to be quite fragile in practice (e.g., due to lack of texture). As a newly developed distance measuring technology, the depth camera opens a new epoch for 3D geometric information acquisition. Unlike other 3D systems, the depth camera is very compact and already fulfills most of the features desired for real-time distance measurement, such as a full range field of view and a high acquisition rate.

There are two main approaches currently employed in depth camera technology. The first is based on the time-of-flight (ToF) principle, measuring the time delay between the emission of a light pulse and the arrival of its reflection. Some solutions utilize modulated, incoherent light with a radio frequency (RF) carrier and measure the phase shift of that carrier on the receiver side (e.g., the Photonic Mixer Device (PMD)[1] and the Swiss Ranger 4000[2]). With phase unwrapping algorithms, the maximum uniqueness range can be increased. The Swiss Ranger 4000 (http://www.mesa-imaging.ch, Fig.1 (a)) has ranges of 5 or 10 meters at a resolution of 176×144 pixels. The PMD (http://www.pmdtec.com, Fig.1 (b)) can provide ranges up to 60 meters. On the other hand, the 3DV Inc. cameras (http://www.3dvsystems.com)[3] and Canesta 3D cameras (http://www.canesta.com) are range-gated systems using Medina's design[4]; they indirectly measure the time of flight using a fast shutter technique. The second approach is based on light coding: a known infrared pattern is projected onto the scene and depth is determined from the pattern's deformation captured by an infrared CMOS imager. Driven by a single-chip custom-silicon solution, e.g., the PrimeSensor (http://www.primesense.com, Fig.1 (c)), this approach can produce depth images of up to 640×480 pixels at a maximum throughput of 60 f/s. The recently popular Microsoft Kinect sensor (http://www.xbox.com/kinect, Fig.1 (d)) also uses light coding for depth measurement.

Fig.1 Different types of depth camera
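To make the phase-shift principle concrete, the following minimal sketch computes depth from the measured phase of the modulated carrier; the 20 MHz modulation frequency is only an assumed example value, not a specification of any of the devices above.

    # Sketch of continuous-wave ToF depth from a measured phase shift.
    # The modulation frequency is an assumed example, not a device spec.
    import math

    C = 299_792_458.0      # speed of light (m/s)
    F_MOD = 20e6           # assumed modulation frequency (Hz)

    def depth_from_phase(phase_rad):
        """Distance implied by the phase shift of the RF-modulated carrier."""
        ambiguity_range = C / (2.0 * F_MOD)          # maximum uniqueness range
        return ambiguity_range * phase_rad / (2.0 * math.pi)

    # A phase shift of pi/2 corresponds to a quarter of the ambiguity range,
    # roughly 1.87 m at 20 MHz; phase unwrapping extends the usable range.
    print(depth_from_phase(math.pi / 2))

At 20 MHz the ambiguity range is roughly 7.5 m, which is consistent with the few-meter uniqueness ranges quoted above for phase-shift devices.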

This overview first summarizes the depth camera measurement principles (Section 1). Sections 2 and 3 discuss sensor calibration issues and basic concepts of image processing and sensor fusion. Section 4 focuses on applications for geometric reconstruction, human-oriented analysis, and interaction based on depth cameras. Finally, Section 5 draws a conclusion and gives a perspective on future work in depth camera related research and applications.

2 Calibration

Fig.2 Multi-sensor calibration in [10]

Depth cameras use standard optics to focus the reflected active light onto the chip. Thus, classical intrinsic calibration is required to compensate for effects like shifted optical centers and lateral distortion. For depth cameras with relatively high resolution, i.e., 176×144, standard calibration techniques[5] can be used. For low resolution sensors, Beder and Koch[6] proposed an optimization approach based on analysis-by-synthesis. To evaluate the error of a depth camera, the acquisition of reference data (ground truth) is a non-trivial task. Previous approaches use track lines[7], which unfortunately require cost-intensive experiments. Alternative techniques use an image-based approach to estimate the extrinsic parameters of the sensor with respect to a reference plane, e.g., a checkerboard[8]. Considering the systematic measurement error, a first approach[9] assumed a linear deviation with respect to the object's distance. This systematic depth error can then be corrected using look-up tables[10] or B-splines[5]. Since the systematic error behaves quite similarly for different sensor types[11], it was a significant improvement when Zhu et al.[10] combined a ToF sensor with passive stereo (see Fig.2) to obtain high accuracy depth maps. Their approach is based on the observation that ToF sensors have error characteristics that are complementary to passive stereo.

Unfortunately, the captured range data are typically contaminated by noise. The noise level of the distance measurement depends on the amount of incident active light. In addition, a depth error related to the measured intensity is observed[11], i.e., object regions with low near-infrared (NIR) reflectivity have a non-zero mean offset compared to regions with high reflectivity. In [8] the systematic and the intensity-related errors were compensated using a bivariate correction function based on B-splines applied directly to the distance values, assuming both effects to be coupled. Alternatively, Chan et al.[12] proposed an adaptive multi-lateral filter that takes into account the inherent noisy nature of real-time depth data. Regarding multiple reflections, the authors in [13-14] proposed a model for multiple reflections as well as a technique for correcting the affected measurements. It is assumed that the perturbation components due to multiple reflections outside and inside the camera depend on the scene and the camera construction, respectively. The spatial spectral components therefore consist mainly of low spatial frequencies, which can be compensated using a model of the signal as a complex quantity, with the amplitude as modulus and the distance as argument. In short, this model is useful if an additional light pattern can be projected onto the object. The device manufacturers also attempt to reduce motion artifacts, which are mainly caused by the latency between the individual exposures for the four phase images. However, the problem remains and might be solved by motion-compensated integration of the individual measurements or by motion deblurring methods[15].
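As a rough illustration of the look-up-table/spline style of systematic error correction discussed above, the sketch below fits a correction curve from calibration pairs; the calibration distances are hypothetical, and a cubic polynomial stands in for the B-spline or look-up table used in the cited work.

    # Sketch of per-distance systematic error correction with made-up data.
    import numpy as np

    measured = np.array([0.52, 1.05, 1.49, 2.02, 2.55, 3.01])   # raw sensor readings (m)
    truth    = np.array([0.50, 1.00, 1.50, 2.00, 2.50, 3.00])   # ground-truth distances (m)

    coeffs = np.polyfit(measured, truth, deg=3)    # measured -> corrected mapping

    def correct_depth(depth_map):
        """Apply the fitted systematic-error correction to a raw depth map."""
        return np.polyval(coeffs, np.asarray(depth_map, dtype=np.float64))

    raw_frame = np.full((144, 176), 1.49)          # toy 176x144 depth frame
    print(float(correct_depth(raw_frame)[0, 0]))   # close to 1.50 after correction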

3 Range Image Processing and Multi-Sensor Fusion


Before using the range data from a depth camera, usually some pre-processing of the input data is required.


In the current generation, these sensors provide noise-contaminated range data of comparably low image resolution (e.g., only up to 176×144 for the Swiss Ranger 4000). To remove outliers caused by random noise, a bilateral filter is typically used to refine the range data[16]. To upsample the resolution of depth camera data, most approaches build on the assumption that depth discontinuities are often related to color changes in the corresponding color image. In [17], a Markov random field (MRF) was first designed based on the low resolution depth maps and the high resolution camera images. Unfortunately, this method gives promising spatial resolution enhancement only up to a factor of 10. Yang et al.[18] then presented a method that models a cost volume of depth probability and iteratively applies a bilateral filter[16] to refine the cost volume. Another recent method[2] utilized depth maps exclusively, without the aid of color images: a sequence of low resolution depth maps of the same scene is aligned and then merged to obtain a single depth map with improved resolution, but this method is restricted to the acquisition of static scenes. We therefore presented a simple pipeline[19] to enhance the quality as well as improve the spatial and depth resolution of range data in real time (see Fig.3). Similarly, by using information from one or more additional high resolution vision cameras, Tian et al.[20] considered the problem of upsampling a low resolution depth map generated by a range camera to provide an accurate high resolution depth map from the viewpoint of one of the vision cameras.
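A minimal sketch of color-guided (joint bilateral) depth upsampling in the spirit of the approaches above is given below; the window size, the Gaussian parameters and the array layouts are illustrative assumptions, not values taken from any of the cited methods. The guide image is assumed to be a floating-point RGB array.

    # Sketch of joint bilateral depth upsampling guided by a high resolution
    # color image; all parameters and shapes are illustrative assumptions.
    import numpy as np

    def joint_bilateral_upsample(depth_lo, color_hi, scale, radius=2,
                                 sigma_s=1.0, sigma_c=10.0):
        """Upsample a low-res depth map using a high-res color image as guide."""
        h, w = color_hi.shape[:2]
        out = np.zeros((h, w))
        for y in range(h):
            for x in range(w):
                num = den = 0.0
                for dy in range(-radius, radius + 1):
                    for dx in range(-radius, radius + 1):
                        ys, xs = y // scale + dy, x // scale + dx
                        if not (0 <= ys < depth_lo.shape[0] and 0 <= xs < depth_lo.shape[1]):
                            continue
                        # Spatial closeness on the low-res grid, color similarity on the guide.
                        w_s = np.exp(-(dy * dy + dx * dx) / (2 * sigma_s ** 2))
                        dc = color_hi[y, x] - color_hi[min(ys * scale, h - 1), min(xs * scale, w - 1)]
                        w_c = np.exp(-float(np.dot(dc, dc)) / (2 * sigma_c ** 2))
                        num += w_s * w_c * depth_lo[ys, xs]
                        den += w_s * w_c
                out[y, x] = num / den if den > 0 else depth_lo[y // scale, x // scale]
        return out

    # Usage (toy data): 4x upsample a 36x44 depth map guided by a 144x176 color image.
    depth_lo = np.random.uniform(0.5, 3.0, (36, 44))
    color_hi = np.random.uniform(0.0, 255.0, (144, 176, 3))
    depth_hi = joint_bilateral_upsample(depth_lo, color_hi, scale=4)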

Fig.3 Depth camera data denoising

From a practical point of view, a higher resolution is needed for color than for depth information.

Therefore, different combinations of high resolution video cameras and lower resolution depth cameras have been studied. Many researchers use a binocular combination of a depth camera with one[16] or several RGB cameras[21] to upsample the low resolution ToF data with high resolution color information. Such a fixed sensor combination makes it possible to compute the rigid 3D transformation between the optical centers of both sensors (extrinsic calibration) as well as the intrinsic camera parameters of each sensor. Utilizing these transformations, the 3D points provided by the depth camera are co-registered with the 2D image, so color information can be assigned to each 3D point. There are also a number of monocular systems, which combine a depth camera with a conventional image sensor. They have the advantage of making data fusion easier, but require more sophisticated optics, hardware and algorithms. The recently released Microsoft Kinect is a good example of a monocular 2D/3D camera aimed at video games. The device features an RGB camera and a depth sensor running proprietary software, which provides the capabilities of full-body 3D motion capture, facial recognition and voice recognition. Another research direction investigates combining depth cameras with classical stereo techniques. In [22], it was first shown that a ToF-stereo combination can greatly speed up the stereo algorithm while helping to manage textureless regions. A global data fusion algorithm that incorporates belief propagation for depth from stereo images and ToF depth data was proposed in [10]; the two depth estimates are combined with an MRF to obtain a fused, superior depth map. Readers interested in more technical details may refer to [23], where the authors built a hybrid camera system composed of a stereoscopic camera and a time-of-flight depth camera to generate high-quality and high-resolution video-plus-depth. A recent technique[24] for improving the accuracy of range maps measured by ToF cameras is based on the observation that the range map and intensity image are not independent but are linked by the shading constraint: if the reflectance properties of the surface are known, a certain range map implies a corresponding intensity image (see Fig.4). The main limitation of this method is that it does not cope well with range discontinuities, but this may be overcome by ignoring any mesh triangle that straddles a range discontinuity.
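To illustrate the extrinsic/intrinsic co-registration step described above, the sketch below maps a single depth pixel into the color image; all intrinsics and the rigid transform are made-up example values, not parameters of any real device.

    # Sketch of assigning color to a depth pixel via calibrated intrinsics and
    # a rigid depth-to-color transform; all numbers are illustrative only.
    import numpy as np

    K_d = np.array([[580.0, 0.0,  88.0],   # depth camera intrinsics (toy values)
                    [0.0, 580.0,  72.0],
                    [0.0,   0.0,   1.0]])
    K_c = np.array([[1050.0, 0.0, 320.0],  # color camera intrinsics (toy values)
                    [0.0, 1050.0, 240.0],
                    [0.0,    0.0,   1.0]])
    R = np.eye(3)                          # rotation from depth to color frame
    t = np.array([0.025, 0.0, 0.0])        # translation in meters

    def color_for_depth_pixel(u, v, z, color_img):
        """Back-project (u, v, z), move it into the color frame, sample the color image."""
        p_depth = z * (np.linalg.inv(K_d) @ np.array([u, v, 1.0]))   # 3D point, depth frame
        p_color = R @ p_depth + t                                    # 3D point, color frame
        proj = K_c @ p_color
        uc, vc = int(round(proj[0] / proj[2])), int(round(proj[1] / proj[2]))
        h, w = color_img.shape[:2]
        return color_img[vc, uc] if (0 <= uc < w and 0 <= vc < h) else None

    # Usage: look up the color of the central depth pixel at 1.5 m.
    rgb = color_for_depth_pixel(88, 72, 1.5, np.zeros((480, 640, 3)))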


Fig.4 3D reconstruction of a human face using shading constraint

Fig.5 3D reconstruction based on depth camera

4 Applications of Depth Camera

4.1 Geometry Extraction and 3D Reconstruction


Depth cameras typically record their surroundings at high frame rates, e.g., up to 30 f/s for the Microsoft Kinect. Thus, these sensors are especially well suited for directly capturing 3D scene geometry in static and even dynamic environments. A 3D map of the environment can be captured by sweeping the depth camera and registering all scene geometry into a consistent reference coordinate system[25]. Kim et al.[26] have proposed an integrated multi-view sensor fusion approach that combines information from multiple color cameras and multiple ToF depth sensors. They first combined multi-view ToF sensor measurements to obtain a coarse but complete model, and then refined the initial model by means of a probabilistic multi-view fusion framework, optimizing over an energy function that aggregates ToF depth sensor information with multi-view stereo and silhouette constraints. Fig.5 (a) and (b) show a sample acquired with this kind of approach. For high quality 3D reconstruction, Fuchs et al.[27] investigated how well the known 3D geometry of a cube could be reconstructed from ToF sensor measurements.
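A minimal sketch of the sweep-and-register idea above is given below, assuming known camera intrinsics and a known pose for each sweep position; all values are toy examples rather than parameters of any cited system.

    # Sketch: back-project a depth frame into a point cloud and move it into a
    # common reference frame using a known camera pose; values are toy examples.
    import numpy as np

    K = np.array([[580.0, 0.0, 320.0],
                  [0.0, 580.0, 240.0],
                  [0.0,   0.0,   1.0]])

    def depth_to_points(depth):
        """Back-project every valid depth pixel into camera-space 3D points."""
        v, u = np.nonzero(depth > 0)
        z = depth[v, u]
        pix = np.stack([u, v, np.ones_like(u)]).astype(np.float64)
        return (np.linalg.inv(K) @ (pix * z)).T           # (N, 3) array of points

    def register(points, R, t):
        """Transform camera-space points into the global reference coordinate system."""
        return points @ R.T + t

    # One synthetic 480x640 frame at 2 m depth, camera shifted 0.5 m along x.
    frame = np.full((480, 640), 2.0)
    cloud = register(depth_to_points(frame), np.eye(3), np.array([0.5, 0.0, 0.0]))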

Guan et al.[28] presented a system that combines multiple ToF cameras with a set of video cameras to simultaneously reconstruct dynamic 3D objects from shape-from-silhouette and range data. After defining sensing models for each type of sensor, they solved the reconstruction problem robustly by using Bayesian inference. A probabilistic ad hoc fusion algorithm[29-30] was then derived in order to obtain relatively high quality 3D reconstruction results from the information of both the ToF camera and the stereo pair. According to the experimental results, this ad hoc fusion algorithm led to a very accurate calibration suitable for the fusion algorithm, which, in turn, allowed precise extraction of the depth information. On the other hand, the low resolution, small field-of-view scans of a depth camera can be merged or aligned to exploit the additional information shared among them. Cui et al.[31] described a method for 3D object scanning by aligning depth scans taken from around an object with a time-of-flight camera (see Fig.5 (c)); this easy-to-use approach makes depth cameras appealing for 3D object reconstruction. High quality 3D reconstruction can also be achieved by utilizing a structure from motion (SFM) approach[32-33]. The inherent problem of SFM, however, is that no metric scale is available.


This can be solved by the metric properties of the depth measurements[34]. Thus, the SFM approach allows metric scenes to be reconstructed with high resolution at interactive rates, e.g., for 3D map building and navigation[35]. Since color and depth can be obtained simultaneously, free viewpoint rendering is easily incorporated using depth compensated warping[36]. We also proposed a 3D reconstruction method for non-rigid objects using one depth camera[37], and then extended this method to scan hairstyles[38] (see Fig.6).

Fig.6 Hairstyle scanning using one depth camera

Simultaneous reconstruction of a scene with a wide field of view and dynamic scene analysis can be accomplished by combining a depth/color camera pair on a computer-driven pan-tilt unit and scanning the environment in a controlled manner. When scanning the scene, a 3D panorama can be obtained by stitching both the depth and the color images into a common cylindrical or spherical panorama. Therefore, from the center point given by the position of the pan-tilt unit, a 3D environment model can be reconstructed in a preparation phase. Dynamic 3D scene content, like person movements, can then be acquired online by adaptive object tracking with the camera head[39].

4.2 Human-Oriented Analysis

A number of human-oriented applications based on depth cameras have appeared in the last few years. For example, ToF camera systems can be successfully used to detect the respiratory motion of human subjects[40]. A typical example is emission tomography, where respiratory motion may be the main reason for image quality degradation. In such cases, ToF camera systems can detect the three-dimensional, markerless respiratory motion in real time with an accuracy of 0.1 mm, which is clearly competitive with other image based approaches[41]. A further paper[42] used ToF cameras to monitor respiration during sleep and detect sleep apnea. ToF cameras have also been reported in [43] to perform facial identification from single views of real depth images acquired with an off-the-shelf 3D time-of-flight depth camera. Some medical applications, such as cancer treatment, require repositioning the patient to a previously defined position; depth cameras have been used in this situation by segmenting the patient's body and performing a rigid 3D-3D surface registration[44]. Also, in an iris capturing scenario, it has been reported[45] that a depth sensor (see Fig.7 (a)) was used in an iris deblurring algorithm, improving the robustness and non-intrusiveness of iris capture.

Fig.7 Human-oriented applications using depth camera
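As a rough, hedged illustration of the markerless respiration measurement idea above: the mean depth of a chest region rises and falls with breathing, so a simple per-frame average already yields a usable respiratory signal. The region of interest and the displacement amplitude below are made-up examples, not values from the cited work.

    # Sketch of a respiration signal from a stream of depth frames; the chest
    # region of interest and the synthetic displacement are assumed examples.
    import numpy as np

    CHEST_ROI = (slice(100, 160), slice(140, 200))

    def respiration_signal(depth_frames):
        """Mean chest-region depth per frame; its oscillation tracks the breathing cycle."""
        return np.array([float(np.mean(frame[CHEST_ROI])) for frame in depth_frames])

    # Synthetic test: 60 frames at 30 f/s with a 5 mm sinusoidal chest motion.
    times = np.arange(60) / 30.0
    frames = []
    for ti in times:
        frame = np.full((240, 320), 1.5)
        frame[CHEST_ROI] -= 0.005 * np.sin(2 * np.pi * 0.25 * ti)
        frames.append(frame)
    signal = respiration_signal(frames)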

Depth cameras are also useful in motion detection. In [46], Liao et al. first utilized a single depth camera to reconstruct complete 3D deformable models (e.g., the human body) over time, provided that most parts of the model are observed by the camera at least once. Unlike well-studied structure from motion methods, their approach can tackle time-varying objects deforming arbitrarily but predictably. Acting as a touch sensor, depth cameras have also been used to detect touch on a tabletop[47]. Automatic detection and pose estimation of humans is an important task in human computer interaction (HCI). In [48], Jain and Subramanian presented a model based approach for detecting and estimating human pose by fusing depth and RGB color data from a monocular view.

A further study was released by Ganapathi et al. in [49], where they derive an efficient filtering algorithm for markerlessly tracking human pose in real time, using a stream of monocular depth images (see Fig.7 (b)). The key idea of their approach is to combine an accurate generative model, which is achievable using programmable graphics hardware, with a discriminative model that feeds data-driven evidence about body part locations. Since the accurate real-time tracking of humans and other articulated bodies has enticed researchers for many years, their work opens a new door for a large number of useful applications. Most recently, Shotton et al.[50] proposed a new method to quickly and accurately predict the 3D positions of body joints from a single depth image, using no temporal information. By breaking the whole skeleton into parts, their system can run at 200 f/s on consumer hardware (i.e., the Microsoft Kinect), while achieving state of the art accuracy.

4.3 User Interaction and User Tracking

Depth cameras have an obvious potential for interactive systems such as alternative input devices, games, animated avatars, etc. In early work[51], Oggier et al. used a ToF camera to track the hand and thereby allow touch-free interaction with a large virtual interactive screen. Soutschek et al.[52] then presented a similar application for touch-free navigation in 3D medical visualization. User interaction often requires an image matting operation, which extracts an object of interest by recovering its per-pixel opacity against the background. More recently, Zhu et al.[53] proposed an automatic matting technique that combines a ToF camera with a stereo pair. The key idea of their method is to fuse information from the ToF sensor and the stereo cameras to jointly and iteratively optimize the depth map and the alpha matte.

Fig.8 shows the overview of their method; for more details we refer the reader to [53].

Fig.8 Overview of the algorithm in [53]

Recently, many works have considered the application of depth cameras to user tracking and man-machine interaction. Tracking people in a smart room, i.e., a multi-modal environment where the audible and visible actions of the people inside are recorded and analyzed automatically, can benefit from the use of ToF sensors[22]. A different tracking approach has been discussed in [54], where only one ToF sensor is utilized to observe a scene at an oblique angle. As for tracking non-rigid objects, in particular human faces, Cai et al.[55] proposed a regularized maximum likelihood deformable model fitting (DMF) algorithm for 3D face tracking with a commodity depth camera. They regularized the noisy depth data in the ICP framework by using a novel l1 regularization scheme. Fig.9 demonstrates some tracking results using their algorithm.

Fig.9 Example tracking results using the algorithm in [55]


Depth cameras have also been successfully used for real-time markerless three-dimensional interaction by detecting hand gestures and movements[56]. Holte et al.[57] first used depth cameras for gesture recognition, where only range data were used. In [58], motion was detected using band-pass filtered difference range images; they then extended this to full body gesture recognition using spherical harmonics[59]. The incorporation of depth cameras in automotive applications has been studied in [59], where a system design for parking assist and backup was presented. Swadzba et al.[60] also used ToF sensors to set up an environment model and localize human interaction partners; this was accomplished by tracking 3D points using an optical flow approach and a weak object model with a cylindrical shape. In [61], a system to control an industrial robot by gestures was described. Most recently, it has been reported[62] that depth cameras are used for tracking fast-moving cars in outdoor environments; the approach is flexible and can generate depth maps with increased spatial and temporal resolution for both static and dynamic scenes. For the purpose of developing natural means of controlling humanoid robots, Halit et al.[63] presented a new humanoid robot control and interaction interface that uses depth images and skeletal tracking software to control the navigation, gaze and arm gestures of a humanoid robot.
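Below is a much simplified sketch of motion detection from difference range images, in the spirit of the band-pass filtered differencing mentioned above; plain frame differencing with an assumed 5 cm threshold stands in for the band-pass filter of the cited method.

    # Sketch: mark pixels whose depth changed between consecutive range images.
    import numpy as np

    def motion_mask(depth_prev, depth_curr, thresh=0.05):
        """True where the depth changed by more than `thresh` meters."""
        valid = (depth_prev > 0) & (depth_curr > 0)     # ignore invalid returns
        return valid & (np.abs(depth_curr - depth_prev) > thresh)

    # Example: a region moving 20 cm toward the camera is flagged as motion.
    prev = np.full((240, 320), 1.20)
    curr = prev.copy()
    curr[100:140, 150:190] = 1.00
    print(int(motion_mask(prev, curr).sum()), "moving pixels")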

5 Conclusion and Future Work

This paper presents a review of ongoing research on novel real-time depth cameras. These devices are still under active development, and only a few low-cost cameras (e.g., the Microsoft Kinect) are available. Depth cameras based on the time-of-flight principle measure the time delay between the emission of a light pulse and the arrival of its reflection. An alternative approach is based on light coding, projecting a known infrared pattern onto the scene and determining depth from the pattern's deformation captured by an infrared CMOS imager. From the application perspective, these devices exhibit a large number of specific effects that must be considered; therefore, calibration is needed as a pre-processing step. Typically, many approaches combine depth cameras with high resolution grayscale or RGB sensors to construct a simple yet efficient multi-modal system that delivers high resolution intensity and low resolution range data in real time.

Since depth cameras can capture depth maps in real time, it is natural to use these cameras to reconstruct object geometry. However, depth cameras deliver rather inaccurate distance measurements compared with, for example, laser range scanners. The field of user interaction and user tracking has been studied intensively in the last two years, resulting in a number of significant improvements in robustness and functionality based on the incorporation of depth cameras. The incorporation of depth cameras in human-oriented applications has also become a new trend, attracting many researchers in areas such as medical diagnosis and biometric authentication. Looking into the future, we believe that the growing interest in depth camera technology, the ongoing development of sensor hardware, and the increasing amount of related work on real-time range data processing algorithms will soon result in further solutions to the problems discussed here and extend the applications of depth cameras.

References:
[1] Xu Z, Schwarte R, Heinol H, et al. Smart pixel photonic mixer device (PMD)[C]//Proceedings of the International Conference on Mechatronics and Machine Vision in Practice (M2VIP), Nanjing, China, 1998: 259-264.
[2] Schuon S, Theobalt C, Davis J, et al. High-quality scanning using time-of-flight depth superresolution[C]//Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition Workshop on Time-of-Flight Computer Vision (CVPRW 08), Anchorage, Alaska, USA, June 23-28, 2008.
[3] Iddan G J, Yahav G. 3D imaging in the studio and elsewhere[J]. Proceedings of SPIE, 2001, 4298: 48-55.
[4] Medina A, Gay F, Pozo F. Compact laser radar and three-dimensional camera[J]. Journal of the Optical Society of America A, Optics, Image Science and Vision, 2006, 23(4): 800-805.
[5] Lindner M, Kolb A. Lateral and depth calibration of PMD-distance sensors[C]//Proceedings of the 2nd International Symposium on Visual Computing (ISVC 06), Lake Tahoe, Nevada, USA, Nov 6-8, 2006: 524-533.
[6] Beder C, Koch R. Calibration of focal length and 3D pose based on the reflectance and depth image of a planar object[J]. International Journal of Intelligent Systems Technologies and Applications: Issue on Dynamic 3D Imaging, 2008, 5(3/4): 285-294.


[7] Kahlmann T, Remondino F, Guillaume S. Range imaging technology: new developments and applications for people identification and tracking[J]. Proceedings of SPIE, 2007, 6491: 112.
[8] Lindner M, Kolb A. Calibration of the intensity-related distance error of the PMD ToF-camera[J]. Proceedings of SPIE, 2007, 6764: 35.
[9] Kuhnert K D, Stommel M. Fusion of stereo camera and PMD-camera data for real-time suited precise 3D environment reconstruction[C]//Proceedings of the 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 06), Beijing, China, 2007: 4780-4785.
[10] Zhu J J, Wang L, Yang R G, et al. Reliability fusion of time-of-flight depth and stereo for high quality depth maps[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2010.
[11] Rapp H. Experimental and theoretical investigation of correlating ToF-camera systems[D]. University of Heidelberg, Germany, 2007.
[12] Chan D, Buisman H, Theobalt C, et al. A noise-aware filter for real-time depth upsampling[C]//Proceedings of the Workshop on Multi-camera and Multi-modal Sensor Fusion Algorithms and Applications (M2SFA2 08), Marseille, France, October 12-18, 2008: 112.
[13] Falie D, Buzuloiu V. Distance errors correction for the time of flight (ToF) cameras[C]//Proceedings of European Conference on Circuits and Systems for Communications (ECCSC 08), Bucharest, Romania, July 10-11, 2008: 193-196.
[14] Falie D. 3D image correction for time of flight (ToF) cameras[J]. Proceedings of SPIE, 2008, 7156: 133.
[15] Tai Y W, Kong N, Lin S, et al. Coded exposure imaging for projective motion deblurring[C]//Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR 10), San Francisco, USA, June 13-18, 2010: 1-8.
[16] Huhle B, Schairer T, Jenke P, et al. Robust non-local denoising of colored depth data[C]//Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition Workshop on Time of Flight Camera Based Computer Vision (CVPRW 08), Anchorage, Alaska, USA, June 23-28, 2008.
[17] Diebel J, Thrun S. An application of Markov random fields to range sensing[C]//Proceedings of the Conference on Neural Information Processing Systems (NIPS 05), Vancouver, British Columbia, Canada, December 5-8, 2005.
[18] Yang Q X, Yang R G, Davis J, et al. Spatial-depth super resolution for range images[C]//Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR 07), Minneapolis, Minnesota, USA, June 18-23, 2007: 1-8.
[19] Xiang X Q, Li G X, Pan Z G, et al. Real-time spatial and depth upsampling for range data[J]. LNCS Transactions on Computational Science, 2011, 6670: 78-98.
[20] Tian C, Vaishampayan V, Zhang Y F. Upsampling range camera depth maps using high-resolution vision camera and pixel-level confidence classification[J]. Proceedings of SPIE, 2011, 7863.
[21] Guomundsson S A, Larsen R, Aanaes H, et al. ToF imaging in smart room environments towards improved people tracking[C]//Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition Workshop on Time of Flight Camera Based Computer Vision (CVPRW 08), Anchorage, Alaska, USA, June 23-28, 2008.
[22] Gudmundsson S A, Aanaes H, Larsen R. Fusion of stereo vision and time-of-flight imaging for improved 3D estimation[J]. International Journal of Intelligent Systems Technologies and Applications: Issue on Dynamic 3D Imaging, 2008, 5(3/4): 425-433.
[23] Kim S Y, Koschan A, Mongi A A, et al. Three-dimensional video contents exploitation in depth camera-based hybrid camera system[M]//Signals and Communication Technology, High-Quality Visual Experience. [S.l.]: Springer, 2010: 349-369.


[24] Böhme M, Haker M, Martinetz T, et al. Shading constraint improves accuracy of time-of-flight measurements[C]//Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR 08), Anchorage, Alaska, USA, June 24-26, 2008: 1-8.
[25] Huhle B, Jenke P, Strasser W. On-the-fly scene acquisition with a handy multisensory system[J]. International Journal of Intelligent Systems Technologies and Applications: Issue on Dynamic 3D Imaging, 2008, 5(3/4): 255-263.
[26] Kim Y M, Theobalt C, Diebel J, et al. Multi-view image and ToF sensor fusion for dense 3D reconstruction[C]//Proceedings of the IEEE Workshop on 3-D Digital Imaging and Modeling (3DIM 09), Kyoto, Japan, October 3-4, 2009: 1542-1549.
[27] Fuchs S, May S. Calibration and registration for precise surface reconstruction[C]//Proceedings of the Dynamic 3D Imaging Workshop in Conjunction with DAGM (Dyn3D), Heidelberg, Germany, September 2007.
[28] Guan L, Franco J S, Pollefeys M. 3D object reconstruction with heterogeneous sensor data[C]//Proceedings of the 4th International Symposium on 3D Data Processing, Visualization and Transmission (3DPVT 08), Atlanta, USA, June 18-20, 2008: 295-302.
[29] Mutto C D, Zanuttigh P, Cortelazzo G M. Accurate 3D reconstruction by stereo and ToF data fusion[C]//Proceedings of the GTTI Meeting 2010, Brescia, Italy, May 2010.
[30] Mutto C D, Zanuttigh P, Cortelazzo G M. A probabilistic approach to ToF and stereo data fusion[C]//Proceedings of the 5th International Symposium on 3D Data Processing, Visualization and Transmission (3DPVT 10), Paris, France, May 17-20, 2010.
[31] Cui Y, Schuon S, Chan D, et al. 3D shape scanning with a time-of-flight camera[C]//Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR 10), San Francisco, USA, June 13-18, 2010: 1-8.
[32] Bartczak B, Koeser K, Woelk F, et al. Extraction of 3D freeform surfaces as visual landmarks for real-time tracking[J]. Journal of Real-Time Image Processing, 2007, 2(2/3): 81-101.
[33] Koeser K, Bartczak B, Koch R. Robust GPU-assisted camera tracking using free-form surface models[J]. Journal of Real-Time Image Processing, 2007, 2(2/3): 133-147.
[34] Streckel B, Bartczak B, Koch R, et al. Supporting structure from motion with a 3D range-camera[C]//Lecture Notes in Computer Science 4522: Proceedings of the 15th Scandinavian Conference on Image Analysis. Berlin, Heidelberg: Springer-Verlag, 2007: 233-242.
[35] Prusak A, Melnychuk O, Roth H, et al. Pose estimation and map building with a PMD-camera for robot navigation[J]. International Journal of Intelligent Systems Technologies and Applications: Issue on Dynamic 3D Imaging, 2008, 5(3/4): 355-364.
[36] Koch R, Evers-Senne J. View synthesis and rendering methods[M]//3D Video Communication: Algorithms, Concepts and Real-time Systems in Human Centered Communication. [S.l.]: Wiley, 2005: 151-174.
[37] Tong J, Xiang X Q, Pan Z G, et al. 3D reconstruction of non-rigid shapes using one ToF camera[J]. Journal of Computer-Aided Design & Computer Graphics, 2011, 23(3): 377-384.
[38] Tong J, Zhang M M, Xiang X Q, et al. 3D body scanning with hairstyle using one time-of-flight camera[J]. Journal of Computer Animation and Virtual Worlds, 2011, 22(2/3): 203-211.
[39] Bartczak B, Schiller I, Beder C, et al. Integration of a time-of-flight camera into a mixed reality system for handling dynamic scenes, moving viewpoints and occlusions in real-time[C]//Proceedings of the 4th International Symposium on 3D Data Processing, Visualization and Transmission (3DPVT 08), Atlanta, USA, June 18-20, 2008: 295-302.
[40] Schaller C, Penne J, Hornegger J. Time-of-flight sensor for respiratory motion gating[J]. Medical Physics, 2008, 35(7): 3090-3093.
[41] Penne J, Schaller C, Hornegger J, et al. Robust real-time 3D respiratory motion detection using time-of-flight cameras[J]. International Journal of Computer Assisted Radiology and Surgery, 2008, 3(5): 427-431.


[42] Falie D, Ichim M, David L. Respiratory motion visualization and the sleep apnea diagnosis with the time of flight (ToF) camera[C]//Proceedings of International Conference on Visualization Imaging and Simulation (VIS 08), Bucharest, Romania, November 7-9, 2008: 179-184.
[43] Ding H, Moutarde F, Shaiek A. 3D object recognition and facial identification using time averaged single-views from time-of-flight 3D depth-camera[C]//Proceedings of the Eurographics Workshop on 3D Object Retrieval, Norrköping, Sweden, May 3-7, 2010.
[44] Adelt A, Schaller C, Penne J. Patient positioning using 3D surface registration[C]//Proceedings of the Russian-Bavarian Conference on Biomedical Engineering, Moscow, Russia, July 8-9, 2008: 202-207.
[45] Huang X Y, Ren L, Yang R G. Image deblurring for less intrusive iris capture[C]//Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR 09), Miami, Florida, USA, June 20-26, 2009: 1558-1565.
[46] Liao M, Zhang Q, Wang H M, et al. Modeling deformable objects from a single depth camera[C]//Proceedings of the IEEE International Conference on Computer Vision (ICCV 09), Kyoto, Japan, September 27 - October 4, 2009: 167-174.
[47] Wilson A. Using a depth camera as a touch sensor[C]//Proceedings of the ACM International Conference on Interactive Tabletops and Surfaces (ITS 10), Saarbrucken, Germany, November 7-10, 2010: 69-72.
[48] Jain H P, Subramanian A. Real-time upper-body human pose estimation using a depth camera, HPL-2010-190[R]. 2010.
[49] Ganapathi V, Plagemann C, Koller D, et al. Real time motion capture using a single time-of-flight camera[C]//Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR 10), San Francisco, USA, June 13-18, 2010: 755-762.
[50] Shotton J, Fitzgibbon A, Cook M, et al. Real-time human pose recognition in parts from single depth images[C]//Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR 11), Colorado Springs, USA, June 20-25, 2011.
[51] Oggier T, Büttgen B, Lustenberger F, et al. SwissRanger SR3000 and first experiences based on miniaturized 3D ToF cameras[C]//Proceedings of the 1st Range Imaging Research Day at ETH Zurich, 2005: 97-108.
[52] Soutschek S, Penne J, Hornegger J, et al. 3D gesture-based scene navigation in medical imaging applications using time-of-flight cameras[C]//Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition Workshop on Time of Flight Camera Based Computer Vision (CVPRW 08), Anchorage, Alaska, USA, June 23-28, 2008.
[53] Zhu J J, Wang L, Yang R G, et al. Reliability joint depth and alpha matte optimization via fusion of stereo and time-of-flight sensor[C]//Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR 09), Miami, Florida, USA, June 20-26, 2009.
[54] Hansen D W, Hansen M S, Kirschmeyer M, et al. Cluster tracking with time-of-flight cameras[C]//Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition Workshop on Time of Flight Camera Based Computer Vision (CVPRW 08), Anchorage, Alaska, USA, June 23-28, 2008.
[55] Cai Q, Gallup D, Zhang C, et al. 3D deformable face tracking with a commodity depth camera[C]//Proceedings of the 11th European Conference on Computer Vision (ECCV 10), Crete, Greece, September 5-11, 2010: 229-242.
[56] Penne J, Soutschek S, Fedorowicz L, et al. Robust real-time 3D time-of-flight based gesture navigation[C]//Proceedings of the IEEE International Conference on Automatic Face & Gesture Recognition (FG 2008), Amsterdam, Netherlands, September 17-19, 2008.
[57] Holte M, Moeslund T. View invariant gesture recognition using the CSEM SwissRanger SR-2 camera[J]. International Journal of Intelligent Systems Technologies and Applications: Issue on Dynamic 3D Imaging, 2008, 5(3/4): 295-303.


[58] Holte M, Moeslund T, Fihl P. Fusion of range and intensity information for view invariant gesture recognition[C]//Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition Workshop on Time of Flight Camera Based Computer Vision (CVPRW 08), Anchorage, Alaska, USA, June 23-28, 2008.
[59] Acharya S, Tracey C, Rafii A. System design of time-of-flight range camera for car park assist and backup applications[C]//Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition Workshop on Time of Flight Camera Based Computer Vision (CVPRW 08), Anchorage, Alaska, USA, June 23-28, 2008.
[60] Swadzba A, Beuter N, Schmidt J, et al. Tracking objects in 6D for reconstructing static scene[C]//Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition Workshop on Time of Flight Camera Based Computer Vision (CVPRW 08), Anchorage, Alaska, USA, June 23-28, 2008.
[61] Ghobadi S E, Loepprich O E, Ahmadov F, et al. Real time hand based robot control using 2D/3D images[C]//Proceedings of the International Symposium on Advances in Visual Computing (ISVC 08), Las Vegas, Nevada, USA, December 1-3, 2008: 307-316.
[62] Dolson J, Baek J, Plagemann C, et al. Upsampling range data in dynamic environments[C]//Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR 10), San Francisco, USA, June 13-18, 2010: 1141-1148.
[63] Halit B S, Sonia C. Humanoid robot control using depth camera[C]//Proceedings of the 6th International Conference on Human-Robot Interaction (HRI 11), EPFL, Lausanne, Switzerland, March 6-9, 2011: 401-402.

XIANG Xueqin was born in 1981. He is a Ph.D. candidate at Zhejiang University. His research interests include computer vision and depth camera, etc.

PAN Zhigeng was born in 1965. He is a professor and doctoral supervisor at Zhejiang University, and a senior member of CCF. His research interests include virtual reality, human animation, human-computer interaction and edutainment, etc.

TONG Jing was born in 1981. He is a Ph.D. candidate at Zhejiang University. His research interests include computer graphics and 3D animation, etc.
