
VISVESVARAYA TECHNOLOGICAL UNIVERSITY

Jnana Sangama, Belgaum-590014, Karnataka, INDIA

Seminar Report on "On the Recovery of Depth from a Single Defocused Image"

Submitted in partial fulfillment of the requirements for the VIII Semester Bachelor of Engineering
IN

COMPUTER SCIENCE AND ENGINEERING

For the Academic Year 2013-2014


BY

Supriya Pramod Deshpande (1PE10CS099)

Guided By

Ms Rakhi Mittal Rathor
Assistant Professor, Dept. of CSE
PESIT Bangalore South Campus

Department of Computer Science and Engineering

PESIT Bangalore South Campus


HOSUR ROAD BANGALORE-560100


Department of Computer Science and Engineering

CERTIFICATE
This is to certify that the seminar entitled "On the Recovery of Depth from a Single Defocused Image" is a bona fide work carried out by Supriya Pramod Deshpande, bearing register number 1PE10CS099, in partial fulfillment for the award of the degree of Bachelor of Engineering in Computer Science and Engineering of Visvesvaraya Technological University, Belgaum, during the year 2013-2014.

Seminar Guide

Head of the Dept

Ms Rakhi Mittal Rathor


Assistant Professor, Department of CSE PESIT Bangalore South Campus Bangalore-100

Dr. Srikanta Murthy K


HOD, Department of CSE PESIT Bangalore South Campus Bangalore-100

ACKNOWLEDGEMENT

The satisfaction that accompanies the successful completion of any task would be incomplete without mentioning the people who made it possible, and without whose constant guidance and encouragement it would never have been achieved.

I would like to thank the Principal, Dr. J Surya Prasad, for providing me the necessary facilities to carry out my seminar work. I express my sincere gratitude to Dr. Srikanta Murthy, HOD, Computer Science and Engineering, for his support and encouragement during my work.

With a profound sense of gratitude, I acknowledge the guidance and support extended by Ms Rakhi Mittal Rathor, Assistant Professor, Dept. of Computer Science and Engineering. Her incessant encouragement and invaluable technical support have been of immense help in realizing this seminar. Her guidance gave me the environment to enhance my knowledge and skills with sheer determination, dedication and hard work.

I also extend my thanks to all the faculty members of the Department of Computer Science & Engineering, PESIT BSC, who have encouraged me throughout the course of my Bachelor of Engineering.

Supriya Pramod Deshpande

Department Of CSE

2013-2014

ABSTRACT
Depth recovery from images is an important application in robotics and 3D reconstruction. In this paper, the challenging problem of recovering the depth from a single defocused image is addressed. A simple yet effective approach to estimate the amount of spatially varying defocus blur at edge locations is presented. The input defocused image is re-blurred using a Gaussian kernel and the defocus blur amount can be obtained from the gradient ratio between the input and re-blurred images. By propagating the blur amount at edge locations to the entire image, the full depth map of the scene can be recovered. Experimental results on synthetic and real images demonstrate the effectiveness of this method in providing a reliable estimation of the depth of a scene.

Department Of CSE

ii

2013-2014

CONTENTS
Sl. No.   Title                              Page No.
1         Introduction                       1
2         Applications                       3
2.1       Robotics                           3
2.2       3D Reconstruction                  4
2.3       Object Recognition                 4
3         Existing Work                      5
3.1       Stereo Vision                      5
3.2       Structure from motion (SFM)        5
3.3       Depth from focus (DFF)             5
3.4       Depth from defocus (DFD)           6
3.5       Coded Aperture method              6
4         Proposed Methodology               7
4.1       Defocus Model                      8
4.2       Blur Estimation                    9
4.3       Depth Interpolation                10
5         Experimentation                    11
6         Ambiguities                        13
7         Conclusion                         14
8         References                         15


LIST OF FIGURES

Fig. No   Title                                                            Page No
1.1       Categories for Depth estimation                                  1
1.2       The depth recovery result of the book image                      2
4.1       The depth recovery result                                        7
4.1.1     (a) A thin lens model. (b) The diameter of CoC                   8
4.2.1     The blur estimation approach                                     9
5.1       The depth recovery results of flower and building images         12
5.2       Comparison of the method used and the inverse diffusion method   13
6.1       Ambiguity: The depth recovery result of the photo frame image    14


On the Recovery of Depth from a Single Defocused Image

1.

INTRODUCTION

Depth recovery plays an important role in many computer vision and computer graphics applications including robotics, 3D reconstruction, image de-blurring and refocusing. In principle, depth can be recovered either from monocular cues (shading, shape, texture, motion) or from binocular cues (stereo correspondences). Conventional methods for depth estimation have relied on multiple images, for example Stereo Vision, Structure from Motion (SFM), Depth from Focus (DFF) and Depth from Defocus (DFD). These methods either suffer from the occlusion problem or cannot be applied to dynamic scenes.

Fig 1.1. Categories for Depth estimation

The essence of an image is a projection from a 3D scene onto a 2D plane, during which the depth is lost. The 3D point corresponding to a specific image point is constrained to lie on the line of sight, and its depth can only be recovered from such monocular or binocular cues.


In recent decades, the demand for 3D content for computer graphics, virtual reality and communication has grown substantially, triggering a change in emphasis for the requirements. Many existing systems for constructing 3D models are built around specialized hardware (e.g. stereo rigs), resulting in a high cost that cannot satisfy the requirements of these new applications. This gap stimulates the use of ordinary digital imaging facilities (like a camera). As 3D vision has grown to be a very important field today, we need better methods to recover the depth from images. Recent approaches have been proposed to recover depth from a single image in very specific settings. Several methods use active illumination to aid depth recovery by projecting structured patterns onto the scene. The depth is then measured by the attenuation of the projected light or the deformation of the projected pattern. The Coded Aperture Method is one such example.

Fig 1.2. The depth recovery result of the book image. (a) The input defocused image. (b) Recovered layered depth map. Larger intensity means larger blur amount and depth in all the depth maps presented in this paper.

In this paper the focus is on the more challenging problem of recovering the depth layers from a single defocused image captured by an uncalibrated conventional camera. The most closely related work, the inverse diffusion method, models the defocus blur as a diffusion process, uses the inhomogeneous reverse heat equation to obtain an estimate of the blur at edge locations, and then applies a graph-cut based method for inferring the depth of the scene. In contrast, this paper models the defocus blur as a 2D Gaussian blur. This work has three main contributions. Firstly, an efficient blur estimation method based on the gradient magnitude ratio is proposed; this method is robust to noise, inaccurate edge location and interference from nearby edges. Secondly, without any modification to the camera or use of additional illumination, the blur estimation method combined with MRF optimization can obtain the depth map of a scene using only a single defocused image captured by a conventional camera. As shown in Fig. 1.2, this method can extract a layered depth map of the scene with fairly good accuracy. Finally, two kinds of ambiguities in recovering depth from a single image using the defocus cue are discussed, one of which is usually overlooked.

2. APPLICATIONS
The demand for 3D content in today's market is growing across various image processing applications.

2.1 Robotics
Robotics is the branch of technology that deals with the design, construction, operation, and application of robots, as well as computer systems for their control, sensory feedback, and information processing. Computer vision is the science and technology of machines that see. As a scientific discipline, computer vision is concerned with the theory behind artificial systems that extract information from images. The image data can take many forms, such as video sequences and views from cameras. It is the field that includes methods for acquiring, processing, analyzing, and understanding images and, in general, high-dimensional data from the real world in order to produce numerical or symbolic information, e.g., in the forms of decisions. A theme in the development of this field has been to duplicate the abilities of human vision by electronically perceiving and understanding an image. This image understanding can be seen as the disentangling of symbolic information from image data using models constructed with the aid of
geometry, physics, statistics, and learning theory. Computer vision has also been described as the enterprise of automating and integrating a wide range of processes and representations for vision perception. In most practical computer vision applications, the computers are pre-programmed to solve a particular task, but methods based on learning are now becoming increasingly common. Applications range from industrial machine vision systems to research into artificial intelligence, as well as the design of artificial systems that mimic the processing and behavior of biological systems.

2.2 3D Reconstruction
3D reconstruction is the process of capturing the shape and appearance of real objects. In 3D computer graphics, 3D reconstruction involves data acquisition, 3D modeling and object reconstruction. Data acquisition can occur from a multitude of methods including 2D images, acquired sensor data and on site sensors. 3D modeling is the process of developing a mathematical representation of any 3D surface of object (either inanimate or living) called a 3D model which can be displayed as a 2D image through a process called 3D rendering or used in a computer simulation of physical phenomena. After the data has been collected, the acquired data from images or sensors needs to be reconstructed.

2.3 Object Recognition


Object recognition in computer vision is the task of finding and identifying objects in an image or video sequence. Humans recognize a multitude of objects in images with little effort, despite the fact that the image of an object may vary with viewpoint, appear at many different sizes and scales, or even be translated or rotated. Objects can be recognized even when they are partially obstructed from view. This task is still a challenge for computer vision systems. Object recognition is widely used in Optical Character Recognition (OCR), face detection, image watermarking, visual positioning and tracking, robotics, automated vehicle parking systems, etc.


3. EXISTING WORK
3.1. Stereo Vision
Stereo vision measures disparities between a pair of images of the same scene taken from two different viewpoints and uses the disparities to recover the depth. In traditional stereo vision, two cameras, displaced horizontally from one another, are used to obtain two differing views of a scene, in a manner similar to human binocular vision. By comparing these two images, the relative depth information can be obtained in the form of disparities, which are inversely proportional to the distances to the objects. To compare the images, the two views can be superimposed in a stereoscopic device, the image from the right camera being shown to the observer's right eye and the one from the left camera to the left eye.
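The inverse relation between disparity and distance can be sketched in a few lines. This is a minimal illustration of the standard pinhole-rig relation Z = f · B / d; the focal length and baseline values below are hypothetical:

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Convert a stereo disparity (pixels) to metric depth: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# A point with 20 px disparity, seen by a rig with f = 800 px and B = 0.1 m,
# lies at 800 * 0.1 / 20 = 4.0 m; doubling the disparity halves the depth.
print(depth_from_disparity(20, 800, 0.1))  # -> 4.0
```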

3.2. Structure from motion (SFM)


Structure from motion (SFM) is a range imaging technique; it refers to the process of estimating three-dimensional structures from two-dimensional image sequences, which may be coupled with local motion signals. It computes the correspondences between images to obtain the 2D motion field, which is then used to recover the 3D motion and the depth. It is studied in the fields of computer vision and visual perception. In biological vision, SFM refers to the phenomenon by which humans (and other living creatures) can recover 3D structure from the projected 2D (retinal) motion field of a moving object or scene.

3.3. Depth from focus (DFF)


Depth from focus (DFF) captures a set of images using multiple focus settings and measures the sharpness of the image at each pixel location. The sharpest pixel is selected to form an all-in-focus image, and the depth of a pixel depends on which image it is selected from.


DFF does not avoid the occlusion problem any more than triangulation techniques do, but it is more stable in the presence of such disruptions. The fundamental advantage of the DFF method is the two-dimensionality of the aperture, which allows more robust estimation.
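The select-the-sharpest-pixel idea above can be sketched as follows. This is only an illustration: the squared Laplacian response is one common sharpness measure, chosen here as an assumption (the text does not fix a particular measure), and the two-image "focal stack" is synthetic:

```python
import numpy as np

def depth_from_focus(stack):
    """stack: array of shape (K, H, W), one image per focus setting.
    Returns the per-pixel index of the sharpest image (a coarse depth
    label) and the corresponding all-in-focus composite."""
    sharpness = []
    for img in stack:
        # Squared Laplacian response as a simple local sharpness measure.
        lap = (np.roll(img, 1, 0) + np.roll(img, -1, 0) +
               np.roll(img, 1, 1) + np.roll(img, -1, 1) - 4 * img)
        sharpness.append(lap ** 2)
    sharpness = np.stack(sharpness)
    depth_idx = sharpness.argmax(axis=0)    # focus setting of max sharpness
    all_in_focus = np.take_along_axis(stack, depth_idx[None], axis=0)[0]
    return depth_idx, all_in_focus

# Example: a two-image stack where image 0 is sharp texture and image 1 is flat.
stack = np.stack([np.indices((4, 4)).sum(axis=0) % 2.0, np.full((4, 4), 0.5)])
depth_idx, aif = depth_from_focus(stack)
print(depth_idx)  # all zeros: every pixel is sharpest in image 0
```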

3.4. Depth from defocus (DFD)


Depth from defocus (DFD) requires a pair of images of the same scene taken with different focus settings. It estimates the degree of defocus blur, and the depth of the scene can be recovered provided the camera settings are known. DFD avoids the occlusion and matching ambiguity problems. It also shares the advantage of DFF, namely the two-dimensionality of the aperture, allowing a more robust estimation; hence it is more reliable.

3.5. Coded Aperture method


The coded aperture method changes the shape of the defocus blur kernel by inserting a customized mask into the camera lens, which makes the blur kernel more sensitive to depth variation. The depth is determined after a deconvolution process using a set of calibrated blur kernels. Using two coded apertures further enables depth recovery with greater fidelity and also yields a high-quality all-focused image, as the two apertures are found to complement each other in the scene frequencies.



4. PROPOSED METHODOLOGY
In this paper, the more challenging problem of recovering the relative depth from a single defocused image captured by an uncalibrated conventional camera is addressed. The defocus blur is modeled as a 2D Gaussian blur. The input image is re-blurred using a known Gaussian blur kernel and the gradient ratio between the input and re-blurred images is calculated. It is shown that the blur amount at edge locations can be derived from this ratio. The blur propagation problem is formulated as the minimization of a cost function that balances fidelity to the blur estimates at edge locations against piecewise depth smoothness. Solving this optimization problem yields a full depth map of the scene.

Fig 4.1 The depth recovery result of this method. (a) Input Image. (b) Depth Map. Larger intensity means larger depth in all depth maps presented in this paper.

An efficient blur estimation method based on the Gaussian gradient ratio is proposed, and it is shown to be robust to noise, inaccurate edge location and interference from neighboring edges. Without any modification to cameras or use of additional illumination, this method is able to obtain the depth map of a scene using only a single defocused image captured by a conventional camera. As shown in Fig. 4.1, this method can extract the depth map of the scene with fairly good accuracy.


4.1Defocus Model
As the amount of defocus blur is estimated at edge locations, we must model the edge first. We adopt the ideal step edge model

    f(x) = A u(x) + B,    (1)

where u(x) is the step function, and A and B are the amplitude and offset of the edge respectively. Note that the edge is located at x = 0. When an object is placed at the focus distance df, all the rays from a point of the object will converge to a single sensor point and the image will appear sharp. Rays from a point of another object at distance d will reach multiple sensor points and result in a blurred image. The blurred pattern depends on the shape of the aperture and is often called the circle of confusion (CoC). The diameter of the CoC characterizes the amount of defocus and can be written as

    c = ( |d − df| / d ) · f0² / ( N (df − f0) ),    (2)

where f0 and N are the focal length and the f-stop number of the camera respectively.
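The thin-lens CoC relation above can be evaluated directly. A minimal sketch, using the focus distance and focal length from Fig. 4.1.1; the f-stop value N = 2.8 is an illustrative assumption:

```python
def coc_diameter(d, d_f, f0, N):
    """Diameter of the circle of confusion (same units as f0) for an object
    at distance d, focus distance d_f, focal length f0, and f-stop N
    (thin-lens model): c = |d - d_f| / d * f0^2 / (N * (d_f - f0))."""
    return abs(d - d_f) / d * f0 ** 2 / (N * (d_f - f0))

# With d_f = 500 mm and f0 = 80 mm as in Fig. 4.1.1: an in-focus object
# gives c = 0, and c grows monotonically as the object moves away.
print(coc_diameter(500.0, 500.0, 80.0, 2.8))   # -> 0.0
print(coc_diameter(600.0, 500.0, 80.0, 2.8) <
      coc_diameter(1000.0, 500.0, 80.0, 2.8))  # -> True
```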

Fig 4.1.1 (a) A thin lens model. (b) The diameter of CoC c as a function of the object distance d and f-stop number N, given df = 500mm, f0 = 80mm.

Fig. 4.1.1 shows a thin lens model and how the diameter of the circle of confusion changes with d and N, given fixed f0 and df. As we can see, the diameter of the CoC c is a non-linear, monotonically increasing function of the object distance d. The defocus blur can be modeled as the convolution of a sharp image with the point spread function (PSF). The PSF can be approximated by a Gaussian function g(x, σ), where the standard deviation σ = kc is proportional to the diameter of the CoC c. σ is used as a measure of the depth of the scene. A blurred edge i(x) can then be represented as

    i(x) = f(x) ⊗ g(x, σ).    (3)

4.2 Blur Estimation

Fig. 4.2.1. The blur estimation approach: here, ⊗ and ∇ are the convolution and gradient operators respectively. The black dashed line denotes the edge location.

Fig. 4.2.1 shows the overview of the local blur estimation method. A step edge is re-blurred using a Gaussian function with known standard deviation. Then the ratio between the gradient magnitude of the step edge and that of its re-blurred version is calculated. The ratio is maximum at the edge location, and from this maximum value we can compute the amount of defocus blur of the edge. For convenience, the blur estimation algorithm is described first for the 1D case and then extended to 2D images. The gradient of the re-blurred edge is

    ∇i1(x) = ∇( i(x) ⊗ g(x, σ0) ) = A / √(2π(σ² + σ0²)) · exp( −x² / (2(σ² + σ0²)) ),    (4)

where σ0 is the standard deviation of the re-blur Gaussian function. We call it the re-blur scale. The gradient magnitude ratio between the original and re-blurred edges is

    R(x) = |∇i(x)| / |∇i1(x)| = √( (σ² + σ0²) / σ² ) · exp( −( x²/(2σ²) − x²/(2(σ² + σ0²)) ) ).    (5)

It can be proved that the ratio is maximum at the edge location (x = 0). The maximum value is given by

    R = √( (σ² + σ0²) / σ² ).    (6)

Examining (4) and (6), we notice that the edge gradient depends on both the edge amplitude A and the blur amount σ, while the maximum of the gradient magnitude ratio R eliminates the effect of the edge amplitude A and depends only on σ and σ0. Thus, given the maximum value R, we can calculate the unknown blur amount using

    σ = σ0 / √(R² − 1).    (7)

For blur estimation in 2D images, we use a 2D isotropic Gaussian function to perform the re-blur. As any direction of a 2D isotropic Gaussian function is a 1D Gaussian, the blur estimation is similar to the 1D case. Fig 4.2.2 shows the results of the blur estimation method.
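The 1D estimation procedure can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the kernel radius, the re-blur scale σ0 = 1, and the synthetic edge with true blur σ = 2 are illustrative choices:

```python
import numpy as np

def gauss_blur_1d(signal, sigma):
    """Convolve a 1D signal with a normalized Gaussian kernel (edge padding)."""
    radius = int(4 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x ** 2 / (2 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(signal, radius, mode="edge")
    return np.convolve(padded, kernel, mode="valid")

def estimate_blur(edge, sigma0=1.0):
    """Estimate the unknown edge blur sigma from the gradient magnitude ratio."""
    reblurred = gauss_blur_1d(edge, sigma0)
    g_orig = np.abs(np.gradient(edge))
    g_reblur = np.abs(np.gradient(reblurred))
    R = (g_orig / np.maximum(g_reblur, 1e-12)).max()  # ratio peaks at the edge
    return sigma0 / np.sqrt(R ** 2 - 1)               # sigma from R, as in Eq. (7)

# A step edge blurred with sigma = 2 should be recovered closely.
x = np.arange(-50, 50)
edge = gauss_blur_1d((x >= 0).astype(float), 2.0)
print(estimate_blur(edge))  # approximately 2.0 for this synthetic edge
```

Discretization of the Gaussian kernels introduces a small bias, so the recovered value is close to, but not exactly, the true blur.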

4.3 Depth Interpolation


After we obtain the depth estimates at edge locations, we need to propagate them from the edge locations to the regions that do not contain edges. We seek a regularized depth labeling which is smooth and close to the estimation in Eq. (7). We also prefer the depth discontinuities to be aligned with the image edges. Thus, we formulate this as an energy minimization over a discrete Markov Random Field (MRF) whose energy is given by

    E(σ̂) = Σ_i V_i(σ̂_i) + λ Σ_i Σ_{j ∈ N(i)} V_ij(σ̂_i, σ̂_j),    (8)


where each pixel in the image is a node of the MRF and λ balances the single-node potential V_i(σ̂_i) and the pairwise potential V_ij(σ̂_i, σ̂_j), which are defined as

    V_i(σ̂_i) = M(i) (σ̂_i − σ_i)²,    (9)
    V_ij(σ̂_i, σ̂_j) = w_ij (σ̂_i − σ̂_j)²,    (10)

where M(·) is a binary mask with non-zeros only at edge locations, and the weight w_ij = exp{ −(I(i) − I(j))² } encodes the difference between the neighboring colors I(i) and I(j). An 8-neighborhood system N(i) is adopted in our definition. FastPD is used to minimize the MRF energy defined above. FastPD guarantees an approximately optimal solution and is much faster than previous MRF optimization methods such as conventional graph cut techniques.
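On a toy 1D chain the same energy can be minimized by brute force, which makes the roles of the two potentials easy to see. This exhaustive search is only an illustration (the method itself uses FastPD), and the pixel intensities and blur estimates below are made up:

```python
import itertools
import numpy as np

def mrf_depth_labels(est, mask, image, labels, lam=1.0):
    """Brute-force MRF minimization over a 1D chain of pixels.
    est:   blur estimates (used only where mask == 1, i.e. at edges)
    mask:  1 at edge locations, 0 elsewhere
    image: pixel intensities, used for the smoothness weights w_ij
    """
    n = len(est)
    w = np.exp(-np.diff(image) ** 2)  # w_ij for neighbouring pixel pairs
    best, best_energy = None, np.inf
    for assign in itertools.product(labels, repeat=n):
        a = np.array(assign, dtype=float)
        data = np.sum(mask * (a - est) ** 2)    # V_i: fidelity at edge pixels
        smooth = np.sum(w * np.diff(a) ** 2)    # V_ij: intensity-weighted smoothness
        energy = data + lam * smooth
        if energy < best_energy:
            best, best_energy = a, energy
    return best

# Two edge estimates (blur 0 and 2) separated by a flat region whose
# intensity jumps between pixels 3 and 4: the label change is cheapest
# across that jump, so the depth discontinuity aligns with the image edge.
est = np.array([0.0, 0, 0, 0, 2.0])
mask = np.array([1, 0, 0, 0, 1])
img = np.array([0.1, 0.1, 0.1, 0.9, 0.9])
print(mrf_depth_labels(est, mask, img, labels=[0.0, 2.0]))  # -> [0. 0. 0. 2. 2.]
```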

5. EXPERIMENTATION
There are two parameters in our method: the re-blur scale σ0 and the regularization weight λ. We set σ0 = 1 and λ = 1, which gives good results in all our examples. We use the Canny edge detector and tune its parameters to obtain the desired edge detection output. The depth maps are the estimated values at each pixel. The performance of the method is first tested on the synthetic bar image shown in Fig. 5.1(a), in which the blur amount of the edge increases linearly from 0 to 5. First, noise is added to the bar image. Under noisy conditions, although the results for edges with larger blur amounts are more affected by noise, our method can still achieve reliable estimates (see Fig. 5.1(b)). We then create more bar images with different edge distances. Fig. 5.1(c) shows that interference from neighboring edges increases estimation errors when the blur amount is large (> 3), but the errors remain at a relatively low level.

Furthermore, we shift the detected edges to simulate inaccurate edge location and test our method. The result is shown in Fig. 5.1(d). When the edge is sharp, the shift of edge locations causes quite large estimation errors. However, in practice, sharp edges can usually be located very accurately, which greatly reduces the estimation error.

Fig. 5.1. The depth recovery results of flower and building images. (a) The input defocused images. (b) The sparse blur maps. (c) The final layered depth maps.

As shown in Fig. 5.1, the method is tested on some real images. In the flower image, the depth of the scene changes continuously from the bottom to the top of the image. The sparse blur map gives a reasonable measure of the blur amount at edge locations, and the depth map reflects the continuous change of the depth. In the building image, there are mainly 3 depth layers in the scene: the wall in the nearest layer, the buildings in the middle layer, and the sky in the farthest layer. Our method extracts these three layers quite accurately and produces the depth map shown in Fig. 5.1(c). Both of the results are obtained using 10 depth labels with the blur amount ranging from 0 to 3. One more example is the book image shown in Fig. 1.2; that result is obtained using 6 depth labels with blur amounts from 0 to 3. As we can see from the recovered depth map, our method is able to obtain a good estimate of the depth of the scene from a single image. In Fig. 5.2, we compare our method with the inverse diffusion method [13]. Both methods generate reasonable layered depth maps. However, our method has higher accuracy in local estimation, and thus our depth map captures more details of the depth. As shown in the figure, the difference in depth between the left and right arms can be perceived in our result; in contrast, the inverse diffusion method does not recover this depth difference.

Fig. 5.2. Comparison of our method and the inverse diffusion method. (a) The input image. (b) The result of inverse diffusion method. (c) Our result

6. AMBIGUITIES
There are two kinds of ambiguities in depth recovery from a single image using the defocus cue.

6.1 Focal Plane Ambiguity
When an object appears blurred in the image, it can be on either side of the focal plane. To remove this ambiguity, most depth from defocus methods, including this one, assume that all objects of interest are located on one side of the focal plane. When taking images, the camera is focused on the nearest or farthest point in the scene.

6.2 Blur/Sharp Edge Ambiguity
The defocus measure obtained may be due to a sharp edge that is out of focus or a blurred edge that is in focus. This ambiguity is often overlooked by previous works and may cause some artifacts in our result. One example is shown in Fig. 6.1. The region indicated by the white rectangle is actually blurred texture of the photo in the frame, but

our method treats it as sharp edges blurred by defocus, which results in an erroneous depth estimate in that region. Additional images are usually needed to remove this blur texture ambiguity.

Fig. 6.1. The depth recovery result of the photo frame image. (a) The input defocused image. (b) Recovered layered depth map.
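The focal-plane ambiguity can be verified numerically with the thin-lens CoC formula from Section 4.1: two object distances on opposite sides of the focal plane can produce exactly the same blur. The focus distance and focal length are the illustrative values from Fig. 4.1.1, and the f-stop N = 2.8 is an assumption:

```python
# Focal-plane ambiguity, illustrated with the thin-lens CoC formula.
def coc(d, d_f=500.0, f0=80.0, N=2.8):
    """CoC diameter for an object at distance d (thin-lens model)."""
    return abs(d - d_f) / d * f0 ** 2 / (N * (d_f - f0))

near = 400.0          # 100 mm in front of the focal plane: |d - d_f|/d = 0.25
far = 500.0 / 0.75    # ~666.7 mm behind it: solve (d - 500)/d = 0.25
print(coc(near), coc(far))  # identical blur at two different depths
```

Because the blur amount alone cannot distinguish the two cases, the method assumes all objects lie on one side of the focal plane, as stated above.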

7. CONCLUSION
In this paper, we show that the depth of a scene can be recovered from a single defocused image. A new method is presented to estimate the blur amount at edge locations based on the gradient magnitude ratio. The layered depth map is then extracted using MRF optimization. We show that our method is robust to noise, inaccurate edge location and interference from neighboring edges, and can generate more accurate scene depth maps compared with existing methods. We also discuss the ambiguities arising in recovering depth from a single image using the defocus cue. In the future, we would like to apply our blur estimation method to images with motion blur to estimate the blur kernels.



REFERENCES

1. Barnard, S., Fischler, M.: Computational stereo. ACM Comput. Surv. 14(4) (1982) 553-572
2. Dhond, U., Aggarwal, J.: Structure from stereo: A review. IEEE Trans. Syst. Man Cybern. 19(6) (1989) 1489-1510
3. Dellaert, F., Seitz, S.M., Thorpe, C.E., Thrun, S.: Structure from motion without correspondence. In: Proc. CVPR. (2000) 557-564
4. Tomasi, C., Kanade, T.: Shape and motion from image streams under orthography: A factorization method. Int. J. Comput. Vision 9 (1992) 137-154
5. Asada, N., Fujiwara, H., Matsuyama, T.: Edge and depth from focus. Int. J. Comput. Vision 26(2) (1998) 153-163
6. Nayar, S., Nakagawa, Y.: Shape from focus. IEEE Trans. Pattern Anal. Mach. Intell. 16(8) (1994) 824-831
7. Favaro, P., Soatto, S.: A geometric approach to shape from defocus. IEEE Trans. Pattern Anal. Mach. Intell. 27(3) (2005) 406-417
8. Pentland, A.P.: A new sense for depth of field. IEEE Trans. Pattern Anal. Mach. Intell. 9(4) (1987) 523-531
9. Moreno-Noguer, F., Belhumeur, P.N., Nayar, S.K.: Active refocusing of images and videos. ACM Trans. Graphics (2007) 67
10. Nayar, S.K., Watanabe, M., Noguchi, M.: Real-time focus range sensor. IEEE Trans. Pattern Anal. Mach. Intell. 18(12) (1996) 1186-1198
11. Levin, A., Fergus, R., Durand, F., Freeman, W.T.: Image and depth from a conventional camera with a coded aperture. In: ACM Trans. Graphics, ACM (2007)
12. Saxena, A., Sun, M., Ng, A.: Make3D: Learning 3D scene structure from a single still image. IEEE Trans. Pattern Anal. Mach. Intell. (2008)
13. Namboodiri, V.P., Chaudhuri, S.: Recovery of relative depth from a single observation using an uncalibrated (real-aperture) camera. In: Proc. CVPR. (2008)
14. Hecht, E.: Optics (4th Edition). Addison Wesley (August 2001)
15. Komodakis, N., Tziritas, G., Paragios, N.: Performance vs computational efficiency for optimizing single and dynamic MRFs: Setting the state of the art with primal-dual strategies. Proc. CVIU 112(1) (2008) 14-29
16. Canny, J.: A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 8(6) (1986) 679-698