Академический Документы
Профессиональный Документы
Культура Документы
Abstract
1. Introduction
We wish to take only a single grayscale image of an
object and estimate the shape, albedo, and illumination
that produced that image (Figure 1). This inverse optics
problem is terribly underconstrained: the space of albedos,
shapes, and illumination that reproduce an image is vast.
But of course, not all albedos and shapes are equally
likely. Past work has demonstrated that simple statistics
govern natural images [8, 23], and we will construct models
of the similar statistics that can be found in natural albedo Figure 1. Our algorithm takes only a single masked grayscale im-
age as input (shown on the left) and produces as output a depth
and shape. Our algorithm is simply an optimization prob-
map, albedo map, shading image, and spherical harmonic illu-
lem in which we recover the most likely shape, albedo, and
mination model. These images were taken by the authors with a
illumination under to our statistical model, such that a sin- cellphone camera in uncontrolled indoor and outdoor illumination
gle image is exactly reproduced. Our priors are effective conditions. All images and results in this paper were produced us-
enough that shape, albedo, and illumination can be recov- ing the same piece of code with the same parameter settings. The
ered from real-world images, and are general enough that shading image is a rendering of the recovered depth under the
they work across a variety of objects: a single model learned recovered illumination. Depth is shown with a pseudo-color visu-
on teabags and squirrels can be applied to images of cof- alization (red is near, blue is far). Many more similar results can
fee cups and turtles. Our model can be seen in Figures 1, 2 be found in the supplementary material.
1
(a) Input Image & (b) Ground Truth (c) Our Model (d) Retinex [16, 11] (e) Tappen et al. (f) Barron & Malik (g) J. Shen et al. (h) L. Shen & Yeo
Illumination + SFS 2005 [28] + SFS 2011 [2] 2011 [26] + SFS 2011 [27] + SFS
Figure 2. A visualization of our technique, and others, for the task of recovering shading, albedo, and shape, given a single grayscale image
and a known illumination. We benchmark against the other algorithms on the known-illumination task because no other algorithm we are
aware of is capable of recovering shape, albedo, shading, and illumination for general objects.
and 7, and in the supplementary material. nique from being useful in the wild. This paper can be
The problem we address was first posed to the com- viewed as an extension to that work in which the priors on
puter vision community as the intrinsic images problem shape and albedo are dramatically improved, the require-
[3], though it had been studied earlier in other fields as the ment of low-frequency shape information is relaxed (due to
problem of lightness constancy [10]. Over time the intrin- the strength of our priors), and illumination is not required
sic images problem has been simplified to the problem of to be known.
separating an image into shading and albedo [11, 16, 19], Let us present a modification to the problem formula-
which has seen some recent progress [26, 27, 28] mostly tion of [2]. Assuming Lambertian reflectance and ortho-
by relying on color as a cue. However, none of these al- graphic projection, given a log-albedo map A, a depth-map
gorithms work well on grayscale images (see Figure 2, Ta- Z, and illumination L, the log-intensity image I is defined
ble 1), nor do they allow shape (with the exception of [2]), as I = A + S(Z, L). The function S(Z, L) is the log-
or illumination to be directly recovered. Our model relates shading image of Z with respect to L: it linearizes Z into
to the classic shape-from-shading problem [15] as we rely a set of normals and renders those normals using L, a model
on shading as a cue for depth, but SFS techniques gener- of spherical harmonic illumination (detailed in the supple-
ally require albedo and illumination to be known, which we mentary material). Now let us assume that the image I and
do not. Shape-from-contour is another classic single-image illumination L have been observed, but Z and A are un-
shape recovery technique [7, 18] which we build upon. known. This problem is underconstrained, so we impose
priors on Z and A and search for the most likely shape and
A competing approach to single-image techniques is to albedo that explain image I. This is the same as minimiz-
better constrain the problem with additional data. Instances ing the sum of g(A) and f (Z), which will be defined later
of this approach are photometric stereo [30], structure from as being (loosely) equivalent to the negative log-likelihoods
motion [12], and inverse global illumination [31]. All these of Z and A respectively. The optimization problem is:
techniques depend on multiple observations, and they all
break down when given only a single image. Other work minimize g(A) + f (Z) (1)
Z,A
has explored recovering the spatial layout of a scene from
subject to I = A + S(Z, L)
a single image [14, 25], but these techniques do not re-
cover albedo or illumination, and the recovered shape is This is the same shape and albedo from shading (SAFS)
very coarse. Morphable models [6] recover shape, albedo, formulation as in [2], but with different priors, a different
and illumination, but require extremely specific models of rendering engine, and formulated in log-intensity space.
the objects being estimated, and therefore do not work for We will present g(A), two priors1 on albedo: one which
general objects. encourages piecewise-smoothness by placing heavy-tailed
We build heavily on the problem formulation of Barron 1 Throughout this paper we use the term prior loosely. We refer to
and Malik [2], which addressed a very constrained case of loss functions or regularizers on Z or A as priors because they have an
interpretation as the unnormalized negative log-likelihood of some statisti-
our problem. Their model required that a low-frequency cal model. We refer to minimizing entropy as a prior, which is again an
observation of shape be known and that illumination be abuse of terminology. Our occluding contour prior requires first observ-
known, two fundamental limitations that prevent the tech- ing the silhouette of an object, and is therefore a posterior, not a prior.
distributions on the multiscale gradient norm of log-albedo,
and one which encourages a low entropy in the marginal
distribution of log-albedo across all scales of an entire im-
age. We then present f (Z), three priors on shape: one
which encourages fronto-parallel flatness (primarily to ad-
dress the bas-relief ambiguity [4]), one which informs the
surface orientation near an objects occluding contour, and
one which encourages a novel measure of smoothness by
placing heavy-tailed distributions on the multiscale gradi-
ent norm of the mean curvature of shape. We then extend
the problem formulation in Equation 1 to present a novel (a) kAk at different scales (b) kH(Z)k at different scales
framework for recovering illumination in addition to shape Figure 3. The log-likelihoods of Gaussian scale mixtures learned
and albedo, using only our priors. on kAk and kH(Z)k at different scales, using the training set
of our dataset. We see similar heavy-tailed distributions as have
been observed in natural image statistics [8, 23]. These distribu-
2. Priors on Albedo tions are the only learned parameters in our entire model.
addressed by our model, and are often the cause of errors. [19] E. H. Land and J. J. McCann. Lightness and retinex theory. JOSA, 1971. 2, 3
[20] H. P. Moreton and C. H. Squin. Functional optimization for fair surface design.
Our technique is targeted towards objects; to be extended to In SIGGRAPH, 1992. 5
scenes, one could use segmentation techniques to generate [21] J. Portilla, V. Strela, M. J. Wainwright, and E. P. Simoncelli. Image denoising
candidate objects. using scale mixtures of gaussians in the wavelet domain. IEEE Trans. Image
Process, 2003. 3
Our model produces qualitatively reasonable results on
[22] J. C. Principe and D. Xu. Learning from examples with quadratic mutual infor-
arbitrary grayscale images taken in the real world, and dra- mation. Workshop on Neural Networks for Signal Processing, 1998. 4
matically outperforms all previously published algorithms [23] D. Ruderman and W. Bialek. Statistics of natural images: Scaling in the woods.
on the MIT Intrinsic Images dataset. Our technique, to the Physical Review Letters, 1994. 1, 3
[24] R. Salakhutdinov, S. Roweis, and Z. Ghahramani. Optimization with em and
best of our knowledge, is the first unified model for jointly expectation-conjugate-gradient. ICML, 2003. 6
estimating shape, albedo, and illumination from a single im- [25] A. Saxena, M. Sun, and A. Ng. Make3d: learning 3d scene structure from a
age. single still image. TPAMI, 2008. 2
[26] J. Shen, X. Yang, Y. Jia, and X. Li. Intrinsic images using optimization. CVPR,
2011. 2, 3, 7
References [27] L. Shen and C. Yeo. Intrinsic images decomposition using a local and global
[1] N. Alldrin, S. Mallick, and D. Kriegman. Resolving the generalized bas-relief sparse representation of reflectance. CVPR, 2011. 2, 3, 4, 7
ambiguity by entropy minimization. CVPR, 2007. 4 [28] M. F. Tappen, W. T. Freeman, and E. H. Adelson. Recovering intrinsic images
[2] J. T. Barron and J. Malik. High-frequency shape and albedo from shading using from a single image. TPAMI, 2005. 2, 3, 7
natural image statistics. CVPR, 2011. 2, 3, 5, 6, 7 [29] O. Woodford, P. Torr, I. Reid, and A. Fitzgibbon. Global stereo reconstruction
under second-order smoothness priors. TPAMI, 2009. 5
[3] H. Barrow and J. Tenenbaum. Recovering Intrinsic Scene Characteristics from
Images. Academic Press, 1978. 2 [30] R. Woodham. Photometric method for determining surface orientation from
multiple images. Optical engineering, 1980. 2
[4] P. Belhumeur, D. Kriegman, and A. Yuille. The Bas-Relief Ambiguity. IJCV,
1999. 3, 4, 5 [31] Y. Yu, P. Debevec, J. Malik, and T. Hawkins. Inverse global illumination: recov-
ering reflectance models of real scenes from photographs. SIGGRAPH, 1999.
[5] P. Besl and R. Jain. Segmentation through variable-order surface fitting.
2
TPAMI, 1988. 5
[6] V. Blanz and T. Vetter. A morphable model for the synthesis of 3D faces.
SIGGRAPH, 1999. 2
[7] M. Brady and A. Yuille. An extremum principle for shape from contour.
TPAMI, 1983. 2, 5
[8] D. Field. Relations between the statistics of natural images and the response
properties of cortical cells. JOSA A, 1987. 1, 3
[9] G. D. Finlayson, M. S. Drew, and C. Lu. Entropy minimization for shadow
removal. IJCV, 2009. 4
[10] A. Gilchrist. Seeing in Black and White. Oxford University Press, 2006. 2