
2018 International Conference on 3D Vision

Joint Material and Illumination Estimation from Photo Sets in the Wild

Tuanfeng Y. Wang Tobias Ritschel Niloy J. Mitra


University College London

Abstract

Faithful manipulation of shape, material, and illumination in 2D Internet images would greatly benefit from a reliable factorization of appearance into material (i.e., diffuse and specular) and illumination (i.e., environment maps). In this work, we propose to make use of a set of photographs in order to jointly estimate the non-diffuse materials and sharp lighting in an uncontrolled setting. Our key observation is that seeing multiple instances of the same material under different illuminations (i.e., environments), and different materials under the same illumination, provides valuable constraints that can be exploited to yield a high-quality solution (i.e., specular materials and environment illumination) for all the observed materials and environments. Technically, we enable this by a novel scalable formulation using parametric mixture models that allows for simultaneous estimation of all materials and illuminations directly from a set of (uncontrolled) Internet images. At the core is an optimization that uses two neural networks trained on synthetic images to predict good gradients in parametric space given observations of reflected light. We evaluate our method on a range of synthetic and real examples to generate high-quality estimates, and qualitatively compare our results against state-of-the-art alternatives via a user study. Code and data are available on the project website (http://geometry.cs.ucl.ac.uk/projects/2018/joint-material-illumination-estimation).

1. Introduction

Estimating realistic material (i.e., reflectance) and illumination along with object geometry remains a holy grail of shape analysis. While significant advances have been made in recent years in predicting object geometry and pose from 'in the wild' Internet images, estimation of plausible material and illumination has remained elusive in uncontrolled settings and at a large scale.

Successful material and illumination estimation, however, will enable unprecedented quality of AR and VR applications like allowing realistic 'transfer' of objects across multiple photographs, or inserting high-quality replicas of virtual objects into Internet images. For example, in Figure 1, imagine transferring the red chair from one image to another. Currently, this task is challenging as we have access to neither the (red) chair's material nor the illumination in the target scene.

In this paper, we investigate the problem of material and illumination estimation directly from 'in the wild' Internet images. The key challenge is that material and illumination are never observed independently, but only as the result of the reflection operation convolving them with the (estimated) normal and view directions (assuming access to rough geometry and pose estimates). Thus, in the absence of further assumptions, we cannot uniquely recover material or illumination from single observations (i.e., images). Instead, we rely on linked observations. We observe that Internet images often record the same objects in different environments (i.e., illuminations), or multiple objects in the same environment. Such linked observations among all the materials and illuminations form a (sparse) observation matrix providing critical constraints among the observed material and illumination parameters. We demonstrate that this special structure can be utilized to robustly and accurately estimate all the material and illumination parameters through a global optimization.

We choose a formulation based on the basic rendering equation in combination with available per-pixel geometry estimation. However, there are multiple challenges: (i) access to only approximate proxy geometry for the scene objects with rough pose estimates leads to inaccurate normal estimates; (ii) partial observations arise from view bias (e.g., chair backs are photographed less often) and sparsely observed normal directions (e.g., flat regions in man-made objects); (iii) working with the rendering equation while updating material and illumination parameters in an inverse-problem setup is inefficient in a standard physically-based rendering pipeline; and finally, (iv) only limited data is available due to sparsely observed joint material-illumination pairs.

In order to overcome the above challenges, we propose a novel formulation using parametric mixture models.

Figure 1. We factor a set of images (left) showing objects with different materials (red, yellow, black, white plastic) under different illumination into per-image illumination and per-object material (top right). This allows for novel applications such as changing the view, illumination, or material, or mixing illumination and material (the red chair rendered in the environment of the bottom-left image) (bottom right).

We propose to approximate the reflection operator and its derivative with respect to material and illumination in terms of Anisotropic Spherical Gaussians (see [48]) that can be efficiently utilized to jointly optimize for the materials and illumination at a large scale (i.e., involving multiple materials and illuminations). This optimization is driven by two neural networks that were trained on a large set of materials and illuminations to estimate accurate gradients.

We extensively evaluate our method on both synthetic and real data, both quantitatively and qualitatively (using a user study). We demonstrate that increasing the amount of linked material-illumination observations improves the quality of both the material and illumination estimates. This, in turn, enables novel image manipulations previously considered to be very challenging.

2. Related Work

Materials and illumination estimation from images. The classic intrinsic image decomposition problem [6] is highly ambiguous, as many shapes, illuminations, and reflectances can explain a single observation. When geometry and material for the objects in an image are known, finding the illumination is a problem linear in a set of basis images [24]. Reflectance maps [15] can also be used to map surface orientation to appearance, allowing for a limited range of applications, such as novel views [33]. In the absence of such information, alternatives regularize the problem using statistics of each component such as texture [34], or exploit user annotations on Internet images [7] to develop a CRF-based decomposition approach.

Haber et al. [14] used observations of a single known geometry in a small set of images to estimate a linear combination of basis BRDFs and pixel-basis lighting. Aittala et al. [1] capture texture-like materials by fitting an SVBRDF, using texture statistics to regularize a non-linear optimization on a single image capture. An alternate recent trend is to use machine learning to solve inverse rendering problems. Deep learning with convolutional neural networks (CNNs, cf. [21]) has been used to decompose Lambertian shading [38, 4], albedo in combination with other factors [27], intrinsics from rendered data [35], images into rendering layers [17, 22, 26], or multiple materials under the same illumination [12].

Image and shape collections. Visual computing has made increasing use of data, particularly image and/or shape collections, with the aim to exploit cross observations. Starting from illumination [9] and its statistics [10] and measurements of BRDFs [25], we have seen models of shape [30], appearance [28], object pose [3], object texture [44], and object attributes [16] made possible by discovering correlations across observations in image and/or 3D model collections. In the context of shape analysis, mutual constraints among instances found across images or 3D scenes in a collection have been used to propose room layouts [49], material assignments [18], or scene color and texture assignments [8]. Instead, we directly estimate materials and illumination, rather than solving an assignment problem.

3. Overview

Starting from a set of linked photographs (i.e., multiple objects observed in different shared environments), our goal is to retrieve object geometry with pose predictions and estimate per-object materials and per-environment illuminations. The estimated information can then be used to faithfully re-synthesize the original appearance and, more importantly, obtain plausible view-dependent appearance. Figure 2 shows baseline comparisons to alternative approaches to assign materials to photographed objects. We observe that even if the geometry and light are known, the highlights would either be missing (using intrinsic images [5] for estimating average albedo) or not move faithfully under view changes (e.g., with projective texturing).

As input, we require a set of photographs of shared objects with their respective masks. In particular, we assume the material segmentation to be consistent across images.


Figure 2. Comparison to alternatives (projective texturing, average RGB of intrinsic images [5]). We see that only a proper separation into specular materials and natural illumination can predict appearance in novel views. The other approaches either miss the highlight even in the original view (average intrinsic albedo) or keep it fixed under view changes (projective texturing). Please refer to the accompanying video to judge the importance of moving highlights under view changes.

As output, our algorithm produces a parametric mixture model (PMM) representation of the illumination (that can be converted into a common environment map image) for each photograph, and the reflectance parameters for every segmented material. We proceed in three steps.

First, we estimate object geometry and pose, and convert all the input images into an unstructured reflectance map for each occurrence of one material in one illumination (Section 4.1). Since we work with very few images collected from the wild, our challenge is that this information is very sparse, incomplete, and often contradictory.

Second, we solve for the illumination of each image and the reflectance model parameters of each material (Section 4.4). This requires estimating a very large number of degrees of freedom, covering fine directional lighting details as well as accurate material parameters. The challenge is that a direct optimization can easily involve many non-linearly coupled variables and lead to a cost function that is highly expensive even to evaluate, as it involves solving the forward rendering equation, e.g., [23]. For example, representing images and illumination in the pixel basis leads to on the order of 10^4-10^5 variables (e.g., 128 x 256 x number-of-environment-maps). At the same time, evaluating the cost function for every observed pixel would amount to gathering illumination by iterating over all pixels in the environment map, i.e., an inner loop over all 128 x 256 environment map pixels inside an outer loop across all 640 x 480 x number-of-images-in-the-set observations. This quickly becomes computationally intractable.

Instead, we introduce a solution based on a parametric mixture model (PMM) representation of illumination for inverse rendering; such representations have been successfully applied to forward rendering [13, 43, 47, 48, 41]. Our core contribution is to take PMMs a step further by introducing the parametric mixture reflection operator and an approximation of its gradient, allowing us to solve the optimization in a scalable fashion involving many materials and environments. The gradient approximation uses a neural network (NN) to map from observed reflected light, illumination, and material parameters to changes of illumination and material parameters. The NN mainly accelerates a computation that would otherwise require solving the rendering equation. It is trained on a set of synthetic images rendered from many illuminations and many materials.

Third, the estimated material and illumination information can directly be used in standard renderers. The challenge in such applications is to capture view-dependent effects such as moving highlights.

4. Algorithm

4.1. Acquiring Geometry and Reflectance Maps

We start from a set of images with the relevant materials segmented consistently across the image set. Designer websites (e.g., Houzz) and product catalogs (e.g., Ikea) regularly provide such links. We assume that the links are explicitly available as input. First, we establish a mapping between illumination-material pairs and observed appearance.

Figure 3. RGB, normal, and segmentation of a typical input image.

Per-pixel labels. For the input images, we use per-pixel orientation (screen-space normals) (Figure 3) obtained using render-for-CNN [37] trained on ShapeNet to retrieve object geometry and pose estimates. We found this to provide better-quality normal predictions than those obtained via per-pixel depth [11] and normal [45] estimation.

Reflectance maps. The rendering equation [20] states

$$L_o(x, n, \omega_o) = L_e(x, \omega_o) + \int_{\Omega} f_r(x, \omega_i, \omega_o)\, L_i(x, \omega_i)\, \langle n, \omega_i \rangle^{+}\, d\omega_i, \qquad (1)$$

where the four terms are the emission, BRDF, incoming light, and geometry (clamped cosine) factors; x is the position, n the surface normal at x, ⟨·⟩^+ the dot product clamped to [0, +∞), ω_o the observer direction, L_o the observed radiance, L_e the light emission, L_i the incoming illumination, and f_r the bidirectional reflectance distribution function (BRDF) [29].
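For intuition on the cost of a pixel-basis formulation, the sketch below evaluates the reflection integral of Eq. 1 (without emission) by a Riemann sum over a 128 x 256 latitude-longitude environment map. It is only illustrative: the placeholder Lambertian BRDF and the random stand-in illumination are assumptions, not the model used later in the paper.

```python
import numpy as np

def brute_force_reflection(envmap, normal, albedo):
    """Riemann-sum evaluation of the reflection integral over a lat-long
    environment map (H x W x 3), with a placeholder Lambertian BRDF."""
    H, W, _ = envmap.shape
    theta = (np.arange(H) + 0.5) / H * np.pi            # polar angle per row
    phi = (np.arange(W) + 0.5) / W * 2.0 * np.pi        # azimuth per column
    th, ph = np.meshgrid(theta, phi, indexing="ij")
    omega = np.stack([np.sin(th) * np.cos(ph),          # direction of each pixel
                      np.sin(th) * np.sin(ph),
                      np.cos(th)], axis=-1)
    d_omega = np.sin(th) * (np.pi / H) * (2.0 * np.pi / W)   # per-pixel solid angle
    cos_term = np.clip(omega @ normal, 0.0, None)            # <n, w_i>^+
    gathered = (envmap * (cos_term * d_omega)[..., None]).reshape(-1, 3).sum(axis=0)
    return (np.asarray(albedo) / np.pi) * gathered           # Lambertian f_r = albedo / pi

env = np.random.rand(128, 256, 3)                 # stand-in illumination
n = np.array([0.0, 0.0, 1.0])
print(brute_force_reflection(env, n, albedo=np.array([0.8, 0.2, 0.2])))
```

This 128 x 256 gather repeats for every observed pixel, which is exactly the inner/outer loop cost discussed above and what the mixture-model formulation is designed to avoid.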

We assume a simplified image formation model that allows for using a slightly generalized variant of reflectance maps [15]: (i) distant illumination, (ii) convex objects, i.e., no shadows or inter-reflections, (iii) spatially invariant BRDFs, and (iv) no emission. Note that we do not assume a distant viewer as a typical reflectance map does. This simplifies Eq. 1 to

$$L_o(\omega_o, n) = \int_{\Omega} f_r(\omega_i, \omega_o)\, L_i(\omega_i)\, \langle n, \omega_i \rangle^{+}\, d\omega_i. \qquad (2)$$

A classic reflectance map is parameterized either by the normal n or by the observer direction ω_o. Instead of making such a split, we take a less structured approach tailored to our problem: an unstructured reflectance map (URM), denoted by O, which is a list holding in each entry a tuple of (i) the normal o_n, (ii) the half-angle vector h (cf. Figure 4), (iii) the observed radiance o_L, and (iv) the indices o_m and o_i of the material and illumination, respectively. We denote by h the half-angle vector between the front direction (-z) and the mirrored observer direction, h = (2⟨n, ω_o⟩ n - ω_o + (0, 0, -1)) / 2. This parametrization will provide a more convenient way to index information. An example visualization of the URM, projecting both the n and the h coordinates using latitude-longitude, is shown in Figure 4.

Figure 4. Schema and actual Unstructured Reflectance Maps for four chairs. Each point is an observed color for a specific surface orientation n and half-angle vector h.

To acquire the URM from an image with given per-pixel position and orientation, we apply inverse gamma correction (γ = 2.2) such that o_L is in physically linear units. We use a simplified pinhole camera model with an equivalent 35 mm focal length. Further, we do not differentiate between objects and consider only their materials (i.e., an object with two material parts is essentially treated as two materials).
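A minimal sketch of how URM entries could be assembled from per-pixel data is given below. The inverse gamma and the half-angle formula follow the text above; the dictionary field names and the use of a single view direction per image (rather than a per-pixel pinhole ray) are illustrative assumptions.

```python
import numpy as np

FRONT = np.array([0.0, 0.0, -1.0])  # front direction (-z), as in the text

def half_angle(n, omega_o):
    """h = (2<n, w_o> n - w_o + (0, 0, -1)) / 2: half-angle between the
    mirrored observer direction and the front direction."""
    reflected = 2.0 * np.dot(n, omega_o) * n - omega_o
    return 0.5 * (reflected + FRONT)

def build_urm(rgb, normals, mask, material_id, illumination_id, omega_o):
    """Collect one URM entry (n, h, o_L, o_m, o_i) per masked pixel.
    rgb: (H, W, 3) sRGB image, normals: (H, W, 3), mask: (H, W) bool."""
    linear = np.power(np.clip(rgb, 0.0, 1.0), 2.2)   # inverse gamma (gamma = 2.2)
    entries = []
    for y, x in zip(*np.nonzero(mask)):
        n = normals[y, x]
        entries.append({
            "n": n,
            "h": half_angle(n, omega_o),   # single view direction assumed here
            "o_L": linear[y, x],
            "o_m": material_id,
            "o_i": illumination_id,
        })
    return entries
```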
4.2. Representation

Illumination. We use Parametric Mixture Models (PMMs) to represent illumination. PMMs have been used for pre-computed light transport [13, 40, 43], BTF compression [47], interactive rendering [39], importance sampling [41], and even caustic design [32]. A PMM encoded as

$$g(\omega \mid \mathbf{\Theta}) = \sum_{l=1}^{n_p} p(\omega \mid \Theta_l) \approx L(\omega) \qquad (3)$$

is a sum of n_p lobe functions p(ω | Θ_l) that depend on a parameter vector Θ_l to approximate, in our setting, the incoming or outgoing light function L(ω). All parameter vectors Θ_l of one PMM are combined in a parameter matrix Θ. In our case, the domain of g is the sphere Ω, parameterized using the latitude-longitude representation ω = (θ, φ) ∈ [0, 2π) × [0, π).

As mode functions, we employ Isotropic Spherical Gaussians (ISGs) [13, 40, 41]. An ISG lobe has the form

$$p(\omega \mid \Theta) = w \cdot e^{-\sigma (\omega - z)^2},$$

where w ∈ R_+ is the weight of the lobe, σ is its variance, and z the mean direction. Consequently, a lobe is described by the parameter vector Θ = (w, σ, z). To work with RGB values, all weight components w in this paper are vector-valued, but the variance parameter σ is scalar. For each image, we use an ISG PMM with n_p = 32 components to represent the unknown illumination.
(5)
Material. We assume the material to be of the form

$$f_r(\omega_i, \omega_o \mid \rho) = k_d\, f_d(\omega_i, \omega_o) + k_s\, f_s(\omega_i, \omega_o \mid r), \qquad (4)$$

a parametric model that can be split into the weighted sum of a diffuse component f_d and a specular component f_s with weights k_d and k_s, respectively. We choose a Lambertian model for the diffuse part and GGX [42], which has a single roughness parameter r, as the specular model. The material parameters are therefore a tuple ρ = (k_d, k_s, r) ∈ R^7 of RGB diffuse and specular reflectance and a scalar roughness parameter. We denote the BRDF parameter vector of material j as ρ^(j). Note that we do not need to represent f_r using a PMM, which would introduce unnecessary approximation.
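A sketch of this two-term model, with Lambertian diffuse and a GGX microfacet specular lobe, is given below. The exact roughness parameterization (here α = r^2) and the handling of Fresnel (folded into k_s) are not specified in the text, so they are assumptions.

```python
import numpy as np

def brdf(omega_i, omega_o, n, rho):
    """f_r = k_d * f_d + k_s * f_s(r): Lambertian diffuse plus GGX specular.
    rho = (k_d, k_s, r) with RGB k_d, k_s and scalar roughness r."""
    k_d, k_s, r = rho
    n_l = max(np.dot(n, omega_i), 1e-6)
    n_v = max(np.dot(n, omega_o), 1e-6)
    h = omega_i + omega_o
    h = h / np.linalg.norm(h)
    n_h = max(np.dot(n, h), 0.0)

    diffuse = np.asarray(k_d) / np.pi                    # Lambertian f_d

    alpha = r * r                                        # assumed r -> alpha mapping
    d = alpha**2 / (np.pi * ((n_h**2) * (alpha**2 - 1.0) + 1.0) ** 2)  # GGX NDF
    def g1(c):                                           # Smith masking term for GGX
        return 2.0 * c / (c + np.sqrt(alpha**2 + (1.0 - alpha**2) * c**2))
    specular = d * g1(n_l) * g1(n_v) / (4.0 * n_l * n_v)  # Fresnel folded into k_s

    return diffuse + np.asarray(k_s) * specular
```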
4.3. Reflection

Using standard notation [2] for light transport, we express reflection as an operator R, mapping the function of incoming light L_i to outgoing light L_o:

$$L_o(\omega_o) = R(L_i \mid \rho)(\omega_o) = \int_{\Omega} L_i(\omega_i)\, f_r(\omega_i, \omega_o \mid \rho)\, d\omega_i. \qquad (5)$$

When using an ISG to represent the illumination, we suggest using a parametric reflection operator R(Θ | ρ) that maps from a single illumination ISG lobe Θ and a material ρ to reflected light. As we assume the BRDF to be a sum of a diffuse and a specular part, we can similarly define D and S as the diffuse-only and specular-only reflection, with R = D + S. So, finally, we have

$$L_o(\omega_o) = \sum_{l=1}^{n_p} D(\Theta_l \mid \rho) + S(\Theta_l \mid \rho). \qquad (6)$$
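Putting the pieces together, the reflected radiance for one observation is a per-lobe sum of diffuse and specular responses, as in Eq. 6. The sketch below wires this up; note that `diffuse_response` and `specular_response` are crude heuristic stand-ins for D and S, which in the paper are approximated by the neural networks described next.

```python
import numpy as np

def diffuse_response(lobe, rho, normal):
    """Stand-in for D(Theta_l | rho): diffuse response of one illumination lobe
    (approximated by a trained neural network in the paper)."""
    w, sigma, z = lobe
    k_d, _, _ = rho
    return np.asarray(k_d) * np.asarray(w) * max(np.dot(normal, z), 0.0)

def specular_response(lobe, rho, half_angle):
    """Stand-in for S(Theta_l | rho): specular response of one lobe (again an
    NN in the paper); here the lobe is simply sharpened by the inverse roughness."""
    w, sigma, z = lobe
    _, k_s, r = rho
    sharpness = sigma + 1.0 / max(r, 1e-3)
    return np.asarray(k_s) * np.asarray(w) * np.exp(-sharpness * np.sum((half_angle - z) ** 2))

def reflect(lobes, rho, normal, half_angle):
    """L_o = sum over lobes of D(Theta_l | rho) + S(Theta_l | rho)  (Eq. 6)."""
    return sum(diffuse_response(l, rho, normal) + specular_response(l, rho, half_angle)
               for l in lobes)
```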

Figure 5. The three main ideas to enable large-scale optimization: (a) approximating illumination as parametric mixture models and the BRDF as a sum of a diffuse and a specular component; (b, c) expressing reflection as a sum of diffuse and specular reflections of individual lobes; and (d) approximating the derivatives of the diffuse and specular reflection of ISGs using corresponding neural nets.

Figure 6. Evaluation of the neural network. The first row shows GT renderings with a GT envmap. The second row shows again GT
rendering, but using the GMM fit to the envmap. This is an upper bound on the NN quality, as it works on the GMM representation. The
third row shows the NN result. In the horizontal direction, specular results of increasing roughness are followed by the diffuse result in the
rightmost column. The plots on the right below show the error distribution as a function of different parameters.

4.4. Formulation

Our task is to find a set of illuminations and a set of materials that explain the acquired observations (see the previous section). Next, we describe how to represent reflectance and illumination, introduce the parametric reflectance operator and its derivative with respect to material and illumination, and present an approximation method for efficient joint optimization of material and illumination given the observations (see Figure 5).

Cost function. Our main objective function quantifies how well a set of materials and illuminations explains the input observations. It should be fast to evaluate and allow for an effective computation of its gradient with respect to illuminations and materials in order to be useful in an optimization. We formulate the objective as

$$c(\mathbf{\Theta}, \rho \mid O) = \underbrace{\sum_{o \in O} \Big( o_L - \sum_{l=1}^{n_p} R\big(\Theta_l^{(o_i)} \mid \rho^{(o_m)}\big)(\omega_o) \Big)^{2}}_{\text{Data}} + \underbrace{\lambda\, p(\mathbf{\Theta})}_{\text{Prior}}. \qquad (7)$$

The gradient of this function with respect to the illumination and material requires evaluating R, which involves convolving an illumination lobe with the BRDF. This is costly to compute, and we also need its derivative. To this end, we employ a learning-based approach.
Neural network. The input to this neural network (NN) is the parameters of a single illumination lobe, the material parameters, and the observation direction ω. The NN performs this mapping faster than Monte Carlo rendering, which solves the rendering equation by averaging many samples, could. Over 100 random roughness values, our NN takes 0.22 s on average to compute an image (with 10 k pixels and 50 illumination lobes) that is similar to an MC image computed in 5.7 s (DSSIM 0.039; 0 meaning identical).

We call this approximation R̂. The output is an RGB value. We keep the NNs for the diffuse and specular components separate and process the RGB channels independently. The corresponding approximations using NNs are denoted D̂ and Ŝ, respectively. The network architecture is shown in Figure 7. The input to the network is a 12-dimensional vector and differs between the diffuse and specular NNs. Both consume the parameters of a single illumination lobe (direction and variance). However, the diffuse net consumes the normal while the specular net consumes the half-angle. All layers are fully connected with 288 units in each layer. The networks are trained on 200 k samples from renderings of spheres produced using Blender. We use two million random view-normal pairs with appearance from random MERL materials and random illumination for training and testing. An evaluation of this architecture is found in Figure 6.

Figure 7. Our diffuse (orange) and specular (blue) neural network architecture, which consumes either the normal or the half-angle together with a single illumination lobe (left) and produces a color (right).
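Interpreting that description, one of the two per-channel regressors could be set up as below. The activation function (ReLU) and the count of four hidden layers are assumptions read off Figure 7; the text does not specify training details.

```python
import torch
import torch.nn as nn

def make_reflection_net():
    """12-d input (illumination lobe direction + variance, material parameters,
    and normal or half-angle) -> fully connected layers of 288 units ->
    one scalar radiance value per color channel."""
    return nn.Sequential(
        nn.Linear(12, 288), nn.ReLU(),
        nn.Linear(288, 288), nn.ReLU(),
        nn.Linear(288, 288), nn.ReLU(),
        nn.Linear(288, 288), nn.ReLU(),
        nn.Linear(288, 1),
    )

diffuse_net = make_reflection_net()    # consumes the normal
specular_net = make_reflection_net()   # consumes the half-angle
```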
Prior. Several models for the statistics of illumination [10] and BRDFs [25, 31] exist. All of these could be included in the prior in Eq. 7. Instead of using any advanced prior, we opt for the simplest method that can disambiguate situations that are not unique even for a set of input images: if multiple explanations are possible, we choose the one where the illumination is the least colorful. In other words: observing a sphere that is red is rather interpreted as a red-material sphere under white light than as a white-material sphere under red light. Note that λ is set to a value so low that it only affects ambiguous input.

To this end, our prior penalizes the variance of the illumination lobe colors, i.e., their RGB weights, as

$$p(\mathbf{\Theta}) = \mathrm{V}(q(\mathbf{\Theta})) = \mathrm{E}\big(q(\mathbf{\Theta})^2\big) - \mathrm{E}\big(q(\mathbf{\Theta})\big)^2, \qquad \text{where } q(\mathbf{\Theta}) = \frac{1}{n_p} \sum_{\Theta \in \mathbf{\Theta}} \Theta_w.$$

In other words, we first compute the average color q(Θ) of all lobes and then take the variance V(q(Θ)) across its three color channels.
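A sketch of this prior, computed as the variance across the three color channels of the average lobe weight (the exact normalization is an assumption; the intent follows the text above):

```python
import numpy as np

def illumination_prior(lobes):
    """p(Theta): variance across RGB channels of the average lobe color,
    penalizing colorful illumination (white light has zero channel variance)."""
    weights = np.stack([np.asarray(w) for (w, sigma, z) in lobes])  # (n_p, 3)
    q = weights.mean(axis=0)                                # average color of all lobes
    return float(np.mean(q ** 2) - np.mean(q) ** 2)         # V(q) = E(q^2) - E(q)^2
```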

Optimization. Armed with a fast method (see above) to evaluate the cost function, we employ L-BFGS [50] in combination with randomization. As the full vector O does not fit into memory, we use a randomized subset that fits into GPU memory in each iteration and dynamically change the randomized set across iterations. We stop when each observation has, on average, been sampled 5 times.
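A sketch of that randomized strategy follows: each outer iteration draws a subset of the URM and runs a bounded L-BFGS solve on it, reusing the `cost` sketch from Section 4.4. The batch size, inner iteration budget, and the `unpack` helper that maps a flat parameter vector back to illuminations and materials are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def optimize(urm, x0, unpack, lam, prior, batch=50_000, passes=5, rng=None):
    """Randomized L-BFGS: repeatedly minimize the cost on a random URM subset,
    stopping once each observation has been sampled `passes` times on average."""
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x0, dtype=float)
    n_sampled = 0
    while n_sampled < passes * len(urm):
        idx = rng.choice(len(urm), size=min(batch, len(urm)), replace=False)
        subset = [urm[i] for i in idx]

        def f(x_flat):
            # objective over the packed parameter vector (illuminations + materials)
            illuminations, materials = unpack(x_flat)
            return cost(illuminations, materials, subset, lam, prior)

        res = minimize(f, x, method="L-BFGS-B", options={"maxiter": 20})
        x = res.x
        n_sampled += len(subset)
    return x
```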

4.5. Rendering

For rendering, the result of the optimization is simply converted into an HDR environment map image by evaluating the estimated PMM for each pixel. Such an environment map, along with the estimated diffuse/specular parameters, is then used with standard offline and online rendering applications, as shown in our results.
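Converting a fitted PMM into an environment map only requires evaluating the mixture at every latitude-longitude pixel; a sketch reusing `evaluate_pmm` from Section 4.2 is shown below (the 128 x 256 resolution matches the one mentioned earlier and is otherwise an arbitrary choice).

```python
import numpy as np

def pmm_to_envmap(lobes, height=128, width=256):
    """Evaluate the illumination PMM at every lat-long pixel to produce an
    HDR environment map of shape (height, width, 3)."""
    theta = (np.arange(height) + 0.5) / height * np.pi
    phi = (np.arange(width) + 0.5) / width * 2.0 * np.pi
    th, ph = np.meshgrid(theta, phi, indexing="ij")
    omega = np.stack([np.sin(th) * np.cos(ph),
                      np.sin(th) * np.sin(ph),
                      np.cos(th)], axis=-1)          # (H, W, 3) directions
    return evaluate_pmm(omega, lobes)
```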
5. Results

We use an L-BFGS solver for all experiments. The complexity of our optimization in terms of the number of variables is (7m + 6 n_p n), where m is the number of materials (7 parameters each) and n the number of illuminations (6 parameters per ISG lobe), and is hence linear in the number of input entries in the material x environment observation matrix. For example, a five-photo, five-material matrix dataset costs about 30 minutes using an NVIDIA Titan X GPU. Pre-training the reflection operator R, both the diffuse and specular components, takes about three hours on the same hardware.

5.1. Evaluation

5.1.1 Datasets

We evaluated our method on three types of datasets, each holding multiple image sets acquired in different ways. The full-resolution images and result images/videos are included in the supplementary material.

The first comprises SYNTHETIC images rendered using Mitsuba [19], a collection of HDR environment maps, and MERL [25] materials. Note that here we have access to ground-truth per-pixel normals and material labels. We have rendered 3 objects in 3 different scenes, with both spheres and real-world shapes, allowing synthetic re-combination in an arbitrary fashion.

The second dataset consists of real images collected from the INTERNET. We have manually selected the images (using iconic object name-based Google search) and masked the image content. This dataset has three sets of photographs: the LAPD car (INTERNET-LAPD), the Docksta table (INTERNET-DOCKSTA), and the Eames DSW chair (INTERNET-EAMES) (see supplementary).

Figure 8. Results on the INTERNET-LAPD dataset of four images of police cars with two materials. The first row shows the input images. The second row shows the reflectance maps; the observed ones are marked with black circles (in this example, all are observed). When an RM is not observed, it is re-synthesized. The third row shows our estimated illumination; recall that it is defined in camera space. The fourth row contains a re-synthesis using our material and illumination. Please note that such a re-synthesis is not possible using a trivial factorization, as all images have to share a common material that sufficiently explains the images. The last row shows a re-synthesis from a novel view, as well as a rendering of the materials in a new (Uffizi) illumination.

For geometry, we used coarse-quality meshes available from ShapeNet. These images are good for qualitative evaluation but do not allow quantifying the error in novel views.

The third dataset contains PHOTOS we have taken ourselves of designer objects under chosen illumination conditions (in our labs and offices). We have 3D-scanned these objects (using a Kinect) to acquire their (rough) geometry. The photos are taken in five different environments, and 7 materials are considered.

5.1.2 Qualitative evaluation

Visual quality. We show results of our approach in Figure 8 and in the supplementary. We evaluate the main objective of our approach, i.e., estimating illumination and reflectance from a photo set. In each figure, we show the input images, renderings of all objects' materials from the original view (with the background from the input images) and from a novel view, as well as visualizations of the material and illumination alone. Input images are shown at the top with the outputs we produce at the bottom (see supplementary for full images). Observed reflectance maps are shown encircled in black. The objects are rendered from an identical novel view, which is more challenging than rendering from the original view. The material is shown by re-rendering it under a new illumination. The exposure between all images is identical, indicating that the reflectance is directly in the right order of magnitude and can transfer to new illuminations. While the illumination alone does not appear natural, shadows and shading from it produce plausible images, even of unknown materials or new objects. Recall that large parts of the objects are not seen in any of the input images and hence large parts of the environment maps are only estimated from indirect observation. Recall also that our method does not use any data-driven prior to regularize the results.

The INTERNET-LAPD result in Figure 8 shows a single object made from multiple materials. Please see the supplemental video for an animation of the highlights under changing view or object rotations. The geometry of all objects in this part (except the chairs) is very approximate and acquired by a depth sensor. Still, a good result quality is possible.

5.1.3 Quantitative evaluation

We evaluate the effect of certain aspects of our method on the end result (see supplementary). The error is quantified as the DSSIM [46] structural image distance (smaller distance indicates a better match) between a reference image rendered with known illumination and material and a rendering using our estimated material and illumination. Images were gamma-corrected and linearly tone-mapped before comparison.
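For reference, a DSSIM value between a ground-truth rendering and a re-rendering can be computed from SSIM as sketched below (using scikit-image). The exact DSSIM convention used in the paper is not stated, so the common (1 - SSIM)/2 form and the simple max-normalizing tone map are assumptions.

```python
import numpy as np
from skimage.metrics import structural_similarity

def dssim(reference, estimate):
    """Structural dissimilarity between two HDR renderings after a simple
    linear tone map and gamma correction (as described above)."""
    def tonemap(img):
        img = img / max(img.max(), 1e-8)       # linear tone map to [0, 1]
        return np.power(img, 1.0 / 2.2)        # gamma correction
    ssim = structural_similarity(tonemap(reference), tonemap(estimate),
                                 channel_axis=-1, data_range=1.0)
    return (1.0 - ssim) / 2.0
```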
5.2. Comparison

We compare possible alternatives to our approach, as shown in Figure 2 and the supplementary material. A simple approach could be to use image-based rendering (IBR) [36]; however, such approaches require either flat geometry or a high enough number of images to reproduce specular appearance, neither of which is available in our input images, which show a single image of each condition only. Effectively, IBR would amount to projective texturing in our setting, which is compared to our approach in Figure 2a. An alternative could be to run a general intrinsic image approach [5] on the input images and use the average pixel color of the albedo image as the diffuse albedo k_d. The specular component could then be the color that remains. While this would provide a specular value k_s, it is not clear how to obtain a glossiness value g (see Figure 2b).

Figure 9. Comparison of a single-image method [31] (first row), our approach using a single input image only (middle row), and our full approach (last row) on a synthetic input.

While we work on a set of images with normals, other work can produce materials and illumination from a single image [31, 12]. While the inputs differ, it is clear that any competitive method working on multiple images must perform better than previous work operating on a single image. In Figure 9 we compare a method working on three individual SYNTHETIC images from a set [31], a variant of our method working on that set as isolated images, and our full approach jointly working on all the images.

5.3. Application

A typical application of our approach is photo-realistic manipulation of objects in Internet images, as shown in Figure 1. Having estimated the material and illumination parameters from all the images, we can insert virtual replicas into an image, transfer reflectance estimated from other Internet images to new scenes, or introduce new objects with new materials under the estimated illumination (see supplementary for figures).

5.4. User Study

We compared our approach to a similar approach (SIRFS), which extracts intrinsic images and lighting, in a user study. Asking N = 250 subjects whether an animated turn-table re-rendering using our material information or the SIRFS model is preferred, shown as a space-randomized 2AFC over five scenes, the mean preference was 86.5 % in our favor (std. error of the mean 2.1 %). The chart of the user responses, their mean, and the exact sample counts and standard errors for the individual scenes (Police Car, Red Chair, Yellow Chair, Vase, and the total) are presented in Figure 10.

Figure 10. User study results. The vertical axis is the preference for ours, so higher is better. Error bars show the standard error of the mean; small bars mean certainty about the outcome.

6. Conclusion

We presented a novel optimization formulation for joint material and illumination estimation from sets of Internet images in which different objects are observed in varying illumination conditions, sharing coupled material and/or illumination observations. We demonstrated that such a linked material-illumination observation structure can be effectively exploited in a scalable optimization setup to recover robust estimates of material (both diffuse and specular) and effective environment maps. The estimates can then be used for a variety of compelling and photo-realistic image manipulation and object insertion tasks.

Acknowledgement. This work is in part supported by a Microsoft PhD fellowship program and ERC Starting Grant SmartGeometry (StG-2013-335373).

References

[1] M. Aittala, T. Weyrich, and J. Lehtinen. Two-shot SVBRDF capture for stationary materials. ACM Trans. Graph. (Proc. SIGGRAPH), 34(4):110:1–110:13, 2015.
[2] J. Arvo, K. Torrance, and B. Smits. A framework for the analysis of error in global illumination algorithms. In Proc. SIGGRAPH, pages 75–84, 1994.
[3] M. Aubry, D. Maturana, A. A. Efros, B. C. Russell, and J. Sivic. Seeing 3D chairs: exemplar part-based 2D-3D alignment using a large dataset of CAD models. In CVPR, pages 3762–69, 2014.
[4] J. T. Barron and J. Malik. Intrinsic scene properties from a single RGB-D image. PAMI, 38(4), 2015.
[5] J. T. Barron and J. Malik. Shape, illumination, and reflectance from shading. PAMI, 37(8), 2015.
[6] H. G. Barrow and J. M. Tenenbaum. Recovering intrinsic scene characteristics from images. Comp. Vis. Sys., 1978.
[7] S. Bell, K. Bala, and N. Snavely. Intrinsic images in the wild. ACM Trans. Graph., 33(4):159, 2014.
[8] K. Chen, K. Xu, Y. Yu, T.-Y. Wang, and S.-M. Hu. Magic decorator: Automatic material suggestion for indoor digital scenes. ACM Trans. Graph. (Proc. SIGGRAPH Asia), 34(6), 2015.
[9] P. Debevec. Rendering synthetic objects into real scenes: Bridging traditional and image-based graphics with global illumination and high dynamic range photography. SIGGRAPH, 1998.
[10] R. O. Dror, T. K. Leung, E. H. Adelson, and A. S. Willsky. Statistics of real-world illumination. In CVPR, 2001.
[11] D. Eigen and R. Fergus. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In ICCV, 2015.
[12] S. Georgoulis, K. Rematas, T. Ritschel, M. Fritz, T. Tuytelaars, and L. Van Gool. Natural illumination from multiple materials using deep learning. arXiv:1611.09325, 2016.
[13] P. Green, J. Kautz, W. Matusik, and F. Durand. View-dependent precomputed light transport using nonlinear Gaussian function approximations. In Proc. i3D, 2006.
[14] T. Haber, C. Fuchs, P. Bekaert, H.-P. Seidel, M. Goesele, and H. P. Lensch. Relighting objects from image collections. In CVPR, pages 627–34, 2009.
[15] B. K. P. Horn and M. J. Brooks, editors. Shape from Shading. MIT Press, 1989.
[16] M. Hueting, M. Ovsjanikov, and N. Mitra. CrossLink: Joint understanding of image and 3D model collections through shape and camera pose variations. ACM Trans. Graph. (Proc. SIGGRAPH Asia), 34(6), 2015.
[17] C. Innamorati, T. Ritschel, T. Weyrich, and N. J. Mitra. Decomposing single images for layered photo retouching. Computer Graphics Forum (Proc. EGSR), 36(4):15–25, 2017.
[18] A. Jain, T. Thormählen, T. Ritschel, and H.-P. Seidel. Material memex: Automatic material suggestions for 3D objects. ACM Trans. Graph. (Proc. SIGGRAPH Asia), 31(5), 2012.
[19] W. Jakob. Mitsuba renderer, 2010. http://www.mitsuba-renderer.org.
[20] J. T. Kajiya. The rendering equation. In ACM SIGGRAPH, 1986.
[21] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.
[22] G. Liu, D. Ceylan, E. Yumer, J. Yang, and J.-M. Lien. Material editing using a physically based rendering network. In ICCV, pages 2280–8. IEEE, 2017.
[23] S. Lombardi and K. Nishino. Reflectance and illumination recovery in the wild. PAMI, 38(1), 2016.
[24] S. R. Marschner and D. P. Greenberg. Inverse lighting for photography. In Color and Imaging Conf., pages 262–5, 1997.
[25] W. Matusik, H. Pfister, M. Brand, and L. McMillan. A data-driven reflectance model. ACM Trans. Graph., 2003.
[26] A. Meka, M. Maximov, M. Zollhöfer, A. Chatterjee, C. Richardt, and C. Theobalt. Live intrinsic material estimation. arXiv preprint arXiv:1801.01075, 2018.
[27] T. Narihira, M. Maire, and S. X. Yu. Direct intrinsics: Learning albedo-shading decomposition by convolutional regression. In ICCV, 2015.
[28] C. H. Nguyen, O. Nalbach, T. Ritschel, and H.-P. Seidel. Guiding image manipulations using shape-appearance subspaces from co-alignment of image collections. Comp. Graph. Forum (Proc. Eurographics), 34(2), 2015.
[29] F. E. Nicodemus. Directional reflectance and emissivity of an opaque surface. Applied Optics, 1965.
[30] M. Ovsjanikov, W. Li, L. Guibas, and N. J. Mitra. Exploration of continuous variability in collections of 3D shapes. ACM Trans. Graph., 30(4), 2011.
[31] G. Oxholm and K. Nishino. Shape and reflectance from natural illumination. In ECCV, 2012.
[32] M. Papas, W. Jarosz, W. Jakob, S. Rusinkiewicz, W. Matusik, and T. Weyrich. Goal-based caustics. Comp. Graph. Forum (Proc. Eurographics), 30(2):503–511, 2011.
[33] K. Rematas, C. Nguyen, T. Ritschel, M. Fritz, and T. Tuytelaars. Novel views of objects from a single image. PAMI, 39(8), 2016.
[34] L. Shen, P. Tan, and S. Lin. Intrinsic image decomposition with non-local texture cues. In CVPR, 2008.
[35] J. Shi, Y. Dong, H. Su, and S. X. Yu. Learning non-Lambertian object intrinsics across ShapeNet categories. arXiv:1612.08510, 2016.
[36] H.-Y. Shum, S.-C. Chan, and S. B. Kang. Image-based rendering. Springer Science & Business Media, 2008.
[37] H. Su, C. R. Qi, Y. Li, and L. J. Guibas. Render for CNN: Viewpoint estimation in images using CNNs trained with rendered 3D model views. In ICCV, 2015.
[38] Y. Tang, R. Salakhutdinov, and G. Hinton. Deep Lambertian networks. In ICML, 2012.
[39] Y. Tokuyoshi. Virtual spherical Gaussian lights for real-time glossy indirect illumination. Comp. Graph. Forum, 34(7):89–98, 2015.
[40] Y.-T. Tsai and Z.-C. Shih. All-frequency precomputed radiance transfer using spherical radial basis functions and clustered tensor approximation. ACM Trans. Graph., 25(3):967–76, 2006.

[41] J. Vorba, O. Karlík, M. Šik, T. Ritschel, and J. Křivánek. On-line learning of parametric mixture models for light transport simulation. ACM Trans. Graph. (Proc. SIGGRAPH), 33(4):101, 2014.
[42] B. Walter, S. R. Marschner, H. Li, and K. E. Torrance. Micro-
facet models for refraction through rough surfaces. In Proc.
EGSR, pages 195–206, 2007.
[43] J. Wang, P. Ren, M. Gong, J. Snyder, and B. Guo.
All-frequency rendering of dynamic, spatially-varying re-
flectance. ACM Trans. Graph. (Proc. SIGGRAPH),
28(5):133, 2009.
[44] T. Y. Wang, H. Su, Q. Huang, J. Huang, L. Guibas, and N. J.
Mitra. Unsupervised texture transfer from images to model
collections. ACM Trans. Graph. (Proc. SIGGRAPH Asia),
35(6), 2016.
[45] X. Wang, D. F. Fouhey, and A. Gupta. Designing deep net-
works for surface normal estimation. In CVPR, 2015.
[46] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli.
Image quality assessment: from error visibility to structural
similarity. IEEE TIP, 13(4):600–612, 2004.
[47] H. Wu, J. Dorsey, and H. Rushmeier. A sparse parametric
mixture model for BTF compression, editing and rendering.
Comp. Graph. Forum, 30(2):465–73, 2011.
[48] K. Xu, W.-L. Sun, Z. Dong, D.-Y. Zhao, R.-D. Wu, and S.-
M. Hu. Anisotropic spherical gaussians. ACM Trans. Graph.
(Proc. SIGGRAPH Asia), 32(6), 2013.
[49] L.-F. Yu, S. K. Yeung, C.-K. Tang, D. Terzopoulos, T. F.
Chan, and S. Osher. Make it home: automatic optimiza-
tion of furniture arrangement. ACM Trans. Graph., 30(4):86,
2011.
[50] C. Zhu, R. H. Byrd, P. Lu, and J. Nocedal. L-BFGS-B, FOR-
TRAN routines for large scale bound constrained optimiza-
tion. ACM Trans. Math. Soft., 23(4):550–60, 1997.
