
Foundations and Trends® in Computer Graphics and Vision
Vol. 4, No. 1 (2008) 1–73
© 2009 S. Paris, P. Kornprobst, J. Tumblin and F. Durand
DOI: 10.1561/0600000020
Bilateral Filtering: Theory and Applications
By Sylvain Paris, Pierre Kornprobst, Jack Tumblin
and Frédo Durand
Contents

1 Introduction
2 From Gaussian Convolution to Bilateral Filtering
2.1 Terminology and Notation
2.2 Image Smoothing with Gaussian Convolution
2.3 Edge-preserving Filtering with the Bilateral Filter
3 Applications
3.1 Denoising
3.2 Contrast Management
3.3 Depth Reconstruction
3.4 Data Fusion
3.5 3D Fairing
3.6 Other Applications
4 Efficient Implementation
4.1 Brute Force
4.2 Separable Kernel
4.3 Local Histograms
4.4 Layered Approximation
4.5 Bilateral Grid
4.6 Bilateral Pyramid
4.7 Discussion
5 Relationship between Bilateral Filtering and Other Methods or Frameworks
5.1 Bilateral Filtering is Equivalent to Local Mode Filtering
5.2 The Bilateral Filter is a Robust Filter
5.3 Bilateral Filtering is Equivalent Asymptotically to the Perona and Malik Equation
6 Extensions of Bilateral Filtering
6.1 Accounting for the Local Slope
6.2 Using Several Images
7 Conclusions
Acknowledgments
References
Bilateral Filtering: Theory and Applications

Sylvain Paris¹, Pierre Kornprobst², Jack Tumblin³ and Frédo Durand⁴

¹ Adobe Systems, Inc., CA 95110-2704, USA, sparis@adobe.com
² NeuroMathComp Project Team INRIA, ENS Paris, UNSA LJAD, France, Pierre.Kornprobst@inria.fr
³ Department of Electrical Engineering and Computer Science, Northwestern University, IL 60208, USA, jet@cs.northwestern.edu
⁴ Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, MA 02139, USA, fredo@mit.edu
Abstract

The bilateral filter is a non-linear technique that can blur an image while respecting strong edges. Its ability to decompose an image into different scales without causing haloes after modification has made it ubiquitous in computational photography applications such as tone mapping, style transfer, relighting, and denoising. This text provides a graphical, intuitive introduction to bilateral filtering, a practical guide for efficient implementation, and an overview of its numerous applications, as well as mathematical analysis.
1 Introduction
Bilateral filtering is a technique to smooth images while preserving edges. It can be traced back to 1995 with the work of Aurich and Weule [4] on nonlinear Gaussian filters. It was later rediscovered by Smith and Brady [59] as part of their SUSAN framework, and by Tomasi and Manduchi [63], who gave it its current name. Since then, the use of bilateral filtering has grown rapidly and is now ubiquitous in image-processing applications (Figure 1.1). It has been used in various contexts such as denoising [1, 10, 41], texture editing and relighting [48], tone management [5, 10, 21, 22, 24, 53], demosaicking [56], stylization [72], and optical-flow estimation [57, 74]. The bilateral filter has several qualities that explain its success:

- Its formulation is simple: each pixel is replaced by a weighted average of its neighbors. This aspect is important because it makes it easy to acquire intuition about its behavior, to adapt it to application-specific requirements, and to implement it.
- It depends only on two parameters that indicate the size and contrast of the features to preserve.
- It can be used in a non-iterative manner. This makes the parameters easy to set since their effect is not cumulative over several iterations.
Fig. 1.1 (a) Input image; (b) output of the bilateral filter. The bilateral filter converts any input image (a) to a smoothed version (b). It removes most texture, noise, and fine details, but preserves large sharp edges without blurring.
- It can be computed at interactive speed even on large images, thanks to efficient numerical schemes [21, 23, 55, 54, 50, 71], and even in real time if graphics hardware is available [16].
In parallel to applications, a wealth of theoretical studies [6, 7, 13, 21, 23, 46, 50, 60, 65, 66] explain and characterize the bilateral filter's behavior. The strengths and limitations of bilateral filtering are now fairly well understood. As a consequence, several extensions have been proposed [14, 19, 23].

This paper is organized as follows. Section 2 presents linear Gaussian filtering and its nonlinear extension, the bilateral filter. Section 3 revisits several recent, novel and challenging applications of bilateral filtering. Section 4 compares different ways to implement the bilateral filter efficiently. Section 5 presents several links between bilateral filtering and other frameworks, as well as different ways to interpret it. Section 6 exposes extensions and variants of the bilateral filter. We also provide a website with code and relevant pointers (http://people.csail.mit.edu/sparis/bf_survey/).
2 From Gaussian Convolution to Bilateral Filtering
To introduce bilateral filtering, we begin with a description of Gaussian convolution in Section 2.2. This filter is simpler, introduces the notion of local averaging, and is closely related to the bilateral filter, but it does not preserve edges. Section 2.3 then underscores the specific features of the bilateral filter that combine smoothing with edge preservation. First, we introduce the notation used throughout this paper.
2.1 Terminology and Notation

For simplicity, most of the exposition describes filtering for a gray-level image I, although every filtering operation can be duplicated for each component of a color image unless otherwise specified. We use the notation I_p for the image value at pixel position p. Pixel size is assumed to be 1. F[I] designates the output of a filter F applied to the image I. We will consider the set S of all possible image locations, which we name the spatial domain, and the set R of all possible pixel values, which we name the range domain. For instance, the notation \sum_{p \in S} denotes a sum over all image pixels indexed by p. We use |·| for the absolute value and ‖·‖ for the L2 norm; e.g., ‖p − q‖ is the Euclidean distance between pixel locations p and q.
2.2 Image Smoothing with Gaussian Convolution

Blurring is perhaps the simplest way to smooth an image; each output image pixel value is a weighted sum of its neighbors in the input image. The core component is the convolution by a kernel, which is the basic operation in linear shift-invariant image filtering. At each output pixel position it estimates the local average of intensities, and corresponds to low-pass filtering. An image filtered by Gaussian convolution is given by:

    GC[I]_p = \sum_{q \in S} G_\sigma(\|p - q\|) \, I_q,    (1)
where G_σ(x) denotes the 2D Gaussian kernel (see Figure 2.1):

    G_\sigma(x) = \frac{1}{2\pi\sigma^2} \exp\left( -\frac{x^2}{2\sigma^2} \right).    (2)
Gaussian filtering is a weighted average of the intensity of the adjacent positions with a weight decreasing with the spatial distance to the center position p. The weight for pixel q is defined by the Gaussian G_σ(‖p − q‖), where σ is a parameter defining the neighborhood size. The strength of this influence depends only on the spatial distance between the pixels and not on their values. For instance, a bright pixel has a strong influence over an adjacent dark pixel although these two pixel values are quite different. As a result, image edges are blurred because pixels across discontinuities are averaged together (see Figure 2.1). The action of the Gaussian convolution is independent of the image content: the influence that a pixel has on another one depends only on their distance in the image, not on the actual image values.
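As an illustrative sketch (assuming SciPy is available and a gray-level image), Equation (1) can be evaluated with a separable Gaussian convolution; the step-edge example below is hypothetical:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def gaussian_convolution(image, sigma):
        # Linear smoothing GC[I] of Equation (1): each pixel becomes a
        # Gaussian-weighted average of its spatial neighbors, regardless of
        # their intensity values, so edges are blurred along with the noise.
        return gaussian_filter(image.astype(np.float64), sigma=sigma)

    # Hypothetical example: a synthetic step edge is smeared by the blur.
    step = np.zeros((64, 64))
    step[:, 32:] = 1.0
    blurred = gaussian_convolution(step, sigma=4.0)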
Remark. Linear shift-invariant filters such as Gaussian convolution (Equation (1)) can be implemented efficiently even for very large σ using the Fast Fourier Transform (FFT) and other methods, but these acceleration techniques do not apply to the bilateral filter or other nonlinear or shift-variant filters. Fortunately, several fast numerical schemes were recently developed specifically for the bilateral filter (see Section 4).
Fig. 2.1 Example of Gaussian linear filtering with different σ. Top row shows the profile of a 1D Gaussian kernel and bottom row the result obtained by the corresponding 2D Gaussian convolution filtering. Edges are lost with high values of σ because averaging is performed over a much larger area.
2.3 Edge-preserving Filtering with the Bilateral Filter

The bilateral filter is also defined as a weighted average of nearby pixels, in a manner very similar to Gaussian convolution. The difference is that the bilateral filter takes into account the difference in value with the neighbors to preserve edges while smoothing. The key idea of the bilateral filter is that for a pixel to influence another pixel, it should not only occupy a nearby location but also have a similar value. The formalization of this idea goes back in the literature to Yaroslavsky [77], Aurich and Weule [4], Smith and Brady [59], and Tomasi and Manduchi [63]. The bilateral filter, denoted by BF[·], is defined by:
    BF[I]_p = \frac{1}{W_p} \sum_{q \in S} G_{\sigma_s}(\|p - q\|) \, G_{\sigma_r}(|I_p - I_q|) \, I_q,    (3)

where the normalization factor W_p ensures that the pixel weights sum to 1:

    W_p = \sum_{q \in S} G_{\sigma_s}(\|p - q\|) \, G_{\sigma_r}(|I_p - I_q|).    (4)
Parameters σ_s and σ_r specify the amount of filtering for the image I. Equation (3) is a normalized weighted average where G_{σ_s} is a spatial Gaussian weighting that decreases the influence of distant pixels, and G_{σ_r} is a range Gaussian that decreases the influence of pixels q whose intensity values differ from I_p. Figure 1.1 shows a sample output of the bilateral filter, and Figure 2.2 illustrates how the weights are computed for one pixel near an edge.

Fig. 2.2 The bilateral filter smooths an input image while preserving its edges. Each pixel is replaced by a weighted average of its neighbors. Each neighbor is weighted by a spatial component that penalizes distant pixels and a range component that penalizes pixels with a different intensity. The combination of both components ensures that only nearby similar pixels contribute to the final result. The weights shown apply to the central pixel (under the arrow). The figure is reproduced from [21].
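To make the definition concrete, the following deliberately naive NumPy sketch evaluates Equations (3)–(4) on a gray-level image. The optional guide argument is our own addition: it anticipates the variants of Equations (6) and (9) later in this text, where the range weights are computed on an image other than the one being averaged.

    import numpy as np

    def bilateral_filter(image, sigma_s, sigma_r, guide=None):
        # Brute-force bilateral filter of Equations (3)-(4) for a gray-level
        # image (values assumed in [0, 1]).  When 'guide' is given, the range
        # weights are computed on it instead of on the image itself.
        # Complexity is O(|S|^2): written for clarity, not speed.
        img = image.astype(np.float64)
        g = img if guide is None else guide.astype(np.float64)
        h, w = img.shape
        ys, xs = np.mgrid[0:h, 0:w]
        out = np.empty((h, w))
        for i in range(h):
            for j in range(w):
                spatial = np.exp(-((ys - i) ** 2 + (xs - j) ** 2) / (2.0 * sigma_s ** 2))
                rng = np.exp(-((g - g[i, j]) ** 2) / (2.0 * sigma_r ** 2))
                weights = spatial * rng                              # product of the two Gaussians
                out[i, j] = np.sum(weights * img) / np.sum(weights)  # Eq. (3), with W_p of Eq. (4)
        return out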
2.3.1 Parameters

The bilateral filter is controlled by two parameters: σ_s and σ_r. Figure 2.3 illustrates their effect.

- As the range parameter σ_r increases, the bilateral filter gradually approximates Gaussian convolution more closely, because the range Gaussian G_{σ_r} widens and flattens, i.e., it becomes nearly constant over the intensity interval of the image.
- Increasing the spatial parameter σ_s smooths larger features.
Fig. 2.3 The bilateral filter's range and spatial parameters provide more versatile control than Gaussian convolution. As soon as either of the bilateral filter weights reaches values near zero, no smoothing occurs. As a consequence, increasing the spatial sigma will not blur an edge as long as the range sigma is smaller than the edge amplitude. For example, note that the rooftop contours are sharp for small and moderate range settings σ_r, and that sharpness is independent of the spatial setting σ_s. The original image intensity values span [0, 1].
In practice, in the context of denoising, Liu et al. [41] show that adapting the range parameter σ_r to estimates of the local noise level yields more satisfying results. The authors recommend a linear dependence: σ_r = 1.95 σ_n, where σ_n is the local noise level estimate.
An important characteristic of bilateral filtering is that the two weights are multiplied: if either of them is close to zero, no smoothing occurs. As an example, a large spatial Gaussian coupled with a narrow range Gaussian achieves limited smoothing despite the large spatial extent. The range weight enforces a strict preservation of the contours.
2.3.2 Computational Cost

At this stage of the presentation, skeptical readers may have already decided that the bilateral filter is an unreasonably expensive algorithm to compute when the spatial parameter σ_s is large, as it constructs each output pixel from a large neighborhood and requires the calculation of two weights, their product, and a costly normalizing step as well.
In Section 4 we will show some efficient approaches to implement the bilateral filter.
2.3.3 Iterations

The bilateral filter can be iterated. This leads to results that are almost piecewise constant, as shown in Figure 2.4. Although this yields smoother images, the effect is different from increasing the spatial and range parameters. As shown in Figure 2.3, increasing the spatial parameter σ_s has a limited effect unless the range parameter σ_r is also increased. Although a large σ_r also produces smooth outputs, it tends to blur the edges, whereas iterating preserves the strong edges, such as the border of the roof in Figure 2.4, while removing the weaker details, such as the tiles. This type of effect is desirable for applications such as stylization [72] that seek to abstract away the small details, while computational photography techniques [5, 10, 21] tend to use a single iteration to remain closer to the initial image content.
2.3.4 Separation

The bilateral filter can split an image into two parts: the filtered image and its residual image. The filtered image holds only the large-scale features, as the bilateral filter smoothed away local variations without affecting strong edges. The residual image, made by subtracting the filtered image from the original, holds only the image portions that the filter removed. Depending on the settings and the application, this removed small-scale component can be interpreted as noise or texture, as shown in Figure 2.5. Applications such as tone management and style transfer extend this decomposition to multiple layers (see Section 3).

Fig. 2.4 Iterations: the bilateral filter can be applied iteratively, and the result progressively approximates a piecewise-constant signal. This effect can help achieve a limited-palette, cartoon-like rendition of images [72]. Here, σ_s = 8 and σ_r = 0.1.

Fig. 2.5 Separation: The residual image (c) holds all input components (a) removed by the bilateral filter (b), and some image structure is visible here. For denoising tasks, the ideal residual image would contain only noise, but here the σ_r setting was large enough to remove some fine textures that are nearly indistinguishable from noise, and still yields acceptable results for many denoising tasks.
To conclude, bilateral filtering is an effective way to smooth an image while preserving its discontinuities (see Sections 3.1 and 3.5) and also to separate image structures of different scales (see Section 3.2). As we will see, the bilateral filter has many applications, and its central notion of assigning weights that depend on both space and intensity can be tailored to fit a diverse set of applications (see Section 6).
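A small usage sketch of the separation of Section 2.3.4 (assuming a gray-level image in [0, 1] and a bilateral_filter function such as the brute-force sketch of Section 2.3; the parameter values echo Figure 2.4):

    # Base/detail decomposition: the filtered image keeps the large-scale
    # structure, the residual keeps the texture, noise, and fine details.
    base = bilateral_filter(image, sigma_s=8.0, sigma_r=0.1)
    detail = image - base            # everything the filter removed
    reconstructed = base + detail    # lossless: exactly the original image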
Remark. The reader may know that the goal of edge-preserving image restoration has been addressed for many years by partial differential equations (PDEs), and one may wonder about their relationship with bilateral filters. Section 5.1 will explore those connections in detail.
3 Applications
This section discusses the uses of the bilateral filter for a variety of applications:

- Denoising (Section 3.1): This is the original, primary goal of the bilateral filter, where it found broad applications that include medical imaging, tracking, movie restoration, and more. We discuss a few of these, and present a useful extension known as the cross-bilateral filter.
- Texture and Illumination Separation, Tone Mapping, Retinex, and Tone Management (Section 3.2): Bilateral filtering an image at several different settings decomposes that image into large-scale/small-scale textures and features. These applications edit each component separately to adjust the tonal distribution, achieve photographic stylization, or match the adjusted image to the capacities of a display device.
- Data Fusion (Section 3.4): These applications use bilateral filtering to decompose several source images into components and then recombine them as a single output image that inherits selected visual properties from each of the source images.
- 3D Fairing (Section 3.5): In this counterpart to image denoising, bilateral filtering applied to 3D meshes and point clouds smooths away noise in large areas and yet keeps all corners, seams, and edges sharp.
- Other Applications (Section 3.6): New applications are emerging steadily in the literature; we highlight several new trends indicated by recently published papers.
3.1 Denoising

One of the first roles of bilateral filtering was image denoising. Later, the bilateral filter became popular in the computer graphics community because it is edge preserving, easy to understand and set up, and because efficient implementations were recently proposed (see Section 4).

The bilateral filter has become a standard interactive tool for image denoising. For example, Adobe Photoshop® provides a fast and simple bilateral filter variant under the name surface blur (Figure 3.1). Instead of Gaussian functions, it uses a square box function as its spatial weight and a tent (linear) function as the range weight. Unlike Gaussian convolution, which smooths images without respecting their visual structures, the bilateral filter preserves the object contours and produces sharp results. The surface blur tool is often used by portrait photographers to smooth skin while preserving sharp edges and details in the subject's eyes and mouth.

Fig. 3.1 Denoising using the surface blur filter from Adobe Photoshop®: We added noise (b) to the input image (a) and applied the surface blur filter. As the input image was corrupted by noise, some signal loss is inevitable, but the filtered version is significantly improved.
Qualitatively, the bilateral filter represents an easy way to decompose an image into a cartoon-like component and a texture component. The cartoon-like image is the denoised image, which can be used in several applications as shown in this section. Qualitatively, such a decomposition could be obtained by any simplifying filter, but it is not trivial from a mathematical perspective if one considers the mathematical structure of images. In this respect, we refer to Meyer [44], Vese and Osher [67], and Aujol et al. [3] for more details about approaches dedicated to precise texture–cartoon decompositions.

The cartoon-like effect can also be a drawback depending on the application. Buades et al. [14] have shown that although bilateral filtering preserves edges, the preservation is not perfect, and some edges are sharpened during the process, introducing an undesirable staircase effect. We discuss this effect in more detail in Section 6.1.3. In summary, the bilateral filter can be the right approach for many applications, but it is not always the best solution nor the best denoising filter available.

As a final comment, the bilateral filter is related to several approaches and frameworks proposed in the literature. We revisit the most important ones in Section 5. These analogies are interesting to notice, as they give theoretical foundations to bilateral filtering and show alternative formulations.
3.1.1 Medical Imagery

In the domain of medical imagery, Wong et al. [73] improved the structure-preservation abilities of the bilateral filter by explicitly describing the structure with an additional weight, one that depends on the local shape and orientation of the sensed image data.
3.1.2 Videos

Bennett and McMillan [10] show that bilateral filtering can be used for videos. In this context, the bilateral filter is applied along the time axis, that is, pixels at the same location in successive frames are averaged together. The fact that the bilateral filter does not average together pixels of different colors prevents mixing data from different objects that appear at the same location but at different times. For instance, if a red ball passes in front of a green tree, the ball and tree pixels are not mixed together, thanks to the range weight of the bilateral filter. However, pixels that change color often, for instance due to a rapidly moving object, may not have any similar neighbors along the time axis. Bennett and McMillan compensate for this case by looking for spatial neighbors when there are not enough temporally similar pixels. Figure 3.2 shows sample results.

Fig. 3.2 (a) Input; (b) naive histogram stretching; (c) output of Bennett and McMillan. Bennett and McMillan [10] describe how to combine spatial and temporal bilateral filterings to achieve high-quality video denoising and exposure correction. Figure reproduced from Bennett and McMillan [10].
3.1.3 Orientation Smoothing

Paris et al. [49] use the bilateral filter to smooth the 2D orientation field computed from optical measurements for hairstyle modeling. Their measuring scheme yields a per-pixel evaluation of the local orientation, but these measures are noisy and at times ambiguous due to the complex nature of hair images. Paris et al. evaluated the success of their measurements at pixel p using the variance V_p and incorporated it into their filter. In Paris' setup, several illumination conditions offer orientation estimates for each pixel, and they use the maximum difference Δ(p, q) among all these estimates. As the orientation angle θ varies cyclically between 0 and π, they map their averaging onto a complex exponential, θ ∈ [0, π[ → exp(2iθ) ∈ C, leading to the filter:

    \exp(2i \, F_{\mathrm{Paris}}(\theta)_p) = \sum_{q} G_{\sigma_s}(\|p - q\|) \, G_{\sigma_V}(V_p / V_q) \, G_{\sigma_\Delta}(\Delta(p, q)) \, \exp(2i\theta_q).    (5)
Fig. 3.3 (a) Zoom on input image; (b) orientations before bilateral filtering; (c) orientations after bilateral filtering. Paris et al. [49] smooth their orientation measurements using a variant of bilateral filtering mapped to the complex plane C. Figure reproduced from Paris et al. [49].

This filter acts upon orientations mapped to the complex plane. Although Paris' application needs only the phase argument of the result and discards the amplitude, the amplitude could, if needed, act as the standard deviation does in the scalar case (Watson [70]). This filter illustrates how bilateral filtering can adapt to incorporate application-specific knowledge.
3.1.4 Discussion and Practical Considerations

Denoising usually relies on small spatial kernels σ_s, and the range sigma σ_r is usually chosen to match the noise level.

The bilateral filter might not be the most advanced denoising technique, but its strength lies in its simplicity and flexibility. The weights can be adjusted to take into account any metric on the difference between two pixels, and information about the reliability of a given pixel can be included by reducing the weights assigned to it.
In the case of salt-and-pepper or impulse noise, the bilateral filter may need to mollify the input image before use. Though the noise may be sparse, the affected pixels' intensity values may span the entire image range (e.g., [0, 1]), and their values might be too different from their neighbors' to be filtered out. To mollify these images, compute the range Gaussian weights on a median-filtered version of the image [21]. If M describes median filtering, this gives:

    BF[I]_p = \frac{1}{W_p} \sum_{q \in S} G_{\sigma_s}(\|p - q\|) \, G_{\sigma_r}(|M[I]_p - M[I]_q|) \, I_q,    (6)

    W_p = \sum_{q \in S} G_{\sigma_s}(\|p - q\|) \, G_{\sigma_r}(|M[I]_p - M[I]_q|).    (7)
This practice is commonplace in robust statistics: users apply a very robust estimator such as the median filter first to obtain a suitable initial estimate, then apply a more precise estimator (the bilateral filter) to find the final result.
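A hypothetical usage sketch of Equations (6)–(7), reusing the brute-force bilateral_filter sketch of Section 2.3 (its guide argument) together with SciPy's median filter; the synthetic noise model and parameter values are illustrative only:

    import numpy as np
    from scipy.ndimage import median_filter

    # Step edge corrupted by sparse impulse ("salt-and-pepper") noise.
    rng = np.random.default_rng(0)
    noisy = np.zeros((64, 64))
    noisy[:, 32:] = 1.0
    impulses = rng.random((64, 64)) < 0.05
    noisy[impulses] = rng.integers(0, 2, impulses.sum()).astype(float)

    # Equations (6)-(7): range weights are computed on the median-filtered
    # copy M[I], while the averaged values are still those of the noisy image.
    mollified = median_filter(noisy, size=3)
    denoised = bilateral_filter(noisy, sigma_s=2.0, sigma_r=0.2, guide=mollified)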
3.2 Contrast Management

Bilateral filtering has been particularly successful as a tool for contrast management tasks such as detail enhancement or reduction. Oh et al. [48] describe how to use the bilateral filter to separate an image into a large-scale component and a small-scale component by subtracting filtered results. With this decomposition, they edit the texture in a photograph. Several earlier nonlinear coarse/fine decompositions were already in use in various local tone mapping operators (e.g., Stockham [62], Chiu et al. [17], Schlick [58], Pattanaik et al. [51], Tumblin and Turk [64]), but Durand and Dorsey [21] were the first to apply the method using the bilateral filter. Elad [24] followed the same strategy to estimate the illumination and albedo of the photographed scene. Bae et al. [5] extended this approach to manipulate the look of a photograph, and Fattal et al. [25] describe a multi-scale image decomposition that preserves edges and allows for combining multiple images to reveal object details. We describe these applications in the next sections.
3.2.1 Texture and Illumination Separation

In the context of image-based modeling, Oh et al. [48] used the structure-removal aspect of the bilateral filter. By using a sufficiently large range parameter σ_r, the bilateral filter successfully removes the variations due to reflectance texture while preserving larger discontinuities stemming from illumination changes and geometry. Their technique is motivated by the fact that illumination variations typically occur at a larger scale than texture patterns, as observed by Land in his Retinex theory of lightness perception [39, 38]. To extract the illumination component, they derive a variant of the iterated bilateral filter for which the initial image is always filtered. The successive estimates are used only to refine the range weight:

    BF^{i+1}[I]_p = \frac{1}{W_p} \sum_{q \in S} G_{\sigma_s}(\|p - q\|) \, G_{\sigma_r}\!\left(|BF^{i}[I]_p - BF^{i}[I]_q|\right) I_q, \quad \text{with } BF^{0}[I] = I.
In addition, because a depth estimate is available at each image pixel,
they adapt the spatial Gaussian size and shape to account for depth
foreshortening. At each pixel they estimate a tangent plane to the local
geometry, and choose an oriented spatial Gaussian that is isotropic
in this tangent plane, which results in an anisotropic Gaussian once
projected onto the image plane.
3.2.2 Tone Mapping

Durand and Dorsey [21] show that the use of bilateral filtering can be extended to isolate small-scale signal variations, including texture and also small details of an image. They use this property to construct a tone mapping process whose goal is to compress the intensity values of a high-dynamic-range image to fit the capabilities of a low-dynamic-range display. In accordance with earlier local tone mapping operators, they note that naive solutions such as uniform scaling or gamma reductions to compress contrasts yield unsatisfactory results, because the severe reductions needed for high-contrast features cause subtle textures and scene details to vanish. While earlier tone mapping operators used multi-scale filter banks, wavelets, nonlinearities modeled on neural processes, and diffusion PDEs to separate visually compressible and incompressible components of log luminance, Durand and Dorsey used the bilateral filter for a fast, much simpler and visually pleasing result. They apply the bilateral filter to the log-intensities of the HDR image, scale down the result uniformly, and add back the filter residual, thereby ensuring that the small-scale details have not been compressed during the process. Some earlier methods such as Pattanaik et al. [51] used weighted multi-scale decompositions that model psychophysical models of visual appearance or relied on user interaction to achieve best-looking results (e.g., Jobson et al. [32], Tumblin and Turk [64]), but as shown in Figure 3.4, HDR images tone-mapped with Durand and Dorsey's technique are less difficult to achieve yet maintain a plausible, visually pleasing appearance.

Fig. 3.4 Tone Mapping: Direct display of an HDR image (a) is not satisfying because over- and under-exposed areas hide image features. Contrast compression maps all scene intensities to the display, but details in clouds and in the city below the horizon are barely visible (b). Isolating the details using Gaussian convolution brings back the details, but incurs halos near contrasted edges (e.g., near the tree silhouettes) (c). Durand and Dorsey use the bilateral filter to isolate the small variations of the input image without incurring halos (d). Figure reproduced from Durand and Dorsey [21].
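A sketch of the two-scale tone mapping just described, under our own simplifying assumptions (any bilateral filter implementation can be passed in; the log10 base and the contrast target of roughly two orders of magnitude are illustrative choices, and the final scaling is simplified):

    import numpy as np

    def tone_map(hdr_luminance, bilateral, sigma_s=16.0, sigma_r=0.4, target_range=np.log10(100.0)):
        # Two-scale tone mapping in the spirit of Durand and Dorsey [21]:
        # work on log luminance, compress only the large-scale (base) layer,
        # and add the small-scale (detail) layer back unchanged so that the
        # textures survive the compression.  'bilateral' is any bilateral
        # filter function taking (image, sigma_s, sigma_r).
        log_l = np.log10(np.maximum(hdr_luminance, 1e-6))   # small floor avoids log(0)
        base = bilateral(log_l, sigma_s, sigma_r)           # large-scale layer
        detail = log_l - base                               # small-scale layer (filter residual)
        scale = target_range / max(base.max() - base.min(), 1e-6)
        out = 10.0 ** (base * scale + detail)               # compress the base, restore the detail
        return out / out.max()                              # normalize for a low-dynamic-range display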
3.2.3 Retinex

Elad [24] proposes a different interpretation of the tone mapping technique of Durand and Dorsey using the Retinex theory of Edwin Land, which seeks a separation of images into illumination and albedo. Under the assumption that scene objects are mostly diffuse reflectors that do not emit light, illumination values are greater than the measured intensities because objects always absorb part of the incoming light. Elad adapts the bilateral filter to ensure that the filtered result fulfills this requirement and forms an upper envelope of the image intensities. He replaces the range weight G_{σ_r} by a truncated Gaussian H · G_{σ_r}, where H is a step function whose value is 1 for non-negative inputs and 0 otherwise. As a consequence, at any given pixel p, the local averaging includes only values greater than the intensity at p and guarantees a filtered value at or above the local image intensity.
3.2.4 Tone Management

Bae et al. [5] build upon the separation between the large scale and the small scale offered by the bilateral filter, and describe a technique to transfer the visual look of an artist's picture onto a casual photograph. They explored a larger space of image modifications by applying an arbitrary, user-specified monotonic transfer function to the large-scale component of the source image. With histogram matching, they construct a transfer function that matches the global contrast and brightness of the model photograph. They also show that the small-scale component can be modified to vary the amount of texture visible in the image. To this end, they introduce the notion of textureness to quantify the local texture amplification they wish to induce in an image by cross-bilateral filtering (cf. Section 3.4.1 for detail). With the small-scale components (or high frequencies) H of the image's log-intensity log I, the textureness is defined by:

    \frac{1}{W_p} \sum_{q \in S} G_{\sigma_s}(\|p - q\|) \, G_{\sigma_r}(|\log I_p - \log I_q|) \, |H|_q.    (8)

Said another way, textureness is the amplitude of the high frequencies, locally averaged while respecting the edges of the input image.
Later, Chen et al. [16] sped up the bilateral filter computation using graphics hardware and achieved real-time results on high-definition videos, thereby enabling on-the-fly control of the photographic style.
Fig. 3.5 (a) Input; (b) result after contrast and textureness increase. Bae et al. [5] use the bilateral filter to separate the large-scale and small-scale variations of an image, and then process them separately. In this example, users chose to increase the global image contrast and to increase the texture as well for a more dramatic image result. Figure reproduced from Bae et al. [5].

3.2.5 Detail Enhancement

Fattal et al. [25] extend the small-scale/large-scale decomposition to multiple layers to allow for finer control and selection of enhanced details. They use their decomposition on several images taken from the same point of view but under different lighting conditions, and demonstrate a variety of effects by combining portions of bilateral image pyramids obtained from these lighting variations. They describe how these combinations can be controlled to reveal the desired details while avoiding halo artifacts (Figure 3.6). They also describe a numerical scheme to efficiently compute image pyramids using the bilateral filter.

Fig. 3.6 (a) Sample input images; (b) output with enhanced details. Fattal et al. [25] use the bilateral filter to create a multi-scale decomposition of images. They first decompose several images of the same scene under different lighting conditions (a) and construct a new pyramid that generates a new image with enhanced details (b). Figure reproduced from Fattal et al. [25].
3.2.6 High-Dynamic-Range Hallucination

Wang et al. [69] use a bilateral filter decomposition to allow users to generate a high-dynamic-range image from a single low-dynamic-range one. They seek to reconstruct data in over- and under-exposed areas of the image. They use the bilateral filter to create a decomposition into texture and illumination inspired by Oh et al.'s [48] work. This allows them to apply user-guided texture synthesis to the detail (texture) layer, after bilateral filtering has removed the large-scale illumination variations. Similarly, they can apply smooth interpolation to the large-scale (illumination) layer because the high-frequency texture has been decoupled.
3.2.7 Discussion and Practical Considerations

Contrast management relies on large spatial kernels to create large-scale/small-scale decompositions, because the small scale needs to include high- and medium-frequency components. The human visual system is not very sensitive to low frequencies but is quite sensitive to medium frequencies. As the large-scale component is typically the one whose contrast gets reduced, medium frequencies must be excluded from it to avoid attenuating them as well.

For contrast management, the bilateral filter is usually applied to the log of the original image because the human visual system's response to light is approximately multiplicative. Using the log domain makes the range sigma act uniformly across different levels of intensity: edges where filtering should stop are defined in terms of multiplicative contrast. Similarly, relighting applications deal with a multiplicative process where illumination is multiplied by reflectance. The use of the log domain is not without its problems, as zero intensity maps to minus infinity, and in dark regions noise in the sensed intensity may be magnified in the log domain. Accordingly, many users add a small constant on the order of the noise level to the input intensities before taking the log. The new color space proposed by Chong et al. [18] is particularly promising for handling these and other multiplicative processes. Using the luminance channel of the CIE-Lab color space is another useful alternative. Instead of a log curve, it is based on a cubic root that does not model these multiplicative processes exactly but is numerically simpler to handle.
3.3 Depth Reconstruction

Yang et al. [75, 76] and Yoon and Kweon [78] applied the bilateral filter to aid in stereo reconstruction, the recovery of depth values from correspondences between pixels in different views. Ideally, we wish to find a corresponding point in the right image for every pixel in the left image. As the distance between these point pairs, the disparity, is inversely proportional to the depth at that pixel, this information is equivalent to recovering the scene geometry. To pair the pixels with points in the other image, stereo algorithms typically compute a similarity score such as color differences or local correlation. Yang et al. and Yoon and Kweon show that locally aggregating these scores using bilateral weights significantly improves the accuracy and reduces noise in the recovered depth maps. Yang et al. [75] have tested many similarity scores and pairing strategies and found that the bilateral aggregation always improves their results.
3.4 Data Fusion

3.4.1 Flash/No-flash Imaging

Eisemann and Durand [22] and Petschnigg et al. [53] describe similar techniques to produce satisfying pictures in low-light conditions by combining a flash and a no-flash photograph. Their work is motivated by the fact that, although the flash image has unpleasantly direct and hard-looking lighting, its signal-to-noise ratio is higher than that of the no-flash image. On the other side, the no-flash image has more pleasing and natural-looking lighting, but its high frequencies are corrupted by noise, and the camera may require a longer exposure time, which increases the likelihood of blurring from an unsteady camera. The key idea is to extract the details of the flash image and combine them with the large-scale component of the no-flash picture. A variant of the bilateral filter performs this separation.
Fig. 3.7 (a) Sample input image; (b) coarse resolution computation; (c) refinement using bilateral aggregation. Yang et al. [75] use the bilateral filter to achieve stereo reconstruction from photographs (a). First, they build a coarse depth map (b) and then use a scheme inspired by the bilateral filter to aggregate local information and compute a refined, more accurate depth map (c). Figure reproduced from Yang et al. [75].
Both articles introduced the cross (or joint) bilateral filter to better process the no-flash photograph, whose noise level is often too high to enable accurate edge detection. As the flash image F represents the same scene, it is used to define the edges, and the filtered no-flash image is obtained as:

    CBF[N, F]_p = \frac{1}{W_p} \sum_{q \in S} G_{\sigma_s}(\|p - q\|) \, G_{\sigma_r}(|F_p - F_q|) \, N_q,    (9)

where N is the original no-flash image. Figure 3.8 gives an overview of the process, and Figures 3.9 and 3.10 show sample results.
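Equation (9) can be instantiated directly with the brute-force bilateral_filter sketch of Section 2.3 (its guide argument); 'noflash' and 'flash' are assumed to be aligned gray-level images in [0, 1], and the multiplicative detail transfer at the end is our own simplified rendering of the combination step, not the exact pipeline of either paper:

    import numpy as np

    # Cross (joint) bilateral filtering of Equation (9): average the no-flash
    # image N, but take the range weights from the flash image F, whose edges
    # are far more reliable.
    filtered_noflash = bilateral_filter(noflash, sigma_s=8.0, sigma_r=0.1, guide=flash)

    # Illustrative follow-up, in the spirit of the flash/no-flash combination:
    # transfer the flash image's detail layer multiplicatively onto the
    # smoothed no-flash image.
    base_flash = bilateral_filter(flash, sigma_s=8.0, sigma_r=0.1)
    detail_flash = flash / np.maximum(base_flash, 1e-6)
    result = filtered_noflash * detail_flash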
3.4.2 Multispectral Fusion

Bennett et al. [9] show how to exploit infrared data in addition to standard RGB data to denoise low-light video streams. They use the dual bilateral filter, a variant of the bilateral filter with a modified range weight that accounts for both the visible spectrum (RGB) and the infrared spectrum:

    DBF[RGB]_p = \frac{1}{W_p} \sum_{q \in S} G_{\sigma_s}(\|p - q\|) \, G_{\sigma_{RGB}}(\|RGB_p - RGB_q\|) \, G_{\sigma_{IR}}(|IR_p - IR_q|) \, RGB_q,    (10)

where RGB_p is a 3-vector representing the RGB components at pixel p, and IR_p the measured infrared intensity at the same pixel p.
Fig. 3.8 Denoising of low-light images: Overview of the flash/no-flash combination of Eisemann and Durand [22]. The bilateral filter is used to combine the illumination component of the no-flash picture and the structure component of the flash picture. Figure reproduced from Eisemann and Durand [22].

Fig. 3.9 (a) Photograph with flash; (b) photograph without flash; (c) combination. By combining a flash photograph (a) and a no-flash photograph (b), Eisemann and Durand render a new photograph (c) that has both the warm lighting of the no-flash picture and the crisp details of the flash image. Figure reproduced from Eisemann and Durand [22].
Fig. 3.10 (a) Flash picture; (b) no-flash picture; (c) output of Petschnigg et al. [53]. By combining a flash photograph (a) and a no-flash photograph (b), Petschnigg et al. render a new photograph (c) that has both the warm lighting of the no-flash picture and the crisp details of the flash image. Figure reproduced from Petschnigg et al. [53].
Bennett et al. show that this combination better detects edges, because it is sufficient for an edge to appear in just one of the channels (RGB or infrared) to form a sharp boundary in the result. In combination with temporal filtering, they demonstrate that it is possible to obtain high-quality video streams from noisy sequences of moving objects shot in very low light.
3.5 3D Fairing

Jones et al. [34] extend bilateral filtering to meshes. The difficulty compared to images is that all three xyz coordinates are subject to noise, the data are not regularly sampled, and the z coordinate is not a function of x and y, unlike the pixel intensity. To smooth a mesh, Jones et al. assume that it is locally flat. Under this assumption and in the absence of noise, a vertex p belongs to the plane tangent to the mesh at any nearby vertex q. With Π_q(p) the projection of p onto the plane tangent to the mesh at q, ideally we have p = Π_q(p). However, because of noise and because the mesh is not flat everywhere, this relationship does not hold in general. To smooth the mesh, Jones et al. average the positions of p predicted by Π_q(p); they apply a spatial weight G_{σ_s}(‖p − q‖) which ensures that only nearby points contribute to the estimate. They add a term G_{σ_r}(‖p − Π_q(p)‖) that reduces the weights of outliers, i.e., the predictions Π_q(p) that are far away from the original position p. Using a term a_q to account for the sampling density, the resulting filter is:

    F_{\mathrm{Jones}}(p) = \frac{1}{W_p} \sum_{q} a_q \, G_{\sigma_s}(\|p - q\|) \, G_{\sigma_r}(\|p - \Pi_q(p)\|) \, \Pi_q(p).    (11)

Fig. 3.11 (a) Input mesh; (b) smoothed mesh. Jones et al. [34] have adapted the bilateral filter to smooth 3D meshes while preserving their most prominent features. Figure reproduced from Jones et al. [34].
To improve the results, they mollify the mesh normals used to estimate the tangent planes [30, 47], that is, they apply a low-pass filter to the normals. This mollification is analogous to the pre-filtering step described by Catté et al. [15] for PDE filters. Figure 3.11 shows a sample result.
Fleishman et al. [26] simultaneously proposed a similar approach (Figure 3.12). The main difference between the techniques of Jones et al. and Fleishman et al. [26] is the way Jones expresses the flat-neighborhood assumption. Fleishman et al. use the mesh normal n_p at p and project the neighbors onto it. With q such a neighbor, q should project onto p, that is: p + [(q − p) · n_p] n_p = p. This results in the following variant of the bilateral filter:

    F_{\mathrm{Fleishman}}(p) = p + \frac{n_p}{W_p} \sum_{q} G_{\sigma_s}(\|p - q\|) \, G_{\sigma_r}(|(q - p) \cdot n_p|) \, [(q - p) \cdot n_p].    (12)

The projection on the normal can be rewritten using the plane projection operator used by Jones et al.: [(q − p) · n_p] n_p = q − Π_p(q).
Fig. 3.12 (a) Input; (b) output of Fleishman et al. [26]. Fleishman et al. [26] have adapted the bilateral filter to smooth 3D meshes while preserving their most prominent features. Figure reproduced from Fleishman et al. [26].
This leads to the following expression, equivalent to Equation (12):

    F_{\mathrm{Fleishman}}(p) = p + \frac{1}{W_p} \sum_{q} G_{\sigma_s}(\|p - q\|) \, G_{\sigma_r}(\|q - \Pi_p(q)\|) \, \left(q - \Pi_p(q)\right).    (13)
These two formulations underline the differences between the approaches of Jones et al. and Fleishman et al. Equation (12) shows that, unlike Jones et al., Fleishman et al. guarantee no vertex drift by moving p only along its normal n_p. On the other hand, Fleishman et al. do not compensate for the density variations described by Jones et al. Furthermore, Equation (13) shows that the weights of both approaches are similar, except that Jones et al. project p onto the tangent plane at q and thus exploit both the position and normal of all neighbors q, whereas Fleishman et al. project q onto the tangent plane at p, thereby exploiting first-order information only from the vertex p. This suggests a hybrid filter that we have not yet evaluated:
    F_{\mathrm{hybrid}}(p) = p + \frac{1}{W_p} \sum_{q} a_q \, G_{\sigma_s}(\|p - q\|) \, G_{\sigma_r}(\|p - \Pi_q(p)\|) \, \left(q - \Pi_p(q)\right).    (14)
In addition to these differences in estimating the vertex positions, Fleishman et al. advocate iterating the filter three times for further smoothing of the mesh geometry. Wang [68] refines the process by explicitly detecting the sharp-edge vertices to preserve them. He remeshes the model at these edges to ensure that sharp features are correctly represented by an edge between two triangles.

Later, Jones et al. [33] refined their technique to filter normals. Applying a geometric transformation F to the 3D space, given by x ∈ R³ → F(x), Jones et al. transform the normals by the transposed inverse of the Jacobian of F. The Jacobian of F is a 3 × 3 matrix that captures the first-order deformation induced by F and is defined by J_{ij}(F) = ∂F_i/∂x_j, where F_i is the i-th coordinate of F and x_j the j-th coordinate of x. Jones et al. show that iteratively transforming the normals by J^{-T}(F_{Jones}) smooths the normals of a model while respecting its edges and without moving its vertices. They argue that not moving the vertices yields a better preservation of the fine details of the meshes.
Miropolsky and Fischer [45] propose a variant of bilateral filtering to smooth and decimate 3D point clouds. They assume that a normal n_p is known for each point p. They overlay a regular 3D grid on top of the points and determine a representative point for each grid cell by taking into account the point location and normal. With c the cell center and n_c the mean normal of the cell points, they propose:

    F_{\mathrm{Miropolsky}}(c) = \frac{1}{W_p} \sum_{q} G_{\sigma_s}(\|c - q\|) \, G_{\sigma_r}(n_c \cdot n_q) \, q.    (15)
3.6 Other Applications

3.6.1 Depth Map from Luminance

Khan et al. [35] use bilateral filtering to process the luminance channel of an image and obtain a pseudo-depth map that is sufficient for altering the material appearance of the observed object. The originality of this use of the bilateral filter is that the smoothing power of the bilateral filter determines the geometric characteristics of the object. For instance, a smaller intensity tolerance σ_r results in a depth map that looks engraved with the object texture, because the intensity patterns are well preserved and directly transferred to the map as depth variations.
3.6.2 Video Stylization

Winnemöller et al. [72] iterate the bilateral filter to simplify video content and achieve a cartoon look (Figure 3.13). They demonstrate that the bilateral filter can be computed in real time at video resolution using the numerical scheme of Pham and van Vliet [55] on modern graphics hardware. Later, Chen et al. [16] ported the bilateral filter to the GPU using the bilateral grid and achieved similar results on high-definition videos. Winnemöller et al. demonstrate that bilateral filtering is an effective preprocessing for edge detection: filtered images trigger fewer spurious edges. To modulate the smoothing strength of the bilateral filter, they modify it to control the degree of edge preservation. The range weight G_{σ_r} is replaced by (1 − m) G_{σ_r} + m u, where m is a function varying between 0 and 1 that controls edge preservation, and u defines the local importance of the image. To define u and m, Winnemöller et al. suggest using an eye tracker [20], a computational model of saliency [31], or a user-painted map.

Fig. 3.13 (a) Input; (b) abstracted output. Sample abstraction result from the method of Winnemöller et al. [72]. Reproduced from Winnemöller et al. [72].
30 Applications
Fig. 3.14 Bayer patterns are such that, although each pixel is missing two color channels,
adjacent pixels have measures in these missing channels. Figure reproduced from Wikipedia
(http://en.wikipedia.org/wiki/Bayer lter).
3.6.3 Demosaicking

Demosaicking is the process of recovering complete color information from partial color sampling through a Bayer filter (see Figure 3.14). Ramanath and Snyder [56] interpolate missing color values of Bayer patterns [8]. These patterns are used in digital cameras where each sensor measures only a single value among red, green, and blue. Bayer patterns are such that, although each pixel is missing two color channels, adjacent pixels have measures in these missing channels. Demosaicking is thus a small-scale interpolation problem where values are interpolated from neighboring pixels. Directly interpolating the values yields blurry images because edges are ignored. Ramanath and Snyder start from such an image and refine the result with bilateral filtering. They use a small spatial neighborhood to consider only the pixels within the 1-ring of the filtered pixel, and also ensure that measured values are not altered. Their validation shows that the obtained results compare favorably to state-of-the-art techniques, although the computational cost is higher.
3.6.4 Optical Flow

Xiao et al. [74] apply bilateral filtering to regularize the optical flow computation. They use an iterative scheme to refine the flow vectors between a pair of images. Each iteration consists of two steps: first, the vectors are adjusted using a scheme akin to Lucas and Kanade [42]; then, the flow vectors are smoothed using a modified version of bilateral filtering that has two additional terms, one accounting for flow similarity, and one that ensures that occluded regions are ignored during averaging. This scheme also fills in occluded regions, estimating flow for pixels visible in one image of the pair but hidden in the other. These occluded points gather information from pixels outside the occluded region covered by the bilateral filter kernel, and the range weight ensures that only similar points contribute, thereby avoiding data diffused from the wrong side of the occlusion. An important feature of this technique is that the bilateral filter by itself does not regularize the computation, i.e., it does not optimize a trade-off between a data term and a smoothness term; it only makes the data smoother. Nonetheless, the process as a whole is a regularization because it interleaves bilateral filtering with an optimization step, and can be seen as a progressive refinement of the initial guess of a steepest-slope optimization. Sand and Teller [57] accelerate this technique by restricting the use of bilateral filtering to near the flow discontinuities.

Fig. 3.15 (a) Upsampled result; (b) nearest neighbor; (c) bicubic; (d) Gaussian; (e) joint bilateral; (f) ground truth. Sample use of joint bilateral upsampling [37] to tone map a high-resolution HDR image. In this context, the method is used to upsample the exposure map (a) applied to the pixel values to obtain the output (e), which is close to the ground-truth result (f) and does not exhibit the defects of the other upsampling methods (b–d). Figure reproduced from Kopf et al. [37].
3.6.5 Upsampling

Kopf et al. [37] describe joint bilateral upsampling, a method inspired by the bilateral filter to upsample image data. The advantage of their approach is that it is generic and can potentially upsample any kind of data, such as the exposure map used for tone mapping or the hues used for colorization. Given a high-resolution image and a downsampled version, one can compute the data at low resolution and then upsample them using a weighted average. High-resolution data are produced by averaging the samples in a 5 × 5 window at low resolution. The weights are similar to those defined by the bilateral filter, as each neighboring pixel's influence decreases with distance and color difference. As a result, Kopf's scheme interpolates low-resolution data while respecting the discontinuities of the high-resolution input image. This filter is fast to evaluate because it only considers a small spatial footprint.
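A naive sketch in the spirit of this scheme (gray-level guide image, Gaussian weights, a fixed low-resolution window); the window radius, sigmas, and the nearest-sample rounding are illustrative assumptions rather than the exact formulation of Kopf et al.:

    import numpy as np

    def joint_bilateral_upsample(low_res, guide, factor, sigma_s=1.0, sigma_r=0.1, radius=2):
        # Upsample low-resolution data (e.g., an exposure map) to the resolution
        # of 'guide': each output pixel is a weighted average of nearby low-res
        # samples, weighted by distance in the low-res grid (spatial term) and by
        # the difference of the corresponding guide-image intensities (range term).
        H, W = guide.shape
        h, w = low_res.shape
        out = np.empty((H, W))
        for i in range(H):
            for j in range(W):
                ci, cj = i / factor, j / factor                 # position in low-res coordinates
                i0, j0 = int(round(ci)), int(round(cj))
                acc, wacc = 0.0, 0.0
                for di in range(-radius, radius + 1):
                    for dj in range(-radius, radius + 1):
                        qi, qj = i0 + di, j0 + dj
                        if 0 <= qi < h and 0 <= qj < w:
                            ws = np.exp(-((qi - ci) ** 2 + (qj - cj) ** 2) / (2.0 * sigma_s ** 2))
                            gi = min(int(qi * factor), H - 1)   # guide pixel under that low-res sample
                            gj = min(int(qj * factor), W - 1)
                            wr = np.exp(-((guide[i, j] - guide[gi, gj]) ** 2) / (2.0 * sigma_r ** 2))
                            acc += ws * wr * low_res[qi, qj]
                            wacc += ws * wr
                out[i, j] = acc / wacc
        return out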
4 Efficient Implementation
A naive implementation of the bilateral filter can be extremely slow, especially for large spatial kernels. Several approaches have been proposed to speed up the computation. They all rely on approximations that yield various degrees of acceleration and accuracy. In this section, we describe these efficient algorithms and compare their performances. We begin with the brute-force approach as a reference. We then describe the techniques based on the separable kernels of Pham and van Vliet [55] and Pham [54], the local histograms of Weiss [71], and the bilateral grid [16, 50]. Figure 4.3 at the end of this section provides a visual comparison of the achieved results.
4.1 Brute Force

A direct implementation of the bilateral filter consists of two nested loops, as presented in Table 4.1.

The complexity of this algorithm is O(|S|²), where |S| is the size of the spatial domain (i.e., the number of pixels). This quadratic complexity quickly makes the computational cost explode for large images.
Table 4.1 Algorithm for the direct implementation of the bilateral filter.

    For each pixel p in S:
        (1) Initialization: BF[I]_p = 0, W_p = 0
        (2) For each pixel q in S:
            (a) w = G_{σ_s}(‖p − q‖) G_{σ_r}(|I_p − I_q|)
            (b) BF[I]_p += w I_q
            (c) W_p += w
        (3) Normalization: BF[I]_p = BF[I]_p / W_p
A classical improvement is to restrict the inner loop to the neighborhood of the pixel p. Typically, one considers only the pixels q such that ‖p − q‖ ≤ 2σ_s. The rationale is that the contribution of pixels farther away than 2σ_s is negligible because of the spatial Gaussian. This leads to a complexity on the order of O(|S| σ_s²). This implementation is efficient for small spatial kernels, that is, small values of σ_s, but becomes quickly prohibitive for large kernels because of the quadratic dependence on σ_s.
4.2 Separable Kernel

Pham and van Vliet [55] propose to approximate the 2D bilateral filter by two 1D bilateral filters applied one after the other. First, they filter each image column and then each row. Each time, they use the brute-force algorithm restricted to a 1D domain, that is, the inner loop on pixels q is restricted to pixels on the same column (or row) as the pixel p. As a consequence, the complexity becomes O(|S| σ_s) because the considered neighborhoods are 1D instead of 2D. This approach yields significantly faster running times, but the performance still degrades linearly with the kernel size. Furthermore, this approach computes an axis-aligned separable approximation of the bilateral filter kernel. Although this approximation is satisfying for uniform areas and straight edges, it is a poor match for more complex features such as textured regions. As a consequence, axis-aligned streaks may appear with large kernels in such regions (Figure 4.3). Pham [54] describes how to steer the separation according to the local orientation in the image to reduce these streaks. This approach improves the quality of the results, especially on slanted edges, but is computationally more involved because the 1D filters are no longer axis-aligned.
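A minimal sketch of this separable approximation, under the assumption of a gray-level image in [0, 1]; the 1D pass is written naively over whole lines for clarity (a truncated neighborhood would give the O(|S| σ_s) cost quoted above), and the steered variant of Pham [54] is not shown:

    import numpy as np

    def bilateral_1d(line, sigma_s, sigma_r):
        # Brute-force 1D bilateral filter along a single column or row.
        n = len(line)
        idx = np.arange(n)
        out = np.empty(n)
        for i in range(n):
            w = (np.exp(-((idx - i) ** 2) / (2.0 * sigma_s ** 2))
                 * np.exp(-((line - line[i]) ** 2) / (2.0 * sigma_r ** 2)))
            out[i] = np.sum(w * line) / np.sum(w)
        return out

    def separable_bilateral(image, sigma_s, sigma_r):
        # Separable approximation in the spirit of Pham and van Vliet [55]:
        # one 1D bilateral pass along the columns, then one along the rows.
        # Axis-aligned streaks may appear in textured regions for large kernels.
        cols = np.apply_along_axis(bilateral_1d, 0, image.astype(np.float64),
                                   sigma_s, sigma_r)
        return np.apply_along_axis(bilateral_1d, 1, cols, sigma_s, sigma_r)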
4.3 Local Histograms

Weiss [71] considers the case where the spatial weight is a square box function, that is, he rewrites the bilateral filter as:

    BF[I]_p = \frac{1}{W_p} \sum_{q \in N_{\sigma_s}(p)} G_{\sigma_r}(|I_p - I_q|) \, I_q,    (16a)

    W_p = \sum_{q \in N_{\sigma_s}(p)} G_{\sigma_r}(|I_p - I_q|),    (16b)

where N_{σ_s}(p) = {q : ‖p − q‖_∞ ≤ σ_s}. In this case, the result depends only on the histogram of the neighborhood N_{σ_s}(p), because the actual position of each pixel within the neighborhood is not taken into account.
Following this remark, Weiss presents an efficient algorithm to compute the histograms of the square neighborhoods of an image. We refer to his article for the details of the algorithm. The intuition behind his approach is that the neighborhoods N_{σ_s}(p_1) and N_{σ_s}(p_2) of two adjacent pixels p_1 and p_2 largely overlap. Based on this remark, Weiss describes how to efficiently compute the histogram of N_{σ_s}(p_1) by exploiting its similarity with the histogram of N_{σ_s}(p_2). Once the histogram of N_{σ_s}(p) is known for a pixel p, the result of the bilateral filter BF[I]_p (Equation (16a)) can be computed, because each histogram bin indicates how many pixels q have a given intensity value I. A straightforward application of this technique produces band artifacts near strong edges, a.k.a. Mach bands, because the frequency spectrum of the box filter is not band-limited. Weiss addresses this issue by iterating his filter three times, which effectively smooths away the artifacts.
Weiss [71] then demonstrates that his algorithm has a complexity on the order of O(|S| log σ_s), which makes it able to handle any kernel size in a short time. Furthermore, his algorithm is designed such that it can take advantage of the vector instruction set of modern CPUs, thereby yielding running times on the order of one second for images of several megapixels. Unfortunately, the algorithm processes color images independently for each channel, which can introduce bleeding artifacts; in addition, it is unclear how to extend this filter for use in cross-bilateral filtering applications.
4.4 Layered Approximation

Durand and Dorsey [21] propose a fast approximation based on the intuition that the bilateral filter is almost a convolution of the spatial weight G_{σ_s}(‖p − q‖) with the product G_{σ_r}(|I_p − I_q|) I_q (Equation (3)). But the bilateral filter is not a convolution, because the range weight G_{σ_r}(|I_p − I_q|) depends on the pixel value I_p. Durand and Dorsey overcame this by picking a fixed intensity value i, computing the product for it, G_{σ_r}(|i − I_q|) I_q, and convolving it with the spatial Gaussian kernel G_{σ_s}. After normalization, this gives the exact result of the bilateral filter at all pixels p such that I_p = i. Computing the bilateral filter this way would be extremely slow because it requires a convolution for each possible pixel value i.
Instead, Durand and Dorsey propose a two-step speed-up. First, they select a sparse subset i_0, ..., i_n of the intensity values. For each value i_k, they evaluate the product G_{σ_r}(|i_k − I_q|) I_q. This produces layers L'_0, ..., L'_n. Each L'_k is then convolved with the spatial kernel G_{σ_s} and normalized to form a new layer that contains the exact results of the bilateral filter for pixels with intensity equal to i_k. For pixels whose intensities have not been sampled, the result is linearly interpolated from the two closest layers. To further speed up the process, they downsample the image I prior to computing the product with the range weight G_{σ_r} and convolving with the spatial kernel G_{σ_s}. The final layers L_0, ..., L_n are obtained by upsampling the convolution outputs. The bilateral filter results are still obtained by linearly interpolating between the two closest layers (Table 4.2).
Durand and Dorsey's approximation dramatically speeds up the computation: whereas a brute-force implementation requires several minutes for a megapixel image, their scheme runs in about one second. The procedure is summarized in Table 4.2.
Table 4.2 Reformulation proposed by Durand and Dorsey [21].
1. Given a 2D image $I$, compute a low-resolution version $\tilde{I}$, pick a set of intensities $i_0, \ldots, i_n$, and compute the layers $\tilde{L}_0, \ldots, \tilde{L}_n$: $\tilde{L}_k(q) = G_{\sigma_r}(|i_k - \tilde{I}_q|)\, \tilde{I}_q$.
2. Convolve each layer with the spatial kernel and normalize the result: $\bar{L}_k = (G_{\sigma_s} \otimes \tilde{L}_k) \oslash (G_{\sigma_s} \otimes G_{\sigma_r})$, where $\oslash$ indicates a per-pixel division and $G_{\sigma_s} \otimes G_{\sigma_r}$ corresponds to the sum of the weights at each pixel.
3. Upsample the layers $\bar{L}_k$ to get $L_k$.
4. For each pixel $p$ with intensity $I_p$, find the two closest values $i_{k_1}$ and $i_{k_2}$, and output the linear interpolation:
$$ BF[I]_p \approx \frac{I_p - i_{k_1}}{i_{k_2} - i_{k_1}}\, L_{k_2}(p) + \frac{i_{k_2} - I_p}{i_{k_2} - i_{k_1}}\, L_{k_1}(p). $$
The downside of this approach is that, in practice, the result can differ significantly from the reference brute-force implementation, and there is no formal characterization of this difference. In the next section, we discuss the scheme of Paris and Durand [50], which is inspired by the layered approximation and achieves an equivalent speed-up with significantly better accuracy. We discuss the relationship between the two approaches at the end of that section.
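As a rough illustration of Table 4.2 (not Durand and Dorsey's optimized implementation: the downsampling step is omitted and the number of layers is an arbitrary choice), the following Python/NumPy sketch builds the layers and interpolates between them, using SciPy's Gaussian filter for the spatial convolution.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def layered_bilateral(img, sigma_s, sigma_r, n_layers=8):
    """Layered approximation of the bilateral filter (full-resolution variant)."""
    img = img.astype(np.float64)
    i_min, i_max = img.min(), img.max()
    intensities = np.linspace(i_min, i_max, n_layers)           # i_0, ..., i_n
    layers = []
    for i_k in intensities:
        w_k = np.exp(-(i_k - img) ** 2 / (2 * sigma_r ** 2))    # range weights for value i_k
        num = gaussian_filter(w_k * img, sigma_s)               # G_sigma_s * (w_k I)
        den = gaussian_filter(w_k, sigma_s)                     # G_sigma_s * w_k (sum of weights)
        layers.append(num / np.maximum(den, 1e-10))             # exact result where I_p = i_k
    layers = np.stack(layers, axis=0)
    # Linear interpolation between the two closest layers for every pixel.
    t = (img - i_min) / max(i_max - i_min, 1e-10) * (n_layers - 1)
    k1 = np.clip(np.floor(t).astype(int), 0, n_layers - 2)
    frac = t - k1
    rows, cols = np.indices(img.shape)
    return (1 - frac) * layers[k1, rows, cols] + frac * layers[k1 + 1, rows, cols]
```

In the published scheme, the products and convolutions are computed on a downsampled image and upsampled afterwards, which is where most of the speed-up comes from.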
4.5 Bilateral Grid
Inspired by the layered approximation of Durand and Dorsey [21], Paris and Durand [50] have reformulated the bilateral filter in a higher-dimensional homogeneous space. They describe a new image representation where a gray-level image is stored in a volumetric data structure that they named the bilateral grid. In this representation, a 2D image $I$ is represented by a 3D grid where the first two dimensions of the grid correspond to the pixel position $p$ and the third dimension corresponds to the pixel intensity $I_p$. In addition, this 3D grid stores homogeneous values, that is, the intensity value $I$ is associated with a non-negative weight $w$ and stored as a homogeneous vector $(wI, w)$. Using this concept, Paris and Durand [50] showed that the bilateral filter corresponds to a Gaussian convolution applied to the grid, followed by sampling and normalization of the homogeneous values.
More precisely, the authors consider the $S \times R$ domain and represent a gray-scale image $I$ as a 3D grid $\Gamma$:
$$ \Gamma(p_x, p_y, r) = \begin{cases} \big(I(p_x, p_y),\, 1\big) & \text{if } r = I(p_x, p_y) \\ (0, 0) & \text{otherwise.} \end{cases} \qquad (17) $$
With this representation, they demonstrate that bilateral filtering exactly corresponds to convolving $\Gamma$ with a 3D Gaussian whose parameters are $(\sigma_s, \sigma_s, \sigma_r)$: $\bar\Gamma = G_{\sigma_s,\sigma_s,\sigma_r} \otimes \Gamma$. They show that the bilateral filter output is $BF[I](p_x, p_y) = \bar\Gamma\big(p_x, p_y, I(p_x, p_y)\big)$, up to the normalization of the homogeneous values. This process is illustrated in Figure 4.1 and detailed in Table 4.3.
The benefit of this formulation is that the Gaussian-convolved grid $GC[\Gamma]$ is a band-limited signal, because it results from a convolution with a Gaussian, which is a low-pass filter. Paris and Durand use this argument to downsample the grid $\Gamma$. As a result, they deal with fewer stored data points and achieve performance on the order of one second for images with several megapixels. Chen et al. [16] further improved the performance by mapping the algorithm onto modern graphics hardware, obtaining running times on the order of a few milliseconds. Paris and Durand recommend using the Gaussian width parameters $\sigma_s$ and $\sigma_r$ to set the sampling rates of the 3D grid. This yields a complexity of $O\big(|S| + \frac{|S|}{\sigma_s^2}\frac{|R|}{\sigma_r}\big)$, where $|S|$ is the size of the spatial domain (i.e., the number of pixels) and $|R|$ is the size of the range domain (i.e., the extent of the intensity scale).
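A compact sketch of this pipeline (Python with NumPy and SciPy, gray-level input; the grid padding, boundary handling, and unit-sigma blur are our own simplifications rather than the exact sampling scheme of Paris and Durand) splats the image into a downsampled 3D grid of homogeneous values, blurs it, and slices it.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def bilateral_grid_filter(img, sigma_s, sigma_r):
    """Approximate bilateral filter via a downsampled bilateral grid."""
    img = img.astype(np.float64)
    h, w = img.shape
    r_min, r_max = img.min(), img.max()
    # Grid sampled at roughly one cell per (sigma_s, sigma_r), plus padding.
    gh, gw = int(h / sigma_s) + 3, int(w / sigma_s) + 3
    gr = int((r_max - r_min) / sigma_r) + 3
    data = np.zeros((gh, gw, gr))    # accumulates w * I
    weight = np.zeros((gh, gw, gr))  # accumulates w
    ys, xs = np.indices((h, w))
    gy = (ys / sigma_s + 1).round().astype(int)
    gx = (xs / sigma_s + 1).round().astype(int)
    gz = ((img - r_min) / sigma_r + 1).round().astype(int)
    np.add.at(data, (gy, gx, gz), img)     # splat homogeneous values (w*I, w), with w = 1
    np.add.at(weight, (gy, gx, gz), 1.0)
    # 3D Gaussian convolution; unit sigma because the grid is already scaled by (sigma_s, sigma_r).
    data = gaussian_filter(data, 1.0)
    weight = gaussian_filter(weight, 1.0)
    # Slice: trilinear interpolation at (p_x, p_y, I_p), then homogeneous division.
    coords = np.stack([ys / sigma_s + 1, xs / sigma_s + 1, (img - r_min) / sigma_r + 1])
    num = map_coordinates(data, coords, order=1)
    den = map_coordinates(weight, coords, order=1)
    return num / np.maximum(den, 1e-10)
```

Chen et al. [16] obtain their real-time rates by running these blur and slicing steps on the GPU.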
This approach can easily be adapted to cross bilateral filtering and to color images. The downside is that color images require a 5D grid, which no longer maps nicely onto graphics hardware and which requires a large amount of memory for small kernels (10 pixels or less).

Fig. 4.1 Overview, on a 1D signal, of the reformulation of the bilateral filter as a linear convolution in a homogeneous, higher-dimensional space: the signal is sampled in the space x range domain, convolved with a Gaussian, divided (normalized), and sliced at the original signal values. Figure reproduced from Paris and Durand [50].
4.5.1 Link with the Layered Approximation
The bilateral grid and the layered approximation share the idea of subsampling along the intensity axis and downsampling in the spatial domain. The major difference is in the way the downsampling is performed. The layered approximation encounters difficulties at discontinuities: it averages adjacent pixels with different values, e.g., a white and a black pixel end up being represented by one gray value that poorly represents the original signal.
Table 4.3 Approximation proposed by Paris and Durand [50]. In practice, localized downsampling and upsampling eliminates the need to build the entire high-resolution grid in memory.
1. Given a 2D image $I$, build the grid $\Gamma : S \times R \to \mathbb{R}^2$ that contains homogeneous values:
$$ \Gamma(p_x, p_y, r) = \begin{cases} \big(I(p_x, p_y),\, 1\big) & \text{if } r = I(p_x, p_y) \\ (0, 0) & \text{otherwise.} \end{cases} $$
2. Downsample $\Gamma$ to get $\tilde\Gamma$.
3. Perform a Gaussian convolution of $\tilde\Gamma$, for each component independently: $GC[\tilde\Gamma](p_x, p_y, r) = G_{\sigma_s,\sigma_r} \otimes \tilde\Gamma(p_x, p_y, r)$, where $G_{\sigma_s,\sigma_r}$ is a 3D Gaussian with $\sigma_s$ as parameter along the two spatial dimensions and $\sigma_r$ along the range dimension.
4. Upsample $GC[\tilde\Gamma]$ to get $\bar\Gamma$.
5. Extract the result: for a pixel $p$ with initial intensity $I_p$, denote by $(\overline{wI}, \bar{w})$ the value at position $(p_x, p_y, I_p)$ in $\bar\Gamma$. The result of the bilateral filter is $BF[I]_p \approx \overline{wI}/\bar{w}$.
In comparison, the bilateral grid subsampling strategy preserves adjacent pixels with different intensities, because they are far apart along the intensity axis. In the white-and-black-pixel case, the bilateral grid retains the two different values involved and is thus able to produce better results. Figure 4.2 illustrates this behavior. Since both approaches are equivalently fast, the bilateral grid should be preferred over the layered approximation.
Fig. 4.2 (a) Downsampling of the layered approximation; (b) downsampling of the bilateral grid. Compared to the layered approximation, the bilateral grid better represents discontinuities and thus yields superior results. This figure is reproduced from Paris and Durand [50].

4.6 Bilateral Pyramid

For several applications, such as detail enhancement [25], it is desirable to decompose the image into more than two layers. Fattal et al. [25]
propose to compute such a decomposition by successively applying the bilateral filter to the image with varying parameters: the spatial parameter $\sigma_s$ is doubled at each level and the range parameter $\sigma_r$ is halved. Based on this scenario, they describe a dedicated numerical scheme. Intuitively, instead of computing each level from scratch, they use the result from the previous level and rely on the fact that this image has already been smoothed to simplify the computation. For each level, they compute a bilateral filter based on a 5 x 5 kernel. At the first level they apply the bilateral filter with a small kernel $\sigma_s = 1$, and at each subsequent level they double the spatial extent of the kernel. A naive approach would use more coefficients, e.g., a 9 x 9 kernel, but Fattal et al. keep the cost constant by using 5 x 5 samples and inserting zeros. For instance, they approximate a 9 x 9 kernel using 5 x 5 samples interleaved with zeros, such that a 1 4 6 4 1 row becomes 1 0 4 0 6 0 4 0 1. This proven strategy, known as an algorithme a trous, yields minimal errors when applied to band-limited signals [43]. In this particular case, the signal is not band-limited, because bilateral filtering preserves edges.
Table 4.4 Complexity summary for bilateral filter algorithms.
Brute force (Section 4.1): $O(|S|^2)$
Separable kernel (Section 4.2): $O(|S|\,\sigma_s)$
Local histograms (Section 4.3): $O(|S| \log \sigma_s)$
Layered approximation (Section 4.4): $O\big(|S| + \frac{|S|}{\sigma_s^2}\frac{|R|}{\sigma_r}\big)$
Bilateral grid (Section 4.5): $O\big(|S| + \frac{|S|}{\sigma_s^2}\frac{|R|}{\sigma_r}\big)$
Fig. 4.3 Comparison of different strategies for filtering a color source image: (a) input (876x584); (b) input close-up; (c) exact bilateral filter using CIE-Lab; (d) bilateral-grid implementation using per-channel RGB (0.48 s, PSNR_RGB = 38 dB, PSNR_Lab = 34 dB); (e) bilateral-grid implementation using RGB vectors (8.9 s, PSNR_RGB = 41 dB, PSNR_Lab = 39 dB); (f) separable-kernel implementation using CIE-Lab (5.8 s, PSNR_RGB = 42 dB, PSNR_Lab = 42 dB); (g) bilateral-grid implementation using CIE-Lab (10.9 s, PSNR_RGB = 46 dB, PSNR_Lab = 46 dB). Processing the red, green, and blue channels independently is fast but can cause color bleeding that removes the cross from the sky in (d). Filtering RGB vectors is slower but improves the results, although some bleeding remains (e). Using a perceptually motivated color space such as CIE-Lab addresses those artifacts (c,g). The separable-kernel implementation is fast but incurs axis-aligned streaks (f) that may be undesirable in a number of applications. These remarks are confirmed by the numerical precision evaluated with the PSNR computed in the RGB and CIE-Lab color spaces. The contrast of the close-ups has been increased for clarity. This figure is reproduced from Paris and Durand [50].
Yet, Fattal's results show that, in practice, this approximation achieves good results without visual defects.
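To illustrate the a trous construction on the 1 4 6 4 1 row quoted above (a small sketch; this is only the kernel-building step, not Fattal et al.'s full pyramid scheme), the following Python snippet dilates the 5-tap base row by inserting zeros, doubling its spatial extent at each level while keeping five non-zero taps.

```python
import numpy as np

def a_trous_kernel_1d(level):
    """1D a trous kernel: the base row 1 4 6 4 1 with 2**level - 1 zeros
    inserted between taps, normalized to sum to one."""
    base = np.array([1.0, 4.0, 6.0, 4.0, 1.0])
    step = 2 ** level
    kernel = np.zeros((len(base) - 1) * step + 1)
    kernel[::step] = base                 # keep 5 non-zero samples, interleave zeros
    return kernel / kernel.sum()

# Level 0: [1 4 6 4 1] / 16; level 1: [1 0 4 0 6 0 4 0 1] / 16, spanning 9 taps.
print(a_trous_kernel_1d(1) * 16.0)
```

The 2D spatial kernel is the outer product of this row with itself; in the bilateral pyramid, the range weight is applied in addition at each of the sampled taps.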
4.7 Discussion
The choice of implementation is crucial to achieving satisfying results with good performance. Table 4.4 summarizes the complexity of the various implementations we described.
When graphics hardware is available, we recommend the bilateral grid method of Chen et al. [16], because it achieves high-quality outputs and real-time performance even on high-resolution images and videos. If only the CPU is available, the choice is split between the local-histogram method of Weiss [71] and the bilateral grid of Paris and Durand [50]. To process color images or compute a cross bilateral filter, the bilateral grid provides a satisfying solution, especially with large kernels. To process gray-level images with kernels of any size, e.g., in an image-editing package where users can arbitrarily choose the kernel size, the local-histogram approach is preferable because it consistently yields short running times. On color images, this approach can yield less satisfying results because channels are processed independently, which may cause some color bleeding (Figure 4.3).
5
Relationship between Bilateral Filtering and
Other Methods or Framework
Filtering an image while preserving its edges has been addressed in many ways in computer vision. Interestingly, some methods give results that are qualitatively very similar to those of bilateral filtering. A natural question is therefore what relationships exist between bilateral filtering and these existing methods. In this section we focus on local mode filtering, robust statistics, and PDE-based approaches.
5.1 Bilateral Filtering is Equivalent to Local Mode Filtering
Local mode filtering was introduced by Van de Weijer and van den Boomgaard [65] as an extended filtering method to preserve edges and details. In this section, we demonstrate that bilateral filtering is a local mode-seeking approach. Based on this histogram interpretation, Weiss [71] proposed a fast numerical scheme, and Chen et al. [16] showed that the bilateral grid can be used for local histogram equalization. Refer to Section 4.3 for more details.
Given a pixel and its associated local histogram, local mode filtering is an iterative procedure which converges to the closest highest mode of the local histogram, starting from the value of the pixel at the center of the neighborhood. This is illustrated in Figure 5.1(a) and (b). Choosing the closest local mode instead of the global mode allows details to be preserved.

Fig. 5.1 (a) Image and local neighborhood for a given pixel. (b) In local mode filtering, proposed by Van de Weijer and van den Boomgaard [65], each pixel moves toward the maximum of the local mode it belongs to. In this example, the intensity of the center pixel will move toward the maximum of the mode made of low-intensity pixels. (c) Effect of the range parameter on the local histogram.
Like the bilateral filter, local mode filtering depends on two parameters: one which defines the neighborhood for the local histogram estimation, and one which is the smoothness parameter of the histogram. The influence of the latter parameter is illustrated in Figure 5.1(c): when the smoothing parameter increases, local modes and the global mode merge into a single global mode which corresponds to the standard Gaussian-smoothed value. In that case, details are not preserved.
To define local mode filtering, given a gray-scale image $I : S \to R$, one can start with the definition of a histogram:
$$ H_1(i) = \sum_{q \in S} \delta(I_q - i), \quad i \in R, $$
where $\delta$ is the Dirac function, so that $\delta(s) = 1$ if $s = 0$ and $\delta(s) = 0$ otherwise. A classical operation consists in smoothing histograms, so that we define:
$$ H_2(i, \sigma_r) = (H_1 \otimes G_{\sigma_r})(i) = \sum_{q \in S} G_{\sigma_r}(I_q - i), $$
where $\sigma_r$ denotes the smoothing applied to the intensity values, i.e., on the range. Going a step further, one can define a histogram locally, i.e., around a given position $p$. To do so, one can introduce a spatial Gaussian kernel centered around $p$:
$$ H_3(p, i, \sigma_r, \sigma_s) = \sum_{q \in S} G_{\sigma_s}(\|p - q\|)\, G_{\sigma_r}(I_q - i), \qquad (18) $$
where $\sigma_s$ determines the spatial neighborhood around $p$. Local histograms can be used to study image properties [36] but also to perform image restoration. The idea of local mode filtering is to make the intensity $I_p$ of the center pixel evolve toward the closest local maximum, so $I_p$ will verify:
$$ \left.\frac{\partial H_3}{\partial i}(p, i, \sigma_r, \sigma_s)\right|_{i = I_p} = 0. \qquad (19) $$
Taking into account Equation (18) and the expression of the Gaussian kernel, Equation (19) becomes:
$$ \sum_{q \in S} (I_q - i)\, G_{\sigma_s}(\|p - q\|)\, G_{\sigma_r}(I_q - i) = 0, $$
so that $I_p$ should verify the following implicit equation:
$$ I_p = i \quad \text{where } i \text{ is such that} \quad i = \frac{\sum_{q \in S} G_{\sigma_s}(\|p - q\|)\, G_{\sigma_r}(I_q - i)\, I_q}{\sum_{q \in S} G_{\sigma_s}(\|p - q\|)\, G_{\sigma_r}(I_q - i)}. \qquad (20) $$
To solve this implicit equation, one can propose the following iterative scheme: given $I^0 = I$, estimate
$$ I^{t+1}_p = \frac{\sum_{q \in S} G_{\sigma_s}(\|p - q\|)\, G_{\sigma_r}(I^t_q - I^t_p)\, I^t_q}{\sum_{q \in S} G_{\sigma_s}(\|p - q\|)\, G_{\sigma_r}(I^t_q - I^t_p)} \quad \text{for all } p. \qquad (21) $$
Interestingly, the right-hand side of Equation (21) corresponds to the definition of the bilateral filter: consequently, bilateral filtering can be considered as a local mode-seeking method.
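A direct transcription of the iteration in Equation (21) (a small NumPy sketch on a gray-level image; the brute-force loops, window size, and convergence test are our own illustrative choices) makes the mode-seeking behavior explicit: each pass is one bilateral filter applied to the previous result.

```python
import numpy as np

def bilateral_step(img, sigma_s, sigma_r):
    """One iteration of Equation (21): a brute-force bilateral filter pass."""
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    half = int(2 * sigma_s)
    out = np.empty_like(img)
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - half), min(h, y + half + 1)
            x0, x1 = max(0, x - half), min(w, x + half + 1)
            patch = img[y0:y1, x0:x1]
            dy, dx = np.mgrid[y0 - y:y1 - y, x0 - x:x1 - x]
            w_s = np.exp(-(dy ** 2 + dx ** 2) / (2 * sigma_s ** 2))
            w_r = np.exp(-(patch - img[y, x]) ** 2 / (2 * sigma_r ** 2))
            weights = w_s * w_r
            out[y, x] = np.sum(weights * patch) / np.sum(weights)
    return out

def local_mode_filter(img, sigma_s, sigma_r, n_iter=10, tol=1e-3):
    """Iterate the bilateral filter until each pixel settles on a local mode."""
    cur = np.asarray(img, dtype=np.float64)
    for _ in range(n_iter):
        nxt = bilateral_step(cur, sigma_s, sigma_r)
        if np.max(np.abs(nxt - cur)) < tol:   # converged to the closest local mode
            return nxt
        cur = nxt
    return cur
```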
Remark. Another important relation established by van de Weijer and van den Boomgaard [65] is the correspondence between local mode filtering and the framework of robust statistics. In fact, maximizing $H_3$ is equivalent to minimizing a residual error $\rho(p, i, \sigma_r, \sigma_s) = 1 - H_3(p, i, \sigma_r, \sigma_s)$. We explain this idea in more detail later, but focus more on the link between the bilateral filter and robust statistics (see Section 5.2).
5.2 The Bilateral Filter is a Robust Filter
Robust statistics offers a general background to model a large class of problems, including image restoration (see Refs. [30, 29, 40, 28, 27] for more details). By expressing the problem as an optimization over a discretized space, one can define edge-preserving restoration formulations. In this section, we show that bilateral filtering corresponds to a gradient descent of a robust minimization problem.
Image restoration can be formulated as a minimization problem in the following way: given a noisy image $I^n$, the problem is to find the minimizer of the discrete energy:
$$ \min_I \sum_{p \in S} \Big[ (I_p - I^n_p)^2 + \sum_{q \in N(p)} \rho(I_q - I_p) \Big], \qquad (22) $$
where $N(p)$ is a neighborhood of $p$, and $\rho$ is a weighting function (also called an error norm).
The energy in Equation (22) contains two kinds of terms. The first term is a data-fidelity term which prevents the solution from drifting too far away from the noisy input values. The second term is a regularization term that penalizes differences of intensity between neighboring pixels, with a strength that depends on the function $\rho$. Thus the regularity of the solution will depend on the function $\rho$. In particular, this method will be robust if we can preserve significant intensity differences such as edges, i.e., if we can distinguish between inliers and outliers. Several possible functions $\rho$ have been proposed in the literature, as we show in this section.
Let us now focus on the regularization term of Equation (22) to show the relationship with the bilateral filter. To do so, we introduce the following reweighted version of the regularization term, so that the minimization problem becomes:
$$ \min_I \sum_{p \in S} \sum_{q \in N(p)} G_{\sigma_s}(\|q - p\|)\, \rho(I_q - I_p). \qquad (23) $$
To minimize Equation (23), one can iterate the following gradient-descent scheme:
$$ I^{t+1}_p = I^t_p + \frac{\lambda}{|N(p)|} \sum_{q \in N(p)} G_{\sigma_s}(\|q - p\|)\, \psi(I^t_q - I^t_p), \qquad (24) $$
where $\psi$ is the derivative of $\rho$ and $\lambda$ controls the step size. By choosing $\rho(s) = 1 - G_{\sigma_r}(s)$, whose derivative is proportional to $s\,G_{\sigma_r}(s)$, we obtain (absorbing the constant factor into $\lambda$):
$$ I^{t+1}_p = I^t_p + \frac{\lambda}{|N(p)|} \sum_{q \in N(p)} G_{\sigma_s}(\|q - p\|)\, G_{\sigma_r}(I^t_q - I^t_p)\,(I^t_q - I^t_p). \qquad (25) $$
This equation has in fact some similarities with the bilateral filtering expression, which corresponds to a weighted average of the data and which we recall here:
$$ I^{t+1}_p = \frac{\sum_{q} G_{\sigma_s}(\|q - p\|)\, G_{\sigma_r}(I^t_q - I^t_p)\, I^t_q}{\sum_{q} G_{\sigma_s}(\|q - p\|)\, G_{\sigma_r}(I^t_q - I^t_p)}, \qquad (26) $$
and, interestingly, it has been shown that Equations (24) and (26) are indeed two equivalent ways to solve the same minimization problem (see, e.g., [29]). Intuitively, one can remark that both formulas average the same pixels using the same weights, and the only difference is the weight of the center pixel $I^t_p$. The conclusion is that the bilateral filter is a special case of a robust filter.
More generally, Durand and Dorsey [21] studied the bilateral filter in the framework of robust statistics [29, 30], in a manner similar to the work of Black et al. [11] on PDE filters. The authors showed that the range weight can be seen as a robust metric, that is, it differentiates between inliers and outliers. The bilateral filter replaces each pixel by a weighted average of its neighbors. The weight assigned to each neighbor determines its influence on the result and is crucial to the output quality. In this context, robust statistics estimates whether a pixel is relevant, i.e., an inlier, or not, i.e., an outlier. The strategy followed by the bilateral filter is that pixels with different intensities are not related and should have little influence on each other, whereas pixels with similar intensities are closely related and should strongly influence each other. The way this intensity difference actually contributes is defined by the range weight. The most common choice is a Gaussian function $G_{\sigma_r}$.
However, Durand and Dorsey [21] have underscored that this Gaussian function is only one of the possible choices among a variety of robust weighting functions (cf. Figure 5.2, top), a.k.a. stopping functions. These functions define the weight assigned to a pixel according to its difference of intensity with the center pixel. For instance, a classical non-robust mean assigns the same weight to all pixels. In comparison, robust functions have a bell profile that assigns lower weights to pixels with a different intensity. The differences lie in the fall-off rate, which defines how narrow the transition between inliers and outliers is, and in the tail value: either non-zero, meaning that outliers still have some limited influence, or zero, meaning that outliers are completely ignored.
Fig. 5.2 Qualitative illustration of the influence of weighting functions for image restoration. The first two rows show, respectively, different choices of weighting functions and their corresponding influence functions. These graphs were adapted from Black et al. [11] and Durand and Dorsey [21]. The last rows show results obtained on the image presented in Figure 5.1 with the corresponding weighting functions.
50 Relationship between Bilateral Filtering and Other Methods or Framework
ignored. This behavior is better observed on the inuence function
(Figure 5.2-bottom) that shows the variations of the output depending
on the pixel intensity. The constant weight of classical averaging is not
robust because its inuence function is unbounded which reects the
fact that a single pixel can have an unlimited inuence on the mean
value, e.g., a single very bright pixel can make the average arbitrarily
high. In contrast, robust inuence functions are bounded, showing that
a single pixel cannot modify the output beyond a certain point. Some
robust functions such as the Gauss, Tukey, and Lorentz functions are
even redescending, reecting the fact that pixels with a large intensity
dierence are considered irrelevant and ignored, i.e., they have no
inuence on the output.
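The sketch below writes out a Gaussian, a Tukey (biweight), and a Lorentzian range weight in one common form (the exact scalings are our own choices, following standard robust-statistics conventions rather than formulas given in this text); any of them can be substituted for $G_{\sigma_r}$ in the bilateral filter weights.

```python
import numpy as np

def gauss_weight(d, sigma):
    """Gaussian stopping function: non-zero tail, redescending influence."""
    return np.exp(-d ** 2 / (2 * sigma ** 2))

def tukey_weight(d, sigma):
    """Tukey biweight: outliers beyond sigma are completely ignored (zero weight)."""
    w = (1.0 - (d / sigma) ** 2) ** 2
    return np.where(np.abs(d) < sigma, w, 0.0)

def lorentz_weight(d, sigma):
    """Lorentzian stopping function: heavy tail, outliers keep a small influence."""
    return 1.0 / (1.0 + d ** 2 / (2 * sigma ** 2))

# The influence function is the weight multiplied by the difference itself,
# i.e., psi(d) = d * w(d); it is bounded (and redescending) for these robust choices.
d = np.linspace(-100, 100, 5)
print(d * gauss_weight(d, 25), d * tukey_weight(d, 25), d * lorentz_weight(d, 25))
```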
Durand and Dorsey [21] showed that these concepts can be applied to the bilateral filter and that the choice of the range function defines how pixels across edges are handled (see some results in Figure 5.2). For instance, with the classical Gaussian function, pixels across edges still have some influence, though very limited; with a Tukey function, these pixels would be ignored. However, according to Durand and Dorsey's experiments, the Gauss and Tukey functions perform better for their tone-mapping operator. As far as we know, these options have not been tested with other applications.
The energy function defined by robust norms is usually not convex and can lead to local minima, similar to the local modes of histograms discussed above. Which local minimum is most desirable depends on the application. The bilateral filter performs one step toward the minimum closest to the input value. This is usually desirable because most applications seek to smooth low-amplitude noise while retaining local structure. However, some cases might require a different treatment, such as impulse noise, where the value of a pixel can be severely corrupted. In this case, the robust statistics literature advocates initialization with an estimator that is very robust but might not be very precise, such as the median. For impulse noise removal, a median filter can be used to steer the bilateral filter at a pixel toward a local minimum that is consistent with its neighbors. In practice, this involves computing the range Gaussian based on the difference between a pixel and the median-filtered image rather than the difference with the input pixel value. See Section 3.1.4 for details.
Remark. Since connections can be established between robust statistics and nonlinear PDEs, bilateral filtering can likewise be interpreted as a robust nonlinear operator in the continuous framework of PDEs. This is further explained in Section 5.3.
5.3 Bilateral Filtering is Equivalent Asymptotically to the
Perona and Malik Equation
Bilateral filtering smooths an image while preserving strong edges. Interestingly, many research projects were carried out in the field of nonlinear partial differential equations (PDEs) to achieve the same goal, and some models such as [52] give results very similar to bilateral filtering. In this section we revisit several contributions showing the links between bilateral filtering and the Perona–Malik model in the discrete setting, and more generally between neighborhood filters and PDE-based approaches in the continuous setting.¹ Of course, the field of PDE-based approaches is very large and one may find better approaches than bilateral filtering depending on the application. Intensive research has been carried out in this area, including nonlinear approaches for image restoration (we refer to [2] for a review). Here we focus on the nonlinear operators that are related to bilateral filtering.
5.3.1 Results in the Discrete Setting
Anyone studying PDE-based approaches for image processing has come across the famous nonlinear model of Perona and Malik [52]. Starting from the heat equation and based on the remark that $\Delta I = \mathrm{div}(\nabla I)$, the authors proposed to introduce a weighting coefficient depending on the image gradient to prevent edges from being smoothed. Their model is written in the continuous setting:
$$ \frac{\partial I}{\partial t} = \mathrm{div}\big( c(\|\nabla I\|^2)\, \nabla I \big), \qquad (27) $$
where $c : [0, +\infty[ \to ]0, +\infty]$ is a smooth decreasing function. We refer to Perona and Malik [52] for more details.
¹ Until now, we have considered an image as a discrete set of pixels. In this section, we instead need to consider an image defined continuously, i.e., an analog image where space is no longer discretized. The motivation becomes clear when one needs, for instance, to define a notion of derivative. Formally, keeping the same notations, this introduces only minor changes in the formulation of the bilateral filter. The only difference is that sums are replaced by integrals: positions p and q now vary over a continuous domain.
In the discrete setting, Durand and Dorsey [21] showed that if the bilateral filter is restricted to the four adjacent neighbors of each pixel, then it actually corresponds to a discrete version of the Perona and Malik [52] model.
This result has been extended by Elad [23] and Barash and Comaniciu [7], who have demonstrated that the bilateral filter can be seen as the sum of several Perona–Malik filters at different scales, that is, the image derivatives are computed with pixels at a distance, not only with adjacent pixels.
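As a point of comparison with the discrete bilateral filter, here is a minimal explicit Perona–Malik diffusion step (Python/NumPy; the 4-neighbor discretization, the edge-stopping function $c(s^2) = \exp(-s^2/\kappa^2)$, the periodic boundary handling via np.roll, and the step size are standard textbook choices rather than values prescribed in this text).

```python
import numpy as np

def perona_malik_step(img, kappa=20.0, dt=0.2):
    """One explicit step of dI/dt = div(c(||grad I||^2) grad I) on a 4-neighborhood."""
    img = img.astype(np.float64)
    # Differences toward the four adjacent neighbors (periodic boundaries, for brevity).
    north = np.roll(img, 1, axis=0) - img
    south = np.roll(img, -1, axis=0) - img
    west = np.roll(img, 1, axis=1) - img
    east = np.roll(img, -1, axis=1) - img
    c = lambda d: np.exp(-(d / kappa) ** 2)   # edge-stopping function
    return img + dt * (c(north) * north + c(south) * south +
                       c(west) * west + c(east) * east)
```

Restricting the bilateral filter to these same four neighbors and iterating it reproduces, up to the weighting details discussed above, this diffusion behavior.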
5.3.2 Results in the Continuous Setting
Another important contribution came from Buades et al. [13], who proved rigorously that, for small neighborhoods, the Yaroslavsky filter:
$$ Y_{\sigma_s,\sigma_r}[I](p) = \frac{1}{W(p)} \int_{B(p,\sigma_s)} \exp\Big( -\frac{|I(q) - I(p)|^2}{\sigma_r^2} \Big)\, I(q)\, dq, \qquad (28) $$
i.e., a bilateral filter using a box function as spatial weight, behaves as the Perona–Malik filter. Such a result can only be established locally, that is, when small neighborhoods are considered, because the action of a PDE is very local (local structure is taken into account through derivatives). So the proof of Buades et al. is based on an asymptotic study which relies on the fact that the image is well approximated by its second-order Taylor expansion; their result holds for any neighborhood as long as it covers a sufficiently regular area such as a region of skin or sky.
In this section, we present the results of Buades et al. [13], who revisited the notion of the bilateral filter by studying the more general neighborhood filter (see also [2] for more details). Here the notion of neighborhood must be understood broadly: neighboring pixels, neighboring or similar intensities, or neighboring neighborhoods. Each of these meanings corresponds to a specific filter. Interestingly, the authors also proved the link between these filters and well-known PDEs such as the heat equation and the Perona–Malik equation.
A general neighborhood filter can be described as follows. Let $I$ be an image to be filtered or denoised and let $w_s : \mathbb{R}^+ \to \mathbb{R}^+$ and $w_r : \mathbb{R}^+ \to \mathbb{R}^+$ be two functions whose roles are to enforce, respectively, geometric and photometric locality (in Section 2, $w_s$ and $w_r$ are both Gaussian kernels). The parameters $\sigma_s$ and $\sigma_r$ measure the amount of filtering applied to the image $I$. The filtered image at scale $(\sigma_r, \sigma_s)$ is given by:
$$ BF[I](p) = \frac{1}{W(p)} \int_{S} w_r(|I(q) - I(p)|)\, w_s(\|p - q\|)\, I(q)\, dq, $$
where $W(p)$ is a normalization factor:
$$ W(p) = \int_{S} w_r(|I(q) - I(p)|)\, w_s(\|p - q\|)\, dq. $$
For simplicity, we suppose that the image has been extended from the image domain $S$ (a rectangle) to the whole of $\mathbb{R}^2$ by symmetry and periodicity.
With this formalism we can easily recover classical spatial linear Gaussian filtering by choosing $w_r \equiv 1$ and $w_s(t) = \exp\big(-\frac{t^2}{\sigma_s^2}\big)$.
Now let us consider bilateral filtering. As mentioned before, the idea is to take an average of the values of pixels that are close in both gray-level value and spatial distance. Of course many choices are possible for the kernels $w_r$ and $w_s$. Classical choices are:
$$ w_r(t) = \exp\Big(-\frac{t^2}{\sigma_r^2}\Big), $$
and
$$ w_s(t) = \exp\Big(-\frac{t^2}{\sigma_s^2}\Big) \quad \text{or} \quad w_s(t) = \chi_{B(p,\sigma_s)}(t), $$
where $\chi_{B(p,\sigma_s)}$ denotes the characteristic function of the ball of center $p$ and radius $\sigma_s$. With the former choice of $w_s$, we get the SUSAN filter [59] or the bilateral filter [63] (see Section 2). With the latter choice of $w_s$, we recover the Yaroslavsky filter defined in Equation (28).
The SUSAN and Yaroslavsky filters have similar behaviors. Inside a homogeneous region, the gray-level values slightly fluctuate because of the noise. Near sharp boundaries between a dark and a bright region, both filters compute averages of pixels belonging to the same region as the reference pixel: edges are not blurred.
Interestingly, the estimation of the residue $Y_{\sigma_s,\sigma_r}[I](p) - I(p)$ reveals analogies with well-known PDEs. This is stated more precisely in the following theorem.
Theorem 5.1. Suppose $I \in C^2(S)$, and let $\sigma_s, \sigma_r, \alpha > 0$ be such that $\sigma_s, \sigma_r \to 0$ and $\sigma_r = O(\sigma_s^{\alpha})$. Let us consider the continuous functions $g(t) = \frac{1}{3}\,\frac{t\, e^{-t^2}}{E(t)}$ for $t \neq 0$, $g(0) = \frac{1}{6}$, where $E(t) = \int_0^t e^{-s^2}\, ds$, and $f(t) = 3g(t) + 3\frac{g(t)}{t^2} - \frac{1}{2t^2}$ for $t \neq 0$, $f(0) = \frac{1}{6}$.
Then, for $p \in S$:
if $\alpha < 1$, $\quad Y_{\sigma_s,\sigma_r}[I](p) - I(p) \cong \dfrac{\Delta I(p)}{6}\, \sigma_s^2$;
if $\alpha = 1$, $\quad Y_{\sigma_s,\sigma_r}[I](p) - I(p) \cong \Big[ g\big(\tfrac{\sigma_s}{\sigma_r}\|\nabla I(p)\|\big)\, I_{TT}(p) + f\big(\tfrac{\sigma_s}{\sigma_r}\|\nabla I(p)\|\big)\, I_{NN}(p) \Big]\, \sigma_s^2$;
if $1 < \alpha < \tfrac{3}{2}$, $\quad Y_{\sigma_s,\sigma_r}[I](p) - I(p) \cong g\big(\sigma_s^{1-\alpha}\|\nabla I(p)\|\big)\, \big[ I_{TT}(p) + 3\, I_{NN}(p) \big]\, \sigma_s^2$;
where $I_{TT} = D^2 I\Big(\dfrac{\nabla I^{\perp}}{\|\nabla I\|}, \dfrac{\nabla I^{\perp}}{\|\nabla I\|}\Big)$ and $I_{NN} = D^2 I\Big(\dfrac{\nabla I}{\|\nabla I\|}, \dfrac{\nabla I}{\|\nabla I\|}\Big)$.
We refer to [13, 12] for the proof of the theorem. It is not difficult but somewhat technical, and relies on a Taylor expansion of $I(q)$ and of the exponential function.
More interesting is the interpretation of this theorem. For $\alpha$ ranging from 1 to $\tfrac{3}{2}$, an iterated procedure of the Yaroslavsky filter behaves asymptotically as an evolution PDE involving two terms, one acting along the direction $T = \nabla I^{\perp}(p)/\|\nabla I(p)\|$, which is tangent to the level line passing through $p$, and one along the direction $N = \nabla I(p)/\|\nabla I(p)\|$, which is orthogonal to the level line passing through $p$. In fact, we may write:
$$ \frac{Y_{\sigma_s,\sigma_r}[I](p) - I(p)}{\sigma_s^2} \cong c_1\, I_{TT} + c_2\, I_{NN}. $$
The filtering or enhancing properties of the model depend on the signs of $c_1$ and $c_2$. Following Theorem 5.1, we have:
If $\alpha < 1$, then $\frac{Y_{\sigma_s,\sigma_r}[I](p) - I(p)}{\sigma_s^2} \cong \frac{\Delta I(p)}{6}$, which corresponds to Gaussian filtering.
If $\alpha = 1$, the neighborhood filter acts as a filtering/enhancing algorithm. As the function $g$ is positive (and decreasing), there is always a diffusion in the tangent direction; but because the function $f$ can take positive or negative values, we may have filtering or enhancing effects depending on the value of $\|\nabla I(p)\|$. For example, if $\|\nabla I(p)\| > a\,\frac{\sigma_r}{\sigma_s}$, where $a$ is such that $f(a) = 0$, then we get an enhancing effect. Let us remark that, because $g(t) \to 0$ as $t \to \infty$, points with a large gradient are preserved.
If $1 < \alpha < \tfrac{3}{2}$, then $\frac{\sigma_s}{\sigma_r}$ tends to infinity and $g(\frac{\sigma_s}{\sigma_r}\|\nabla I\|)$ tends to zero; consequently, the original image is hardly deteriorated.
Finally, let us observe that when $\alpha = 1$, the Yaroslavsky filter behaves asymptotically like the Perona–Malik equation (Equation (27)), which can be rewritten as:
$$ \frac{\partial I}{\partial t} = \mathrm{div}\big(c(\|\nabla I\|^2)\,\nabla I\big) = c(\|\nabla I\|^2)\, I_{TT} + b(\|\nabla I\|^2)\, I_{NN}, \qquad (29) $$
where $b(t) = 2t\,c'(t) + c(t)$. By choosing $c(t) = g(\sqrt{t})$ in (29), we get:
$$ \frac{\partial I}{\partial t} = g(\|\nabla I\|)\, I_{TT} + h(\|\nabla I\|)\, I_{NN}, $$
with $h(t) = g(t) + t\,g'(t)$. We have $h(t) \neq f(t)$, but the coefficients in the tangent direction for the Perona–Malik equation and the Yaroslavsky filter are equal, and the functions $h$ and $f$ have the same behavior. This explains why the bilateral filter and the Perona–Malik model share the same qualitative properties.²
² In particular, both suffer from shock formation, a.k.a. over-sharpening, which creates aliased edges from smooth ones. In Section 6.1.3, we show how the neighborhood filter can be extended to avoid this effect.
Remark. The weight defined in the bilateral filter decreases with the total distance, both in space and range (see also Figure 4.1), from the center sample. This idea is similar in spirit to the Beltrami flow proposed by Sochen et al. [61], where the effective weight is given by the geodesic distance between the samples. More precisely, the authors introduced the notion of image manifolds, where an image $I$ is represented by a manifold $M$ embedded in $S \times R$:
$$ (p_x, p_y) \in S \;\mapsto\; M(p_x, p_y) = \big(p_x, p_y, I(p_x, p_y)\big) \in S \times R. \qquad (30) $$
With this representation, Barash [6] and Barash and Comaniciu [7] demonstrated that bilateral filtering is based on the Euclidean distance in $S \times R$ instead of the manifold geodesic distance. Note that Paris and Durand [50] used a similar metric but in a signal-processing context (see Section 4.5). Sochen et al. [60] have also shown that bilateral filtering is an approximation to Gaussian filtering using the geodesic metric (i.e., using distances measured on the image manifold $M$) when the Gaussian kernel is small.
6
Extensions of Bilateral Filtering
This section describes two main extensions of the bilateral filter. First, variants have been developed to better handle gradients, by taking the slope of the signal into account or by avoiding the staircase effect (Section 6.1). Second, bilateral filtering has been extended to handle several images, in order to better control the way edges are detected (Section 6.2).
6.1 Accounting for the Local Slope
Humans consistently identify at least three visually distinctive image features as edges or boundaries: a sharp, step-like intensity change; a sharp, ridge- or valley-like gradient change; or both. The bilateral filter is particularly good at preserving step-like edges, because the range-domain filter averages together all similar values within the spatial neighborhood and assigns tiny weights to dissimilar values on the opposite side of the step; as shown in Figure 2.2, this helps maintain step-like changes without smoothing them away.
Several researchers [14, 19, 23] have proposed extensions to the bilateral filter to improve edge-preserving results for ridge- and valley-like edges as well. As explained by Elad [23], most noted that the bilateral filter smooths images toward a piecewise-constant intensity approximation of the original signal, and each instead proposes smoothing toward piecewise constant-gradient (or low-curvature) results.
6.1.1 Trilateral Filter
Sharp changes in gradients and large, high-gradient areas degrade the desirable smoothing abilities of the bilateral filter. As shown for one image scan-line in Figure 6.1(b), we can approximate the extent of the combined spatial and range filters as a rectangle centered around each input pixel: the position within this rectangle sets the weight assigned to all neighboring pixels. At ridge- or valley-like edges, gradients change abruptly but intensities do not, as shown in Figure 6.2, feature (1).
Fig. 6.1 Filter extent for one scan-line of an image.
Fig. 6.2 Difficult image features: (1) ridge-like and valley-like edges, (2) high-gradient regions, (3) similar intensities in disjoint regions.
Applying bilateral filtering here is troublesome, because the rectangular filter extent encloses pixels that span the peak of the ridge or valley, and the filter blends these intensities to form a blunt feature instead of a sharp, clean edge with disjoint gradients. High-gradient regions between ridge- or valley-like edges also reduce the bilateral filter's effectiveness. As shown in Figure 6.1(b) and Figure 6.2, feature (2), the spatial filter extent (the box width) has little effect, as only a narrow portion of the input signal falls within the box, and the range filter's extent (the box height) dominates the filtering. Figure 6.2, feature (3), also shows that applying the bilateral filter near sharply peaked valley- or ridge-like features may permit the spatial extent (box width) to include disjoint portions of the input signal, averaging together image regions that may belong to unrelated objects in the image.
The trilateral filter introduced by Choudhury and Tumblin [19] addresses these problems by combining modified bilateral filters with a pyramid-based method to limit the filter extent. First, they apply a bilateral filter to the image gradients to help estimate the slopes of separate image regions. Using these slopes, they tilt the filter extent of a bilateral filter applied to the image intensity; this affine transform of the range filter, shown in Figure 6.1(c), restores the effectiveness of the spatial filter term. Finally, for each output pixel, they limit the extent of this tilted bilateral filter to a connected set of pixels that share similar filtered-gradient values. To reduce the substantial computing time required to find these connected components, they describe a pyramid-like structure suitable for fast evaluation. They also automatically set all but two of the parameters of their filtering method, so that the user control resembles the bilateral filter's two parameters. Figure 6.3 shows comparisons between the trilateral filter and other approaches.
Fig. 6.3 Large, smoothly varied gradients can cause stair-stepping in isotropic diffusion, and weak smoothing in the bilateral filter. Higher-order PDEs (e.g., LCIS) and bilateral variants that smooth toward piecewise-linear results form stairsteps in gradients instead.
Fig. 6.4 Sample tone-mapping results obtained with the trilateral filter (top two rows) and sample mesh denoising (bottom row).
When applied to tone mapping or mesh fairing, the trilateral filter results in Figure 6.4 are visibly comparable to or better than those of the bilateral filter alone, but these improvements come at a high computational cost.
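To convey the tilting idea on a 1D signal (a highly simplified sketch: only the gradient smoothing and the tilted range weight are kept, and Choudhury and Tumblin's connected-component limit and pyramid acceleration are omitted; all parameter choices here are illustrative), one can proceed as follows.

```python
import numpy as np

def bilateral_1d(signal, sigma_s, sigma_r):
    """Plain 1D bilateral filter, used as a building block."""
    n = len(signal)
    half = int(2 * sigma_s)
    out = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        d = np.arange(lo, hi) - i
        w = np.exp(-d**2 / (2 * sigma_s**2)) \
            * np.exp(-(signal[lo:hi] - signal[i])**2 / (2 * sigma_r**2))
        out[i] = np.sum(w * signal[lo:hi]) / np.sum(w)
    return out

def tilted_bilateral_1d(signal, sigma_s, sigma_r):
    """Trilateral-like filtering: bilaterally smooth the gradient, then use it
    to tilt the range weight so that smooth slopes are not mistaken for edges."""
    n = len(signal)
    grad = np.gradient(signal)
    smooth_grad = bilateral_1d(grad, sigma_s, sigma_r)   # step 1: slope estimate
    half = int(2 * sigma_s)
    out = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        d = np.arange(lo, hi) - i
        plane = signal[i] + smooth_grad[i] * d           # tilted reference values
        w = np.exp(-d**2 / (2 * sigma_s**2)) \
            * np.exp(-(signal[lo:hi] - plane)**2 / (2 * sigma_r**2))
        # Average the detail relative to the tilted plane, then add the plane back.
        out[i] = signal[i] + np.sum(w * (signal[lo:hi] - plane)) / np.sum(w)
    return out
```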
6.1.2 Symmetric Bilateral Filter
Elad [23] proposes to account for the image slope by comparing the intensity of the filtered pixel with the average of another pixel and of its symmetric point:
$$ SBF[I]_p = \frac{1}{W_p} \sum_{q \in S} G_{\sigma_s}(\|p - q\|)\, G_{\sigma_r}(|I_p - I^s|)\, I^s, \qquad (31) $$
where $I^s$ is the average of the pixel $q$ and of its symmetric counterpart with respect to $p$, that is, $I^s = \frac{1}{2}\big(I(q) + I(2p - q)\big)$. As far as we know, the performance of this extension is unclear because it has not been extensively tested.
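A literal 1D transcription of Equation (31) (a small sketch; handling the boundaries by skipping out-of-range symmetric samples is our own choice) looks as follows.

```python
import numpy as np

def symmetric_bilateral_1d(signal, sigma_s, sigma_r):
    """Elad's symmetric variant: compare I_p with the average of I(q) and I(2p - q)."""
    signal = np.asarray(signal, dtype=np.float64)
    n = len(signal)
    half = int(2 * sigma_s)
    out = np.empty(n)
    for p in range(n):
        num, den = 0.0, 0.0
        for q in range(max(0, p - half), min(n, p + half + 1)):
            q_sym = 2 * p - q                      # mirror of q about p
            if not (0 <= q_sym < n):
                continue                           # skip pairs leaving the domain
            i_s = 0.5 * (signal[q] + signal[q_sym])
            w = np.exp(-(p - q) ** 2 / (2 * sigma_s ** 2)) \
                * np.exp(-(signal[p] - i_s) ** 2 / (2 * sigma_r ** 2))
            num += w * i_s
            den += w
        out[p] = num / den
    return out
```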
6.1.3 Regression Filter
The origin of the staircase effect can be explained with a 1D convex increasing signal (respectively, a 1D concave increasing signal) (Figure 6.5). For each $p$, the set of points $q$ such that $I(p) - h < I(q) \le I(p)$ is larger (respectively, smaller) than the set of points satisfying $I(p) \le I(q) \le I(p) + h$. Thus, the average value $Y_{\sigma_s,h}[I](p)$ is smaller (respectively, larger) than $I(p)$. As edges correspond to inflection points (i.e., points where $I'' = 0$), the signal is enhanced there; the discontinuities become more marked.
To overcome this difficulty, Buades et al. [14] introduced an intermediate regression correction to better approximate the signal locally.

Fig. 6.5 Why the Yaroslavsky filter (and similarly the bilateral filter) creates stepwise functions: (a) the 1D illustration shows that, for each p, the set of points q such that I(p) - h < I(q) <= I(p) is larger than the set of points satisfying I(p) <= I(q) <= I(p) + h, so the average will be biased; (b) this can be avoided with a locally linear approximation.
For every $p$ in a 2D image, one searches for the triplet $(a, b, c)$ minimizing
$$ \int_{B(p,\sigma_s)} w(p, q)\,\big(I(q) - a q_1 - b q_2 - c\big)^2\, dq, \qquad (32) $$
where $w(p, q) = \exp\big(-\frac{|I(q) - I(p)|^2}{\sigma_r^2}\big)$, and then replaces $I(p)$ by $a p_x + b p_y + c$. Let us denote by $L_{\sigma_s,\sigma_r}$ this improved version of the original Yaroslavsky filter (see also Section 5.3.2).
Theorem 6.1. Suppose $I \in C^2(S)$, and let $\sigma_s, \sigma_r > 0$ be such that $\sigma_s, \sigma_r \to 0$ and $O(\sigma_s) = O(\sigma_r)$. Let $g$ be the continuous function defined by $g(0) = \frac{1}{6}$ and, for $t \neq 0$, by an explicit expression involving $e^{-t^2}$ and $E(t) = \int_0^t e^{-s^2}\, ds$ (we refer to Buades et al. [14] for its exact form). Then:
$$ L_{\sigma_s,\sigma_r}[I](p) - I(p) \cong \Big[ \tfrac{1}{6}\, I_{TT} + g\big(\tfrac{\sigma_s}{\sigma_r}\|DI\|\big)\, I_{NN} \Big]\, \sigma_s^2. \qquad (33) $$
According to Theorem 6.1, the enhancing effect has disappeared; the coefficient in the normal direction is now always positive and decreasing. When the gradient is large, the weighting function in the normal direction tends to zero and the image is filtered only in the tangent direction. Figure 6.6 shows how regression can improve the results.
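The sketch below (Python/NumPy; the weighted least-squares solve and the window handling are our own straightforward choices) implements the regression correction of Equation (32) on a 2D image: at each pixel a plane is fitted under the range weights, and the pixel is replaced by the plane's value at the center.

```python
import numpy as np

def regression_filter(img, sigma_s, sigma_r):
    """Regression (staircase-free) neighborhood filter: weighted planar fit per pixel."""
    img = img.astype(np.float64)
    h, w = img.shape
    half = max(1, int(sigma_s))
    out = np.empty_like(img)
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - half), min(h, y + half + 1)
            x0, x1 = max(0, x - half), min(w, x + half + 1)
            patch = img[y0:y1, x0:x1]
            dy, dx = np.mgrid[y0 - y:y1 - y, x0 - x:x1 - x]
            wgt = np.exp(-(patch - img[y, x]) ** 2 / sigma_r ** 2)   # w(p, q)
            # Weighted least squares for the plane a*dy + b*dx + c.
            A = np.stack([dy.ravel(), dx.ravel(), np.ones(dy.size)], axis=1)
            sw = np.sqrt(wgt.ravel())
            coeffs, *_ = np.linalg.lstsq(A * sw[:, None], sw * patch.ravel(), rcond=None)
            out[y, x] = coeffs[2]     # plane value at the center (dy = dx = 0)
    return out
```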
6.2 Using Several Images
6.2.1 Cross and Joint Bilateral Filter
Eisemann and Durand [22] and Petschnigg et al. [53] simultaneously introduced the cross bilateral filter, also known as the joint bilateral filter, a variant of the bilateral filter that decouples the edges to preserve from the image to smooth. Given an image $I$, the cross bilateral filter smooths $I$ while preserving the edges of a second image $E$. In practice, the range weight is computed using $E$ instead of $I$:
$$ CBF[I, E]_p = \frac{1}{W_p} \sum_{q \in S} G_{\sigma_s}(\|p - q\|)\, G_{\sigma_r}(E_p - E_q)\, I_q, $$
with
$$ W_p = \sum_{q \in S} G_{\sigma_s}(\|p - q\|)\, G_{\sigma_r}(E_p - E_q). $$
Figure 6.7 shows a simple use of cross bilateral filtering to filter a low-light picture.
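A brute-force sketch of the cross bilateral filter (Python/NumPy; the window size and parameter handling are our own choices) differs from the plain bilateral filter only in the image used for the range weight.

```python
import numpy as np

def cross_bilateral(img, edge, sigma_s, sigma_r):
    """Smooth `img` while preserving the edges of `edge` (same shape)."""
    img = img.astype(np.float64)
    edge = edge.astype(np.float64)
    h, w = img.shape
    half = int(2 * sigma_s)
    out = np.empty_like(img)
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - half), min(h, y + half + 1)
            x0, x1 = max(0, x - half), min(w, x + half + 1)
            dy, dx = np.mgrid[y0 - y:y1 - y, x0 - x:x1 - x]
            w_s = np.exp(-(dy ** 2 + dx ** 2) / (2 * sigma_s ** 2))
            # Range weight computed on the edge image E, not on I.
            w_r = np.exp(-(edge[y0:y1, x0:x1] - edge[y, x]) ** 2 / (2 * sigma_r ** 2))
            wgt = w_s * w_r
            out[y, x] = np.sum(wgt * img[y0:y1, x0:x1]) / np.sum(wgt)
    return out
```

Setting `edge = img` recovers the standard bilateral filter; in the flash/no-flash scenario of Figure 6.7, `edge` would be the flash picture and `img` the low-light one.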
Fig. 6.6 The staircase effect can be eliminated with regression (see Section 6.1.3). The first row shows the results; the second row shows the level lines, which give a clear representation of the degree of smoothness of the image. Figure reproduced from Buades et al. [14].
Fig. 6.7 Simple example of cross bilateral filtering. The low-light image (a) is too noisy to yield a satisfying result if filtered on its own with bilateral filtering; see the result in (b). Using a flash picture of the same content (c) and cross bilateral filtering produces a better result (d). Eisemann and Durand [22] and Petschnigg et al. [53] propose more sophisticated techniques to handle this flash/no-flash scenario. Figure reproduced from Paris and Durand [50].
6.2.2 Dual Bilateral Filter
Bennett et al. [9] introduced dual bilateral filtering as a variant of bilateral and cross bilateral filtering. Like the cross bilateral filter, the dual bilateral filter takes two images $I$ and $J$ as input and filters $I$. The difference is that both $I$ and $J$ are used to define the edges, whereas the cross bilateral filter uses only $J$. The dual bilateral filter is defined by:
$$ DBF[I]_p = \frac{1}{W_p} \sum_{q \in S} G_{\sigma_s}(\|p - q\|)\, G_{\sigma_I}(\|I_p - I_q\|)\, G_{\sigma_J}(\|J_p - J_q\|)\, I_q. \qquad (34) $$
The advantage of this formulation is that any edge visible in $I$ or $J$ is taken into account. Bennett et al. have demonstrated the usefulness of this strategy in the context of low-light imaging, where $I$ is a classical RGB video stream and $J$ comes from an infrared camera. The infrared camera captures more edges but lacks the colors of a standard RGB camera. In this context, the strength of dual bilateral filtering is that the noise properties of $I$ and $J$ can be accounted for separately by setting $\sigma_I$ and $\sigma_J$ independently.
From a formal standpoint, the dual bilateral filter can be interpreted as a normal bilateral filter based on extended range data $(I, J)$, that is, the channels of $I$ are glued to those of $J$ to form a single image with more channels. The range weight is then a classical one, except that it involves higher-dimensional data. A minor difference with the formulation of Bennett et al. is that the $J$ data are filtered as well, but one can discard them if needed to obtain the exact same result.
7
Conclusions
We have presented the bilateral filter, its applications, and its variants, reviewed our current theoretical understanding of it, and explained fast algorithms to evaluate it. We believe that the success of the bilateral filter lies in its combination of simplicity, good results, and efficient algorithms. Although alternatives exist for each of these points, few, if any, combine all these advantages.
The filter is very flexible because the range weight can be adapted to accommodate any notion of pixel-value difference, including arbitrary color spaces, data from other images, or any information about the relevance of one pixel to another pixel.
The original goal of the filter was denoising, in which case a small spatial kernel suffices and the residual of the filter is discarded as the noise component. In contrast, many new applications leverage the bilateral filter to create two-scale decompositions that rely on large spatial kernels and where the residual of the filter is preserved because it is much more relevant to the human visual system. The use of large spatial support has motivated a variety of acceleration schemes, and the bilateral filter can now be applied in real time to large inputs.
Our review of bilateral filtering highlights several avenues for future research. Although the simple edge model based on color difference that underlies the bilateral filter is often sufficient, there is room for a better characterization of the important contours to be preserved. The bilateral filter is also often used to extract the texture of an image. This is another direction where a more sophisticated model of what texture is would be beneficial. On the theoretical side, while the link with PDEs is well understood when the spatial kernel is shrunk to zero, the full implications of large spatial supports deserve more attention. While efficient implementations exist, they are often limited to a low-dimensional range, and the extension of these techniques to higher-dimensional data is an exciting challenge. The bilateral filter is most often employed to yield a two-scale decomposition, but fully multiscale approaches deserve more investigation, because the interplay between the spatial and range terms makes such definitions non-trivial.
Acknowledgments
The work of Sylvain Paris at MIT and Fredo Durand was supported by a National Science Foundation CAREER award 0447561 "Transient Signal Processing for Realistic Imagery," an NSF Grant No. 0429739 "Parametric Analysis and Transfer of Pictorial Style," and a grant from the Royal Dutch/Shell Group. Fredo Durand acknowledges a Microsoft Research New Faculty Fellowship and a Sloan Fellowship. Jack Tumblin's work was supported in part by National Science Foundation grants NSF-IIS 0535236 and NSF-SGER 0645973. He also acknowledges and thanks Adobe Systems, Inc. and Mitsubishi Electric Research Laboratories (MERL) for their support of his research by two unrestricted gifts to support research on topics in computational photography.
References
[1] M. Aleksic, M. Smirnov, and S. Goma, "Novel bilateral filter approach: Image noise reduction with sharpening," in Proceedings of the Digital Photography II Conference, volume 6069, SPIE, 2006.
[2] G. Aubert and P. Kornprobst, Mathematical Problems in Image Processing: Partial Differential Equations and the Calculus of Variations (Second edition), volume 147 of Applied Mathematical Sciences. Springer-Verlag, 2006.
[3] J. Aujol, G. Aubert, L. Blanc-Feraud, and A. Chambolle, "Image decomposition into a bounded variation component and an oscillating component," Journal of Mathematical Imaging and Vision, vol. 22, no. 1, January 2005.
[4] V. Aurich and J. Weule, "Non-linear Gaussian filters performing edge preserving diffusion," in Proceedings of the DAGM Symposium, pp. 538–545, 1995.
[5] S. Bae, S. Paris, and F. Durand, "Two-scale tone management for photographic look," ACM Transactions on Graphics, vol. 25, no. 3, pp. 637–645, Proceedings of the ACM SIGGRAPH conference, 2006.
[6] D. Barash, "A fundamental relationship between bilateral filtering, adaptive smoothing and the nonlinear diffusion equation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 6, pp. 844–847, 2002.
[7] D. Barash and D. Comaniciu, "A common framework for nonlinear diffusion, adaptive smoothing, bilateral filtering and mean shift," Image and Vision Computing, vol. 22, no. 1, pp. 73–81, 2004.
[8] B. E. Bayer, "Color imaging array," US Patent 3971065, 1976.
[9] E. P. Bennett, J. L. Mason, and L. McMillan, "Multispectral bilateral video fusion," IEEE Transactions on Image Processing, vol. 16, no. 5, pp. 1185–1194, May 2007.
[10] E. P. Bennett and L. McMillan, "Video enhancement using per-pixel virtual exposures," ACM Transactions on Graphics, vol. 24, no. 3, pp. 845–852, Proceedings of the ACM SIGGRAPH conference, July 2005.
[11] M. J. Black, G. Sapiro, D. H. Marimont, and D. Heeger, "Robust anisotropic diffusion," IEEE Transactions on Image Processing, vol. 7, no. 3, pp. 421–432, March 1998.
[12] A. Buades, "Image and film denoising by non-local means," PhD thesis, Universitat de les Illes Balears, 2006.
[13] A. Buades, B. Coll, and J.-M. Morel, "Neighborhood filters and PDEs," Numerische Mathematik, vol. 105, no. 1, pp. 1–34, November 2006.
[14] A. Buades, B. Coll, and J.-M. Morel, "The staircasing effect in neighborhood filters and its solution," IEEE Transactions on Image Processing, vol. 15, no. 6, pp. 1499–1505, 2006.
[15] F. Catte, P.-L. Lions, J.-M. Morel, and T. Coll, "Image selective smoothing and edge detection by nonlinear diffusion," SIAM Journal of Numerical Analysis, vol. 29, no. 1, pp. 182–193, February 1992.
[16] J. Chen, S. Paris, and F. Durand, "Real-time edge-aware image processing with the bilateral grid," ACM Transactions on Graphics, vol. 26, no. 3, p. 103, Proceedings of the ACM SIGGRAPH conference, 2007.
[17] K. Chiu, M. Herf, P. Shirley, S. Swamy, C. Wang, and K. Zimmerman, "Spatially nonuniform scaling functions for high contrast images," in Proceedings of Graphics Interface '93, pp. 245–254, May 1993.
[18] H. Chong, S. Gortler, and T. Zickler, "A perception-based color space for illumination-invariant image processing," ACM Transactions on Graphics, vol. 27, no. 3, pp. 1–7, Proceedings of the ACM SIGGRAPH conference, 2008.
[19] P. Choudhury and J. Tumblin, "The trilateral filter for high contrast images and meshes," in Proceedings of the Eurographics Symposium on Rendering, pp. 1–11, 2003.
[20] D. DeCarlo and A. Santella, "Stylization and abstraction of photographs," in Proceedings of the ACM SIGGRAPH conference, pp. 769–776, 2002.
[21] F. Durand and J. Dorsey, "Fast bilateral filtering for the display of high-dynamic-range images," ACM Transactions on Graphics, vol. 21, no. 3, pp. 257–266, Proceedings of the ACM SIGGRAPH conference, 2002.
[22] E. Eisemann and F. Durand, "Flash photography enhancement via intrinsic relighting," ACM Transactions on Graphics, vol. 23, no. 3, pp. 673–678, Proceedings of the ACM SIGGRAPH conference, July 2004.
[23] M. Elad, "On the bilateral filter and ways to improve it," IEEE Transactions on Image Processing, vol. 11, no. 10, pp. 1141–1151, October 2002.
[24] M. Elad, "Retinex by two bilateral filters," in Proceedings of the Scale-Space conference, pp. 217–229, 2005.
[25] R. Fattal, M. Agrawala, and S. Rusinkiewicz, "Multiscale shape and detail enhancement from multi-light image collections," ACM Transactions on Graphics, vol. 26, no. 3, p. 51, Proceedings of the ACM SIGGRAPH conference, 2007.
[26] S. Fleishman, I. Drori, and D. Cohen-Or, "Bilateral mesh denoising," ACM Transactions on Graphics, vol. 22, no. 3, pp. 950–953, Proceedings of the ACM SIGGRAPH conference, July 2003.
[27] S. Geman and D. Geman, "Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 6, no. 6, pp. 721–741, 1984.
[28] G. Gimelfarb, Image Textures and Gibbs Random Fields. Kluwer Academic Publishers, 1999. ISBN 0792359615.
[29] F. R. Hampel, E. M. Ronchetti, P. M. Rousseeuw, and W. A. Stahel, Robust Statistics: The Approach Based on Influence Functions. Wiley Interscience, 1986. ISBN 0-471-73577-9.
[30] P. J. Huber, Robust Statistics. Probability and Statistics. Wiley-Interscience, February 1981. ISBN 9780471418054.
[31] L. Itti and C. Koch, "Computational modeling of visual attention," Nature Reviews Neuroscience, vol. 2, no. 3, pp. 194–203, 2001.
[32] D. J. Jobson, Z. Rahman, G. A. Woodell, N. Center, and V. A. Hampton, "A multiscale Retinex for bridging the gap between color images and the human observation of scenes," IEEE Transactions on Image Processing, vol. 6, no. 7, pp. 965–976, 1997.
[33] T. Jones, F. Durand, and M. Zwicker, "Normal improvement for point rendering," IEEE Computer Graphics & Applications, vol. 24, no. 4, pp. 53–56, 2004.
[34] T. R. Jones, F. Durand, and M. Desbrun, "Non-iterative, feature-preserving mesh smoothing," ACM Transactions on Graphics, vol. 22, no. 3, Proceedings of the ACM SIGGRAPH conference, July 2003.
[35] E. A. Khan, E. Reinhard, R. Fleming, and H. Buelthoff, "Image-based material editing," ACM Transactions on Graphics, vol. 25, no. 3, pp. 654–663, Proceedings of the ACM SIGGRAPH conference, 2006.
[36] J. J. Koenderink and A. J. Van Doorn, "The structure of locally orderless images," International Journal of Computer Vision, vol. 31, no. 2–3, pp. 159–168, 1999.
[37] J. Kopf, M. Uyttendaele, O. Deussen, and M. Cohen, "Capturing and viewing gigapixel images," ACM Transactions on Graphics, vol. 26, no. 3, p. 93, Proceedings of the ACM SIGGRAPH conference, 2007.
[38] E. H. Land and J. J. McCann, "Lightness and Retinex theory," Journal of the Optical Society of America, vol. 61, no. 1, pp. 1–11, 1971.
[39] E. H. Land, "The Retinex," American Scientist, vol. 52, pp. 247–264, 1964.
[40] S. Li, Markov Random Field Modeling in Computer Vision. Springer-Verlag, 1995. ISBN 4-431-70145-1.
[41] C. Liu, W. T. Freeman, R. Szeliski, and S. Kang, "Noise estimation from a single image," in Proceedings of the Conference on IEEE Computer Vision and Pattern Recognition, volume 1, pp. 901–908, 2006.
[42] B. D. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision," in Proceedings of the International Joint Conference on Artificial Intelligence, volume 81, pp. 674–679, 1981.
[43] S. Mallat, A Wavelet Tour of Signal Processing. Academic Press, 1999. ISBN 0-12-466606-X.
[44] Y. Meyer, Oscillating Patterns in Image Processing and Nonlinear Evolution Equations, volume 22 of University Lecture Series. American Mathematical Society, 2001.
[45] A. Miropolsky and A. Fischer, "Reconstruction with 3D geometric bilateral filter," in Proceedings of the ACM Symposium on Solid Modeling and Applications, pp. 225–229, 2004.
[46] P. Mrazek, J. Weickert, and A. Bruhn, Geometric Properties from Incomplete Data. On Robust Estimation and Smoothing with Spatial and Tonal Kernels. Springer, 2006. ISBN: 978-1-4020-3857-0.
[47] D. A. Murio, The Mollification Method and the Numerical Solution of Ill-Posed Problems. Wiley-Interscience, 1993. ISBN: 0471594083.
[48] B. M. Oh, M. Chen, J. Dorsey, and F. Durand, "Image-based modeling and photo editing," in Proceedings of the ACM SIGGRAPH Conference, pp. 433–442, 2001.
[49] S. Paris, H. Briceño, and F. Sillion, "Capture of hair geometry from multiple images," ACM Transactions on Graphics, vol. 23, no. 3, pp. 712–719, Proceedings of the ACM SIGGRAPH conference, 2004.
[50] S. Paris and F. Durand, "A fast approximation of the bilateral filter using a signal processing approach," International Journal of Computer Vision, vol. 81, no. 1, pp. 24–52, 2009.
[51] S. N. Pattanaik, J. A. Ferwerda, M. D. Fairchild, and D. P. Greenberg, "A multiscale model of adaptation and spatial vision for realistic image display," in Proceedings of the ACM SIGGRAPH conference, pp. 287–298, 1998.
[52] P. Perona and J. Malik, "Scale-space and edge detection using anisotropic diffusion," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, no. 7, pp. 629–639, July 1990.
[53] G. Petschnigg, M. Agrawala, H. Hoppe, R. Szeliski, M. Cohen, and K. Toyama, "Digital photography with flash and no-flash image pairs," ACM Transactions on Graphics, vol. 23, no. 3, pp. 664–672, Proceedings of the ACM SIGGRAPH Conference, 2004.
[54] T. Q. Pham, "Spatiotonal adaptivity in super-resolution of undersampled image sequences," PhD thesis, Delft University of Technology, 2006.
[55] T. Q. Pham and L. J. van Vliet, "Separable bilateral filtering for fast video preprocessing," in Proceedings of the IEEE International Conference on Multimedia and Expo, 2005.
[56] R. Ramanath and W. E. Snyder, "Adaptive demosaicking," Journal of Electronic Imaging, vol. 12, no. 4, pp. 633–642, 2003.
[57] P. Sand and S. Teller, "Particle video: Long-range motion estimation using point trajectories," International Journal of Computer Vision, vol. 80, no. 1, pp. 72–91, 2008.
[58] C. Schlick, "Quantization techniques for visualization of high dynamic range pictures," in Proceedings of the Eurographics Rendering Workshop, pp. 7–20, 1994.
[59] S. M. Smith and J. M. Brady, "SUSAN – A new approach to low level image processing," International Journal of Computer Vision, vol. 23, no. 1, pp. 45–78, May 1997.
[60] N. Sochen, R. Kimmel, and A. M. Bruckstein, "Diffusions and confusions in signal and image processing," Journal of Mathematical Imaging and Vision, vol. 14, no. 3, pp. 237–244, 2001.
[61] N. Sochen, R. Kimmel, and R. Malladi, "A general framework for low level vision," IEEE Transactions on Image Processing, vol. 7, pp. 310–318, 1998.
[62] T. G. Stockham, "Image processing in the context of a visual model," Proceedings of the IEEE, vol. 60, no. 7, pp. 828–842, 1972.
[63] C. Tomasi and R. Manduchi, "Bilateral filtering for gray and color images," in Proceedings of the IEEE International Conference on Computer Vision, pp. 839–846, 1998.
[64] J. Tumblin and G. Turk, "Low curvature image simplifiers (LCIS): A boundary hierarchy for detail-preserving contrast reduction," in Proceedings of the ACM SIGGRAPH Conference, pp. 83–90, 1999.
[65] J. van de Weijer and R. van den Boomgaard, "Local mode filtering," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 428–433, 2001.
[66] J. van de Weijer and R. van den Boomgaard, "On the equivalence of local-mode finding, robust estimation and mean-shift analysis as used in early vision tasks," in Proceedings of the International Conference on Pattern Recognition, pp. 927–930, 2002.
[67] L. Vese and S. Osher, "Modeling textures with total variation minimization and oscillating patterns in image processing," Journal of Scientific Computing, vol. 19, pp. 553–572, 2003.
[68] C. C. Wang, "Bilateral recovering of sharp edges on feature-insensitive sampled meshes," IEEE Transactions on Visualization and Computer Graphics, vol. 12, no. 4, pp. 629–639, 2006.
[69] L. Wang, L.-Y. Wei, K. Zhou, B. Guo, and H.-Y. Shum, "High dynamic range image hallucination," in Proceedings of the Eurographics Symposium on Rendering, pp. 321–326, 2007.
[70] G. S. Watson, Statistics on Spheres. John Wiley and Sons, 1983.
[71] B. Weiss, "Fast median and bilateral filtering," ACM Transactions on Graphics, vol. 25, no. 3, pp. 519–526, Proceedings of the ACM SIGGRAPH conference, 2006.
[72] H. Winnemöller, S. C. Olsen, and B. Gooch, "Real-time video abstraction," ACM Transactions on Graphics, vol. 25, no. 3, pp. 1221–1226, Proceedings of the ACM SIGGRAPH conference, 2006.
[73] W. C. K. Wong, A. C. S. Chung, and S. C. H. Yu, "Trilateral filtering for biomedical images," in Proceedings of the IEEE International Symposium on Biomedical Imaging, pp. 820–823, 2004.
[74] J. Xiao, H. Cheng, H. Sawhney, C. Rao, and M. Isnardi, "Bilateral filtering-based optical flow estimation with occlusion detection," in Proceedings of the European Conference on Computer Vision, pp. 211–224, 2006.
[75] Q. Yang, R. Yang, J. Davis, and D. Nister, "Spatial-depth super resolution for range images," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8, 2007.
[76] Q. Yang, R. Yang, H. Stewenius, and D. Nister, "Stereo matching with color-weighted correlation, hierarchical belief propagation and occlusion handling," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2347–2354, 2006.
[77] L. P. Yaroslavsky, Digital Picture Processing. An Introduction. Springer Verlag, 1985.
[78] K. Yoon and I. Kweon, "Adaptive support-weight approach for correspondence search," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 4, pp. 650–656, 2006.