Abstract—Convolutional neural networks have been highly successful in image-based learning tasks due to their translation
equivariance property. Recent work has generalized the traditional convolutional layer of a convolutional neural network to non-Euclidean
spaces and shown group equivariance of the generalized convolution operation. In this paper, we present a novel higher order Volterra
convolutional neural network (VolterraNet) for data defined as samples of functions on Riemannian homogeneous spaces. Analogous to
the result for traditional convolutions, we prove that the Volterra functional convolutions are equivariant to the action of the isometry group
admitted by the Riemannian homogeneous spaces and, under some restrictions, any non-linear equivariant function can be expressed as
our homogeneous space Volterra convolution, generalizing the non-linear shift equivariant characterization of Volterra expansions in
Euclidean space. We also prove that second order functional convolution operations can be represented as cascaded convolutions which
leads to an efficient implementation. Beyond this, we propose a dilated VolterraNet model. These advances lead to large parameter
reductions relative to baseline non-Euclidean CNNs.
To demonstrate the efficacy of the VolterraNet, we present several real-data experiments involving classification tasks on the
spherical-MNIST, atomic energy, and Shrec17 data sets, as well as group testing on diffusion MRI data. Performance comparisons to the
state-of-the-art are also presented.
Index Terms—Homogeneous spaces, Volterra Series, Convolutions, Geometric Deep Learning, Equivariance
By combining them with the spatial transformer [10], they achieve the required equivariance to translations as well. Recently, the equivariance of convolutions to more general classes of group actions has been reported in the literature [11], and later in [12]. In [7], Esteves et al. used the correlation defined in [13] to propose an SO(3)-equivariant operator and in turn define a spherical convolution. In this paper, we will define correlations and the Volterra series on a homogeneous manifold and show that the equivariance property holds for both.

Volterra kernels were first proposed in the image classification literature in [14], [15]. In [15], the authors learn the kernels in a data-driven fashion and formulate the learning problem as a generalized eigenvalue problem. The Volterra theory of nonlinear systems was applied more than two decades ago to a single-hidden-layer feed-forward neural network with a linear output layer and to a fully dynamic recurrent network in [16]. The most recent use of Volterra kernels in deep networks was reported in [17], where the authors presented a single layer of Volterra-kernel-based convolutions followed by conventional CNN layers. They did not, however, explore the equivariance properties of the network or consider non-Euclidean input domains.

In this paper, we define a Volterra kernel to replace traditional convolution kernels. We present a novel generalization of the convolution group-equivariance property to higher order convolutions expressed using the Volterra theory of functional convolution on non-Euclidean domains, specifically the Riemannian homogeneous spaces [18] referred to earlier. Most of these manifolds are commonly encountered in mathematical formulations of various computer vision tasks such as action recognition, covariance tracking, etc., and in medical imaging, for example, in diffusion magnetic resonance imaging (MRI), elastography, etc. By generalizing traditional CNNs in two ways, 1) to cope with data domains that are non-Euclidean and 2) to higher order convolutions expressed using the Volterra series, we expect to extend the success of CNNs in as yet unexplored ways.

We begin with a significant extension of prior work by the authors of [19], who defined a correlation operation for homogeneous manifolds. Specifically, our extension consists of a proof that not only is the correlation operation group equivariant, but additionally any linear group-equivariant function can be written as a correlation on the manifold ([19] only showed the first fact). We present experiments to demonstrate better performance of the proposed VolterraNet on the spherical-MNIST and Shrec17 data with fewer parameters than previously reported in the literature for the Spherical CNN and the Clebsch-Gordan net. We then present a dilated convolution model based on the VolterraNet and demonstrate its efficacy in group testing on diffusion magnetic resonance data acquired from patients with movement disorders. The domain of this data is another example of a Riemannian homogeneous space.

In summary, our key contributions in this paper are: 1) A principled method for the choice of basis in designing a deep network architecture on a Riemannian homogeneous manifold M. 2) A proof of a generalization of the classical linear shift invariance (in our terminology, equivariance) characterization theorem for correlation operations on Riemannian homogeneous manifolds. 3) A novel generalization of convolution operations to higher order Volterra series on non-Euclidean domains, specifically Riemannian homogeneous manifolds, which are often encountered in both computer vision and medical imaging applications. 4) A generalization of the classical non-linear shift invariance (in our terminology, equivariance) characterization theorem for Volterra convolution operations on Riemannian homogeneous manifolds. 5) Experiments on publicly available real data sets such as spherical-MNIST, atomic energy, and Shrec17, with comparisons to state-of-the-art methods. 6) An extension of the VolterraNet to a Dilated VolterraNet, whose efficiency is demonstrated via group testing on diffusion MRI brain scans from controls (normal subjects) and movement disorder patients, along with ablation studies on VolterraNet that demonstrate the usefulness of the higher order convolution operations.

The rest of the paper is organized as follows: In section 3.3 we define the correlation operation on homogeneous manifolds and prove a generalization of the Euclidean LSI theorem for this correlation operation. Then, in section 3.4 we present a framework for a principled choice of basis in representing functions on a Riemannian homogeneous manifold. In section 4.1, we define the Volterra higher-order convolution operation and prove a generalization of the non-linear shift invariance theorem for this Volterra operation. Following this, we present a detailed description of the proposed VolterraNet architecture in section 5 and a description of the proposed dilated VolterraNet in section 6. Finally, section 7 contains the experimental results and section 8 the conclusions.

2 LIST OF NOTATIONS

We now summarize the list of notations that will be used throughout this paper.

M                 Riemannian homogeneous space (manifold)
G                 a group
SO(n)             n-dimensional special orthogonal group of matrices
I(M)              Isometry group admitted by M
L2(M, R)          Space of real-valued square integrable functions on M
L2(G, R)          Space of real-valued square integrable functions on G
ωM                Volume form of M
µG                Haar measure of G
g · x / Lg(x)     Action of g ∈ G on x ∈ M
gh / Lg(h)        Action of g ∈ G on h ∈ G
g.f / L∗g−1(f)    Action of g ∈ G on f : M → R, given by x 7→ f(g−1.x)
Sn                n-sphere
R+                Space of positive reals
P3                Space of 3 × 3 symmetric positive-definite matrices
GL(n)             General linear group of n × n matrices
O(n)              Space of n × n orthogonal matrices
R \ {0}           Space of reals without the origin
Stab(x)           Stabilizer of an element x ∈ M
gM                Riemannian metric on the manifold M
logm              Matrix log operation
SUBMITTED TO IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. XXX, NO. XXX, MONTH YEAR 3
3 CORRELATION ON RIEMANNIAN HOMOGENEOUS SPACES

In this section we define a correlation operation which generalizes the Euclidean convolution layer to arbitrary Riemannian homogeneous spaces. Further, we prove a generalization to Riemannian homogeneous spaces of the linear shift-invariant (LSI) system characterization of Euclidean convolutions. Similar theorems were first proved in [11] and later in [12]. This result is not meant to be novel but to motivate the analogous result for higher order convolutions on Riemannian homogeneous spaces that we prove subsequently.

Theorem 1. Let w : Rn → R be a weight kernel. Then the operator Gw : U → V defined by Gw(f) = f ⋆ w, where ⋆ is the Euclidean convolution operation, is a bounded, linear, and translation-equivariant operator. Further, if F is any bounded linear translation-equivariant operator, then there exists w : Rn → R such that F = Gw, i.e., F(f) = f ⋆ w, for all f ∈ U.

Thus, Euclidean convolutions with a weight kernel have an interesting and powerful characterization as linear shift-invariant operators. Next we show that the correlation operation on any Riemannian homogeneous manifold satisfies a generalization of the aforementioned LSI theorem.

We are now ready to state the following theorems:

Theorem 2. Let S and U be L2(M, R) or L2(G, R) and F : S → U be a function given by f 7→ (f ⋆ w). Then F is equivariant with respect to the pullback action of G, i.e.,

(φ · f) ⋆ w = φ · (f ⋆ w)

for φ ∈ G a symmetry of M, where

φ · h := h ◦ φ−1

for any square-integrable h : M → R.

Proof. See appendix B.

This constitutes the forward direction of the LGE theorem. Now we show the converse statement, namely, every linear group-equivariant function is a correlation.

Theorem 3. Let S and U be L2(M, R) or L2(G, R) and F : S → U be a linear equivariant function with respect to the pullback action of I(M). Then, ∃w ∈ S such that (F(f))(g) = (f ⋆ w)(g), for all f ∈ S and g ∈ G.

Proof. See appendix B.

Together, these two theorems generalize the LSI theorem to homogeneous spaces using the correlation defined in Def. 2.

A NOTE ON VOLUME FORMS / MEASURES

In Def. 2, we specify the Haar measure for integration of a function on G. If f and w are functions on G, then the Haar measure µG has several desirable properties. For example, the Haar measure is invariant to “translations”, i.e., if S ⊂ G is measurable then µG(S) = µG(gS) for any g ∈ G. Further, using the Haar measure provides a convolution theorem which makes correlation a simple multiplication under the generalized Fourier transform for groups. This particular property is vital for efficient implementations.

Note that, on the other hand, we do not specify a particular volume form ωM for integration of functions on M in definition 2. In many cases, the Haar measure on G will induce a G-invariant volume form on M ≅ G/H, but stating the exact conditions for this to be possible requires some work. Instead, we define the correlation using an arbitrary volume form.

3.4 Basis functions for L2-functions on homogeneous spaces

Our goal in this section is to induce a natural basis on L2(M, R) from the canonical basis on L2(G, R), where G is the group acting on the homogeneous manifold M. The basis on G consists of matrix elements of irreducible unitary representations, which provides a Fourier transform on G (for more details the reader is referred to [21]). We show that this construction matches the commonly used basis for specific manifolds, e.g., the spherical harmonics and Wigner-D functions in [5]. This construction can be used to induce a basis on arbitrary Riemannian homogeneous spaces.

3.4.1 Basis on L2(M, R) induced from L2(G, R)

To induce a basis on L2(M, R), we use the principal fiber bundle structure of the homogeneous manifold M. A fiber bundle is a space that locally looks like a product space. It is expressed as a base space B with fibers making up a fiber space F, their union being the total space denoted by E. There is a projection map π : E → B mapping fibers to their "base point" on B. A principal fiber G-bundle is a fiber bundle with a continuous (right) action of a group G, such that the action of G is free, transitive, and preserves the fibers. For more details on fiber bundle theory see [22].

As mentioned in 3.1, M can be identified with G/H for G the group acting on M and H the stabilizer of a point x ∈ M, usually called the "origin". It is well known that this identification induces a principal fiber G-bundle structure on M via the projection map.

Proposition 1. [18] The homogeneous space M, identified as G/H together with the projection map π : G → G/H, is a principal bundle with H as the fiber. Furthermore, there exists a diffeomorphism ψ : G/H → M given by gH 7→ g.o, where o is the “origin” of M.

Moreover, a section is a continuous right inverse of π, denoted by σ : B → E. In the literature [18], a zero section (denoted by S ⊂ G) is the section containing the identity element of H. Let σ0 : S → M be a diffeomorphism. Given {vα : G → R}, the set of basis functions of L2(G, R), we can obtain the induced basis on L2(M, R) as ṽα = vα ◦ σ0−1. A schematic of an example fiber bundle is shown in Fig. 1.

Fig. 1. Fiber bundle (B, E, π): base B = M, total space E = G, fibers ≅ H, with zero section S0 and the map σ0.

Example: Consider the example of M = S2, where G = SO(3) and H = SO(2). A choice of basis on L2(G, R) is the Wigner D-functions, denoted by {Dj_l,m | j ∈ {0, 1, · · · , ∞}, −j ≤ l, m ≤ j}. Let (α, β, γ) be the parametrization of SO(3) and let the zero section (S) be denoted by {α, β, 0}. Then {Dj_l,0} are the choice of basis on S ⊂ G, which gives the induced basis on S2 as

D̃j_l(θ, φ) = √((2l + 1)/(4π)) Dj_l,0(φ, θ, 0).

Moreover, observe that {D̃j_l} are the spherical harmonics basis.

4 HIGHER ORDER CORRELATION ON RIEMANNIAN HOMOGENEOUS SPACES

In this section, we define a higher order correlation operator on Riemannian homogeneous spaces using the Volterra series, state a theorem demonstrating its symmetry equivariance, and show how to compute it efficiently using first order correlation operations. Further, we prove that the set of functions which can be written as sums of products of linear operators and are G-equivariant can be expressed as a Volterra series. This partially generalizes the non-linear shift equivariance characterization of Volterra expansions in Euclidean space.
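The equivariance claim of Theorem 2 can be sanity-checked numerically in the simplest discrete setting. The following toy sketch (ours, not the paper's code) takes the cyclic group Z_n acting on itself: the correlation reduces to a finite sum and the pullback action (g.f)(h) = f(g−1h) to a cyclic shift. All signals and kernels below are made up for illustration.

```python
# Toy check of Theorem 2 on the cyclic group Z_n: the correlation
# (f * w)(g) = sum_h f(h) w(g^{-1} h) commutes with the group action
# (g0.f)(h) = f(g0^{-1} h), i.e. shifting the input shifts the output.
n = 12

def act(g0, f):
    # pullback action: (g0.f)(h) = f((h - g0) mod n)
    return [f[(h - g0) % n] for h in range(n)]

def corr(f, w):
    # (f * w)(g) = sum_h f(h) * w((h - g) mod n)
    return [sum(f[h] * w[(h - g) % n] for h in range(n)) for g in range(n)]

f = [float(i * i % 7) for i in range(n)]       # arbitrary test signal
w = [1.0, 2.0, -1.0] + [0.0] * (n - 3)          # arbitrary kernel

g0 = 5
lhs = corr(act(g0, f), w)   # correlate the translated signal
rhs = act(g0, corr(f, w))   # translate the correlated signal
assert all(abs(a - b) < 1e-9 for a, b in zip(lhs, rhs))
```

The same check fails for generic non-equivariant operators (e.g. position-dependent weights), which is exactly what the converse direction (Theorem 3) rules out for linear equivariant maps.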
G → R is defined as,

(f ⋆n wn)(g) := ∫G · · · ∫G f(h1) · · · f(hn) (g.wn)(h1, · · · , hn) µG(h1) · · · µG(hn)

Fig. 2. Visualization (in the spirit of [17]) of the second-order term w2 : M2 → R of a Volterra kernel (here on a 2-manifold parametrized by (θ, φ)). The coordinates above each grid represent the first entry w2(x, ·), and within each grid the gray-scale value represents the weight of the associated kernel w2(x, y).

Theorem 4. Let S and U be L2(M, R) or L2(G, R) and F : S → U be a function given by f 7→ Σ∞_n=1 (f ⋆n wn). Then, F is equivariant.

Proof. Observe that the sum of equivariant operators is equivariant. Hence, we only need to check that f ⋆n wn is equivariant for all n. Let g, h ∈ G and let n ∈ N. Then,

(g.f ⋆n wn)(h) = (L∗g−1 f ⋆n wn)(h)
= ∫M · · · ∫M L∗g−1 f(x1) · · · L∗g−1 f(xn) (L∗h−1 wn)(x1, · · · , xn) ωM(x1) · · · ωM(xn)
= ∫M · · · ∫M f(y1) · · · f(yn) wn((h−1g).y1, · · · , (h−1g).yn) ωM(g.y1) · · · ωM(g.yn)
= ∫M · · · ∫M f(y1) · · · f(yn) wn((h−1g).y1, · · · , (h−1g).yn) ωM(y1) · · · ωM(yn)
= (f ⋆n wn)(g−1h)
= L∗g−1 (f ⋆n wn)(h)
= (g.(f ⋆n wn))(h)

Here, (L∗g f)(h) = f(gh) (see appendix for details). Since g, h ∈ G and n are arbitrary, F is equivariant.

These results partially generalize the well-known non-linear shift equivariance characterization of Volterra expansions in Euclidean space and justify the use of the Volterra series as a higher-order generalization of the correlation operation of Definition 2.

4.2 Efficient Computation of the Second-Order Volterra Kernel

The Volterra series presented in the previous definition is significantly more expressive than the correlation operation defined in Definition 2, since it captures higher order relationships between inputs, but it requires the computation of iterated integrals and does not have an efficient GPU implementation. Note that for a separable second order kernel w2, (f ⋆2 w2)(g) can be factored as ((f ⋆ w̃2)(g)) ((f ⋆ w̄2)(g)). Thus, we can compute the second order Volterra series with a separable kernel as a product of traditional correlation operations. In general, we can use a convex combination of first order and second order terms of the Volterra series to define a second order Volterra network.

A schematic diagram of the second order Volterra correlation operator is shown in Fig. 3. This representation of a second order kernel using a product of two separable kernels is analogous to a tensor product approximation of a function and can be shown to achieve approximation error
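The separable factorization (f ⋆2 w2)(g) = ((f ⋆ w̃2)(g)) ((f ⋆ w̄2)(g)) can be verified numerically in a discrete toy setting. The sketch below (ours, not the paper's implementation) works on the cyclic group Z_n with made-up signals and kernels, comparing the direct double sum against the product of two first-order correlations.

```python
# Toy check on Z_n that a *separable* second-order Volterra term
#   (f *2 w2)(g) = sum_{h1,h2} f(h1) f(h2) w2(h1 - g, h2 - g)
# with w2(a, b) = wt(a) * wb(b) factors into (f * wt)(g) * (f * wb)(g).
n = 10

def corr1(f, w):
    # first-order correlation on Z_n
    return [sum(f[h] * w[(h - g) % n] for h in range(n)) for g in range(n)]

def volterra2(f, w2):
    # direct O(n^2) double sum per output point
    return [sum(f[h1] * f[h2] * w2[(h1 - g) % n][(h2 - g) % n]
                for h1 in range(n) for h2 in range(n))
            for g in range(n)]

f  = [0.5 * i - 1.0 for i in range(n)]              # arbitrary signal
wt = [1.0, -1.0] + [0.0] * (n - 2)                  # arbitrary factors
wb = [2.0, 0.0, 1.0] + [0.0] * (n - 3)
w2 = [[wt[a] * wb[b] for b in range(n)] for a in range(n)]  # separable kernel

direct   = volterra2(f, w2)
factored = [x * y for x, y in zip(corr1(f, wt), corr1(f, wb))]
assert all(abs(a - b) < 1e-9 for a, b in zip(direct, factored))
```

This is the identity that makes the second-order term implementable as two ordinary correlation layers followed by a pointwise product.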
of an arbitrary precision [23]. The separability assumption on the kernels leads to efficient computation, which is especially valuable in the network setting where these operations are performed numerous times.

Fig. 3. Second order Volterra correlation operator with first order kernel w1 and separable second order kernel w2 (factors w̃2, w̄2), computing w × f ⋆ w1 + (1 − w) × f ⋆2 w2.

5 ARCHITECTURE

We now present the basic modules for implementing our correlation and higher-order Volterra operations as layers in a deep network.

Correlation on M - CorrM(f, w): Let f ∈ L2(M, R) … G → R. We have shown in Theorem 2 that CorrM(f, w) is equivariant to the action of G. Hence, we can use the CorrG layer as …

Since, if the input signal is transformed by a group element g ∈ G, so is the output of CorrM2, as this layer is equivariant. Thus the output of CorrM2 is transformed by the same group element g. Hence, the input of CorrG2 is transformed by g and, due to the equivariance, so is the output of CorrG2. This justifies that the cascaded layers are equivariant to the action of G. Hence, after the cascaded correlation layers, the output f̃ ∈ L2(G, R) lies on a G set. Similar to the Euclidean CNN, we want the last layer to be G-invariant. Hence, we will integrate f̃ on the domain G and return a scalar. Note that in the experiments, we learn multiple channels analogous to the Euclidean CNN, where in each channel we learn a G-equivariant f̃ ∈ L2(G, R). Thus, after the integration, we have c scalars, where c is the number of channels, which will be input to the softmax fully connected layer similar to the Euclidean CNN. We abbreviate this last layer as iL (invariant layer).

A schematic diagram of our proposed VolterraNet (ReLU layers, channels f ⋆ w11, …, f ⋆ w1k, and the invariant layer) is shown in Fig. 4.
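The invariant last layer (iL) rests on a simple fact: integrating a G-equivariant feature map over G gives a G-invariant scalar. A minimal toy check on the cyclic group Z_n (ours, purely illustrative, with a made-up feature map):

```python
# Toy illustration of the invariant layer (iL): integrating an
# equivariant feature map over the group yields a G-invariant scalar.
# On Z_n, "integration" is a plain sum and the action is a cyclic shift.
n = 8
feat = [0.3 * i * i - i for i in range(n)]  # stand-in equivariant feature map

def act(g0, f):
    # group action on features: cyclic shift
    return [f[(h - g0) % n] for h in range(n)]

def iL(f):
    # invariant layer: integrate over the group (here: sum over Z_n)
    return sum(f)

# The scalar is unchanged under every group element.
assert all(abs(iL(act(g0, feat)) - iL(feat)) < 1e-9 for g0 in range(n))
```

With c channels, the same reduction produces c invariant scalars, which is what feeds the final fully connected softmax layer described above.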
The dilated convolution function (x ⋆d w) : N → R is defined as (x ⋆d w)(s) = Σ^{k−1}_{i=0} w(i) x(s − d × i), where N is the set of natural numbers and k and d are the kernel size and the dilation factor respectively. Note that with d = 1, we get the normal convolution operator. In a dilated CNN, the receptive field size will depend on the depth of the network as well as on the choice of k and d.

6.2 Dilated VolterraNet

Now we present a dilated VolterraNet model by combining the VolterraNet with the dilated CNN model. Given a one-dimensional input sequence {fi : M → R}, we first apply CorrM2 and cascaded CorrG2 layers to each point in the sequence independently. The output of a CorrG2 layer is a function G → R. Let the output of the last CorrG2 layer be {gi : G → R}. Then, we discretize the group G to represent each gi by a vector gi (as shown in Fig. 5). The steps of discretization, i.e., the length of gi, are chosen via grid search in the experimental section. This is analogous to the standard practice in the literature [5], [7]. Polar coordinates on G are used to discretize G, and then we use the dilated CNN by treating each sample as a vector. This essentially amounts to choosing a uniform grid in the parameter space using Rodrigues vectors [26], although more sophisticated techniques can be employed in this context [27]. Now, we input {gi} to the Euclidean dilated CNN (since the components of gi are real) to construct the dilated VolterraNet framework. In Fig. 5, we present a schematic of the dilated VolterraNet with input {fi} followed by CorrM2 → ReLU → CorrG2 → ReLU → CorrG2. A self-explanatory schematic diagram of the dilated VolterraNet architecture is shown in Fig. 5.

… homogeneous space distinct from S2. This experiment serves as an example demonstrating the ability of VolterraNet to cope with manifolds other than the sphere.

Choice of basis: In our experiments, we have three examples of manifolds: S2, S2 × R+, and P3. For SO(3), we use the Wigner basis, and for S2 ≅ SO(3)/SO(2), we use the induced basis, i.e., the Spherical Harmonics basis. For S2 × R+, we use the product basis of each of the spaces, i.e., Spherical Harmonics for S2 and the canonical basis for R+. Since P3 can be written as GL(3)/O(3), we use the basis on P3 induced from the canonical basis on GL(3).

In all the experiments we compare the VolterraNet architecture to the state-of-the-art models, and additionally compare it to an architecture which we call Homogeneous CNN (HCNN), which replaces the Volterra non-linear convolutions/correlations with the correlation operations from definition 2. We have released an implementation of the VolterraNet architecture with the Spherical MNIST experiment, which can be found at https://github.com/cvgmi/volterra-net.

7.1 Synthetic Data Experiment: Classification of data on P3

In this section, we first describe the process of synthesizing functions f : P3 → [0, 1]. In this experiment, we generated data samples drawn from distinct Gaussian distributions defined on P3 [28]. Let X be a P3-valued random variable that follows N(M, σ); then the p.d.f. of X is given by [28]:

fX(X; M, σ) = (1 / C(σ)) exp(−d2(M, X) / (2σ2)),   (4)

where d(·, ·) is the affine invariant geodesic distance on P3, given by d(M, X) = √(trace((logm(M−1 X))2)).
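As a quick sanity check of the affine-invariant distance used in Eq. (4), the sketch below (ours, not the paper's code) restricts to diagonal SPD matrices, where M and X commute and logm reduces to elementwise logarithms, so d(M, X) = √(Σ_i log²(x_i / m_i)); the general case would require a full matrix logarithm.

```python
# Toy check of d(M, X) = sqrt(trace(logm(M^{-1} X)^2)) on *diagonal*
# SPD matrices (represented by their diagonals), where the formula
# reduces to sqrt(sum_i log(x_i / m_i)^2).
import math

def dist_diag(m, x):
    # m, x: diagonals of 3x3 SPD matrices (positive entries assumed)
    return math.sqrt(sum(math.log(xi / mi) ** 2 for mi, xi in zip(m, x)))

I = [1.0, 1.0, 1.0]
A = [math.e, 1.0, 1.0]
B = [2.0, 0.5, 1.0]

assert abs(dist_diag(I, A) - 1.0) < 1e-12           # logm(A) = diag(1, 0, 0)
assert abs(dist_diag(I, B) - dist_diag(B, I)) < 1e-12   # symmetry
# affine invariance under scaling by a positive constant c:
c = 9.0
assert abs(dist_diag([c * v for v in I], [c * v for v in B])
           - dist_diag(I, B)) < 1e-12
```

The invariance check is the diagonal special case of the affine invariance d(G M Gᵀ, G X Gᵀ) = d(M, X) that motivates using this metric on P3.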
Fig. 5. Schematic of the dilated VolterraNet: per-element Volterra correlation channels (f ⋆ wn1, …, f ⋆ wnk) with shared weights and ReLU layers, followed by a dilated convolution over the sequence outputs y0, …, y8.

perturbations of Mi and with variance 1. This gives us two clusters in the space of Gaussian densities on P3, which we
7.2 Spherical MNIST Data Experiment

This version of MNIST is generated using the scheme described in [5]. There are two instances of this data: one in which we project the MNIST digits on the southern hemisphere (denoted by ‘NR’) and one where we additionally apply a random rotation afterwards (denoted by ‘R’). The spherical signal is discretized using a bandwidth of 60.

We selected the same baseline model as was chosen in [5], which is a Euclidean CNN with 5 × 5 filters and 32, 64, 10 channels with a stride of 3 in each layer. This CNN is trained by mapping the digits from the northern hemisphere onto the plane. The Spherical CNN model [5] we used has the following architecture (as was reported in [5]): CorrS2 → ReLU → CorrSO(3) → ReLU → FC with bandwidths 20, 12 and the number of channels 20, 40 respectively. We used the same architecture for the Clebsch-Gordan net as was reported in [8].

For our method, we used a second order Volterra network with the following architecture: CorrS2_2 → ReLU → iL → FC with bandwidths 30, 20 respectively and number of features 25, 10 respectively. We chose a batch size of 32 and a learning rate of 5 × 10−3 with ADAM optimization [29].

We performed three sets of experiments: non-rotated training and test sets (denoted by ‘NR/NR’), non-rotated training and randomly rotated test sets (denoted by ‘NR/R’), and randomly rotated training and test sets (denoted by ‘R/R’). The comparative results in terms of classification accuracy are shown in Table 2.

Method                   NR/NR   NR/R    R/R     # params.
Baseline CNN             97.67   22.18   12.00   68000
Spherical CNN [5]        95.59   94.62   93.40   58550
Clebsch-Gordan net [8]   96.00   95.86   95.80   342086
VolterraNet              96.72   96.10   96.71   46010

TABLE 2
Comparison of classification accuracy on Spherical MNIST data

We can see that the VolterraNet performed better than all three competing networks for both the ‘R/R’ and ‘NR/R’ cases. Note that in terms of the number of parameters, VolterraNet used 46010, while the Spherical CNN used 58550 and the Clebsch-Gordan net used 342086. The baseline CNN used 68000 parameters. Thus, in comparison, we have approximately an 86% reduction in parameters over the Clebsch-Gordan net with almost equal or better classification accuracy. In comparison to the Spherical CNN, we have approximately a 21% reduction in parameters while achieving significantly better performance. This clearly depicts the usefulness of our proposed VolterraNet in comparison to existing networks used in processing this type of data in a non-Euclidean domain.

7.3 3D Shape Recognition Experiment

We now report results for shape classification using the Shrec17 dataset [30], which consists of 51300 3D models spread over 55 classes. This dataset is divided into a 70/10/20 split for train/validation/test. Following the method in [5], we perturbed the dataset using random rotations. We processed the dataset as in [5]. Basically, we represented each 3D model by a spherical signal using a ray casting scheme. For each point on the sphere, a ray towards the origin is sent, which collects the ray length and the cosine and sine of the surface angle. Additionally, the convex hull of the 3D shape gives 3 more channels, which results in 6 input channels. The spherical signal is discretized using the Driscoll-Healy grid [13] with a bandwidth of 128.

The Spherical CNN model [5] we used has the following architecture (as was reported in [5]): CorrS2 → BN → ReLU → CorrSO(3) → BN → ReLU → CorrSO(3) → BN → ReLU → FC with bandwidths 32, 22 and 7 and the number of channels 50, 70 and 350 respectively. We used the same architecture for the Clebsch-Gordan net as was reported in [8].

In our method, we used a second order Volterra network with the following architecture: CorrS2 → BN → ReLU → CorrSO(3)_2 → BN → ReLU → iL → FC with bandwidths 10, 8, 8 respectively and number of features 60, 80, 100 respectively. We chose a batch size of 100 and a learning rate of 5 × 10−3 with ADAM optimization [29]. Table 3 summarizes the comparison of VolterraNet with other existing deep network architectures that have reported results on this data in the literature. From this table, it is evident that VolterraNet consistently yields classification accuracy results within the top three methods, while having the best parameter efficiency.

Comparison with Esteves et al. [7]: We also compared our VolterraNet with the recent work of Esteves et al. [7] using an extra in-batch triplet loss [34] (as used in Esteves et al. [7]). We show the comparison results in Table 3 (last two rows), which clearly show that (a) the VolterraNet outperforms the network in [7] (which is the state-of-the-art algorithm in terms of parameter efficiency), and (b) the triplet loss boosts the performance of VolterraNet relative to the baseline cross-entropy loss.

7.4 Regression Experiment: Prediction of Atomic Energy

Here, we report the application of our VolterraNet to the QM7 dataset [35], [36], where the goal is to regress over atomization energies of molecules given atomic positions (pi) and charges (zi). Each molecule consists of at most 23 atoms, and the atoms are of 5 types (C, N, O, S, H). We use the Coulomb Matrix (CM) representation proposed by [36], which is rotation and translation invariant but not permutation invariant. We used a similar experimental setup to that described in [5] for this regression problem. We define a sphere Si around pi for each ith atom and define the potential functions

Uz(x) = Σ_{i≠j, zj=z} (zi z) / ‖x − pi‖,   (5)

for every z and for every x on the sphere Si. This yields a spherical signal consisting of 5 features, which were discretized using the Driscoll-Healy grid [13] with a bandwidth of 20. For the VolterraNet, we used one S2 and SO(3) second order Volterra block with bandwidths 12, 8, 8, 4 and number of features 8, 10, 20, 50 respectively. We compute the loss and report it in Table 4. We can see that VolterraNet performs better than the competing methods. For the Spherical CNN [5] and Clebsch-Gordan net [8], we used
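The Coulomb-style potential of Eq. (5) is straightforward to evaluate pointwise. The sketch below (ours) uses a hypothetical 2-atom configuration rather than QM7 data, and reads the summation as running over atoms of charge z other than the atom at the sphere's center, which is our interpretation of the condition under the sum.

```python
# Toy evaluation of the potential in Eq. (5) at a point x on the sphere
# around atom 0. Positions p and charges z below are illustrative only.
import math

p = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]   # hypothetical atom positions
z = [1.0, 6.0]                            # hypothetical charges (H, C)

def U(zq, x, center):
    # sum over atoms of charge zq, excluding the center atom (i != j)
    return sum(z[i] * zq / math.dist(x, p[i])
               for i in range(len(p)) if i != center and z[i] == zq)

x = (0.0, 1.0, 0.0)          # a point on the unit sphere around atom 0
val = U(6.0, x, center=0)    # carbon-channel potential at x
assert abs(val - 36.0 / math.sqrt(2.0)) < 1e-9
```

Evaluating U for each of the 5 atom types on a grid over the sphere Si is what produces the 5-channel spherical signal described above.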
Method                    P@N         R@N         F1@N        mAP         NDCG        # params.
Tasuma_ReVGG [31]         0.70        0.76        0.72        0.69        0.78        3M
Furuya_DLAN [32]          0.81        0.68        0.71        0.65        0.75        8.4M
SHREC16-bai_GIFT [33]     0.68        0.66        0.66        0.60        0.73        36M
Deng_CM-VGG5-6DB          0.41        0.70        0.47        0.52        0.62        -
Spherical CNN [5]         0.70        0.71        0.69        0.67        0.76        1.4M
Clebsch-Gordan net [8]    0.70        0.72        0.70        0.68        0.76        -
Ours (VolterraNet)        0.71 (2nd)  0.70 (3rd)  0.70 (3rd)  0.67 (3rd)  0.75 (3rd)  396297
[7] (w/ triplet loss)     0.72        0.74        0.69        -           -           0.5M
Ours (w/ triplet loss)    0.73        0.74        0.70        0.68        0.76        396297

TABLE 3
Comparison results on the Shrec17 data
Method MSE
f : S2 × R+ → R. Since S2 × R+ is a Riemannian homoge-
MLP/ Random CM [37] 5.96
LGIKA (RF) [38] 10.82
neous space (endowed with the product metric), we will use a
RBF Kernels/ Random CM [37] 11.42 cascade of the Volterra correlation layers defined earlier (with
RBF Kernels/ Sorted CM [37] 12. 59 standard non-linearity between layers) to extract features
MLP/ Sorted CM [37] 16.06 which are equivariant to the action of SO(3) × (R \ {0}).
Spherical CNN [5] 8.47
Clebsch-Gordan net [8] 7.97 These features are extracted independently within each voxel.
Ours (VolterraNet) 5.92 (1st) Observe that this equivariance property is natural in the
TABLE 4
context of dMRI data. Since in each voxel of the dMRI data,
For the comparison on atomic energy prediction, all networks use similar architectures as described in the respective papers. Spherical CNN [5] and Clebsch-Gordan net [8] use 1.4M and 1.1M parameters respectively, while the VolterraNet uses 128,460, nearly an order of magnitude reduction in parameters, while achieving the best classification accuracy. This illustrates the parameter efficiency gained from using a higher order correlation, a richer feature, in the VolterraNet.

7.5 Network architecture for dMRI data using Dilated VolterraNet

Diffusion MRI (dMRI) is an imaging modality that non-invasively measures the diffusion of water molecules in the tissue being imaged. It serves as an interesting example for our framework since dMRI data can naturally be described by functions on a Riemannian homogeneous space. In this section we describe the dMRI data and its processing using the framework presented in this paper, which will help the reader understand the results of the following subsections.

In each voxel of a dMRI data set, the signal magnitude is represented by a real number along each gradient magnetic field direction over a hemisphere of directions in 3D. Hence, in each voxel, we have a function $f : S^2 \times \mathbb{R}^+ \to \mathbb{R}$. The proposed network architecture has two components: intra-voxel layers and inter-voxel layers. The intra-voxel layers extract features from each voxel, while the inter-voxel layers use dilated convolutions to capture the interactions between the extracted features. In our application in the next section, we extract a sequence of voxels lying along a nerve fiber bundle in the brain that is known to be affected in Parkinson disease. Hence we have a sequence of functions $\{f_i : S^2 \times \mathbb{R}^+ \to \mathbb{R}\}$ along the fiber bundle, making the application of the dilated VolterraNet in 4.1 appropriate.

7.5.1 Extracting intra-voxel features
We extract intra-voxel features independently from each voxel. As mentioned before, in each voxel we have a function $f : S^2 \times \mathbb{R}^+ \to \mathbb{R}$. Since the signal is acquired in different directions (in 3D), we want the features to be equivariant to the 3D rotations and scaling.

7.5.2 Extracting inter-voxel features
After the extraction of the intra-voxel features (which are equivariant to the action of G), we seek to derive features based on the interactions between the voxels. Here we use the standard dilated convolution layers (as described in 6.1) to capture the interactions between the features extracted from the voxels.

Now, we are ready to give the details of the data used for the experiment with our proposed Dilated-VolterraNet. For this experiment, we used a second order Dilated-VolterraNet with 3 dilated layers of kernel size (5×5) and dilation factors of 1, 2 and 4 respectively.

7.6 Dilated VolterraNet Experiment: Group testing on movement disorder patients

This dMRI data set was collected from 50 PD patients and 44 controls at the University of Florida and is accessible on request from the NIH-NINDS Parkinson's Disease Biomarker Program portal https://pdbp.ninds.nih.gov/. All images were collected using a 3.0 T MR scanner (Philips Achieva) and a 32-channel quadrature volume head coil. The parameters of the diffusion imaging acquisition sequence were as follows: gradient directions = 64, b-values = 0/1000 s/mm², repetition time = 7748 ms, echo time = 86 ms, flip angle = 90°, field of view = 224 × 224 mm, matrix size = 112 × 112, number of contiguous axial slices = 60, slice thickness = 2 mm. Eddy current correction was applied to each data set using standard motion correction techniques.

We first extracted the sensory motor area tracts, called M1 fiber tracts (as shown in Fig. 6), using the FSL software [40] from both the left ('LM1') and right ('RM1') hemispheres. We applied the Dilated-VolterraNet to the raw signal measurements along the fiber tracts.
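The inter-voxel stage stacks dilated convolutions with dilation factors 1, 2 and 4, so the receptive field along the fiber tract grows geometrically with depth. The sketch below illustrates that mechanism in plain NumPy; the function name and the toy kernel size of 3 are our own (the network above uses kernel size 5), so this is an illustration of stacked dilation, not the paper's implementation.

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """Valid-mode 1-D convolution of sequence x with kernel w,
    where kernel taps are spaced `dilation` samples apart."""
    k = len(w)
    span = (k - 1) * dilation + 1          # receptive field of this one layer
    out_len = len(x) - span + 1
    return np.array([
        sum(w[j] * x[i + j * dilation] for j in range(k))
        for i in range(out_len)
    ])

# Three stacked layers with dilation factors 1, 2, 4, as in the
# Dilated-VolterraNet's inter-voxel stage (toy kernel size 3 here).
x = np.arange(32, dtype=float)             # toy per-voxel feature sequence
w = np.array([1.0, 1.0, 1.0])
h = dilated_conv1d(x, w, 1)                # length 30
h = dilated_conv1d(h, w, 2)                # length 26
h = dilated_conv1d(h, w, 4)                # length 18
# Each output now depends on 1 + 2*(1 + 2 + 4) = 15 input samples,
# versus only 7 for three undilated layers of the same size.
```

With the all-ones kernel and ramp input, each output is a sum over its 15-sample receptive field, which is easy to check by hand; the same layer stack applied to learned intra-voxel features gives the inter-voxel interactions described above.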
With a slight abuse of terminology, we will use the term homogeneous space to denote a Riemannian homogeneous space, i.e., a homogeneous space endowed with a Riemannian metric.

Properties of Homogeneous spaces: Let $(M, g^M)$ be a homogeneous space. Let $\omega^M$ be the corresponding volume density and $f : M \to \mathbb{R}$ be any integrable function. Let $g \in G$, s.t. $y = g.x$, $x, y \in M$. Then, the following facts are true:
1) $g^M_x(dx, dx) = g^M_y(dy, dy)$.
2) $d(x, z) = d(y, g.z)$, for all $z \in M$.
3) $\int_M f(y)\,\omega^M(x) = \int_M f(x)\,\omega^M(x)$.

In property 1 above, $dx$ and $dy$ denote the infinitesimal displacements away from $x$ and $y$ respectively. Because of property 1, the volume density is also preserved by the transformation $x \mapsto g.x$, for all $x \in M$ and $g \in G$.

Now, we will present some definitions and propositions needed to define the higher order correlation operator on a homogeneous space.

Definition (Spherical Harmonics). A spherical harmonic function is indexed by two integers $l, m$ with $-l \le m \le l$ and is given by
$$Y^l_m(\theta, \phi) = \sqrt{\frac{(2l+1)(l-m)!}{4\pi(l+m)!}}\, P^l_m(\cos\theta)\, e^{jm\phi}$$
where $P^l_m$ are the associated Legendre polynomials and $(\theta, \phi)$ is the parametrization of $S^2$. A function $f \in L^2(S^2, \mathbb{R})$ is band-limited with bandwidth $B$ if $\tilde{f}^l_m = 0$ for all $l \ge B$, where $\tilde{f}$ is the representation of $f$ in the spherical harmonic basis.

Definition (Wigner D-functions). A Wigner D-function is indexed by three integers $j, l, m$ and is denoted by $D^j_{l,m}(\alpha, \beta, \gamma)$. The degree $j$ ranges over the non-negative integers. For each $j$, the order indices $l, m$ satisfy the constraint $-j \le l, m \le j$. The Wigner D-function is of the form

APPENDIX B: PROOFS

In all the propositions below, suppose $S$ and $U$ are $L^2(M, \mathbb{R})$ or $L^2(G, \mathbb{R})$. Explicitly, $F : L^2(M, \mathbb{R}) \to L^2(G, \mathbb{R})$ or $F : L^2(G, \mathbb{R}) \to L^2(G, \mathbb{R})$.

Theorem 2. Let $F : S \to U$ be a function given by $f \mapsto (f \star w)$. Then $F$ is equivariant with respect to the pullback action of $I(M)$, i.e. $(\phi \cdot f) \star w = \phi \cdot (f \star w)$ for $\phi \in I(M)$ a symmetry of $M$, where $\phi \cdot h := h \circ \phi^{-1}$, for any square-integrable $h : M \to \mathbb{R}$.

Proof.
$$\begin{aligned}
((\phi \cdot f) \star w)(g) &= \int_M (f \circ \phi^{-1})(x)\,(w \circ g^{-1})(x)\,\omega^M(x) \\
&= \int_M f(x)\,(w \circ g^{-1} \circ \phi)(x)\,\omega^M(x) \\
&= (f \star w)(\phi^{-1} \circ g) \\
&= (\phi \cdot (f \star w))(g)
\end{aligned}$$

Theorem 3. Let $F : S \to U$ be a linear equivariant function. Then, $\exists w \in S$ such that $(F(f))(g) = (f \star w)(g)$, for all $f \in S$ and $g \in G$.

Proof. Let $f \in S$ and $g \in G$. Let $x \in M$; then $f(x) = \int_M f(y)\,u_y(x)\,\omega^M(y)$, where $u_y : M \to \mathbb{R}$ is zero everywhere except at the $x \in M$ for which $d(x, y) = 0$; $u_y$ is analogous to the delta function. Using the linearity of $F$, we get $(F(f))(g) = F\left(\int_M f(y)\,u_y\,\omega^M(y)\right)(g) = \int_M f(y)\,F(u_y)(g)\,\omega^M(y)$. Let us define
$$w\left(g^{-1}.y\right) := \tilde{w}_y(g) := F(u_y)(g), \qquad (6)$$
for some $w \in S$ and $\tilde{w}_y : G \to \mathbb{R}$ as defined above. Clearly, as $F$ is linear, $F(u_y + v_y) = \tilde{w}_y + \tilde{x}_y$, where $\tilde{w}_y = F(u_y)$ and $\tilde{x}_y = F(v_y)$. Below, we show that the left group action on $\tilde{w}_y$ is well-defined.
Claim: $L^*_{h^{-1}} \tilde{w}_y(g) = \tilde{w}_y\left(h^{-1} g\right)$.
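The equivariance asserted in Theorem 2 can be sanity-checked numerically in the simplest discrete setting, $M = G = \mathbb{Z}_n$ acting on itself by addition, where the correlation $f \star w$ reduces to circular cross-correlation and the isometries $\phi$ are cyclic shifts. The sketch below is our own discretization for illustration, not part of the paper's experiments.

```python
import numpy as np

def group_conv(f, w):
    """(f * w)(g) = sum_x f(x) w(g^{-1}x) on the cyclic group Z_n,
    where g^{-1}x is subtraction mod n."""
    n = len(f)
    return np.array([
        sum(f[x] * w[(x - g) % n] for x in range(n))
        for g in range(n)
    ])

def act(phi, h):
    """Pullback action (phi . h)(x) = h(phi^{-1} x) for the shift
    phi : x -> x + phi; np.roll(h, phi)[x] == h[(x - phi) % n]."""
    return np.roll(h, phi)

rng = np.random.default_rng(0)
f, w = rng.standard_normal(8), rng.standard_normal(8)
phi = 3
lhs = group_conv(act(phi, f), w)   # (phi . f) * w
rhs = act(phi, group_conv(f, w))   # phi . (f * w)
assert np.allclose(lhs, rhs)       # the identity proved in Theorem 2
```

Substituting $x' = x - \phi$ in the sum reproduces, term by term, the change of variables used in the proof of Theorem 2, with the mod-$n$ shift playing the role of the isometry.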
[8] R. Kondor, Z. Lin, and S. Trivedi, "Clebsch-Gordan nets: a fully Fourier space spherical convolutional neural network," arXiv preprint arXiv:1806.09231, 2018.
[9] C. Esteves, C. Allen-Blanchette, X. Zhou, and K. Daniilidis, "Polar transformer networks," arXiv preprint arXiv:1709.01889, 2017.
[10] M. Jaderberg, K. Simonyan, A. Zisserman et al., "Spatial transformer networks," in Advances in Neural Information Processing Systems (NIPS), 2015, pp. 2017–2025.
[11] R. Kondor and S. Trivedi, "On the generalization of equivariance and convolution in neural networks to the action of compact groups," arXiv preprint arXiv:1802.03690, 2018.
[12] T. Cohen, M. Geiger, and M. Weiler, "A general theory of equivariant CNNs on homogeneous spaces," arXiv preprint arXiv:1811.02017, 2018.
[13] J. R. Driscoll and D. M. Healy, "Computing Fourier transforms and convolutions on the 2-sphere," Advances in Applied Mathematics, vol. 15, no. 2, pp. 202–250, 1994.
[14] R. Kumar, A. Banerjee, and B. C. Vemuri, "Volterrafaces: Discriminant analysis using Volterra kernels," 2009.
[15] R. Kumar, A. Banerjee, B. C. Vemuri, and H. Pfister, "Trainable convolution filters and their application to face recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 7, pp. 1423–1436, 2012.
[16] N. Hakim, J. Kaufman, G. Cerf, and H. Meadows, "Volterra characterization of neural networks," in Conference Record of the Twenty-Fifth Asilomar Conference on Signals, Systems and Computers. IEEE, 1991, pp. 1128–1132.
[17] G. Zoumpourlis, A. Doumanoglou, N. Vretos, and P. Daras, "Non-linear convolution filters for CNN-based learning," in 2017 IEEE International Conference on Computer Vision (ICCV). IEEE, 2017, pp. 4771–4779.
[18] S. Helgason, Differential Geometry and Symmetric Spaces. Academic Press, 1962, vol. 12.
[19] M. Banerjee, R. Chakraborty, D. Archer, D. Vaillancourt, and B. C. Vemuri, "DMR-CNN: A CNN tailored for DMR scans with applications to PD classification," in 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019). IEEE, 2019, pp. 388–391.
[20] D. S. Dummit and R. M. Foote, Abstract Algebra. Wiley, Hoboken, 2004, vol. 3.
[21] E. Hewitt and K. A. Ross, Abstract Harmonic Analysis: Volume I, Structure of Topological Groups, Integration Theory, Group Representations. Springer Science & Business Media, 2012, vol. 115.
[22] P. W. Michor, Topics in Differential Geometry. American Mathematical Soc., 2008, vol. 93.
[23] W. Hackbusch and B. N. Khoromskij, "Tensor-product approximation to operators and functions in high dimensions," Journal of Complexity, vol. 23, no. 4-6, pp. 697–714, 2007.
[24] J. L. Elman, "Finding structure in time," Cognitive Science, vol. 14, no. 2, pp. 179–211, 1990.
[25] S. Bai, J. Z. Kolter, and V. Koltun, "Convolutional sequence modeling revisited," 2018.
[26] W. R. Hamilton, Elements of Quaternions. Longmans, Green, & Company, 1866.
[27] G. Kurz, F. Pfaff, and U. D. Hanebeck, "Discretization of SO(3) using recursive tesseract subdivision," in 2017 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI). IEEE, 2017, pp. 49–55.
[28] G. Cheng and B. C. Vemuri, "A novel dynamic system in the space of SPD matrices with applications to appearance tracking," SIAM Journal on Imaging Sciences, vol. 6, no. 1, pp. 592–615, 2013.
[29] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," in International Conference on Learning Representations (ICLR), 2015.
[30] L. Yi, L. Shao, M. Savva, H. Huang, Y. Zhou, Q. Wang, B. Graham, M. Engelcke, R. Klokov, V. Lempitsky et al., "Large-scale 3D shape reconstruction and segmentation from ShapeNet Core55," arXiv preprint arXiv:1710.06104, 2017.
[31] A. Tatsuma and M. Aono, "Multi-Fourier spectra descriptor and augmentation with spectral clustering for 3D shape retrieval," The Visual Computer, vol. 25, no. 8, pp. 785–804, 2009.
[32] T. Furuya and R. Ohbuchi, "Deep aggregation of local 3D geometric features for 3D model retrieval," in BMVC, 2016.
[33] C. R. Qi, H. Su, M. Nießner, A. Dai, M. Yan, and L. J. Guibas, "Volumetric and multi-view CNNs for object classification on 3D data," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 5648–5656.
[34] F. Schroff, D. Kalenichenko, and J. Philbin, "FaceNet: A unified embedding for face recognition and clustering," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 815–823.
[35] L. C. Blum and J.-L. Reymond, "970 million druglike small molecules for virtual screening in the chemical universe database GDB-13," Journal of the American Chemical Society, vol. 131, no. 25, pp. 8732–8733, 2009.
[36] M. Rupp, A. Tkatchenko, K.-R. Müller, and O. A. von Lilienfeld, "Fast and accurate modeling of molecular atomization energies with machine learning," Physical Review Letters, vol. 108, no. 5, p. 058301, 2012.
[37] G. Montavon, K. Hansen, S. Fazli, M. Rupp, F. Biegler, A. Ziehe, A. Tkatchenko, A. V. Lilienfeld, and K.-R. Müller, "Learning invariant representations of molecules for atomization energy prediction," in Advances in Neural Information Processing Systems, 2012, pp. 440–448.
[38] A. Raj, A. Kumar, Y. Mroueh, P. T. Fletcher, and B. Schölkopf, "Local group invariant representations via orbit embeddings," arXiv preprint arXiv:1612.01988, 2016.
[39] R. Chakraborty, C.-H. Yang, X. Zhen, M. Banerjee, D. Archer, D. Vaillancourt, V. Singh, and B. C. Vemuri, "Statistical recurrent models on manifold valued data," arXiv e-prints, 2018.
[40] D. B. Archer, D. E. Vaillancourt, and S. A. Coombes, "A template and probabilistic atlas of the human sensorimotor tracts using diffusion MRI," Cerebral Cortex, vol. 28, no. 5, pp. 1685–1699, 2017.
[41] U. Triacca, "Measuring the distance between sets of ARMA models," Econometrics, vol. 4, no. 3, p. 32, 2016.

Monami Banerjee received her Ph.D. in computer science from the Univ. of Florida in 2018. She is currently a research staff member at Facebook Research, Menlo Park. Her research interests lie at the intersection of Geometry, Computer Vision and Medical Image Analysis.

Rudrasis Chakraborty received his Ph.D. in computer science from the Univ. of Florida in 2018. He is currently a postdoctoral researcher at UC Berkeley. His research interests lie at the intersection of Geometry, ML and Computer Vision.

Jose Bouza is a fourth-year Mathematics and Computer Science undergraduate at the University of Florida. His primary interests encompass computer vision and applied topology.

Baba C. Vemuri received his PhD in Electrical and Computer Engineering from the University of Texas at Austin. Currently, he holds the Wilson and Marie Collins Professorship in Engineering at the University of Florida and is a full professor in the Department of Computer and Information Sciences and Engineering. His research interests include Statistical Analysis of Manifold-valued Data, Medical Image Computing, Computer Vision and Machine Learning. He is a recipient of the IEEE Technical Achievement Award (2017) and is a Fellow of the IEEE (2001) and the ACM (2009).