
Proofs subject to correction. Not to be reproduced without permission.

Contributions to the discussion must not exceed 400 words. Contributions longer than 400 words will be cut by the editor.
Send contributions to journal@rss.org.uk. See http://www.rss.org.uk/preprints
J. R. Statist. Soc. B (2017)
79, Part 5, pp. 1-44

Sparse graphs using exchangeable random measures

François Caron
University of Oxford, UK

and Emily B. Fox
University of Washington, Seattle, USA

[Read before The Royal Statistical Society at a meeting organized by the Research Section on Wednesday, May 10th, 2017, Professor C. Leng in the Chair]

Summary. Statistical network modelling has focused on representing the graph as a discrete structure, namely the adjacency matrix. When assuming exchangeability of this array (which can aid in modelling, computations and theoretical analysis) the Aldous-Hoover theorem informs us that the graph is necessarily either dense or empty. We instead consider representing the graph as an exchangeable random measure and appeal to the Kallenberg representation theorem for this object. We explore using completely random measures (CRMs) to define the exchangeable random measure, and we show how our CRM construction enables us to achieve sparse graphs while maintaining the attractive properties of exchangeability. We relate the sparsity of the graph to the Lévy measure defining the CRM. For a specific choice of CRM, our graphs can be tuned from dense to sparse on the basis of a single parameter. We present a scalable Hamiltonian Monte Carlo algorithm for posterior inference, which we use to analyse network properties in a range of real data sets, including networks with hundreds of thousands of nodes and millions of edges.

Keywords: Exchangeability; Generalized gamma process; Lévy measure; Point process; Random graphs

1. Introduction
The rapid increase in the availability and importance of network data has been a driving force behind the significant recent attention on random-graph models. In devising such models, there are several competing forces:

(a) flexibility to capture network features like sparsity of connections between nodes, heavy-tailed node degree distributions, dense spots or clusters, block structure;
(b) interpretability of the network model and associated parameters;
(c) theoretical tractability of analysis of network properties;
(d) computational tractability of inference with the ability to scale analyses to large collections of nodes.

A plethora of network models have been proposed in recent decades, each with different trade-offs made between flexibility, interpretability and theoretical and computational tractability; we refer the interested reader to overviews of such models provided by Newman (2003, 2010), Bollobás (2001), Durrett (2007), Goldenberg et al. (2010) and Fienberg (2012). In this paper, our focus is on providing a new framework in which to make these trade-offs. We demonstrate the ability to make gains in multiple directions using this framework through a specific example where the goal is to capture

(i) sparsity, tunable from sparse to dense via interpretable parameters,
(ii) heavy-tailed degree distributions, again controlled via interpretable parameters, and
(iii) computational tractability of Bayesian inference, scaling to networks with hundreds of thousands of nodes and millions of edges.

Address for correspondence: François Caron, Department of Statistics, University of Oxford, 24-29 St Giles, Oxford, OX1 3LB, UK.
E-mail: caron@stats.ox.ac.uk

© 2017 Royal Statistical Society 1369-7412/1/79000

Classically, the graph being modelled has been represented by a discrete structure, or adjacency matrix, $Z$ where $Z_{ij}$ is a binary variable with $Z_{ij} = 1$ indicating an edge from node $i$ to node $j$. In the case of undirected graphs, we furthermore restrict $Z_{ij} = Z_{ji}$. Then, the statistical network model is devised with this structure representing the observable quantity.

From a modelling, computational and theoretical standpoint, making an assumption of exchangeability is attractive. Under the adjacency matrix graph representation, such a statement informally equates with an invariance in distribution to permutations of node orderings. One reason why this assumption is attractive can be seen from applying the celebrated Aldous-Hoover theorem (Aldous, 1981; Hoover, 1979) to the adjacency matrix: infinite exchangeability of the binary matrix implies a mixture model representation involving transformations of uniform random variables (see theorem 7 in Appendix A.1). For undirected graphs, this transformation is specified by the graphon (Borgs et al., 2008; Lovász, 2013), which was originally studied as the limit object of dense graph sequences (Lovász and Szegedy, 2006; Borgs et al., 2010). The connection with the Aldous-Hoover theorem was made by Diaconis and Janson (2008). The graphon provides an object by which to study the theoretical properties of the statistical network process and to devise new estimators, as has been studied extensively in recent years (Bickel and Chen, 2009; Bickel et al., 2011; Rohe et al., 2011; Zhao et al., 2012; Airoldi et al., 2013; Choi and Wolfe, 2014). Furthermore, the mixture model is a cornerstone of Bayesian modelling and provides a framework in which computational strategies are straightforwardly devised. Indeed, the Aldous-Hoover constructive definition has motivated new models (Lloyd et al., 2012) and many popular existing models can be recast in this framework, including the stochastic block model (Nowicki and Snijders, 2001; Airoldi et al., 2008) and latent space model (Hoff et al., 2002).

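As a rough illustration of the constructive representation for undirected graphs described above, the sketch below draws a graph from a graphon: each node receives an independent uniform variable and each pair of nodes is linked independently with a probability given by the graphon evaluated at the two uniforms. The function names `sample_graphon_graph` and `product_graphon`, and the particular graphon W(u, v) = uv, are assumptions made purely for illustration; they are not taken from the paper.

```python
import random


def product_graphon(u, v):
    """Hypothetical graphon W(u, v) = u * v, chosen only for illustration."""
    return u * v


def sample_graphon_graph(n, graphon, rng):
    """Draw an undirected simple graph on n nodes from a graphon.

    Each node i receives U_i ~ Uniform(0, 1); nodes i < j are linked
    independently with probability graphon(U_i, U_j).
    """
    u = [rng.random() for _ in range(n)]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < graphon(u[i], u[j]):
                edges.add((i, j))
    return u, edges


rng = random.Random(0)
u, edges = sample_graphon_graph(200, product_graphon, rng)
# The expected edge density is the integral of W, here 1/4, whatever n is,
# which is why such graphs have on the order of n^2 edges.
density = len(edges) / (200 * 199 / 2)
```

Because the expected density does not shrink with the number of nodes, a non-trivial graphon of this kind can only produce dense graphs, which is the limitation discussed next.
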
One consequence of the Aldous-Hoover theorem is that graphs that are represented by an exchangeable random array are either empty or dense, i.e. the number of edges grows quadratically with the number of nodes $n$ (see Lovász (2013) and Orbanz and Roy (2015)). However, empirical analyses suggest that many real world networks are sparse (Newman, 2010). Formally, sparsity is an asymptotic property of a graph. Following Bollobás and Riordan (2009), we refer to graphs with $\Theta(n^2)$ edges as dense and graphs with $o(n^2)$ edges as sparse (for notation, see Appendix C). The conclusion appears to be that we cannot have both exchangeability, with the associated benefits described above, and sparse graphs. Although network models can often adapt parameters to fit finite graphs, it is appealing to have a modelling framework with theoretically provable properties that are consistent with observed network attributes.

There are a couple of approaches to handling this apparent issue. One is to give up on exchangeability to obtain sparse graphs, such as in the popular preferential attachment model (Barabási and Albert, 1999; Berger et al., 2014) or configuration model (Bollobás, 1980; Newman, 2010). Indeed, the networks literature is dominated by sparse non-exchangeable models. Alternatively, there is a body of literature that examines rescaling graph properties with network size $n$, leading to sparse graph sequences where each graph is finitely exchangeable (Bollobás et al., 2007; Bollobás and Riordan, 2009; Olhede and Wolfe, 2012; Wolfe and Olhede, 2013; Borgs et al., 2014a,b). Convergence of sparse graph sequences, analogous to the study of limiting objects for dense graph sequences, has also been studied (e.g. Borgs et al. (2016)). However, any method building on a rescaling approach provides a graph distribution $\pi_n$ that lacks projectivity: marginalizing node $n$ does not yield $\pi_{n-1}$, the distribution on graphs of size $n - 1$.

Fig. 1. Point process representation of a random graph: each node $i$ is embedded in $\mathbb{R}_+$ at some location $\theta_i$ and is associated with a sociability parameter $w_i$; an edge between nodes $\theta_i$ and $\theta_j$ is represented by a point at locations $(\theta_i, \theta_j)$ and $(\theta_j, \theta_i)$ in $\mathbb{R}^2_+$

We instead propose to set aside the discrete structure of the adjacency matrix and examine a different notion of exchangeability for a continuous space representation of networks. In particular, we consider a point process on $\mathbb{R}^2_+$:

$$Z = \sum_{i,j} z_{ij}\, \delta_{(\theta_i, \theta_j)}, \tag{1}$$

where $z_{ij}$ is 1 if there is a link between node $i$ and node $j$, and is 0 otherwise, and $\theta_i$ and $\theta_j$ are in $\mathbb{R}_+ = [0, +\infty)$ (Fig. 1). We can think of $\theta_i$ as a time index for node $i$. Exchangeability, as defined in Section 2, then equates with invariance to the time of arrival of the nodes. See Section 3.5 for a further interpretation of $\theta_i$.

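To make the representation in equation (1) concrete, the following sketch stores an undirected graph as the set of atoms of the point process, one pair of symmetric points per edge. The helper names `graph_to_point_process` and `restrict`, the cut-off `alpha` and the example locations are hypothetical and serve only as an illustration.

```python
def graph_to_point_process(theta, edges):
    """Return the atoms of Z = sum_ij z_ij * delta_(theta_i, theta_j).

    theta : list of node locations in R_+ (theta[i] is the location of node i)
    edges : iterable of undirected edges (i, j) given as index pairs
    """
    points = set()
    for i, j in edges:
        # An undirected edge contributes two symmetric atoms
        # (a single atom on the diagonal for a self-edge).
        points.add((theta[i], theta[j]))
        points.add((theta[j], theta[i]))
    return points


def restrict(points, alpha):
    """Keep only the atoms falling in the square [0, alpha]^2."""
    return {(s, t) for (s, t) in points if s <= alpha and t <= alpha}


theta = [0.3, 1.2, 2.5, 0.9]          # hypothetical node locations
edges = [(0, 1), (1, 2), (3, 3)]      # hypothetical edges, with one self-edge
points = graph_to_point_process(theta, edges)
observed = restrict(points, 2.0)
```

Restricting the atoms to a square corresponds to looking at the network up to a finite time, with node 2 (located at 2.5) not yet present.
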
We note that exchangeability of the point process representation does not imply exchangeability of the associated adjacency matrix; however, the same modelling, computational and theoretical advantages remain. Interestingly, we arrive at a direct analogue to the constructive representation of the Aldous-Hoover theorem for exchangeable arrays and the associated graphon. Appealing to Kallenberg (1990) and Kallenberg (2005), chapter 9, a point process on $\mathbb{R}^2_+$ is exchangeable if and only if it can be represented as a transformation of unit rate Poisson processes and uniform random variables (see theorem 1 in Section 2).

As a case-study in how this exchangeable random-measure framework can enable statistical network models with properties that are different from what can be achieved in the exchangeable array framework, we consider the following specification. To induce node heterogeneity in the link probabilities, we endow each node with a scalar sociability parameter $w_i > 0$. We then consider a straightforward link probability model. For any $i \neq j$,

$$\Pr(z_{ij} = 1 \mid w_i, w_j, \theta_i, \theta_j) = 1 - \exp(-2 w_i w_j). \tag{2}$$

This link function has been previously used by several others to build network models (Aldous, 1997; Norros and Reittu, 2006; Bollobás et al., 2007; van der Hofstad, 2014). Note that, under this specification, the time indices $\theta_i$ and $\theta_j$ of nodes $i$ and $j$ do not influence the probability of these two nodes to form a link. This is in contrast with, for example, standard latent space models (Hoff et al., 2002). See Section 4 for further discussion.

To define the set of $(w_i, \theta_i)_{i=1,2,\ldots}$ underlying this statistical network model, we explore the use of completely random measures (CRMs) (Kingman, 1967). The $(w_i)_{i=1,2,\ldots}$ are the jumps of the CRM and the $(\theta_i)_{i=1,2,\ldots}$ the locations of the atoms. We show that, by carefully choosing the Lévy measure characterizing this CRM, we can construct graphs ranging from sparse to dense. In particular, any Lévy measure yielding an infinite activity CRM leads to sparse graphs; alternatively, finite activity CRMs yield dense graphs. For the class of infinite activity regularly varying CRMs, we can sharpen the results to obtain graphs where the number of edges increases at a rate below $n^a$, where $1 < a < 2$ depending on the Lévy measure. We focus on the flexible generalized gamma process CRM and show that one can tune the graph from dense to sparse via a single parameter.

Building on the framework of CRMs leads to other desirable properties as well. One is that our CRM-based exchangeable point process leads to an analytic representation for the graphon analogue in the Kallenberg framework (see Section 5.1). Another is that, by drawing on the considerable theory of CRMs that has been well studied in the Bayesian non-parametric community, we can derive network simulation techniques and develop a principled statistical estimation procedure. For the latter, in Section 7 we devise a scalable Hamiltonian Monte Carlo (HMC) sampler that can automatically handle a range of graphs from dense to sparse. We empirically show in Section 8 that our methods scale to graphs with hundreds of thousands of nodes and millions of edges. Importantly, exchangeability of the random measure underlies the efficiency of the sampler.

In summary, the CRM-based formulation combined with the specific link model of equation (2) serves as a proof of concept that moving to the point process representation of equation (1) can yield models with desirable attributes that are different from what can be obtained by using the discrete adjacency matrix representation. More generally, the notion of modelling the graph as an exchangeable random measure and appealing to a Kallenberg representation for such exchangeable random measures serves as an important new framework for devising and studying random-graph models, just as the original graphon concept stimulated considerable work in the network community in the past decade.

Our paper is organized as follows. In Section 2, we provide background on exchangeability and CRMs. Our statistical network models for directed multigraphs, undirected graphs and bipartite graphs are in Section 3. A discussion of our framework compared with related network models is provided in Section 4. Properties, such as exchangeability and sparsity, and methods for simulation are presented in Section 5. Specific cases of our formulation leading to dense and sparse graphs are considered in Section 6, including an empirical analysis of network properties of our proposed formulation. Our Markov chain Monte Carlo (MCMC) posterior computations are in Section 7. Finally, Section 8 provides a simulation study and an analysis of a variety of large, real world graphs.

The programs that were used to analyse the data can be obtained from

http://www.rss.org.uk/preprints

2. Background on exchangeability
Our focus is on exchangeable random structures that can represent networks. We first briefly review exchangeability for random sequences, continuous time processes and discrete network arrays. Thorough and accessible overviews of exchangeability of random structures have been presented in the surveys of Aldous (1985) and Orbanz and Roy (2015). Here, we simply abstract away the notions that are relevant to placing our network formulation in context, as summarized in Table 1.

Table 1. Overview of representation theorems

                                     Discrete structure                 Continuous time or space
Exchangeability                      de Finetti (1931)                  Bühlmann (1960)
Joint or separate exchangeability    Aldous (1981) and Hoover (1979)    Kallenberg (1990)

The classical representation theorem arising from a notion of exchangeability for discrete sequences of random variables is due to de Finetti (1931). The theorem states that a sequence $Z_1, Z_2, \ldots$ with $Z_i \in \mathcal{Z}$ is exchangeable if and only if there is a random probability measure $P$ on $\mathcal{Z}$ with law $\nu$ such that the $Z_i$ are conditionally independently and identically distributed (IID) given $P$, i.e. all exchangeable infinite sequences can be represented as a mixture with mixing measure $\nu$. If examining continuous time processes instead of sequences, the representation that is associated with exchangeable increments was given by Bühlmann (1960) (see also Freedman (1996)) in terms of mixing Lévy processes.

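The mixture representation for sequences can be illustrated with a small simulation. The sketch below assumes binary variables and a Beta mixing measure, both of which are illustrative choices rather than anything specified in the paper; the function name `sample_exchangeable_bits` is likewise hypothetical.

```python
import random


def sample_exchangeable_bits(n, rng, a=2.0, b=2.0):
    """Draw an exchangeable binary sequence as a mixture of IID sequences.

    First draw the random measure P (here a Bernoulli(p) law with
    p ~ Beta(a, b), an assumed mixing measure), then draw Z_1..Z_n IID from P.
    """
    p = rng.betavariate(a, b)
    return [1 if rng.random() < p else 0 for _ in range(n)]


rng = random.Random(1)
draws = [sample_exchangeable_bits(2, rng) for _ in range(20000)]
# Exchangeability: (Z_1, Z_2) and (Z_2, Z_1) have the same law, so the
# patterns (1, 0) and (0, 1) should be roughly equally frequent.
freq_10 = sum(1 for z in draws if z == [1, 0]) / len(draws)
freq_01 = sum(1 for z in draws if z == [0, 1]) / len(draws)
# Marginally the Z_i are not independent: Pr(Z_1 = Z_2 = 1) exceeds 1/4,
# the value it would take for independent fair coins.
freq_11 = sum(1 for z in draws if z == [1, 1]) / len(draws)
```

The variables are IID only conditionally on the draw of `p`; after marginalizing over `p` they remain exchangeable but become dependent.
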
The focus of our work, however, is on graph structures. For generic matrices $Z$ in some space $\mathcal{Z}$, an (infinite) exchangeable random array (Diaconis and Janson, 2008; Lauritzen, 2008) is one such that

$$(Z_{ij}) \overset{d}{=} (Z_{\pi(i)\sigma(j)}) \quad \text{for } (i, j) \in \mathbb{N}^2 \tag{3}$$

for any permutations $\pi$, $\sigma$ of $\mathbb{N}$ (separate exchangeability), or for any permutation $\pi = \sigma$ of $\mathbb{N}$ (joint exchangeability), where the notation $\overset{d}{=}$ stands for 'equal in distribution'. A representation theorem for exchangeability of the classical discrete adjacency matrix $Z$ arises by considering a special case of the Aldous-Hoover theorem (Aldous, 1981; Hoover, 1979) for 2-arrays. For undirected graphs where $Z$ is a binary, symmetric adjacency matrix, the Aldous-Hoover representation can be expressed as the existence of a graphon. For completeness, the Aldous-Hoover theorem (specialized to 2-arrays under joint exchangeability) and further details on the graphon are provided in Appendix A.1.

Throughout this paper, we instead consider representing a graph as a point process $Z = \sum_{i,j} z_{ij}\, \delta_{(\theta_i, \theta_j)}$ with nodes $\theta_i$ embedded in $\mathbb{R}_+$, as in equation (1), and then examine notions of exchangeability in this context. Paralleling result (3), the point process $Z$ on $\mathbb{R}^2_+$ is exchangeable if and only if (Kallenberg (2005), chapter 9)

$$(Z(A_i \times A_j)) \overset{d}{=} (Z(A_{\pi(i)} \times A_{\sigma(j)})) \quad \text{for } (i, j) \in \mathbb{N}^2, \tag{4}$$

for any permutations $\pi$, $\sigma$ of $\mathbb{N}$, with $\pi = \sigma$ in the jointly exchangeable case, and any intervals $A_i = [h(i - 1), hi]$ with $i \in \mathbb{N}$ and $h > 0$.

In words, result (4) states that the point process $Z$ is exchangeable if, for any arbitrary regular square grid on the plane, the associated infinite array of increments (edge counts between nodes in a square) is exchangeable (Fig. 2). This provides a notion of exchangeability akin to that of the Aldous-Hoover theorem, but fundamentally different as the array being considered here is formed on the basis of an underlying continuous process. This array is not equivalent to an adjacency matrix, regardless of how fine a grid is considered.

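The array of increments in result (4) can be computed directly from the atoms of a point process. The sketch below is only an illustration of the two arrays being compared: the helper names `grid_counts` and `permute`, the truncation to an m-by-m block and the example points are assumptions. Exchangeability itself is a statement about the distribution of such arrays and cannot be checked on a single draw.

```python
def grid_counts(points, h, m):
    """Count the atoms of a point process in each cell of a regular grid.

    Cell (i, j), for i, j in 0..m-1, is A_{i+1} x A_{j+1} with
    A_k = [h * (k - 1), h * k]; points outside [0, h * m]^2 are ignored.
    Points lying exactly on a cell boundary (a probability-zero event for
    the processes considered here) are assigned to the upper cell.
    """
    counts = [[0] * m for _ in range(m)]
    for s, t in points:
        i, j = int(s // h), int(t // h)
        if i < m and j < m:
            counts[i][j] += 1
    return counts


def permute(counts, perm):
    """Apply the same permutation to rows and columns (joint case)."""
    m = len(counts)
    return [[counts[perm[i]][perm[j]] for j in range(m)] for i in range(m)]


points = {(0.3, 1.2), (1.2, 0.3), (0.9, 0.9), (1.2, 2.5), (2.5, 1.2)}
counts = grid_counts(points, h=1.0, m=3)
shuffled = permute(counts, [2, 0, 1])
```

Joint exchangeability requires that arrays such as `counts` and `shuffled` have the same distribution for every grid width `h` and every permutation.
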
Kallenberg (1990) derived de-Finetti-style representation theorems for separately and jointly exchangeable random measures on $\mathbb{R}^2_+$, which we present for the jointly exchangeable case in theorem 1. In what follows $\lambda$ denotes the Lebesgue measure on $\mathbb{R}_+$, $\lambda_D$ the Lebesgue measure on the diagonal $D = \{(s, t) \in \mathbb{R}^2_+ \mid s = t\}$ and $\tilde{\mathbb{N}}_2 = \{\{i, j\} \mid (i, j) \in \mathbb{N}^2\}$. We also define a U-array to be an array of independent uniform random variables.

Fig. 2. Illustration of the notion of exchangeability for point processes on the plane: for any regular square grid on the plane (a), the associated infinite array counting the number of points in each square (b) is exchangeable in the sense of result (3)

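Theorem 1 (representation theorem for jointly exchangeable random measures on $\mathbb{R}^2_+$ (Kallenberg (1990) and Kallenberg (2005), theorem 9.24)). A random measure $\xi$ on $\mathbb{R}^2_+$ is jointly exchangeable if and only if almost surely

$$
\begin{aligned}
\xi = {} & \sum_{i,j} f(\alpha_0, \vartheta_i, \vartheta_j, \zeta_{\{i,j\}})\, \delta_{\theta_i, \theta_j} + \beta_0 \lambda_D + \gamma_0 (\lambda \times \lambda) \\
& + \sum_{j,k} \{ g(\alpha_0, \vartheta_j, \chi_{jk})\, \delta_{\theta_j, \sigma_{jk}} + g'(\alpha_0, \vartheta_j, \chi_{jk})\, \delta_{\sigma_{jk}, \theta_j} \} \\
& + \sum_{j} \{ h(\alpha_0, \vartheta_j) (\delta_{\theta_j} \times \lambda) + h'(\alpha_0, \vartheta_j) (\lambda \times \delta_{\theta_j}) \} \\
& + \sum_{k} \{ l(\alpha_0, \eta_k)\, \delta_{\rho_k, \rho'_k} + l'(\alpha_0, \eta_k)\, \delta_{\rho'_k, \rho_k} \}
\end{aligned}
\tag{5}
$$

for some measurable functions $f: \mathbb{R}^4_+ \to \mathbb{R}_+$, $g, g': \mathbb{R}^3_+ \to \mathbb{R}_+$ and $h, h', l, l': \mathbb{R}^2_+ \to \mathbb{R}_+$. Here, $(\zeta_{\{i,j\}})$ with $\{i, j\} \in \tilde{\mathbb{N}}_2$ is a U-array. $\{(\theta_j, \vartheta_j)\}$ and $\{(\sigma_{ij}, \chi_{ij})\}$ on $\mathbb{R}^2_+$ and $\{(\rho_j, \rho'_j, \eta_j)\}$ on $\mathbb{R}^3_+$ are independent, unit rate Poisson processes. Furthermore, $\alpha_0, \beta_0, \gamma_0 \geq 0$ are an independent set of random variables.
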
We can think of the $\theta_i$ as random time indices, the sets $\{\theta_i\} \times \mathbb{R}_+$ and $\mathbb{R}_+ \times \{\theta_j\}$ forming Poisson processes of vertical and horizontal lines. The representation (5) is slightly more involved than the representation theorem for exchangeable arrays (see Appendix A.1). The first component of $\xi$ is, however, similar to the representation for exchangeable arrays, the sequence of fixed indices $i = 1, 2, \ldots$ and uniform random variables $(U_i)_{i=1,2,\ldots}$ in equation (46) in Appendix A.1 being replaced by a unit rate Poisson process $\{(\theta_i, \vartheta_i)\}$ on $\mathbb{R}^2_+$. We place our proposed network model of Section 3 within this Kallenberg representation in Section 5.1, yielding direct analogues to the classical graphon representation of graphs based on exchangeability of the adjacency matrix.

3. Proposed statistical network model
Recall that we represent an undirected graph using an atomic measure

$$Z = \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} z_{ij}\, \delta_{(\theta_i, \theta_j)},$$

with the convention $z_{ij} = z_{ji} \in \{0, 1\}$. Here, $z_{ij} = z_{ji} = 1$ indicates an undirected edge between nodes $\theta_i$ and $\theta_j$. See Section 3.5 for the interpretation of $\theta_i$.

There are many options for defining a statistical model for the point process graph representation $Z$. We consider one in particular in this paper based on a specific choice of

(a) link probability model and
(b) a prior on the model parameters.

Expanding on the discussion of Section 1, we introduce a collection of per-node sociability parameters $w = \{w_i\}$ and specify link probabilities via

$$\Pr(z_{ij} = 1 \mid w) = \begin{cases} 1 - \exp(-2 w_i w_j) & i \neq j, \\ 1 - \exp(-w_i^2) & i = j. \end{cases} \tag{6}$$

As mentioned in Section 1, this link probability model is not new to the statistical networks community and is a straightforward method for achieving node heterogeneity (see Aldous (1997) and Norros and Reittu (2006)).

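A minimal sketch of the link model in equation (6) follows, assuming plain Python floats for the sociability parameters. The function name `link_probability`, its `same_node` flag and the example values are hypothetical and not taken from the paper.

```python
import math


def link_probability(w_i, w_j, same_node=False):
    """Pr(z_ij = 1 | w) under the link model of equation (6).

    Uses -expm1(-x) = 1 - exp(-x), which stays accurate for small x.
    """
    if same_node:
        return -math.expm1(-w_i * w_i)
    return -math.expm1(-2.0 * w_i * w_j)


w = [0.1, 0.5, 2.0]   # hypothetical sociability parameters
p_01 = link_probability(w[0], w[1])
p_12 = link_probability(w[1], w[2])
p_22 = link_probability(w[2], w[2], same_node=True)
```

Nodes with larger sociability parameters link with higher probability, and the node locations play no role in the calculation.
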
3.1. Defining node parameters by using completely random measures
The model parameters consist of a collection of node-specific sociability parameters $w_i > 0$ and continuous-valued node indices $\theta_i \in \mathbb{R}_+$.
Our generative model jointly specifies $(w_i, \theta_i)_{i=1,2,\ldots}$ by first defining an atomic random measure

$$W = \sum_{i=1}^{\infty} w_i\, \delta_{\theta_i} \tag{7}$$

and then taking $W$ to be distributed according to a homogeneous CRM (Kingman, 1967). CRMs have been used extensively in the Bayesian non-parametric literature for proposing flexible classes of priors over functional space (Regazzini et al., 2003; Lijoi and Prünster, 2010). We briefly review a few important properties of CRMs that are relevant to our construction; the reader can refer to Kingman (1993) or Daley and Vere-Jones (2008) for an exhaustive coverage.

A CRM $W$ on $\mathbb{R}_+$ is a random measure such that, for all finite families of disjoint, bounded measurable sets $(A_1, \ldots, A_n)$ of $\mathbb{R}_+$, the random variables $W(A_1), \ldots, W(A_n)$ are mutually independent.

We shall focus here on CRMs with no deterministic component and stationary increments (i.e. the distribution of $W([t, s])$ depends only on $t - s$). In this case, the CRM takes the form (7), with $(w_i, \theta_i)_{i \in \mathbb{N}}$ the points of a Poisson point process on $(0, \infty) \times \mathbb{R}_+$ defined by a mean measure $\nu(dw, d\theta) = \rho(dw)\lambda(d\theta)$, where $\lambda$ is the Lebesgue measure and $\rho$ is a Lévy measure on $(0, \infty)$. We denote this by

$$W \sim \mathrm{CRM}(\rho, \lambda). \tag{8}$$

Note that $W([0, T]) < \infty$ almost surely for any $T < \infty$, whereas $W(\mathbb{R}_+) = \infty$ almost surely if $\int_0^\infty \rho(dw) > 0$.

The jump part $\rho$ of the mean measure $\nu$, which characterizes the increments of $W$, is of particular interest in our graph construction, as explored in Section 5. If $\rho$ satisfies the condition

$$\int_0^\infty \rho(dw) = \infty, \tag{9}$$

then there will be an infinite number of jumps in any interval $[0, T]$, and we refer to the CRM as infinite activity. Otherwise, the number of jumps will be finite almost surely. In our model, the jumps correspond to potentially connected nodes, i.e. these nodes need not be connected to any other node within a bounded interval and instead represent an upper bound on the set of connected nodes. See Section 3.5 for further discussion.

In Section 6, we consider special cases including the (compound) Poisson process and generalized gamma process (GGP) (Brix, 1999; Lijoi et al., 2007).

16 Formally, our undirected graph model is viewed as a transformation of a directed integer-
17 weighted graph, or multigraph, as detailed in Section 3.3. We now specify this directed multi-
18 graph. Although our primary focus is on undirected network models, in some applications the
19 directed multigraph might actually be the direct quantity of interest. For example, in social
20 networks, interactions are often not only directed (person i messages person j), but also have
21 an associated count. Additionally, interactions might be typed (message, SMS, like, tag).
22 Our proposed framework could be directly extended to model such data.
23 Let V = .1 , 2 , : : :/ be a countably innite set of node indices with i R+ . We represent the
24 directed multigraph of interest with an atomic measure on R2+
25
26 


D= nij .i ,j / , .10/
27 i=1 j=1
28
29 where nij counts the number of directed edges from node i to node j, with time indices i and
30 j . See Fig. 3 for an illustration.
Given $W$ as defined in expressions (7) and (8), $D$ is simply generated from a Poisson process with intensity given by the product measure $\tilde{W} = W \times W$ on $\mathbb{R}^2_+$:

$$D \mid W \sim \mathrm{PP}(\tilde{W}), \tag{11}$$

i.e., informally, the individual counts $n_{ij}$ are generated as $\mathrm{Poisson}(w_i w_j)$. (We consider a generalized definition of a Poisson process, where the mean measure is allowed to have atoms (Daley and Vere-Jones (2003), section 2.4).) By construction, for any $A, B \subset \mathbb{R}$, we have $\tilde{W}(A \times B) = W(A)W(B)$. On any bounded interval $A$ of $\mathbb{R}_+$, $W(A) < \infty$, implying that $\tilde{W}(A \times A)$ has finite mass.

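The informal description of equation (11) suggests a simple way to draw the directed counts once a finite set of jumps is available. The sketch below assumes a short list of jumps and uses a textbook Poisson sampler (Knuth's multiplication method) so that it depends only on the standard library; the function names and the example weights are hypothetical, and this naive quadratic-time loop is not the simulation method of the paper.

```python
import math
import random


def sample_poisson(rate, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_directed_multigraph(w, rng):
    """Draw counts n_ij ~ Poisson(w_i * w_j) independently for all (i, j)."""
    n = len(w)
    return [[sample_poisson(w[i] * w[j], rng) for j in range(n)]
            for i in range(n)]


rng = random.Random(3)
w = [0.2, 1.0, 1.5]          # hypothetical jumps of W on a bounded interval
counts = sample_directed_multigraph(w, rng)
```

The total number of directed edges is then Poisson with mean equal to the square of the total mass of the jumps, in line with the product form of the intensity.
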
Fig. 3. Example of (a) an atomic measure $D$ as in equation (10) restricted to $[0, 1]^2$, (b) the corresponding directed multigraph and (c) the corresponding undirected graph

3.3. Undirected graphs via transformations of directed graphs
We arrive at the undirected graph via a simple transformation of the directed graph: set $z_{ij} = z_{ji} = 1$ if $n_{ij} + n_{ji} > 0$ and $z_{ij} = z_{ji} = 0$ otherwise, i.e. place an undirected edge between nodes $\theta_i$ and $\theta_j$ if and only if there is at least one directed interaction between the nodes. In this definition of an undirected graph, we allow self-edges. This could represent, for example, a person posting a message on his or her own profile page. The resulting hierarchical model is

$$
\begin{aligned}
W &= \sum_{i=1}^{\infty} w_i\, \delta_{\theta_i}, & W &\sim \mathrm{CRM}(\rho, \lambda), \\
D &= \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} n_{ij}\, \delta_{(\theta_i, \theta_j)}, & D \mid W &\sim \mathrm{PP}(W \times W), \\
Z &= \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} \min(n_{ij} + n_{ji}, 1)\, \delta_{(\theta_i, \theta_j)}.
\end{aligned}
\tag{12}
$$

This process is depicted graphically in Fig. 4.

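The last line of the hierarchical model is a deterministic map from directed counts to undirected links, which can be sketched in a few lines. The function name `to_undirected` and the example count matrix are hypothetical.

```python
def to_undirected(counts):
    """Map directed counts n_ij to z_ij = min(n_ij + n_ji, 1)."""
    n = len(counts)
    return [[min(counts[i][j] + counts[j][i], 1) for j in range(n)]
            for i in range(n)]


counts = [[0, 2, 0],
          [0, 0, 0],
          [1, 0, 3]]          # hypothetical directed edge counts n_ij
z = to_undirected(counts)
```

The result is symmetric by construction, and a diagonal entry is 1 exactly when the corresponding node has at least one self-interaction.
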
To see the equivalence between this formulation and that specified in equation (6), note that, for $i \neq j$, $\Pr(z_{ij} = 1 \mid w) = \Pr(n_{ij} + n_{ji} > 0 \mid w)$. By properties of the Poisson process, $n_{ij}$ and $n_{ji}$ are independent random variables conditioned on $W$. The sum of two independent Poisson random variables, each with rate $w_i w_j$, is again Poisson with rate $2 w_i w_j$. Result (6) arises from the fact that $\Pr(n_{ij} + n_{ji} > 0 \mid w) = 1 - \Pr(n_{ij} + n_{ji} = 0 \mid w)$. Likewise, the $i = j$ case arises by using a similar reasoning for $\Pr(z_{ii} = 1 \mid w) = \Pr(n_{ii} > 0 \mid w)$.

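Written out, the argument uses only the fact that a Poisson variable with rate $r$ equals zero with probability $e^{-r}$, together with the conditional independence of the counts:

```latex
\begin{align*}
\Pr(z_{ij} = 1 \mid w)
  &= 1 - \Pr(n_{ij} = 0 \mid w)\,\Pr(n_{ji} = 0 \mid w) \\
  &= 1 - e^{-w_i w_j}\, e^{-w_j w_i}
   = 1 - \exp(-2 w_i w_j), \qquad i \neq j, \\
\Pr(z_{ii} = 1 \mid w)
  &= 1 - \Pr(n_{ii} = 0 \mid w)
   = 1 - \exp(-w_i^2).
\end{align*}
```
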
We note that our computational strategy of Section 7 relies on this interpretation of our model for undirected graphs as a transformation of a directed multigraph. In particular, we introduce the directed edge counts as latent variables and do inference over these counts.

Fig. 4. Example of (a) the product measure $\tilde{W} = W \times W$ for CRM $W$, (b) a draw of the directed multigraph measure $D \mid W \sim \mathrm{PP}(W \times W)$ and (c) the corresponding undirected measure $Z = \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} \min(n_{ij} + n_{ji}, 1)\, \delta_{(\theta_i, \theta_j)}$

3.4. Bipartite graphs
The above construction can also be extended to bipartite graphs. Let $V = (\theta_1, \theta_2, \ldots)$ and $V' = (\theta'_1, \theta'_2, \ldots)$ be two countably infinite sets of nodes with $\theta_i, \theta'_i \in \mathbb{R}_+$. We assume that only connections between nodes of different sets are allowed.
We represent the directed bipartite multigraph of interest by using an atomic measure on $\mathbb{R}^2_+$:

$$D = \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} n_{ij}\, \delta_{(\theta_i, \theta'_j)}, \tag{13}$$

where $n_{ij}$ counts the number of directed edges from node $\theta_i$ to node $\theta'_j$. Similarly, the bipartite graph is represented by an atomic measure

$$Z = \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} z_{ij}\, \delta_{(\theta_i, \theta'_j)}.$$

Our bipartite graph formulation introduces two independent CRMs, $W \sim \mathrm{CRM}(\rho, \lambda)$ and $W' \sim \mathrm{CRM}(\rho', \lambda)$, whose jumps correspond to sociability parameters for nodes in sets $V$ and $V'$ respectively. The generative model for the bipartite graph mimics that of the non-bipartite graph:

$$
\begin{aligned}
W &= \sum_{i=1}^{\infty} w_i\, \delta_{\theta_i}, & W &\sim \mathrm{CRM}(\rho, \lambda), \\
W' &= \sum_{j=1}^{\infty} w'_j\, \delta_{\theta'_j}, & W' &\sim \mathrm{CRM}(\rho', \lambda), \\
D &= \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} n_{ij}\, \delta_{(\theta_i, \theta'_j)}, & D \mid W, W' &\sim \mathrm{PP}(W \times W'), \\
Z &= \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} \min(n_{ij}, 1)\, \delta_{(\theta_i, \theta'_j)}.
\end{aligned}
\tag{14}
$$

Model (14) has been proposed by Caron (2012) in a slightly different formulation. In this paper, we recast this model within our general framework, enabling new theoretical and practical insights.

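Given finitely many jumps from each of the two CRMs, the bipartite links in model (14) can be drawn without materializing the counts. The sketch below relies on the fact that a Poisson count with rate $w_i w'_j$ is positive with probability $1 - \exp(-w_i w'_j)$; the function name `sample_bipartite_graph` and the example weights are hypothetical.

```python
import math
import random


def sample_bipartite_graph(w, w_prime, rng):
    """Draw z_ij = min(n_ij, 1) with n_ij ~ Poisson(w_i * w'_j).

    Only the event n_ij > 0 matters, so each z_ij is sampled directly as a
    Bernoulli variable with success probability 1 - exp(-w_i * w'_j).
    """
    return [[1 if rng.random() < -math.expm1(-wi * wj) else 0
             for wj in w_prime] for wi in w]


rng = random.Random(4)
w = [0.5, 2.0]                 # hypothetical sociabilities for the set V
w_prime = [0.1, 1.0, 3.0]      # hypothetical sociabilities for the set V'
z = sample_bipartite_graph(w, w_prime, rng)
```

Unlike the undirected case, there is no factor of 2 in the exponent because each pair contributes a single count.
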
3.5. Interpretation of $\theta_i$
We can think of the positive, continuous valued node index $\theta_i$ as representing the time at which a potential node enters the network and has the opportunity to link with other existing nodes $\theta_j < \theta_i$. We use the terminology 'potential node' here to clarify that this node need not form any observed connections with other nodes existing before time $\theta_i$. We emphasize that an observed link between $\theta_i$ and some other node $\theta_k > \theta_i$ will eventually occur almost surely as time progresses. This could represent, for example, signing on to a social networking service before your friends do and only forming a link once they join. On the basis of our CRM specification, we have almost surely an infinite number of potential nodes as time goes to $\infty$. For infinite activity CRMs, we have almost surely an infinite set of potential nodes even at any finite time.

In Section 5, we examine properties of the network process across time, and we describe methods for simulating networks at any finite time. There, our focus is on the observed link process from this set of potential nodes. For example, sparsity is examined with respect to the set of nodes with degree at least 1, not with respect to the set of potential nodes. Since we need not think of $\theta_i$ as a time index, but rather just a general construct of our formulation, we also generically refer to $\theta_i$ as the node location in the remainder of the paper.

4. Related work
There has been extensive work over recent years on flexible Bayesian non-parametric models for networks, allowing complex latent structures of unknown dimension to be uncovered from real world networks (Kemp et al., 2006; Miller et al., 2009; Lloyd et al., 2012; Palla et al., 2012; Herlau et al., 2014). However, as mentioned in the unifying overview of Orbanz and Roy (2015), these methods all fit in the Aldous-Hoover framework and as such produce dense graphs.

Norros and Reittu (2006) proposed a conditionally Poissonian multigraph process that has similarities to our multigraph process. In their formulation, each node has a given sociability parameter and the number of edges between two nodes $i$ and $j$ is drawn from a Poisson distribution with rate the product of the sociability parameters, normalized by the sum of the sociability parameters of all the nodes. The normalization makes this model similar to models based on rescaling of the graphon and, as such, does not define a projective model, as explained in Section 1. See van der Hofstad (2014) for a review of this model and Britton et al. (2006) for a similar model.

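The effect of the normalization can be seen in a small numerical sketch. It computes the pairwise Poisson rates as described above (product of sociabilities divided by their sum) for a set of nodes and again after one node is added; the function name `norros_reittu_rates` and the weights are hypothetical.

```python
def norros_reittu_rates(w):
    """Poisson rates w_i * w_j / sum_k w_k for every pair (i, j)."""
    total = sum(w)
    return [[wi * wj / total for wj in w] for wi in w]


w = [1.0, 2.0, 3.0]                       # hypothetical sociabilities
rates_3 = norros_reittu_rates(w)
rates_4 = norros_reittu_rates(w + [4.0])  # same nodes plus one newcomer
# The rate between nodes 0 and 1 changes when a node is added, whereas the
# unnormalized rate w_0 * w_1 used in the CRM-based model would not.
```

Here the rate between the first two nodes drops from 1/3 to 1/5 when the fourth node arrives, which is the lack of projectivity referred to above.
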
As pointed out by Jacobs and Clauset (2014) in their discussion of an earlier version of this paper, another related model is the degree-corrected random-graph model (Karrer and Newman, 2011), where edges of the multigraph are drawn from a Poisson distribution whose rate is the product of node-specific sociability parameters and a parameter tuning the interaction between the latent communities to which these nodes belong. When the sociability parameters are assumed to be IID from some distribution, this model yields an exchangeable adjacency matrix and thus a dense graph.

Additionally, there are similarities to be drawn with the extensive literature on latent space modelling (e.g. Hoff et al. (2002), Penrose (2003) and Hoff (2009)). In such models, nodes are embedded in a low dimensional, continuous latent space and the probability of an edge is determined by a distance or similarity metric of the node-specific latent factors. In our case, the continuous node index $\theta_i$ is of no importance in forming edge probabilities. It would, however, be possible to extend our approach to time- or location-dependent connections by considering inhomogeneous CRMs.

13 Finally, as we shall detail in Section 5.5, our model admits a construction with connections
14 to the conguration model (Bollobas, 1980; Newman, 2010), which is a popular model for
15 generating simple graphs with a given degree sequence.
16 The connections with this broad set of past work place our proposed network model within
17 the context of existing literature. Importantly, however, to the best of our knowledge this work
18 represents the rst fully generative and projective approach to sparse graph modelling (see Sec-
19 tion 5), and with a notion of exchangeability that is essential for devising our scalable statistical
20 estimation procedure, as shown in Section 7.

5. General properties and simulation
We provide general properties of our network model depending on the properties of the Lévy measure $\rho$.
27
28 5.1. Exchangeability under the Kallenberg framework
29 Proposition 1 (joint exchangeability of undirected graph measure). For any CRM W CRM
30 ., /, the point process Z dened by equation (12), or equivalently by equation (6), is jointly
31 exchangeable.
32
33 The proof is given in Appendix B. In the adjacency matrix representation, we think of ex-
34 changeability as invariance to node orderings. Here, we have invariance to the time of arrival of
35 the nodes, thinking of i as a time index.
We now reformulate our network process in the Kallenberg representation (5). Because of exchangeability, we know that such a representation exists. What we show here is that our CRM-based formulation has an analytic and interpretable representation. In particular, the CRM $W$ can be constructed from a two-dimensional unit rate Poisson process on $\mathbb{R}_+^2$ by using the inverse Lévy method (Khintchine, 1937; Ferguson and Klass, 1972). Let $(\theta_i, \vartheta_i)$ be a unit rate Poisson process on $\mathbb{R}_+^2$. Let $\bar\rho(x)$ be the tail Lévy intensity
$$\bar\rho(x) = \int_x^\infty \rho(dw). \tag{15}$$
Then the CRM $W = \sum_i w_i \delta_{\theta_i}$ with Lévy measure $\rho(dw)\,d\theta$ can be constructed from the bidimensional point process by taking $w_i = \bar\rho^{-1}(\vartheta_i)$. Note that the inverse Lévy intensity $\bar\rho^{-1}$ is a monotone function. It follows that our undirected graph model can be formulated under representation (5) by selecting any $\alpha_0$, $\beta_0 = \gamma_0 = 0$, $g = g' = 0$, $h = h' = l = l' = 0$ and
$$f(\alpha_0, \vartheta_i, \vartheta_j, \zeta_{\{i,j\}}) = \begin{cases} 1 & \zeta_{\{i,j\}} \le M(\vartheta_i, \vartheta_j), \\ 0 & \text{otherwise} \end{cases} \tag{16}$$
where $M : \mathbb{R}_+^2 \to [0, 1]$ is defined by
$$M(\vartheta_i, \vartheta_j) = \begin{cases} 1 - \exp\{-2\bar\rho^{-1}(\vartheta_i)\,\bar\rho^{-1}(\vartheta_j)\} & \text{if } \vartheta_i \ne \vartheta_j, \\ 1 - \exp\{-\bar\rho^{-1}(\vartheta_i)^2\} & \text{if } \vartheta_i = \vartheta_j. \end{cases}$$
In Section 6, we provide explicit forms for $\bar\rho$ depending on our choice of Lévy measure $\rho$. Expression (16) represents a direct analogue to that arising from the Aldous–Hoover framework. In particular, $M$ here is akin to the graphon of expression (47) in Appendix A.1, and thus allows us to connect our CRM-based formulation with the extensive literature on graphons. An illustration of the network construction from the Kallenberg representation, including the function $M$, is in Fig. 5. Note that, if we had started from the Kallenberg representation and selected an $f$ (or $M$) arbitrarily, we would probably not have obtained a network model with the normalized CRM interpretation that enables both interpretability and analysis of network properties.

Fig. 5. Illustration of the model construction based on the Kallenberg representation: (a) a unit rate Poisson process $(\theta_i, \vartheta_i)$, $i \in \mathbb{N}$, on $[0, \alpha] \times \mathbb{R}_+$; (b) for each pair $\{i, j\} \in \mathbb{N}^2$, set $z_{ij} = z_{ji} = 1$ with probability $M(\vartheta_i, \vartheta_j)$ (here, $M$ is indicated by the blue shading (darker shading indicates higher value) for a stable process (GGP with $\tau = 0$); in this case there is an analytic expression for $\bar\rho^{-1}$ and therefore $M$)

For the bipartite graph, Kallenberg's representation theorem for separate exchangeability (Kallenberg (1990) and Kallenberg (2005), theorem 9.23) can likewise be applied.
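
To make the role of $M$ concrete, the following minimal Python sketch evaluates it for the stable case ($\tau = 0$), where the tail Lévy intensity given in Section 6.3 can be inverted by hand. The function names are illustrative and not part of the paper; the closed-form inverse is our own rearrangement of that tail intensity.

```python
import math


def tail_stable(w, sigma):
    """Tail Levy intensity for tau = 0: w**(-sigma) / (sigma * Gamma(1 - sigma))."""
    return w ** (-sigma) / (sigma * math.gamma(1.0 - sigma))


def inv_tail_stable(u, sigma):
    """Inverse of tail_stable: the weight w such that tail_stable(w, sigma) == u."""
    if not 0.0 < sigma < 1.0 or u <= 0.0:
        raise ValueError("need 0 < sigma < 1 and u > 0")
    return (sigma * math.gamma(1.0 - sigma) * u) ** (-1.0 / sigma)


def link_probability(u_i, u_j, sigma):
    """M(u_i, u_j) as defined in the text, with the diagonal case handled separately."""
    w_i = inv_tail_stable(u_i, sigma)
    w_j = inv_tail_stable(u_j, sigma)
    rate = w_i * w_i if u_i == u_j else 2.0 * w_i * w_j
    return -math.expm1(-rate)  # 1 - exp(-rate), stable for small rates
```

Because $\bar\rho^{-1}$ is decreasing, points with a small second coordinate carry large weights and therefore connect with high probability, which is the darker region of Fig. 5(b).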

5.2. Interactions between groups
For any disjoint sets of nodes $A, B \subset \mathbb{R}_+$, $A \cap B = \emptyset$, the probability that there is at least one connection between a node in $A$ and a node in $B$ is given by
$$\Pr\{Z(A \times B) > 0 \mid W\} = 1 - \exp\{-2\,W(A)\,W(B)\},$$
i.e. the probability of a between-group edge depends on the sum of the sociabilities in each group, $W(A)$ and $W(B)$.
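
As a small illustration, the between-group probability can be computed directly from the sociabilities of the two groups. This is a sketch only; the function name and the list-of-weights input are our own conventions.

```python
import math


def prob_between_group_edge(weights_a, weights_b):
    """Probability of at least one edge between disjoint groups A and B, given their weights."""
    w_a = math.fsum(weights_a)  # W(A): total sociability of group A
    w_b = math.fsum(weights_b)  # W(B): total sociability of group B
    return -math.expm1(-2.0 * w_a * w_b)  # 1 - exp(-2 W(A) W(B))
```

For example, two single-node groups with weights 0.5 and 1.0 connect with probability $1 - e^{-1}$.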
5.3. Graph restrictions
Let us consider the restriction of our process to the square $[0, \alpha]^2$. For finite activity CRMs, there will be a finite number of potential nodes (jumps) in the interval $[0, \alpha]$. For infinite activity CRMs, we shall have an infinite number of potential nodes. We are interested in the properties of the process as $\alpha$ grows, where we can think of $\alpha$ as representing time and observing the process as new potential nodes and any resulting edges enter the network. We note that, in the limit $\alpha \to \infty$, the number of edges approaches $\infty$ since $W(\mathbb{R}_+) = \infty$ almost surely.
Let $D_\alpha$ and $Z_\alpha$ be the restrictions of $D$ and $Z$ respectively to the square $[0, \alpha]^2$. Then, $(D_\alpha)_{\alpha \ge 0}$ and $(Z_\alpha)_{\alpha \ge 0}$ are measure-valued stochastic processes, indexed by $\alpha$. We also denote by $W_\alpha$ and $\lambda_\alpha$ the corresponding CRM and Lebesgue measure on $[0, \alpha]$. In what follows, our interests are in studying how the following quantities vary with $\alpha$:

(a) $N_\alpha$, the number of nodes with degree at least one in the network, and
(b) $N_\alpha^{(e)}$, the number of edges in the undirected network.

We refer to $N_\alpha$ as the number of observed nodes. In our construction, recall that $(N_\alpha)_{\alpha \ge 0}$ and $(N_\alpha^{(e)})_{\alpha \ge 0}$ are non-decreasing, integer-valued stochastic processes corresponding to the number of nodes with at least one connection in $Z_\alpha$ and the number of edges in $Z_\alpha$ respectively. Formally,
$$N_\alpha = \mathrm{card}(\{\theta_i \in [0, \alpha] \mid Z(\{\theta_i\} \times [0, \alpha]) > 0\}), \tag{17}$$
$$N_\alpha^{(e)} = Z[\{(x, y) \in \mathbb{R}_+^2 \mid 0 \le x \le y \le \alpha\}]. \tag{18}$$
The two processes have the same jump times, which correspond to the addition of one or more new nodes with at least one connection in the graph. An example of these processes is represented in Fig. 6. In later sections we use $Z_\alpha^* = Z_\alpha([0, \alpha]^2)$ to denote the total mass on $[0, \alpha]^2$, and similarly for $D_\alpha^*$ and $W_\alpha^*$.
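
The counts in equations (17) and (18) can be read off a list of undirected edges once node locations are known. The sketch below assumes, purely for illustration, that each edge is stored as a pair of node locations; the function name is ours.

```python
def observed_counts(edges, alpha):
    """Return (N_alpha, N_alpha^(e)) for the restriction of the graph to [0, alpha]^2.

    `edges` is an iterable of (theta_i, theta_j) location pairs; order within a
    pair and repeated pairs do not matter, and self-loops count as one edge.
    """
    kept = set()
    for x, y in edges:
        if 0.0 <= x <= alpha and 0.0 <= y <= alpha:
            kept.add((min(x, y), max(x, y)))  # store each edge once, with x <= y
    nodes = {theta for pair in kept for theta in pair}
    return len(nodes), len(kept)
```

Calling the function on an increasing grid of `alpha` values traces out the two non-decreasing processes illustrated in Fig. 6.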

5.4. Sparsity
In this section we state the sparsity properties of our graph model, which relate to the properties of the Lévy measure $\rho$. In particular, we are interested in the relative asymptotic behaviour of the number of edges $N_\alpha^{(e)}$ with respect to the number of observed nodes $N_\alpha$ as $\alpha \to \infty$. Henceforth, we consider $\int_0^\infty \rho(dw) > 0$, since the case of $\int_0^\infty \rho(dw) = 0$ trivially gives $N_\alpha^{(e)} = N_\alpha = 0$ almost surely.
In theorem 2 we characterize the sparsity of the graph with respect to the properties of its Lévy measure: graphs obtained from infinite activity CRMs are sparse, whereas graphs obtained from finite activity CRMs are dense. The rate of growth can be further specified when $\rho$ is a regularly varying Lévy measure (Feller, 1971; Karlin, 1967; Gnedin et al., 2006, 2007), as defined in Appendix A.2. We follow the notation of Janson (2011) for probability asymptotics (see Appendix C.1 for details).

Theorem 2. Consider a point process $Z$ representing an undirected graph. Let $N_\alpha^{(e)}$ be the number of edges and $N_\alpha$ be the number of observed nodes in the point process restriction $Z_\alpha$ (see equations (17) and (18)). Assume that the defining Lévy measure is such that $\int_0^\infty w\,\rho(dw) < \infty$. If the CRM $W$ is finite activity, i.e.
$$\int_0^\infty \rho(dw) < \infty,$$
then the number of edges scales quadratically with the number of observed nodes
$$N_\alpha^{(e)} = \Theta(N_\alpha^2) \tag{19}$$
almost surely as $\alpha \to \infty$, implying that the graph is dense.
If the CRM is infinite activity, i.e.
$$\int_0^\infty \rho(dw) = \infty,$$
then the number of edges scales subquadratically with the number of observed nodes
$$N_\alpha^{(e)} = o(N_\alpha^2) \tag{20}$$
almost surely as $\alpha \to \infty$, implying that the graph is sparse.
Furthermore, if the Lévy measure $\rho$ is regularly varying (see definition 1 in Appendix A.2), with exponent $\sigma \in (0, 1)$ and slowly varying function $l$ satisfying $\liminf_{t \to \infty} l(t) > 0$, then
$$N_\alpha^{(e)} = O(N_\alpha^{2/(1+\sigma)}) \quad \text{almost surely as } \alpha \to \infty. \tag{21}$$

Fig. 6. Example of point process $Z_\alpha$ and above it the associated integer-valued stochastic processes for the number of observed nodes $(N_\alpha)_{\alpha \ge 0}$ and edges $(N_\alpha^{(e)})_{\alpha \ge 0}$

Theorem 2 is a direct consequence of two theorems that we state now and prove in Appendix C. The first theorem states that the number of edges grows quadratically with $\alpha$, whereas the second states that the number of nodes scales superlinearly with $\alpha$ for infinite activity CRMs, and linearly otherwise.

Theorem 3. Consider the point process $Z$. If $\int_0^\infty w\,\rho(dw) < \infty$, then the number of edges in $Z_\alpha$ grows quadratically with $\alpha$:
$$N_\alpha^{(e)} = \Theta(\alpha^2) \tag{22}$$
almost surely. Otherwise, $N_\alpha^{(e)} = \Omega(\alpha^2)$ almost surely.

Theorem 4. Consider the point process $Z$. Then
$$N_\alpha = \begin{cases} \Theta(\alpha) & \text{if } W \text{ is a finite activity CRM}, \\ \omega(\alpha) & \text{if } W \text{ is an infinite activity CRM} \end{cases} \tag{23}$$
almost surely as $\alpha \to \infty$. In words, the number of nodes with degree at least 1 in $Z_\alpha$ scales linearly with $\alpha$ for finite activity CRMs and superlinearly with $\alpha$ for infinite activity CRMs. Furthermore, for a regularly varying Lévy measure with slowly varying function $l$ such that $\liminf_{t \to \infty} l(t) > 0$, we have
$$N_\alpha = \Omega(\alpha^{\sigma+1}) \quad \text{almost surely as } \alpha \to \infty. \tag{24}$$
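
The step from theorems 3 and 4 to the rate (21) can be spelled out as follows. This is only a sketch of the combination, not the proof of Appendix C: the constants $C_1, C_2 > 0$ are hypothetical names for the sample path dependent constants implicit in the $\Theta$ and $\Omega$ statements, and $\int_0^\infty w\,\rho(dw) < \infty$ is assumed.

```latex
\begin{align*}
N_\alpha^{(e)} \le C_1\,\alpha^{2}, \qquad N_\alpha \ge C_2\,\alpha^{1+\sigma}
  &\quad \text{for all sufficiently large } \alpha \\
\Longrightarrow\quad \alpha &\le \left(N_\alpha / C_2\right)^{1/(1+\sigma)} \\
\Longrightarrow\quad N_\alpha^{(e)} &\le C_1\, C_2^{-2/(1+\sigma)}\, N_\alpha^{2/(1+\sigma)}
  = O\!\left(N_\alpha^{2/(1+\sigma)}\right).
\end{align*}
```

The same substitution with $N_\alpha = \Theta(\alpha)$ gives the dense rate (19), and with $N_\alpha = \omega(\alpha)$, i.e. $\alpha = o(N_\alpha)$, it gives the sparse rate (20).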
We finally give the expressions of the expectations for the number of edges and nodes in the model. The proof is given in Appendix C.4. (Equations (26) and (27) could alternatively be derived as particular cases of the results in Veitch and Roy (2015).)

Theorem 5. The expected number of edges $D_\alpha^*$ in the multigraph, edges $N_\alpha^{(e)}$ in the undirected graph and observed nodes $N_\alpha$ are given as follows:
$$E[D_\alpha^*] = \alpha^2 \left\{\int_0^\infty w\,\rho(dw)\right\}^2 + \alpha \int_0^\infty w^2\,\rho(dw), \tag{25}$$
$$E[N_\alpha^{(e)}] = \frac{\alpha^2}{2} \int_0^\infty \psi(2w)\,\rho(dw) + \alpha \int_0^\infty \{1 - \exp(-w^2)\}\,\rho(dw), \tag{26}$$
$$E[N_\alpha] = \alpha \int_0^\infty [1 - \exp\{-w^2 - \alpha\,\psi(2w)\}]\,\rho(dw), \tag{27}$$
where $\psi(t) = \int_0^\infty \{1 - \exp(-wt)\}\,\rho(dw)$ is the Laplace exponent. Additionally, if $\rho$ is a regularly varying Lévy measure with exponent $\sigma \in [0, 1)$ and slowly varying function $l$, and $\int_0^\infty w\,\rho(dw) < \infty$, then
$$E[N_\alpha] \sim \alpha^{1+\sigma}\, l(\alpha)\, \Gamma(1 - \sigma) \left\{2 \int_0^\infty w\,\rho(dw)\right\}^{\sigma}. \tag{28}$$
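
Equation (25) only needs the first two moments of the Lévy measure, so it is easy to evaluate. The sketch below is illustrative: the function names are ours, and the closed-form moments for the GGP Lévy measure of equation (35) in Section 6.3, $\int w\,\rho(dw) = \tau^{\sigma-1}$ and $\int w^2\rho(dw) = (1-\sigma)\tau^{\sigma-2}$ for $\tau > 0$, are our own calculation rather than formulas stated in the paper.

```python
def expected_directed_edges(alpha, first_moment, second_moment):
    """E[D*_alpha] from equation (25), given the first two moments of the Levy measure."""
    return alpha ** 2 * first_moment ** 2 + alpha * second_moment


def ggp_moments(sigma, tau):
    """First and second moments of the GGP Levy measure (assumes sigma < 1 and tau > 0)."""
    if sigma >= 1.0 or tau <= 0.0:
        raise ValueError("need sigma < 1 and tau > 0")
    return tau ** (sigma - 1.0), (1.0 - sigma) * tau ** (sigma - 2.0)
```

For instance, `expected_directed_edges(300, *ggp_moments(0.5, 1.0))` evaluates the expected number of directed edges for the parameter values used in the simulated data experiment of Section 8.1.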

5.5. Simulation
5.5.1. Direct simulation of graph restrictions
By definition, the directed multigraph restriction $D_\alpha$ is drawn from a Poisson process with finite mean measure $W_\alpha \times W_\alpha$, where $W_\alpha \sim \mathrm{CRM}(\rho, \lambda_\alpha)$. Leveraging standard properties of the CRM and Poisson process, we can first simulate the total number of directed edges $D_\alpha^*$ based on the total mass $W_\alpha^*$:
$$D_\alpha^* \mid W_\alpha^* \sim \mathrm{Poisson}(W_\alpha^{*2}). \tag{29}$$
For $k = 1, \ldots, D_\alpha^*$ a particular edge is drawn by sampling a pair of nodes
$$U_{kj} \mid W_\alpha \overset{\mathrm{IID}}{\sim} \frac{W_\alpha}{W_\alpha^*}, \quad j = 1, 2, \tag{30}$$
where $W_\alpha / W_\alpha^*$ is called a normalized CRM. We form directed edges $(U_{k1}, U_{k2})$, resulting in
$$D_\alpha = \sum_{k=1}^{D_\alpha^*} \delta_{(U_{k1}, U_{k2})}. \tag{31}$$
Because of the discreteness of $W_\alpha$, there will be ties between the $(U_{k1}, U_{k2})$, and the number of such ties corresponds to the multiplicity of that edge. In particular, a total of $2D_\alpha^*$ nodes $U_{kj}$ are drawn but result in some $N_\alpha \le 2D_\alpha^*$ distinct values. We overload the notation $N_\alpha$ here because this quantity also corresponds to the number of nodes with degree at least 1 in the resulting undirected network. Recall that the undirected network construction simply forms an undirected edge between a pair of nodes if there is at least one directed edge between them. If we consider unordered pairs $\{U_{k1}, U_{k2}\}$, the number of such unique pairs takes a number $N_\alpha^{(e)} \le D_\alpha^*$ of distinct values, where $N_\alpha^{(e)}$ corresponds to the number of edges in the undirected network.
The construction above enables us to re-express our Cox process model in terms of normalized CRMs (Regazzini et al., 2003). This is very attractive both practically and theoretically. As we show in Section 6 for special cases of CRMs, one can use the results surrounding normalized CRMs to derive an exact simulation technique for our directed and undirected graphs.
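
For a finite activity CRM the steps (29)–(31) can be carried out literally, because all jumps of $W_\alpha$ can be enumerated. The following sketch does this for the GGP with $\sigma < 0$ described in Section 6.3 (Poisson number of jumps with rate $-(\alpha/\sigma)\tau^{\sigma}$ and gamma$(-\sigma, \tau)$ weights, read here as shape and rate, which is our assumption). Function names are illustrative, and the simple Poisson sampler is a stand-in chosen only to avoid external libraries.

```python
import random


def poisson(lam, rng):
    """Poisson draw: count unit-rate exponential arrivals before time lam (O(lam) work)."""
    n, t = 0, rng.expovariate(1.0)
    while t < lam:
        n += 1
        t += rng.expovariate(1.0)
    return n


def simulate_finite_activity_graph(alpha, sigma, tau, rng):
    """Directed multigraph and undirected graph on [0, alpha]^2 for a GGP with sigma < 0."""
    if not (sigma < 0.0 and tau > 0.0):
        raise ValueError("this sketch covers only sigma < 0 and tau > 0")
    n_jumps = poisson(-(alpha / sigma) * tau ** sigma, rng)
    thetas = [rng.uniform(0.0, alpha) for _ in range(n_jumps)]
    weights = [rng.gammavariate(-sigma, 1.0 / tau) for _ in range(n_jumps)]
    total_mass = sum(weights)                      # W*_alpha
    n_directed = poisson(total_mass ** 2, rng)     # equation (29)
    directed = []
    if n_directed > 0:
        # Equation (30): both end points are IID draws from the normalized CRM.
        ends = rng.choices(range(n_jumps), weights=weights, k=2 * n_directed)
        directed = list(zip(ends[0::2], ends[1::2]))
    # Ties collapse: an undirected edge exists if any directed edge joins the pair.
    undirected = {(min(i, j), max(i, j)) for i, j in directed}
    return thetas, weights, directed, undirected
```

Edges are returned as index pairs into `thetas`; the number of distinct indices appearing in `undirected` plays the role of $N_\alpha$ and `len(undirected)` that of $N_\alpha^{(e)}$.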

Remark 1. The construction above enables us to draw connections with the configuration model (Bollobás, 1980; Newman, 2010), which proceeds as follows. First, the degree $k_i$ of each node $i = 1, \ldots, n$ is specified such that the sum of the $k_i$ is an even number. Each node $i$ is given a total of $k_i$ stubs, or demi edges. Then, we repeatedly choose pairs of stubs uniformly at random, without replacement, and connect the selected pairs to form an edge. The simple graph is obtained either by discarding the multiple edges and self-loops (an erased configuration model), or by repeating the above sampling until obtaining a simple graph. In our case, we have an infinite set of (potential) nodes and do not prespecify the node degrees. Furthermore, each node in the pair $(U_{k1}, U_{k2})$ is drawn from a normalized CRM rather than the pair being selected uniformly at random. However, at a high level, there is a similar flavour to our construction.

5.5.2. Urn-based simulation of graph restrictions
We now describe an urn formulation that allows us to obtain a finite dimensional generative process. Recall that, in practice, we cannot sample $W_\alpha \sim \mathrm{CRM}(\rho, \lambda_\alpha)$ if the CRM is infinite activity since there will be an infinite number of jumps.
Let $(U'_1, \ldots, U'_{2D_\alpha^*}) = (U_{11}, U_{12}, \ldots, U_{D_\alpha^* 1}, U_{D_\alpha^* 2})$. For some classes of Lévy measure $\rho$, it is possible to integrate out the normalized CRM $\mu_\alpha = W_\alpha / W_\alpha^*$ in expression (30) and derive the conditional distribution of $U'_{n+1}$ given $(W_\alpha^*, U'_1, \ldots, U'_n)$. We first recall some background on random partitions. As $\mu_\alpha$ is discrete with probability 1, variables $U'_1, \ldots, U'_n$ take $k \le n$ distinct values $\tilde U'_j$, with multiplicities $1 \le m_j \le n$. The distribution on the underlying partition is usually defined in terms of an exchangeable partition probability function (EPPF) (Pitman, 1995) $\Pi_n^{(k)}(m_1, \ldots, m_k \mid W_\alpha^*)$ which is symmetric in its arguments. The predictive distribution of $U'_{n+1}$ given $(W_\alpha^*, U'_1, \ldots, U'_n)$ is then given in terms of the EPPF:
$$U'_{n+1} \mid (W_\alpha^*, U'_1, \ldots, U'_n) \sim \frac{\Pi_{n+1}^{(k+1)}(m_1, \ldots, m_k, 1 \mid W_\alpha^*)}{\Pi_n^{(k)}(m_1, \ldots, m_k \mid W_\alpha^*)}\, \frac{1}{\alpha}\lambda_\alpha + \sum_{j=1}^{k} \frac{\Pi_{n+1}^{(k)}(m_1, \ldots, m_j + 1, \ldots, m_k \mid W_\alpha^*)}{\Pi_n^{(k)}(m_1, \ldots, m_k \mid W_\alpha^*)}\, \delta_{\tilde U'_j}. \tag{32}$$
Using this urn representation, we can rewrite our generative process as
$$\begin{aligned} W_\alpha^* &\sim P_{W_\alpha^*}, \\ D_\alpha^* \mid W_\alpha^* &\sim \mathrm{Poisson}(W_\alpha^{*2}), \\ (U_{kj})_{k=1,\ldots,D_\alpha^*;\, j=1,2} \mid W_\alpha^* &\sim \text{urn process (32)}, \\ D_\alpha &= \sum_{k=1}^{D_\alpha^*} \delta_{(U_{k1}, U_{k2})}, \end{aligned} \tag{33}$$
where $P_{W_\alpha^*}$ is the distribution of the CRM total mass $W_\alpha^*$. Representation (33) can be used to sample exactly from our graph model, assuming that we can sample from $P_{W_\alpha^*}$ and evaluate the EPPF. In Section 6 we show that this is indeed possible for specific CRMs of interest.

5.5.3. Approximate simulation of graph restrictions
If we cannot sample from $P_{W_\alpha^*}$ in expression (33) and evaluate the EPPF in expression (32), we resort to approximate simulation methods. In particular, we harness the directed multigraph representation and approximate the draw of $W_\alpha$. For our undirected graphs, we simply transform the (approximate) draw of a directed multigraph as described in Section 3.3.
One approach to approximate simulation of $W_\alpha$, which is possible for some Lévy measures $\rho$, is to resort to adaptive thinning (Lewis and Shedler, 1979; Ogata, 1981; Favaro and Teh, 2013). A related alternative approximate approach, but applicable to any Lévy measure $\rho$ satisfying condition (9), is the inverse Lévy method. This method first defines a threshold $\varepsilon$ and then samples the weights $\{w_i \mid w_i > \varepsilon\}$ by using a Poisson measure on $[\varepsilon, \infty]$. One then simulates $D_\alpha$ using these truncated weights.
A naive application of this truncated method that considers sampling directed or undirected edges as in expression (12) or expression (6) respectively can prove computationally problematic, since a large number of possible edges must be considered (one Poisson or Bernoulli draw for each $(i, j)$ pair for the directed or undirected case). Instead, we can harness the Cox process representation and resulting sampling procedure of expressions (29)–(30) to sample first the total number of directed edges and then their specific instantiations. More specifically, to simulate approximately a point process on $[0, \alpha]^2$, we use the inverse Lévy method to sample
$$\Pi_{\alpha,\varepsilon} = \{(w, \theta) \in \Pi,\ 0 < \theta \le \alpha,\ w > \varepsilon\}. \tag{34}$$
Let $W_{\alpha,\varepsilon} = \sum_{i=1}^{K} w_i \delta_{\theta_i}$ be the associated truncated CRM and $W_{\alpha,\varepsilon}^* = W_{\alpha,\varepsilon}([0, \alpha])$ its total mass. We then sample $D_{\alpha,\varepsilon}^*$ and $U_{kj}$ as in expressions (29)–(30), and set $D_{\alpha,\varepsilon} = \sum_{k=1}^{D_{\alpha,\varepsilon}^*} \delta_{(U_{k1}, U_{k2})}$. The undirected graph measure $Z_{\alpha,\varepsilon}$ is obtained from $D_{\alpha,\varepsilon}$ as in expression (12).

6. Special cases
In this section, we examine the properties of various models and their link to classical random-graph models depending on the Lévy measure $\rho$. We show that, in the GGP case, the resulting graph can be either dense or sparse, with the sparsity tuned by a single hyperparameter. Furthermore, exact simulation is possible via expression (33). We focus on the undirected graph case, but similar results can be obtained for directed multigraphs and bipartite graphs.

6.1. Poisson process
Consider a Poisson process with fixed increments $w_0 > 0$:
$$\rho(dw) = \delta_{w_0}(dw).$$
This measure defines a finite activity CRM. Recalling the definition $\bar\rho(x) = \int_x^\infty \rho(dw)$, in this case, we have
$$\bar\rho(x) = \begin{cases} 1 & \text{if } x < w_0, \\ 0 & \text{otherwise}. \end{cases}$$
Ignoring self-edges, the graph construction can be described as follows. To sample $W_\alpha \sim \mathrm{CRM}(\rho, \lambda_\alpha)$, we generate $n \sim \mathrm{Poisson}(\alpha)$ and then sample $\theta_i \sim \mathrm{Unif}([0, \alpha])$ for $i = 1, \ldots, n$. We then sample edges according to expression (6): for $1 \le i < j \le n$, set $z_{ij} = z_{ji} = 1$ with probability $1 - \exp(-2w_0^2)$ and 0 otherwise. The model is therefore equivalent to the Erdős–Rényi random-graph model $G(n, p)$ with $n \sim \mathrm{Poisson}(\alpha)$ and $p = 1 - \exp(-2w_0^2)$. Therefore, this choice of $\rho$ leads to a dense graph, as our theory suggests, where the number of edges grows quadratically with the number of nodes $n$.
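
The equivalence with $G(n, p)$ makes this case straightforward to simulate. A minimal sketch, with illustrative function names and a simple library-free Poisson sampler that is our own choice:

```python
import math
import random


def poisson(lam, rng):
    """Poisson draw: count unit-rate exponential arrivals before time lam (O(lam) work)."""
    n, t = 0, rng.expovariate(1.0)
    while t < lam:
        n += 1
        t += rng.expovariate(1.0)
    return n


def edge_probability(w0):
    """p = 1 - exp(-2 * w0**2), the common edge probability when all weights equal w0."""
    return -math.expm1(-2.0 * w0 ** 2)


def simulate_fixed_weight_graph(alpha, w0, rng):
    """Erdos-Renyi graph G(n, p) with n ~ Poisson(alpha), ignoring self-edges."""
    n = poisson(alpha, rng)
    thetas = [rng.uniform(0.0, alpha) for _ in range(n)]
    p = edge_probability(w0)
    edges = [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]
    return thetas, edges
```

The double loop visits every pair of nodes, so the cost is quadratic in $n$; this is the naive pairwise scheme that Section 5.5.3 avoids in the general case.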

6.2. Compound Poisson process
A compound Poisson process is a process where
$$\rho(dw) = h(w)\,dw$$
and $h : \mathbb{R}_+ \to \mathbb{R}_+$ is such that $\int_0^\infty h(w)\,dw = 1$ and defines a finite activity CRM. In this case, we have $\bar\rho(x) = 1 - H(x)$ where $H$ is the distribution function that is associated with $h$. Here, we arrive at a framework that is similar to the standard graphon. Leveraging the Kallenberg representation (16), we first sample $n \sim \mathrm{Poisson}(\alpha)$. Then, for each pair $i < j$ in $1, \ldots, n$ we set $z_{ij} = z_{ji} = 1$ with probability $M(U_i, U_j)$ where the $U_i$ are uniform $[0, 1]$ variables and $M$ is defined by
$$M(U_i, U_j) = 1 - \exp\{-2H^{-1}(U_i)\,H^{-1}(U_j)\}.$$
This representation is the same as with the Aldous–Hoover theorem, except that the number of nodes is random and follows a Poisson distribution. As such, the resulting random graph is either trivially empty or dense, again agreeing with our theory.

6.3. Generalized gamma process
The GGP (Hougaard, 1986; Aalen, 1992; Lee and Whitmore, 1993; Brix, 1999) is a flexible two-parameter CRM with interpretable parameters and remarkable conjugacy properties (James, 2002; Lijoi and Prünster, 2003; Lijoi et al., 2007; Caron et al., 2014). The process is also known as the Hougaard process (Hougaard, 1986) when $\lambda$ is the Lebesgue measure, as in this paper, but we shall use the more standard term GGP in the rest of this paper. The Lévy measure of the GGP is given by
$$\rho(dw) = \frac{1}{\Gamma(1 - \sigma)}\, w^{-1-\sigma} \exp(-\tau w)\,dw, \tag{35}$$
where the two parameters $(\sigma, \tau)$ satisfy
$$(\sigma, \tau) \in (-\infty, 0] \times (0, +\infty) \quad \text{or} \quad (\sigma, \tau) \in (0, 1) \times [0, +\infty). \tag{36}$$
The GGP has different properties if $\sigma \ge 0$ or $\sigma < 0$. When $\sigma < 0$, the GGP is a finite activity CRM (i.e. a compound Poisson process); more precisely, the number of jumps in $[0, \alpha]$ is finite with probability 1 and drawn from a Poisson distribution with rate $-(\alpha/\sigma)\tau^{\sigma}$ whereas the jumps $w_i$ are IID gamma$(-\sigma, \tau)$.
When $\sigma \ge 0$, the GGP has an infinite number of jumps over any interval $[s, t]$. It includes as special cases the gamma process ($\sigma = 0$, $\tau > 0$), the stable process ($\sigma \in (0, 1)$, $\tau = 0$) and the inverse Gaussian process ($\sigma = \tfrac{1}{2}$, $\tau > 0$).
The tail Lévy intensity of the GGP is given by
$$\bar\rho(x) = \int_x^\infty \frac{1}{\Gamma(1 - \sigma)}\, w^{-1-\sigma} \exp(-\tau w)\,dw = \begin{cases} \dfrac{\tau^{\sigma}\,\Gamma(-\sigma, \tau x)}{\Gamma(1 - \sigma)} & \text{if } \tau > 0, \\[2ex] \dfrac{x^{-\sigma}}{\Gamma(1 - \sigma)\,\sigma} & \text{if } \tau = 0, \end{cases}$$
where $\Gamma(a, x)$ is the incomplete gamma function. Example realizations of the process for various values of $\sigma \ge 0$ are displayed in Fig. 7 alongside a realization of an Erdős–Rényi graph.

Fig. 7. Sample graphs: (a) Erdős–Rényi graph $G(n, p)$ with $n = 1000$ and $p = 0.05$, and GGP graphs GGP$(\alpha, \tau, \sigma)$ with $\alpha = 100$, $\tau = 2$ and (b) $\sigma = 0$, (c) $\sigma = 0.5$ and (d) $\sigma = 0.8$ (the size of a node is proportional to its degree; the graphs were generated with the software Gephi (Bastian et al., 2009))
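
The tail intensity can be evaluated in closed form in two special cases using only standard functions. The sketch below is illustrative and deliberately restricted: it covers $\tau = 0$ (the stable case) and $\sigma = 1/2$ with $\tau > 0$ (the inverse Gaussian case). For the latter we use the identity $\Gamma(-\tfrac12, z) = 2\{z^{-1/2}e^{-z} - \sqrt{\pi}\,\mathrm{erfc}(\sqrt{z})\}$, which is our own reduction via the recurrence $\Gamma(a+1, z) = a\Gamma(a, z) + z^{a}e^{-z}$ and is not taken from the paper. The function name is hypothetical.

```python
import math


def tail_ggp(x, sigma, tau):
    """Tail Levy intensity of the GGP for tau == 0, or for sigma == 0.5 with tau > 0."""
    if x <= 0.0:
        raise ValueError("need x > 0")
    if tau == 0.0:
        if not 0.0 < sigma < 1.0:
            raise ValueError("tau == 0 requires 0 < sigma < 1")
        return x ** (-sigma) / (sigma * math.gamma(1.0 - sigma))
    if sigma == 0.5 and tau > 0.0:
        z = tau * x
        # Upper incomplete gamma function Gamma(-1/2, z) expressed through erfc.
        inc_gamma = 2.0 * (math.exp(-z) / math.sqrt(z)
                           - math.sqrt(math.pi) * math.erfc(math.sqrt(z)))
        return math.sqrt(tau) * inc_gamma / math.gamma(0.5)
    raise NotImplementedError("general (sigma, tau) needs an incomplete gamma routine")
```

Other parameter values require a numerical incomplete gamma function that accepts a negative first argument, which this sketch does not attempt to provide.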

6.3.1. Exact sampling via an urn approach
In the case $\sigma > 0$, $W_\alpha^*$ is an exponentially tilted stable random variable, for which exact samplers exist (Devroye, 2009). As shown by Pitman (2003) (see also Lijoi et al. (2008)), the EPPF conditional on the total mass $W_\alpha^* = t$ depends only on the parameter $\sigma$ (and not $\tau$ and $\alpha$) and is given by
$$\Pi_n^{(k)}(m_1, \ldots, m_k \mid t) = \frac{\sigma^{k}\, t^{-n}}{\Gamma(n - k\sigma)\, g_\sigma(t)} \int_0^t s^{n - k\sigma - 1}\, g_\sigma(t - s)\,ds\ \prod_{i=1}^{k} \frac{\Gamma(m_i - \sigma)}{\Gamma(1 - \sigma)}, \tag{37}$$
where $g_\sigma$ is the probability density function of the positive stable distribution. Plugging the EPPF (37) into expression (32) yields the urn process for sampling in the GGP case. In particular, we can use the generative process (33) to sample exactly from the model.
In the special case of the gamma process ($\sigma = 0$), $W_\alpha^*$ is a gamma$(\alpha, \tau)$ random variable and the resulting urn process is given by (Blackwell and MacQueen, 1973; Pitman, 1996)
$$U'_{n+1} \mid (W_\alpha^*, U'_1, \ldots, U'_n) \sim \frac{\alpha}{\alpha + n}\, \frac{\lambda_\alpha}{\alpha} + \sum_{j=1}^{k} \frac{m_j}{\alpha + n}\, \delta_{\tilde U'_j}. \tag{38}$$
When $\sigma < 0$, the GGP is a compound Poisson process and can thus be sampled exactly.
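
For the gamma process, the generative process (33) with the urn (38) is short enough to write out. The sketch below is illustrative: function names are ours, gamma$(\alpha, \tau)$ is read as shape $\alpha$ and rate $\tau$ (an assumption about the parameterization), and the Poisson sampler is a simple library-free stand-in.

```python
import random


def poisson(lam, rng):
    """Poisson draw: count unit-rate exponential arrivals before time lam (O(lam) work)."""
    n, t = 0, rng.expovariate(1.0)
    while t < lam:
        n += 1
        t += rng.expovariate(1.0)
    return n


def sample_gamma_process_graph(alpha, tau, rng):
    """Directed multigraph on [0, alpha]^2 via the urn scheme, gamma process case."""
    total_mass = rng.gammavariate(alpha, 1.0 / tau)  # W*_alpha; second argument is a scale
    n_directed = poisson(total_mass ** 2, rng)       # D*_alpha | W*_alpha
    locations, counts, draws = [], [], []
    for n in range(2 * n_directed):
        if rng.random() < alpha / (alpha + n):
            # New node: location drawn uniformly on [0, alpha].
            locations.append(rng.uniform(0.0, alpha))
            counts.append(1)
            draws.append(len(locations) - 1)
        else:
            # Existing node j with probability m_j / n: copy a uniformly chosen earlier draw.
            j = draws[rng.randrange(n)]
            counts[j] += 1
            draws.append(j)
    directed = list(zip(draws[0::2], draws[1::2]))   # pair consecutive draws into edges
    return locations, counts, directed
```

Only nodes that take part in at least one edge are ever instantiated, so the infinite number of jumps of the gamma process never has to be represented.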

6.3.2. Sparsity
Appealing to theorem 2, we use the following facts about the GGP to characterize the sparsity properties of this special case.

(a) For $\sigma < 0$, the CRM is finite activity with $\int_0^\infty w\,\rho(dw) < \infty$; thus theorem 2 implies that the graph is dense.
(b) When $\sigma \ge 0$ the CRM is infinite activity; moreover, for $\tau > 0$, $\int_0^\infty w\,\rho(dw) < \infty$, and thus theorem 2 implies that the graph is sparse.
(c) For $\sigma > 0$, the tail Lévy intensity has the asymptotic behaviour
$$\bar\rho(x) \overset{x \downarrow 0}{\sim} \frac{1}{\sigma\,\Gamma(1 - \sigma)}\, x^{-\sigma}$$
and, as such, is regularly varying with exponent $\sigma$ and constant slowly varying function.

We thus conclude that
$$N_\alpha^{(e)} = \begin{cases} \Theta(N_\alpha^2) & \text{if } \sigma < 0, \\ o(N_\alpha^2) & \text{if } \sigma = 0,\ \tau > 0, \\ O(N_\alpha^{2/(1+\sigma)}) & \text{if } \sigma \in (0, 1),\ \tau > 0, \end{cases} \tag{39}$$
almost surely as $\alpha \to \infty$, i.e. the GGP parameter $\sigma$ tunes the sparsity of the graph: the underlying graph is sparse if $\sigma \ge 0$ and dense otherwise.

Remark 2. The proof technique of theorem 2 requires $\int_0^\infty w\,\rho(dw) < \infty$ and thus excludes the stable process ($\tau = 0$, $\sigma \in (0, 1)$), although we conjecture that the graph is also sparse in that case.

Additionally, applying theorem 5, we obtain
$$E[N_\alpha] \sim \begin{cases} -\dfrac{\alpha\,\tau^{\sigma}}{\sigma} & \text{if } \sigma < 0, \\[1.5ex] \alpha \log(\alpha) & \text{if } \sigma = 0, \\[1ex] \alpha^{1+\sigma}\, \dfrac{2^{\sigma}\,\tau^{\sigma(\sigma - 1)}}{\sigma} & \text{if } \sigma > 0,\ \tau > 0. \end{cases}$$
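
For $\sigma > 0$ and $\tau > 0$, the constant in the last display follows from equation (28) in two steps. The sketch below assumes that definition 1 of Appendix A.2 normalizes regular variation so that the constant slowly varying function of the GGP is $l(t) = 1/\{\sigma\Gamma(1-\sigma)\}$, as suggested by fact (c) above; the moment computation is ours.

```latex
\begin{align*}
\int_0^\infty w\,\rho(dw)
  &= \frac{1}{\Gamma(1-\sigma)} \int_0^\infty w^{-\sigma} e^{-\tau w}\,dw
   = \frac{\Gamma(1-\sigma)\,\tau^{\sigma-1}}{\Gamma(1-\sigma)} = \tau^{\sigma-1}, \\
E[N_\alpha]
  &\sim \alpha^{1+\sigma}\,\frac{1}{\sigma\,\Gamma(1-\sigma)}\,\Gamma(1-\sigma)\,
        \bigl(2\,\tau^{\sigma-1}\bigr)^{\sigma}
   = \alpha^{1+\sigma}\,\frac{2^{\sigma}\,\tau^{\sigma(\sigma-1)}}{\sigma}.
\end{align*}
```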

6.3.3. Empirical analysis of graph properties
For the GGP-based formulation, we provide an empirical analysis of our network properties in Fig. 8 by simulating undirected graphs by using the approach that was described in Section 5.5 for various values of $\sigma$ and $\tau$. We compare with an Erdős–Rényi random graph, preferential attachment (Barabási and Albert, 1999) and the Bayesian non-parametric network model of Lloyd et al. (2012). The particular features that we explore are as follows.

(a) Degree distribution: Fig. 8(a) suggests empirically that the model can exhibit power law behaviour providing a heavy-tailed degree distribution. As shown in Fig. 8(b), the model can also handle an exponential cut-off in the tails of the degree distribution, which is an attractive property (Clauset et al., 2009; Olhede and Wolfe, 2012).
(b) Number of degree 1 nodes: Fig. 8(c) examines the number of degree 1 nodes versus the number of nodes.
(c) Sparsity: Fig. 8(d) plots number of edges versus number of nodes. The larger $\sigma$, the sparser the graph is. In particular, for the GGP random-graph model, we have network growth at a rate $O(n^a)$ for $1 < a < 2$ whereas the Erdős–Rényi (dense) graph grows as $\Theta(n^2)$.

Fig. 8. Examination of the GGP undirected network properties (averaging over graphs with various $\alpha$) in comparison with an Erdős–Rényi $G(n, p)$ model with $p = 0.05$, the preferential attachment model of Barabási and Albert (1999) and the non-parametric formulation of Lloyd et al. (2012): degree distribution on a log–log-scale for (a) various values of $\sigma$ ($\sigma = 0.2$; $\sigma = 0.5$; $\sigma = 0.8$) ($\tau = 10^{-2}$) and (b) various values of $\tau$ ($\tau = 10^{-1}$; $\tau = 1$; $\tau = 5$) ($\sigma = 0.5$) for the GGP; (c) number of nodes with degree 1 versus number of nodes on a log–log-scale ($\sigma = 0.2$; $\sigma = 0.5$; $\sigma = 0.8$) (note that the Lloyd method leads to dense graphs such that no node has only degree 1); (d) number of edges versus number of nodes ($\sigma = 0.2$; $\sigma = 0.5$; $\sigma = 0.8$) (here we note growth at a rate $o(n^2)$ for our GGP graph models, and $\Theta(n^2)$ for the Erdős–Rényi and Lloyd models (dense graphs))

6.3.4. Interpretation of hyperparameters
On the basis of the properties derived and illustrated empirically in this section, we see that our hyperparameters have the following interpretations.

(a) $\sigma$: from Figs 8(a) and 8(d), $\sigma$ relates to the slope of the degree distribution in its power law regime and the overall network sparsity. Increasing $\sigma$ leads to higher power law exponent and sparser networks.
(b) $\alpha$: from theorem 5, $\alpha$ provides an overall scale that affects the number of nodes and directed interactions, with larger $\alpha$ leading to larger networks.
(c) $\tau$: from Fig. 8(b), $\tau$ determines the exponential decay of the tails of the degree distribution, with small $\tau$ looking like pure power law. This is intuitive from the form of $\rho(dw)$ in equation (35), where we see that $\tau$ affects large weights more than small weights.

7. Posterior characterization and inference
In this section, we consider the posterior characterization and MCMC inference of parameters and hyperparameters in our statistical network models.
Assume that we have observed a set of undirected connections $(z_{ij})_{1 \le i, j \le N_\alpha}$ or directed connections $(n_{ij})_{1 \le i, j \le N_\alpha}$ where $N_\alpha$ is the observed number of nodes with at least one connection. Without loss of generality, we assume that the locations of these nodes $0 < \theta_1 < \ldots < \theta_{N_\alpha} < \alpha$ are ordered, and we write $w_i = W(\{\theta_i\})$ as their associated sociability parameters. For simplicity, we are overloading notation here with the unordered nodes in $W = \sum_i w_i \delta_{\theta_i}$ of equation (7).
We aim to infer the sociability parameters $w_i$, $i = 1, \ldots, N_\alpha$, for each of the observed nodes. We also aim to infer the sociability parameters of the nodes with no connections (the difference between the set of potential nodes and those with observed interactions). We refer to these as unobserved nodes. Under our framework, the number of such nodes is either finite but unknown or infinite. The observed connections, however, provide information about only the sum of their sociabilities, denoted $w_*$. The node locations $\theta_i$ of both observed and unobserved nodes are also not likelihood identifiable and are thus ignored. We additionally aim to estimate $\alpha$ and the hyperparameters of the Lévy intensity $\rho$ of the CRM; we write $\phi$ for the set of hyperparameters. We therefore aim to approximate the posterior $p(w_1, \ldots, w_{N_\alpha}, w_*, \phi \mid (z_{ij})_{1 \le i, j \le N_\alpha})$ for an observed undirected graph and $p(w_1, \ldots, w_{N_\alpha}, w_*, \phi \mid (n_{ij})_{1 \le i, j \le N_\alpha})$ for an observed directed graph. (Formally, this density is with respect to a product measure that has a Dirac mass at 0 for $w_*$, as detailed in Appendix F.)

7.1. Directed multigraph posterior
In theorem 6, we characterize the posterior in the directed multigraph case. This plays a key role in the undirected case that is explored in Section 7.2 as well.

Theorem 6. For $N_\alpha \ge 1$, let $\theta_1 < \ldots < \theta_{N_\alpha}$ be the set of support points of the measure $D_\alpha$ such that $D_\alpha = \sum_{1 \le i, j \le N_\alpha} n_{ij}\, \delta_{(\theta_i, \theta_j)}$. Let $w_i = W(\{\theta_i\})$ and $w_* = W_\alpha^* - \sum_{i=1}^{N_\alpha} w_i$. We have
$$P\{(w_i \in dw_i)_{1 \le i \le N_\alpha},\ w_* \in dw_* \mid (n_{ij})_{1 \le i, j \le N_\alpha}, \phi\} \propto \exp\left\{-\left(\sum_{i=1}^{N_\alpha} w_i + w_*\right)^2\right\} \left\{\prod_{i=1}^{N_\alpha} w_i^{m_i}\, \rho(dw_i)\right\} G_\alpha^*(dw_*) \tag{40}$$
where $m_i = \sum_{j=1}^{N_\alpha} (n_{ij} + n_{ji}) > 0$ for $i = 1, \ldots, N_\alpha$ are the node degrees of the multigraph and $G_\alpha^*$ is the probability distribution of the random variable $W_\alpha^*$, with Laplace transform
$$E[\exp(-t W_\alpha^*)] = \exp\{-\alpha\,\psi(t)\}. \tag{41}$$
Additionally, conditionally on observing an empty graph, i.e. $N_\alpha = 0$, we have
$$P(w_* \in dw_* \mid N_\alpha = 0, \phi) \propto \exp(-w_*^2)\, G_\alpha^*(dw_*). \tag{42}$$

The proof builds on posterior characterizations for normalized CRMs (James, 2002, 2005; Prünster, 2002; Pitman, 2003; James et al., 2009) using the hierarchical construction of expressions (29)–(30). See Appendix E.
The conditional distribution of $(w_1, \ldots, w_{N_\alpha}, w_*)$ given $(n_{ij})_{1 \le i, j \le N_\alpha}$ does not depend on the locations $(\theta_1, \ldots, \theta_{N_\alpha})$ because we considered a homogeneous CRM. This fact is important since the locations $(\theta_1, \ldots, \theta_{N_\alpha})$ are typically not observed, and our algorithm outlined below will not consider these terms in the inference.

7.2. Markov chain Monte Carlo sampling for generalized gamma process based directed and undirected graphs
We now specialize to the case of the GGP, for which we derive an MCMC sampler for posterior inference. Let $\phi = (\alpha, \sigma, \tau)$ be the set of hyperparameters that we also want to estimate. We assume improper priors on the hyperparameters:
$$p(\alpha) \propto 1/\alpha, \qquad p(\sigma) \propto 1/(1 - \sigma), \qquad p(\tau) \propto 1/\tau. \tag{43}$$
To emphasize the dependence on the hyperparameters of the Lévy measure and distribution of the total mass $w_*$, we write $\rho(w \mid \sigma, \tau)$ and $G_{\alpha,\sigma,\tau}^*(dw_*)$.
In the case of an undirected graph, we simply impute the missing directed edges in the graph. For each $i \le j$ such that $z_{ij} = 1$, we introduce latent variables $\bar n_{ij} = n_{ij} + n_{ji}$ with conditional distribution
$$\bar n_{ij} \mid z, w \sim \begin{cases} \delta_0 & \text{if } z_{ij} = 0, \\ \mathrm{tPoisson}(2 w_i w_j) & \text{if } z_{ij} = 1,\ i \ne j, \end{cases} \tag{44}$$
and $n_{ii} \mid z_{ii} = 1, w_i \sim \mathrm{tPoisson}(w_i^2)$, where $\mathrm{tPoisson}(\lambda)$ is the zero-truncated Poisson distribution with probability mass function
$$\frac{\lambda^{k} \exp(-\lambda)}{\{1 - \exp(-\lambda)\}\, k!}, \quad \text{for } k = 1, 2, \ldots.$$
By convention, we set $\bar n_{ij} = \bar n_{ji}$ for $j < i$ and $m_i = \sum_{j=1}^{N_\alpha} \bar n_{ij}$.
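
The zero-truncated Poisson draw needed in step (44) can be implemented by sequential inversion of the probability mass function above. This is one possible sampler, not necessarily the one used by the authors; the function names are ours and the code assumes a moderate rate `lam` (roughly below 700, so that `math.expm1(lam)` does not overflow).

```python
import math
import random


def tpoisson_pmf(k, lam):
    """P(K = k) for the zero-truncated Poisson distribution with rate lam > 0."""
    if k < 1:
        return 0.0
    log_num = k * math.log(lam) - lam - math.lgamma(k + 1)
    return math.exp(log_num) / (-math.expm1(-lam))


def sample_tpoisson(lam, rng):
    """Draw from tPoisson(lam) by walking the CDF upwards from k = 1."""
    u = rng.random()
    k = 1
    p = lam / math.expm1(lam)  # P(K = 1) = lam * exp(-lam) / (1 - exp(-lam))
    cdf = p
    while u > cdf and p > 0.0:  # p > 0 guards against rounding in the far tail
        k += 1
        p *= lam / k            # P(K = k) = P(K = k - 1) * lam / k
        cdf += p
    return k
```

Inversion avoids the rejection loop that would be wasteful when $2 w_i w_j$ is small, which is the typical situation for pairs of low degree nodes.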
For scalable exploration of the target posterior, we propose to use HMC (Duane et al., 1987; Neal, 2011) within Gibbs sampling to update the weights $(w_1, \ldots, w_{N_\alpha})$. The HMC step requires computation of the gradient of the log-posterior, which in our case, letting $\omega_i = \log(w_i)$, is given by
$$[\nabla_{\omega_{1:N_\alpha}} \log\{p(\omega_{1:N_\alpha}, w_* \mid D_\alpha)\}]_i = m_i - \sigma - w_i \left(\tau + 2 \sum_{j=1}^{N_\alpha} w_j + 2 w_*\right). \tag{45}$$
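
The gradient (45) is inexpensive to compute and easy to check numerically. In the sketch below, `log_posterior` is our own reconstruction, up to an additive constant, of the log-density in the log-weights obtained by combining equations (40) and (35) with the Jacobian of $\omega_i = \log(w_i)$; it is included only to validate the gradient and is an assumption rather than a formula quoted from the paper.

```python
import math


def log_posterior(omega, w_star, m, sigma, tau):
    """Unnormalized log-posterior of the log-weights omega (terms depending on omega only)."""
    w = [math.exp(o) for o in omega]
    total = sum(w) + w_star
    return -total ** 2 + sum((mi - sigma) * oi - tau * wi
                             for mi, oi, wi in zip(m, omega, w))


def grad_log_posterior(omega, w_star, m, sigma, tau):
    """Gradient with respect to omega, following equation (45)."""
    w = [math.exp(o) for o in omega]
    s = sum(w)
    return [mi - sigma - wi * (tau + 2.0 * s + 2.0 * w_star) for mi, wi in zip(m, w)]
```

Each evaluation costs a single pass over the nodes, which is why the HMC update scales linearly in the number of observed nodes for each leapfrog step.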
For the update of the total mass $w_*$ and hyperparameters $\phi$, we use a Metropolis–Hastings step. Unless $\sigma = 0$ or $\sigma = \tfrac{1}{2}$, $G_{\alpha,\sigma,\tau}^*(dw_*)$ does not admit any tractable analytical expression. We therefore use a specific proposal for $w_*$ based on exponential tilting of $G_{\alpha,\sigma,\tau}^*$ that alleviates the need to evaluate this probability density function in the Metropolis–Hastings ratio (see the details in Appendix F). To summarize, the MCMC sampler is defined as follows.

Step 1: update the weights $(w_1, \ldots, w_{N_\alpha})$ given the rest by using an HMC update.
Step 2: update the total mass $w_*$ and hyperparameters $\phi = (\alpha, \sigma, \tau)$ given the rest by using a Metropolis–Hastings update.
Step 3: (undirected graph) update the latent counts $(\bar n_{ij})$ given the rest by using the conditional distribution (44) or a Metropolis–Hastings update.

The computational bottlenecks lie in steps 1 and 3, which roughly scale linearly in the number of nodes and edges respectively, although one can parallelize step 3 over edges. If $L$ is the number of leapfrog steps in the HMC algorithm and $n_{\mathrm{iter}}$ the number of MCMC iterations, the overall complexity is in $O\{n_{\mathrm{iter}}(L N_\alpha + N_\alpha^{(e)})\}$. We show in Section 8 that the algorithm scales well to large networks with hundreds of thousands of nodes and edges. To scale the HMC algorithm to even larger collections of nodes and edges, one could explore the methods of Chen et al. (2014).

8. Experiments

8.1. Simulated data
We first study the convergence of the MCMC algorithm on simulated data where the graph is simulated from our model. We simulated a GGP undirected graph with parameters $\alpha = 300$, $\sigma = 0.5$ and $\tau = 1$, which places us in the sparse regime. The sampled graph resulted in 13 995 nodes and 76 605 edges. We ran three MCMC chains each with 40 000 iterations and with different initial values and $L = 10$ leapfrog steps; the step size of the leapfrog algorithm was adapted during the first 10 000 iterations to obtain an acceptance rate of 0.6. Standard deviations of the random-walk Metropolis–Hastings steps for $\log(\tau)$ and $\log(1 - \sigma)$ were set to 0.02. The computing time for running the three chains successively was 10 min using a MATLAB implementation on a standard computer (central processor unit at 3.10 GHz; four cores). Trace plots of the parameters $\alpha$, $\sigma$, $\tau$ and $w_*$ are given in Fig. 9. We computed the potential scale reduction factor (Brooks and Gelman, 1998; Gelman et al., 2014) for all 13 999 parameters ($w_{1:N_\alpha}$, $w_*$, $\alpha$, $\sigma$ and $\tau$) and found a maximum value of 1.01, suggesting convergence of the algorithm. This is quite remarkable as the MCMC sampler actually samples from a target distribution of dimension 13 995 + 76 605 + 4 = 90 604. Posterior credible intervals of the sociability parameters $w_i$ of the nodes with highest degrees and log-sociability parameters $\log(w_i)$ of the nodes with lowest degrees are displayed in Figs 10(a) and 10(b) respectively, showing the ability of the method to recover sociability parameters of both low and high degree nodes accurately.
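For readers who want to reproduce the convergence diagnostic, here is a minimal sketch of the basic (non-split) potential scale reduction factor for a single scalar parameter. It is an assumption that this simple variant is adequate; the cited references describe refinements, and the function name is hypothetical.

```python
import numpy as np

def potential_scale_reduction(chains):
    """Basic Gelman-Rubin statistic for an array of shape (n_chains, n_draws)."""
    chains = np.asarray(chains, dtype=float)
    n = chains.shape[1]
    within = chains.var(axis=1, ddof=1).mean()      # W: mean within-chain variance
    between = n * chains.mean(axis=1).var(ddof=1)   # B: n times the variance of chain means
    pooled = (n - 1) / n * within + between / n     # pooled estimate of the posterior variance
    return np.sqrt(pooled / within)
```

Applied to each of the 13 999 parameters in turn (after discarding the adaptation period), the maximum of the returned values is the quantity compared with 1.01 in the text.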
To show the versatility of the GGP graph model, we now examine our approach when the observed graph is actually generated from an Erdős–Rényi model with $n = 1000$ and $p = 0.01$. The generated graph had 1000 nodes and 5058 edges. We ran three MCMC chains with the same specifications as above. In this dense graph regime, the following transformation of our parameters $\alpha$, $\sigma$ and $\tau$ is more informative: $\varsigma_1 = -(\alpha/\sigma)\tau^{\sigma}$, $\varsigma_2 = -\sigma/\tau$ and $\varsigma_3 = -\sigma/\tau^2$. When $\sigma < 0$, $\varsigma_1$ corresponds to the expected number of nodes, $\varsigma_2$ to the mean of the sociability parameters and $\varsigma_3$ to their variance (see Section 6.3). In contrast, the parameters $\sigma$ and $\tau$ are only weakly identifiable in this case. The potential scale reduction factor is computed on $(w_{1:N_\alpha}, w_*, \varsigma_1, \varsigma_2, \varsigma_3)$, and its maximum value was 1.01, suggesting convergence. The value of $\varsigma_1$ converges around the true number of nodes and $\varsigma_2$ to the true sociability parameter $\sqrt{-\tfrac{1}{2}\log(1 - p)}$ (constant across nodes for the Erdős–Rényi model), whereas $\varsigma_3$ is close to 0 as the variance over the sociability parameters is very small. The total mass is also very close to 0, indicating that there are no nodes with degree 0.
Posterior credible intervals for the nodes with highest and lowest degrees are in Fig. 11, showing that the model can accurately recover sociability parameters of both low and high degree nodes in the dense regime as well.

Fig. 9. MCMC trace plots of parameters (a) $\alpha$, (b) $\sigma$, (c) $\tau$ and (d) $w_*$ for a graph generated from a GGP model with parameters $\alpha = 300$, $\sigma = 0.5$ and $\tau = 1$ (chains 1, 2 and 3 and the true value are shown)

Fig. 10. 95% posterior intervals of (a) the sociability parameters $w_i$ of the 50 nodes with highest degree and (b) the log-sociability parameters $\log(w_i)$ of the 50 nodes with lowest degree, for a graph generated from a GGP model with parameters $\alpha = 300$, $\sigma = 0.5$ and $\tau = 1$ (true values are also shown)

Fig. 11. 95% posterior intervals of (a) sociability parameters $w_i$ of the 50 nodes with highest degree and (b) log-sociability parameters $\log(w_i)$ of the 50 nodes with lowest degree, for a graph generated from an Erdős–Rényi model with parameters $n = 1000$ and $p = 0.01$: in this case, all nodes have the same true sociability parameter $\sqrt{-\tfrac{1}{2}\log(1 - p)}$
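The value of the common sociability parameter in the Erdős–Rényi experiment can be verified in a few lines. The sketch assumes the link probability $1 - \exp(-2 w_i w_j)$ between distinct nodes $i$ and $j$ under the model; the function name is hypothetical.

```python
import math

def er_sociability(p):
    """Common weight w solving 1 - exp(-2 w^2) = p, i.e. w = sqrt(-log(1 - p) / 2)."""
    return math.sqrt(-0.5 * math.log1p(-p))

w = er_sociability(0.01)
link_prob = 1.0 - math.exp(-2.0 * w * w)  # edge probability between two distinct nodes
```

With $p = 0.01$ the weight is roughly 0.07, and `link_prob` recovers $p$ up to floating-point error.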

8.2. Assessing properties of real world graphs
We now turn to using our methods to assess properties of a collection of real world graphs, including their degree distributions and aspects of sparsity. For the latter, evaluation based on a single finite graph is notoriously challenging as sparsity relates to the asymptotic behaviour of the graph. Measures of sparsity from finite graphs exist but can be costly to implement (Nešetřil and Ossona de Mendez, 2012). On the basis of our GGP-based formulation and associated theoretical results described in Section 6, we consider $\Pr(\sigma \ge 0 \mid z)$ as informative of the connectivity structure of the graph since the GGP graph model yields dense graphs for $\sigma < 0$, and sparse graphs for $\sigma \in [0, 1)$ (see equation (39)). For our analyses, we consider improper priors on the unknown parameters $(\alpha, \sigma, \tau)$. We report $\Pr(\sigma \ge 0 \mid z)$ based on a set of observed connections $(z_{ij})_{1 \le i,j \le N_\alpha}$, which can be directly approximated from the MCMC output. We consider 12 different data sets:

(a) facebook107: social circles in Facebook (https://snap.stanford.edu/data/egonets-Facebook.html) (McAuley and Leskovec, 2012);
(b) polblogs: political blogosphere (February 2005) (http://www.cise.ufl.edu/research/sparse/matrices/Newman/polblogs) (Adamic and Glance, 2005);
(c) USairport: US airport connection network in 2010 (http://toreopsahl.com/datasets/) (Colizza et al., 2007);
(d) UCirvine: social network of students at the University of California, Irvine (http://toreopsahl.com/datasets/) (Opsahl and Panzarasa, 2009);
(e) yeast: yeast protein interaction network (http://www.cise.ufl.edu/research/sparse/matrices/Pajek/yeast.html) (Bu et al., 2003);
(f) USpower: network of the high-voltage power grid in the western states of the USA (https://snap.stanford.edu/data/email-Enron.html) (Watts and Strogatz, 1998);
(g) IMDB: actor collaboration network based on acting in the same movie (http://www.cise.ufl.edu/research/sparse/matrices/Pajek/IMDB.html);
(h) cond-mat1: co-authorship network (https://snap.stanford.edu/data/email-Enron.html) (Newman, 2001), based on preprints posted to condensed matter of arXiv between 1995 and 1999, obtained from the bipartite preprints–authors network using a one-mode projection;
(i) cond-mat2: as in cond-mat1, but using Newman's projection method;
(j) Enron: Enron collaboration network from a multigraph e-mail network (https://snap.stanford.edu/data/email-Enron.html);
(k) internet: connectivity of internet routers (http://www.cise.ufl.edu/research/sparse/matrices/Pajek/internet.html);
(l) www: linked World Wide Web pages in the nd.edu domain (http://lisgi1.engr.ccny.cuny.edu/makse/soft data.html).

The sizes of the various data sets are given in Table 2 and range from a few hundred nodes or edges to a million. The adjacency matrices for these networks are plotted in Fig. 12 and empirical degree distributions in Fig. 13 (red).
We ran three MCMC chains for 40 000 iterations with the same specifications as above and report the estimate of $\Pr(\sigma \ge 0 \mid z)$ and 99% posterior credible intervals of $\sigma$ in Table 2; we additionally provide run times. MCMC trace plots suggested rapid convergence of the sampler. Since sparsity is an asymptotic property of a graph, and we are analysing finite graphs, our inference of $\sigma$ here simply provides insight into some structure of the graph and is not formally a test of sparsity. From Table 2, we note that we infer negative $\sigma$-values for many of the smaller networks. This might indicate that these graphs have dense connectivity; for example, our facebook107 data set represents a small social circle that is probably highly interconnected and the polblogs data set represents two tightly connected political parties. We infer positive $\sigma$-values for three of the data sets (USairport, Enron and www); note that two of these data sets are in the top three largest networks considered, where sparse connectivity is more commonplace. In the remaining large network, internet, a question is why the inferred $\sigma$ is negative. This may be due to dense subgraphs or spots (for example, spatially proximate routers may be highly interconnected, but sparsely connected outside the group) (Borgs et al., 2014b). This relates to the idea of community structure, though not every node need be associated with a community. As in many sparse network models that assume no dense spots (Bollobás and Riordan, 2009; Wolfe and Olhede, 2013), our approach does not explicitly model such effects. Capturing such structure remains a direction of future research that is likely to be feasible within our generative framework. However, our current method has the benefit of simplicity with three hyperparameters tuning the network properties. Finally, we note in Table 2 that our analyses finish in a remarkably short time although the code base was implemented in MATLAB on a standard desktop machine, without leveraging possible opportunities for parallelizing and other mechanisms for scaling the sampler (see Section 7 for a discussion).

Table 2. Size of real world data sets and posterior probability of sparsity

Data set       Number of nodes   Number of edges   Time (min)   Pr(σ ≥ 0|z)   99% credible interval of σ
facebook107               1034             26749            1          0.00   [−1.06, −0.82]
polblogs                  1224             16715            1          0.00   [−0.35, −0.20]
USairport                 1574             17215            1          1.00   [0.10, 0.18]
UCirvine                  1899             13838            1          0.00   [−0.14, −0.02]
yeast                     2284              6646            1          0.28   [−0.09, 0.05]
USpower                   4941              6594            1          0.00   [−4.84, −3.19]
IMDB                     14752             38369            2          0.00   [−0.24, −0.17]
cond-mat1                16264             47594            2          0.00   [−0.95, −0.84]
cond-mat2                 7883              8586            1          0.00   [−0.18, −0.02]
Enron                    36692            183831            7          1.00   [0.20, 0.22]
internet                124651            193620           15          0.00   [−0.20, −0.17]
www                     325729           1090108          132          1.00   [0.26, 0.30]
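The two posterior summaries reported in Table 2 are simple functionals of the MCMC draws of $\sigma$. A minimal sketch follows, assuming an equal-tailed interval and a user-chosen burn-in (neither is specified in the text); the function name is hypothetical.

```python
import numpy as np

def sparsity_summary(sigma_draws, burn_in=0, level=0.99):
    """Monte Carlo estimate of Pr(sigma >= 0 | z) and an equal-tailed credible interval."""
    draws = np.asarray(sigma_draws, dtype=float)[burn_in:]
    prob_sparse = float(np.mean(draws >= 0.0))   # fraction of draws in the sparse regime
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(draws, [tail, 1.0 - tail])
    return prob_sparse, (float(lower), float(upper))
```

With several chains, one would concatenate the post-burn-in draws before calling the function.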
To assess our fit to the empirical degree distributions, we used the methods that were described in Section 5.5 to simulate 5000 graphs from the posterior predictive distribution in each scenario. Fig. 13 provides a comparison between the empirical degree distributions and those based on the simulated graphs. In all cases, we see a reasonably good fit. For the largest networks, Figs 13(j)–13(l), we see a slight underestimate of the tail of the distribution, i.e. we do not capture as many high degree nodes as truly present. This may be because these graphs exhibit a power law behaviour, but only after a certain node degree (Clauset et al., 2009), which is not an effect that is explicitly modelled by our framework. Instead, our model averages the error in the low and high degree nodes. Another reason for underestimating the tails might be dense spots, which we also do not explicitly model. However, our model does capture power law behaviour with possible exponential cut-off in the tail. We see a similar trend for cond-mat1, but not cond-mat2. Based on the bipartite articles–authors graph, cond-mat1 uses the standard one-mode projection and sets a connection between two authors who have co-authored a paper; this projection clearly creates dense spots in the graph. In contrast, cond-mat2 uses Newman's projection method (Newman et al., 2001). This method constructs a weighted undirected graph by counting the number of papers that were co-authored by two scientists, where each count is normalized by the number of authors on the paper. To construct the undirected graph, an edge is created if the weight is equal to or greater than 1; cond-mat1 and cond-mat2 thus have a different number of edges and nodes, as only nodes with at least one connection are considered. It is interesting that the projection method that was used for the cond-mat data set has a clear influence on the sparsity of the resulting graph, cond-mat2 being less dense than cond-mat1 (see Figs 13(h) and 13(i)). The degree distribution for cond-mat1 is similar to that of internet, thus inheriting the same issues as previously discussed. Overall, it appears that our model better captures homogeneous power law behaviour with possible exponential cut-off in the tails than it does a graph with perhaps structured dense spots or power-law-after-a-point behaviour.

Fig. 12. Adjacency matrices for various real world networks: (a) facebook107; (b) polblogs; (c) USairport; (d) UCirvine; (e) yeast; (f) USpower; (g) IMDB; (h) cond-mat1; (i) cond-mat2; (j) Enron; (k) internet; (l) www

Fig. 13. Empirical degree distribution and posterior predictive for various real world networks: (a) facebook107; (b) polblogs; (c) USairport; (d) UCirvine; (e) yeast; (f) USpower; (g) IMDB; (h) cond-mat1; (i) cond-mat2; (j) Enron; (k) internet; (l) www
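The difference between the two projections discussed above can be made concrete with a small sketch. It rests on an assumption: each paper with $n$ authors contributes $1/(n-1)$ to every co-author pair, which is the normalization usually associated with Newman's weighted collaboration networks, whereas the text only says that counts are normalized by the number of authors. Function names and the toy input are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def one_mode_edges(papers):
    """Standard projection: link every pair of authors sharing at least one paper."""
    return {frozenset(pair) for authors in papers
            for pair in combinations(sorted(set(authors)), 2)}

def newman_edges(papers, threshold=1.0):
    """Weighted projection, keeping pairs whose weight reaches `threshold`.

    Assumption: a paper with n authors adds 1 / (n - 1) to each co-author pair;
    swap in another normalization if the data set was built differently.
    """
    weight = defaultdict(float)
    for authors in papers:
        authors = sorted(set(authors))
        if len(authors) < 2:
            continue  # single-author papers create no collaboration
        for pair in combinations(authors, 2):
            weight[frozenset(pair)] += 1.0 / (len(authors) - 1)
    return {pair for pair, w in weight.items() if w >= threshold}
```

On a toy input such as `[["a", "b"], ["a", "b", "c"], ["c", "d"]]`, the standard projection keeps all four co-author pairs, while the thresholded projection drops the pairs that only met on the three-author paper. This is the mechanism by which large author lists create dense spots in cond-mat1 but not in cond-mat2.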

9. Conclusion

We proposed a class of statistical network models building on exchangeable random measures. Using this representation, we showed how it is possible to specify models with properties that are different from those of models based on exchangeable adjacency matrices. As an example, we considered a model building on the framework of CRMs that can yield sparse graphs while maintaining attractive exchangeability properties. For a choice of CRMs, our fully generative formulation can yield networks ranging from dense to sparse, as tuned by a single hyperparameter.
In this paper, exchangeability is in the context of random measures for which we appealed to the Kallenberg representation in place of the Aldous–Hoover theorem for exchangeable arrays. Using this framework, we arrived at a structure that is analogous to the graphon, which opens up new modelling and theoretical analysis possibilities beyond those of the special case that is considered herein. Importantly, through the exchangeability of the underlying random measures and leveraging HMC sampling, we devised a scalable algorithm for posterior computations. This scheme enables inference of the graph parameters, including the parameter determining the sparsity of the graph. We examined our methods on a range of real world networks, demonstrating that our model yields a practically useful statistical tool for network analysis.
We believe that the foundational modelling tools and theoretical results that we presented represent an important building block for future developments. Such developments can be divided along two dimensions:

(a) modelling advances, such as incorporating notions of community structure and node attributes, within this framework and
(b) theoretical analyses looking at the properties of the corresponding class of networks.

For the latter, we considered just one simplified version of the Kallenberg representation; examining a more general form could yield graphs with additional structure. Building on an initial version of this paper (Caron and Fox, 2014), initial forays into advances on the modelling side can be found in Herlau et al. (2016) and Todeschini and Caron (2016) and theoretical analyses in Veitch and Roy (2015) and Borgs et al. (2016).

Acknowledgements

The authors thank Bernard Bercu, Arnaud Doucet, Yee Whye Teh, Stefano Favaro, Dan Roy, Lancelot James, Jennifer Chayes, Christian Borgs, Judith Rousseau, George Deligiannidis and Konstantina Palla for helpful discussions and feedback on earlier versions of this paper.
FC acknowledges the support of the European Commission under the Marie Curie intra-European fellowship programme. This work was partially supported by the Alan Turing Institute under Engineering and Physical Sciences Research Council grant EP/N510129/1 and by the BNPSI ANR project ANR-13-BS-03-0006-01.
EBF was supported in part by DARPA grant FA9550-12-1-0406 negotiated by AFOSR and AFOSR grant FA9550-12-1-0453.

Appendix A: Further background

A.1. Aldous–Hoover theorem and graphons
In theorem 7, we present the Aldous–Hoover theorem for the case of joint exchangeability (i.e. symmetric permutations of rows and columns), which is applicable to matrices $Z$ where both rows and columns index the same set of nodes. Separate exchangeability allows for different row and column permutations, making it applicable to scenarios where one has distinct node identities on rows and columns, such as in the bipartite graphs that we considered in Section 3.4. Extensions to higher dimensional arrays are likewise straightforward. For a more general statement of the Aldous–Hoover theorem, which holds for separate exchangeability and higher dimensional arrays, see Orbanz and Roy (2015).

Theorem 7 (Aldous–Hoover representation of jointly exchangeable matrices (Aldous, 1981; Hoover, 1979)). A random matrix $(Z_{ij})_{i,j \in \mathbb{N}}$ is jointly exchangeable if and only if there is a random measurable function $f : [0, 1]^3 \to \mathcal{Z}$ such that

$$(Z_{ij}) \overset{d}{=} (f(U_i, U_j, U_{ij})), \tag{46}$$

where $(U_i)_{i \in \mathbb{N}}$ and $(U_{ij})_{i,j>i \in \mathbb{N}}$ with $U_{ij} = U_{ji}$ are a sequence and matrix respectively of IID uniform[0, 1] random variables.

For undirected graphs where $Z$ is a binary, symmetric adjacency matrix, the Aldous–Hoover representation can be expressed as the existence of a graphon $M : [0, 1]^2 \to [0, 1]$, symmetric in its arguments, where

$$f(U_i, U_j, U_{ij}) = \begin{cases} 1 & U_{ij} < M(U_i, U_j),\\ 0 & \text{otherwise.} \end{cases} \tag{47}$$

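The representation in equations (46) and (47) translates directly into a sampler for a finite graph. The sketch below is illustrative only: the function name is hypothetical, the graphon is passed as any vectorized symmetric Python callable, and diagonal entries (self-loops) are generated by the same rule for simplicity.

```python
import numpy as np

def sample_graphon_graph(n, graphon, rng):
    """Sample a symmetric binary adjacency matrix via equations (46)-(47)."""
    u = rng.uniform(size=n)                     # node variables U_i
    u_pair = np.triu(rng.uniform(size=(n, n)))  # pair variables U_ij for i <= j
    u_pair = u_pair + np.triu(u_pair, 1).T      # enforce U_ij = U_ji
    prob = graphon(u[:, None], u[None, :])      # M(U_i, U_j), broadcast to an n x n array
    return (u_pair < prob).astype(int)
```

For a graphon that is not identically zero, the expected number of edges grows quadratically in `n`, which is the dense-or-empty dichotomy that motivates the random measure construction in the main text.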
A.2. Regularly varying Lévy measures
Here we provide a formal definition of a regularly varying Lévy measure that is referred to in theorems 2 and 5.

Definition 1 (regular variation). A Lévy measure $\rho$ on $(0, \infty)$ is said to be regularly varying if its tail Lévy intensity $\bar\rho(x) = \int_x^\infty \rho(dw)$ is a regularly varying function (Feller, 1971), i.e. it satisfies

$$\bar\rho(x) \overset{x \downarrow 0}{\sim} l(1/x)\,x^{-\sigma} \tag{48}$$

for $\sigma \in [0, 1)$ and $l$ a slowly varying function satisfying $\lim_{t \to \infty} l(at)/l(t) = 1$ for any $a > 0$.
For example, constant and logarithmic functions are slowly varying.
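As a worked instance of definition 1, consider the GGP. The display below assumes the Lévy intensity $\rho(dw) = \Gamma(1-\sigma)^{-1} w^{-1-\sigma} e^{-\tau w}\,dw$ (the parameterization of the main text, which lies outside this appendix) and, for the case $\sigma = 0$, that $\tau > 0$. Both of the example slowly varying functions appear: a constant for $\sigma \in (0,1)$ and a logarithm for $\sigma = 0$.

```latex
\bar\rho(x)
  = \frac{1}{\Gamma(1-\sigma)} \int_x^\infty w^{-1-\sigma} e^{-\tau w}\,dw
  \;\overset{x \downarrow 0}{\sim}\;
  \begin{cases}
    \dfrac{x^{-\sigma}}{\sigma\,\Gamma(1-\sigma)},
      & \sigma \in (0,1), \quad l(t) = \dfrac{1}{\sigma\,\Gamma(1-\sigma)}, \\[2ex]
    \log(1/x),
      & \sigma = 0, \quad l(t) = \log t .
  \end{cases}
```

The first case follows because $e^{-\tau w} \to 1$ near the origin, so the integral is dominated by $\int_x^\infty w^{-1-\sigma}\,dw = x^{-\sigma}/\sigma$; the second is the small-argument behaviour of the exponential integral.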

Appendix B: Proof of proposition 1

The proof of proposition 1 follows from the properties of $W \sim \mathrm{CRM}(\rho, \lambda)$. Let $A_i = [h(i-1), hi]$ for $h > 0$ and $i \in \mathbb{N}$. We have

$$(W(A_i)) \overset{d}{=} (W(A_{\pi(i)})) \tag{49}$$

for any permutation $\pi$ of $\mathbb{N}$. As $D(A_i \times A_j) \sim \mathrm{Poisson}\{W(A_i)W(A_j)\}$, it follows that

$$(D(A_i \times A_j)) \overset{d}{=} (D(A_{\pi(i)} \times A_{\pi(j)})) \tag{50}$$

for any permutation $\pi$ of $\mathbb{N}$. Joint exchangeability of $Z$ follows directly.

Appendix C: Proofs of results on the sparsity
C.1. Probability asymptotics notation
We first describe the asymptotic notation that is used in the remainder of this section, which follows the notation of Janson (2011). All unspecified limits are as $\alpha \to \infty$.
Let $(X_\alpha)_{\alpha \ge 0}$ and $(Y_\alpha)_{\alpha \ge 0}$ be two $[0, \infty)$-valued stochastic processes defined on the same probability space and such that $\lim X_\alpha = \lim Y_\alpha = \infty$ almost surely. We have

$X_\alpha = O(Y_\alpha)$ almost surely $\iff \limsup X_\alpha / Y_\alpha < \infty$ almost surely,
$X_\alpha = o(Y_\alpha)$ almost surely $\iff \lim X_\alpha / Y_\alpha = 0$ almost surely,
$X_\alpha = \Omega(Y_\alpha)$ almost surely $\iff Y_\alpha = O(X_\alpha)$ almost surely,
$X_\alpha = \omega(Y_\alpha)$ almost surely $\iff Y_\alpha = o(X_\alpha)$ almost surely,
$X_\alpha = \Theta(Y_\alpha)$ almost surely $\iff X_\alpha = O(Y_\alpha)$ and $X_\alpha = \Omega(Y_\alpha)$ almost surely.

The equivalence notation $f(x) \overset{x \to x_0}{\sim} g(x)$ is used for $\lim_{x \to x_0} f(x)/g(x) = 1$ (not to be confused with the notation $\sim$ alone for 'distributed from').
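
C.2. Proof of theorem 3
Assume the moment condition $0 < \int_0^\infty w\,\rho(dw) < \infty$. Let

$$Z_{ij} = \begin{cases} 1 & \text{if } Z([i-1, i] \times [j-1, j]) > 0,\\ 0 & \text{otherwise} \end{cases} \tag{51}$$

and $D_{ij} = D([i-1, i] \times [j-1, j])$. Then, almost surely for any $k \in \mathbb{N}$,

$$\sum_{1 \le i < j \le k} Z_{ij} \le N_k^{(e)} \le \sum_{1 \le i, j \le k} D_{ij}. \tag{52}$$

As $Z$ is a jointly exchangeable point process, $(Z_{ij})_{i,j \in \mathbb{N}}$ is a jointly exchangeable binary matrix, and so

$$\sum_{1 \le i < j \le k} Z_{ij} = \Theta(k^2) \quad \text{almost surely as } k \to \infty. \tag{53}$$

Moreover, we have

$$D_{ij} \mid W \overset{\text{ind}}{\sim} \mathrm{Poisson}\{W([i-1, i])\,W([j-1, j])\}. \tag{54}$$

So lemma 1 in Appendix D and the strong law of large numbers for V-statistics (Arcones and Giné, 1992; Giné and Zinn, 1992) imply that

$$\sum_{1 \le i, j \le k} D_{ij} = \Theta(k^2) \quad \text{almost surely as } k \to \infty. \tag{55}$$

We therefore conclude that $N_k^{(e)} = \Theta(k^2)$ almost surely as $k \to \infty$. Finally, for any $k \le \alpha \le k + 1$,

$$\frac{k^2}{(k+1)^2}\,\frac{N_k^{(e)}}{k^2} \le \frac{N_\alpha^{(e)}}{\alpha^2} \le \frac{(k+1)^2}{k^2}\,\frac{N_{k+1}^{(e)}}{(k+1)^2}$$

and as $(k+1)/k \to 1$ we conclude that

$$N_\alpha^{(e)} = \Theta(\alpha^2) \quad \text{almost surely as } \alpha \to \infty. \tag{56}$$
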
C.3. Proof of theorem 4
C.3.1. Finite activity case
We first consider the case of a finite activity CRM. In this case, the number of nodes is bounded below by the square root of the number of edges, and bounded above by the (finite) number of jumps of the CRM. Let $T = \int_0^\infty \rho(dw)$, $0 < T < \infty$. Let $J_\alpha$ denote the number of points $(w_i, \theta_i)$ such that $\theta_i < \alpha$. $(J_\alpha)_{\alpha \ge 0}$ is a homogeneous Poisson process of rate $T$, and thus $J_\alpha \sim T\alpha$ almost surely. Since $\sqrt{N_\alpha^{(e)}} \le N_\alpha \le J_\alpha$ almost surely, it follows from result (56) that $N_\alpha = \Theta(\alpha)$ almost surely as $\alpha \to \infty$.

C.3.2. Infinite activity case
We now consider the infinite activity case where $\int_0^\infty \rho(dw) = \infty$. First note that, by monotone convergence, $\lim_{t \to \infty} \psi(t) = \infty$.
Consider sets $A_k = [(k-1)/2, k/2)$ for $k = 1, 2, \ldots$, and let $S_n^{(1)} = \bigcup_{k=1}^n A_{2k-1}$ and $S_n^{(2)} = \bigcup_{k=1}^n A_{2k}$. $S_n^{(1)}$ and $S_n^{(2)}$ define a partition of $[0, n]$. For $n \in \mathbb{N}$, define the random variable $X_n$ as

$$X_n = \#\{\theta_i \in A_{2n-1} \mid D(\{\theta_i\} \times S_n^{(2)}) > 0\} \tag{57}$$

and let

$$\tilde N_\alpha = \sum_{n=1}^{\lfloor \alpha \rfloor} X_n.$$

See Fig. 14 for an illustration. Clearly, $\tilde N_\alpha$ is a lower bound for the number of nodes:

$$\tilde N_\alpha \le N_\alpha \quad \text{almost surely.} \tag{58}$$

Fig. 14. Illustration of the proof of theorem 4 in the infinite activity case (the support points of the measure $D$ are shown): in this example, the coloured regions indicate the intersection of $A_{2n-1}$ with $S_n^{(2)}$ for $n = 1$, $n = 2$ and $n = 3$; from this, we see that $X_1 = 0$ (the count in blue), $X_2 = 2$ (the count in red) and $X_3 = 1$ (the count in green)

Using the notation $S^{(1)} = \bigcup_{k=1}^\infty A_{2k-1}$ and $S^{(2)} = \bigcup_{k=1}^\infty A_{2k}$, let $W^{(1)}$ and $W^{(2)}$ be respectively the restriction of $W$ to the set $S^{(1)}$ and $S^{(2)}$. As $S^{(1)}$ and $S^{(2)}$ are non-overlapping and $W$ is a CRM, $W^{(1)}$ and $W^{(2)}$ are independent. Integrating over $W^{(1)}$ and using the marking theorem for Poisson processes (see below for more details), we obtain for $n \ge 1$

$$X_n \mid W^{(2)} \overset{\text{ind}}{\sim} \mathrm{Poisson}\big[\tfrac{1}{2}\psi\{W(S_n^{(2)})\}\big]. \tag{59}$$

Lemma 1 thus implies that

$$\frac{\sum_{k=1}^n X_k}{\tfrac{1}{2}\sum_{k=1}^n \psi\{W(S_k^{(2)})\}} \to 1 \quad \text{almost surely.} \tag{60}$$

We have $\lambda(S_n^{(2)}) = n/2$ and, using the law of large numbers,

$$\frac{W(S_n^{(2)})}{n/2} \to \int_0^\infty w\,\rho(dw) \quad \text{almost surely.} \tag{61}$$

Therefore $\psi\{W(S_n^{(2)})\} \to \infty$ almost surely. Its Cesàro mean also diverges and

$$\frac{\sum_{k=1}^n \psi\{W(S_k^{(2)})\}}{n} \to \infty \quad \text{almost surely,} \tag{62}$$

which, together with result (60), implies that $(1/n)\sum_{k=1}^n X_k \to \infty$ almost surely. We conclude that $\tilde N_\alpha = \omega(\alpha)$ almost surely and, using inequality (58), $N_\alpha = \omega(\alpha)$ almost surely.
Consider now the case where $\bar\rho(x) \overset{x \downarrow 0}{\sim} l(1/x)\,x^{-\sigma}$ where $\sigma \in (0, 1)$ and $l(t)$ is a slowly varying function such that $\liminf_{t \to \infty} l(t) > 0$. Then lemma 2 in Appendix D implies that $\liminf_{t \to \infty} \psi(t)/t^{\sigma} > 0$ and thus, using result (61),

$$\liminf_{n \to \infty} \frac{\psi\{W(S_n^{(2)})\}}{n^{\sigma}} > 0 \quad \text{almost surely.}$$

Riemann integration and the Stolz–Cesàro theorem then imply that

$$\liminf_{n \to \infty} \frac{\sum_{k=1}^n \psi\{W(S_k^{(2)})\}}{n^{\sigma+1}/(\sigma+1)} = \liminf_{n \to \infty} \frac{\sum_{k=1}^n \psi\{W(S_k^{(2)})\}}{\sum_{k=1}^n k^{\sigma}} > 0 \quad \text{almost surely}$$

and finally $N_\alpha = \Omega(\alpha^{\sigma+1})$ almost surely as $\alpha \to \infty$.
11 C.3.2.1. Proof of result (59). Let n  1. For any i A2n1 , let ui be a binary mark such that
12 Pr.ui = 1|W/ = 1 exp{wi W.Sn.2/ /}
13
14 Conditionally on Wn.2/ , the marking theorem for Poisson processes implies that the set of points {.wi /|i
A2n1 , ui = 1} is drawn from a Poisson process of intensity 21 .dw/[1 exp{wW.Sn.2/ /}] and the number
15 Xn of those points is Poisson distributed with rate
16 
17 1
2
.dw/[1 exp{w W.Sn.2/ /}] = 21 {W.Sn.2/ /}:
18 0

19 Finally, independence of the Xn s follows from the complete randomness of Wn.1/ .


20
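C.4. Proof of theorem 5

$$E[D^*_\alpha] = E[E[D^*_\alpha \mid W_\alpha]] = E[W^{*2}_\alpha] = E[W^*_\alpha]^2 + \mathrm{var}[W^*_\alpha] = \alpha^2\Big\{\int_0^\infty w\,\rho(dw)\Big\}^2 + \alpha\int_0^\infty w^2\,\rho(dw)$$

where the last line follows from Campbell's theorem.
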
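$$E[N_\alpha^{(e)}] = E[E[N_\alpha^{(e)} \mid W_\alpha]] = E\Big[\sum_i \Big[\{1 - \exp(-w_i^2)\} + \tfrac{1}{2}\sum_{j \ne i}\{1 - \exp(-2 w_i w_j)\}1_{\theta_j \le \alpha}\Big]1_{\theta_i \le \alpha}\Big]$$

$$= E\Big[\sum_i \big[\{1 - \exp(-w_i^2)\} - \tfrac{1}{2}\{1 - \exp(-2 w_i^2)\}\big]1_{\theta_i \le \alpha}\Big] + \tfrac{1}{2}E\Big[\sum_i \Big[\sum_j \{1 - \exp(-2 w_i w_j)\}1_{\theta_j \le \alpha}\Big]1_{\theta_i \le \alpha}\Big].$$

Using the Palm formula for Poisson point processes (Bertoin, 2006; Daley and Vere-Jones, 2008)

$$E\Big[\sum_i \Big[\sum_j \{1 - \exp(-2 w_i w_j)\}1_{\theta_j \le \alpha}\Big]1_{\theta_i \le \alpha}\Big] = \alpha\int_0^\infty E\Big[\{1 - \exp(-2 w^2)\} + \sum_j \{1 - \exp(-2 w w_j)\}1_{\theta_j \le \alpha}\Big]\rho(dw)$$

and the final expression is obtained by applying Campbell's theorem. Finally, the expected number of nodes

$$E[N_\alpha] = E[E[N_\alpha \mid W_\alpha]] = E\Big[\sum_i \Big\{1 - \exp\Big(-2 w_i \sum_j w_j 1_{\theta_j \le \alpha} + w_i^2\Big)\Big\}1_{\theta_i \le \alpha}\Big]$$

$$= \alpha\int_0^\infty E\Big[\Big\{1 - \exp\Big(-2 w \sum_j w_j 1_{\theta_j \le \alpha} - w^2\Big)\Big\}\Big]\rho(dw)$$

$$= \alpha\int_0^\infty \Big(1 - E\Big[\exp\Big(-2 w \sum_j w_j 1_{\theta_j \le \alpha} - w^2\Big)\Big]\Big)\rho(dw)$$

$$= \alpha\int_0^\infty [1 - \exp\{-w^2 - \alpha\psi(2w)\}]\,\rho(dw)$$

where we successively used the Palm formula and Campbell's theorem. By dominated convergence,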
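$E[N_\alpha] \sim \alpha\int_0^\infty \rho(dw)$ if the CRM is finite activity. Consider now that the CRM is infinite activity. Using integration by parts, we have

$$E[N_\alpha] = \alpha\int_0^\infty \{2w + 2\alpha\psi'(2w)\}\exp\{-w^2 - \alpha\psi(2w)\}\,\bar\rho(w)\,dw. \tag{63}$$

The Lévy exponent $\psi$ is a strictly increasing function with $\psi(0) = 0$ and $\lim_{t \to \infty}\psi(t) = \infty$ and therefore admits a well-defined inverse, denoted $\psi^{-1} : [0, \infty) \to [0, \infty)$. Using the change of variable $u = \psi(2w)$, we obtain

$$\int_0^\infty 2\alpha\psi'(2w)\exp\{-w^2 - \alpha\psi(2w)\}\,\bar\rho(w)\,dw = \alpha\int_0^\infty \exp[-\{\psi^{-1}(u)/2\}^2 - \alpha u]\,\bar\rho\{\psi^{-1}(u)/2\}\,du.$$

Assume that $\int_0^\infty w\,\rho(dw) < \infty$. Now note that $\psi(t) \overset{t \to 0}{\sim} t\int_0^\infty w\,\rho(dw)$ and therefore $\psi^{-1}(t) \overset{t \to 0}{\sim} t / \int_0^\infty w\,\rho(dw)$. If $\rho$ is a regularly varying Lévy measure, then

$$\bar\rho(x) \overset{x \downarrow 0}{\sim} l(1/x)\,x^{-\sigma}$$

where $\sigma \in [0, 1)$ and $l$ is a slowly varying function, and it therefore follows from lemma 3 in Appendix D and $\psi^{-1}(0) = 0$ that

$$g(u) := \exp[-\{\psi^{-1}(u)/2\}^2]\,\bar\rho\{\psi^{-1}(u)/2\} \overset{u \to 0}{\sim} l(1/u)\,u^{-\sigma}\Big\{2\int_0^\infty w\,\rho(dw)\Big\}^{\sigma}$$

where $g$ is a monotone decreasing function. Applying the Tauberian theorem of proposition 2 in Appendix D, we therefore have

$$\int_0^\infty \exp(-\alpha u)\,g(u)\,du \overset{\alpha \to \infty}{\sim} \alpha^{\sigma-1}\,l(\alpha)\,\Gamma(1-\sigma)\Big\{2\int_0^\infty w\,\rho(dw)\Big\}^{\sigma}.$$

Finally, combining the above asymptotics with equation (63), and noting that

$$\int_0^\infty w\exp\{-w^2 - \alpha\psi(2w)\}\,\bar\rho(w)\,dw = o(1)$$

by dominated convergence, and $\lim_{\alpha \to \infty} l(\alpha) > 0$ for $\sigma \in [0, 1)$, we obtain

$$E[N_\alpha] \overset{\alpha \to \infty}{\sim} \alpha^{1+\sigma}\,l(\alpha)\,\Gamma(1-\sigma)\Big\{2\int_0^\infty w\,\rho(dw)\Big\}^{\sigma}.$$
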
Appendix D: Technical lemmas

The following lemma is a corollary of theorem 3, page 239, in Feller (1971).

Lemma 1. Let $(X_n)_{n=1,2,\ldots}$ be a sequence of mutually independent random variables with arbitrary distribution and such that $\mathrm{var}(X_n) \le E[X_n] < \infty$. Let $S_n = \sum_{k=1}^n X_k$. If

$$\lim_{n \to \infty} E[S_n] = \infty$$

then

$$S_n / E[S_n] \to 1 \quad \text{almost surely as } n \to \infty.$$

Proof. Assume for simplicity that $E[X_n] > 0$ for all $n$ (otherwise, consider the subsequence of random variables with strictly positive mean). We have

$$\sum_{k=1}^n \frac{\mathrm{var}(X_k)}{1 + E[S_k]^2} \le \sum_{k=1}^n \frac{E[X_k]}{1 + E[S_k]^2} \le \int_0^\infty \frac{1}{1 + x^2}\,dx < \infty$$

by Riemann integration. The result then follows from theorem 3, page 239, in Feller (1971) with $b_n = E[S_n]$.

Lemma 2 (relating tail Lévy intensity and Laplace exponent). (Gnedin et al. (2007), propositions 17 and 19) Let $\rho$ be a Lévy measure, $\bar\rho(x) = \int_x^\infty \rho(dw)$ be the tail Lévy intensity and $\psi(t) = \int_0^\infty \{1 - \exp(-wt)\}\rho(dw)$ its Laplace exponent. The following conditions are equivalent:

$$\bar\rho(x) \overset{x \downarrow 0}{\sim} l(1/x)\,x^{-\sigma}, \tag{64}$$

$$\psi(t) \overset{t \to \infty}{\sim} \Gamma(1-\sigma)\,t^{\sigma}\,l(t) \tag{65}$$

where $0 \le \sigma < 1$ and $l$ is a function slowly varying at $\infty$, i.e. satisfying $l(cy)/l(y) \to 1$ as $y \to \infty$ for every $c > 0$.

Lemma 3 (Resnick (1987), chapter 0, proposition 0.8). If $U$ is a regularly varying function at 0 with exponent $\rho \in \mathbb{R}$, and $f$ is a positive function such that $f(t) \overset{t \to 0}{\sim} tc$, for some constant $0 < c < \infty$, then $U\{f(t)\} \overset{t \to 0}{\sim} c^{\rho}\,U(t)$.

Proposition 2 (Tauberian theorem). (Feller (1971), chapter XIII, section 5, theorems 3 and 4). Let $U(dw)$ be a measure on $(0, \infty)$ with ultimately monotone density $u$, i.e. monotone in some interval $(x_0, \infty)$. Assume that

$$L(t) = \int_0^\infty \exp(-tw)\,u(w)\,dw$$

exists for $t > 0$. If $l$ is slowly varying at $\infty$ and $0 \le a < \infty$, then the following two relationships are equivalent:

$$L(t) \overset{t \to \infty}{\sim} t^{-a}\,l(t), \tag{66}$$

$$u(x) \overset{x \downarrow 0}{\sim} \frac{1}{\Gamma(a)}\,x^{a-1}\,l\Big(\frac{1}{x}\Big). \tag{67}$$

Appendix E: Proof of theorem 6
The proof of theorem 6 relies on results on posterior characterization with models involving normalized CRMs. We first state a corollary of lemma 5 by Pitman (2003) and theorem 8.1 by James (2002). Similar results appear in Prünster (2002), James (2005) and James et al. (2009). The corollary involves the introduction of a discrete random variable $R$, conditional on which the CRM has strictly positive mass.

Corollary 1. Let $W_\alpha$ be a (finite or infinite) CRM on $[0, \alpha]$ without fixed atoms nor deterministic component, with mean measure $\rho(dw)\,d\theta$. Denote $W^*_\alpha = W_\alpha([0, \alpha])$, with probability distribution $G^*_\alpha$. Let $R \in \{0, 1, 2, \ldots\}$ be a discrete random variable such that, for $r \ge 0$,

$$\varphi_r(t) := \Pr(R = r \mid W^*_\alpha = t)$$

with $\varphi_0(0) = 1$. The condition $\Pr(R = 0 \mid W^*_\alpha = 0) = 1$ ensures that, conditionally on $R > 0$, $W^*_\alpha > 0$ almost surely, and the normalized CRM below is properly defined.
Conditionally on $R = r > 0$, let $X_1, \ldots, X_r \overset{\text{IID}}{\sim} W_\alpha / W^*_\alpha$. Let $\theta_1, \ldots, \theta_k$, $k \le r$, be the unique values in $(X_1, \ldots, X_r)$, in order of appearance, with multiplicities $1 \le m_j \le r$, $w_i = W_\alpha(\{\theta_i\})$ the associated weights and $\Pi_r = \{A_1, \ldots, A_k\}$ with $A_i = \{j \mid X_j = \theta_i\}$ be the associated random partition of $\{1, \ldots, r\}$. Let $w_* = W^*_\alpha - \sum_i w_i$. For $r > 0$, we have

$$\Pr[R = r, \Pi_R = \{A_1, \ldots, A_k\}, (w_i \in dw_i)_{i=1,\ldots,k}, w_* \in dw_*] = \Big(w_* + \sum_{i=1}^k w_i\Big)^{-r}\varphi_r\Big(w_* + \sum_{i=1}^k w_i\Big)\,G^*_\alpha(dw_*)\,\alpha^k\prod_{i=1}^k w_i^{m_i}\rho(dw_i) \tag{68}$$

and

$$\Pr(R = 0, w_* \in dw_*) = \varphi_0(w_*)\,G^*_\alpha(dw_*).$$

We now prove theorem 6. Consider the conditionally Poisson construction that was described in Section 5.5:

$$D^*_\alpha \mid W^*_\alpha \sim \mathrm{Poisson}(W^{*2}_\alpha),$$

$$(U_1, \ldots, U_{2D^*_\alpha}) \mid D^*_\alpha, W_\alpha \overset{\text{IID}}{\sim} W_\alpha / W^*_\alpha.$$

First, define, for $r \in \{0, 2, 4, \ldots\}$ and $t \ge 0$,

$$\varphi_r(t) := \Pr(2D^*_\alpha = r \mid W^*_\alpha = t) = \frac{t^r\exp(-t^2)}{(r/2)!}$$

with $\varphi_0(0) = 1$. Conditionally on $2D^*_\alpha = r > 0$, $U_1, \ldots, U_r$ are IID from $W_\alpha / W^*_\alpha$. The variables $U_1, \ldots, U_r$ take $N_\alpha \le r$ distinct values $\tilde\theta_j$, with multiplicities $1 \le \tilde m_j \le r$. Let $\Pi_r = \{A_1, \ldots, A_{N_\alpha}\}$ be the associated partition of $\{1, \ldots, r\}$. From corollary 1 we have, for $r \in \{2, 4, 6, \ldots\}$,

$$\Pr[D^*_\alpha = r/2, \Pi_{2D^*_\alpha} = \{A_1, \ldots, A_{N_\alpha}\}, (\tilde w_i \in d\tilde w_i)_{i=1,\ldots,N_\alpha}, w_* \in dw_*] = \frac{1}{(r/2)!}\exp\Big\{-\Big(w_* + \sum_{i=1}^{N_\alpha}\tilde w_i\Big)^2\Big\}G^*_\alpha(dw_*)\,\alpha^{N_\alpha}\prod_{i=1}^{N_\alpha}\tilde w_i^{\tilde m_i}\rho(d\tilde w_i) \tag{69}$$

and

$$\Pr(D^*_\alpha = 0, w_* \in dw_*) = \exp(-w_*^2)\,G^*_\alpha(dw_*). \tag{70}$$

Finally, let $\pi$ be the permutation of $\{1, \ldots, N_\alpha\}$ such that $\tilde\theta_{\pi(1)} < \tilde\theta_{\pi(2)} < \ldots < \tilde\theta_{\pi(N_\alpha)}$ and let $w_i = \tilde w_{\pi(i)}$ and $m_i = \tilde m_{\pi(i)}$. As the $\tilde\theta_i$ are IID and independent of $\tilde w_i$, $\pi$ is uniformly distributed over the set of permutations of $\{1, \ldots, N_\alpha\}$. The vector $(m_1, \ldots, m_{N_\alpha})$ corresponds to the sizes of the partition $\Pi_{2D^*_\alpha}$ in exchangeable random order (Pitman (2006), equation (2.7), page 39), and, for $r \in \{2, 4, 6, \ldots\}$,

$$\Pr\{D^*_\alpha = r/2, (m_1, \ldots, m_{N_\alpha}), (w_i \in dw_i)_{i=1,\ldots,N_\alpha}, w_* \in dw_*\} = \frac{r!}{(r/2)!\,N_\alpha!\,\prod_{i=1}^{N_\alpha} m_i!}\exp\Big\{-\Big(w_* + \sum_{i=1}^{N_\alpha} w_i\Big)^2\Big\}G^*_\alpha(dw_*)\,\alpha^{N_\alpha}\prod_{i=1}^{N_\alpha} w_i^{m_i}\rho(dw_i). \tag{71}$$

24 Appendix F: Details on the Markov chain Monte Carlo algorithms
25 The undirected graph sampler that was outlined in Section 7.2 iterates as follows.
26 Step 1: update w1:N given the rest with HMC sampling.
27 Step 2: update ., , , w / given the rest by using a MetropolisHastings step.
28 Step 3: update the latent counts nij given the rest by using either the full conditional or a Metropolis
29 Hastings step.
30
F.1. Step 1: update of $w_{1:N_\alpha}$
We use an HMC update for $w_{1:N_\alpha}$ via an augmented system with momentum variables $p$. See Neal (2011) for an overview. Let $L \geq 1$ be the number of leapfrog steps and $\varepsilon > 0$ the step size. For conciseness, we write
\[
U'(w_{1:N_\alpha}, w_*, \phi) = \nabla_{\omega_{1:N_\alpha}} \log\{p(\omega_{1:N_\alpha}, w_*, \phi \mid D_\alpha)\}\big|_{w_{1:N_\alpha}, w_*, \phi}
\]
the gradient of the log-posterior in equation (45). The algorithm proceeds by first sampling momentum variables as
\[
p \sim N(0, I_{N_\alpha}). \tag{72}
\]
The Hamiltonian proposal $q(\tilde w_{1:N_\alpha}, \tilde p \mid w_{1:N_\alpha}, p)$ is obtained by the following leapfrog algorithm (for simplicity of exposition, we omit indices $1{:}N_\alpha$). Simulate $L$ steps of the discretized Hamiltonian via
\[
\begin{aligned}
\tilde p^{(0)} &= p + \frac{\varepsilon}{2} U'(w, w_*, \phi), \\
\tilde w^{(0)} &= w
\end{aligned}
\]
and, for $l = 1, \ldots, L - 1$,
\[
\begin{aligned}
\log(\tilde w^{(l)}) &= \log(\tilde w^{(l-1)}) + \varepsilon \tilde p^{(l-1)}, \\
\tilde p^{(l)} &= \tilde p^{(l-1)} + \varepsilon U'(\tilde w^{(l)}, w_*, \phi)
\end{aligned}
\]
and finally set
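\[
\begin{aligned}
\log(\tilde w) &= \log(\tilde w^{(L-1)}) + \varepsilon \tilde p^{(L-1)}, \\
\tilde p &= \tilde p^{(L-1)} + \frac{\varepsilon}{2} U'(\tilde w, w_*, \phi), \\
\tilde w &= \tilde w^{(L)} .
\end{aligned}
\]
Accept the proposal $(\tilde w, \tilde p)$ with probability $\min(1, r)$ with
\[
\begin{aligned}
r &= \frac{\prod_{i=1}^{N_\alpha} \tilde w_i^{m_i} \exp\big\{-\big(\sum_{i=1}^{N_\alpha} \tilde w_i + w_*\big)^2\big\} \prod_{i=1}^{N_\alpha} \tilde w_i\, \rho(\tilde w_i)}{\prod_{i=1}^{N_\alpha} w_i^{m_i} \exp\big\{-\big(\sum_{i=1}^{N_\alpha} w_i + w_*\big)^2\big\} \prod_{i=1}^{N_\alpha} w_i\, \rho(w_i)} \exp\Big\{-\frac{1}{2} \sum_{i=1}^{N_\alpha} (\tilde p_i^2 - p_i^2)\Big\} \\
&= \prod_{i=1}^{N_\alpha} \Big(\frac{\tilde w_i}{w_i}\Big)^{m_i - \sigma} \exp\Big\{-\Big(\sum_{i=1}^{N_\alpha} \tilde w_i + w_*\Big)^2 + \Big(\sum_{i=1}^{N_\alpha} w_i + w_*\Big)^2 - \tau \Big(\sum_{i=1}^{N_\alpha} \tilde w_i - \sum_{i=1}^{N_\alpha} w_i\Big)\Big\} \exp\Big\{-\frac{1}{2} \sum_{i=1}^{N_\alpha} (\tilde p_i^2 - p_i^2)\Big\}.
\end{aligned}
\]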
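The leapfrog update and acceptance step can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation. It assumes a Lévy density with $w \rho(w) \propto w^{-\sigma} e^{-\tau w}$, which is what the second form of $r$ implies, so that the gradient with respect to $\omega_i = \log w_i$ is $m_i - \sigma - w_i(\tau + 2 \sum_j w_j + 2 w_*)$. The function names and the argument `n_leapfrog` (standing for $L$) are hypothetical.

```python
import numpy as np


def grad_log_post(w, w_star, m, sigma, tau):
    """Gradient of the log-posterior with respect to omega = log(w) (assumed form)."""
    return m - sigma - w * (tau + 2.0 * (w.sum() + w_star))


def log_post(w, w_star, m, sigma, tau):
    """Log-posterior of omega = log(w), up to an additive constant."""
    total = w.sum() + w_star
    return np.sum((m - sigma) * np.log(w)) - tau * w.sum() - total ** 2


def hmc_step(w, w_star, m, sigma, tau, eps, n_leapfrog, rng):
    """One HMC update of the weights; returns (new w, accepted flag, log r)."""
    p = rng.standard_normal(w.shape)  # momentum draw from N(0, I)
    log_w = np.log(w)
    # Initial half step for the momentum.
    p_new = p + 0.5 * eps * grad_log_post(w, w_star, m, sigma, tau)
    # Full steps for l = 1, ..., L - 1, carried out on the log scale.
    for _ in range(n_leapfrog - 1):
        log_w = log_w + eps * p_new
        p_new = p_new + eps * grad_log_post(np.exp(log_w), w_star, m, sigma, tau)
    # Final position step and final half step for the momentum.
    log_w = log_w + eps * p_new
    w_new = np.exp(log_w)
    p_new = p_new + 0.5 * eps * grad_log_post(w_new, w_star, m, sigma, tau)
    # Log of the acceptance ratio r.
    log_r = (log_post(w_new, w_star, m, sigma, tau)
             - log_post(w, w_star, m, sigma, tau)
             - 0.5 * (p_new @ p_new - p @ p))
    accepted = bool(np.log(rng.uniform()) < log_r)
    return (w_new if accepted else w), accepted, float(log_r)
```

Working on the log scale keeps every proposed weight positive, and for a small step size the leapfrog integrator nearly conserves the Hamiltonian, so `log_r` stays close to 0.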

F.2. Step 2: update of $w_*$, $\alpha$, $\sigma$ and $\tau$
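Let $g^*_{\alpha,\sigma,\tau}$ denote the density of $G^*_\alpha$ with respect to the reference measure $\lambda + \delta_0$, where we recall that $\lambda$ denotes the Lebesgue measure over $[0, \infty)$. For our Metropolis–Hastings step, we propose $(\tilde\alpha, \tilde\sigma, \tilde\tau, \tilde w_*)$ from $q(\tilde\alpha, \tilde\sigma, \tilde\tau, \tilde w_* \mid \alpha, \sigma, \tau, w_*)$ and accept with probability $\min(1, r)$ where
\[
r = \frac{\exp\big\{-\big(\sum_{i=1}^{N_\alpha} w_i + \tilde w_*\big)^2\big\}}{\exp\big\{-\big(\sum_{i=1}^{N_\alpha} w_i + w_*\big)^2\big\}}\,
\frac{\tilde\alpha^{N_\alpha} \prod_{i=1}^{N_\alpha} \rho(w_i \mid \tilde\sigma, \tilde\tau)}{\alpha^{N_\alpha} \prod_{i=1}^{N_\alpha} \rho(w_i \mid \sigma, \tau)}\,
\frac{g^*_{\tilde\alpha,\tilde\sigma,\tilde\tau}(\tilde w_*)}{g^*_{\alpha,\sigma,\tau}(w_*)}\,
\frac{p(\tilde\alpha, \tilde\sigma, \tilde\tau)}{p(\alpha, \sigma, \tau)}\,
\frac{q(\alpha, \sigma, \tau, w_* \mid \tilde\alpha, \tilde\sigma, \tilde\tau, \tilde w_*)}{q(\tilde\alpha, \tilde\sigma, \tilde\tau, \tilde w_* \mid \alpha, \sigma, \tau, w_*)}. \tag{73}
\]
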
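We shall use the proposal
\[
q(\tilde\alpha, \tilde\sigma, \tilde\tau, \tilde w_* \mid \alpha, \sigma, \tau, w_*) = q(\tilde\tau \mid \tau)\, q(\tilde\sigma \mid \sigma)\, q(\tilde\alpha \mid \tilde\sigma, \tilde\tau, w_*)\, q(\tilde w_* \mid \tilde\alpha, \tilde\sigma, \tilde\tau, w_*)
\]
where
\[
\begin{aligned}
q(\tilde\tau \mid \tau) &= \mathrm{lognormal}\{\tilde\tau; \log(\tau), \sigma_\tau^2\}, \\
q(\tilde\sigma \mid \sigma) &= \mathrm{lognormal}\{1 - \tilde\sigma; \log(1 - \sigma), \sigma_\sigma^2\}, \\
q(\tilde\alpha \mid \tilde\sigma, \tilde\tau, w_*) &= \mathrm{gamma}\Big\{\tilde\alpha; N_\alpha, \psi_{\tilde\sigma,\tilde\tau}\Big(2 \sum_i w_i + 2 w_*\Big)\Big\},
\end{aligned}
\tag{74}
\]
\[
q(\tilde w_* \mid \tilde\alpha, \tilde\sigma, \tilde\tau, w_*) = g^*_{\tilde\alpha,\tilde\sigma,\tilde\tau + 2\sum_i w_i + 2 w_*}(\tilde w_*). \tag{75}
\]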
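
The choice of the proposal for $w_*$ and $\alpha$ is motivated as follows. From equations (71) and (43), the conditional density of $(\alpha, w_*)$ given the rest is given by
\[
p(\alpha, w_* \mid \text{rest}) \propto \alpha^{N_\alpha - 1}\, g^*_{\alpha,\sigma,\tau}(w_*) \exp\Big\{-\Big(w_* + \sum_i w_i\Big)^2\Big\}
\]
which is not of a standard form because of the square in the exponential. Motivated by a first-order Taylor approximation around the current MCMC value $w_*$,
\[
\Big(\tilde w_* + \sum_i w_i\Big)^2 \approx \Big(w_* + \sum_i w_i\Big)^2 + 2 (\tilde w_* - w_*) \Big(w_* + \sum_i w_i\Big),
\]
we use a proposal
\[
q(\tilde\alpha, \tilde w_* \mid \tilde\sigma, \tilde\tau, w_*) \propto \tilde\alpha^{N_\alpha - 1}\, g^*_{\tilde\alpha,\tilde\sigma,\tilde\tau}(\tilde w_*) \exp\Big\{-2 \tilde w_* \Big(w_* + \sum_i w_i\Big)\Big\}
\]
which corresponds to the product of the proposals (74) and (75). The proposal for $w_*$ can be written as an exponential tilting of the probability density function $g^*_{\tilde\alpha,\tilde\sigma,\tilde\tau}(\tilde w_*)$:
\[
g^*_{\tilde\alpha,\tilde\sigma,\tilde\tau + 2\sum_i w_i + 2 w_*}(\tilde w_*) = \frac{\exp\{-2 \tilde w_* (\sum_i w_i + w_*)\}\, g^*_{\tilde\alpha,\tilde\sigma,\tilde\tau}(\tilde w_*)}{\exp\{-\tilde\alpha\, \psi_{\tilde\sigma,\tilde\tau}(2 \sum_i w_i + 2 w_*)\}}
\]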
which will allow the terms involving the intractable probability density function $g^*$ to cancel in the Metropolis–Hastings ratio (73). $g^*$ is either a gamma density ($\sigma = 0$), a Poisson mixture of gamma densities ($\sigma < 0$) or an exponentially tilted stable density ($\sigma > 0$) for which efficient samplers exist (Devroye, 2009; Hofert, 2011).
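The exponential tilting identity is easy to verify in the gamma case $\sigma = 0$. The sketch below is a check under stated assumptions, not part of the paper: it takes $g^*_{\alpha,0,\tau}$ to be the gamma density with shape $\alpha$ and rate $\tau$, and the Laplace exponent to be $\psi_{0,\tau}(t) = \log(1 + t/\tau)$. The function names and the tilt argument `c`, which plays the role of $2 \sum_i w_i + 2 w_*$, are hypothetical.

```python
import math


def gamma_logpdf(w, shape, rate):
    """Log-density of a gamma distribution with the given shape and rate."""
    return (shape * math.log(rate) + (shape - 1.0) * math.log(w)
            - rate * w - math.lgamma(shape))


def psi_gamma(t, tau):
    """Laplace exponent in the sigma = 0 case (assumed form): log(1 + t / tau)."""
    return math.log1p(t / tau)


def tilted_logpdf(w, alpha, tau, c):
    """Log of exp(-c w) g(w) / exp(-alpha psi(c)), with g the gamma(alpha, tau) density."""
    return -c * w + gamma_logpdf(w, alpha, tau) + alpha * psi_gamma(c, tau)
```

Under these assumptions `tilted_logpdf(w, alpha, tau, c)` agrees with `gamma_logpdf(w, alpha, tau + c)`: tilting simply shifts the rate from $\tau$ to $\tau + c$, which is why the proposal (75) can be sampled directly.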
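Under the improper priors (43), the acceptance ratio reduces to
\[
\begin{aligned}
r ={}& \exp\Big\{-\Big(\sum_{i=1}^{N_\alpha} w_i + \tilde w_*\Big)^2 + \Big(\sum_{i=1}^{N_\alpha} w_i + w_*\Big)^2\Big\} \exp\Big\{-(\tilde\tau - \tau + 2 w_* - 2 \tilde w_*) \sum_{i=1}^{N_\alpha} w_i\Big\} \\
&\times \Big(\prod_{i=1}^{N_\alpha} w_i\Big)^{-\tilde\sigma + \sigma} \bigg\{\frac{\Gamma(1 - \sigma)\, \psi_{\sigma,\tau}(2 \tilde w_* + 2 \sum_i w_i)}{\Gamma(1 - \tilde\sigma)\, \psi_{\tilde\sigma,\tilde\tau}(2 w_* + 2 \sum_i w_i)}\bigg\}^{N_\alpha}.
\end{aligned}
\]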
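A sketch of how the hyperparameter proposals and the reduced acceptance ratio might be coded is given below. It is illustrative only and makes two assumptions that are not stated in this appendix: the Laplace exponent is taken to have the generalized gamma form $\psi_{\sigma,\tau}(t) = \{(t + \tau)^\sigma - \tau^\sigma\}/\sigma$ (with limit $\log(1 + t/\tau)$ at $\sigma = 0$), and the gamma proposal is parameterized by shape and rate. All function and argument names are hypothetical. Drawing $\tilde w_*$ from the tilted density (75) is left out because it needs one of the specialized samplers cited above.

```python
import math
import random


def psi(t, sigma, tau):
    """Laplace exponent of the generalized gamma Levy measure (assumed form)."""
    if sigma == 0.0:
        return math.log1p(t / tau)
    return ((t + tau) ** sigma - tau ** sigma) / sigma


def propose_hyper(w, w_star, sigma, tau, sd_sigma, sd_tau, rng):
    """Draw (alpha, sigma, tau) proposals: lognormal random walks, then gamma."""
    tau_new = tau * math.exp(sd_tau * rng.gauss(0.0, 1.0))
    # The random walk is on log(1 - sigma), which keeps sigma_new below 1.
    sigma_new = 1.0 - (1.0 - sigma) * math.exp(sd_sigma * rng.gauss(0.0, 1.0))
    rate = psi(2.0 * sum(w) + 2.0 * w_star, sigma_new, tau_new)
    # random.gammavariate takes (shape, scale), so pass scale = 1 / rate.
    alpha_new = rng.gammavariate(len(w), 1.0 / rate)
    return alpha_new, sigma_new, tau_new


def log_accept_ratio(w, w_star, w_star_new, sigma, tau, sigma_new, tau_new):
    """Log of the reduced ratio r; alpha and the density g* have cancelled."""
    s = sum(w)
    n = len(w)
    log_r = -(s + w_star_new) ** 2 + (s + w_star) ** 2
    log_r -= (tau_new - tau + 2.0 * w_star - 2.0 * w_star_new) * s
    log_r += (sigma - sigma_new) * sum(math.log(wi) for wi in w)
    log_r += n * (math.lgamma(1.0 - sigma) - math.lgamma(1.0 - sigma_new))
    log_r += n * (math.log(psi(2.0 * w_star_new + 2.0 * s, sigma, tau))
                  - math.log(psi(2.0 * w_star + 2.0 * s, sigma_new, tau_new)))
    return log_r
```

Two quick sanity properties follow from the formula: proposing the current state gives `log_r == 0`, and swapping the current and proposed states flips the sign of `log_r`.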

F.3. Step 3: update of the latent variables $\bar n_{ij}$
Concerning the latent $\bar n_{ij}$, the conditional distribution is a truncated Poisson distribution (44) from which we can sample directly. An alternative strategy, which may be more efficient for a large number of edges, is to use a Metropolis–Hastings random-walk proposal.
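Direct sampling from a truncated Poisson distribution can be done without rejection. The sketch below assumes that the truncation in equation (44) removes the value 0 (a zero-truncated Poisson) and takes the Poisson rate `lam` as an input, since equation (44) is not reproduced here. The function name is hypothetical. It uses the fact that, for a unit-interval Poisson process conditioned on having at least one event, the first event time has a truncated exponential law and the remaining events are Poisson on the rest of the interval.

```python
import numpy as np


def sample_zero_truncated_poisson(lam, rng):
    """Draw from Poisson(lam) conditioned on being at least 1, elementwise."""
    lam = np.asarray(lam, dtype=float)
    u = rng.uniform(size=lam.shape)
    # t = -log(1 - u (1 - exp(-lam))) is lam times the first event time, in [0, lam).
    t = -np.log1p(u * np.expm1(-lam))
    # One guaranteed event plus a Poisson count over the remaining intensity.
    return 1 + rng.poisson(np.maximum(lam - t, 0.0))
```

A call such as `sample_zero_truncated_poisson(np.full(10, 0.7), np.random.default_rng(1))` returns ten positive integer counts; the theoretical mean is `lam / (1 - exp(-lam))`.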

References
Aalen, O. (1992) Modelling heterogeneity in survival analysis by the compound Poisson distribution. Ann. Appl. Probab., 951–972.
Adamic, L. A. and Glance, N. (2005) The political blogosphere and the 2004 US election: divided they blog. In Proc. 3rd Int. Wrkshp Link Discovery, pp. 36–43. New York: Association for Computing Machinery.
Airoldi, E. M., Blei, D., Fienberg, S. E. and Xing, E. (2008) Mixed membership stochastic blockmodels. J. Mach. Learn. Res., 9, 1981–2014.
Airoldi, E. M., Costa, T. B. and Chan, S. H. (2013) Stochastic blockmodel approximation of a graphon: theory and consistent estimation. In Advances in Neural Information Processing Systems, vol. 26.
Aldous, D. (1985) Exchangeability and related topics. In Ecole d'Eté de Probabilités de Saint-Flour XIII–1983, pp. 1–198. Springer.
Aldous, D. (1997) Brownian excursions, critical random graphs and the multiplicative coalescent. Ann. Probab., 812–854.
Aldous, D. J. (1981) Representations for partially exchangeable arrays of random variables. J. Multiv. Anal., 11, 581–598.
Arcones, M. and Giné, E. (1992) On the bootstrap of U and V statistics. Ann. Statist., 655–674.
Barabási, A. L. and Albert, R. (1999) Emergence of scaling in random networks. Science, 286, 509–512.
Bastian, M., Heymann, S. and Jacomy, M. (2009) Gephi: an open source software for exploring and manipulating networks. ICWSM, 8, 361–362.
Berger, N., Borgs, C., Chayes, J. T. and Saberi, A. (2014) Asymptotic behavior and distributional limits of preferential attachment graphs. Ann. Probab., 42, 1–40.
Bertoin, J. (2006) Random Fragmentation and Coagulation Processes. Cambridge: Cambridge University Press.
Bickel, P. J. and Chen, A. (2009) A nonparametric view of network models and Newman–Girvan and other modularities. Proc. Natn. Acad. Sci., 106, 21068–21073.
Bickel, P. J., Chen, A. and Levina, E. (2011) The method of moments and degree distributions for network models. Ann. Statist., 39, 2280–2301.
Blackwell, D. and MacQueen, J. B. (1973) Ferguson distributions via Pólya urn schemes. Ann. Statist., 353–355.
Bollobás, B. (1980) A probabilistic proof of an asymptotic formula for the number of labelled regular graphs. Eur. J. Combin., 1, 311–316.
Bollobás, B. (2001) Random Graphs. Cambridge: Cambridge University Press.
Bollobás, B., Janson, S. and Riordan, O. (2007) The phase transition in inhomogeneous random graphs. Rand. Struct. Algs, 31, 3–122.
Bollobás, B. and Riordan, O. (2009) Metrics for sparse graphs. In Surveys in Combinatorics (eds S. Huczynska, J. Mitchell and C. Roney-Dougal), pp. 211–287. Cambridge: Cambridge University Press.
Borgs, C., Chayes, J. T., Cohn, H. and Holden, N. (2016) Sparse exchangeable graphs and their limits via graphon processes. Preprint arXiv:1601.07134.
Borgs, C., Chayes, J., Cohn, H. and Zhao, Y. (2014a) An Lp theory of sparse graph convergence I: Limits, sparse random graph models, and power law distributions. Preprint arXiv:1401.2906.
Borgs, C., Chayes, J., Cohn, H. and Zhao, Y. (2014b) An Lp theory of sparse graph convergence II: LD convergence, quotients, and right convergence. Preprint arXiv:1408.0744.
Borgs, C., Chayes, J. T. and Gamarnik, D. (2016) Convergent sequences of sparse graphs: a large deviations approach. Rand. Struct. Algs, to be published.
Borgs, C., Chayes, J. T. and Lovász, L. (2010) Moments of two-variable functions and the uniqueness of graph limits. Geometr. Functnl Anal., 19, 1597–1619.
Borgs, C., Chayes, J. T., Lovász, L., Sós, V. T. and Vesztergombi, K. (2008) Convergent sequences of dense graphs I: Subgraph frequencies, metric properties and testing. Adv. Math., 219, 1801–1851.
Britton, T., Deijfen, M. and Martin-Löf, A. (2006) Generating simple random graphs with prescribed degree distribution. J. Statist. Phys., 124, 1377–1397.
Brix, A. (1999) Generalized gamma measures and shot-noise Cox processes. Adv. Appl. Probab., 31, 929–953.
Brooks, S. P. and Gelman, A. (1998) General methods for monitoring convergence of iterative simulations. J. Computnl Graph. Statist., 7, 434–455.
Bu, D., Zhao, Y., Cai, L., Xue, H., Zhu, X., Lu, H., Zhang, J., Sun, S., Ling, L. and Zhang, N. (2003) Topological structure analysis of the protein–protein interaction network in budding yeast. Nucleic Acids Res., 31, 2443–2450.
Bühlmann, H. (1960) Austauschbare stochastische Variablen und ihre Grenzwertsätze. PhD Thesis. University of California at Berkeley, Berkeley.
Caron, F. (2012) Bayesian nonparametric models for bipartite graphs. In Advances in Neural Information Processing Systems, vol. 25.
Caron, F. and Fox, E. B. (2014) Bayesian nonparametric models of sparse and exchangeable random graphs. Preprint arXiv:1401.1137.
Caron, F., Teh, Y. W. and Murphy, T. B. (2014) Bayesian nonparametric Plackett-Luce models for the analysis of preferences for college degree programmes. Ann. Appl. Statist., 8, 1145–1181.
Chen, T., Fox, E. and Guestrin, C. (2014) Stochastic gradient Hamiltonian Monte Carlo. In Proc. Int. Conf. Machine Learning, pp. 1683–1691.
Choi, D. and Wolfe, P. J. (2014) Co-clustering separately exchangeable network data. Ann. Statist., 42, 29–63.
Clauset, A., Shalizi, C. R. and Newman, M. E. J. (2009) Power-law distributions in empirical data. SIAM Rev., 51, 661–703.
Colizza, V., Pastor-Satorras, R. and Vespignani, A. (2007) Reaction–diffusion processes and metapopulation models in heterogeneous networks. Nat. Phys., 3, 276–282.
Daley, D. J. and Vere-Jones, D. (2003) An Introduction to the Theory of Point Processes, vol. I, Elementary Theory and Methods, 2nd edn. New York: Springer.
Daley, D. J. and Vere-Jones, D. (2008) An Introduction to the Theory of Point Processes, vol. II, General Theory and Structure, 2nd edn. New York: Springer.
Devroye, L. (2009) Random variate generation for exponentially and polynomially tilted stable distributions. ACM Trans. Modlng Comput. Simuln, 19, 18.
Diaconis, P. and Janson, S. (2008) Graph limits and exchangeable random graphs. Rend. Mat. Applic. Ser., VII, 33–61.
Duane, S., Kennedy, A. D., Pendleton, B. J. and Roweth, D. (1987) Hybrid Monte Carlo. Phys. Lett. B, 195, 216–222.
Durrett, R. (2007) Random Graph Dynamics. New York: Cambridge University Press.
Favaro, S. and Teh, Y. (2013) MCMC for normalized random measure mixture models. Statist. Sci., 28, 335–359.
Feller, W. (1971) An Introduction to Probability Theory and its Applications, vol. II, 2nd edn. New York: Wiley.
Ferguson, T. and Klass, M. (1972) A representation of independent increment processes without Gaussian components. Ann. Math. Statist., 43, 1634–1643.
Fienberg, S. E. (2012) A brief history of statistical models for network analysis and open challenges. J. Computnl Graph. Statist., 21, 825–839.
de Finetti, B. (1931) Funzione caratteristica di un fenomeno aleatorio. Atti R. Acad. Nazn. Linc. Ser. 6, 4, 251–299.
Freedman, D. A. (1996) De Finetti's theorem in continuous time. Lect. Notes Monogr. Ser., 83–98.
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D., Vehtari, A. and Rubin, D. (2014) Bayesian Data Analysis. Boca Raton: Chapman and Hall–CRC.
Giné, E. and Zinn, J. (1992) Marcinkiewicz type laws of large numbers and convergence of moments for U-statistics. In 8th Proc. Int. Conf. Probability in Banach Spaces, pp. 273–291.
Gnedin, A., Hansen, B. and Pitman, J. (2007) Notes on the occupancy problem with infinitely many boxes: general asymptotics and power laws. Probab. Surv., 4 (146–171), 88.
Gnedin, A., Pitman, J. and Yor, M. (2006) Asymptotic laws for compositions derived from transformed subordinators. Ann. Probab., 34, 468–492.
Goldenberg, A., Zheng, A., Fienberg, S. and Airoldi, E. (2010) A survey of statistical network models. Foundns Trends Mach. Learn., 2, 129–233.
Herlau, T., Schmidt, M. N. and Mørup, M. (2014) Infinite-degree-corrected stochastic block model. Phys. Rev. E, 90, 032819.
Herlau, T., Schmidt, M. N. and Mørup, M. (2016) Completely random measures for modelling block-structured sparse networks. In Advances in Neural Information Processing Systems, vol. 29.
Hofert, M. (2011) Sampling exponentially tilted stable distributions. ACM Trans. Modlng Comput. Simuln, 22, 3.
Hoff, P. D. (2009) Multiplicative latent factor models for description and prediction of social networks. Computnl Math. Organizn Theory, 15, 261–272.
Hoff, P. D., Raftery, A. E. and Handcock, M. S. (2002) Latent space approaches to social network analysis. J. Am. Statist. Ass., 97, 1090–1098.
van der Hofstad, R. (2014) Random Graphs and Complex Networks, vol. I, Technical Report. Eindhoven: Eindhoven University of Technology.
Hoover, D. N. (1979) Relations on probability spaces and arrays of random variables. Preprint. Institute for Advanced Study, Princeton.
Hougaard, P. (1986) Survival models for heterogeneous populations derived from stable distributions. Biometrika, 73, 387–396.
Jacobs, A. Z. and Clauset, A. (2014) A unified view of generative models for networks: models, methods, opportunities and challenges. Preprint arXiv:1411.4070.
James, L. F. (2002) Poisson process partition calculus with applications to exchangeable models and Bayesian nonparametrics. Preprint arXiv math/0205093.
James, L. F. (2005) Bayesian Poisson process partition calculus with an application to Bayesian Lévy moving averages. Ann. Statist., 1771–1799.
James, L. F., Lijoi, A. and Prünster, I. (2009) Posterior analysis for normalized random measures with independent increments. Scand. J. Statist., 36, 76–97.
Janson, S. (2011) Probability asymptotics: notes on notation. Preprint arXiv:1108.3924.
Kallenberg, O. (1990) Exchangeable random measures in the plane. J. Theoret. Probab., 3, 81–136.
Kallenberg, O. (2005) Probabilistic Symmetries and Invariance Principles. New York: Springer.
Karlin, S. (1967) Central limit theorems for certain infinite urn schemes. J. Math. Mech., 17, 373–401.
Karrer, B. and Newman, M. E. (2011) Stochastic blockmodels and community structure in networks. Phys. Rev. E, 83, 016107.
Kemp, C., Tenenbaum, J. B., Griffiths, T. L., Yamada, T. and Ueda, N. (2006) Learning systems of concepts with an infinite relational model. In AAAI, vol. 21, p. 381.
Khintchine, A. (1937) Zur Theorie der unbeschränkt teilbaren Verteilungsgesetze. Mat. Sborn., 2, 79–119.
Kingman, J. F. C. (1967) Completely random measures. Pacif. J. Math., 21, 59–78.
Kingman, J. F. C. (1993) Poisson Processes, vol. 3. New York: Oxford University Press.
Lauritzen, S. (2008) Exchangeable Rasch matrices. Rend. Mat. Ser. VII, 28, 83–95.
Lee, M.-L. T. and Whitmore, G. A. (1993) Stochastic processes directed by randomized time. J. Appl. Probab., 302–314.
Lewis, P. A. and Shedler, G. S. (1979) Simulation of nonhomogeneous Poisson processes by thinning. Navl Res. Logist. Q., 26, 403–413.
Lijoi, A., Mena, R. H. and Prünster, I. (2007) Controlling the reinforcement in Bayesian non-parametric mixture models. J. R. Statist. Soc. B, 69, 715–740.
Lijoi, A. and Prünster, I. (2003) On a normalized random measure with independent increments relevant to Bayesian nonparametric inference. In Proc. 13th Eur. Young Statisticians Meet., pp. 123–134. Bernoulli Society.
Lijoi, A. and Prünster, I. (2010) Models beyond the Dirichlet process. In Bayesian Nonparametrics (eds N. L. Hjort, C. Holmes, P. Müller and S. G. Walker). Cambridge: Cambridge University Press.
Lijoi, A., Prünster, I. and Walker, S. G. (2008) Investigating nonparametric priors with Gibbs structure. Statist. Sin., 18, 1653.
Lloyd, J., Orbanz, P., Ghahramani, Z. and Roy, D. (2012) Random function priors for exchangeable arrays with applications to graphs and relational data. In Advances in Neural Information Processing Systems, vol. 25.
Lovász, L. (2013) Large Networks and Graph Limits, vol. 60. American Mathematical Society.
Lovász, L. and Szegedy, B. (2006) Limits of dense graph sequences. J. Combin. Theory B, 96, 933–957.
McAuley, J. and Leskovec, J. (2012) Learning to discover social circles in ego networks. In Advances in Neural Information Processing Systems, vol. 25, pp. 539–547.
Miller, K., Griffiths, T. and Jordan, M. (2009) Nonparametric latent feature models for link prediction. In Advances in Neural Information Processing Systems, vol. 22.
Neal, R. M. (2011) MCMC using Hamiltonian dynamics. In Handbook of Markov Chain Monte Carlo (eds S. Brooks, A. Gelman, G. Jones and X.-L. Meng), vol. 2. Boca Raton: Chapman and Hall–CRC.
Nešetřil, J. and Ossona de Mendez, P. (2012) Sparsity (Graphs, Structures, and Algorithms). Berlin: Springer.
Newman, M. E. J. (2001) The structure of scientific collaboration networks. Proc. Natn. Acad. Sci. USA, 98, 404–409.
Newman, M. E. J. (2003) The structure and function of complex networks. SIAM Rev., 167–256.
Newman, M. E. J. (2010) Networks: an Introduction. New York: Oxford University Press.
Newman, M. E. J., Strogatz, S. H. and Watts, D. J. (2001) Random graphs with arbitrary degree distributions and their applications. Phys. Rev. E, 64, 26118.
Norros, I. and Reittu, H. (2006) On a conditionally Poissonian graph process. Adv. Appl. Probab., 38, 59–75.
Nowicki, K. and Snijders, T. (2001) Estimation and prediction for stochastic blockstructures. J. Am. Statist. Ass., 96, 1077–1087.
Ogata, Y. (1981) On Lewis' simulation method for point processes. IEEE Trans. Inform. Theory, 27, 23–31.
Olhede, S. C. and Wolfe, P. J. (2012) Degree-based network models. Preprint arXiv:1211.6537. University College London, London.
Opsahl, T. and Panzarasa, P. (2009) Clustering in weighted networks. Socl Netwrks, 31, 155–163.
Orbanz, P. and Roy, D. M. (2015) Bayesian models of graphs, arrays and other exchangeable random structures. IEEE Trans. Pattn Anal. Mach. Intell., 37, 437–461.
Palla, K., Knowles, D. A. and Ghahramani, Z. (2012) An infinite latent attribute model for network data. In Proc. Int. Conf. Machine Learning.
Penrose, M. (2003) Random Geometric Graphs, vol. 5. New York: Oxford University Press.
Pitman, J. (1995) Exchangeable and partially exchangeable random partitions. Probab. Theory Reltd Flds, 102, 145–158.
Pitman, J. (1996) Some developments of the Blackwell-MacQueen urn scheme. Lect. Notes Monogr. Ser., 245–267.
Pitman, J. (2003) Poisson-Kingman partitions. Lect. Notes Monogr. Ser., 1–34.
Pitman, J. (2006) Combinatorial stochastic processes. In Ecole d'Eté de Probabilités de Saint-Flour XXXII–2002. New York: Springer.
Prünster, I. (2002) Random probability measures derived from increasing additive processes and their application to Bayesian statistics. PhD Thesis. University of Pavia, Pavia.
Regazzini, E., Lijoi, A. and Prünster, I. (2003) Distributional results for means of normalized random measures with independent increments. Ann. Statist., 31, 560–585.
Resnick, S. (1987) Extreme Values, Point Processes and Regular Variation. New York: Springer.
Rohe, K., Chatterjee, S. and Yu, B. (2011) Spectral clustering and the high-dimensional stochastic blockmodel. Ann. Statist., 39, 1878–1915.
Todeschini, A. and Caron, F. (2016) Exchangeable random measures for sparse and modular graphs with overlapping communities. Preprint arXiv:1602.02114.
Veitch, V. and Roy, D. M. (2015) The class of random graphs arising from exchangeable random measures. Preprint arXiv:1512.03099.
Watts, D. J. and Strogatz, S. H. (1998) Collective dynamics of small-world networks. Nature, 393, 440–442.
Wolfe, P. J. and Olhede, S. C. (2013) Nonparametric graphon estimation. Preprint arXiv:1309.5936. University College London, London.
Zhao, Y., Levina, E. and Zhu, J. (2012) Consistency of community detection in networks under degree-corrected stochastic block models. Ann. Statist., 40, 2266–2292.