1 views

Uploaded by kamakshi2

Benchmarks

- ASSESSMENT - Limits & Continuity
- Kristensen, P. (2018) - International Relations and the End; A Sociological Autopsy
- gephi_tutorial.pdf
- Graph Theory
- Sample 1 Sol
- MATHEMATICAL COMBINATORICS (INTERNATIONAL BOOK SERIES), Volume 1 / 2010
- Impact of Graphs and Network in Minimizing Project and Product Cost
- Community Structure
- Exam1 s16 Sol
- Transformation de Laplace
- Bessel Differential Equation
- Tower of Hanoi
- A Little Bit of Classics_ Dynamic Programming Over Subsets and Paths in Graphs - Codeforces
- An Efficient Algorithm based on Modularity Optimization
- CAT 2009 Qunat Test 8
- Advanced Pattern Part Test 5 Solutions
- Lecture 1
- MINLPSolver Tutorial
- 06JuppGeroJASIST.pdf
- previous years iit questions

You are on page 1of 8

Andrea Lancichinetti1 and Santo Fortunato1

1

Complex Networks Lagrange Laboratory (CNLL),

Institute for Scientific Interchange (ISI), Viale S. Severo 65, 10133, Torino, Italy

Many complex networks display a mesoscopic structure with groups of nodes sharing many links

with the other nodes in their group and comparatively few with nodes of different groups. This

feature is known as community structure and encodes precious information about the organization

and the function of the nodes. Many algorithms have been proposed but it is not yet clear how they

should be tested. Recently we have proposed a general class of undirected and unweighted bench-

mark graphs, with heterogenous distributions of node degree and community size. An increasing

arXiv:0904.3940v1 [physics.soc-ph] 24 Apr 2009

attention has been recently devoted to develop algorithms able to consider the direction and the

weight of the links, which require suitable benchmark graphs for testing. In this paper we extend the

basic ideas behind our previous benchmark to generate directed and weighted networks with built-in

community structure. We also consider the possibility that nodes belong to more communities, a

feature occurring in real systems, like, e. g., social networks. As a practical application, we show

how modularity optimization performs on our new benchmark.

Keywords: Networks, community structure, testing

of the same group and of different groups are pin and

Complex systems are characterized by a division in pout , respectively. This benchmark is a special case of

subsystems, which in turn contain other subsystems in a the planted `-partition model [10]. However, it has two

hierarchical fashion. Herbert A. Simon, already in 1962, drawbacks: 1) all nodes have the same expected degree;

pointed out that such hierarchical organization plays a 2) all communities have equal size. These features are

crucial role both in the generation and in the evolution unrealistic, as complex networks are known to be charac-

of complex systems [1]. Many complex systems can be terized by heterogeneous distributions of degree [2, 3, 11]

described as graphs, or networks, where the elementary and community sizes [9, 12, 13, 14, 15]. In a recent pa-

parts of a system and their mutual interactions are nodes per [16], we have introduced a new class of benchmark

and links, respectively [2, 3]. In a network, the subsys- graphs, that generalize the benchmark by Girvan and

tems appear as subgraphs with a high density of internal Newman by introducing power law distributions of de-

links, which are loosely connected to each other. These gree and community size. Most community detection al-

subgraphs are called communities and occur in a wide gorithms perform very well on the benchmark by Girvan

variety of networked systems [4, 5]. Communities reveal and Newman, due to the simplicity of its structure. The

how a network is internally organized, and indicate the new benchmark, instead, poses a much harder test to

presence of special relationships between the nodes, that algorithms, and makes it easier to disclose their limits.

may not be easily accessible from direct empirical tests. Most research on community detection focuses on the

Communities may be groups of related individuals in so- simplest case of undirected and unweighted graphs, as

cial networks [4, 6], sets of Web pages dealing with the the problem is already very hard. However, links of net-

same topic [7], biochemical pathways in metabolic net- works from the real world are often directed and carry

works [8, 9], etc. weights, and both features are essential to understand

For these reasons, detecting communities in networks their function [17, 18]. Moreover, in real graphs com-

has become a fundamental problem in network science. munities are sometimes overlapping [9], i. e. they share

Many methods have been developed, using tools and vertices. This aspect, frequent in certain types of sys-

techniques from disciplines like physics, biology, applied tems, like social networks, has received some attention

mathematics, computer and social sciences. However, in the last years [15, 19, 20, 21]. Finding communities

there is no agreement yet about a set of reliable algo- in networks with directed and weighted edges and possi-

rithms, that one can use in applications. The main rea- bly overlapping communities is highly non-trivial. Many

son is that current techniques have not been thoroughly techniques working on undirected graphs, for instance,

tested. Usually, when a new method is presented, it is ap- cannot be extended to include link direction. This im-

plied to a few simple benchmark graphs, artificial or from plies the need of new approaches to the problem. In any

the real world, which have a known community structure. case, once a method is designed, it is important to test it

The most used benchmark is a class of graphs introduced against reliable benchmarks. Since the new benchmark of

by Girvan and Newman [4]. Each graph consists of 128 Ref. [16] is defined for undirected and unweighted graphs,

nodes, which are divided into four groups of 32: the prob- we extend it here to the directed and weighted cases. For

2

any type of benchmark, we will include the possibility to smax = max{sξ } 6 N and νmax = max{νi } 6 nc ,

have overlapping communities. Sawardecker et al. have where N is the number of nodes and nc the number

recently proposed a different benchmark with overlap- of communities. At this point, we have to decide

ping communities where the probability that two nodes which communities each node should be included

are linked grows with the number of communities both into. This is equivalent to generating a bipartite

nodes belong to [22]. network where the two classes are the nc commu-

Our algorithms to create the benchmark graphs have a nities and the N nodes; each community ξ has sξ

computational complexity which grows linearly with the links, whereas each node has as many links as its

number of links and reduce considerably the fluctuations memberships νi (Fig. 1). The network can be eas-

of specific realizations of the graphs, so that they come as

close as possible to the type of structure described by the

input parameters. We use our benchmark to make some

testing of modularity optimization [23], which is well de-

fined in the case of directed and weighted networks [24].

In Section II we describe the algorithms to create the

new benchmarks. Tests are presented in Section III. Con-

clusions are summarized in Section IV.

benchmark for undirected graphs with overlaps between

communities. Then we extend it to the case of weighted

and directed graphs.

Nodes Communities

generate undirected and unweighted benchmark graphs,

where each node is allowed to have memberships in more FIG. 1: Schematic diagram of the bipartite graph used to

communities. The algorithm consists of the following assign nodes to their communities. Each node has as many

steps: stubs as the number of communities it belongs to, whereas

the number of stubs of each community matches the size of

1. We first assign the number νi of memberships of the community. The memberships are assigned by joining the

node i, i.e. the number of communities the node be- stubs on the left with those on the right.

longs to. Of course, if each node has only one mem-

bership, we recover the benchmark of Ref. [16]; in ily generated with the configuration model [25]. To

general we can assign the number of memberships build the graph, it is important to take into account

according to a certain distribution. Next, we assign the constraint

(in)

X

the degrees {ki } by drawing N random numbers sξ > ki , ∀i, (1)

from a power law distribution with exponent τ1 . i→ξ

We also introduce the topological mixing parameter

(in)

µt : ki = (1 − µt )ki is the internal degree of the where the sum is relative to the communities in-

node i, i. e. the number of neighbors of node i cluding node i. This condition means that each

which have at least one membership in common node cannot have an internal degree larger than

with i. In this way, the internal degree is a fixed the highest possible number of nodes it can be con-

fraction of the total degree for all the nodes. Of nected to within the communities it stays in. We

course, it is straightforward to generalize the algo- perform a rewiring process for the bipartite network

rithm to implement a different rule (one can intro- until the constraint is satisfied. For some choices of

duce a non linear functional dependence, individual the input parameters, it could happen that, after

mixing parameters, etc.). some iterations, the constraint is still unsatisfied.

In this case one can change the sizes of the com-

2. The community sizes {sξ } are assigned by draw- munities, by merging some of them, for instance.

ing random numbers from another power law with It turns out that this is not necessary in most sit-

exponent τ2 . Naturally, the sum of the commu- uations and that, when it is, the perturbations in-

nity sizes must equal

P the sum P of the node mem- troduced in the community size distributions are

berships, i. e. ξ sξ = i νi . Furthermore not too large. In general, it is convenient to start

3

(in) (in) A B

smin > kmin and smax > kmax .

So far we assigned an internal degree to each node

but it has not been specified how many links should

be distributed among the communities of the node.

Again, one can follow several recipes; we chose the

(in)

simple equipartition ki (ξ) = ki /νi , where ki (ξ)

is the number of links which i shares in community

ξ, provided that i holds membership in ξ. Some

adjustments may be necessary to assure

C D

which is the strong version of Eq. 1.

A B

erating nc subgraphs, one for each community. In

fact, our definition of community ξ is nothing but a

random subgraph of sξ nodes with degree sequence

{ki (ξ)}, which can be built via the configuration

model, with a rewiring procedure to avoid multiple

links. Note that Eq. 2 is necessary to generate the

configuration model, butPin general not sufficient.

For one thing, we need i ki (ξ) to be even. This

might cause a change in the degree sequence, which

is generally not appreciable. Once each subgraph

is built, we obtain a graph divided in components.

Note that because of the overlapping nodes, some

C D

components may be connected to each other, and

in principle the whole graph might be connected. FIG. 2: Scheme of the rewiring procedure necessary to build

Furthermore, if two nodes belong simultaneously the graph G (ext) , which includes only links between nodes of

to the same two (or more) communities, the proce- different communities. (Top) If two nodes (A and B) with

a common membership are neighbors, their link is rewired

dure may set more than one link between the nodes.

along with another link joining two other nodes C and D,

A rewiring strategy similar to that described below where C does not have memberships in common with A, and

suffices to avoid this problem. D is a neighbor of C not connected to B. In the final con-

figuration (bottom), the degrees of all nodes are preserved,

4. The last step of the algorithm consists in adding and the number of links between nodes with common mem-

the links external to the communities. To do this, berships has decreased by one (since A and B are no longer

(ext)

let us consider the degree sequence {ki }, where connected), or it has stayed the same (if B and D, which are

(ext) (in) now neighbors, have common memberships).

simply ki = ki −ki = µt ki . We want to insert

randomly these links in our already built network

without changing the internal degree sequences. In

order to do so, we build a new network G (ext) of N it cannot increase it. This means that after a few

nodes with degree sequence {ki

(ext)

}, and we per- sweeps over all the nodes we reach a steady state

form a rewiring process each time we encounter where the number of internal links is very close to

a link between two nodes which have at least one zero (if no node has ki ∼ N , the internal links of

membership in common (Fig. 2), since we are sup- G (ext) are just a few and one sweep is sufficient).

posed to join only nodes of different communities Fig. 3 shows how the number of internal links de-

at this stage. Let us assume that A and B are in creases during the rewiring procedure. Finally, we

the same community and that they are linked in have to superimpose G (ext) on the previous one.

G (ext) ; we pick a node C which does not share any

membership with A, and we look for a neighbor of In our previous work about benchmarking [16], we dis-

C (call it D) which is not neighbor of B. Next, we cussed the dispersion of the internal degree around the

(in)

replace the links A − B and C − D with the new fixed value ki . In this case, if the number of internal

(ext)

links A − C and B − D. This rewiring procedure links of G goes to zero, the only reason not to have a

can decrease the number of internal links of G (ext) perfectly sharp function for the distribution of the mix-

or leaving it unchanged (this happens only when ing parameters of the nodes in specific realizations of the

B and D have one membership in common) but new benchmark is a round-off problem, i.e. the problem

4

1

P

of rounding integer numbers. where h 2m ξ

iξ = 1/νij ξ 1/2mξ , and ξ runs only over

the common memberships of the nodes.

On the other hand, if i and j do not share any mem-

bership, the probability to have a link between them is:

4000 (ext) (ext)

N = 1000 , <k> = 50, µt = 0.8

ki kj ki kj

pij ' = µt (5)

2m(ext) 2m

3000 P (ext)

where 2m(ext) = i ki

P

= µt i ki is the number of

Internal degree

the rewiring process does not affect too much the prob-

2000

abilities, i.e. if the communities are small compared to

the size of the network.

These results are based on some assumptions which are

1000

likely to be not exactly, but only approximately valid.

Anyway, carrying out the right calculation is far from

trivial and surely beyond the scope of this paper.

0

0 500 1000 1500 2000 We conclude this section with a remark about the com-

Rewiring steps plexity of the algorithm. The configuration model takes

a time growing linearly with the number of links m of the

network. If the rewiring procedure takes only a few iter-

FIG. 3: Number of internal links of G (ext) as a function of the ations, like it happens in most instances, the complexity

rewiring steps. The network has 1000 nodes, and an average of the algorithm is O(m) (Fig. 4).

degree hki = 50. Since the mixing parameter is µt = 0.8

and there are 10 equal-sized communities, at the beginning

(in)

each node has an expected internal degree in G (ext) ki =

0.8 ∗ 50 ∗ 1/10 = 4, so the total internal degree is around

4000. After each rewiring step, the internal degree either 2

decreases by 2, or it does not change. In this case, less then N =1000, µt = 0.1

2100 rewiring steps were needed. N = 1000, µt= 0.5

Computational time (s)

N = 5000, µt = 0.1

1.5

N = 5000, µt = 0.5

Other benchmarks, like that by Girvan and Newman,

are based on a similar definition of communities, ex-

pressed in terms of different probabilities for internal and 1

external links. One may wonder what is the connection

between our benchmark and the others. It is not difficult

to compute an approximation of how the probability of 0.5

having a link between two nodes in the same community

depends on the mixing parameter µt .

In the configuration model, the probability to have a 0

15 20 25 30

connection between nodes i and j with ki and kj links <k>

ki kj

respectively is approximately pij = 2m , provided that

ki 2m and kj 2m. If the approximation holds,

our prescription to assign ki (ξ) allows us to compute the FIG. 4: Computational time to build the unweighted bench-

probability that i and j get a link in the community ξ: mark as a function of the average degree. We show the results

for networks of 1000 and 5000 nodes. µt was set equal to 0.1

ki (ξ)kj (ξ) 1 ki kj and 0.5 (the latter requires more time for the rewiring pro-

pij (ξ) ' = (1 − µt )2 , (3) cess). Note that between the two upper lines and the lower

2mξ νi νj 2mξ

ones there is a factor of about 5, as one would expect if com-

P plexity is linear in the number of links m.

where 2mξ = i ki (ξ) is the number of internal links

in the community (we recall that νi is the number of

memberships of node i). If i and j share a number νij

of memberships and all the respective pij (ξ) are small,

B. Weighted networks

the probability that they get a link somewhere can be

approximated with the sum over all the common com-

munities. The final result is In order to build a weighted network, we first generate

an unweighted network with a given topological mixing

νij 1 parameter µt and then we assign a positive real number

pij ' (1 − µt )2 ki kj h iξ , (4)

νi νj 2mξ to each link.

5

To do this we need to specify two other parameters, β value. With our procedure the value of Var({wij })

and µw . The parameter β is used to assign a strength si decreases at least exponentially with the number

to each node, si = kiβ ; such power law relation between of iterations, consisting in sweeps over all network

the strength and the degree of a node is frequently ob- links. (Fig. 5).

served in real weighted networks [18]. The parameter µw

(in) For the distribution of the weights wij , we expect the

is used to assign the internal strength si = (1 − µw ) si , (int) (in) P (in) (in)

which is defined as the sum of the weights of the links averages hwi i = 1/ki j wij κ(i, j) = si /ki

(ext) (ext) (ext)

between node i and all its neighbors having at least one and hwi i = si /ki . Note that these expressions

membership in common with i. The problem is equiv- can be related to the mixing parameters in a simple way

alent to finding an assignment of m positive numbers (Fig. 6):

{wij } such to minimize the following function:

X (in) (in) (ext) (ext) 2 (int) 1 − µw β−1 (ext) µw β−1

Var({wij }) = (si −ρi )2 +(si −ρi )2 +(si −ρi ) . hwi i = k and hwi i= k .

1 − µt i µt i

i

(6) (7)

(in,ext)

Here si and si indicate the strengths which we

(in)

would like to assign, i.e. si = kiβ , si = (1 − µw ) si ,

(out) ∗

si = µw si ; {ρi } are the total, internal and external

strengths of node i defined through its link weights, i.e. N = 5000 , <k> = 20, µt = 0.4, µw= 0.2, β = 1.5

P (in) P (out) P 1e+06

ρi = j wij , ρi = j wij κ(i, j), ρi = j wij (1 −

κ(i, j)), where the function κ(i, j) = 1 if nodes i and j

share at least one membership, and κ(i, j) = 0 otherwise. 1000

(in,ext) Var { w ij}

We have to arrange things so that si and si are

∗

consistent with the {ρi }. For that we need a fast algo- 1

rithm to minimize Var({wij }). We found that the greedy

algorithm described below can do this job well enough for

0.001

the cases of our interest.

zero. 0 5 10 15 20 25 30

Number of iterations

2. We take node i and increase the weight of each of its

links by an amount ui = sik−ρ i

i

, where ρi indicates

the sum of the links’ weights resulting from the FIG. 5: Value of Var({wij }) (Eq. 6) after each update. Each

previous step, i. e. before we increment them. In point corresponds to one sweep over all the nodes.

this way, since initially {ρ∗i } = 0, the weights of the

links of i after the first step take the (equal for all) Since Var({wij }) decreases exponentially, the number

value ksii , and ρi = si by construction, condition of iterations needed to reach convergence has a slow

that is maintained along the whole procedure. We dependence on the size of the network so it does not

update {ρ∗i } for the node i and its neighbors. contribute much to the total complexity, which remains

3. Still for node i we increase all the weights wij by an O(m) (Fig. 7).

(in) (in)

si −ρi

amount (in) if κ(i, j) = 1 and by an amount

ki

(in)

si −ρi

(in) C. Directed networks

− (ext) if κ(i, j) = 0. Again we update {ρ∗i }

ki

for the node i and its neighbors. These two steps It is quite straightforward to generalize the previous al-

assure to set the contribute of node i in Var({wij }) gorithms to generate directed networks. Now, we have an

to zero. indegree sequence {yi } and an outdegree sequence {zi }

but we can still go through all the steps of the construc-

4. We repeat steps (2) and (3) for all the nodes. Two tion of the benchmark for undirected networks with just

remarks are in order. First, we want each weight some slight modifications. In the following, we list what

wij > 0; so we update the weights only if this condi- to change in each point of the corresponding list in Sec-

tion is fulfilled. Second, the contribute of the neigh- tion II A.

bors of node i in Var({wij }) will change and, of

course, it can increase or decrease. For this reason, 1. We decided to sample the indegree sequence from

we need to iterate the procedure several times until a power law and the outdegree sequence from

P a δ-

a steady state is reached, or until we reach a certain distribution (with the obvious constraint i yi =

6

P P

be even is replaced by i yi (ξ) = i zi (ξ); be-

cause of this condition it might be necessary to

change yi (ξ) and/or zi (ξ). We decided to modify

N = 5000, <k> = 15, µt= 0.2, µw= 0.4 only zi (ξ), whenever necessary.

5

4 both distributions of indegree and outdegree, for

>

(int)

<wi

3 A points to C and D to B.

we use the following relation between the strength si of

2 3 4 5

β-1 a node and its in- and outdegree: si = (yi + zi )β . Given

(1 - µt)/(1-µw) k a node i, one considers all its neighbors, regardless of the

link directions (note that i may have the same neighbor

FIG. 6: The average weight of an internal link for a node counted twice if the link runs in both directions). Oth-

depends on its degree according to Eq. 7. The correlation plot erwise, the procedure to insert weights is equivalent.

in the figure, relative to a network of 5000 nodes, confirms the In directed networks, the directedness of the links may

result. reflect some interesting structural information that is not

present in the corresponding undirected version of the

graph. For instance there could be flows, represented

by many links with the same direction running from one

5 subgraph to another: such subgraphs might correspond

4.5 N = 1000, µt = 0.2, µw = 0.4 to important classifications of the nodes. Our directed

N = 5000, µt = 0.2, µw = 0.4 benchmark is based on the balance between the numbers

4

Computational time (s)

3.5 able to generate graphs with flows. However, this is not

3 true: flows can be generated by introducing proper con-

2.5 straints on the number of incoming and outgoing links of

2

the communities.

1.5 Suppose we want to generate a network with two com-

munities only, where the nodes of community 1 point

1

to nodes of community 2 but not vice versa and there

0.5 are a few random connections among nodes in the same

0 community. We could use our algorithm in this way:

20 25 30 35 40

<k> first we build separately the two subgraphs; then we set

(ext) (ext)

yi ' 0 for nodes in the community 1 and zi '0

(ext)

for nodes in community 2 and build G . If there are

FIG. 7: Computational time to build the weighted bench- more communities, one first builds as many subgraphs as

mark as a function of the average degree. We show the results necessary and then links them according to the desired

for networks of 1000 and 5000 nodes. µt was set equal to 0.2 flow patterns.

and µw to 0.4.

Methods based on mixture models [26, 27] may de-

tect this kind of structures. Methods based on a balance

P between internal and external links, like (directed) mod-

i zi ). We need to define the internal in- and out-

degrees yi (ξ) and zi (ξ) with respect to every com- ularity optimization may have problems. For example

munity ξ, which can be done by introducing two (Fig. 8), consider a network with three communities A,

mixing parameters. For simplicity one can set them B, C, with 10 nodes in each community, each node with

equal. 3 in-links and 3 out-links on average; nodes in A point

to 2 nodes in B, nodes in B point to 2 nodes in C, and

2. It is necessary that Eq. 2 holds for both {yi } and nodes in C point to 2 nodes in A; each node points to

{zi }. 1 node in its own community. The modularity of this

partition is Q = 0, therefore the optimization would give

3. We need to use the configuration model

P for directed a different partition, as the maximum modularity for a

networks, and the condition that i ki (ξ) should graph is usually positive.

7

coming publication.

1 1

0.95 0.95

0.9 0.9

γ=2, β=1, N=1000 γ=3, β=1

0.85 0.85 N=1000

<k>=15

<k>=20

0.8 <k>=25

0.8

0.75 0.75

0 0.2 0.4 0.6 0 0.2 0.4 0.6

1 1

0.95 0.95

FIG. 8: Example of directed graph with a flow running in a

0.9 0.9

cycle between three groups of nodes. The directedness of the γ=2, β=1 γ=3, β=1

links enables to distinguish the three groups, and there are 0.85 0.85

N=5000 N=5000

methods able to detect them. Standard community detection 0.8 0.8

methods, instead, are likely to fail. For instance, the value of 0.75 0.75

the directed modularity for the partitions in the three groups 0 0.2 0.4 0.6 0 0.2 0.4 0.6

Mixing parameter µw Mixing parameter µw

is zero, whereas the maximum modularity for the graph is

positive and corresponds to a different partition.

III. TESTS for weighted undirected networks. The topological mixing

parameter µt equals the strength mixing parameter µw . Each

point corresponds to an average over 100 graph realizations.

Here we present some tests of a method for community

detection on our benchmark graphs. We choose modu- To measure the similarity between the built-in modular

structure of the benchmark with the one delivered by the

algorithm we adopt the normalized mutual information,

a measure borrowed from information theory, which has

1 1

lately acquired some popularity [28]. Fig. 9 shows the

0.95 0.95

Normalized mutual information

0.9 0.9

γ=2, β=1, N=1000 γ=3, β=1 The plot shows a very similar pattern as that observed

0.85 0.85

<k>=15 N=1000 in the undirected case [16].

<k>=20

0.8 <k>=25

0.8 For the weighted benchmark we can tune two param-

0.75

0 0.2 0.4 0.6

0.75

0 0.2 0.4 0.6 eters, µt and µw . Fig. 10 refers to networks where we

1 1 set µt = µw , while in Fig. 11 we set µt = 0.5. Since,

0.95 0.95 for µw < 0.5, µt is smaller for the networks of Fig. 10

0.9

γ=2, β=1

0.9

γ=3, β=1

than for those in Fig. 11, we would expect to see bet-

0.85

N=5000 0.85 N=5000

ter performances of modularity optimization in Fig. 10

0.8 0.8 in the range 0 ≤ µw < 0.5. Instead, we get the opposite

0.75 0.75 result. The reason is that the links between communities

0 0.2 0.4 0.6 0 0.2 0.4 0.6

Mixing parameter µt Mixing parameter µt carry on average more weight when µt < µw than when

µt = µw , and this enhances the chance that merges be-

tween small communities occur, leading to higher values

FIG. 9: Test of modularity optimization on our benchmark of modularity [29]. Because of such merges, the parti-

for directed networks. The results get worse by increasing the tion found by the method can be quite different from the

number of nodes and/or decreasing the average degree, as we planted partition of the benchmark.

had found for the undirected case in Ref. [16]. Each point

corresponds to an average over 100 graph realizations.

IV. SUMMARY

larity optimization because it is one of very few meth-

ods that can be extended to the cases of directed and In this paper we have introduced new benchmark

weighted graphs [24]. The optimization was carried out graphs to test community detection methods on directed

by using simulated annealing [8]. In all the graphs used and weighted networks. The new graphs are suitable ex-

for testing, we have set to zero the amount of overlap tensions of the benchmark we have recently introduced

between the communities. Extensive tests of community in Ref. [16], in that they account for the fat-tailed distri-

detection algorithms, also including techniques to find butions of node degree and community size that are ob-

8

all our new benchmark graphs with the option of hav-

1 1 ing overlapping communities, an important feature of

0.95 0.95 community structure in real networks. With this work

Normalized mutual information

γ=2, β=1, N=1000 γ=3, β=1

0.85 0.85

detecting communities in graphs with a complete set of

<k>=15 N=1000

<k>=20

tools to make stringent objective tests of their algorithms,

0.8 0.8

<k>=25 something which is sorely needed in this field. We have

0.75 0.75

0 0.2 0.4 0.6 0 0.2 0.4 0.6 developed and carefully tested a software package for the

1 1 generation of each class of benchmark graphs, all of which

0.95 0.95 can be freely downloaded [30].

0.9 0.9

γ=2, β=1 γ=3, β=1

0.85 0.85

N=5000 N=5000

0.8 0.8

0.75 0.75

0 0.2 0.4 0.6 0 0.2 0.4 0.6

Mixing parameter µw Mixing parameter µw

Acknowledgments

FIG. 11: Test of modularity optimization on our benchmark

for weighted undirected networks. The topological mixing

parameter µt = 0.5. All other parameters are the same as in

Fig. 10. Each point corresponds to an average over 100 graph We thank F. Radicchi and J. J. Ramasco for useful

realizations. suggestions.

[1] H. A. Simon, Proc. Am. Phil. Soc. 6, 467 (1962). 118703 (2008).

[2] M. E. J. Newman, SIAM Review 45, 167 (2003). [18] A. Barrat, M. Barthélemy, R. Pastor-Satorras and A.

[3] S. Boccaletti, V. Latora, Y. Moreno, M. Chavez and D.- Vespignani, Proc. Natl. Acad. Sci. USA 101, 3747 (2004).

U. Hwang, Phys. Rep. 424, 175 (2006). [19] J. Baumes, M. K. Goldberg, M. S. Krishnamoorthy,

[4] M. Girvan and M. E. J. Newman, Proc. Natl. Acad. Sci. M. M. Ismail and N. Preston, in IADIS AC, Eds. N.

99, 7821 (2002). Guimaraes and P. T. Isaias, p. 97 (2005).

[5] S. Fortunato and C. Castellano, in Encyclopedia of Com- [20] S. Zhang, R.-S. Wang and X.-S. Zhang, Physica A 374,

plexity and System Science, ed. B. Meyers (Springer, Hei- 483 (2007).

delberg, 2009), arXiv:0712.2716 at www.arXiv.org. [21] T. Nepusz, A. Petróczi, L. Négyessy and F. Bazsó, Phys.

[6] D. Lusseau and M. E. J. Newman, Proc. R. Soc. London Rev. E 77, 016107 (2008).

B 271, S477 (2004). [22] E. N. Sawardecker, M. Sales-Pardo and L. A. N. Amaral,

[7] G. W. Flake, S. Lawrence, C. Lee Giles and F. M. Coet- Eur. Phys. J. B 67, 277 (2009).

zee, IEEE Computer 35(3), 66 (2002). [23] M. E. J. Newman, Phys. Rev. E 69, 066133 (2004).

[8] R. Guimerà and L. A. N Amaral, Nature 433, 895 (2005). [24] A. Arenas, J. Duch, A. Fernández and S. Gómez, New J.

[9] G. Palla, I. Derényi, I. Farkas and T. Vicsek, Nature 435, Phys. 9, 176 (2007).

814 (2005). [25] M. Molloy and B. Reed, Random Structures & Algo-

[10] A. Condon and R. M. Karp, Random Structures & Al- rithms 6, 161 (1995).

gorithms 18, 116 (2001). [26] M. E. J. Newman and E. A. Leicht, Proc. natl. Acad. Sci.

[11] R. Albert, H. Jeong and A.-L. Barabási, Nature 401, 130 USA 104, 9564 (2007).

(1999). [27] J. J. Ramasco and M. Mungan, Phys. Rev. E 77, 036122

[12] R. Guimerà, L. Danon, A. Dı́az-Guilera, F. Giralt and (2008).

A. Arenas, Phys. Rev. E 68, 065103(R) (2003). [28] L. Danon, A. Dı́az-Guilera, J. Duch and A. Arenas, J.

[13] L. Danon, J. Duch, A. Arenas and A. Dı́az-Guilera, in Stat. Mech., P09008 (2005).

Large Scale Structure and Dynamics of Complex Net- [29] S. Fortunato and M. Barthélemy, Proc. Natl. Acad. Sci.

works: From Information Technology to Finance and USA 104, 36 (2007).

Natural Science, eds. G. Caldarelli and A. Vespignani [30] The packages can be found in

(World Scientific, Singapore, 2007), pp 93–114. http://santo.fortunato.googlepages.com/inthepress2.

[14] A. Clauset, M. E. J. Newman and C. Moore, Phys. Rev. In each package there are instruction files that enable to

E 70, 066111 (2004). easily operate the software.

[15] A. Lancichinetti, S. Fortunato and J. Kertész, New J.

Phys. 11, 033015 (2009).

[16] A. Lancichinetti, S. Fortunato and F. Radicchi, Phys.

Rev. E 78, 046110 (2008).

[17] E. A. Leicht and M. E. J. Newman, Phys. Rev. Lett. 100,

- ASSESSMENT - Limits & ContinuityUploaded byShannon Burton
- Kristensen, P. (2018) - International Relations and the End; A Sociological AutopsyUploaded byFranco Vilella
- gephi_tutorial.pdfUploaded bySumeet Sharma
- Graph TheoryUploaded bywmmeyer
- Sample 1 SolUploaded byaniket
- MATHEMATICAL COMBINATORICS (INTERNATIONAL BOOK SERIES), Volume 1 / 2010Uploaded byAnonymous 0U9j6BLllB
- Impact of Graphs and Network in Minimizing Project and Product CostUploaded byThe Ijbmt
- Community StructureUploaded byDiegoPinheiro
- Exam1 s16 SolUploaded byVũ Ngọc
- Transformation de LaplaceUploaded byA.Benhari
- Bessel Differential EquationUploaded bylyquyen85
- Tower of HanoiUploaded byRichard Mcdaniel
- A Little Bit of Classics_ Dynamic Programming Over Subsets and Paths in Graphs - CodeforcesUploaded byCognizance_IITR
- An Efficient Algorithm based on Modularity OptimizationUploaded byIJMER
- CAT 2009 Qunat Test 8Uploaded byapi-19859643
- Advanced Pattern Part Test 5 SolutionsUploaded byAAVANI
- Lecture 1Uploaded bymcjack181
- MINLPSolver TutorialUploaded byscribd-ml
- 06JuppGeroJASIST.pdfUploaded byFelipe Pires
- previous years iit questionsUploaded byyuvaraj
- 1-s2.0-S1877705812031803-mainUploaded byErik Geovanni
- Exp-1Uploaded byDARTH VENIOUS
- _lesson04Uploaded byAsa Ka
- MathAnswer Key Ex3 S12Uploaded byАрхи́п
- Lecture 1Uploaded bybacalou
- Kertas 1Uploaded byjuliet84
- btpipelineprojectmath1210Uploaded byapi-289600986
- MATHS PRESENTATION.docxUploaded byAadil Pervez Bs IT
- 10.1.1.6Uploaded byShen Yin
- 1 Function and Function NotationUploaded byEla King

- Distributed Topology Control and Future Trends in WSNUploaded bykamakshi2
- AnimationUploaded bykamakshi2
- readme.txtUploaded bykamakshi2
- 新建 Microsoft Word 文档Uploaded bykamakshi2
- VLSI Design_Oct 30_4th Edition (1)Uploaded bykamakshi2
- Mcq in DAAUploaded bykamakshi2
- Two Marks for All UnitsUploaded bykamakshi2
- Bayson Machine LearningUploaded byShanmukh Sista
- 9780262028189_TOC11.pdfUploaded byvarun3dec1
- Animation V0Uploaded bykamakshi2
- Example 1Uploaded bykamakshi2
- eample5.pdfUploaded bykamakshi2
- Example 4Uploaded bykamakshi2
- Objective Type Q and AUploaded bykamakshi2
- M.C.a (Sem - III) Paper - I - Object Oriented ProgrammingUploaded bytovilas
- Q and A in CPPUploaded bykamakshi2
- cpp exercises1.docxUploaded bykamakshi2
- Balance SheetUploaded bykamakshi2
- eample5Uploaded bykamakshi2
- revocations.txtUploaded bykamakshi2
- Cert OverrideUploaded bykamakshi2
- I Year C and C LabUploaded bykamakshi2
- M.tech CSE and EMB Scheme OnlyUploaded bykamakshi2
- B.tech CSE Scheme - 2015 OnwardsUploaded bykamakshi2
- I year syllabus 2015-16.pdfUploaded bykamakshi2
- Lab01 SolutionsUploaded bykamakshi2
- lab-overview-slides.pdfUploaded bykamakshi2

- Bochner’s Theorem on the Fourier Transform on RUploaded byEdgar Marca
- On the Extension of Topological Local Groups With Local Cross SectionUploaded bySTATPERSON PUBLISHING CORPORATION
- Acm Cheat SheetUploaded byIsrael Marino
- Tutorial 1A (1)Uploaded byMohd Hafiz Maulad Sedi
- Calculus 7th edition_Ch04Uploaded byيقين عبدالرحمن
- 23-MATHSUploaded byAzeef Ajmal
- GATE Mathematics VaniUploaded bySanthos Kumar
- Modern Matrix AlgebraUploaded byElizabeth Ott
- Curl MatlabUploaded byNikhil Chaudhari
- A2 Unit 1 SequencesUploaded byDanielle
- Chiang - Chapter 4Uploaded bystraffor
- Mathematical AnalysisUploaded byLaurentiu Frusinoiu
- Functions(Domain and Range)Uploaded byDileep Naraharasetty
- Fuchs and the Theory of Differentian EquationsUploaded bySebastián Felipe Mantilla Serrano
- endomorphismUploaded byAbdo Sawaya
- Sample 7362Uploaded byAamir Khan
- Precalculus Tradler CarleyUploaded byAlex Dorosh
- FloydUploaded byJem Bartle
- Maths-3-Compiled-A.T.EshwarUploaded bysvc8670
- Gretton_Introduction to RKHS, And Some Simple Kernel AlgorithmsUploaded bygrzerysz
- Cs2201 Data StructuresUploaded byKarthik Sara M
- 1710__04217Uploaded byhuevonomar05
- Question Bank of Nag Rai DspUploaded bysnehilj_1
- Maths HarvardUploaded bypremnath.s
- Mathematical Methods in Engineering and ScienceUploaded byuser1972
- DMTH505_MEASURE_THEOREY_AND_FUNCTIONAL_ANALYSIS.pdfUploaded byJahir Uddin Laskar
- spcg6e ppt 4 1Uploaded byapi-334707195
- Question Bank on Limit Continuity and Differentiablity.pdfUploaded byUmang Dhandhania
- MAT3003 Complex Variables and Partial Differential Equations TH 1 AC40Uploaded byJohn Dias
- Laplace_Table.pdfUploaded bySaptarshiDutta