
CONCLUDE: Complex Network Cluster Detection for Social Applications

Emilio Ferrara¹, Alessandro Provetti²

¹ Center for Complex Networks and Systems Research,
School of Informatics and Computing, Indiana University Bloomington, USA
ferrarae@indiana.edu
² Dept. of Physics, Informatics Section, University of Messina, IT
ale@unime.it
Abstract. The problem of clustering large complex networks in the context of knowledge discovery and data management is central in the current literature. In this paper, we present a method to unveil the clusters present in complex networks, baptized COmplex Network CLUster DEtection (or, shortly, CONCLUDE). Our strategy relies on three steps: i) ranking edge centrality by using a random walker; ii) calculating the distance between each pair of connected nodes according to the ranking outcome; iii) partitioning the network into clusters so as to optimize a function called network modularity, exploiting both global and local information. The algorithm is computationally efficient, since its cost is near linear with respect to the number of edges in the network. The adoption of our clustering method has proved worthwhile in different contexts, such as for studying real-world social networks and artificially-generated networks with well-defined clusters.

1 Introduction

Network Analysis has been attracting unprecedented attention from the scientific community in recent years, both in the context of social knowledge management and in network science. All this attention is justified by the recent success of online social networks and online communities. In this panorama, both from a scientific perspective and from a commercial standpoint, the problem of identifying clusters inside large social networks has recently attracted considerable attention [27, 29, 12, 33].
In this paper we propose a computationally efficient cluster detection method, called COmplex Network CLUster DEtection (henceforth, CONCLUDE). It works in three steps: first, it computes a ranking of the edges of the given network by using a measure of edge centrality, called κ-path edge centrality³, which exploits a self-avoiding random walker [23] performing walks of bounded length up to a constant, user-defined tunable value κ. Afterwards, CONCLUDE computes

³ The κ-path edge centrality is formalized in section 2.1.

a pairwise measure of distance among all the connected nodes of the network, exploiting the weights assigned to the edges by the random walker. In the final step, CONCLUDE discovers the clusters present in the network by using a network modularity optimization strategy.

2 Methodology

In this section we present the ideas behind our cluster detection method. In detail, we discuss why we consider our approach well-suited, in particular (but not exclusively), to discovering clusters (also commonly called communities) in large social networks.
For example, let us consider an online social network, in which users are connected to each other by means of virtual friendship relations. In this context, a message (for example, a wall post on Facebook or a tweet on Twitter) represents the simplest piece of information that users of the network can exchange. Moreover, users can exchange information only with their neighbors. These assumptions are instrumental in defining the concept of community in a social network. In this context, we informally define a community as a group of users of the network whose interconnections are dense among each other and sparse outside the group.
The aim of our clustering algorithm is to identify a partitioning of the network such that each node is assigned to one and only one cluster (which means that clusters cannot overlap.) Several different measures of the goodness of a network partitioning have been advanced. Our method relies on the well-known concept of network modularity, proposed by Girvan and Newman [13, 26]. According to their definition, we can measure the density of connections among nodes belonging to a given cluster and establish whether this value significantly diverges from what we would expect if edges were distributed randomly among the nodes of the network. This evaluation can be attained by taking as yardstick a null model (for example, an Erdős–Rényi random graph [7].) This intuition is captured by the following definition of network modularity

Q = \sum_{s=1}^{m} \left[ \frac{l_s}{|E|} - \left( \frac{d_s}{2|E|} \right)^2 \right]    (1)

which holds for an unweighted and undirected network G = ⟨V, E⟩ divided into m clusters, where l_s represents the number of edges between nodes belonging to the s-th cluster and d_s the sum of the degrees of the nodes in the s-th cluster. High values of Q imply high values of l_s for each cluster. Thus, clusters identified according to this criterion are dense within their structure and weakly coupled among each other. This perfectly fits the scope we informally defined above.
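To make the definition concrete, Equation 1 can be computed directly from an edge list and a partition. The following sketch (illustrative code, not part of the original paper) evaluates Q for a toy unweighted, undirected graph made of two triangles joined by a single edge:

```python
def modularity(clusters, edges):
    """Network modularity Q (Eq. 1) for an unweighted, undirected graph.
    clusters: list of sets of nodes; edges: list of (u, v) pairs."""
    E = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    Q = 0.0
    for c in clusters:
        l_s = sum(1 for u, v in edges if u in c and v in c)  # intra-cluster edges
        d_s = sum(degree[n] for n in c)                      # total degree of cluster
        Q += l_s / E - (d_s / (2 * E)) ** 2
    return Q

# Two triangles {0,1,2} and {3,4,5} joined by the bridge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
Q = modularity([{0, 1, 2}, {3, 4, 5}], edges)
```

For this toy partition each cluster contributes 3/7 − (7/14)², yielding Q = 5/14 ≈ 0.357, a comparatively high value reflecting the two dense, weakly coupled groups.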

Unfortunately, the problem of optimizing the network modularity has been proven NP-hard [2], but several heuristic methods have been proposed to date. Our clustering strategy also tries to overcome this problem by using one of these methods. To this purpose, the technique exploited in our framework is a variant of the well-known Louvain method [1], a heuristic modularity optimization approach widely adopted in several fields due to its computational efficiency [19, 24, 14]. This technique has been proven to work better when dealing with weighted networks [17, 18], where weights are computed according to a specific criterion that might reflect, for example, the centrality of nodes or the ability of edges to spread information through the network.
In the light of this observation, our strategy will first rank edges on the basis of their ability to diffuse information over the network. We assume that the higher the capacity of a given edge to propagate information, the higher its importance in the network. In addition, it is well-known that the higher the centrality of an edge, the more likely it is to connect different clusters [13, 26]. In principle, any centrality measure able to capture this intuition, for example the betweenness centrality [32] or the current-flow centrality [3], could be adopted to this purpose. On the other hand, we would like to ensure the applicability of our algorithm to a broader spectrum of networks (for example, to very large networks). This implies the choice of a centrality measure for which a computationally efficient (even if approximate) algorithm exists: this is not the case, for example, for the two previously mentioned measures. To ensure broad applicability, we exploit a measure of centrality, called κ-path edge centrality, that we recently defined [6], and for which we provided a heuristic near-linear-cost algorithm (briefly recalled in section 3.1.) The outcome of this centrality computation is adopted to rank edges according to their ability to spread information in the network. Once the edge ranking is established, we can compute a measure of pairwise distance between nodes and, finally, the partitioning of the network.
Two interesting aspects of our technique emerge: (i) by combining the global information provided by a random walker with the local optimization attained by network modularity maximization by means of a variant of the Louvain method, our strategy can be seen as a glocal optimization algorithm; and, (ii) the algorithm is proven to work well in different contexts, for example online social networks and artificially-generated networks with a pre-defined cluster structure (see section 4.1.)

2.1 κ-path edge centrality

In this section we briefly recall the concept of κ-path edge centrality, as recently proposed in [6]. This will be instrumental for further discussion of the CONCLUDE algorithm.

Definition 1. (κ-path edge centrality) For each edge e of a graph G = ⟨V, E⟩, the κ-path edge centrality L^κ(e) of e is defined as the sum, over all possible source nodes s, of the frequency with which a message originated from s traverses e, assuming that the message traversals are only along random simple paths of at most κ edges.
The κ-path edge centrality is formalized, for an arbitrary edge e, as follows

L^κ(e) = \sum_{s \in V} \frac{\sigma_s^κ(e)}{\sigma_s^κ}    (2)

where s ranges over all the possible source nodes, σ_s^κ(e) is the number of κ-paths originating from s and passing through e and, finally, σ_s^κ is the number of κ-paths originating from s.
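For small graphs, Definition 1 can be checked by exhaustive enumeration. The sketch below is illustrative only (its cost is exponential, unlike the approximate algorithm recalled in section 3.1): for every source s it enumerates all simple paths of length at most κ and accumulates, for each edge, the fraction of those paths traversing it:

```python
def kpath_edge_centrality(adj, kappa):
    """Exact kappa-path edge centrality (Eq. 2) by brute-force enumeration.
    adj: undirected adjacency dict, e.g. {'a': ['b', 'c'], ...}."""
    edges = [frozenset((u, v)) for u in adj for v in adj[u] if u < v]
    centrality = {e: 0.0 for e in edges}
    for s in adj:                       # sum over all source nodes s
        per_source = {e: 0 for e in edges}
        total = 0                       # sigma_s^kappa: number of kappa-paths from s
        stack = [(s, [s])]
        while stack:
            node, path = stack.pop()
            if len(path) > 1:           # a simple path with at least one edge
                total += 1
                for a, b in zip(path, path[1:]):
                    per_source[frozenset((a, b))] += 1
            if len(path) <= kappa:      # extending adds one more edge, up to kappa
                for nxt in adj[node]:
                    if nxt not in path:  # keep the path simple
                        stack.append((nxt, path + [nxt]))
        if total:
            for e in edges:
                centrality[e] += per_source[e] / total
    return centrality
```

On a triangle with κ = 2, each source generates four κ-paths and every edge lies on exactly two of them, so every edge receives 0.5 from each of the three sources, for a total centrality of 1.5.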

3 The CONCLUDE Algorithm

In the following, we present CONCLUDE, a method to discover clusters in (possibly large) complex networks. The three steps anticipated above are discussed separately in the following.
3.1 Step 1: κ-path edge centrality evaluation

The first step of CONCLUDE is ranking edges by using the κ-path edge centrality previously formalized. The advantage of using the κ-path edge centrality instead of other centrality measures (e.g., closeness, betweenness, etc. [32]) is that we can provide a heuristic algorithm for its efficient computation in near linear time.
This algorithm, originally presented in [6], is called Edge Random Walk κ-Path Centrality (hereafter, ERW-Kpath.) It consists of two steps: (i) edge weights assignment, and (ii) information diffusion by using self-avoiding bounded-length random walks.
We proved that the ERW-Kpath algorithm returns, as output, an approximate value of the edge centrality index as described in Definition 1, and we provided a quantitative assessment of such an approximation [6]. We briefly discuss the two steps of the ERW-Kpath algorithm in the following.
Edge weights assignment The first stage of ERW-Kpath is the assignment of initial weights to the edges of the graph G = ⟨V, E⟩ representing the given network. These weights represent initial values of edge centrality and are updated during the execution of the algorithm. In detail, for the initial weight assignment we adopt the following function:

Definition 2. (Initial edge weight) Given an undirected graph G = ⟨V, E⟩, the initial edge weight ω_0(e_m) of an edge e_m ∈ E is defined as

\omega_0(e_m) = \frac{1}{|E|}    (3)

The rationale behind Equation 3 is the following. Initially, we manage a budget consisting of |E| points, divided, in a democratic fashion, among all the edges of the network. At the end of the information diffusion process (the next step), the weight of each edge will reflect its ability to propagate information through the network. The final values will be adopted as weights for the edges in the network.

Information diffusion on the network The second stage of ERW-Kpath is to simulate a process of information diffusion over the network by using self-avoiding random walks [23] of bounded length up to κ. To do so, the algorithm iterates the following sub-steps a number of times equal to a user-defined tunable value ρ (we later describe a practical rule to tune both κ and ρ.) At each iteration, the algorithm performs these operations:
1. A node v_n ∈ V from which to start the information spreading is selected according to a uniform probability distribution.
2. A random walker is invoked. It generates a self-avoiding random walk whose length is not greater than κ.
The random walker procedure carries out a loop until both the following conditions hold true:
(a) The length of the current walk is no greater than κ.
(b) Assuming that the walk has reached the node v_n, there must exist at least one edge incident on v_n which has not already been traversed. This ensures that the random walk is self-avoiding.
If the conditions above are satisfied, the random walker selects an edge e_m with a uniform probability computed among those edges not already traversed during the current walk, given by

P(e_m) = \frac{1}{\sum_{e \in I(v_n)} \mathbb{1}_{e \in \tilde{I}(v_n)}}    (4)

where I(v_n) is the set of edges incident on node v_n, Ĩ(v_n) the subset of I(v_n) of edges not already traversed during the current walk, and the characteristic function 1_{e ∈ Ĩ(v_n)} = 1 when the condition is true, 0 otherwise. Let e_m be the edge selected at step l and v_{n+1} the node reached from v_n by means of e_m: the random walker awards a bonus β to e_m, whose weight becomes ω_l(e_m) = ω_{l−1}(e_m) + β if 1 ≤ l ≤ κ. Then, e_m is marked as traversed and the walk length counter l is increased by 1. At this point, after checking that both conditions (a) and (b) hold true, the random walker continues its walk from v_{n+1}.
At the end of the whole process (i.e., after ρ iterations of the random walk simulations), each edge e ∈ E is assigned a centrality index L^κ(e). This is used to set its final weight ω(e).
It is worth underlining that, in principle, the values of κ, ρ and β could be arbitrarily fixed, but here we provide a simple practical rule to tune them. Both theoretical and empirical evaluation [6] suggest that it is convenient to set ρ ≃ |E| and β = 1/|E|. In fact, according to this selection it is simple to show that the edge centrality indexes always range in [1/|E|, 1] (to this purpose, recall the initial weight assignment function ω_0(e_m) = 1/|E|.) Ideally, the centrality index of an edge will be equal to 1 if (and only if) it is always selected during any process of information diffusion.
With regard to the choice of κ, our empirical assessment has shown that optimal values range in the interval [5, 20]. In particular, κ = 20 seems to be a good trade-off between accuracy and computational cost for most large networks [6]. With this configuration, the time complexity of ERW-Kpath is O(κ|E|), that is, near linear in the number of edges of the given network.
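The two stages of ERW-Kpath described above can be sketched as follows. This is an illustrative reimplementation under the tuning rule just discussed (ρ ≃ |E|, β = 1/|E|), not the authors' released code:

```python
import random

def erw_kpath(adj, kappa=20, rho=None, beta=None, seed=None):
    """Sketch of ERW-Kpath: rho self-avoiding random walks of length at
    most kappa; every traversed edge earns a bonus beta on top of the
    uniform initial weight 1/|E|.  adj: undirected adjacency dict."""
    rng = random.Random(seed)
    edges = {frozenset((u, v)) for u in adj for v in adj[u]}
    E = len(edges)
    rho = E if rho is None else rho          # practical rule: rho ~= |E|
    beta = 1.0 / E if beta is None else beta  # practical rule: beta = 1/|E|
    weight = {e: 1.0 / E for e in edges}      # budget shared democratically
    nodes = list(adj)
    for _ in range(rho):
        v = rng.choice(nodes)                 # uniform starting node
        traversed = set()                     # edges used by this walk
        for _ in range(kappa):                # walk length bounded by kappa
            candidates = [u for u in adj[v]
                          if frozenset((v, u)) not in traversed]
            if not candidates:                # condition (b) fails: stop
                break
            u = rng.choice(candidates)        # uniform among untraversed (Eq. 4)
            e = frozenset((v, u))
            weight[e] += beta                 # award the bonus
            traversed.add(e)
            v = u
    return weight
```

Since weights only ever increase from the initial 1/|E|, every returned value is at least 1/|E|, in agreement with the lower end of the [1/|E|, 1] range discussed above.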

3.2 Step 2: Distance computation

The second step of CONCLUDE is the computation of the distance between each pair of connected nodes. This is done by using a variant of the Euclidean norm, defined as

r_{ij} = 1 - \sqrt{ \sum_{k=1}^{n} \frac{\left( L^κ(e_{ik}) - L^κ(e_{kj}) \right)^2}{d(k)} }    (5)

where L^κ(e_{ik}) (resp., L^κ(e_{kj})) is the κ-path edge centrality of the edge e_{ik} (resp., e_{kj}) and d(k) is the degree of the node k (introduced according to [28].)
In theory, this step could be computationally demanding, because it would require O(|V|²) iterations. By adopting some optimizations, its cost becomes near linear; in fact, O(d̄(v)² |V|) operations are sufficient, where d̄(v) is the average degree of the nodes of the network (which is usually small in complex networks.) This algorithmic solution is captured by Algorithm 1 (described adopting a Python-like syntax.)

Algorithm 1 ComputePairwiseDistance()
Input: a graph G
1: for all i in G.nodes() do
2:     for all k in G.neighbors(i) do
3:         degk = len(G.neighbors(k))
4:         lik = G[i][k]['weight']
5:         for all j in G.neighbors(k) do
6:             lkj = G[k][j]['weight']
7:             dij += square(lik - lkj)/degk
8:         rij = 1 - sqrt(dij)
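A runnable counterpart of Algorithm 1 is sketched below. It is illustrative, and it deviates slightly from the pseudocode: each unordered pair (i, j) is accumulated only once, and the degenerate case j = i is skipped:

```python
from math import sqrt

def pairwise_distances(G, weights):
    """Sketch of Algorithm 1: for every path i-k-j, accumulate the squared
    difference of the centralities of its two edges, normalized by the
    degree of the intermediate node k (Eq. 5).
    G: adjacency dict; weights: dict from frozenset({u, v}) to L^kappa."""
    d = {}
    for i in G:
        for k in G[i]:
            deg_k = len(G[k])
            lik = weights[frozenset((i, k))]
            for j in G[k]:
                if j <= i:                # count each unordered pair once
                    continue
                lkj = weights[frozenset((k, j))]
                pair = (i, j)
                d[pair] = d.get(pair, 0.0) + (lik - lkj) ** 2 / deg_k
    return {pair: 1 - sqrt(v) for pair, v in d.items()}
```

The cost matches the analysis above: each node k contributes d(k)² edge pairs, hence O(d̄(v)² |V|) operations overall.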

3.3 Step 3: Clustering

The last step of our method consists of partitioning the network. CONCLUDE adopts the paradigm of network modularity maximization.
In detail, Equation 1 reveals a possible optimization strategy. In order to increase the value of the first term of the equation (called coverage), the greatest possible number of edges should fall within each given cluster, whereas the minimization of the second term is obtained by dividing the network into several clusters with small total degrees. Unfortunately, this task is proven to be NP-hard [2].
To overcome this problem, CONCLUDE exploits an approximate technique inspired by the Louvain method (LM) [1]. LM is a strategy based only on local information, thus it is well-suited for analyzing large networks. It consists of two steps:
(i) First, each node is assigned to the cluster chosen so as to maximize the value

\Delta Q = \left[ \frac{\Sigma_{in} + k_i^C}{2m} - \left( \frac{\Sigma_{tot} + k_i}{2m} \right)^2 \right] - \left[ \frac{\Sigma_{in}}{2m} - \left( \frac{\Sigma_{tot}}{2m} \right)^2 - \left( \frac{k_i}{2m} \right)^2 \right]    (6)

which represents the gain derived from moving a node i into a cluster C, where Σ_in is the sum of the weights of the edges inside C, Σ_tot is the sum of the weights of the edges incident to nodes in C, k_i is the sum of the weights of the edges incident to node i, k_i^C is the sum of the weights of the edges from i to nodes in C, and m is the sum of the weights of all the edges in the network.
(ii) In the second step, a meta-network is built whose nodes are the clusters found in the previous step; all the edges between nodes belonging to a pair of clusters are collapsed into a single edge, whose weight is the sum of the weights of those edges.
This process iterates as long as the improvement ΔQ attained at each iteration exceeds an arbitrarily small threshold. The cost of the whole process is O(t|V|), where t is the number of iterations required by the algorithm to converge (in our experience, t < 5.)
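The gain of Equation 6 is straightforward to compute once the cluster aggregates are known. The following sketch (illustrative, with variable names chosen here rather than taken from [1]) evaluates ΔQ for moving an isolated node i into a cluster C:

```python
def modularity_gain(sigma_in, sigma_tot, k_i, k_i_C, m):
    """Delta-Q of Eq. 6 for moving isolated node i into cluster C.
    sigma_in:  sum of weights of edges inside C
    sigma_tot: sum of weights of edges incident to nodes in C
    k_i:       weighted degree of node i
    k_i_C:     sum of weights of edges from i to nodes in C
    m:         sum of weights of all edges in the network"""
    after = (sigma_in + k_i_C) / (2 * m) - ((sigma_tot + k_i) / (2 * m)) ** 2
    before = (sigma_in / (2 * m)
              - (sigma_tot / (2 * m)) ** 2
              - (k_i / (2 * m)) ** 2)
    return after - before
```

For example, with Σ_in = 2, Σ_tot = 3, k_i = 2, k_i^C = 2 and m = 5, the modularity after the move is 0.4 − 0.25 = 0.15, the modularity before is 0.2 − 0.09 − 0.04 = 0.07, so the gain is 0.08 and the move is accepted.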

The advantage of our approach with respect to the original LM is twofold: first, we obtain the splitting of clusters connected by edges with low distance, which is a global feature, while maximizing the network modularity, whereas LM only relies on local information (i.e., the node neighborhood); second, our strategy is able to produce an edge weighting, while the original LM and most current clustering algorithms cannot infer edge weights in the case of unweighted networks; this aspect ensures better performance of our strategy in most cases (as discussed in the remainder of this paper).
Summarizing, by adopting efficient graph memorization techniques, the computational cost of CONCLUDE is near linear. In fact, it results from the three previously described steps, i.e., O(κ|E| + d̄(v)² |V| + t|V|) = O(κ|E|).

4 Results

CONCLUDE has been tested in different fields of application, for example the analysis of online social networks and biological networks. In this paper we discuss its application to the following two cases (see section 4.1): (i) artificially-generated (henceforth, synthetic) networks with a pre-defined clustering structure; (ii) online social network datasets from real-world applications.
In addition, we discuss the features of the results provided by the ERW-Kpath centrality algorithm adopted by CONCLUDE to rank edges according to their ability to spread information through the network (see section 4.2.)

4.1 Cluster Detection

In order to evaluate the performance of our clustering method, we carried out two different types of experiments. The first uses synthetic networks for which a pre-built cluster structure is well defined. The second considers different graphs describing the structure of some real-world social networks.

Synthetic networks The first evaluation has been carried out by using the LFR benchmark presented by Lancichinetti et al. [20]. A set of synthetic networks has been generated by using the same configuration reported in [20], i.e.: (i) N = 1000 nodes; (ii) the four pairs of networks identified by (γ, β) = (2,1), (2,2), (3,1), (3,2), where γ represents the exponent of the power law distribution of node degrees and β the exponent of the power law distribution of the cluster sizes; (iii) for each pair of exponents, three values of average degree ⟨k⟩ = 15, 20, 25; (iv) for each of the previous combinations, six networks generated by varying the mixing parameter⁴ μ = 0.1, ..., 0.6.
To compute the quality of the results, we adopted the measure called normalized mutual information (NMI) [4]. This measure assumes that, given a graph G, a ground truth is available to verify what the clusters in G are (called real clusters) and what their features are. Let us denote as A the true cluster structure of G and suppose that G consists of c_A clusters. Let us consider a clustering algorithm applied on G and assume that it identifies a clustering structure B consisting of c_B clusters. We define a c_A × c_B matrix, called confusion matrix CM, such that each row of CM corresponds to a cluster in A whereas each column of CM is associated with a cluster in B. The generic element CM_ij is equal to the number of elements of the real i-th cluster which are also present in the j-th cluster found by the algorithm. Starting from this definition, the normalized mutual information is defined as


NMI(A, B) = \frac{ -2 \sum_{i=1}^{c_A} \sum_{j=1}^{c_B} N_{ij} \log \left( \frac{N_{ij} N}{N_i N_j} \right) }{ \sum_{i=1}^{c_A} N_i \log \left( \frac{N_i}{N} \right) + \sum_{j=1}^{c_B} N_j \log \left( \frac{N_j}{N} \right) }    (7)

where N_i (resp., N_j) is the sum of the elements in the i-th row (resp., j-th column) of the confusion matrix, and N is the total number of nodes. If the considered clustering algorithm worked perfectly, then for each discovered cluster j there would exist a real cluster i exactly coinciding with j. In such a case, it is possible to show that NMI(A, B) is exactly equal to 1 [4]. By contrast, if the clusters detected by the algorithm are totally independent of the real communities, then it is possible to show that the NMI is equal to 0. The NMI, therefore, ranges from 0 to 1, and the higher the value, the better the clustering algorithm works.
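Equation 7 can be evaluated directly from the confusion matrix. The sketch below (illustrative code; zero entries are skipped because their summand vanishes) returns 1 for a perfect match and 0 for independent partitions:

```python
from math import log

def nmi(confusion, N):
    """Normalized mutual information (Eq. 7).
    confusion: matrix whose rows are real clusters and whose columns are
    found clusters; N: total number of nodes."""
    Ni = [sum(row) for row in confusion]        # row sums N_i
    Nj = [sum(col) for col in zip(*confusion)]  # column sums N_j
    num = -2.0 * sum(
        confusion[i][j] * log(confusion[i][j] * N / (Ni[i] * Nj[j]))
        for i in range(len(Ni)) for j in range(len(Nj))
        if confusion[i][j] > 0)
    den = (sum(n * log(n / N) for n in Ni if n > 0)
           + sum(n * log(n / N) for n in Nj if n > 0))
    return num / den
```

For instance, the diagonal matrix [[3, 0], [0, 2]] with N = 5 (a perfect recovery of two clusters of sizes 3 and 2) yields NMI = 1, while the uniform matrix [[1, 1], [1, 1]] with N = 4 yields NMI = 0.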
The performance of CONCLUDE, reported in Table 1, shows excellent values of NMI on the task of solving the LFR benchmark. For each configuration, the partitioning provided by our algorithm is compared against the ground truth built by the LFR benchmark, measuring the goodness of the partitioning according to the previously defined normalized mutual information measure.
CONCLUDE provides in general high values of NMI for the settings μ = 0.1, ..., 0.3, which lead to the presence of strongly defined clusters in the synthetic networks. Moreover, it is worth noting that the values of NMI are stable across different configurations: once the mixing parameter μ is fixed, we let the parameters ⟨k⟩, γ and β vary, which reflects on the features of the networks generated by the benchmark. This means that our strategy works well and consistently on different network structures, and its performance is independent of particular network features (such as the degree distribution or the size of the clusters present in the network.)

⁴ The value μ = 0.5 is the threshold beyond which clusters are no longer defined in the strong sense (that is, each node has more neighbors in its own cluster than in the others.)

⟨k⟩   μ=0.1  μ=0.2  μ=0.3  μ=0.4  μ=0.5  μ=0.6

             γ = 2, β = 1
15    0.861  0.761  0.664  0.559  0.409  0.281
20    0.859  0.759  0.663  0.566  0.434  0.298
25    0.856  0.760  0.663  0.560  0.433  0.307

             γ = 2, β = 2
15    0.856  0.757  0.664  0.550  0.463  0.317
20    0.859  0.763  0.665  0.555  0.467  0.316
25    0.861  0.761  0.664  0.569  0.469  0.340

             γ = 3, β = 1
15    0.851  0.761  0.636  0.522  0.348  0.238
20    0.863  0.768  0.662  0.517  0.409  0.248
25    0.860  0.762  0.665  0.562  0.422  0.283

             γ = 3, β = 2
15    0.865  0.767  0.657  0.569  0.407  0.287
20    0.863  0.768  0.670  0.570  0.435  0.297
25    0.863  0.765  0.666  0.568  0.448  0.281

Table 1. Values of normalized mutual information provided by CONCLUDE resolving the clusters in the networks with community structure artificially generated according to the LFR benchmark [20].

Real-world networks In this section we describe the results obtained by analyzing different graphs obtained from real-world social network datasets [22, 31]. The details of these networks are summarized in Table 2. This experiment has been designed to quantitatively evaluate the performance of our strategy in real-world applications. To configure ERW-Kpath, the values of ρ and β have been tuned as previously suggested. In addition, the maximum length of the self-avoiding random walks has been set to κ = 20.
The results are measured by means of the value of network modularity (formally defined by Equation 1) obtained by CONCLUDE, compared against those attained by two different techniques: (i) the already presented Louvain method (LM) and (ii) COPRA [16], a fast cluster detection algorithm based on the principle of label propagation [30].
From the analysis of the results reported in Table 2, we can draw some considerations about the performance of the proposed clustering method.
Considering these real-world scenarios, CONCLUDE outperforms both LM and COPRA in terms of attained values of network modularity. In general, the results provided by CONCLUDE are better than those provided by the Louvain method by 5%–15% on average, with the improvement reaching up to 25% in the case of COPRA. This advantage can be explained by two different considerations: (i) our strategy aims at maximizing the modularity of a weighted network, producing weights according to an intrinsic rationale driven by the ERW-Kpath algorithm, while neither the Louvain method nor COPRA (and, in general, most state-of-the-art clustering algorithms [12]) are able to produce a weighting for an unweighted network.
In addition, and equally important, our strategy relies on both local and global information, an aspect which makes CONCLUDE what recent literature calls a glocal optimization algorithm. In fact, the first step of our strategy exploits long-range information, employing a random walker that, starting multiple times from each node, visits not only the neighborhood but also those regions of the graph far from the origin of the walk. This global information is exploited in the second step, the computation of the distance among pairs of connected nodes. Finally, local information is exploited by the modularity optimization strategy inspired by the Louvain method itself. The final result is a general improvement of the performance of the clustering procedure by a non-negligible factor, which comes at almost no cost (in fact, the quality of the partitioning is proven very good by the values of NMI provided in the previous experiment.)

N. Network          No. nodes  No. edges  CONCLUDE  LM     COPRA

1  CA-GrQc             5,242      28,980    0.883   0.860  0.761
2  CA-HepTh            9,877      51,971    0.806   0.772  0.768
3  CA-HepPh           12,008     237,010    0.760   0.656  0.754
4  CA-AstroPh         18,772     396,160    0.663   0.627  0.577
5  CA-CondMat         23,133     186,932    0.768   0.731  0.616
6  Facebook-links     63,731   1,545,684    0.664   0.626  0.726

Table 2. Values of network modularity provided by CONCLUDE in the context of cluster detection from different real-world social network datasets.

4.2 κ-path edge centrality

In addition to discussing the clustering performance, we here describe some empirical evidence about the κ-path edge centrality measure as approximated by means of the ERW-Kpath algorithm, which is adopted by CONCLUDE to rank edges. This is instrumental to understanding the functioning of the glocal optimization.
In particular, in the following we report an experiment aimed at discovering how different values of κ impact the final edge centrality. To this purpose, we produce the probability distribution of κ-path values obtained by varying the setting of κ, so as to understand its general behavior.
In detail, we consider the datasets presented in Table 2 separately and apply the ERW-Kpath algorithm with κ = 5, 10, 20. After that, for a fixed value of κ-path edge centrality L, we compute the probability P(L) of finding an edge with such a centrality value. The corresponding results are plotted in Figure 1 for the four largest datasets (namely, CA-HepPh, CA-AstroPh, CA-CondMat and Facebook-links.) To show the scaling behavior of the distributions, each plot adopts a logarithmic scale⁵.
The analysis of these plots highlights some relevant facts. First of all, a power law distribution of the edge centrality values emerges⁶ for all the different values of κ. In other words, if we use different values of κ, the centrality indexes may change; however, as emerges from the plots, the curves representing the κ-path centrality values resemble parallel lines. This implies that, for a fixed value of κ, say κ = 5, an edge e will have a particular centrality score; if κ is increased from 5 to 10 and, then, from 10 to 20, the centrality of e will always increase by a constant factor.
This aspect reflects the ability of the ERW-Kpath algorithm to identify those edges which are in fact central in the structure of the network, rewarding them with high weights. Intuitively, edges which are less relevant will still be awarded, but a smaller number of times, which leads to lower values of centrality (and, in the end, to the power law distribution). These weights, which are computed according to a global rule (that is, discovering by means of a random walker those edges which are more likely to be traversed during a process of information spreading on the given network), are subsequently exploited to compute the distances among pairs of nodes, which in the end lead to the identification of clusters according to local optima discovered by the modularity optimization strategy.
This summarizes the glocal optimization nature of CONCLUDE.

5 Conclusion and Future Work

In this paper we presented CONCLUDE, an efficient method for detecting clusters in complex networks which is proven to work well in different domains. An early implementation of this algorithm has already been released⁷, and its strengths have already been assessed [8] independently from the authors of this paper.
⁵ To this purpose, to exploit a logarithmic scale, note that, on the x-axis, values of κ-path edge centrality have been re-normalized in the interval [1, |E|] instead of [0, 1].
⁶ In a log-log scale, a distribution that resembles a straight line depicts a scale-free behavior.
⁷ http://www.emilio.ferrara.name/conclude

Fig. 1. The probability distribution of κ-path edge centrality, computed according to different configurations of κ = 5, 10, 20 for the four largest networks considered in this paper: CA-HepPh, CA-AstroPh, CA-CondMat and Facebook-links.

Our ongoing efforts involving the adoption of CONCLUDE span different directions, such as investigating: (i) the emergence of a community structure in large online social networks such as Facebook [10, 9], and assessing different sociological conjectures which involve finding clusters according to the importance of edges, for example the strength of weak ties theory [15, 11]; and (ii) the possibility of enhancing the performance of different state-of-the-art clustering algorithms (such as COPRA [16] or OSLOM [21]) by pre-processing networks by means of a random-walk-based measure of centrality like the κ-path edge centrality [5].
As for future work, we planned a long-term research evaluation of our method, in order to cover different domains of application: for example, the application of CONCLUDE could be promising in the context of neuroinformatics, applied to the connectome (i.e., the human brain functional network) [24], or bioinformatics, to detect protein complexes in protein-interaction networks [25]. In conclusion, further extensions of CONCLUDE could be advanced to face additional scientific challenges, such as the possibility of discovering overlapping clusters [33].

References
1. Blondel, V., Guillaume, J., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. Journal of Statistical Mechanics 2008, P10008 (2008)
2. Brandes, U., Delling, D., Gaertler, M., Görke, R., Hoefer, M., Nikoloski, Z., Wagner, D.: On finding graph clusterings with maximum modularity. In: Graph-Theoretic Concepts in Computer Science. pp. 121–132 (2007)
3. Brandes, U., Fleischer, D.: Centrality measures based on current flow. STACS 2005, pp. 533–544 (2005)
4. Danon, L., Díaz-Guilera, A., Duch, J., Arenas, A.: Comparing community structure identification. Journal of Statistical Mechanics 2005, P09008 (2005)
5. De Meo, P., Ferrara, E., Fiumara, G., Provetti, A.: Enhancing community detection using a network weighting strategy. Information Sciences (to appear)
6. De Meo, P., Ferrara, E., Fiumara, G., Ricciardello, A.: A novel measure of edge centrality in social networks. Knowledge-Based Systems 30, 136–150 (2012)
7. Erdős, P., Rényi, A.: On random graphs. Publicationes Mathematicae 6(26), 290–297 (1959)
8. Fatemi, M., Tokarchuk, L.: An empirical study on IMDb and its communities based on the network of co-reviewers. In: Proceedings of the First Workshop on Measurement, Privacy, and Mobility. p. 7. ACM (2012)
9. Ferrara, E.: A large-scale community structure analysis in Facebook. ArXiv preprint arXiv:1106.2503 (2011)
10. Ferrara, E.: Community structure discovery in Facebook. International Journal of Social Network Mining 1(1), 67–90 (2012)
11. Ferrara, E., De Meo, P., Fiumara, G., Provetti, A.: The role of strong and weak ties in Facebook: a community structure perspective. Proceedings of Computational Approaches to Social Modeling (ChASM) (2012)
12. Fortunato, S.: Community detection in graphs. Physics Reports 486, 75–174 (2010)
13. Girvan, M., Newman, M.: Community structure in social and biological networks. Proceedings of the National Academy of Sciences 99(12), 7821 (2002)
14. Good, B., de Montjoye, Y., Clauset, A.: Performance of modularity maximization in practical contexts. Physical Review E 81(4), 046106 (2010)
15. Granovetter, M.: The strength of weak ties. American Journal of Sociology, pp. 1360–1380 (1973)
16. Gregory, S.: An algorithm to find overlapping community structure in networks. Knowledge Discovery in Databases, pp. 91–102 (2007)
17. Khadivi, A., Rad, A., Hasler, M.: Network community-detection enhancement by proper weighting. Physical Review E 83(4), 046104 (2011)
18. Lambiotte, R.: Multi-scale modularity in complex networks. In: Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks (WiOpt), Proceedings of the 8th International Symposium on. pp. 546–553. IEEE (2010)
19. Lambiotte, R., Panzarasa, P.: Communities, knowledge creation, and information diffusion. Journal of Informetrics 3(3), 180–190 (2009)
20. Lancichinetti, A., Radicchi, F.: Benchmark graphs for testing community detection algorithms. Physical Review E 78(4), 046110 (2008)
21. Lancichinetti, A., Radicchi, F., Ramasco, J.: Finding statistically significant communities in networks. PLoS ONE 6(4), e18961 (2011)
22. Leskovec, J., Faloutsos, C.: Sampling from large graphs. In: Proc. of the 12th International Conference on Knowledge Discovery and Data Mining. pp. 631–636. ACM (2006)
23. Madras, N., Slade, G.: The self-avoiding walk. Birkhäuser (1996)
24. Meunier, D., Lambiotte, R., Fornito, A., Ersche, K., Bullmore, E.: Hierarchical modularity in human brain functional networks. Frontiers in Neuroinformatics 3 (2009)
25. Nepusz, T., Yu, H., Paccanaro, A.: Detecting overlapping protein complexes in protein-protein interaction networks. Nature Methods (2012)
26. Newman, M., Girvan, M.: Finding and evaluating community structure in networks. Physical Review E 69(2), 026113 (2004)
27. Opsahl, T., Panzarasa, P.: Clustering in weighted networks. Social Networks 31(2), 155–163 (2009)
28. Pons, P., Latapy, M.: Computing communities in large networks using random walks. Computer and Information Sciences, pp. 284–293 (2005)
29. Porter, M., Onnela, J., Mucha, P.: Communities in networks. Notices of the American Mathematical Society 56(9), 1082–1097 (2009)
30. Raghavan, U., Albert, R., Kumara, S.: Near linear time algorithm to detect community structures in large-scale networks. Physical Review E 76(3), 036106 (2007)
31. Viswanath, B., Mislove, A., Cha, M., Gummadi, K.P.: On the evolution of user interaction in Facebook. In: Proc. of the 2nd Workshop on Online Social Networks. ACM (2009)
32. Wasserman, S., Faust, K.: Social network analysis: Methods and applications. Cambridge Univ. Press (1994)
33. Xie, J., Kelley, S., Szymanski, B.: Overlapping community detection in networks: the state of the art and comparative study. ACM Computing Surveys (2012)
