Anuraj Mohan
13MZ01, CSED
Diversity of graphs
Directed vs. undirected, labeled vs. unlabeled (edges & vertices),
weighted, with angles & geometry (topological vs. 2-D/3-D)
complete or NP hard)
(Figure: example graphs: Aspirin, the Internet, a co-author network)
Concept | Graph Instance
Element | Vertex
Element's attributes | Vertex label
Relation between two elements | Edge
Type of relation | Edge label
Relation between a set of elements | Hyperedge
Terminology-I
A graph is said to be connected if there is a path between every pair of vertices.
A graph Gs(Vs, Es) is a subgraph of another graph G(V, E) iff Vs ⊆ V and Es ⊆ E.
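Both definitions can be checked directly. The sketch below assumes a graph is given as a vertex set plus a list of undirected (u, v) edge pairs, and that every endpoint appears in the vertex set. The function names are hypothetical. Connectivity is tested with a breadth-first search from an arbitrary vertex. The subgraph test compares edges as unordered pairs.

```python
from collections import deque


def _as_undirected(edges):
    """Normalize (u, v) pairs to unordered frozensets so (1, 2) == (2, 1)."""
    return {frozenset(e) for e in edges}


def is_connected(vertices, edges):
    """BFS check: is there a path between every pair of vertices?"""
    vertices = set(vertices)
    if not vertices:
        return True  # vacuously connected (a convention assumed here)
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(vertices))
    seen = {start}
    queue = deque([start])
    while queue:
        for w in adj[queue.popleft()]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen == vertices


def is_subgraph(vs, es, v, e):
    """Gs(Vs, Es) is a subgraph of G(V, E) iff Vs is a subset of V and Es of E."""
    return set(vs) <= set(v) and _as_undirected(es) <= _as_undirected(e)
```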
(Figure: Example of Graph Isomorphism)
Terminology-II
Subgraph isomorphism problem
Given two graphs G1(V1, E1) and G2(V2, E2), find an isomorphism between G2 and a subgraph of G1
That is, an injective mapping from V2 to V1 such that each edge in E2 is mapped to an edge in E1 (for a full graph isomorphism, the mapping is a bijection from V1 to V2 such that each edge in E1 is mapped to a single edge in E2 and vice versa)
NP-complete problem
Reduction from the clique or Hamiltonian cycle problem
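Below is a minimal backtracking sketch of the subgraph isomorphism search. It assumes unlabeled undirected graphs stored as adjacency dicts (vertex to set of neighbours), and `find_subgraph_isomorphism` is a hypothetical name. It looks for a non-induced embedding, which matches the Terminology-I subgraph definition. Its worst-case running time is exponential, as expected for an NP-complete problem.

```python
def find_subgraph_isomorphism(g1, g2):
    """Return an injective mapping V2 -> V1 embedding g2 in g1, or None.

    g1, g2: dict vertex -> set of neighbours (undirected, unlabeled).
    Every edge of g2 must map to an edge of g1 (non-induced matching).
    """
    # Heuristic: place high-degree pattern vertices first so dead ends appear early
    order = sorted(g2, key=lambda v: -len(g2[v]))
    mapping, used = {}, set()

    def extend(i):
        if i == len(order):
            return True  # every pattern vertex has been placed
        u = order[i]
        for cand in g1:
            # Injectivity, plus a necessary degree condition
            if cand in used or len(g1[cand]) < len(g2[u]):
                continue
            # Each already-mapped neighbour of u must map to a neighbour of cand
            if all(mapping[w] in g1[cand] for w in g2[u] if w in mapping):
                mapping[u] = cand
                used.add(cand)
                if extend(i + 1):
                    return True
                del mapping[u]  # backtrack
                used.discard(cand)
        return False

    return dict(mapping) if extend(0) else None
```

The degree-ordering heuristic only changes how quickly a failing branch is abandoned. It does not change which embeddings exist.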
Frequent subgraphs
A (sub)graph is frequent if its support
(occurrence frequency) in a given dataset is no
less than a minimum support threshold
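As a formula, one common convention (assumed here) measures support as the fraction of graph transactions in the dataset D that contain g. Here "contain" means g is isomorphic to a subgraph of the transaction, and min_sup is an illustrative name for the threshold. Absolute counts are also used.

```latex
\operatorname{support}(g) = \frac{\lvert \{ G_i \in D : g \subseteq G_i \} \rvert}{\lvert D \rvert},
\qquad g \text{ is frequent} \iff \operatorname{support}(g) \ge \mathit{min\_sup}

% Worked example: |D| = 3 and g is contained in 2 of the 3 graphs
\operatorname{support}(g) = \tfrac{2}{3} \approx 66.7\%
```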
Applications of graph pattern mining:
Mining biochemical structures
Program control flow analysis
Mining XML structures or Web communities
Building blocks for graph classification,
clustering, compression, comparison, and
correlation analysis
Input
Database of graph transactions.
Undirected simple graphs (no loops, no multiple edges).
Each graph transaction has labels associated with its vertices and edges.
Transactions may not be connected.
Minimum support threshold.
Output
Frequent subgraphs that satisfy
the minimum support constraint.
Each frequent subgraph is
connected.
(Figure: example subgraphs with support = 100%, 66%, and 66%)
DFS approach: gSpan
Greedy approach: Subdue
FSG Algorithm
[M. Kuramochi and G. Karypis. Frequent subgraph discovery. ICDM 2001]
Join pairs of frequent size-k subgraphs to get size-(k+1) candidates
Problem
A given size-k graph has up to k different size-(k-1) subgraphs
If we consider all possible subgraphs, we will end up:
Generating the same candidates multiple times
Generating candidates that are not downward closed
Significant slowdown
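The level-wise loop can be sketched in Python under one strong simplifying assumption: every vertex has a unique, globally consistent identity. Under that assumption, containment reduces to edge-set inclusion and no isomorphism test is needed. Real FSG works on labeled graphs and must canonically label candidates and run subgraph isomorphism tests. `mine_frequent_edge_sets` and its helpers are hypothetical names.

The sketch keeps the two ideas from the slide:
- it joins size-k patterns that share a size-(k-1) core;
- it prunes candidates whose connected size-k sub-patterns are not all frequent.

```python
from itertools import combinations


def edge(u, v):
    """An undirected edge as a frozenset of its two endpoints."""
    return frozenset((u, v))


def is_connected(edges):
    """True if the edges form one connected component over their endpoints."""
    adj = {}
    for e in edges:
        u, v = tuple(e)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    if not adj:
        return False
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(adj)


def mine_frequent_edge_sets(transactions, min_sup):
    """Level-wise (Apriori-style) search; containment is edge-set inclusion.

    transactions: list of frozensets of edges; min_sup: absolute count.
    Returns every connected frequent pattern as a frozenset of edges.
    """
    def support(pattern):
        return sum(1 for t in transactions if pattern <= t)

    all_edges = set().union(*transactions)
    level = {frozenset([e]) for e in all_edges if support(frozenset([e])) >= min_sup}
    frequent = set(level)
    k = 1
    while level:
        candidates = set()  # a set: the same candidate arises from several joins
        for a, b in combinations(level, 2):
            cand = a | b  # a and b share k-1 edges iff the union has k+1 edges
            if len(cand) != k + 1 or not is_connected(cand):
                continue
            # Prune by downward closure: connected size-k sub-patterns must be frequent
            if all(cand - {e} in level for e in cand if is_connected(cand - {e})):
                candidates.add(cand)
        level = {c for c in candidates if support(c) >= min_sup}
        frequent |= level
        k += 1
    return frequent
```

Collecting candidates in a set is what absorbs duplicates. A size-(k+1) candidate can be produced by several different pairs of its size-k sub-patterns, which is the first problem listed for FSG.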
(Figure: level-wise search: frequent 1-subgraphs → frequent 2-subgraphs → 3-candidates → frequent 3-subgraphs → 4-candidates → frequent 4-subgraphs → ...)
Candidate pruning
To check the downward closure property, we need graph isomorphism tests.
Frequency counting
Subgraph isomorphism tests check whether a transaction contains a frequent subgraph.
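One straightforward way to make isomorphic candidates compare equal is a canonical code. The sketch below is a brute-force illustration, not FSG's own canonical labeling scheme. It tries every vertex permutation of a small vertex- and edge-labeled graph and keeps the lexicographically smallest (labels, edges) tuple. Two graphs therefore receive the same code exactly when they are isomorphic with labels respected. `canonical_code` is a hypothetical name, and the O(n!) cost limits it to tiny patterns.

```python
from itertools import permutations


def canonical_code(labels, edges):
    """Brute-force canonical code of a small labeled undirected graph.

    labels: list of vertex labels, indexed by vertex id 0..n-1
    edges:  iterable of (u, v, edge_label) tuples
    Isomorphic graphs (respecting labels) get identical codes.
    """
    n = len(labels)
    edges = list(edges)  # reused for every permutation
    best = None
    for perm in permutations(range(n)):  # perm[old_id] = new_id
        new_labels = [None] * n
        for old, new in enumerate(perm):
            new_labels[new] = labels[old]
        # Relabel edges, store each as (smaller id, larger id, label), then sort
        new_edges = sorted(
            (min(perm[u], perm[v]), max(perm[u], perm[v]), lab) for u, v, lab in edges
        )
        code = (tuple(new_labels), tuple(new_edges))
        if best is None or code < best:
            best = code
    return best
```

Storing these codes in a Python set deduplicates candidates up to isomorphism.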
Graph Search
Querying graph databases:
Given a graph database and a query graph, find
all the graphs containing this query graph
(Figure: a query graph and a graph database)
Graph Classification
Substructure-based: basic idea
Extract graph substructures F = {g1, ..., gn}
Represent a graph with a feature vector x = (x1, ..., xn), where xi is the frequency of gi in that graph
Build a classification model
Kernel-based: basic idea
Map each graph to some significant set of patterns
Define a kernel on the corresponding sets of patterns
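Below is a minimal sketch of both ideas. It assumes the substructures are single labeled edges, keyed by the sorted pair of endpoint labels, rather than mined frequent subgraphs. It also assumes a graph is a (labels, edges) pair.

- `feature_vector` builds x, with xi = frequency of gi.
- `linear_kernel` is the dot product of two such vectors.
- `set_kernel` counts the patterns two graphs share, a simple kernel on pattern sets.

All names are hypothetical. A standard classifier or kernel method could then be trained on these.

```python
from collections import Counter


def edge_patterns(graph):
    """Count single-edge substructures, keyed by the sorted endpoint labels.

    graph: (labels, edges), labels a dict vertex -> label, edges (u, v) pairs.
    """
    labels, edges = graph
    return Counter(tuple(sorted((labels[u], labels[v]))) for u, v in edges)


def feature_vector(graph, features):
    """x_i = frequency of substructure g_i (here, a labeled edge) in the graph."""
    counts = edge_patterns(graph)
    return [counts[g] for g in features]  # Counter returns 0 for absent keys


def linear_kernel(x, y):
    """Dot product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def set_kernel(g1, g2):
    """Number of distinct patterns the two graphs share (a kernel on pattern sets)."""
    return len(set(edge_patterns(g1)) & set(edge_patterns(g2)))
```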
Graph clustering
Decompose a network into
subnetworks based on some
topological properties
Usually we look for dense
subnetworks
Graph clustering
Why? For example, to identify protein complexes in a protein-protein interaction (PPI) network
k-Spanning Tree
(Figure: input graph G → k groups of non-overlapping vertices)
STEPS:
Obtains the Minimum Spanning Tree (MST) of input graph G
Removes k-1 edges from the MST
Results in k clusters
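These steps can be sketched with Kruskal's algorithm and a disjoint-set structure. The sketch assumes:
- vertices are 0..n-1;
- edges are (weight, u, v) tuples, with weights as distances;
- the k-1 edges removed are the heaviest MST edges, the usual choice, since the slide does not say which ones.

For similarity weights, negating them turns the MST into a maximum spanning tree. Function and class names are hypothetical.

```python
class DisjointSet:
    """Union-find over 0..n-1 with path halving."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False  # already in the same tree: edge would form a cycle
        self.parent[ra] = rb
        return True


def minimum_spanning_tree(n, edges):
    """Kruskal: scan edges by increasing weight, keep those joining two trees."""
    ds = DisjointSet(n)
    mst = []
    for w, u, v in sorted(edges):
        if ds.union(u, v):
            mst.append((w, u, v))
    return mst


def k_spanning_tree_clusters(n, edges, k):
    """Build the MST, drop its k-1 heaviest edges, return the resulting clusters."""
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    mst = sorted(minimum_spanning_tree(n, edges))
    kept = mst[: max(0, len(mst) - (k - 1))]  # keep the lightest edges
    ds = DisjointSet(n)
    for _, u, v in kept:
        ds.union(u, v)
    groups = {}
    for v in range(n):
        groups.setdefault(ds.find(v), []).append(v)
    return sorted(groups.values())
```

Removing the k-1 heaviest MST edges gives the same partition as stopping Kruskal's algorithm once k components remain, that is, single-linkage clustering.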
(Figure: two spanning trees of an example weighted graph, with total weights 17 and 13)
Note: if the edge weights represent similarity, use the spanning tree with the maximum possible sum of edge weights
k-Spanning Tree
(Figure: e.g., k = 3: removing k-1 = 2 edges from the MST yields 3 clusters)
Note: k is the number of clusters
Shared Nearest Neighbor Clustering
(Figure: input graph G → groups of non-overlapping vertices)
STEPS:
Obtains the Shared Nearest Neighbor (SNN) graph of input graph G
Removes edges from the SNN with weight less than a threshold
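Below is a minimal sketch of the two steps. It assumes:
- the graph is undirected, stored as an adjacency dict with comparable vertex ids;
- the SNN weight of each edge (u, v) of G is the number of neighbours u and v share;
- clusters are the connected components left after thresholding.

Other SNN variants build the graph from k-nearest-neighbour lists instead. `snn_clusters` and `threshold` are hypothetical names.

```python
from collections import deque


def snn_clusters(adj, threshold):
    """adj: dict vertex -> set of neighbours. Returns (clusters, snn_weights)."""
    # Step 1: SNN graph, weighting each edge (u, v) of G by |N(u) & N(v)|
    snn = {}
    for u in adj:
        for v in adj[u]:
            if u < v:  # visit each undirected edge once
                snn[(u, v)] = len(adj[u] & adj[v])
    # Step 2: drop SNN edges whose weight is below the threshold
    kept = {v: set() for v in adj}
    for (u, v), w in snn.items():
        if w >= threshold:
            kept[u].add(v)
            kept[v].add(u)
    # Clusters: connected components of the thresholded SNN graph
    clusters, seen = [], set()
    for s in sorted(adj):
        if s in seen:
            continue
        seen.add(s)
        comp, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            comp.append(v)
            for w in kept[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        clusters.append(sorted(comp))
    return clusters, snn
```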
(Figure: input graph G and its SNN graph with shared-neighbor edge weights; e.g., threshold = 3)
Vertex Betweenness
The number of shortest paths in the graph G that pass through a given vertex (when a pair of vertices has several shortest paths, each counts fractionally)
Edge Betweenness
The number of shortest paths in the graph G that pass through a given edge
(Figure: example graph G with edge (S, B); source: NCSU)
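In formula form, the standard definition weights each pair s, t by the fraction of its shortest paths that use v. Here σ_st is the number of shortest s-t paths and σ_st(v) the number of those passing through v. Edge betweenness replaces v with an edge e. The worked values below assume each unordered pair is counted once.

```latex
C_B(v) = \sum_{s \ne v \ne t} \frac{\sigma_{st}(v)}{\sigma_{st}}, \qquad
C_B(e) = \sum_{s \ne t} \frac{\sigma_{st}(e)}{\sigma_{st}}

% Worked example: path a - b - c
C_B(b) = \frac{\sigma_{ac}(b)}{\sigma_{ac}} = \frac{1}{1} = 1, \qquad
C_B\bigl((a,b)\bigr) = \underbrace{1}_{\{a,b\}} + \underbrace{1}_{\{a,c\}} = 2

% A common normalization for vertices divides by the number of pairs (n-1)(n-2)/2
\tilde{C}_B(b) = \frac{1}{(3-1)(3-2)/2} = 1
```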
Vertex Betweenness Clustering
Repeat until the highest vertex betweenness is at most a chosen threshold:
Select the vertex v with the highest betweenness (e.g., vertex 3 with value 0.67)
1. Disconnect the graph at the selected vertex (e.g., vertex 3)
2. Copy the vertex to both components
Edge-Betweenness Clustering
Girvan and Newman Algorithm
Given input graph G
Repeat until the highest edge betweenness is at most a chosen threshold:
Select the edge with the highest betweenness
Disconnect the graph at the selected edge (e.g., (3, 4))
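The loop can be sketched in Python for an unweighted undirected graph stored as an adjacency dict. Edge betweenness is computed with Brandes' accumulation. As in Girvan and Newman's algorithm, it is recomputed after every removal, because removing one edge changes the shortest paths. The stopping rule is the threshold test from the slide. `threshold` and the function names are hypothetical, and ties for the highest score are broken arbitrarily.

```python
from collections import deque


def edge_betweenness(adj):
    """Brandes-style edge betweenness for an unweighted, undirected graph."""
    eb = {frozenset((u, w)): 0.0 for u in adj for w in adj[u]}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest vertices back toward s
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1 + delta[w])
                eb[frozenset((v, w))] += c
                delta[v] += c
    return {e: b / 2 for e, b in eb.items()}  # each pair was counted from both ends


def components(adj):
    """Connected components as sorted vertex lists."""
    seen, comps = set(), []
    for s in sorted(adj):
        if s in seen:
            continue
        seen.add(s)
        comp, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            comp.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        comps.append(sorted(comp))
    return comps


def girvan_newman(adj, threshold):
    """Remove the highest-betweenness edge until the maximum is <= threshold."""
    g = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    while True:
        eb = edge_betweenness(g)
        if not eb:
            break
        edge, score = max(eb.items(), key=lambda kv: kv[1])
        if score <= threshold:
            break
        u, v = tuple(edge)
        g[u].discard(v)
        g[v].discard(u)
    return components(g)
```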
Interpretation of measures
Centrality measure | Interpretation in social networks
Degree
Betweenness
Closeness
Eigenvector
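Three of these measures are short to compute. Below is a minimal sketch for a connected undirected graph stored as an adjacency dict:
- degree centrality divides degree by n-1;
- closeness is (n-1) divided by the sum of BFS distances;
- eigenvector centrality uses power iteration.

The iteration multiplies by A + I instead of A. This shift is an implementation choice assumed here: it keeps the same leading eigenvector but avoids oscillation on bipartite graphs such as a star. Function names are hypothetical.

```python
import math
from collections import deque


def degree_centrality(adj):
    n = len(adj)
    return {v: len(adj[v]) / (n - 1) for v in adj}


def closeness_centrality(adj):
    """(n-1) / sum of shortest-path distances; assumes a connected graph."""
    n = len(adj)
    out = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        out[s] = (n - 1) / sum(dist.values())
    return out


def eigenvector_centrality(adj, iters=1000, tol=1e-10):
    """Power iteration on A + I, normalized to unit Euclidean length."""
    x = {v: 1.0 for v in adj}
    for _ in range(iters):
        y = {v: x[v] + sum(x[w] for w in adj[v]) for v in adj}
        norm = math.sqrt(sum(val * val for val in y.values()))
        y = {v: val / norm for v, val in y.items()}
        if max(abs(y[v] - x[v]) for v in adj) < tol:
            return y
        x = y
    return x
```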
High linkage
10-20 links/page on average
Power-law degree distribution
Bow-tie Structure
THANK YOU