Aspirin
An Internet Web
Co-author network
Outline
Graph Isomorphism, Subgraph Isomorphism
Mining frequent graph patterns
Graph indexing methods
Similarity search in graph databases
Biological network analysis
July 22, 2010 4
Motivation
Graph and subgraph isomorphism is an important and very general form of pattern matching with practical applications in areas such as pattern recognition and computer vision, image processing, computer-aided design, graph grammars, graph transformation, biocomputing, search in chemical databases, and numerous others.
A hierarchy of pattern matching problems
Graph isomorphism
Subgraph isomorphism
Maximum common subgraph
Approximate subgraph isomorphism
Graph edit distance
Isomorphic Graphs
Graph Isomorphism
Subgraph Isomorphism
Outline
Graph Isomorphism, Subgraph Isomorphism
Mining frequent graph patterns
Graph indexing methods
Similarity search in graph databases
Biological network analysis
tree
graph
Outline
Mining frequent graph patterns
Graph indexing methods
Similarity search in graph databases
Biological network analysis
[Figure: a query graph and a graph database of chemical structures]
Scalability Issue
Sequential scan
Disk I/O
Subgraph isomorphism testing
An indexing mechanism is needed
DayLight: Daylight.com (commercial)
GraphGrep: Dennis Shasha, et al. PODS'02
Grace: Srinath Srinivasa, et al. ICDE'03
[Figure: sample database with graphs (a), (b), (c) and a query graph]
Indexing Strategy
If graph G contains query graph Q, G must contain every substructure of Q
[Figure: query graph (Q), graph (G), and a substructure]
Remarks
Index substructures of the query graph to prune graphs that do not contain these substructures
Framework
Two steps in processing graph queries
Step 1. Index Construction
Enumerate structures in the graph database and build an inverted index from structures to the graphs that contain them
Step 2. Query Processing
Enumerate structures in the query graph
Calculate the candidate graphs containing these structures
Prune false positives by performing a subgraph isomorphism test
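The two steps can be sketched in a few lines of Python. This is a minimal illustration, not the method of any of the systems named earlier: the graph encoding, the choice of single labeled edges as the indexed "structures", and all function names (`edge_features`, `build_index`, `candidates`, `contains`, `search`) are assumptions made for the example. The verification step is a brute-force test that is only practical for very small graphs.

```python
from collections import defaultdict
from itertools import permutations

# Assumed encoding: a graph is (labels, edges), where labels maps
# vertex -> label and edges is a set of frozenset({u, v}) (no self-loops).
def make_graph(labels, edge_list):
    return labels, {frozenset(e) for e in edge_list}

def edge_features(graph):
    """Enumerate simple structures: the sorted label pair of every edge."""
    labels, edges = graph
    return {tuple(sorted(labels[v] for v in e)) for e in edges}

def build_index(database):
    """Step 1: inverted index from each structure to the graphs containing it."""
    index = defaultdict(set)
    for gid, graph in database.items():
        for feat in edge_features(graph):
            index[feat].add(gid)
    return index

def candidates(index, database, query):
    """Step 2a: keep only graphs that contain every structure of the query."""
    cand = set(database)
    for feat in edge_features(query):
        cand &= index.get(feat, set())
    return cand

def contains(graph, query):
    """Brute-force (non-induced) subgraph isomorphism test for tiny graphs."""
    g_labels, g_edges = graph
    q_labels, q_edges = query
    q_nodes = list(q_labels)
    for image in permutations(g_labels, len(q_nodes)):
        m = dict(zip(q_nodes, image))
        labels_ok = all(q_labels[v] == g_labels[m[v]] for v in q_nodes)
        edges_ok = all(frozenset(m[v] for v in e) in g_edges for e in q_edges)
        if labels_ok and edges_ok:
            return True
    return False

def search(index, database, query):
    """Step 2b: verify candidates to remove false positives."""
    return {g for g in candidates(index, database, query)
            if contains(database[g], query)}

database = {
    "g1": make_graph({1: "C", 2: "O", 3: "N"}, [(1, 2), (2, 3)]),
    "g2": make_graph({1: "C", 2: "N"}, [(1, 2)]),
    # g3 has both edge types of the query, but on different O atoms.
    "g3": make_graph({1: "C", 2: "O", 3: "O", 4: "N"}, [(1, 2), (3, 4)]),
}
index = build_index(database)
query = make_graph({"a": "C", "b": "O", "c": "N"}, [("a", "b"), ("b", "c")])

print(sorted(candidates(index, database, query)))  # ['g1', 'g3']
print(sorted(search(index, database, query)))      # ['g1']
```

In this toy database the index prunes g2 without any isomorphism test, while g3 survives as a false positive and is only rejected by the verification step.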
Outline
Mining frequent graph patterns
Graph indexing methods
Similarity search in graph databases
Biological network analysis
Some recent progress on graph mining
Graph Clustering
Graph similarity measure
Feature-based similarity measure
Each graph is represented as a feature vector
The similarity is defined by the distance between their corresponding vectors
Frequent subgraphs can be used as features
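A feature-based measure of this kind can be sketched as follows. The slide suggests frequent subgraphs as features; to keep the example short, labeled edges stand in for them here, which is an assumption, as are the toy graphs, the function name `feature_vector`, and the choice of Euclidean distance.

```python
import math
from collections import Counter

def feature_vector(edge_labels, features):
    """Count how often each feature (here: a sorted edge label pair) occurs."""
    counts = Counter(tuple(sorted(e)) for e in edge_labels)
    return [counts[f] for f in features]

# Hypothetical feature set and two toy graphs given as labeled edge lists.
features = [("C", "C"), ("C", "N"), ("C", "O")]
g1 = [("C", "C"), ("C", "O"), ("O", "C")]
g2 = [("C", "C"), ("C", "N")]

v1 = feature_vector(g1, features)  # [1, 0, 2]
v2 = feature_vector(g2, features)  # [1, 1, 0]

# Smaller distance means more similar graphs under this measure.
d = math.dist(v1, v2)
print(v1, v2, d)
```

Any vector distance could be substituted here; the resulting vectors can then be passed to a standard clustering method.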
Graph Classification
Local structure based approach
Local structures in a graph, e.g., neighbors surrounding a vertex, paths with fixed length
Graph pattern-based approach
Subgraph patterns from domain knowledge
Subgraph patterns from data mining
Kernel-based approach
Random walk (Gärtner 02, Kashima et al. 02, ICML03, Mahé et al. ICML04)
(a) caffeine
(b) diurobromine
(c) viagra
QUERY GRAPH
Select small structures as features in a graph database, and build the feature-graph matrix between the features and the graphs in the database
Framework (cont.)
Step 2. Feature Miss Estimation
Determine the indexed features belonging to the query graph
Calculate the upper bound of the number of features that can be missed for an approximate matching, denoted by J
On the query graph, not the graph database
Framework (cont.)
Step 3. Query Processing
Use the feature-graph matrix to calculate the difference in the number of features between graph G and query Q, F_G - F_Q
If F_G - F_Q > J, discard G
The remaining graphs constitute a candidate answer set
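The filtering step can be sketched as below. The sketch assumes one particular reading of the "difference": for each indexed feature, the number of occurrences in the query that G lacks, summed over all features. The bound J is simply passed in, since estimating it is Step 2 and is not reproduced here. The matrix layout, the feature names `f1` to `f3`, and the function names are hypothetical.

```python
def feature_misses(query_counts, graph_counts):
    """Assumed miss count: query feature occurrences that the graph lacks."""
    return sum(max(0, q - graph_counts.get(f, 0))
               for f, q in query_counts.items())

def filter_candidates(matrix, query_counts, J):
    """Discard every graph whose miss count exceeds the bound J."""
    return {gid for gid, counts in matrix.items()
            if feature_misses(query_counts, counts) <= J}

# Hypothetical feature-graph matrix: graph id -> {feature: occurrence count}.
matrix = {
    "G1": {"f1": 2, "f2": 1, "f3": 1},
    "G2": {"f1": 1},
    "G3": {"f1": 2, "f3": 1},
}
query_counts = {"f1": 2, "f2": 1, "f3": 1}
J = 1  # assumed to come from the feature miss estimation step

print(sorted(filter_candidates(matrix, query_counts, J)))  # ['G1', 'G3']
```

G2 misses three feature occurrences and is discarded; G3 misses one and stays in the candidate set, to be checked later by an exact approximate-matching test.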
Outline
Mining frequent graph patterns
Graph indexing methods
Similarity search in graph databases
Biological network analysis
Biological Networks
Protein-protein interaction network
Metabolic network
Transcriptional regulatory network
Co-expression network
Genetic interaction network
[Figures: networks over vertices a to k; in the last figure each network is shown with a matrix with rows g1, g2 and columns c1, c2, ..., cm]
[Figure: graphs G1 to G6 and their summary graph]
Step 2
MODES
summary graph
Sub()
Observation: If a frequent subgraph is dense, it must be a dense subgraph in the summary graph. However, the reverse is not true.
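A small example makes the one-way nature of this observation concrete. The sketch assumes that the summary graph keeps every edge occurring in at least `min_support` of the input graphs, and that density is the fraction of possible edges present among a vertex set; both definitions, and all names below, are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations

def edge(u, v):
    return frozenset((u, v))

def summary_graph(graphs, min_support):
    """Assumed definition: keep edges that occur in >= min_support graphs."""
    counts = Counter(e for g in graphs for e in g)
    return {e for e, c in counts.items() if c >= min_support}

def density(edges, vertices):
    """Fraction of all possible vertex pairs that are connected by an edge."""
    vs = list(vertices)
    possible = len(vs) * (len(vs) - 1) // 2
    present = sum(1 for u, v in combinations(vs, 2) if edge(u, v) in edges)
    return present / possible

# Three graphs over the same vertices; each contains two sides of a triangle.
graphs = [
    {edge("a", "b"), edge("b", "c")},
    {edge("b", "c"), edge("a", "c")},
    {edge("a", "b"), edge("a", "c")},
]
summary = summary_graph(graphs, min_support=2)

print(density(summary, "abc"))                  # 1.0
print(max(density(g, "abc") for g in graphs))   # below 1.0 in every graph
```

The triangle on a, b, c is fully dense in the summary graph, yet no single input graph contains all three edges, so a dense subgraph of the summary graph is only a candidate that still has to be checked against the individual graphs.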
[Figure: networks over vertices a to k, each shown with a matrix with rows g1, g2 and columns c1, c2, ..., cm]
[Figure: network over genes YDR115W, MRP49, MRPL51, ATP12, MRPL37, MRPL38, ACN9, MRPL32, MRPS18, MRPL39, FMC1]
[Figure: network over genes PHB1, PET100, ATP12, YDR115W, MRPL38, ACN9, MRPL32, MRPS18, MRPL39, FMC1]
Brown: YDR115W, FMC1, ATP12, MRPL37, MRPS18
GO:0019538 (protein metabolism; p-value = 0.001122)
[Figure: network over genes PHB1, PET100, MRPL51, ATP12, ATP17, MRPL37, MRPL38, ACN9, MRPL39]
Red: PHB1, ATP17, MRPL51, MRPL39, MRPL49, MRPL51, PET100
GO:0006091 (generation of precursor metabolites and energy; p-value = 0.001339)
Outline
Mining frequent graph patterns
Graph indexing methods
Similarity search in graph databases
Biological network analysis
Conclusions
Graph mining has wide applications
Frequent and closed subgraph mining methods
gSpan and CloseGraph: pattern-growth depth-first search approach
Graph indexing techniques
Frequent and discriminative subgraphs as indexing features
Similarity search in graph databases
Indexing and approximate matching help similar subgraph search
Biological network analysis
Mining coherent, dense, multiple biological networks
Many new developments along the line of graph pattern mining