Академический Документы
Профессиональный Документы
Культура Документы
ECML-PKDD 2009
Bled, september 8th, 2009
Pattern Mining in Data Streams
Our setting
Patterns: Objects where
“p subpattern of q”
2 / 24
XML Trees
<?xml version= " 1 . 0 " ?>
<?xml−s t y l e s h e e t t y p e = " t e x t / x s l " h r e f = " movie1 . x s l " ?>
<filmlibrary>
<name> C l a s s i c F i l m s < / name>
<movie>
< t i t l e >The B i c y c l e T h i e f < / t i t l e >
<genre> S o c i a l Drama< / genre>
<language> I t a l i a n < / language>
<year >1948< / year >
< l e n g t h >90 Min . < / l e n g t h >
< f i l m t y p e >BW< / f i l m t y p e >
<credits>
< d i r e c t o r > V i t t o r i o de Sica < / d i r e c t o r >
< s t o r y >Gennarino B a r t o l i n i < / s t o r y >
<cinematography> C a r l o M o n t u o r i < / cinematography>
</ credits>
< / movie>
</ filmlibrary>
3 / 24
Issues
Pattern Classification
Mapping patterns → features
Frequent patterns, closed patterns, generators, . . .
4 / 24
Our Work
5 / 24
Tree (Pattern) Mining
6 / 24
Previous Work
7 / 24
Closure Operator on Trees
Definition
We define the following Galois connection pair:
For finite A ⊆ D
σ (A) is the set of subtrees of the A trees in T
σ (A) = {t ∈ T ∀ t 0 ∈ A (t t 0 )}
For finite B ⊂ T
τD (B) is the set of supertrees of the B trees in D
τD (B) = {t 0 ∈ D ∀ t ∈ B (t t 0 )}
8 / 24
Closure Operator on Trees (2)
Closure Operator
The composition CD = σ ◦ τD is a closure operator
9 / 24
Closure Operator on Trees (3)
Theorem
Let D be a pattern dataset. A pattern t is closed for D if and
only if the intersection of all its closed superpatterns is t.
10 / 24
Incremental Algorithm
Computing the lattice of frequent trees
Construct empty lattice L;
Repeat
Collect batch of B trees;
Build closed tree lattice for B, L2 ;
L := merge(L,L2 ) (using addition rule)
11 / 24
Incremental Algorithm
Computing the lattice of frequent trees
Construct empty lattice L;
Repeat
Collect batch of B trees;
Build closed tree lattice for B, L2 ;
L := merge(L,L2 ) (using addition rule)
11 / 24
Dealing with time changes
12 / 24
Miner is interesting in itself
13 / 24
Maximal Trees
Maximal Trees
A tree is maximal if no supertree of t is frequent
All maximal trees are closed
14 / 24
XML Tree Classification
on evolving data streams
D D D D
B B B B B B B
C C C C C C
A A
15 / 24
XML Tree Classification
on evolving data streams
Closed Maximal
Trees Trees
Id Tree c1 c2 c3 c4 c1 c2 c3 Class
1 1 1 0 1 1 1 0 C LASS 1
2 0 0 1 1 0 0 1 C LASS 2
3 1 0 1 1 1 0 1 C LASS 1
4 0 1 1 1 0 1 1 C LASS 2
16 / 24
XML Tree Framework on evolving data
streams
Two components:
17 / 24
WEKA: the bird
18 / 24
MOA: the bird
The Moa (another native NZ bird) is not only flightless, like the
Weka, but also extinct.
19 / 24
MOA: the software
20 / 24
Experiments: Synthetic datasets
21 / 24
Experiments: Real dataset
Maximal Closed
# Trees Att. Acc. Mem. Att. Acc. Mem.
CSLOG12 15483 84 79.64 1.2 228 78.12 2.54
CSLOG23 15037 88 79.81 1.21 243 78.77 2.75
CSLOG31 15702 86 79.94 1.25 243 77.60 2.73
CSLOG123 23111 84 80.02 1.7 228 78.91 4.18
22 / 24
Conclusions
23 / 24
Future Work
24 / 24