
Group 4

Advanced Pattern
Mining
Vu Manh Cam
Nguyen Quy Ky Nguyen
Luong Anh Tuan
Nguyen Kim Chinh

Outline

Pattern Pruning

Data Pruning

Pattern Fusion

Pattern Clustering

Pruning Pattern Space with Pattern Pruning Constraints

Basis: Apriori Property

All nonempty subsets of a frequent itemset must also be frequent

Pruning Pattern Space with Pattern Pruning Constraints

Categories of pattern mining constraints:

Anti-monotonic Constraints

Monotonic Constraints

Succinct Constraints

Convertible Constraints

Pruning Pattern Space with Pattern Pruning Constraints

Categories of pattern mining constraints:

Anti-monotonic Constraints

A constraint Ca is anti-monotonic if, for any pattern S not satisfying Ca, none of the super-patterns of S can satisfy Ca.

Example: sum(S.Price) <= value

Monotonic Constraints

A constraint Cm is monotonic if, for any pattern S satisfying Cm, every super-pattern of S also satisfies it.

Example : sum(S.Price) >= value
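The two checks above can be sketched in a few lines. This is a minimal illustration with a hypothetical price table, not part of the original slides:

```python
# Assumed item prices for illustration only.
price = {"a": 10, "b": 40, "c": 60}

def violates_anti_monotone(itemset, bound=100):
    # Anti-monotonic: sum(S.price) <= bound. Once a pattern violates it
    # (sum exceeds the bound), every super-pattern also violates it,
    # so the pattern can be pruned from the search space.
    return sum(price[i] for i in itemset) > bound

def satisfies_monotone(itemset, bound=100):
    # Monotonic: sum(S.price) >= bound. Once a pattern satisfies it,
    # every super-pattern satisfies it, so the constraint never needs
    # to be re-checked when the pattern grows.
    return sum(price[i] for i in itemset) >= bound
```

For example, {b, c} already satisfies the monotonic constraint ($100 total), so any extension of it will, too.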

Pruning Pattern Space with Pattern Pruning Constraints

Categories of pattern mining constraints:

Succinct Constraints

A constraint Cs is succinct if all and only those patterns that satisfy Cs can be precisely generated, even before support counting begins.

Example : min(S.Price) <= value
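For min(S.Price) <= value, a pattern satisfies the constraint exactly when it contains at least one sufficiently cheap item, so the satisfying patterns can be enumerated up front. A small sketch with assumed prices:

```python
from itertools import combinations

# Assumed item prices for illustration only.
price = {"a": 30, "b": 120, "c": 200}

def satisfying_patterns(bound=100):
    # min(S.price) <= bound holds iff S contains at least one item
    # priced <= bound, so the satisfying patterns are generated
    # directly, before any support counting.
    cheap = {i for i, p in price.items() if p <= bound}
    items = sorted(price)
    return [set(s) for r in range(1, len(items) + 1)
            for s in combinations(items, r) if cheap & set(s)]
```

With these prices only "a" is cheap, so every satisfying pattern must contain "a".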

Pruning Pattern Space with Pattern Pruning Constraints

Categories of pattern mining constraints:

Convertible Constraints

Constraints which are not anti-monotonic, monotonic, or succinct.

They become anti-monotonic or monotonic if the items in the pattern are arranged in a particular order.

Example:

If the items are arranged in ascending order, then avg(I.price) <= 100 is a convertible anti-monotonic constraint.

If the items are arranged in descending order, then avg(I.price) <= 100 is a convertible monotonic constraint.
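The reason the ascending order works can be seen numerically: extending a prefix of the price-sorted item list can only raise the running average, so once avg exceeds the bound it stays there. A small sketch with assumed prices:

```python
def running_averages(sorted_prices):
    # Prices are assumed already sorted in ascending order; the average
    # of each successive prefix is non-decreasing, which is what makes
    # avg(I.price) <= 100 behave anti-monotonically along this order.
    avgs, total = [], 0
    for k, p in enumerate(sorted_prices, start=1):
        total += p
        avgs.append(total / k)
    return avgs
```

For assumed prices [20, 80, 150, 260] the prefix averages are 20, 50, 83.3, 127.5: once the average passes 100, no further extension in this order can bring it back down.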

Pruning Data Space with Data Pruning Constraints

Data-space pruning

Prunes pieces of data if they will not contribute to the subsequent generation of satisfiable patterns in the mining process.

Two properties:

Data Succinctness

Data Anti-monotonicity

Data-Succinctness

A constraint is data-succinct if it can be used at the beginning of a pattern mining process

Example: all patterns must contain a digital camera

Any transaction that does not contain a digital camera can be pruned at the beginning of the mining process

Effectively reduces the data set to be examined.

Data Anti-monotonicity

A constraint is data-antimonotonic if, during the mining process, a data entry that cannot satisfy the constraint based on the current pattern can be pruned

Data Anti-monotonicity
Example

Constraint C1 (monotonic): sum(I.price) >= $100

Current itemset S: sum(S.price) = $50

Current transaction Ti: {i2.price = $5, i5.price = $10, i8.price = $20}

S extended with Ti cannot satisfy C1 ($50 + $35 = $85 < $100)

Ti can be pruned

Note that this technique cannot be applied once at the beginning of the mining process; the check must be repeated at each iteration.
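The per-iteration check above can be sketched as follows (a minimal sketch; the prices are those of the example):

```python
def can_contribute(transaction_prices, current_sum, bound=100):
    # Data anti-monotone pruning for a sum(I.price) >= bound constraint:
    # even in the best case, where the pattern absorbs every remaining
    # item of the transaction, the bound must still be reachable.
    # If not, the transaction is pruned for this branch of the search.
    return current_sum + sum(transaction_prices) >= bound
```

With the example's numbers, the transaction's $35 worth of items plus the current $50 cannot reach $100, so Ti is pruned.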

Data Anti-monotonicity
Example

Constraint C2 (anti-monotonic): sum(I.price) <= $100

If sum(S.price) > $100, S can be pruned.

Transaction Ti can be pruned, too

Prunes both pattern space and data space

More powerful than a monotonic constraint

Data Anti-monotonicity
Example

Constraint C3 (neither pattern monotonic nor anti-monotonic): avg(I.price) <= 10

C3 could be data anti-monotonic, depending on Ti

=> Ti could be pruned as well.

Note that data anti-monotonicity is confined to pattern growth-based algorithms

It cannot be used for pruning the data space if the Apriori algorithm is used.

Mining Colossal Patterns by Pattern Fusion

Data Mining: Concepts and Techniques

3/7/15


Introduction

Bioinformatics: DNA or microarray data analysis calls for mining colossal patterns (patterns of very large size)

Challenge: mining tends to get trapped by an explosive number of mid-sized patterns

For a dataset D[m, n] where n is very large but m is only on the order of 100 to 1,000, a new mining strategy is needed: Pattern-Fusion


Pattern- Fusion method

Traverses the tree in a bounded-breadth way

Keeps only a fixed number of patterns in a bounded-size candidate pool

Avoids the problem of an exponential search space

Designed to give an approximation to the colossal patterns


Core Patterns

Intuitively, for a pattern α, a subpattern β is a τ-core pattern of α if β shares a similar support set with α, i.e.,

|D_α| / |D_β| ≥ τ, where 0 < τ ≤ 1

and τ is called the core ratio

Robustness

A pattern α is (d, τ)-robust if d is the maximum number of items that can be removed from α such that the resulting pattern is still a τ-core pattern of α

Example: Core Patterns (τ = 0.5)

Transaction (# of Ts) | Core Patterns
(abe) (100)           | (abe), (ab), (be), (ae), (e)
(bcf) (100)           | (bcf), (bc), (bf)
(acf) (100)           | (acf), (ac), (af)
(abcef) (100)         | (ab), (ac), (af), (ae), (bc), (bf), (be), (ce), (fe), (e), (abc), (abf), (abe), (ace), (acf), (afe), (bcf), (bce), (bfe), (cfe), (abcf), (abce), (bcfe), (acfe), (abfe), (abcef)
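The core ratio for any pair in the table can be checked directly. A small sketch over the example transactions:

```python
# The example transaction database: (itemset, multiplicity) pairs.
db = [({"a", "b", "e"}, 100), ({"b", "c", "f"}, 100),
      ({"a", "c", "f"}, 100), ({"a", "b", "c", "e", "f"}, 100)]

def support(itemset):
    # |D_itemset|: number of transactions containing the itemset.
    return sum(n for t, n in db if itemset <= t)

def core_ratio(alpha, beta):
    # beta is a tau-core pattern of alpha when this ratio is >= tau.
    return support(alpha) / support(beta)
```

For instance, (ab) occurs in 200 transactions and (abcef) in 100, so (ab) is a core pattern of (abcef) at τ = 0.5 with ratio exactly 0.5.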


Idea of Pattern-Fusion Algorithm

Generate a complete set of frequent patterns up to a


small size

Randomly pick a pattern α; α has a high probability of being a core-descendant of some colossal pattern

Identify all of α's descendants in this complete set and merge them all. This generates a much larger core-descendant of α

In the same fashion, select K patterns. This set of larger core-descendants will be the candidate pool for the next iteration


Pattern-Fusion: The Algorithm

Initialization (Initial pool): Use an existing algorithm to mine all frequent patterns up to a small size, e.g., 3

Iteration (Iterative Pattern Fusion):

At each iteration, k seed patterns are randomly picked from the current pattern pool

For each seed pattern thus picked, we find all the patterns
within a bounding ball centered at the seed pattern

All these patterns found are fused together to generate a set of super-patterns. All the super-patterns thus generated form a new pool for the next iteration

Termination: when the current pool contains no more than K patterns at the beginning of an iteration
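One iteration of this loop can be sketched as follows. This is a much-simplified illustration: `in_ball` is an assumed predicate standing in for the real bounding ball, which is defined via the τ-core relation on support sets, and fusion is simplified to a plain union:

```python
import random

def fuse_once(pool, k, in_ball):
    # in_ball(seed, p) -> True if pattern p lies in the bounding ball
    # centered at the seed pattern (assumed predicate).
    seeds = random.sample(pool, min(k, len(pool)))
    next_pool = []
    for seed in seeds:
        # Gather every pool pattern inside the seed's ball ...
        ball = [p for p in pool if in_ball(seed, p)]
        # ... and fuse them into one larger super-pattern.
        fused = frozenset().union(*ball) if ball else seed
        next_pool.append(fused)
    return next_pool  # candidate pool for the next iteration
```

With seeds drawn at random, each iteration leaps from small patterns toward much larger core-descendants instead of enumerating every mid-sized pattern in between.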

Why Is Pattern-Fusion Efficient?

A bounded-breadth pattern tree traversal

Avoids the explosion in mining mid-sized patterns

Randomness helps to stay on the right path

Ability to identify short-cuts and take leaps

Fuses small patterns together in one step to generate new patterns of significant sizes

=> Efficiency

Mining Compressed Frequent-Pattern Sets by Pattern Clustering

Introduction

Frequent Pattern Mining

Minimum support: 2

Transactions: (a, b, c, d), (a, b, d, e), (b, e, f)

Frequent patterns:
(a) : 2, (b) : 3, (d) : 2, (e) : 2
(a, b) : 2, (a, d) : 2, (b, d) : 2, (b, e) : 2
(a, b, d) : 2

Compressing Frequent
Patterns

Our compressing framework

Clustering frequent patterns by pattern similarity

Pick a representative pattern for each cluster

Key Problems

Need a distance function to measure the similarity between patterns

The quality of the clustering needs to be controllable

The representative pattern should be able to describe both the expressions and the supports of the other patterns

Efficiency is always desirable


Distance Measure

Let P1 and P2 be two closed frequent patterns, and let T(P) be the set of raw data (transactions) containing P. The distance between P1 and P2 is:

D(P1, P2) = 1 − |T(P1) ∩ T(P2)| / |T(P1) ∪ T(P2)|

Let T(P1) = {t1, t2, t3, t4, t5} and T(P2) = {t1, t2, t3, t4, t6}; then D(P1, P2) = 1 − 4/6 = 1/3

D is a valid distance metric

D characterizes the support, but ignores the expression
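The distance is one line of code once the support sets are available (a minimal sketch using Python sets):

```python
def pattern_distance(t1, t2):
    # t1, t2: support sets T(P1), T(P2) as Python sets.
    # D = 1 - |intersection| / |union|, the Jaccard distance
    # between the two support sets.
    return 1 - len(t1 & t2) / len(t1 | t2)
```

On the example above this gives 1 − 4/6 = 1/3, and two patterns with identical support sets are at distance 0.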


Clustering Criterion

General clustering approaches (e.g., k-means):

Directly apply the distance measure

No guarantee on the quality of the clusters

The representative pattern may not exist in a cluster

δ-clustering

For each pattern P, find all patterns that can be expressed by P and whose distance to P is within δ (δ-cover)

All patterns in the cluster can be represented by P
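One simple way to turn δ-covers into clusters is a greedy pass that repeatedly picks the representative covering the most still-uncovered patterns. This is a sketch, not the paper's exact procedure; `covers(rep, p)` is an assumed predicate meaning "rep expresses p and D(rep, p) <= δ":

```python
def delta_cluster(patterns, covers):
    # Greedy set cover over the delta-covers: each chosen representative
    # claims every uncovered pattern it can express within distance delta.
    uncovered, clusters = set(patterns), []
    while uncovered:
        rep = max(patterns,
                  key=lambda r: sum(1 for p in uncovered if covers(r, p)))
        members = {p for p in uncovered if covers(rep, p)}
        clusters.append((rep, members))
        uncovered -= members
    return clusters
```

Since every pattern covers itself (distance 0), the loop always terminates, and each pattern ends up represented by exactly one chosen representative.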

