OVERVIEW
Basic concepts and a roadmap
Scalable frequent itemset mining methods
APRIORI ALGORITHM: A Candidate Generation-and-Test Approach
PATTERN GROWTH APPROACH: Mining Frequent Patterns Without Candidate Generation
CHARM / ECLAT: Mining by Exploring the Vertical Data Format
Summary and Conclusions
Applications
Basket data analysis, cross-marketing, catalog design, sale campaign analysis, Web log (click stream) analysis, and DNA sequence analysis.
Itemset: a set of one or more items.
k-itemset: an itemset with k items, X = {x1, …, xk}.
(Absolute) support, or support count, of X: the number of transactions in which the itemset X occurs.
(Relative) support, s: the fraction of transactions that contain X (i.e., the probability that a transaction contains X).
An itemset X is frequent if X's support is no less than a minsup threshold.
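A minimal sketch of these definitions in Python. The helper names (support_count, relative_support, is_frequent) and the four-transaction database are hypothetical, chosen only to make the definitions concrete.

```python
from typing import List, Set


def support_count(itemset: Set[str], transactions: List[Set[str]]) -> int:
    """Absolute support: number of transactions containing every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


def relative_support(itemset: Set[str], transactions: List[Set[str]]) -> float:
    """Relative support s: fraction of transactions that contain itemset."""
    return support_count(itemset, transactions) / len(transactions)


def is_frequent(itemset: Set[str], transactions: List[Set[str]], minsup: float) -> bool:
    """Frequent means support is no less than minsup, hence >=."""
    return relative_support(itemset, transactions) >= minsup


# Hypothetical database, for illustration only
db = [{"a", "b", "c"}, {"a", "c"}, {"a", "d"}, {"b", "e"}]
print(support_count({"a", "c"}, db))     # 2
print(relative_support({"a", "c"}, db))  # 0.5
print(is_frequent({"a", "c"}, db, 0.5))  # True
```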
Frequent patterns (support counts): Tea:3, Nuts:3, Napkins:4, Eggs:3, {Tea, Napkins}:3
Association rules (support, confidence), among many more:
Tea → Napkins (60%, 100%)
Napkins → Tea (60%, 75%)
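The two percentages on each rule are its support and confidence. A small worked computation from the counts above, assuming the example database holds N = 5 transactions (inferred from 60% = 3/N; the transaction table itself is not shown). rule_metrics is a hypothetical helper.

```python
from typing import Tuple


def rule_metrics(count_xy: int, count_x: int, n_transactions: int) -> Tuple[float, float]:
    """Support and confidence of the rule X -> Y, from support counts.

    support    = count(X ∪ Y) / N
    confidence = count(X ∪ Y) / count(X)
    """
    support = count_xy / n_transactions
    confidence = count_xy / count_x
    return support, confidence


N = 5  # assumption: inferred from 60% = 3 / N
tea, napkins, tea_and_napkins = 3, 4, 3

print(rule_metrics(tea_and_napkins, tea, N))      # Tea -> Napkins: (0.6, 1.0)
print(rule_metrics(tea_and_napkins, napkins, N))  # Napkins -> Tea: (0.6, 0.75)
```

Both rules share the same support because they are built from the same itemset {Tea, Napkins}; only the denominator of the confidence changes.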
Major philosophy: grow long patterns from short ones using local frequent items only.
If abc is a frequent pattern, get all transactions containing abc, i.e., project the DB on abc: DB|abc.
If d is a local frequent item in DB|abc, then abcd is a frequent pattern.
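The projection step can be sketched as follows. project and local_frequent_items are hypothetical helpers, and the five-transaction database with a minimum support count of 2 is invented to mirror the abc/abcd example.

```python
from collections import Counter
from typing import Dict, List, Set


def project(db: List[Set[str]], pattern: Set[str]) -> List[Set[str]]:
    """DB|pattern: transactions that contain pattern, minus pattern's own items."""
    return [t - pattern for t in db if pattern <= t]


def local_frequent_items(projected: List[Set[str]], min_count: int) -> Dict[str, int]:
    """Items whose count inside the projected database reaches min_count."""
    counts = Counter(item for t in projected for item in t)
    return {item: c for item, c in counts.items() if c >= min_count}


# Hypothetical database, minimum support count 2 (abc itself occurs 3 times)
db = [set("abcd"), set("abcd"), set("abce"), set("abd"), set("bcd")]
abc = set("abc")
db_abc = project(db, abc)                # [{'d'}, {'d'}, {'e'}]
local = local_frequent_items(db_abc, 2)  # {'d': 2}

# Each local frequent item extends abc to a longer frequent pattern
for item, count in local.items():
    print("".join(sorted(abc | {item})), count)  # abcd 2
```

The support of abcd is read directly off the projected database, so no candidate abcd ever has to be generated and tested against the full DB.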
Consider the following set of transactions, and let the minimum support count be 3.
1. Scan the DB once and find the frequent 1-itemsets (single-item patterns).
2. Sort the frequent items in descending order of frequency to obtain the f-list.
3. Scan the DB again and construct the FP-tree.
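A sketch of the first scan and of the reordering applied to each transaction during the second scan, assuming transactions are strings of single-letter items. Ties in support count are broken alphabetically here, which is an assumption: the example's f-list orders its ties differently, and any fixed order yields the same frequent patterns. build_f_list, ordered_frequent_items, and the database are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def build_f_list(db: List[str], min_count: int) -> List[Tuple[str, int]]:
    """First scan: frequent single items, sorted by descending support count."""
    counts = Counter(item for t in db for item in set(t))
    frequent = [(item, c) for item, c in counts.items() if c >= min_count]
    # Ties broken alphabetically (assumption; any fixed order works)
    frequent.sort(key=lambda ic: (-ic[1], ic[0]))
    return frequent


def ordered_frequent_items(transaction: str, f_list: List[Tuple[str, int]]) -> List[str]:
    """Second scan: keep only frequent items, in f-list order, before tree insertion."""
    rank = {item: r for r, (item, _) in enumerate(f_list)}
    return sorted((i for i in set(transaction) if i in rank), key=rank.__getitem__)


# Hypothetical database: each string is one transaction
db = ["bad", "abce", "bce", "ab", "c"]
f_list = build_f_list(db, min_count=2)
print(f_list)  # [('b', 4), ('a', 3), ('c', 3), ('e', 2)]
print([ordered_frequent_items(t, f_list) for t in db])
```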
FP-tree for the example transactions

Header table (item, frequency, and head of the item's node-link chain):
Item  Frequency
f     4
c     4
a     3
b     3
m     3
p     3

F-list = f-c-a-b-m-p

{}
  f:4
    c:3
      a:3
        m:2
          p:2
        b:1
          m:1
    b:1
  c:1
    b:1
      p:1
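All frequent patterns related to m: m, fm, cm, am, fcm, fam, cam, fcam; these are then used to generate association rules.
m-conditional FP-tree: m's conditional pattern base is {fca:2, fcab:1}; b has count 1 < 3 and is dropped, leaving the single path {} → f:3 → c:3 → a:3.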
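The list of m-related patterns can be checked mechanically. The sketch below assumes the five ordered frequent-item lists that the example FP-tree encodes (fcamp twice, fcabm, fb, cbp, read off its root-to-leaf paths; the raw transactions are not shown). It collects m's conditional pattern base, drops locally infrequent items, and, because the resulting m-conditional tree is a single path, enumerates every combination of its nodes. The names are hypothetical, and this shortcut covers only the single-path case.

```python
from collections import Counter
from itertools import combinations

# Ordered frequent-item lists encoded by the example FP-tree (reconstructed from its paths)
ordered_db = ["fcamp", "fcamp", "fcabm", "fb", "cbp"]
MIN_COUNT = 3


def conditional_pattern_base(item, db):
    """Prefix paths preceding `item` in each ordered list, with their counts."""
    return Counter(t[: t.index(item)] for t in db if item in t)


base = conditional_pattern_base("m", ordered_db)  # Counter({'fca': 2, 'fcab': 1})

# Count items inside the pattern base and keep the locally frequent ones
item_counts = Counter()
for prefix, count in base.items():
    for item in prefix:
        item_counts[item] += count
keep = {i for i, c in item_counts.items() if c >= MIN_COUNT}  # b (count 1) is dropped

# Remove infrequent items; merging equal prefixes gives the conditional tree's paths
paths = Counter()
for prefix, count in base.items():
    paths["".join(i for i in prefix if i in keep)] += count

# Single path iff every path is a prefix of the longest one
longest = max(paths, key=len)
assert all(longest.startswith(p) for p in paths), "not a single path"
# Count of the node at depth k = total count of paths reaching that depth
node_count = {longest[k]: sum(c for p, c in paths.items() if len(p) > k)
              for k in range(len(longest))}

patterns = {"m": sum(base.values())}
for r in range(1, len(longest) + 1):
    for combo in combinations(longest, r):
        patterns["".join(combo) + "m"] = min(node_count[i] for i in combo)
print(patterns)
# {'m': 3, 'fm': 3, 'cm': 3, 'am': 3, 'fcm': 3, 'fam': 3, 'cam': 3, 'fcam': 3}
```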
ALGORITHM FOR MINING FREQUENT ITEMSETS USING FP-TREE BY FREQUENT-PATTERN GROWTH
Input: D, a transaction database; minsup, the minimum support count threshold.
Output: The complete set of frequent patterns.
Method:
1. The FP-tree is constructed in the following steps:
a) Scan the transaction database D once. Collect F, the set of frequent items, and their support counts. Sort F in descending order of support count as L, the list of frequent items.
b) Create the root of an FP-tree, T, and label it as "null". For each transaction Trans in D do the following:
Select and sort the frequent items in Trans according to the order of L. Let the sorted frequent item list in Trans be [p|P], where p is the first element and P is the remaining list.
Call insert_tree([p|P], T), which is performed as follows: if T has a child N such that N.item-name = p.item-name, then increment N's count by 1; else create a new node N, let its count be 1, link its parent link to T, and link its node-link to the nodes with the same item-name via the node-link structure. If P is nonempty, call insert_tree(P, N) recursively.
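A sketch of insert_tree in Python, following the recursive description above. FPNode, item_support, and show are hypothetical names. New nodes are prepended to their item's node-link chain, so each header entry points to the most recently created node rather than the first one (a simplification). The input is the five ordered frequent-item lists read off the example tree's paths.

```python
from typing import Dict, List, Optional


class FPNode:
    def __init__(self, item: Optional[str], parent: Optional["FPNode"]):
        self.item = item  # None marks the root ("null")
        self.count = 0
        self.parent = parent
        self.children: Dict[str, "FPNode"] = {}
        self.node_link: Optional["FPNode"] = None  # next node carrying the same item


def insert_tree(items: List[str], t: FPNode, header: Dict[str, FPNode]) -> None:
    """insert_tree([p|P], T): add one sorted frequent-item list below node T."""
    if not items:
        return
    p, rest = items[0], items[1:]
    n = t.children.get(p)
    if n is None:
        # New node N: parent link to T, prepended to p's node-link chain
        n = FPNode(p, t)
        t.children[p] = n
        n.node_link = header.get(p)
        header[p] = n
    n.count += 1  # a freshly created node ends up with count 1
    insert_tree(rest, n, header)  # recurse on P; returns at once if P is empty


def item_support(item: str, header: Dict[str, FPNode]) -> int:
    """Follow an item's node-link chain and sum the node counts."""
    total, node = 0, header.get(item)
    while node is not None:
        total += node.count
        node = node.node_link
    return total


def show(node: FPNode, depth: int = 0) -> None:
    """Print the tree depth-first, one item:count per line."""
    for child in node.children.values():
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)


root = FPNode(None, None)
header: Dict[str, FPNode] = {}
for trans in ["fcamp", "fcamp", "fcabm", "fb", "cbp"]:
    insert_tree(list(trans), root, header)
show(root)
```

show(root) prints f:4, c:3, a:3, m:2, p:2, b:1, m:1, b:1, c:1, b:1, p:1 in depth-first order, matching the example tree, and summing along each node-link chain recovers the header-table frequencies (f:4, c:4, a:3, b:3, m:3, p:3).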
2. The FP-tree is mined by calling FPgrowth(FP_tree, null), which is implemented as follows:
procedure FPgrowth(Tree, α)
if Tree contains a single path P then {
    for each combination (denoted as β) of the nodes in the path P {
        generate pattern β ∪ α with support_count = minimum support_count of nodes in β;
    }
}
else for each ai in the header of Tree {
    generate pattern β = ai ∪ α with support_count = ai.support_count;
    construct β's conditional pattern base and then β's conditional FP-tree Tree_β;
    if Tree_β ≠ ∅ then call FPgrowth(Tree_β, β);
}
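A compact sketch of the FPgrowth procedure in Python, under stated assumptions: build_tree, single_path, and fp_growth are hypothetical names; the header table stores each item's nodes in a list instead of a node-link chain; f-list ties are broken alphabetically, so the tree's shape can differ from the example's while the mined patterns are the same; and each conditional pattern base is passed to the recursive call as weighted item lists, from which the conditional FP-tree Tree_β is built.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

WeightedDB = List[Tuple[List[str], int]]


class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}


def build_tree(db: WeightedDB, min_count: int):
    """Build an FP-tree from (items, count) pairs; header maps item -> its nodes."""
    counts = Counter()
    for items, c in db:
        for i in set(items):
            counts[i] += c
    # f-list: descending count, ties broken alphabetically (assumption)
    order = sorted((i for i in counts if counts[i] >= min_count),
                   key=lambda i: (-counts[i], i))
    rank = {i: r for r, i in enumerate(order)}
    root, header = Node(None, None), defaultdict(list)
    for items, c in db:
        node = root
        for i in sorted((i for i in set(items) if i in rank), key=rank.__getitem__):
            if i not in node.children:
                node.children[i] = Node(i, node)
                header[i].append(node.children[i])
            node = node.children[i]
            node.count += c
    return root, header, order


def single_path(root):
    """Return the tree's nodes if it is a single path, else None."""
    path, node = [], root
    while node.children:
        if len(node.children) > 1:
            return None
        node = next(iter(node.children.values()))
        path.append(node)
    return path


def fp_growth(db: WeightedDB, min_count: int, alpha: FrozenSet[str] = frozenset(),
              out=None) -> Dict[FrozenSet[str], int]:
    out = {} if out is None else out
    root, header, order = build_tree(db, min_count)
    path = single_path(root)
    if path is not None:
        # Each combination beta of the path's nodes yields beta ∪ alpha,
        # with support = minimum count among the nodes in beta
        for r in range(1, len(path) + 1):
            for beta in combinations(path, r):
                out[alpha | {n.item for n in beta}] = min(n.count for n in beta)
        return out
    for ai in reversed(order):  # header table, read bottom-up
        beta = alpha | {ai}
        out[beta] = sum(n.count for n in header[ai])  # ai's support count
        # beta's conditional pattern base: ai's prefix paths, weighted by node count
        base = []
        for n in header[ai]:
            prefix, p = [], n.parent
            while p.item is not None:
                prefix.append(p.item)
                p = p.parent
            if prefix:
                base.append((prefix, n.count))
        if base:  # build_tree drops infrequent items; an empty Tree_beta yields nothing
            fp_growth(base, min_count, beta, out)
    return out


ordered_db = ["fcamp", "fcamp", "fcabm", "fb", "cbp"]
patterns = fp_growth([(list(t), 1) for t in ordered_db], min_count=3)
print(len(patterns))  # 18
```

On the five lists read off the example tree, this finds 18 frequent patterns, including the eight m-patterns listed earlier (each with support 3), {f, c}:3 and {c, p}:3.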
Other factors in FP-growth's efficiency:
No candidate generation and no candidate test
Compressed database: the FP-tree structure
No repeated scans of the entire database
Basic operations: counting local frequent items and building sub-FP-trees; no pattern search and matching