Академический Документы
Профессиональный Документы
Культура Документы
Repoussis Panagiotis
Overview
• Given:
― a set I of all the items;
― a database D of transactions;
― minimum support s;
― minimum confidence c;
• Find:
― all association rules X Y with a minimum
support s and confidence c.
Problem Decomposition
L1 = {frequent items};
for (k = 1; Lk !=; k++) do
Ck+1 = candidates generated from Lk;
for each transaction t in database do
increment the count of all candidates in
Ck+1 that are contained in t
Lk+1 = candidates in Ck+1 with min_support
return k Lk;
The Apriori Algorithm — Example
Min support =50%
Database D itemset sup.
L1 itemset sup.
TID Items C1 {1} 2 {1} 2
100 134 {2} 3 {2} 3
200 235 Scan D {3} 3 {3} 3
300 1235 {4} 1 {5} 3
400 25 {5} 3
C2 itemset sup C2 itemset
L2 itemset sup {1 2} 1 Scan D {1 2}
{1 3} 2 {1 3} 2 {1 3}
{2 3} 2 {1 5} 1 {1 5}
{2 3} 2 {2 3}
{2 5} 3
{2 5} 3 {2 5}
{3 5} 2
{3 5} 2 {3 5}
C3 itemset Scan D L3 itemset sup
{2 3 5} {2 3 5} 2
How to Generate Candidates
Input: Li-1 : set of frequent itemsets of size i-1
Output: Ci : set of candidate itemsets of size i
Ci = empty set;
for each itemset J in Li-1 do
for each itemset K in Li-1 s.t. K<> J do
if i-2 of the elements in J and K are equal then
if all subsets of {K J} are in Li-1 then
Ci = Ci {K J}
return Ci;
Example of Generating Candidates
• L3={abc, abd, acd, ace, bcd}
• Generating C4 from L3
– abcd from abc and abd
– acde from acd and ace
• Pruning:
– acde is removed because ade is not in L3
• C4={abcd}
Example of Discovering Rules
Let use consider the 3-itemset {I1, I2, I5}:
I1 I2 I5
I1 I5 I2
I2 I5 I1
I1 I2 I5
I2 I1 I5
I5 I1 I2
Discovering Rules
p( X Y )
conf ( X Y ) p( X ) p( X Y )
lift ( X Y )
p(Y ) p (Y ) p ( X ) p (Y )
• But, the lift measure is symmetric; i.e., it does not take into
account the direction of implications!
Alternative Measures for Association
Rules
• The conviction measure indicates the departure from
independence of X and Y taking into account the
implication direction. The conviction of X Y is :
p( X ) p(Y )
conv( X Y )
p( X Y )
Summary
1. Association Rules form an very applied data mining
approach.
2. Association Rules are derived from frequent itemsets.
3. The Apriori algorithm is an efficient algorithm for
finding all frequent itemsets.
4. The Apriori algorithm implements level-wise search
using frequent irem property.
5. The Apriori algorithm can be additionally optimised.
6. There are many measures for association rules.