TID   ITEMS
1     Bread, Peanuts, Milk, Fruit, Jam
2     Bread, Jam, Soda, Chips, Milk, Fruit
3     Steak, Jam, Soda, Chips, Bread
4     …
5     …
6     …
7     …
8     …

Examples of rules: {Bread} -> {Milk}, {Soda} -> {Chips}, {Bread} -> {Jam}
Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction
APRIORI ALGORITHM
Step 1: Find all frequent itemsets, i.e., itemsets whose support is at least a minimum support threshold.
Step 2: Generate strong association rules from the frequent itemsets, i.e., rules that satisfy both minimum support and minimum confidence.
TERMINOLOGY
Itemset
A collection of one or more items, e.g., {Milk, Bread, Jam}.
k-itemset: an itemset that contains k items.
Support count (σ)
Frequency of occurrence of an itemset:
σ({Milk, Bread}) = 3
σ({Soda, Chips}) = 4
Support
Fraction of transactions that contain an itemset:
s({Milk, Bread}) = 3/8
s({Soda, Chips}) = 4/8
Frequent Itemset
An itemset whose support is greater than or equal to a minsup threshold
CONTD
Confidence:
How often the items of Y appear in transactions that contain X: conf(X -> Y) = σ(X ∪ Y) / σ(X).
Example: conf({Bread} -> {Milk}) = σ({Bread, Milk}) / σ({Bread}) = 3/4 = 75%
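As a quick illustration of these definitions, here is a minimal Python sketch; the four-transaction database and the function names are illustrative stand-ins, not the slides' table.

transactions = [
    {"Bread", "Peanuts", "Milk", "Fruit", "Jam"},
    {"Bread", "Jam", "Soda", "Chips", "Milk", "Fruit"},
    {"Steak", "Jam", "Soda", "Chips", "Bread"},
    {"Jam", "Soda", "Chips", "Milk"},
]

def support_count(itemset, db):
    # sigma(X): number of transactions containing every item of X
    return sum(1 for t in db if itemset <= t)

def support(itemset, db):
    # s(X): fraction of transactions that contain X
    return support_count(itemset, db) / len(db)

def confidence(lhs, rhs, db):
    # conf(X -> Y) = sigma(X u Y) / sigma(X)
    return support_count(lhs | rhs, db) / support_count(lhs, db)

print(support_count({"Milk", "Bread"}, transactions))  # 2
print(support({"Soda", "Chips"}, transactions))        # 0.75
print(confidence({"Bread"}, {"Milk"}, transactions))   # 0.666...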
Lk:
the set of all large (frequent) k-itemsets in the database.
Ck:
the set of candidate large k-itemsets. The algorithm generates Ck, which contains all the k-itemsets that might be large, and then prunes it down to Lk.
APRIORI ADV/DISADV
Advantages:
- Uses the large itemset property.
- Easily parallelized.
- Easy to implement.
Disadvantages:
- Assumes the transaction database is memory-resident.
- Requires up to m database scans, where m is the size of the longest frequent itemset.
EXAMPLE
Consider the following transaction database, with a minimum support count of 2:

TID   ITEMS
T1    I1, I2, I5
T2    I2, I4
T3    I2, I3
T4    I1, I2, I4
T5    I1, I3
T6    I2, I3
T7    I1, I3
T8    I1, I2, I3, I5
T9    I1, I2, I3
In the first iteration of the algorithm, each item is a member of the set of candidate 1-itemsets, C1. The database is scanned to count the occurrences of each candidate, and each candidate's support count is compared with the minimum support count. The set of frequent 1-itemsets, L1, consists of the candidate 1-itemsets that satisfy minimum support.
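A sketch of this first pass in Python, assuming the nine transactions T1-T9 above and the minimum support count of 2 implied by the example's results:

from collections import Counter

def find_L1(db, min_sup_count):
    # C1 = every individual item; L1 = the items meeting minimum support
    counts = Counter(item for t in db for item in t)
    return {frozenset([item]): c
            for item, c in counts.items() if c >= min_sup_count}

db = [{"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
      {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
      {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"}]
print(find_L1(db, 2))
# counts: I1 -> 6, I2 -> 7, I3 -> 6, I4 -> 2, I5 -> 2 (all frequent)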
The generation of the set of candidate 3-itemsets, C3, involves use of the Apriori property. To find C3, we compute L2 Join L2, where L2 = {{I1,I2}, {I1,I3}, {I1,I5}, {I2,I3}, {I2,I4}, {I2,I5}}:
C3 = L2 Join L2 = {{I1,I2,I3}, {I1,I2,I5}, {I1,I3,I5}, {I2,I3,I4}, {I2,I3,I5}, {I2,I4,I5}}.
Now the join step is complete, and the prune step is used to reduce the size of C3; pruning helps to avoid heavy computation due to a large Ck.
Apriori property: if an itemset is not frequent, none of its supersets are frequent. Since {I3,I5}, {I3,I4}, and {I4,I5} are not in L2, the last four candidates are pruned, leaving C3 = {{I1,I2,I3}, {I1,I2,I5}}.
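A sketch of this candidate generation in Python. For simplicity, the join here takes any union of two (k-1)-itemsets that has size k, rather than the textbook join on the first k-2 items; after the prune step the result is the same set:

from itertools import combinations

def apriori_gen(L_prev, k):
    prev = set(L_prev)                    # frozensets of size k-1
    candidates = set()
    for a in prev:                        # join step
        for b in prev:
            union = a | b
            if len(union) == k:
                candidates.add(union)
    # prune step: every (k-1)-subset of a candidate must be in L_prev
    return {c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))}

L2 = {frozenset(p) for p in [("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
                             ("I2", "I3"), ("I2", "I4"), ("I2", "I5")]}
print(apriori_gen(L2, 3))
# only {I1,I2,I3} and {I1,I2,I5} survive; e.g. {I1,I3,I5} is pruned
# because its subset {I3,I5} is not in L2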
The algorithm then uses L3 Join L3 to generate a candidate set of 4-itemsets, C4. Although the join results in {{I1, I2, I3, I5}}, this itemset is pruned since its subset {I2, I3, I5} is not frequent. Thus C4 = ∅, and the algorithm terminates, having found all of the frequent itemsets. This completes our Apriori algorithm. What's next? These frequent itemsets will be used to generate strong association rules (where strong association rules satisfy both minimum support and minimum confidence).
Procedure:
For each frequent itemset l, generate all nonempty proper subsets of l. For every nonempty proper subset s of l, output the rule s -> (l - s) if support_count(l) / support_count(s) >= min_conf, where min_conf is the minimum confidence threshold.
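A minimal sketch of this procedure, assuming the support counts of all frequent itemsets have already been collected into a dict keyed by frozenset (every subset of a frequent itemset is itself frequent, so each lookup succeeds):

from itertools import combinations

def generate_rules(freq_counts, min_conf):
    # freq_counts: {frozenset(itemset): support_count}
    rules = []
    for l, l_count in freq_counts.items():
        if len(l) < 2:
            continue                       # need a nonempty lhs and rhs
        for r in range(1, len(l)):         # all nonempty proper subsets s
            for s in combinations(l, r):
                s = frozenset(s)
                conf = l_count / freq_counts[s]
                if conf >= min_conf:
                    rules.append((set(s), set(l - s), conf))
    return rules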
Back To Example:
We had L = {{I1}, {I2}, {I3}, {I4}, {I5}, {I1,I2}, {I1,I3}, {I1,I5}, {I2,I3}, {I2,I4}, {I2,I5}, {I1,I2,I3}, {I1,I2,I5}}. Let's take l = {I1,I2,I5}. Its nonempty proper subsets are {I1,I2}, {I1,I5}, {I2,I5}, {I1}, {I2}, {I5}.
CONTD
Let the minimum confidence threshold be, say, 70%. The resulting association rules are shown below, each listed with its confidence.
R1: (I1, I2) -> I5
Confidence = sc{I1,I2,I5} / sc{I1,I2} = 2/4 = 50%. R1 is rejected.
R2: (I1, I5) -> I2
Confidence = sc{I1,I2,I5} / sc{I1,I5} = 2/2 = 100%. R2 is selected.
R3: (I2, I5) -> I1
Confidence = sc{I1,I2,I5} / sc{I2,I5} = 2/2 = 100%. R3 is selected.
CONTD
R4: I1 -> (I2, I5)
Confidence = sc{I1,I2,I5} / sc{I1} = 2/6 = 33%. R4 is rejected.
R5: I2 -> (I1, I5)
Confidence = sc{I1,I2,I5} / sc{I2} = 2/7 = 29%. R5 is rejected.
R6: I5 -> (I1, I2)
Confidence = sc{I1,I2,I5} / sc{I5} = 2/2 = 100%. R6 is selected.
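To double-check R1-R6, here is a short sketch that recomputes all six confidences from the support counts given above (the rules print in a different order than R1-R6):

from itertools import combinations

sc = {frozenset(k): v for k, v in [
    (("I1",), 6), (("I2",), 7), (("I5",), 2),
    (("I1", "I2"), 4), (("I1", "I5"), 2), (("I2", "I5"), 2),
    (("I1", "I2", "I5"), 2),
]}
l = frozenset({"I1", "I2", "I5"})
for r in range(1, len(l)):
    for s in combinations(sorted(l), r):
        s = frozenset(s)
        conf = sc[l] / sc[s]
        verdict = "selected" if conf >= 0.7 else "rejected"
        print(f"{set(s)} -> {set(l - s)}: {conf:.0%} {verdict}")
# 33% rejected, 29% rejected, 100% selected,
# 50% rejected, 100% selected, 100% selected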
IMPROVING APRIORI'S EFFICIENCY
Partitioning: any itemset that is frequent in the whole database must be frequent in at least one of its partitions, so the union of each partition's local frequent itemsets forms a global candidate set, verified in a second scan.
Sampling: mine a random sample of the database with a lowered support threshold, then check the result against the full database.
Dynamic itemset counting: add new candidate itemsets at marked points during a scan, as soon as all of their subsets are estimated to be frequent.
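A rough sketch of the partitioning idea under the usual two-scan scheme; the function names and the brute-force local miner (capped at 3-itemsets for brevity) are illustrative choices, not a prescribed implementation:

from itertools import combinations

def local_frequent(part, min_sup_frac, max_k=3):
    # brute force: test every itemset of size <= max_k in this partition
    items = sorted({i for t in part for i in t})
    found = set()
    for k in range(1, max_k + 1):
        for c in combinations(items, k):
            c = frozenset(c)
            if sum(1 for t in part if c <= t) >= min_sup_frac * len(part):
                found.add(c)
    return found

def partitioned_apriori(db, n_parts, min_sup_frac):
    size = max(1, len(db) // n_parts)
    parts = [db[i:i + size] for i in range(0, len(db), size)]
    # scan 1: union of local frequent itemsets = global candidate set
    candidates = set().union(*(local_frequent(p, min_sup_frac) for p in parts))
    # scan 2: verify each candidate against the full database
    return {c for c in candidates
            if sum(1 for t in db if c <= t) >= min_sup_frac * len(db)}

db = [{"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
      {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
      {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"}]
print(partitioned_apriori(db, 3, 2 / 9))   # matches L1 u L2 u L3 above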