Today's Objective
If the minimum confidence is 50%, then the only two rules generated
from this 2-itemset that have confidence greater than 50% are:
Trouser -> Jacket (Support = 50%, Confidence = 66%)
Jacket -> Trouser (Support = 50%, Confidence = 100%)
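The figures above can be checked with a short script. A minimal sketch, assuming a hypothetical four-transaction dataset chosen so that it reproduces the stated support and confidence values:

```python
# Hypothetical dataset (an assumption): {Trouser, Jacket} appears in
# 2 of 4 transactions -> 50% support; Trouser appears in 3, Jacket in 2.
transactions = [
    {"Trouser", "Jacket"},
    {"Trouser", "Jacket"},
    {"Trouser"},
    {"Shirt"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """confidence(A -> B) = support(A u B) / support(A)."""
    return support(antecedent | consequent) / support(antecedent)

print(support({"Trouser", "Jacket"}))       # 0.5
print(confidence({"Trouser"}, {"Jacket"}))  # 0.666... (66%)
print(confidence({"Jacket"}, {"Trouser"}))  # 1.0 (100%)
```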
Indian Institute of Management (IIM),Rohtak
Association Rule Mining: Basic Concepts
• Transaction T: T is a subset of the item set I
• Problem: Find rules whose support and confidence are greater than a
user-specified minimum support and minimum confidence
Key Concepts :
• Frequent Itemsets: The sets of items
that meet minimum support (denoted
by Lk for the set of frequent k-itemsets).
• Join Operation: To find Lk, a set of
candidate k-itemsets is generated by
joining Lk-1 with itself.
• Apriori Property: Any subset of a
frequent itemset must also be frequent.
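The first of these concepts can be sketched in a few lines. The nine-transaction database below is an assumption (it is consistent with the support counts shown later in these slides), with a minimum support count of 2:

```python
from collections import Counter

# Assumed nine-transaction database over items I1..I5 (matches the
# support counts used in the worked example below).
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
min_support_count = 2

# L1: the 1-itemsets whose support count meets the minimum
counts = Counter(item for t in transactions for item in t)
L1 = {item for item, c in counts.items() if c >= min_support_count}
print(sorted(L1))  # ['I1', 'I2', 'I3', 'I4', 'I5'] -- all five items are frequent
```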
Understanding Apriori through an Example
[Figure: scans of database D producing candidate set C1, frequent set L1, and candidate set C2]
Step 3: Generating 3-itemset Frequent Pattern
L2:
Itemset      Sup. Count
{I1, I2}     4
{I1, I3}     4
{I1, I5}     2
{I2, I3}     4
{I2, I4}     2
{I2, I5}     2

• Generate candidate set C3 using L2 (join step). The condition for joining Lk-1 with Lk-1 is that the two itemsets have (K-2) elements in common; so here, for L2, the first element should match.
• The generation of the set of candidate 3-itemsets, C3, involves use of the Apriori Property.
• C3 = L2 Join L2 = {{I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I3, I5}, {I2, I4, I5}}.
• If we instead joined every pair of 2-itemsets sharing any single element, {I1, I2, I4} would also appear: {{I1, I2, I3}, {I1, I2, I5}, {I1, I2, I4}, {I1, I3, I5}, {I2, I3, I4}, {I2, I3, I5}, {I2, I4, I5}}.
• Now the Join step is complete, and the Prune step will be used to reduce the size of C3. The Prune step helps to avoid heavy computation due to a large Ck.
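The join step just described can be sketched directly. A minimal sketch, keeping each itemset as a sorted tuple so that "first element matches" becomes a prefix test:

```python
def join(L_prev, k):
    """Join L_{k-1} with itself: merge two (k-1)-itemsets whose first
    (k-2) elements match, producing candidate k-itemsets."""
    L_prev = sorted(tuple(sorted(s)) for s in L_prev)
    candidates = []
    for i, a in enumerate(L_prev):
        for b in L_prev[i + 1:]:
            if a[:k - 2] == b[:k - 2]:          # common (k-2)-prefix
                candidates.append(a + (b[k - 2],))
    return candidates

L2 = [("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
      ("I2", "I3"), ("I2", "I4"), ("I2", "I5")]
C3 = join(L2, 3)
print(C3)
# [('I1', 'I2', 'I3'), ('I1', 'I2', 'I5'), ('I1', 'I3', 'I5'),
#  ('I2', 'I3', 'I4'), ('I2', 'I3', 'I5'), ('I2', 'I4', 'I5')]
```

Note that {I1, I2, I4} never arises, because {I1, I2} and {I2, I4} do not share a common first element.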
Step 3: Generating 3-itemset Frequent Pattern [Cont.]
• Based on the Apriori property that all subsets of a frequent itemset must also be frequent, we can determine that four candidates cannot possibly be frequent. How?
• For example, let's take {I1, I2, I3}. Its 2-item subsets are {I1, I2}, {I1, I3} and {I2, I3}. Since all 2-item subsets of {I1, I2, I3} are members of L2, we keep {I1, I2, I3} in C3.
• Let's take another example, {I2, I3, I5}, which shows how the pruning is performed. Its 2-item subsets are {I2, I3}, {I2, I5} and {I3, I5}.
• BUT {I3, I5} is not a member of L2 and hence is not frequent, violating the Apriori property. Thus we have to remove {I2, I3, I5} from C3.
• Therefore, C3 = {{I1, I2, I3}, {I1, I2, I5}} after checking all members of the result of the Join operation for pruning.
• Now, the transactions in D are scanned in order to determine L3, consisting of those candidate 3-itemsets in C3 having minimum support.

For reference, C2 (all candidate 2-itemsets) and L2 (the frequent ones):

C2:
Itemset      Sup. Count
{I1, I2}     4
{I1, I3}     4
{I1, I4}     1
{I1, I5}     2
{I2, I3}     4
{I2, I4}     2
{I2, I5}     2
{I3, I4}     0
{I3, I5}     1
{I4, I5}     0

L2:
Itemset      Sup. Count
{I1, I2}     4
{I1, I3}     4
{I1, I5}     2
{I2, I3}     4
{I2, I4}     2
{I2, I5}     2
Scan D for the count of each candidate in C3, then compare each candidate's support count with the minimum support count:

C3:
Itemset         Sup. Count
{I1, I2, I3}    2
{I1, I2, I5}    2

Both candidates meet the minimum support count, so:

L3:
Itemset         Sup. Count
{I1, I2, I3}    2
{I1, I2, I5}    2
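The prune step and the final scan of D can be sketched as follows. The nine-transaction database is an assumption consistent with the support counts shown above:

```python
from itertools import combinations

# Assumed nine-transaction database (matches the support counts above)
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
min_support_count = 2

L2 = {("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
      ("I2", "I3"), ("I2", "I4"), ("I2", "I5")}
C3_joined = [("I1", "I2", "I3"), ("I1", "I2", "I5"), ("I1", "I3", "I5"),
             ("I2", "I3", "I4"), ("I2", "I3", "I5"), ("I2", "I4", "I5")]

# Prune: drop any candidate with a 2-item subset not in L2 (Apriori property)
C3 = [c for c in C3_joined
      if all(sub in L2 for sub in combinations(c, 2))]

# Scan D: count the support of the surviving candidates, keep frequent ones
L3 = {c: sum(set(c) <= t for t in transactions) for c in C3}
L3 = {c: n for c, n in L3.items() if n >= min_support_count}
print(L3)  # {('I1', 'I2', 'I3'): 2, ('I1', 'I2', 'I5'): 2}
```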
• Back to Example:
We had L = {{I1}, {I2}, {I3}, {I4}, {I5}, {I1, I2}, {I1, I3}, {I1, I5}, {I2, I3}, {I2, I4},
{I2, I5}, {I1, I2, I3}, {I1, I2, I5}}.
– Let's take l = {I1, I2, I5}.
– All of its nonempty proper subsets are {I1, I2}, {I1, I5}, {I2, I5}, {I1}, {I2}, {I5}.
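Rule generation from l = {I1, I2, I5} can be sketched by taking each nonempty proper subset as an antecedent and computing the rule's confidence. The nine-transaction database is an assumption consistent with the support counts used above:

```python
from itertools import combinations

# Assumed nine-transaction database (matches the support counts above)
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]

def support_count(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions)

l = frozenset({"I1", "I2", "I5"})
for r in range(1, len(l)):
    for ante in combinations(sorted(l), r):
        ante = frozenset(ante)
        cons = l - ante
        conf = support_count(l) / support_count(ante)
        print(f"{sorted(ante)} -> {sorted(cons)}: {conf:.0%}")
# Only I5 -> {I1, I2}, {I1, I5} -> I2, and {I2, I5} -> I1 reach 100% confidence.
```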
Therefore, the set of all frequent itemsets is {A}, {B}, {D}, {A, B}, {A, D}, {B, D}, {A, B, D}.
Sugar -> Egg
Milk -> Bread
Bread -> Milk
{Milk, Egg} -> Bread
{Egg, Bread} -> Milk
Minimum support count = 2, minimum confidence threshold = 80%.

Assume that the confidence of the decision rule 1 -> 2 is 100%. Is the
confidence of the decision rule 2 -> 1 also 100%? Give an example
of data to justify your answer.

Transaction   Items
1             1, 2
2             1, 2, 3
3             2, 3
4             1, 2, 4
5             2, 3, 4

For this data, the confidence of 1 -> 2 is 100%, while the confidence of
2 -> 1 is only 60%: item 1 appears in three transactions (1, 2, and 4), all
of which also contain item 2, but item 2 appears in all five transactions,
only three of which contain item 1.
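The worked answer can be verified directly against the table above:

```python
# The five transactions from the table above
transactions = [
    {1, 2},        # transaction 1
    {1, 2, 3},     # transaction 2
    {2, 3},        # transaction 3
    {1, 2, 4},     # transaction 4
    {2, 3, 4},     # transaction 5
]

def confidence(antecedent, consequent):
    """confidence(A -> B) = count(A u B) / count(A)."""
    both = sum((antecedent | consequent) <= t for t in transactions)
    ante = sum(antecedent <= t for t in transactions)
    return both / ante

print(confidence({1}, {2}))  # 1.0 -> 100%
print(confidence({2}, {1}))  # 0.6 -> 60%
```

This shows that confidence is not symmetric: swapping the antecedent and consequent changes the denominator.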
End