
Unsupervised Learning

Clustering

Unsupervised classification, that is, without the class attribute
Want to discover the classes

Association Rule Discovery

Discover correlation
Data Mining and Kno

The Clustering Process

Pattern representation

Definition of pattern proximity measure

Clustering

Data abstraction

Cluster validation

Pattern Representation

Number of classes
Number of available patterns

Feature selection

Circles, ellipses, squares, etc.


Can we use wrappers and filters?

Feature extraction

Produce new features


E.g., principal component analysis (PCA)

Pattern Proximity

Want clusters of instances that are similar to each other but dissimilar to others
Need a similarity measure
Continuous case

Euclidean measure (compact isolated clusters)

The squared Mahalanobis distance
$d_M(x_i, x_j) = (x_i - x_j)\,\Sigma^{-1}\,(x_i - x_j)^T$
alleviates problems with correlation ($\Sigma$ is the covariance matrix of the patterns)
Many more measures


Pattern Proximity

Nominal attributes
$d(x_i, x_j) = \frac{n - x}{n}$
n = Number of attributes
x = Number of attributes that are the same
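The two simplest measures above can be sketched in a few lines. This is a minimal illustration, not part of the original slides; the function names `euclidean` and `nominal_distance` are made up for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def nominal_distance(a, b):
    """(n - x) / n, where n is the number of attributes and x is the
    number of attributes that have the same value in both instances."""
    n = len(a)
    same = sum(1 for p, q in zip(a, b) if p == q)
    return (n - same) / n


d_num = euclidean([0.0, 0.0], [3.0, 4.0])
d_nom = nominal_distance(("Sunny", "Hot", "High", "FALSE"),
                         ("Sunny", "Hot", "High", "TRUE"))
```

The two nominal instances differ in one of four attributes, so their distance is 1/4.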


Clustering Techniques
Clustering
  Hierarchical
    Single Link
    Complete Link
    CobWeb
  Partitional
    Square Error
      K-means
    Mixture Maximization
      Expectation Maximization


Technique Characteristics

Agglomerative vs Divisive

Agglomerative: each instance starts as its own cluster and the algorithm merges clusters

Divisive: begins with all instances in one cluster and divides it up

Hard vs Fuzzy

Hard clustering assigns each instance to one cluster, whereas fuzzy clustering assigns each instance a degree of membership in each cluster


More Characteristics

Monothetic vs Polythetic

Polythetic: all attributes are used simultaneously, e.g., to calculate distance (most algorithms)

Monothetic: attributes are considered one at a time

Incremental vs Non-Incremental

With large data sets it may be necessary to consider only part of the data at a time (data mining)

Incremental works instance by instance


Hierarchical Clustering
Dendrogram
[Figure: dendrogram over instances A through G; vertical axis: similarity]

Hierarchical Algorithms

Single-link

Distance between two clusters set equal to the minimum of the distances between all pairs of instances, one from each cluster
More versatile
Produces (sometimes too) elongated clusters

Complete-link

Distance between two clusters set equal to the maximum of all distances between instances in the clusters
Tightly bound, compact clusters
Often more useful in practice
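The difference between the two cluster distances can be sketched as follows. This is an illustrative example only; the point sets `left` and `right` are made up, and distances are Euclidean.

```python
import math
from itertools import product


def single_link(c1, c2):
    """Minimum distance over all pairs with one instance from each cluster."""
    return min(math.dist(p, q) for p, q in product(c1, c2))


def complete_link(c1, c2):
    """Maximum distance over all pairs with one instance from each cluster."""
    return max(math.dist(p, q) for p, q in product(c1, c2))


left = [(0.0, 0.0), (1.0, 0.0)]
right = [(3.0, 0.0), (6.0, 0.0)]
d_single = single_link(left, right)      # closest pair: (1, 0) and (3, 0)
d_complete = complete_link(left, right)  # farthest pair: (0, 0) and (6, 0)
```

An agglomerative algorithm would repeatedly merge the pair of clusters with the smallest such distance; only the definition of the distance changes between the two variants.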


Example: Clusters Found


[Figure: a point set with two classes, labelled 1 and 2, and points marked *, shown as clustered by Single-Link and by Complete-Link]

Partitional Clustering

Output a single partition of the data into clusters
Good for large data sets
Determining the number of clusters is a major challenge


K-Means
Predetermined number of clusters
Start with seed clusters of one element
[Figure: data points with the seeds marked]

Assign Instances to Clusters

Find New Centroids


New Clusters
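The steps on the preceding slides (pick seeds, assign instances, find new centroids, repeat) can be sketched as below. This is a minimal sketch, assuming Euclidean distance and caller-supplied seeds; the function name `kmeans` and the sample points are illustrative, not from the slides.

```python
import math


def kmeans(points, seeds, max_iter=100):
    """Plain k-means: returns (centroids, cluster index per point)."""
    centroids = [tuple(s) for s in seeds]
    assignment = None
    for _ in range(max_iter):
        # Assign each instance to the cluster with the nearest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no instance changed cluster: converged
        assignment = new_assignment
        # Find new centroids as the mean of each cluster's members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # an empty cluster keeps its old centroid
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignment


points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, labels = kmeans(points, seeds=[points[0], points[3]])
```

With different seeds the same data can converge to a different partition, which is the sensitivity to initial centers discussed next.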


Discussion: k-means

Applicable to fairly large data sets

Sensitive to initial centers

Use other heuristics to find good initial centers

Converges to a local optimum

Specifying the number of centers is very subjective

Clustering in Weka

Clustering algorithms in Weka


K-Means
Expectation Maximization (EM)
Cobweb

hierarchical, incremental, and agglomerative


CobWeb

Algorithm (main) characteristics:

Hierarchical and incremental


Uses category utility

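$CU(C_1, C_2, \dots, C_k) = \frac{\sum_l \Pr[C_l] \sum_i \sum_j \left( \Pr[a_i = v_{ij} \mid C_l]^2 - \Pr[a_i = v_{ij}]^2 \right)}{k}$

$C_1, C_2, \dots, C_k$: the k clusters
$v_{ij}$: all possible values for attribute $a_i$
The difference of squared probabilities is the improvement in probability estimate because of instance cluster assignment

Why divide by k?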

Category Utility

If each instance in its own cluster

$\Pr[a_i = v_{ij} \mid C_l] = 1$ if $v_{ij}$ is the actual value of the instance, and 0 otherwise

Category utility function becomes

$CU(C_1, C_2, \dots, C_k) = \frac{n - \sum_i \sum_j \Pr[a_i = v_{ij}]^2}{k}$ (n = number of attributes)

Without k it would always be best for each instance to have its own cluster, overfitting!
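For nominal attributes the category utility can be computed directly from value counts. The sketch below is illustrative only (the function name `category_utility` is made up) and estimates every probability by relative frequency.

```python
from collections import Counter


def category_utility(clusters):
    """clusters: list of clusters, each a non-empty list of instances,
    where an instance is a tuple of nominal attribute values."""
    instances = [x for c in clusters for x in c]
    n_attrs = len(instances[0])

    def squared_prob_sum(rows):
        # Sum over attributes i and values j of Pr[a_i = v_ij]^2 within rows.
        total = 0.0
        for i in range(n_attrs):
            counts = Counter(r[i] for r in rows)
            total += sum((c / len(rows)) ** 2 for c in counts.values())
        return total

    base = squared_prob_sum(instances)
    gain = sum(len(c) / len(instances) * (squared_prob_sum(c) - base)
               for c in clusters)
    return gain / len(clusters)  # divide by k


a = ("Sunny", "Hot", "High", "FALSE")
b = ("Sunny", "Hot", "High", "TRUE")
cu_separate = category_utility([[a], [b]])  # each instance in its own cluster
cu_together = category_utility([[a, b]])    # one cluster holding both
```

For the two singleton clusters the numerator is n minus the sum of squared value probabilities, 4 - 3.5 = 0.5, and dividing by k = 2 gives 0.25.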


The Weather Problem


Outlook    Temp.  Humidity  Windy  Play
Sunny      Hot    High      FALSE  No
Sunny      Hot    High      TRUE   No
Overcast   Hot    High      FALSE  Yes
Rainy      Mild   High      FALSE  Yes
Rainy      Cool   Normal    FALSE  Yes
Rainy      Cool   Normal    TRUE   No
Overcast   Cool   Normal    TRUE   Yes
Sunny      Mild   High      FALSE  No
Sunny      Cool   Normal    FALSE  Yes
Rainy      Mild   Normal    FALSE  Yes
Sunny      Mild   Normal    TRUE   Yes
Overcast   Mild   High      TRUE   Yes
Overcast   Hot    Normal    FALSE  Yes
Rainy      Mild   High      TRUE   No

Weather Data (without Play)

Label instances: a, b, ..., n

Start by putting the first instance in its own cluster

Add another instance in its own cluster


Adding the Third Instance


Evaluate the category utility of adding the instance to one of the two clusters versus adding it as its own cluster
[Figure: candidate hierarchies for a, b, and c, with the one of highest utility marked]


Adding Instance f
First instance not to get its own cluster
[Figure: hierarchy after adding f]

Look at the instances:
E) Rainy Cool Normal FALSE
F) Rainy Cool Normal TRUE
Quite similar!

Add Instance g
Look at the instances:
E) Rainy Cool Normal FALSE
F) Rainy Cool Normal TRUE
G) Overcast Cool Normal TRUE

[Figure: hierarchy after adding g]

Add Instance h
Look at the instances:
A) Sunny Hot High FALSE
D) Rainy Mild High FALSE
H) Sunny Mild High FALSE
Rearrange: the best matching node and the runner-up are merged into a single cluster before h is added
(Splitting is also possible)
[Figure: hierarchy after the merge and the addition of h]

Final Hierarchy

[Figure: final hierarchy of the weather instances]
What next?

Dendrogram Clusters

[Figure: clusters read off the dendrogram]

What do a, b, c, d, h, k, and l have in common?

Numerical Attributes

Assume normal distribution

$CU(C_1, C_2, \dots, C_k) = \frac{1}{k} \sum_l \Pr[C_l]\, \frac{1}{2\sqrt{\pi}} \sum_i \left( \frac{1}{\sigma_{il}} - \frac{1}{\sigma_i} \right)$

Problems with zero variance!
The acuity parameter imposes a minimum variance

Hierarchy Size (Scalability)

May create very large hierarchy

The cutoff parameter is used to suppress growth

If $CU(C_1, C_2, \dots, C_k) < \text{Cutoff}$, cut node off.

Discussion

Advantages

Incremental scales to large number of instances
Cutoff limits size of hierarchy
Handles mixed attributes

Disadvantages

Incremental sensitive to order of instances?
Arbitrary choice of parameters:

divide by k,
artificial minimum value for variance of numeric attributes,
ad hoc cutoff value

Probabilistic Perspective

Most likely set of clusters given data
Probability of each instance belonging to a cluster
Assumption: instances are drawn from one of several distributions
Goal: estimate the parameters of these distributions
Usually: assume distributions are normal

Mixture Resolution

Mixture: set of k probability distributions
Represent the k clusters
Probabilities that an instance takes certain attribute values given it is in the cluster
What is the probability an instance belongs to a cluster (or a distribution)?

One Numeric Attribute


Two cluster mixture model:
[Figure: distributions of Cluster A and Cluster B along the attribute axis]

Given some data, how can you determine the parameters:

$\mu_A$: Mean for Cluster A
$\sigma_A$: Standard deviation for Cluster A
$\mu_B$: Mean for Cluster B
$\sigma_B$: Standard deviation for Cluster B
$p_A$: Probability of being in Cluster A

Problems

If we knew which instance came from each cluster we could estimate these values
If we knew the parameters we could calculate the probability that an instance belongs to each cluster

$\Pr[A \mid x] = \frac{\Pr[x \mid A]\,\Pr[A]}{\Pr[x]} = \frac{f(x; \mu_A, \sigma_A)\, p_A}{\Pr[x]}$

$f(x; \mu_A, \sigma_A) = \frac{1}{\sqrt{2\pi}\,\sigma_A}\, e^{-\frac{(x - \mu_A)^2}{2\sigma_A^2}}$

EM Algorithm

Expectation Maximization (EM)

Start with initial values for the parameters
Calculate the cluster probabilities for each instance
Re-estimate the values for the parameters
Repeat

General purpose maximum likelihood estimate algorithm for missing data

Can also be used to train Bayesian networks (later)
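For the two-cluster, one-attribute mixture described earlier, the loop above can be sketched as follows. This is a minimal sketch, not a reference implementation: the function names, the fixed iteration count, and the `min_std` floor (which plays the role of a minimum allowable standard deviation) are assumptions made for the example.

```python
import math


def normal_pdf(x, mu, sigma):
    """Normal density f(x; mu, sigma)."""
    return (math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))
            / (math.sqrt(2 * math.pi) * sigma))


def em_two_clusters(xs, mu_a, mu_b, sigma_a=1.0, sigma_b=1.0, p_a=0.5,
                    iterations=50, min_std=1e-3):
    """EM for a mixture of two normal distributions over one attribute."""
    for _ in range(iterations):
        # Expectation: Pr[A | x] for each instance, by Bayes' rule.
        w = []
        for x in xs:
            fa = normal_pdf(x, mu_a, sigma_a) * p_a
            fb = normal_pdf(x, mu_b, sigma_b) * (1 - p_a)
            w.append(fa / (fa + fb))
        # Maximization: re-estimate parameters from the weighted instances.
        wa = sum(w)
        wb = len(xs) - wa
        mu_a = sum(wi * x for wi, x in zip(w, xs)) / wa
        mu_b = sum((1 - wi) * x for wi, x in zip(w, xs)) / wb
        var_a = sum(wi * (x - mu_a) ** 2 for wi, x in zip(w, xs)) / wa
        var_b = sum((1 - wi) * (x - mu_b) ** 2 for wi, x in zip(w, xs)) / wb
        sigma_a = max(min_std, math.sqrt(var_a))
        sigma_b = max(min_std, math.sqrt(var_b))
        p_a = wa / len(xs)
    return mu_a, sigma_a, mu_b, sigma_b, p_a


xs = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
mu_a, sigma_a, mu_b, sigma_b, p_a = em_two_clusters(xs, mu_a=0.0, mu_b=6.0)
```

The sketch does not guard against a cluster losing all of its weight or against poorly separated starting values; a practical implementation would test for convergence of the likelihood instead of running a fixed number of iterations.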


Beyond Normal Models

More than one class:

Straightforward

More than one numeric attribute

Easy if assume attributes independent
If dependent attributes, treat them jointly using the bivariate normal

Nominal attributes

No more normal distribution!

EM using Weka

Options

numClusters: set number of clusters. Default = -1 selects it automatically
maxIterations: maximum number of iterations
seed: random number seed
minStdDev: set minimum allowable standard deviation

Other Clustering

Artificial Neural Networks (ANN)


Random search

Genetic Algorithms (GA)

GA used to find initial centroids for k-means

Simulated Annealing (SA)


Tabu Search (TS)

Support Vector Machines (SVM)


Will discuss GA and SVM later

Applications

Image segmentation

Object and Character Recognition

Data Mining:

Stand-alone to gain insight into the data

Preprocess before classification that operates on the detected clusters

DM Clustering Challenges

Data mining deals with large databases

Scalability with respect to number of instances
Use a random sample (possible bias)

Dealing with mixed data

Many algorithms only make sense for numeric data

High dimensional problems

Can the algorithm handle many attributes?
How do we interpret a cluster in high dimensions?

Other (General) Challenges

Shape of clusters
Minimum domain knowledge (e.g., knowing the number of clusters)
Noisy data
Insensitivity to instance order
Interpretability and usability

Clustering for DM

Main issue is scalability to large databases

Many algorithms have been developed for scalable clustering:

Partitional methods: CLARA, CLARANS

Hierarchical methods: AGNES, DIANA, BIRCH, CURE, Chameleon

Practical Partitional Clustering Algorithms

Classic k-Means (1967)
Work from 1990 and later:
k-Medoids

Uses the medoid instead of the centroid
Less sensitive to outliers and noise
Computations more costly
PAM (Partitioning Around Medoids) algorithm

Large-Scale Problems

CLARA: Clustering LARge Applications

Select several random samples of instances


Apply PAM to each
Return the best clusters

CLARANS:

Similar to CLARA
Draws samples randomly while searching
More effective than PAM and CLARA


Hierarchical Methods

BIRCH: Balanced Iterative Reducing and Clustering using Hierarchies

Clustering feature: triplet summarizing information about subclusters

Clustering feature tree: height-balanced tree that stores the clustering features
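The slide does not spell out the triplet. In the BIRCH paper it is (N, LS, SS): the number of points, their linear sum, and their sum of squares. The sketch below assumes that definition; the class and method names are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class ClusteringFeature:
    n: int        # N: number of points in the subcluster
    ls: tuple     # LS: per-dimension linear sum of the points
    ss: float     # SS: sum of squared norms of the points

    @classmethod
    def from_point(cls, point):
        return cls(1, tuple(point), sum(v * v for v in point))

    def merge(self, other):
        """CFs are additive, so subclusters merge without revisiting the data."""
        return ClusteringFeature(
            self.n + other.n,
            tuple(a + b for a, b in zip(self.ls, other.ls)),
            self.ss + other.ss,
        )

    def centroid(self):
        return tuple(v / self.n for v in self.ls)

    def radius(self):
        """Root mean squared distance of the points from the centroid."""
        centroid_sq = sum(v * v for v in self.centroid())
        return math.sqrt(max(0.0, self.ss / self.n - centroid_sq))


cf = ClusteringFeature.from_point((0.0, 0.0)).merge(
    ClusteringFeature.from_point((2.0, 0.0)))
```

Because the triplet is additive, each node of the CF tree can summarize all the points below it in constant space.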


BIRCH Mechanism

Phase I:

Scan database to build an initial CF tree
Multilevel compression of the data

Phase II:

Apply a selected clustering algorithm to the leaf nodes of the CF tree

Has been found to be very scalable

Conclusion

The use of clustering in data mining practice seems to be somewhat limited due to scalability problems
More commonly used unsupervised learning:
Association Rule Discovery

Association Rule Discovery

Aims to discover interesting correlations or other relationships in large databases

Finds a rule of the form

if A and B then C and D

Which attributes will be included in the relation is unknown

Mining Association Rules

Similar to classification rules

Use same procedure?

Every attribute is the same
Apply to every possible expression on right hand side
Huge number of rules: infeasible

Only want rules with high coverage/support

Market Basket Analysis

Basket data: items purchased on per-transaction basis (not cumulative, etc.)

How do you boost the sales of a given product?
What other products does discontinuing a product impact?
Which products should be shelved together?

Terminology (market basket analysis):

Item - an attribute/value pair
Item set - combination of items with min. coverage

How Many k-Item Sets Have Minimum Coverage?

Outlook    Temp.  Humidity  Windy  Play
Sunny      Hot    High      FALSE  No
Sunny      Hot    High      TRUE   No
Overcast   Hot    High      FALSE  Yes
Rainy      Mild   High      FALSE  Yes
Rainy      Cool   Normal    FALSE  Yes
Rainy      Cool   Normal    TRUE   No
Overcast   Cool   Normal    TRUE   Yes
Sunny      Mild   High      FALSE  No
Sunny      Cool   Normal    FALSE  Yes
Rainy      Mild   Normal    FALSE  Yes
Sunny      Mild   Normal    TRUE   Yes
Overcast   Mild   High      TRUE   Yes
Overcast   Hot    Normal    FALSE  Yes
Rainy      Mild   High      TRUE   No

Item Sets
1-Item

Outlook=sunny (5)
Outlook=overcast (4)
Outlook=rainy (5)
Temp=cool (4)
Temp=mild (6)

2-Item

Outlook=sunny, temp=mild (2)
Outlook=sunny, temp=hot (2)
Outlook=sunny, humidity=normal (2)
Outlook=sunny, windy=true (2)

3-Item

Outlook=sunny, temp=hot, humidity=high (2)
Outlook=sunny, temp=hot, play=no (2)
Outlook=sunny, humidity=normal, play=yes (2)
Outlook=sunny, humidity=high, windy=false (2)
Outlook=sunny, humidity=high, play=no (3)

4-Item

Outlook=sunny, temp=hot, humidity=high, play=no (2)
Outlook=sunny, humidity=high, windy=false, play=no (2)
Outlook=overcast, temp=hot, windy=false, play=yes (2)
Outlook=rainy, temp=mild, windy=false, play=yes (2)
Outlook=rainy, humidity=normal, windy=false, play=yes (2)

From Sets to Rules


3-Item Set w/coverage 4:
Humidity = normal, windy = false, play = yes

Association Rules:                                                Accuracy
If humidity = normal and windy = false then play = yes            4/4
If humidity = normal and play = yes then windy = false            4/6
If windy = false and play = yes then humidity = normal            4/6
If humidity = normal then windy = false and play = yes            4/7
If windy = false then humidity = normal and play = yes            4/8
If play = yes then humidity = normal and windy = false            4/9
If - then humidity = normal and windy = false and play = yes      4/14
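The coverage and accuracy figures can be checked by counting rows of the weather data. The sketch below is illustrative; the helper names `count` and `coverage_and_accuracy` are made up, and the data is the 14-instance weather table with values in lower case.

```python
ATTRIBUTES = ("outlook", "temperature", "humidity", "windy", "play")
RAW = """
sunny hot high false no
sunny hot high true no
overcast hot high false yes
rainy mild high false yes
rainy cool normal false yes
rainy cool normal true no
overcast cool normal true yes
sunny mild high false no
sunny cool normal false yes
rainy mild normal false yes
sunny mild normal true yes
overcast mild high true yes
overcast hot normal false yes
rainy mild high true no
"""
DATA = [dict(zip(ATTRIBUTES, line.split())) for line in RAW.strip().splitlines()]


def count(items):
    """Number of instances in which every attribute = value pair holds."""
    return sum(all(row[a] == v for a, v in items.items()) for row in DATA)


def coverage_and_accuracy(antecedent, consequent):
    """Coverage: instances matching the whole rule.
    Accuracy: coverage divided by instances matching the antecedent."""
    both = count({**antecedent, **consequent})
    return both, both / count(antecedent)


rule_1 = coverage_and_accuracy({"humidity": "normal", "windy": "false"},
                               {"play": "yes"})
rule_4 = coverage_and_accuracy({"humidity": "normal"},
                               {"windy": "false", "play": "yes"})
```

An empty antecedent matches every instance, so the accuracy of the last rule in the list is the coverage divided by the size of the data set.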


From Sets to Rules (continued)

4-Item Set w/coverage 2:
Temperature = cool, humidity = normal, windy = false, play = yes

Association Rules:                                                          Accuracy
If temperature = cool, windy = false then humidity = normal, play = yes     2/2
If temperature = cool, humidity = normal, windy = false then play = yes     2/2
If temperature = cool, windy = false, play = yes then humidity = normal     2/2

Overall

Minimum coverage (2):

12 1-item sets, 47 2-item sets, 39 3-item sets, 6 4-item sets

Minimum accuracy (100%):

58 association rules

Best Rules (Coverage = 4, Accuracy = 100%)

If humidity = normal and windy = false then play = yes
If temperature = cool then humidity = normal
If outlook = overcast then play = yes

Association Rule Mining


STEP 1: Find all item sets that meet minimum coverage
STEP 2: Find all rules that meet minimum accuracy
STEP 3: Prune

Generating Item Sets

How do we generate minimum coverage item sets in a scalable manner?

Need an efficient algorithm:

Total number of item sets is huge
Grows exponentially in the number of attributes
Start by generating minimum coverage 1-item sets
Use those to generate 2-item sets, etc.

Why do we only need to consider minimum coverage 1-item sets?

Justification
Item Set 1: {Humidity = high}
Coverage(1) = Number of times humidity is high
Item Set 2: {Windy = false}
Coverage(2) = Number of times windy is false
Item Set 3: {Humidity = high, Windy = false}
Coverage(3) = Number of times humidity is high and windy is false
$\text{Coverage}(3) \le \text{Coverage}(1)$
$\text{Coverage}(3) \le \text{Coverage}(2)$

If Item Set 1 and 2 do not both meet min. coverage, Item Set 3 cannot either

Generating Item Sets


Start with all 3-item sets that meet min. coverage

(A B C)
(A B D)
(A C D)
(A C E)

Merge to generate 4-item sets
(Consider only sets that start with the same two attributes)

(A B C D)
(A C D E)

There are only two 4-item sets that could possibly work
Candidate 4-item sets with minimum coverage (must be checked)

Algorithm for Generating


Item Sets

Build up from 1-item sets so that we only consider item sets that are found by merging two minimum coverage sets
Only consider sets that have all but one item in common
Computational efficiency further improved using hash tables
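The merge step can be sketched as below, using the 3-item sets from the earlier example. This is only the candidate generation and the coverage check; the function names and the small transaction list are made up for illustration.

```python
def merge_candidates(item_sets):
    """Merge k-item sets that agree on all but their last item
    into (k+1)-item candidate sets."""
    ordered = sorted(tuple(sorted(s)) for s in item_sets)
    candidates = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if a[:-1] == b[:-1]:
                candidates.append(a + (b[-1],))
    return candidates


def coverage(item_set, transactions):
    """Number of transactions that contain every item of item_set."""
    return sum(1 for t in transactions if set(item_set) <= set(t))


three_item_sets = [("A", "B", "C"), ("A", "B", "D"),
                   ("A", "C", "D"), ("A", "C", "E")]
candidates = merge_candidates(three_item_sets)

# Hypothetical transactions, only to show that candidates must still be checked.
transactions = ["ABCD", "ABCD", "ACDE", "ABC"]
frequent = [c for c in candidates if coverage(c, transactions) >= 2]
```

Keeping each item set sorted means two sets share all but one item exactly when they share a prefix, so every candidate is generated once.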

Generating Rules
If windy = false and play = no then outlook = sunny and humidity = high
(Meets min. coverage and accuracy)

If windy = false and play = no then outlook = sunny
If windy = false and play = no then humidity = high
(Meets min. coverage and accuracy)

How Many Rules?

Want to consider every possible subset of attributes as consequent
Have 4 attributes:

Four single consequent rules
Six double consequent rules
Two triple consequent rules
Twelve possible rules for single 4-item set!

Exponential explosion of possible rules

Must We Check All?


If A and B then C and D
Coverage = Number of times A, B, C, and D are true
Accuracy = (Number of times A, B, C, and D are true) / (Number of times A and B are true)

If A, B and C then D
Coverage = Number of times A, B, C, and D are true
Accuracy = (Number of times A, B, C, and D are true) / (Number of times A, B, and C are true)

Efficiency Improvement

A double consequent rule can only be OK if both single consequent rules are OK
Procedure:

Start with single consequent rules
Build up double consequent rules, etc.

candidate rules
check for accuracy

In practice: need to check far fewer rules

Apriori Algorithm

This is a simplified description of the Apriori algorithm
Developed in the early 90s and is the most commonly used approach
New developments focus on

Generating item sets more efficiently
Generating rules from item sets more efficiently

Association Rule Discovery using Weka

Parameters to be specified in Apriori:

upperBoundMinSupport: start with this value of minimum support
delta: in each step decrease the minimum support required by this value
lowerBoundMinSupport: final minimum support
numRules: how many rules are generated
metricType: confidence, lift, leverage, conviction
minMetric: smallest acceptable value for a rule

Handles only nominal attributes

Difficulties

Apriori algorithm improves performance by using candidate item sets
Still some problems

Costly to generate large number of item sets

To generate a frequent pattern of size 100 need $> 2^{100} \approx 10^{30}$ candidates!

Requires repeated scans of database to check candidates

Again, most problematic for long patterns

Solution?

Can candidate generation be avoided?
New approach:

Create a frequent pattern tree (FP-tree)

stores information on frequent patterns

Use the FP-tree for mining frequent patterns

partitioning-based
divide-and-conquer
(as opposed to bottom-up generation)

FP-Tree

Database (Min. support = 3)

TID   Items               Frequent Items
100   F,A,C,D,G,I,M,P     F,C,A,M,P
200   A,B,C,F,L,M,O       F,C,A,B,M
300   B,F,H,J,O           F,B
400   B,C,K,S,P           C,B,P
500   A,F,C,E,L,P,M,N     F,C,A,M,P

Header table (Item, Head of node links): F, C, A, B, M, P

Root
  F:4
    C:3
      A:3
        M:2
          P:2
        B:1
          M:1
    B:1
  C:1
    B:1
      P:1

Computational Effort

Each node has three fields

item name
count
node link

Also a header table with

item name
head of node link

Need two scans of the database

Collect set of frequent items
Construct the FP-tree
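The two scans can be sketched as below for the five-transaction database of the preceding slide. This is a minimal sketch: the names `FPNode` and `build_fp_tree` are made up, and the item order F, C, A, B, M, P is passed in explicitly as an assumption, because ties in frequency can be broken in more than one way.

```python
class FPNode:
    def __init__(self, item, parent=None):
        self.item = item        # item name
        self.count = 0          # count
        self.node_link = None   # next node in the tree with the same item
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support, order):
    # Scan 1: collect the set of frequent items.
    counts = {}
    for t in transactions:
        for item in set(t):
            counts[item] = counts.get(item, 0) + 1
    frequent = {i for i, c in counts.items() if c >= min_support}
    rank = {item: r for r, item in enumerate(order)}

    # Scan 2: insert each transaction's frequent items in the fixed order.
    root = FPNode(None)
    header = {item: None for item in order}  # item name -> head of node links
    tails = {}
    for t in transactions:
        node = root
        items = sorted((i for i in set(t) if i in frequent),
                       key=rank.__getitem__)
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                if header[item] is None:
                    header[item] = child
                else:
                    tails[item].node_link = child
                tails[item] = child
            child.count += 1
            node = child
    return root, header


DB = ["F,A,C,D,G,I,M,P", "A,B,C,F,L,M,O", "B,F,H,J,O",
      "B,C,K,S,P", "A,F,C,E,L,P,M,N"]
root, header = build_fp_tree([t.split(",") for t in DB],
                             min_support=3, order="FCABMP")

# Follow P's node links from the header table and collect the counts.
p_counts = []
node = header["P"]
while node is not None:
    p_counts.append(node.count)
    node = node.node_link
```

Following the node links for P visits two nodes whose counts sum to the support of P.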


Comments

The FP-tree is a compact data structure

The FP-tree contains all the information related to mining frequent patterns (given the support)

The size of the tree is bounded by the occurrences of frequent items

The height of the tree is bounded by the maximum number of items in a transaction

Mining Patterns

Mine complete set of frequent patterns

For any frequent item A, all possible patterns containing A can be obtained by following A's node links starting from A's head of node links

Example
[Figure: the FP-tree and header table for the database with Min. support = 3, with the node links of P highlighted]

Frequent Pattern
(P:3)

Paths
<F:4, C:3, A:3, M:2, P:2> (Occurs twice)
<C:1, B:1, P:1> (Occurs once)

Rule Generation

Mining complete set of association rules has some problems

May be a large number of frequent item sets
May be a huge number of association rules

One potential solution is to look at closed item sets only

Frequent Closed Item Sets

An item set $X$ is a closed item set if there is no item set $X'$ such that $X \subset X'$ and every transaction containing $X$ also contains $X'$

A rule $X \Rightarrow Y$ is an association rule on a frequent closed item set if

both $X$ and $X \cup Y$ are frequent closed item sets, and
there does not exist a frequent closed item set $Z$ such that $X \subset Z \subset X \cup Y$

Example
ID   Items
10   A,C,D,E,F
20   A,B,E
30   C,E,F
40   A,C,D,F
50   C,E,F

Frequent Item Sets (min support = 2):

All the closed sets: A (3), E (4), AE (2), ACDF (2), CF (4), CEF (3)
Not closed! Why? D (2), AC (2)
+ 12 more
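The closed sets in this example can be verified by brute force. The sketch below is illustrative only and enumerates every item set, which is feasible for six items but is not how CLOSET works; the function names are made up.

```python
from itertools import combinations

TRANSACTIONS = {10: "ACDEF", 20: "ABE", 30: "CEF", 40: "ACDF", 50: "CEF"}


def support(item_set):
    """Number of transactions containing every item of item_set."""
    return sum(1 for t in TRANSACTIONS.values() if set(item_set) <= set(t))


def frequent_item_sets(min_support):
    items = sorted(set("".join(TRANSACTIONS.values())))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = support(combo)
            if s >= min_support:
                result["".join(combo)] = s
    return result


def closed_item_sets(frequent):
    """Keep X only if no proper superset of X has the same support."""
    closed = {}
    for x, sx in frequent.items():
        absorbed = any(set(x) < set(y) and sx == sy
                       for y, sy in frequent.items())
        if not absorbed:
            closed[x] = sx
    return closed


frequent = frequent_item_sets(min_support=2)
closed = closed_item_sets(frequent)
```

D and AC are not closed because every transaction that contains them also contains ACDF, which has the same support.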

Mining Frequent Closed Item Sets (CLOSET)

TDB: CEFAD, EA, CEF, CFAD, CEF
NOTE: Order for conditional DB: C:4, E:4, F:4, A:3, D:2

D-cond DB (D:2): CEFA, CFA
Output: CFAD:2

A-cond DB (A:3): CEF, CF
Output: A:3

EA-cond DB (EA:2): C
Output: EA:2

F-cond DB (F:4): CE:3
Output: CF:4, CEF:3

E-cond DB (E:4): C:4
Output: E:4

Mining with Taxonomies

Taxonomy:
Clothes
  Outerwear
    Jackets
    Ski Pants
  Shirts
Footwear
  Shoes
  Hiking Boots

Generalized association rule

X → Y where no item in Y is an ancestor of an item in X

Why Taxonomy?

The classic association rule mining restricts the rules to the leaf nodes in the taxonomy

However:

Rules at lower levels may not have minimum support and thus interesting associations may go undiscovered

Taxonomies can be used to prune uninteresting and redundant rules

Example
ID   Items
10   Shirt
20   Jacket, Hiking Boots
30   Ski Pants, Hiking Boots
40   Shoes
50   Shoes
60   Jacket

Item Set                     Support
{Jacket}                     2
{Outerwear}                  3
{Clothes}                    4
{Shoes}                      2
{Hiking Boots}               2
{Footwear}                   4
{Outerwear, Hiking Boots}    2
{Clothes, Hiking Boots}      2
{Outerwear, Footwear}        2
{Clothes, Footwear}          2

Rule                         Support   Confidence
Outerwear → Hiking Boots     2         2/3
Outerwear → Footwear         2         2/3
Hiking Boots → Outerwear     2         2/2
Hiking Boots → Clothes       2         2/2
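One simple way to obtain these supports is to extend every transaction with the ancestors of its items and then count as usual. The sketch below assumes that approach; the `PARENT` map encodes the taxonomy of the earlier slide with singular item names, and the function names are made up.

```python
PARENT = {
    "Jacket": "Outerwear", "Ski Pants": "Outerwear",
    "Outerwear": "Clothes", "Shirt": "Clothes",
    "Shoes": "Footwear", "Hiking Boots": "Footwear",
}

TRANSACTIONS = [
    {"Shirt"},
    {"Jacket", "Hiking Boots"},
    {"Ski Pants", "Hiking Boots"},
    {"Shoes"},
    {"Shoes"},
    {"Jacket"},
]


def with_ancestors(transaction):
    """Return the transaction plus every ancestor of each of its items."""
    extended = set(transaction)
    for item in transaction:
        while item in PARENT:
            item = PARENT[item]
            extended.add(item)
    return extended


EXTENDED = [with_ancestors(t) for t in TRANSACTIONS]


def support(items):
    return sum(1 for t in EXTENDED if set(items) <= t)


def confidence(x, y):
    return support(set(x) | set(y)) / support(x)


outerwear_to_boots = confidence({"Outerwear"}, {"Hiking Boots"})
boots_to_clothes = confidence({"Hiking Boots"}, {"Clothes"})
```

No transaction contains Outerwear or Clothes literally; their support comes entirely from the leaf items below them.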


Interesting Rules

Many ways in which the interestingness of a rule can be evaluated based on ancestors
For example:

A rule with no ancestors is interesting
A rule with ancestor(s) is interesting only if it has enough relative support

Which rules are interesting?

Rule ID   Rule                   Support
1         Clothes → Footwear     10
2         Outerwear → Footwear   8
3         Jackets → Footwear     4

Item        Support
Clothes     5
Outerwear   2
Jackets     1

Discussion

Association rule mining finds expressions of the form X → Y from large data sets

One of the most popular data mining tasks

Originates in market basket analysis

Key measures of performance

Support

Confidence (or accuracy)

Are support and confidence enough?

Type of Rules Discovered

Classic association rule problem

All rules satisfying minimum threshold of support and confidence

Focus on subset of rules, e.g.,

Optimized rules
Maximal frequent item sets
Closed item sets

What makes for an interesting rule?

Algorithm Construction

Determine frequent item sets (all or part)

By far the most computational time
Variations focus on this part

Generate rules from frequent item sets

Generating Item Sets


Search space traversed, then how support is determined:

Bottom-up
  Counting: Apriori*, AprioriTID, DIC (Apriori-like algorithms)
  Intersecting: Partition
Top-down
  Counting: FP-Growth*
  Intersecting: Eclat

* Have discussed

No algorithm dominates others!

Applications

Market basket analysis

Classic marketing application

Applications to recommender systems

Recommender

Customized goods and services
Recommend products
Collaborative filtering

similarities among users' tastes
recommend based on other users
many on-line systems
simple algorithms

Classification Approach

View as classification problem

Product either of interest or not
Induce a model, e.g., a decision tree
Classify a new product as either interesting or not interesting

Difficulty in this approach?

Association Rule Approach

Product associations

90% of users who like product A and product B also like product C
A and B → C (90%)

User associations

90% of products liked by user A and user B are also liked by user C

Use combination of product and user associations

Advantages

Classic collaborative filtering must identify users with similar tastes

This approach uses overlap of other users' tastes to match given user's taste

Can be applied to users whose tastes don't correlate strongly with those of other users
Can take advantage of information from, say, user A, for a recommendation to user B, even if they do not correlate

What's Different Here?

Is this really a classic association rule problem?
Want to learn what products are liked by what users
Semi-supervised
Target item

User (for user associations)
Product (for product associations)

Single-Consequent Rules

Only a single (target) item in the consequent
Go through all such items

[Figure: Associations for Recommender shown between Association Rules (all possible item combination consequent) and Classification (one single item consequent)]