
Market Basket Analysis

Contents

 Introduction
 What is Market Basket Analysis?
 Spreadsheet Computation of Market Basket Analysis
 Characteristics of Market Basket Analysis
 Cautions with Market Basket Analysis
 Applications of Market Basket Analysis
Introduction

Many websites show prompts such as “Customers who bought this item also
bought…”, “Browse related titles…”, or “Explore similar items”.
 For example, people who buy bread and cheese may also buy butter, jam,
milk and honey.
 People who buy cold medicine will frequently also buy tissues and orange
juice.
 Knowing the purchase associations between various items is very
important in business for product promotion, cross-selling and product
placement.
What is Market Basket Analysis?

 Market Basket Analysis is a data mining technique for deriving associations
between two sets of items. The input to the analysis is categorical data in
the form of transaction records, and the output is a set of association rules
that connect items with perceived relevance to each other.
 Let us start with a simple example. Suppose you have transaction data
from a small fruit store, and the number of transactions in one day is as
limited as the data shown below.
Input: Transaction Records

Transaction ID   Items (from customers who bought more than one item)
1                Apple, Banana, Cherry, Durian
2                Apple, Durian
3                Banana, Durian
4                Durian, Banana, Cherry
5                Banana, Durian
6                Apple, Banana
7                Apple, Cherry, Durian

 Based on the data above, we can derive the following association rules
using Market Basket Analysis:

People who bought this item   Also bought the following items   Support   Confidence
Banana                        Durian                            57%       80%
Cherry                        Durian                            43%       100%

 An association rule has the following form:
X → Y
 This form means “people who bought the items in set X often also bought
the items in set Y”. For example, if X = {Apple, Banana} and Y =
{Cherry, Durian}, the association rule X → Y indicates that people who
bought Apple and Banana also bought Cherry and Durian.
 Support and confidence are two measures of association rules.
 Support is the frequency of transactions in which all the items in both
sets X and Y are bought together. For example, a support of 5% shows that in
5% of all transactions (that we consider for the analysis) the items in sets
X and Y are purchased together. As a formula, support is the probability
that a transaction contains both set X and set Y, that is, the joint
occurrence (intersection) of the two purchase events:
support (X → Y) = P(X ∩ Y) = n(X ∩ Y)/N
 In this notation, the support count n(X ∩ Y) is the number of transactions
that contain both X and Y, and N is the total number of transactions in the
analysis. A rule that has very low support may occur simply by chance. We can
also view support as the fraction of all transactions that the association
rule predicts correctly.
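The support formula can be checked against the fruit store records with a few lines of Python. This is only a minimal sketch: the function name `support` and the representation of each transaction as a Python set are assumptions made for illustration, not part of the original material.

```python
# Transactions from the fruit store example (one set of items per transaction).
transactions = [
    {"Apple", "Banana", "Cherry", "Durian"},
    {"Apple", "Durian"},
    {"Banana", "Durian"},
    {"Durian", "Banana", "Cherry"},
    {"Banana", "Durian"},
    {"Apple", "Banana"},
    {"Apple", "Cherry", "Durian"},
]


def support(x, y, transactions):
    """Fraction of transactions that contain every item of X and of Y."""
    items = set(x) | set(y)  # all items that must appear together
    n_xy = sum(1 for t in transactions if items <= t)  # support count
    return n_xy / len(transactions)


print(round(support({"Banana"}, {"Durian"}, transactions), 2))  # 0.57
print(round(support({"Cherry"}, {"Durian"}, transactions), 2))  # 0.43
```

The two printed values match the 57% and 43% support figures quoted for the rules Banana → Durian and Cherry → Durian.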
 A confidence of 80% shows that 80% of the customers who bought the items
in set X also bought the items in set Y. As a formula, confidence is the
conditional probability of set Y given set X.
 This conditional probability can also be computed as a ratio of supports,
because dividing n(X ∩ Y)/N by n(X)/N cancels N:
confidence (X → Y) = P(Y | X) = n(X ∩ Y)/n(X)
 Here n(X) is the number of transactions that contain set X. Confidence is
a measure of the accuracy or reliability of the inference made by the rule:
the proportion of instances that the rule predicts correctly among all
instances it applies to.
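Confidence can be computed in the same way from two support counts. The sketch below again uses the fruit store records. The helper names `count` and `confidence` are illustrative assumptions.

```python
# Transactions from the fruit store example (one set of items per transaction).
transactions = [
    {"Apple", "Banana", "Cherry", "Durian"},
    {"Apple", "Durian"},
    {"Banana", "Durian"},
    {"Durian", "Banana", "Cherry"},
    {"Banana", "Durian"},
    {"Apple", "Banana"},
    {"Apple", "Cherry", "Durian"},
]


def count(itemset, transactions):
    """Support count: number of transactions containing every item in itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)


def confidence(x, y, transactions):
    """n(X and Y together) / n(X); returns 0.0 if X never occurs."""
    n_x = count(x, transactions)
    if n_x == 0:
        return 0.0
    return count(set(x) | set(y), transactions) / n_x


print(round(confidence({"Banana"}, {"Durian"}, transactions), 2))  # 0.8
print(round(confidence({"Cherry"}, {"Durian"}, transactions), 2))  # 1.0
print(round(confidence({"Durian"}, {"Banana"}, transactions), 2))  # 0.67
```

The last line shows that confidence is not symmetric: Banana → Durian has a confidence of 80%, while Durian → Banana has only about 67%, because Durian appears in more transactions than Banana.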
 To obtain the association rules, we usually apply two criteria, set at the
user's discretion:
1. Minimum support
2. Minimum confidence
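The two criteria can be illustrated with a brute-force search over the fruit store records. This sketch simply tries every itemset and every way of splitting it into X and Y. It is not an efficient mining algorithm, and the thresholds used here (minimum support 40%, minimum confidence 80%) are assumed values chosen for the illustration, as is the function name `find_rules`.

```python
from itertools import combinations

# Transactions from the fruit store example (one set of items per transaction).
transactions = [
    {"Apple", "Banana", "Cherry", "Durian"},
    {"Apple", "Durian"},
    {"Banana", "Durian"},
    {"Durian", "Banana", "Cherry"},
    {"Banana", "Durian"},
    {"Apple", "Banana"},
    {"Apple", "Cherry", "Durian"},
]


def find_rules(transactions, min_support, min_confidence):
    """Return (X, Y, support, confidence) for every rule meeting both criteria."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    rules = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            n_xy = sum(1 for t in transactions if itemset <= t)
            if n_xy == 0 or n_xy / n < min_support:
                continue  # fails the minimum support criterion
            # Split the itemset into antecedent X and consequent Y.
            for k in range(1, size):
                for x_combo in combinations(combo, k):
                    x = frozenset(x_combo)
                    n_x = sum(1 for t in transactions if x <= t)
                    conf = n_xy / n_x
                    if conf >= min_confidence:
                        rules.append((sorted(x), sorted(itemset - x), n_xy / n, conf))
    return rules


# Assumed thresholds for this illustration.
rules = find_rules(transactions, min_support=0.4, min_confidence=0.8)
for x, y, sup, conf in rules:
    print(f"{', '.join(x)} -> {', '.join(y)}  support={sup:.0%}  confidence={conf:.0%}")
# Banana -> Durian  support=57%  confidence=80%
# Cherry -> Durian  support=43%  confidence=100%
```

With these assumed thresholds, the search returns the same two rules as the output table for the fruit store example. Apple → Durian reaches 43% support but only 75% confidence, so it is filtered out.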

Characteristics of Market Basket Analysis

 Both the independent and dependent data are of nominal (categorical) type.
Therefore, all we can do is count the frequency of patterns. Market Basket
Analysis is sometimes called frequent pattern mining. If the data are
quantitative, they are first categorized into intervals (whose meaning is
actually nominal), such as age 0 to 1, age 1 to 5, age 5 to 12, age 12 to 19, etc.
 In Market Basket Analysis, we usually do not consider the quantity of each
item that a customer bought. Whether a customer buys 1 kg or 10 kg of apples,
the purchase is treated as the same item, apple.
 We do not use all recorded transactions. Only transactions in which more than
one item was purchased are considered as data. Single-item transactions are not
used for the analysis.
 The input data are assumed to be free of error and noise.
 Unlike the simple demonstration example here, the binarized transaction record is
usually a sparse matrix (a matrix with many zeros) because of the very large number
of transaction records and items. Computing with and storing a sparse matrix
requires special algorithms.
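The preprocessing described in this list (ignoring quantities, dropping single-item transactions and binarizing the records) might look like the sketch below. The raw records and every variable name are hypothetical and exist only to illustrate the steps. The list of column indices at the end is just one simple way to store a sparse row.

```python
from collections import defaultdict

# Hypothetical raw sales records: (transaction ID, item, quantity in kg).
raw_records = [
    (1, "Apple", 1.0), (1, "Banana", 2.5),
    (2, "Apple", 10.0),                      # single-item transaction
    (3, "Banana", 0.5), (3, "Durian", 3.0),
    (3, "Banana", 1.0),                      # same item bought twice
]

# Group items by transaction; a set ignores quantities and repeats.
baskets = defaultdict(set)
for tid, item, _quantity in raw_records:
    baskets[tid].add(item)

# Keep only transactions with more than one distinct item.
baskets = {tid: items for tid, items in baskets.items() if len(items) > 1}

# Binarize: one row per transaction, one column per item, 1 = bought.
columns = sorted(set().union(*baskets.values()))
matrix = [[int(item in baskets[tid]) for item in columns] for tid in sorted(baskets)]

# A simple sparse alternative: store only the column indices of the 1s.
sparse_rows = [[j for j, value in enumerate(row) if value] for row in matrix]

print(columns)      # ['Apple', 'Banana', 'Durian']
print(matrix)       # [[1, 1, 0], [0, 1, 1]]
print(sparse_rows)  # [[0, 1], [1, 2]]
```

Transaction 2 disappears because it contains a single item, and the 10 kg quantity plays no role in the result.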

Cautions with Market Basket Analysis

 Association does not imply causality. It is a measure of co-occurrence, not
of cause and effect.
 Though the computation of association rules can be automated, the analyst
should be aware that the analysis may produce unreasonable rules such as
beer → diaper. Unless sales really increase when the associated products
are put together, we should not take a hypothesis produced by Market Basket
Analysis as established truth. Testing such a hypothesis with real sales
data is necessary.
 The computation required for Market Basket Analysis grows exponentially
with the number of items.
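The exponential growth can be made concrete by counting the candidates. With d distinct items there are 2^d − 1 non-empty itemsets and 3^d − 2^(d+1) + 1 possible rules X → Y in which X and Y are non-empty and share no item. The sketch below evaluates both counts and cross-checks the rule formula by brute force for a small d. The function names are illustrative assumptions.

```python
from itertools import product


def count_itemsets(d):
    """Number of non-empty itemsets that can be formed from d items."""
    return 2 ** d - 1


def count_rules(d):
    """Number of rules X -> Y with X and Y non-empty and disjoint."""
    return 3 ** d - 2 ** (d + 1) + 1


def count_rules_brute_force(d):
    """Assign each item to X, to Y, or to neither (N) and count valid rules."""
    total = 0
    for assignment in product("XYN", repeat=d):
        if "X" in assignment and "Y" in assignment:
            total += 1
    return total


for d in (4, 6, 10, 20):
    print(d, count_itemsets(d), count_rules(d))

print(count_rules_brute_force(4))  # 50, same as count_rules(4)
```

Even the four fruits in the example allow 50 possible rules, and six items already allow 602, which is why minimum support and minimum confidence are used to prune the search.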

Applications of Market Basket Analysis

 Once you have association rules, you can use this knowledge for many
purposes in science and business. Here are a few ideas for business:
 Cross-selling: offer the associated items when a customer buys any item from
your store.
 Product placement: items that are associated (such as bread and butter,
tissues and cold medicine, or potato chips and beer) can be placed near each
other. If customers see them together, they are more likely to purchase
them together.
 Affinity promotion: design promotional events based on associated
products.
 Questionnaire analysis: because both the independent and dependent variables
of Market Basket Analysis are nominal (categorical) data, the method is very
useful for analyzing questionnaire data.
 Fraud detection: based on credit card usage data, it may be possible to
detect purchase behavior that is associated with fraud.
 Customer behavior: associating purchases with demographic and socioeconomic
data (such as age, gender and preference) may produce very useful results
for marketing.
