
Data Mining

Lecture #1: Jan 15th, 2014
Introduction
Data is produced at a phenomenal rate
Our ability to store has grown
Users expect more sophisticated information
How?
UNCOVER HIDDEN INFORMATION
DATA MINING
Data Mining works with Warehouse Data
Data Warehousing provides the Enterprise with a memory
Data Mining provides the Enterprise with intelligence
Database Processing vs. Data Mining Processing
          Database Processing            Data Mining
Query     Well defined; SQL              Poorly defined; no precise query language
Output    Precise; subset of database    Fuzzy; not a subset of database
Data Mining as a Business Process
Business process: data mining is a business process that interacts with other business processes.
Data mining starts with data, then through analysis informs or inspires action, which in turn creates data that begets more data mining.
Organizations wanting to excel do not view data mining as a sideshow. It readily fits in with other strategies for understanding markets and customers.
Data Mining: Large Amounts of Data
i. How much is a lot of data?
ii. Excel: max rows possible? A very versatile tool for working with relatively small amounts of data.
iii. In the early days of data mining (1960s and 70s) data was scarce, and some of the techniques were developed in that period.
iv. Today computing power is readily available, and a large amount of data is not a handicap; it is an advantage.
Data mining techniques work better with a large sample population.
Data Mining: Meaningful Patterns and Rules
i. Business operations: generate the data and the patterns at the same time.
ii. Data mining: the goal is to find patterns that are useful to the business. Helping the business is more important than amusing the miner.
iii. Call center application: classifies customers as Green, Amber and Red for targeting retention and facilitating customer acquisition, the goal being to offer better customer value.
iv. Companies are building business models centered on data mining.
Data Mining and Customer Relationship Management
Firms of all sizes need to create one-to-one relationships with customers and form a learning relationship with them.
Firms are learning to look at the value of each customer individually to focus on profitable customers.
Moving from segmentation to personalization requires changes throughout the organization, especially in marketing, sales and customer support.
The shift is from a delivery-centered to a customer-centered organization.
Data mining is only a collection of tools and techniques to support a customer-centric organization.
What is Data Mining
Narrow sense: a collection of tools and techniques to support the business
Broader sense: an attitude that business actions should be based on learning, that informed decisions are better than uninformed decisions, and that measuring results is beneficial to the business
It is a business process and methodology for applying analytical tools and techniques
Case Study in Data Mining
Making Money or Losing Money
Home equity loans generate revenue for banks from interest payments on loans, but sometimes companies grapple with services that lose money.
Examples of value-added services by banks which may be losing money?
Making Money or Losing Money
Home equity loans generate revenue for banks like Fidelity Investments.
Bill paying service: should it be discontinued because it is losing money? Customers perceive it as a value-added service.
A customer owns a house and has a large outstanding credit card debt: what should the bank do?
Bank of America Case Study
BofA wanted to boost its home equity loans business.
Using common sense, the message was:
People with college-age children want to borrow against home equity to pay tuition bills.
People with high but variable incomes want to use home equity to smooth out peaks and valleys in their income stream.
Bank of America Case Study
Data from 42 systems of record was cleansed. Some records dated back to 1914. Customer records had about 250 fields.
Decision tree techniques were applied to customers who had taken up the product offering as well as those who had spurned it. Rules were discovered and a good-prospect flag was generated by a data mining model.
Sequential patterns were studied: when does the customer want the loan? Clustering was done. 14 clusters were generated; one or two had intriguing properties.
In one cluster, 39% of customers had both business and personal accounts.
This cluster accounted for more than 25% of the customers who had been classified as responders.
People may be using home equity loans to start a business.
Message: use your equity to do what you always wanted to do.
Virtuous Cycle
1. Identify Business Opportunities
2. Mine data to transform it into actionable information
3. Act on the information
4. Measure results
5. Go to step 1 (infinite loop)
Focus on business results rather than amusing the data miner
Data Mining and Marketing Tests
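Groups are defined by two choices: whether the group receives the message (yes/no) and whether it is picked by the model (yes/no).
Control Group (message: yes; picked by model: no)
Chosen at random, receives message
Response measures message without model
Target Group (message: yes; picked by model: yes)
Chosen by model, receives message
Response measures message with model
Holdout Group (message: no; picked by model: no)
Chosen at random, receives no message
Response measures background response
Modeled Holdout Group (message: no; picked by model: yes)
Chosen by model, receives no message
Response measures model without message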
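The four test groups can be compared with simple response-rate arithmetic. The sketch below is illustrative only: the group sizes and response counts are hypothetical numbers, not figures from the lecture, and the names (`groups`, `response_rate`) are made up for the example.

```python
# Hypothetical sizes and response counts for the four test groups.
groups = {
    "control":         {"size": 10_000, "responses": 200},  # random, message
    "target":          {"size": 10_000, "responses": 500},  # model, message
    "holdout":         {"size": 10_000, "responses": 50},   # random, no message
    "modeled_holdout": {"size": 10_000, "responses": 120},  # model, no message
}


def response_rate(group):
    """Fraction of the group that responded."""
    return group["responses"] / group["size"]


rates = {name: response_rate(g) for name, g in groups.items()}

# Message without model: random people, message vs. no message.
message_effect = rates["control"] - rates["holdout"]
# Model without message: no message, model-picked vs. random people.
model_effect = rates["modeled_holdout"] - rates["holdout"]
# Lift of model plus message over message alone.
model_lift_with_message = rates["target"] / rates["control"]

for name, rate in rates.items():
    print(f"{name}: {rate:.2%}")
print(f"message effect: {message_effect:.2%}")
print(f"model effect: {model_effect:.2%}")
print(f"lift of target over control: {model_lift_with_message:.2f}")
```

Comparing against the holdout group matters because some people respond even without being contacted; that background rate should not be credited to the message or the model.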
Data Mining Systems vs Operational Systems
Chapter 2
Data Mining Applications in Marketing and Customer Relationship Marketing
Customer Lifecycle
The customer lifecycle refers to the stages of the customer relationship. Five major phases:
Prospects: people in the target market
Responders: prospects who have exhibited interest
New customers: responders who make a commitment
Established customers
Former customers
Customer Lifecycle Stages
Subscription vs Event Based Relationships
Event Based Relationships
Transactions, e.g. purchasing a mobile prepaid card
Companies communicate via broadcasts
Encourage customers to visit websites
Subscription Based Relationships
Postpaid mobile contract
Contracts enable a learning relationship
Customer can be studied over time
Customer experience: newspaper subscribers
Data Mining Process: Customer Acquisition
Who are the prospects?
The prospect base may change over time
Will the past be a good predictor of the future?
Prospects in a new geography may differ from current customers
Changes to products or services may bring in a different target audience
Data Mining Role in Customer Acquisition
Data availability limits the role of data mining.
Response modeling is used for channels such as direct mail and telemarketing, as the cost of contact is relatively high. Data availability falls into 3 categories:
Source of prospect
Appended individual/household data
Appended demographic data at geographical level
Typically prospect lists are purchased.
Modeling may be required to shortlist customers for direct marketing, perhaps based on demographic data.
The echo effect is a challenge to building models. For example, a prospect receives an e-mail but responds over the phone.
Data Mining Role in Customer Activation
Activation is an operational process: how can data mining help?
Activation provides a view of new customers at the point they start. This is a very important perspective, and as a data source it needs to be preserved.
Customer activation provides the initial conditions of the customer relationship. Such initial conditions are often useful predictors of long-term customer behaviour.
Activation Funnel (home delivery newspaper subscribers)
Prospects/Leads: new sales leads come through many channels
Orders: only leads with verifiable addresses and credit cards become orders
Subscriptions: only orders with routable addresses become subscriptions
Paid subscriptions: only some subscriptions are paid
Data mining can play a role in understanding whether customers are moving through the process the way they should, or what characteristics cause a customer to fail during the activation stage.
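A first step in studying the activation funnel is to measure how many customers survive each stage. The sketch below is illustrative only: the stage counts are hypothetical, and `stage_conversion` is a helper named for this example.

```python
# Hypothetical counts at each funnel stage (not figures from the lecture).
funnel = [
    ("leads", 10_000),
    ("orders", 6_000),
    ("subscriptions", 5_400),
    ("paid subscriptions", 4_320),
]


def stage_conversion(stages):
    """Return (from_stage, to_stage, rate) for each consecutive pair of stages."""
    return [
        (src, dst, dst_count / src_count)
        for (src, src_count), (dst, dst_count) in zip(stages, stages[1:])
    ]


conversions = stage_conversion(funnel)
overall = funnel[-1][1] / funnel[0][1]

for src, dst, rate in conversions:
    print(f"{src} -> {dst}: {rate:.1%}")
print(f"leads -> paid subscriptions overall: {overall:.1%}")
```

A stage with an unusually low conversion rate is a natural place to look for the customer characteristics that predict failure during activation.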
Data Mining: Customer Relationship Management
Primary goal is to increase customer value
Up-selling: buy a more expensive model
Cross-selling: broaden the customer relationship
Usage stimulation: loyalty points
Customer value calculation: assign a future expected value to each customer
Customer options vs simplicity?
Are data mining and personalization compatible?
Data Mining: Customer Relationship Management
Data mining helps dig out customer affinities
Data mining can play a key role in understanding the operational side of the business
Customer retention is the key
Predictive modeling is often applied in this area
Techniques: survival analysis, or comparing long-standing customers with customers with short tenures
Win-back
Why did customers leave? (analyze customer complaints)
Tends to depend more on operational strategies
Targeting Customers
A nationwide publication determined its
readers have the following characteristics:
59% of readers are college educated
46% have professional or executive occupations
21% have household income >= $75K
7% have household income >= $100K
Targeting Objective:
i. Any suggestions for increasing revenue for the publication?
Targeting Objective:
i. Increase circulation amongst prospects matching the profile
ii. Sell advertising space to businesses wanting to reach such an
audience
iii. Next Steps ?
Who Fits the Profile
Who matches the profile better
Amy: professional, college educated, earns $80K per year
Bob: high school graduate, earns $50K per year
How will you make the comparisons?
Who Fits the Profile
Observations?
Any room for improvement?
Who Fits the Profile
US Population Figures:
College Educated = 20.3%
Professional/Executive = 19.2%
Income > $75K = 9.5%
Income > $100K = 2.4%
Who Fits the Profile
New scores (index based) relate the publication's target audience to the US population, hence they make more sense
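The comparison of Amy and Bob can be sketched in code. The readership and US population percentages come from the slides; everything else is an assumption made for illustration: Bob is taken to be non-professional (his occupation is not stated), the ">=" and ">" income thresholds are treated as the same, and the rule for scoring a characteristic a person lacks (the ratio of the complementary percentages) is one possible choice, not necessarily the one used in the lecture's lost tables.

```python
# Share of readers and of the US population with each characteristic (from the slides).
readership = {"college": 0.59, "professional": 0.46,
              "income_75k": 0.21, "income_100k": 0.07}
us_population = {"college": 0.203, "professional": 0.192,
                 "income_75k": 0.095, "income_100k": 0.024}

# Amy earns $80K: above the $75K threshold, below the $100K threshold.
amy = {"college": True, "professional": True,
       "income_75k": True, "income_100k": False}
# Assumption: Bob's occupation is not professional/executive.
bob = {"college": False, "professional": False,
       "income_75k": False, "income_100k": False}


def match_count(person):
    """Naive score: number of profile characteristics the person has."""
    return sum(person.values())


def index_score(person):
    """Index-based score: sum of readership-to-population ratios.

    For a characteristic the person has, use readers% / population%.
    For one the person lacks, use (1 - readers%) / (1 - population%)
    (an assumed convention for this sketch).
    """
    total = 0.0
    for trait, has_trait in person.items():
        if has_trait:
            total += readership[trait] / us_population[trait]
        else:
            total += (1 - readership[trait]) / (1 - us_population[trait])
    return total


for name, person in (("Amy", amy), ("Bob", bob)):
    print(f"{name}: matches={match_count(person)}, index score={index_score(person):.2f}")
```

The index rewards characteristics that are much more common among readers than in the general population, so a rare trait such as a high income counts for more than its raw readership percentage suggests.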
Data Mining & Direct Marketing
Advertising: reaches prospects about whom nothing is known as individuals
Direct marketing: requires at minimum a phone number or email ID
Countries have restrictions on use of data
Household-level data can be used directly for a rough-cut segmentation based on income, car ownership.
Is this dataset the right size?
Response Modeling
Campaign response rates are in low single digits
Models help improve response rates to direct solicitation
Likelihood of response
Ranking of prospects
Data Mining techniques are extensively applied to response
modeling.
Direct Solicitation is an expensive process and must conform
to resource constraints (budgets)
Simplifying Assumptions Corporation (SAC)
1 million prospects and a budget of $300K for the marketing campaign; cost is $1 per contact
How to maximize responses? Score the prospects with a response model
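With the SAC figures, the budget allows 300,000 contacts, i.e. the top 30% of the 1 million prospects once they are ranked by model score. The sketch below shows that selection step on a small stand-in population; the random scores and the helper name `select_prospects` are assumptions for illustration, not a real response model.

```python
import random


def select_prospects(scores, budget, cost_per_contact):
    """Return indices of the highest-scoring prospects the budget can reach."""
    max_contacts = int(budget // cost_per_contact)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:max_contacts]


# SAC figures from the slide: $300K budget at $1 per contact.
max_contacts = 300_000 // 1
print(f"contacts affordable: {max_contacts:,} of 1,000,000 prospects")

# Stand-in population of 1,000 prospects with made-up scores,
# scaled down to a $300 budget so that the same 30% are contacted.
random.seed(0)
scores = [random.random() for _ in range(1_000)]
chosen = select_prospects(scores, budget=300, cost_per_contact=1)
print(f"selected {len(chosen)} prospects; lowest selected score = "
      f"{min(scores[i] for i in chosen):.3f}")
```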
Response Modeling
[Figure: concentration plotted against penetration]
Lift = concentration / penetration
Benefit = concentration - penetration
Max Benefit
Max benefit occurs at the penetration where the distance between the two curves is greatest
The KS statistic is also at its maximum there
The split point divides prospects into a good list and a bad list
It maximizes the unweighted average of sensitivity and specificity
Sensitivity: likelihood that a person who has the condition is diagnosed correctly (in the medical world) = true positives / (false negatives + true positives)
Specificity: proportion of people without the condition who get a negative result = true negatives / (true negatives + false positives)
The max benefit point also minimizes the expected loss
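The definitions of lift and benefit can be traced on a small ranked list. In this sketch, penetration is the fraction of prospects contacted so far and concentration is the fraction of all responders found among them; the ten scores and responses are hypothetical, and `gains_table` is a helper named for this example.

```python
def gains_table(scores, responded):
    """Rank prospects by score and return one row per cut-off point.

    Each row is (penetration, concentration, lift, benefit).
    """
    ranked = sorted(zip(scores, responded), key=lambda pair: pair[0], reverse=True)
    total = len(ranked)
    total_responders = sum(flag for _, flag in ranked)
    rows = []
    found = 0
    for n, (_, flag) in enumerate(ranked, start=1):
        found += flag
        penetration = n / total
        concentration = found / total_responders
        rows.append((penetration, concentration,
                     concentration / penetration,
                     concentration - penetration))
    return rows


# Hypothetical model scores and actual responses (1 = responded).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
responded = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

rows = gains_table(scores, responded)
best = max(rows, key=lambda row: row[3])  # cut-off with maximum benefit

for penetration, concentration, lift, benefit in rows:
    print(f"pen={penetration:.0%} conc={concentration:.0%} "
          f"lift={lift:.2f} benefit={benefit:.2f}")
print(f"max benefit at penetration {best[0]:.0%}")
```

At full penetration the lift falls back to 1 and the benefit to 0, since contacting everyone finds every responder with no help from the model.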
Confusion Matrix
                 Model Prediction
Actual           NO                 YES
NO               True Negative      False Positive
YES              False Negative     True Positive
Sensitivity = True Positives / (False Negatives + True Positives)
Specificity = True Negatives / (True Negatives + False Positives)
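The two formulas can be checked by counting the four cells of the confusion matrix directly. This is a minimal sketch on hypothetical labels (1 = YES, 0 = NO); the function names are chosen for the example.

```python
def confusion_counts(actual, predicted):
    """Return (tn, fp, fn, tp) for paired actual/predicted labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    tn = sum(1 for a, p in pairs if not a and not p)
    fp = sum(1 for a, p in pairs if not a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    return tn, fp, fn, tp


def sensitivity(tn, fp, fn, tp):
    """Share of actual YES cases that the model predicted as YES."""
    return tp / (tp + fn)


def specificity(tn, fp, fn, tp):
    """Share of actual NO cases that the model predicted as NO."""
    return tn / (tn + fp)


# Hypothetical outcomes for ten prospects.
actual = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

counts = confusion_counts(actual, predicted)
sens = sensitivity(*counts)
spec = specificity(*counts)
average = (sens + spec) / 2  # unweighted average of the two measures

print(f"TN, FP, FN, TP = {counts}")
print(f"sensitivity = {sens:.3f}, specificity = {spec:.3f}, average = {average:.3f}")
```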
