
CUSTOMER RELATIONSHIP MANAGEMENT

THROUGH
DATA MINING
BY

M.GURUNATH G.SIVACHANDRA

murarigurunath_gupta@yahoo.co.in
sivachandra22@yahoo.com

SREE VIDYANIKETHAN ENGINEERING COLLEGE


A.RANGAMPET

CONTENTS
Abstract

Foundation

Data mining
Data mining applications
Customer Relationship Management (CRM)
Data mining & CRM issues – Relation
Decision Trees
Generating Decision Tree through Simple Probabilistic Approach – An Attempt
The Algorithm
Problem Solving Issues
Addressing various CRM issues
Algorithm Implementation
The Problem: Definition
User Selection
Data Definition
Solution: Splitter algorithm - Implementation
Finding MSA (Most Significant Attribute)
Result: MSA and Attribute selection priority
The Final Decision Tree
Conclusion 
References 
ABSTRACT
Almost every real-time process is being automated in today's competitive world of
technological advancements; automation has become the lifeline of modern life. Data Mining is one
of the powerful automation tools, having evolved from the concept of Knowledge Discovery.
Knowledge Discovery is an intelligent process, and Data Mining performs it artificially, thus being
Artificially Intelligent. Extracting data from large databases through pruning and other implicit
means is not a single-handed job. Data Mining has been a boon to many technical fields, and
management fields are not immune to it either. Customer Relationship Management is one such field
which has been under focus for the deployment of Data Mining. Many financial companies and
business organizations have grown from rags to riches by tackling marketing issues through Data
Mining. However, small organizations can only dream of incorporating mining into their business,
as they cannot afford such software. CRM is a soft issue and does not need complex mining software
based on Bayesian curves and normalized distributions. The key aspect is that it requires
simplicity rather than a high level of accuracy. We have taken efforts to produce a rather simple
algorithm based on Probabilistic Classification to generate a Decision Tree. Traversing the
Decision Tree, we can predict values for unknown attributes. Tree generation requires training the
algorithm through numerous samples; increasing the number of samples will automatically enhance
the accuracy of the algorithm. Our algorithm would bridge the gap between such small marketing
organizations and the technology of Data Mining.

FOUNDATION
• Data Mining
Data Mining is a tool that automates the detection of relevant patterns in a database. It is a
technology which, on its progressive path, leads to Knowledge Discovery, thereby making the
system Artificially Intelligent. In practical terms, the system is made self-reliant. There are
certain prerequisites needed to perform data mining: a database full of statistical data and
efficient pruning algorithms to mine it form the core. William Frawley and Gregory
Piatetsky-Shapiro (MIT Press, 1991) defined it as "…the nontrivial extraction of implicit,
previously unknown and potentially useful information from data…". In other words, it is the
process of discovering meaningful correlations and hidden patterns by mining large amounts of
data stored in warehouses (large repositories of data).
The major advantage is its capability to build predictive models rather than being
retrospective. Thus data mining is about the exploration and analysis of large quantities of data,
by automatic or semiautomatic means, to uncover meaningful patterns and rules.
Requirements
People often ask whether statistics alone are not enough to obtain knowledge from previously
existing data, and what is new about data mining. Data mining effectively automates the statistical
process, leading to more accuracy and a reduction of burden. Intelligent systems learn from events,
i.e. they discover knowledge and act more relevantly in the future. Thus data analysis techniques
through data mining act as automated, self-decisive tests which yield the most appropriate, or
rather best-suited, solutions to various situations. So we use data mining as the tool to churn
out data for identification from voluminous samples and for action determination based on rules
obtained from Knowledge Discovery.
Applications
Data mining is not restricted to any single field. In fact, it has now become an integral part of
every database-oriented application. However, the following fields have gained notably from
the tool:
Marketing
E-commerce
Medicine
Telecommunications
Transportation
Research
Law and order
The Process
Data mining uses simple tools to perform the churning process on a large ocean of data. The
following operations are performed:
Discovering knowledge
Segmentation
Classification
Association
Preferencing
Visualizing data

Data mining software

There are many leading vendors who provide data mining solutions. To name a few:
Clementine from SPSS Inc. (Chicago, Illinois), Darwin from Oracle, Decision Series from
NeoVista, Enterprise Miner from SAS, Intelligent Miner from IBM, and KnowledgeSEEKER and
KnowledgeSTUDIO from Angoss. These packages are deployed by various global firms and
government organizations to gain performance and advancements in their respective fields.
Customer Relationship Management (CRM) - A core marketing issue
Any private firm or organization generally spends more money to acquire a new customer than to
retain an existing one, and it is far more expensive to win back a customer after they have left
than it is to keep them satisfied in the first place. So it is essential for a concern to maintain
good relations with its customers and to keep them satisfied by all possible means, for which the
company might even be digging into its treasury. Financial expenditures have to be handled
effectively, as cash is the prime commodity, and certain marketing strategies are therefore needed
to manage it. These issues are addressed as CRM issues. In order to stay competitive, companies
develop strategies to become customer-focused, customer-driven, and customer-centric. All these
terms describe the companies' desire to build lasting customer relationships. CRM is viewed as a
solution that makes these efforts valuable to the company and the customers alike.
Data mining & CRM issues - Their Inter-Relation
As data mining is about the exploration and analysis of large quantities of data by automatic or
semiautomatic means, it can help to uncover meaningful patterns and rules. These patterns and
rules help corporations improve their marketing, sales and customer support operations and better
understand their customers. Over the years, corporations have accumulated very large databases
from applications such as Enterprise Resource Planning (ERP), Customer Relationship Management
(CRM), and other operational systems. Our paper deals with implementing a data mining algorithm
for solving a typical CRM problem.
Decision Trees
The decision tree is probably the most popular technique for predictive modeling. An example
explains some of the basics of the decision tree algorithm. The following table shows a set of
training data that could be used to predict credit risk. In this example, fictionalized
information was generated for customers, covering their debt level, income level, type of
employment and whether they were a good or bad credit risk.
Customer ID   Debt level   Income level   Employment type   Credit risk
1             High         High           Self-employed     Bad
2             High         High           Salaried          Bad
3             High         Low            Salaried          Bad
4             Low          Low            Salaried          Good
5             Low          Low            Self-employed     Bad
6             Low          High           Self-employed     Good
The following illustration shows a decision tree that might be created from this data.

In this example, the decision tree algorithm might determine that the most significant attribute
for predicting credit risk is debt level. The first split in the decision tree is therefore made on
debt level. One of the two new nodes (Debt = High) is a leaf node, containing three cases with bad
credit and no cases with good credit. In this example, a high debt level is a perfect predictor of a
bad credit risk. The other node (Debt = Low) is still mixed, having two good credit cases and one
bad credit case. The decision tree algorithm then chooses employment type as the next most
significant predictor of credit risk. The split on employment type has two leaf nodes, indicating
that self-employed people have a higher probability of bad credit. This is, of course, a small
example based on synthetic data, but it illustrates how the decision tree can use known attributes
of the credit applicants to predict credit risk. In reality there are typically far more attributes
for each credit applicant, and the number of applicants would be very large. When the scale of the
problem expands, it is difficult for a person to manually extract the rules that identify good and
bad credit risks. A classification algorithm can consider hundreds of attributes and millions of
records to come up with a decision tree that describes the rules for credit risk prediction.
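For readers who want to see this worked end to end, the short Python sketch below fits a decision
tree to the six-row credit table with scikit-learn. The library choice, the column names and the
sample applicant are our own illustration and are not part of the original example.

import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# The six training cases from the credit-risk table above.
data = pd.DataFrame({
    "debt":       ["High", "High", "High", "Low", "Low", "Low"],
    "income":     ["High", "High", "Low", "Low", "Low", "High"],
    "employment": ["Self-employed", "Salaried", "Salaried",
                   "Salaried", "Self-employed", "Self-employed"],
    "risk":       ["Bad", "Bad", "Bad", "Good", "Bad", "Good"],
})

# One-hot encode the categorical predictors so the tree can split on them.
X = pd.get_dummies(data[["debt", "income", "employment"]]).astype(int)
y = data["risk"]
tree = DecisionTreeClassifier(criterion="entropy").fit(X, y)

# Predict the risk of a hypothetical new applicant: low debt, low income, salaried.
applicant = pd.get_dummies(
    pd.DataFrame([{"debt": "Low", "income": "Low", "employment": "Salaried"}])
).reindex(columns=X.columns, fill_value=0).astype(int)
print(tree.predict(applicant))   # expected to print ['Good'] on this toy data

On this data the first split typically lands on debt level, matching the illustration described
above.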

Generating a Decision Tree through a Simple Probabilistic Approach – An Attempt

There are many variations of algorithms that construct decision trees, using different
splitting methods, tree shapes, pruning techniques, and so on. Some use entropy as the splitting
criterion; Microsoft Decision Trees uses a Bayesian score as the default. However, these
algorithms are complex and use calculations based on curve analysis. These may be required by
large corporate organizations which focus on the level of accuracy, but even small marketing firms
would like to have mining incorporated into their business. CRM issues are soft in nature and can
tolerate a small compromise with regard to various predictions. Generating the decision tree,
given the trained model as input, is the core issue in mining. So we propose a simple decision
tree algorithm based on probabilistic methods. In the due process, we build up a probabilistic
classification tree.

Algorithm:
1) Start
2) Define an array for attributes and data types // Atr[ ],Dt[ ]
3) Define a structure for template // Template {<data type1> <attribute1>,
<data type2> <attribute2>,
.
.
<data typeN> <attributeN>, int NOC};
// [NOC - no. of occurrences of a template]
4) Define a structure for training model. // Train {<data type1> <attribute1>,
<data type2> <attribute2>,
.
.
<data typeN> <attributeN>};
5) Get the (total number of attributes – 1) as max. //[Key attribute discarded]
6) For each attribute get the data type // Get it in an array Atr [ ], Dt [ ]
//[CRM issues may require data of three types:
Boolean (Yes/No: married/single)
Numerical ranges (1000-2000: salary)
char[ ] (Engineer, Doctor: profession)]
7) If the attribute is of numeric range, call the splitter algorithm to find the ranges.
//[Splitter algorithm -
/* i) Get min value, max value
ii) x = 1;
iii) I = 10
iv) n = min value / I
v) If (n lies between 0 and 9)
Range[x] = min value to (min value + I)
min value = min value + I
x++;
Else
I = I * 10
vi) If (min value != max value)
Repeat from step iv)
Else
Return x */ ]
8) Get the value returned by splitter algorithm into NOS [An]
// [no. of values an attribute takes is contained in NOS.]
9) If data type is Boolean NOS [An] =2
10) If data type is char[ ], NOS [An] = no. of different entries
11) For I = 1 to max sum up NOS [Ai] as totalnos
12) Calculate tempno. = (totalnos. x max)
//[The above value gives the total number of templates to be generated, i.e.
(total no. of attribute states) x (total no. of attributes)]
13) Create tempno. (Calculated above) number of objects for the structure
Template.
14) By iteration all possible values for the template objects are filled using
Filltemp algorithm
// [Filltemp algorithm
/* For I = 1 to max
For J = I+1 to max-1
Set A[I] to one value,
Set all possible values for A[J]
Call sort function and remove redundant templates */
]
15) Obtain no. of samples to be entered for training as trno.
16) Create trno (Obtained above) number of objects for the structure Train.
// [The above operation is termed as training.]
17) For each template (object in structure Template), obtain the NOC value by
comparing it with the objects of structure Train
// [The above operation finds the no. of occurrences of a template]
18) Call function MSA
// [MSA would find the most significant attribute to predict the predictable
attribute]
MSA:
/* i) Get the predictable attribute
ii) Get the values of the other attributes
iii) For I = 1 to NOS[Apr] // for each state of the predictable attribute
Select all the templates with the current value for the predictable
attribute.
Sort based on NOC
Find the attribute with the most occurrences of the same value
Return that attribute as msa[1]
Find the next attribute using the previous two steps until all attributes
except the predictable attribute are selected */
19) Using the array MSA [ ] generate the Decision Tree.
20) The value for the predictable attribute is contained in the template which matches
the values of the other attributes and has the maximum NOC value among such
templates.
21) Stop
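To make the splitter step (7) concrete, here is one possible Python reading of it. The function
name splitter, the variable names and the loop structure are our own interpretation of the
pseudocode, not code from the paper; it is a minimal sketch under the assumption that the range
width grows with the running minimum value.

def splitter(min_value, max_value):
    # Our reading of the splitter step: emit consecutive ranges whose width is the
    # current order of magnitude of the running minimum, until max_value is covered.
    ranges = []
    interval = 10
    low = min_value
    while low < max_value:
        # Grow the interval by powers of ten until low / interval is a single digit.
        while not (0 <= low // interval <= 9):
            interval *= 10
        ranges.append((low, low + interval))
        low += interval
    return ranges   # len(ranges) plays the role of NOS for this attribute

# Example: monthly incomes between Rs. 9,000 and Rs. 38,000 (from the case study below).
income_ranges = splitter(9000, 38000)
print(len(income_ranges), income_ranges)
# -> 4 [(9000, 10000), (10000, 20000), (20000, 30000), (30000, 40000)]

Under this reading, the income column of the case study later in the paper falls into 4 ranges,
which agrees with the NOS value reported there.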

Problem Solving Issues

Using the above algorithm, we have to generate the Decision Tree for mining in a CRM-based
problem. Before that, the following CRM issues are to be addressed:-
Addressing various CRM issues
Issue1) Defining the problem:
The following are some important aspects to be kept in mind while defining the
problem.
Find something that matters:-
The scope of your project.
The accuracy level that would be required.
Define the deliverables:-
The output you would be able to generate.
Time and cost effectiveness of the output.
Pick something well defined and small:-
Avoid overly abstract problems.
Be aware of your limitations before defining the problem.
Understand the existing CRM process:-
Don’t spend unnecessary effort on processes that provide fewer gains.
Better understand marketing strategies and the practical CRM process.
Issue 2) Defining the user:
Build a profile for each user:-
Try to get the interest and the background of the user.
Pose queries to the user to tap knowledge about him.
Use quick-start programmes to tell the user about your future:-
Project yourself as an emerging firm.
Highlight the benefits to the user.
Issue 3) Defining the data:
Locate the data dictionary:-
Get details about the metadata.
Obtain various constraints on data.
Define metrics:-
Find the range of values or the possible values the data can take.
Issue 4) Defining the scope of the project:
Determine the scope through modeling.
This would give a preview of your intentions in the project.
Avoid projecting beyond the scope.
This prevents you from making compromises
Algorithm Implementation
The problem:
Definition:-
A cellular company wants to take some action to avoid the churn of its
customers. It would like to provide certain offers:
a)—Free SMS service.
b)—Free additional talk time.
c)—Free additional number/line.
d)—Free extra bandwidth for web downloads.
The company would like to perform mining to determine which offer to provide to a
certain customer such that
a)—the customer is less likely to churn,
b)—the company would still benefit after providing the offer, and
c)—the offer is best suited to the customer.
Clearly, the above definition satisfies Issue 1) mentioned above.
User Selection:-
To enter the values into the training model, we select 10 customers who represent
real-world users, covering all classes and various tastes.
Clearly, the above selection satisfies Issue 2) mentioned above.
Data Definition:-
We select certain attributes associated with the customer and obtain their values
by posing queries.
Let us suppose the following table was generated as a result:
Cus.Id   Income (Rs. p.m.)   Sex   Age   Duration of Relationship   Education   Employment      Offers
343      11000               F     35    36 months                  Grad        Employed        b
112       9000               M     24    24 months                  Grad        Self-employed   a
898      38000               M     50     2 months                  Non-Grad    Self-employed   b
454      22000               F     27     6 months                  Grad        Self-employed   c
778      12000               M     25    10 months                  Non-Grad    Employed        a
512      15000               F     30    20 months                  Grad        Employed        b
255      26000               M     45    25 months                  Grad        Self-employed   d
096      30000               F     42     4 months                  Non-Grad    Employed        d
669      16000               F     28    11 months                  Grad        Self-employed   a
909      21000               M     56     7 months                  Non-Grad    Self-employed   c
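As an illustration of how this training model maps onto the Train structure defined in step 4 of
the algorithm, the sketch below holds each row of the table as one object. The Python
representation and the field names are our own and are only one possible way to store the model.

from dataclasses import dataclass

@dataclass
class Train:
    cus_id: str
    income: int           # Rs. per month
    sex: str              # 'M' / 'F'
    age: int
    duration_months: int
    education: str        # 'Grad' / 'Non-Grad'
    employment: str       # 'Employed' / 'Self-employed'
    offer: str            # the predictable attribute: 'a', 'b', 'c' or 'd'

# The first two customers from the table; the remaining eight would be entered the same way.
training_model = [
    Train("343", 11000, "F", 35, 36, "Grad", "Employed", "b"),
    Train("112", 9000, "M", 24, 24, "Grad", "Self-employed", "a"),
]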

Solution:- {Finding the MSA and plotting the Decision Tree}
Once the above training model is fed in and the algorithm trains the miner:
The no. of attributes is identified as 8 [max = 8] (9 attributes - 1 key attribute).
Three attributes (Sex, Education and Employment) are Boolean and contribute 6 possible values in
total.
The splitter algorithm returns {NOS - no. of possible states/ranges}:
NOS for Income as 4
NOS for Age as 4
NOS for Duration as 4
Offers can assume 4 different values.
Therefore totalnos = 6 + 4 + 4 + 4 + 4 = 22
Now tempno. = totalnos. x max = 22 x 8 = 176 templates
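These counts can be tallied with a few lines of code. In the sketch below the dictionary nos and
the variable names are ours; the state counts for the numeric attributes are simply the values
returned by the splitter as reported above.

# Two states for each Boolean attribute; binned and categorical counts as reported above.
nos = {"Sex": 2, "Education": 2, "Employment": 2,
       "Income": 4, "Age": 4, "Duration": 4, "Offers": 4}

max_attrs = 8                     # 9 attributes minus the key attribute (Cus.Id)
totalnos = sum(nos.values())      # 6 + 4 + 4 + 4 + 4 = 22
tempno = totalnos * max_attrs     # 22 x 8 = 176 templates
print(totalnos, tempno)           # -> 22 176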

Finding MSA
The predictable attribute here is Offers.
It can take 4 possible values, viz. a, b, c and d.
For (Offer = a), we have 176/4 = 44 templates.
Similarly, each of the offers b, c and d would have 44 templates.
Intersecting those templates with the objects of structure Train
(the contents of structure Train being the training model, i.e. the above table),
we obtain 3 intersections for a, 3 intersections for b, 2 intersections for c and 2
intersections for d.
The MSA is determined from the highest NOC (no. of occurrences), which is represented in the form
of ratios in the following table.
Temp   Income         Age            Emp.           Durn.          Edu.          Sex
a1     9-10k   2:1    20-30  3:0     Emp     1:2    20-30  1:1:1   Gra    2:1    M   2:1
a2     10-20k         20-30          S-Emp          0-10           N-Gra         M
a3     10-20k         20-30          Emp            10-20          Gra           F
b1     10-20k  2:1    30-40  1:1:1   Emp     2:1    30-40  1:1:1   Gra    2:1    F   1:2
b2     30-40k         40-50          S-Emp          0-10           Gra           M
b3     10-20k         20-30          Emp            10-20          N-Gra         F
c1     20-30k  2:0    20-30  1:1     S-Emp   2:0    0-10   2:0     Gra    1:1    F   1:1
c2     20-30k         50-60          S-Emp          0-10           N-Gra         M
d1     40-50k         40-50          S-Emp          20-30          N-Gra         M
d2     40-50k  2:0    40-50  2:0     Emp     1:1    0-10   1:1     Gra    1:1    F   1:1

MSA    1              2              4              5              3             6

• Result:
The Most Significant Attribute (MSA) in determining the offer was found to be Income, followed
by Age, Education, Employment, Duration of Relationship and Sex respectively.
• The Final Decision Tree

The Decision Tree, as we have seen before, may not always be complete. After we split according
to the order generated by the MSA, we stop at nodes called leaf nodes, where either there are
no cases for that particular node, or all the cases belong to the same state, or the cases are
distributed among the states such that no further split is possible.
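Finally, for readers who want to experiment, the sketch below is a deliberately simplified
stand-in for the prediction step (steps 17-20 of the algorithm): it scores each candidate offer by
how many attribute states a new customer shares with the training customers who received that
offer, instead of building the full template structure. The binning helper, the two sample rows
and all names are our own illustration, not the paper's code.

from collections import Counter

def to_state(attr, value):
    # Hypothetical helper: collapse raw values into coarse states, in the spirit of the
    # ranges used above (incomes into 10k-wide bins, ages and durations into bins of 10).
    if attr == "Income":
        return value // 10000
    if attr in ("Age", "Duration"):
        return value // 10
    return value   # Boolean / categorical attributes are used as-is

# (attribute dict, offer) pairs; two rows from the training table, for brevity.
training = [
    ({"Income": 11000, "Sex": "F", "Age": 35, "Duration": 36,
      "Education": "Grad", "Employment": "Employed"}, "b"),
    ({"Income": 9000, "Sex": "M", "Age": 24, "Duration": 24,
      "Education": "Grad", "Employment": "Self-employed"}, "a"),
]

def predict_offer(customer):
    # Score each offer by the number of matching attribute states (a crude analogue of
    # the NOC counts used by the algorithm) and return the best-scoring offer.
    scores = Counter()
    for row, offer in training:
        scores[offer] += sum(to_state(a, row[a]) == to_state(a, customer[a])
                             for a in customer)
    return scores.most_common(1)[0][0]

new_customer = {"Income": 14000, "Sex": "F", "Age": 29, "Duration": 18,
                "Education": "Grad", "Employment": "Employed"}
print(predict_offer(new_customer))   # -> b with these two sample rows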
• Conclusion:
Thus we have shown how Data Mining could be deployed for CRM issues. We believe that the steps
outlined in this paper and the algorithm formulated are to a great extent successful in optimizing
the existing CRM process.
Though the algorithm may have some limitations, we suppose soft issues like CRM can bear with
them, as a high level of accuracy is generally not required. Successful implementation of the
algorithm would benefit small business organizations, as some of them cannot afford to buy
Clementine or any other data mining software, which would otherwise leave them without Data
Mining. Data mining could provide drastic performance improvements for such business-oriented
organizations as well. We expect our algorithm to bridge the gap between such organizations and
the technology of Data Mining. We are trying to build a programmable model, and before getting to
the bottom of such a process, we would like to analyze the possible merits and demerits of our
algorithm. We welcome useful ideas and criticism.
References
Text:
Alex Berson, Stephen Smith and Kurt Thearling, Building Data Mining Applications for CRM,
Tata McGraw-Hill Edition, 2000.
Ramez Elmasri and Shamkant B. Navathe, Fundamentals of Database Systems, 3rd Edition, Pearson
Education, 2000.
Websites:
www.microsoft.com
www.spss.com
www.google.com
