Академический Документы
Профессиональный Документы
Культура Документы
ABSTRACT
With economic globalization and continuous and predict the likelihood of accounts more likely
development of e-commerce,
commerce, customer relationship buy. Customer targeting is one of the most important
management (CRM) has become an important factor components of the customer relationship management
in growth of a company. CRM requires huge (CRM) systems. Customer targeting helps in
expenses. One way to profit from your CRM identifying promising prospects to generate more
investment and drive better results, is through revenue. Improving customer targeting is important
machine learning. Machine learning helps business to for reducing overall cost and boosting business
manage, understand and provide services to customers performances. Marketing professionals achieve this
at individual level. Thus propensity modeling helps task using a classification model for buyer targeting.
the business in increasing marketing perf
performance.
The objective is to propose a new approach for better 1.1 Analysis Scenario:
customer targeting.
For identifying prospective customers It is important
We’ll device a method to improve prediction to measure a subject’s “propensity to buy” a particular
capabilities of existing CRM systems by improving product. We can take advantage of the large amount
classification performance for propensity modeling. of demographic data available to target those who
have the highest propensity to buy thus improving our
Keywords: Customer Relationship onship Management chances of success [9]. We will devise a method to
(CRM), Machine Learning, Customer Segmentation, exploits the customer data in conjunction with the
Customers Targeting, K-means
means algorithm, Smote, demographic data from the overall
ove market population
Logistic Regression, Classification, Clustering containing buyer vs. non-buyer
buyer data.
This will increase the predictive accuracy of the
I. INTRODUCTION classification model and improve Customer
CRM requires a big expense in the form of Targeting Performance.
Implementation, updates, and training. Implementing
Ml on top of crm system will not only increase ROI II. Related Work
but will also enable us to derive better results from the
One of the key problems in CRM is buyer targeting,
huge collection of data available from sales and
that is, to identify the prospects that are most likely to
marketing,
rketing, customer support. This will enable us
become customers. Marketers are applying data
to achieve predictive crm which could gather both
mining tools to solve the problem, such as in [1] the
internal and external data about different prospects
authors focused on classification of online customers
cu
@ IJTSRD | Available Online @ www.ijtsrd.com | Volume – 2 | Issue – 2 | Jan-Feb 2018 Page: 1046
International Journal of Trend in Scientific Research and Development (IJTSRD) ISSN: 2456-6470
neighbors[11,2]. Depending upon the amount of over- VI. Case Study
sampling required, neighbors from the k nearest
neighbors are randomly chosen[2]. Let us consider a case study involving a marketing
manager of a large company selling bikes , must run a
VI. Proposed Model: mailing campaign to find the best prospective
customers which is direct marketing .
Now, with the advancement in technology it has
become easier to manage business relationships and To build a buyer propensity model , we require data
the informative data associated with it. Data like which must contain:-
customer contact details, accounts and lead a. Sales transactions data .The data about our
information, sales opportunities in different customers including those who bought bikes in
geographical regions is stored in CRM systems the past.
mostly in cloud at one central location, such that the b. Data from previous marketing campaigns.
data is available to many at real time with ease and c. List of customer who Purchased from third-
speed. As the amount of data generated from sales and party vendors.
marketing, customer support and product
development departments is increasing exponentially, This dataset has been collected From Azureml
the existing CRM systems potential to generate more gallery at http://gallery.azureml.net/ The dataset has :
revenue has also increased.
Dataset Observations Features
Deriving insights from the available data is important
not only to get more profit but also to save our efforts BikeBueyer 10000 18
and time. After all huge data doesn’t necessarily mean
better decision making. Missing values = 9% and above.
Class Imbalance = 85.039% Majority class and
14.96% Minority class.
VII. Working
Handling Class imbalance during data cleansing
and processing:
This problem occurs in machine learning when one of
the class of data is much lesser than the other class of
data. This produces a biased classifier .This problem
is very common and commonly observable in
disciplines like fraud detection.
@ IJTSRD | Available Online @ www.ijtsrd.com | Volume – 2 | Issue – 2 | Jan-Feb 2018 Page: 1047
International Journal of Trend in Scientific Research and Development (IJTSRD) ISSN: 2456-6470
SMOTE is widely used because its simple and Where f(x) is chosen to be a linear model.
effective .The new data is given as input to k-means. We use Logistic Regression as a classifier to train our
model .We estimate the coefficients as
n
Feature selection:- arg min 1 𝐩
𝐲 ∑ 𝛃𝐱
𝜷=β1, β2 ,β3 ,… 𝐥𝐨𝐠 (𝟏 + ℮ 𝐢 𝐣 𝐢 𝐢𝐣 )
K-means algorithm is applied for feature selection. n
k=0
Then reduced dataset is given as an input to two class
Logistic Regression classifier. The proposed method Logistic Regression adjusts the value of B.
consists of the following steps: The predictive accuracy can be increased
by regularization. Thus adjusting L2-Norm and L1-
a) The number of clusters k has been determined norm near about 0.01 gave me the highest predictive
using average silhouette for the entire range of k power.
as we specify .The mean of all
the silhouette values is calculated c.) We score our model by computing score for
for distinct clusters. The count of number each xi in test set.
of clusters having high sill value is picked up to 𝒑
be the value of k. We use euclidian distance as the
𝒇(𝒙𝒊 ) = 𝜷𝒋 𝒙𝒊𝒋
distance measure of each data points from the
𝒋 𝟏
centeriod of the cluster.
b) After defining the value of k number of clusters,
Thus our model can be represented as
apply k-means on each of the features of
the dataset individually .For each cluster we
𝑩𝒊𝒌𝒆
measure the clustering performance using the
𝟏
metric silhouette. =
𝒆 𝜷𝟏𝒂𝒈𝒆 𝜷𝟐𝒏 𝒄𝒂𝒓𝒔 𝜷𝟑𝒄𝒐𝒎𝒎𝒖𝒏𝒕𝒆𝒅𝒊𝒔𝒕𝒂𝒏𝒄𝒆 ⋯ 𝒏 𝒄𝒉𝒊𝒍𝒅𝒓𝒆𝒏
c) We include the feature which gives us the
best performance and add it to the final set of
Where ᵝ1, ᵝ2 ,ᵝ3 etc. represent the coefficients of the
features. We repeat applying k-means on the
independent variables.
remaining features individually and pick the
e is the error which represents the variability in the
feature which has the highest sill value.
data.
d) If we have reached the desired number of features,
In this equation, Bike Buyer is the probability that a
we stop otherwise we repeat the
customer will buy a bike, given his or her input data.
procedure as mentioned above. This way we have
where Age, cars, and Communte_distance are the
performed feature selection. The data is passed to
independent variables.
logistic regression for classification.
y=+1 y=-1 VIII. Result and Analysis
ŷ=1 TP FP (Type I) The method has been implemented in azure machine
learning studio using R programming language.
ŷ=-1 FN (Type II) TN
We represent confusion matrix as:-
Classification:-
For evaluation of the quality of results i have used we
have used the following metrics-
We split our preprocessed data into training and test
a .Predictive accuracy:-
set. 𝒏
𝑭𝑷 + 𝑭𝑵 𝟏
= 𝟏[𝒚𝒊 𝒚𝒊 ]
We estimate the coefficients and train our model as 𝒏 𝒏
𝒊 𝟏
follows –
n
𝒂𝒓𝒈 𝒎𝒊𝒏 1 b. Sensitivity:-
𝒇= 𝒎𝒐𝒅𝒆𝒍 𝒇 𝒍 𝒚𝒊 𝒇(𝒙𝒊 ) 𝑻𝑷 ∑𝒏𝒊 𝟏
n [𝒚𝒊 𝒚𝒊 𝒂𝒏𝒅𝒚𝒊 𝟏]
k=0 =
+ 𝑹𝒆𝒈𝒖𝒍𝒂𝒓𝒊𝒛𝒂𝒕𝒊𝒐𝒏 (𝒇) #𝑷𝑶𝑺 ∑𝒏𝒊 𝟏 [𝒚𝒊 𝟏]
@ IJTSRD | Available Online @ www.ijtsrd.com | Volume – 2 | Issue – 2 | Jan-Feb 2018 Page: 1048
International Journal of Trend in Scientific Research and Development (IJTSRD) ISSN: 2456-6470
c. Specificity:- Management. The study will help the company
𝑻𝑵 ∑𝒏𝒊 𝟏 [𝒚𝒊 𝒚𝒊 𝒂𝒏𝒅 𝒚𝒊 𝟏] to analyze and forecast customer’s pattern of
= purchase. This method is applicable to industries like
#𝑵𝒆𝒈 ∑𝒏𝒊 𝟏 [𝒚𝒊 𝟏]
banking industry, insurance industry, retail industry,
d. False Positivity Rate:- manufacture industries. we outlined a simplified
𝑭𝑷 ∑𝒏𝒊 𝟏 method to guide marketers and managers in making
[𝒚𝒊 𝒚𝒊 𝒂𝒏𝒅 𝒚𝒊 𝟏]
= focus their advertising and promotion on those
#𝑵𝒆𝒈 ∑𝒏𝒊 𝟏 [𝒚𝒊 𝟏] categories of people in order to reduce time and costs.
The case study we evaluated shows the practical use
e. Precision:- and usefulness of the model.
@ IJTSRD | Available Online @ www.ijtsrd.com | Volume – 2 | Issue – 2 | Jan-Feb 2018 Page: 1049
International Journal of Trend in Scientific Research and Development (IJTSRD) ISSN: 2456-6470
conference on Knowledge discovery and data
mining, pages 408–413. ACM,2001
9. Paul B Chou, Edna Grossman, Dimitrios
Gunopulos,and Pasumarti Kamesam. Identifying
prospective customers. In Proceedings of the sixth
ACM SIGKDD international conference on
Knowledge discovery and data mining, pages
447–456. ACM, 2000.
10. Hill T and Lewicki P ( 2007 STATISTICS
Methods and ApplicationsTulsa, OK: StatSoft )
11. NiteshV.Chawla, KevinW. Bowyer,
LawrenceO.Hall, W.Philip Kegelmeyer, SMOTE:
Synthetic Minority Over-sampling
Technique,Journal of Artificial Intelligence
Research 16 (2002) 321–357.
12. Jingyuan Yang,Chuanren Liu,Mingfei Teng,Hui
Xiong,March Liao,Buyer Targeting Optimization:
A Unified Customer sSegmentation Perspective.
In International Conference on Big Data (Big
Data).(2016) 1262-1271
@ IJTSRD | Available Online @ www.ijtsrd.com | Volume – 2 | Issue – 2 | Jan-Feb 2018 Page: 1050