
Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence

On Machine Learning towards Predictive Sales Pipeline Analytics


Junchi Yan123 , Chao Zhang2 , Hongyuan Zha14 , Min Gong2 ,
Changhua Sun2 , Jin Huang2 , Stephen Chu2 , Xiaokang Yang3
{yanjunchi,xkyang}@sjtu.edu.cn, zha@cc.gatech.edu, {bjzchao,gminsh,schangh,huangjsh,schu}@cn.ibm.com
1
Software Engineering Institute, East China Normal University, Shanghai, 200062, China
2
IBM Research China, Shanghai, 201203, China
3
Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
4
College of Computing, Georgia Institute of Technology, Atlanta, Georgia, 30332, USA

Abstract

Sales pipeline win-propensity prediction is fundamental to effective sales management. In contrast to using subjective human ratings, we propose a modern machine learning paradigm to estimate the win propensity of sales leads over time. A profile-specific two-dimensional Hawkes process model is developed to capture the influence of sellers' activities on the win outcome of their leads, coupled with the leads' personalized profiles. It is motivated by two observations: i) sellers tend to concentrate their selling activities and efforts on a few leads during a relatively short time, which is evidenced by their concentrated interactions with the pipeline, including login, browsing and updating of the sales leads, all logged by the system; ii) a pending opportunity is prone to reach its win outcome shortly after such temporally concentrated interactions. Our model is deployed and in continual use at a large, global, B2B multinational technology enterprise (Fortune 500), with a case study. Due to the generality and flexibility of the model, it also enjoys potential applicability to other real-world problems.

Introduction

Business-to-business (B2B) selling has evolved considerably over the last five decades: from the in-person pitches depicted in television series, to email and user profile-based deals, to customer relationship management (CRM) systems (Linoff and Berry 2011), and to the emerging trend of automatic sales analytics that allows the optimization of sales processes (Kawas et al. 2013).

Therefore, companies are adopting more systematic and digitalized sales management systems to support the sales process. The common pipeline operation model (Kawas et al. 2013) can be described as follows: as new sales leads are identified, the seller enters these leads into the sales opportunity pipeline management system. These leads are further evaluated and some are qualified into opportunities. A sales opportunity consists of a set of one or more products or services that the salesperson is attempting to convert into an actual client purchase. All open opportunities are tracked, ideally culminating in a won deal that generates revenue.

By collecting up-to-date information about the pipeline, analytics approaches can be used to streamline sales pipeline management. From the management perspective, the resource owner can reallocate resources based on the pipeline quality assessment in comparison with the sales target or quota, which in turn can also be dynamically adjusted based on the updated assessment result. From the individual salesperson's perspective, assessment can further provide actionable advice to field sellers. By predictively scoring the quality of each lead at hand, it allows field sellers to better prioritize their personal resources and actions in the face of a relatively large number of ongoing leads within a tight period. These two situations are especially pronounced for companies with large, global, client-facing sales teams dealing with increasingly complex portfolios of ever-changing products and services.

The fundamental building block of pipeline quality assessment is the lead-level win-propensity scorer. In fact, machine learning has not yet been widely applied to the B2B sales pipeline environment, and little technical work has been released from the business side. In practice, many internal pipeline systems, including that of the company studied in our case study, ask the field seller to enter a subjective rating for each of the leads that he owns. These fine-grained evaluations are then aggregated, together with other factors, to facilitate decision making at different management levels.

However, such a subjective approach unavoidably introduces noise. From our observation of the referred company, on one hand, many sellers intentionally manipulate the ratings in two ways: i) some leads are underrated by the seller in order to avoid attention and competition from other sellers who may also have a channel to reach the clients behind the leads; ii) in contrast, some leads are overrated because sellers are under pressure from their leaders, who set various subtle performance metrics in a process-oriented management fashion, not only for the final won revenue. Another drawback is that different sellers may have biased personal expectations for similar leads. This issue is also common for human rating in many information retrieval applications, and is typically addressed by asking for pairwise comparisons instead of a global score. However, such interfaces

The work is partially supported by NSF IIS-1116886, NIH R01 GM108341, NSFC 61129001 and 61025005/F010403.
Copyright © 2015, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

are not available to current pipeline systems.

Figure 1: Business scenario illustration for the proposed scoring model. Win propensity for the next two weeks is issued on a weekly basis within one quarter, until week 11. Then a new weekly scoring procedure starts as the new quarter begins, after a pipeline cleaning step.

Figure 2: A seller-lead interaction event sequence ending with a win event. The interaction exhibits temporal clustering patterns and is won after a short period.

From the domain application perspective, there is an extensive literature (Linoff and Berry 2011) in the field of marketing science in which various selling strategies are characterized and optimized, but the focus is on the business-to-consumer (B2C) domain rather than business-to-business (B2B). In fact, quantitative sales analytics in B2B selling has recently been an emerging and active topic in both industry and the research community (Lawrence et al. 2010; Varshney and Singh 2013). However, the above analysis is mostly performed in a retrospective data mining and knowledge discovery fashion. Specifically, (Lawrence et al. 2010) describes two practical solutions deployed in IBM tailored for identifying whitespace clients, i.e. OnTARGET and the Market Alignment Program, by analyzing data from the external market. (Kawas et al. 2013) addresses the problem of efficient sales resource optimization in two steps. The first step uses training samples of historically won sales opportunities to estimate the sales response function between the salesforce's full-time equivalent (FTE) effort and the expected revenue or profit (Varshney and Singh 2013); the second step finds the optimal salesforce allocation subject to business constraints. To our surprise, little work has been done or released on predictive modeling in sales pipeline analytics, especially on estimating the lead-wise win propensity.

This paper attempts to score the lead-level win propensity for a given forward time window, using the static profiles and dynamic clues from the sales pipeline. To this end, we propose a profile-specific two-dimensional Hawkes process model tailored to the problem of estimating the lead-level win propensity within a forward time window. Our model is able to incorporate static profile features such as lead revenue size, product offering, client industry, etc., as well as to capture the dynamic influence from seller to lead, which is observed from their interaction activities, including browsing and updating the client-visiting log. The model is implemented and deployed in a real sales pipeline business environment in a multinational Fortune 500 technology company across different product lines, and generated direct revenue impact estimated at up to $43.2 million via internal evaluation in 2013. The research team received the Research Accomplishment Award in 2013 in recognition from the business side.

Learning the Proposed Model

We specify our problem and practice under the referred company, where the goal is to issue and update weekly the win likelihood of each lead within the time window of the next two weeks from the current week. In particular, the life-cycle of a sales lead is regarded as confined to a business cycle, which is one quarter in this paper. During one quarter (13 weeks), the lead is monitored and scored by the model until week 11. Non-won leads at the end of the current quarter are treated in two ways: some are identified as garbage leads and removed at the beginning of the next quarter; the rest are refreshed as new leads together with the newly created ones in the next quarter.

Motivation and Problem Formulation

For this weekly updated scoring model, the input features on one hand consist of static profile information such as deal size, geography, sector, product and other attributes, as exemplified in Table 1. On the other hand, there is an additional dynamic clue in the form of a seller-lead interaction event sequence associated with each lead within a censored time window. This time window is usually set to the end of a recent previous quarter when building a training dataset, and up to now when performing model scoring on testing data. We elaborate on how to build the training and testing datasets in the rest of the paper.

There are several ways to transform the referred business problem into a machine learning problem. One straightforward way is supervised binary classification. Given one quarter of historical weekly snapshot data, one can define the labeled training dataset by lead profile features f_1, f_2, ..., f_{11}, including past interactions by sellers, where the subscript denotes the week number, together with the corresponding win or no-win outcomes within the corresponding two weeks ahead: o_{3,4}, o_{4,5}, ..., o_{12,13}. Then a logistic classification model or other models can be applied.

This approach suffers several limitations: i) it truncates the observation window to an ad-hoc period, which induces the label; ii) the binary classifier is not a dynamic model, and is unable to capture the dynamics of the lead life-cycle flexibly. To improve on this baseline approach, one way is to use a censored classification model, e.g. (Shivaswamy, Chu, and Jansche 2007), or a survival analysis model like the Cox model (Cox and Oakes 1984) under the point process framework.

In this paper, we are motivated by the specific observation that a more indicative pattern comes from the interactions between sales and pipeline, where the interactions refer to different activities logged by the pipeline system when the seller visits the pipeline web portal, i.e. which lead he

is browsing/updating. More concretely, we find that sellers usually focus on one or a few certain leads, and thus may actively interact with them frequently within a short time period. Furthermore, as shown in Fig. 2, such temporally clustered activities also tend to trigger a win shortly afterwards. Thus it is appealing to suppose that interactions are prone to recur shortly after a recent interaction event, and likewise for the win event. Based on the above observations, it is desirable to capture the dynamic pattern of sales leads over time, preferably by a parsimonious parametric model that keeps the modeling interpretable and efficient. Note that the conventional Cox model and its time-varying variants do not incorporate such a recurrence pattern, and thus lack the flexibility to cope with such interaction event sequences.

Table 1: Exemplary features and data types of sales leads.

Profile    | Type        | Remark or examples
geography  | categorical | Greater China, Southeast Asia
deal size  | categorical | expected deal size in USD
sector     | categorical | general business, industry clients
industry   | categorical | health-care, energy and utility
product    | categorical | sub-brands of the main brand

Seller-pipeline Interaction Modeling

In the following, we show how to model the win outcome's dependency on the interaction sequences using a two-dimensional point process model; see (Daley and Vere-Jones 1988) and the references therein. Specifically, we adopt the Hawkes process model (Hawkes 1971) to capture the temporal clustering dynamics. We start with a brief description of one-dimensional Hawkes processes, and then extend it to the multi-dimensional case. In particular, our main work lies in proposing a profile-specific two-dimensional Hawkes process model and a tailored alternating optimization algorithm to learn the model parameters. For mathematical tractability, the exponential kernel is used in our seller-lead interaction modeling problem.

For the general Hawkes process model in its basic form as a one-dimensional point process, the conditional intensity can be expressed as (Hawkes 1971): \lambda(t) = \mu + a \sum_{i: t_i < t} g(t - t_i), where \mu is the base intensity and the t_i are the times of events in the process before time t. g(t) is the kernel mimicking the influence of previous events. Given an event sequence \{t_i\}_{i=1}^{n} observed in [0, T], its log-likelihood is

    L = \log \frac{\prod_{i=1}^{n} \lambda(t_i)}{\exp\left(\int_0^T \lambda(s)\,ds\right)} = \sum_{i=1}^{n} \log \lambda(t_i) - \int_0^T \lambda(t)\,dt

Extending the above equation to the U-dimensional case, a multi-dimensional Hawkes process is defined as a U-dimensional point process whose conditional intensity for the u-th dimension is (Zhou, Zha, and Song 2013b)

    \lambda_u(t) = \mu_u + \sum_{i: t_i < t} a_{u u_i} g_{u u_i}(t - t_i)

which consists of a base intensity term \mu_u and an accumulative exciting term \sum_{i: t_i < t} a_{u u_i} g_{u u_i}(t - t_i). It can be interpreted as the instantaneous probability of point occurrence, depending on the previous events across different dimensions.

Now we show how to formulate the problem as a specific machine learning paradigm. Suppose we have m samples, i.e. m independent event sequences \{c_1, ..., c_m\} from the multi-dimensional Hawkes process, where each sample, of the form c_s = \{(t_i^s, u_i^s)\}_{i=1}^{n_s}, is an event sequence of length n_s occurring during the observation time window [0, T_s]. Each pair corresponds to an event occurring in dimension u_i^s at time t_i^s. We use the following formula for the log-likelihood of general multi-dimensional Hawkes processes, whose parameters can be estimated via maximum likelihood estimation (Rubin 1972; Ozaki 1979):

    L = \sum_{s=1}^{m} \left( \sum_{i=1}^{n_s} \log \lambda_{u_i^s}(t_i^s) - \sum_{u=1}^{U} \int_0^{T_s} \lambda_u(t)\,dt \right)

By specifying the multi-dimensional Hawkes model for the intensity function, we obtain the following objective function (Liniger 2009), where G_{u u_j^s}(t) = \int_0^t g_{u u_j^s}(\tau)\,d\tau:

    L(\mu, a) = \sum_{s=1}^{m} \left[ \sum_{i=1}^{n_s} \log\left( \mu_{u_i^s} + \sum_{t_j^s < t_i^s} a_{u_i^s u_j^s}\, g_{u_i^s u_j^s}(t_i^s - t_j^s) \right) - \sum_{u=1}^{U} \left( T_s \mu_u + \sum_{j=1}^{n_s} a_{u u_j^s}\, G_{u u_j^s}(T_s - t_j^s) \right) \right]

Here we collect the parameters into vector-matrix format: \mu = (\mu_u) for the base intensities, and a = (a_{u u'}) for the mutually exciting coefficients of dimension u affected by u'.

Learning Profile-specific Hawkes Processes

In this paper, we have U = 2 processes to model: u = 1, the interaction sequence, and u = 2, the outcome event. In this context, the base term incorporates the inherent interaction intention of salespeople toward the leads: promising leads typically receive more attention from sales, and also have a better chance to win even when no seller interaction is observed. The exciting term properly accounts for the contributions of much earlier interaction events, which may trigger subsequent interaction events that eventually lead to a win. Specifically, the exciting effects are modeled as decaying over time: g_{ij}(t - t') = w_{ij} e^{-w_{ij}(t - t')}.

Furthermore, our problem at hand bears several more specific characteristics to exploit: i) the mutual influence is only one-way, from the interaction dimension to the outcome dimension rather than two-way, thus a_{12} = 0 and w_{12} = 0; ii) the self-exciting phenomenon only exists for the interaction events since the outcome event is one-off, thus a_{22} = 0 and w_{22} = 0; iii) for a given lead s, the base intensity is assumed to be associated with its intrinsic attributes x^s = [x_1^s, x_2^s, ..., x_K^s]^T, including deal size, channel, age, sales stage and other related profiles, which can be encoded by a parameter vector \theta_u = [\theta_{u0}, \theta_{u1}, ..., \theta_{uK}]^T, where \theta_{u0} is a constant term. Therefore, the training set of leads with different profiles is heterogeneous with regard to the base intensity. We chose the widely used logistic function scaled by a coefficient \mu_u^0, i.e. \mu_u^s = \mu_u^0 / (1 + \exp(-\theta_u^T x^s)), for both the interaction process (u = 1) and the final outcome process (u = 2). For the two dimensions, the parameters of the base intensity can differ, as modeled by \{\mu_1^0, \theta_1\} and \{\mu_2^0, \theta_2\} respectively. Note this parametrization only assumes that different

leads may have different base intensities; it still assumes the base intensity is constant over time for a given lead.

Figure 3: Pipeline quality and gap analysis web portal.

The first two facts above decouple the mutual influence into a reduced parameter space, while the third characteristic addresses the heterogeneous property of sales leads. We now formulate the profile-specific decoupled two-dimensional exciting process as follows (letting h_u^s \triangleq h_u(x^s) = 1/(1 + \exp(-\theta_u^T x^s))):

    L = L_1(\mu_1^0, \theta_1, a_{11}, w_{11}) + L_2(\mu_2^0, \theta_2, a_{21}, w_{21})

    L_1 = \sum_{s=1}^{m} \left[ \sum_{i=1}^{n_s - 1} \log\left( \mu_1^0 h_1^s + \sum_{t_j^s < t_i^s} a_{11}\, g_{11}(t_i^s - t_j^s) \right) - T_s \mu_1^0 h_1^s - \sum_{j=1}^{n_s - 1} a_{11}\, G_{11}(T_s - t_j^s) \right]

    L_2 = \sum_{s=1}^{m} \left[ \log\left( \mu_2^0 h_2^s + \sum_{t_j^s < t_{n_s}^s} a_{21}\, g_{21}(t_{n_s}^s - t_j^s) \right) - T_s \mu_2^0 h_2^s - \sum_{j=1}^{n_s - 1} a_{21}\, G_{21}(T_s - t_j^s) \right]   (1)

Here the event sequence associated with a lead consists of n_s - 1 interactions and the n_s-th event, i.e. the final outcome. Note that the above formulation decouples the first four parameters \mu_1^0, \theta_1, a_{11}, w_{11} (for the self-exciting interaction sequence) from \mu_2^0, \theta_2, a_{21}, w_{21} (for the interactions' effect on the outcome and its base intensity); thus we seek to estimate all the parameters, unlike the previous work (Zhou, Zha, and Song 2013a) that assumes the triggering kernel w is known and imposes additional regularization on the matrix of a_{ij} to be low-rank and sparse. Below we show how to maximize the above L(\mu^0, \theta, a, w) using an alternating optimization algorithm. Since the two terms L_1(\mu_1^0, \theta_1, a_{11}, w_{11}) and L_2(\mu_2^0, \theta_2, a_{21}, w_{21}) can be decoupled during optimization, in the following we give a strict derivation for the first term; a similar procedure applies to the second.

Solving for a, w and \mu^0 by fixing \theta. L can be surrogated by its tight lower bound based on Jensen's inequality, which allows for the Majorize-Minimization (MM) algorithm (Hunter and Lange 2004) on the surrogate function:

    L_1(\mu^0, a) \geq \sum_{s=1}^{m} \left[ \sum_{i=1}^{n_s - 1} \left( p_{ii}^s \log \frac{\mu_1^0 h_1^s}{p_{ii}^s} + \sum_{j=1}^{i-1} p_{ij}^s \log \frac{a_{11}\, g_{11}(t_i^s - t_j^s)}{p_{ij}^s} \right) - T_s \mu_1^0 h_1^s - \sum_{j=1}^{n_s - 1} a_{11}\, G_{11}(T_s - t_j^s) \right]

In the (l+1)-th iteration, we have p_{ii}^{s(l+1)}, p_{ij}^{s(l+1)}:

    p_{ii}^{s(l+1)} = \frac{\mu_1^{0(l)} h_1^s}{\mu_1^{0(l)} h_1^s + \sum_{j=1}^{i-1} a_{11}^{(l)}\, g_{11}^{(l)}(t_i^s - t_j^s)}   (2)

    p_{ij}^{s(l+1)} = \frac{a_{11}^{(l)}\, g_{11}^{(l)}(t_i^s - t_j^s)}{\mu_1^{0(l)} h_1^s + \sum_{j=1}^{i-1} a_{11}^{(l)}\, g_{11}^{(l)}(t_i^s - t_j^s)}   (3)

Given lead s, p_{ij}^s can be interpreted as the likelihood that the i-th event (u_i, t_i) is affected by the previous j-th event (u_j, t_j) in the interaction sequence associated with s, and p_{ii}^s is the likelihood that the i-th event is sampled from the background intensity. Moreover, an advantage is that the parameters \mu_1^0 and a_{11} can be solved in closed form, and the non-negativity constraint on \mu_1^0 is automatically satisfied. Zeroing the partial derivatives \partial L / \partial \mu_1^0 and \partial L / \partial a_{11} leads to:

    \mu_1^{0(l+1)} = \frac{\sum_{s=1}^{m} \sum_{i=1}^{n_s - 1} p_{ii}^{s(l+1)}}{\sum_{s=1}^{m} h_1^s T_s}   (4)

    a_{11}^{(l+1)} = \frac{\sum_{s=1}^{m} \sum_{i=1}^{n_s - 1} \sum_{j<i} p_{ij}^{s(l+1)}}{\sum_{s=1}^{m} \sum_{j=1}^{n_s - 1} G_{11}^{(l)}(T_s - t_j^s)}   (5)

Meanwhile, we solve for the exciting kernel scale parameter w_{11} in g(t - t_j) = w e^{-w(t - t_j)}: note that e^{-w(T - t_i)} \approx 0 when wT \gg 1, as suggested in (Lewis and Mohler 2011), which shows w can be approximated by:

    w_{11}^{(l+1)} = \frac{\sum_{s=1}^{m} \sum_{i>j} p_{ij}^{s(l)}}{\sum_{s=1}^{m} \sum_{i>j} (t_i - t_j)\, p_{ij}^{s(l)}}   (6)

Solving for \theta by fixing a, w and \mu^0. Given the fixed exciting-term parameters and the base intensity scaling factor, we adopt gradient descent to solve the sub-problem with respect to the variable \theta_1. More specifically, by dropping the constant term \sum_{j=1}^{n_s - 1} a_{11} G_{11}(T_s - t_j^s) in L_1, we obtain the following objective as a function of \theta_1:

    \sum_{s=1}^{m} \left[ \sum_{i=1}^{n_s - 1} \log\left( \mu_1^0 h_1^s + C_i^s \right) - T_s \mu_1^0 h_1^s \right]   (7)

    where C_i^s = \sum_{t_j^s < t_i^s} a_{11}\, g_{11}(t_i^s - t_j^s)

For the constant term encoded by \theta_{10}, the partial derivative is:

    \frac{\partial L_1}{\partial \theta_{10}} = \sum_{s=1}^{m} \sum_{i=1}^{n_s - 1} \left( \frac{\mu_1^0}{\mu_1^0 h_1^s + C_i^s} - T_s \mu_1^0 \right) \frac{\exp(-\theta_1^T x^s)}{(1 + \exp(-\theta_1^T x^s))^2}   (8)

For the other coefficients in \theta_1, the partial derivative is:

    \frac{\partial L_1}{\partial \theta_{1k}} = \sum_{s=1}^{m} \sum_{i=1}^{n_s - 1} \left( \frac{\mu_1^0}{\mu_1^0 h_1^s + C_i^s} - T_s \mu_1^0 \right) \frac{x_k^s \exp(-\theta_1^T x^s)}{(1 + \exp(-\theta_1^T x^s))^2}   (9)

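The closed-form updates of Eqs. (2)-(6) are straightforward to implement. Below is a minimal NumPy sketch of one MM iteration for the self-exciting interaction dimension (u = 1); the function and variable names are ours, not from the paper's deployed system, and the profile parameters \theta_1 (entering only through h_1^s) are held fixed, as in this sub-step:

```python
import numpy as np

def mm_step(t, T, h1, mu0, a11, w11):
    """One MM iteration of Eqs. (2)-(6) for the interaction dimension.

    t   : list of arrays, t[s] = interaction event times of lead s
    T   : array, T[s] = observation window length of lead s
    h1  : array, h1[s] = logistic base term h_1^s (theta_1 fixed here)
    mu0, a11, w11 : current parameter estimates mu_1^0, a_11, w_11
    """
    num_mu = num_a = num_w = den_w = den_a = 0.0
    den_mu = float(np.sum(np.asarray(h1) * np.asarray(T)))  # sum_s h_1^s T_s
    for s, ts in enumerate(t):
        base = mu0 * h1[s]
        for i in range(len(ts)):
            dt = ts[i] - ts[:i]                # t_i^s - t_j^s for j < i
            g = w11 * np.exp(-w11 * dt)        # exponential kernel g_11^(l)
            denom = base + a11 * g.sum()
            p_ii = base / denom                # Eq. (2): background responsibility
            p_ij = a11 * g / denom             # Eq. (3): triggering responsibilities
            num_mu += p_ii
            num_a += p_ij.sum()
            num_w += p_ij.sum()
            den_w += (dt * p_ij).sum()
        # compensator: G_11(T_s - t_j^s) = 1 - exp(-w11 (T_s - t_j^s))
        den_a += float(np.sum(1.0 - np.exp(-w11 * (T[s] - ts))))
    mu0_new = num_mu / den_mu                  # Eq. (4)
    a11_new = num_a / den_a                    # Eq. (5)
    w11_new = num_w / den_w                    # Eq. (6)
    return mu0_new, a11_new, w11_new
```

Alternating this step with the gradient update of \theta_1 in Eqs. (8)-(10) reproduces the scheme summarized in Algorithm 1.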
We apply gradient descent to update \theta_{1k}:

    \theta_{1k}^{(l+1)} = \theta_{1k}^{(l)} + \eta \frac{\partial L_1}{\partial \theta_{1k}}, \quad k = 0, 1, ..., K   (10)

A similar iterative scheme can be performed for the term L_2. Thus we finally obtain the estimates of \mu_1^0, \theta_1, a_{11}, w_{11} and \mu_2^0, \theta_2, a_{21}, w_{21} separately. The overall optimization procedure is summarized in Algorithm 1.

Algorithm 1: Learning profile-specific decoupled two-dimensional Hawkes processes for lead win-propensity estimation
1: Input:
2: observed training samples, i.e. leads \{c_s\}_{s=1}^{m}, where each lead is associated with an interaction event sequence \{t_i\}_{i=1}^{n_s - 1}, tailed with the won time stamp t_{n_s} if lead c_s is won within a certain period, e.g. a full quarter;
3: profile attributes x^s = [x_1^s, x_2^s, ..., x_K^s]^T associated with lead c_s, as exemplified in Table 1;
4: initial values for \mu_1^0, \theta_1, a_{11}, w_{11}, \mu_2^0, \theta_2, a_{21}, w_{21}; l = 0;
5: iteration number L_{max}, gradient descent step-size \eta;
6: Output: learned parameters \mu_1^0, \theta_1, a_{11}, w_{11} for the self-exciting model, and \mu_2^0, \theta_2, a_{21}, w_{21} for the affecting model.
7: Procedure:
8: for l = 1 : L_{max} do
9:   // Solve for a_{11}, w_{11}, \mu_1^0 by fixing \theta_1
10:  update p_{ii}^{s(l+1)}, p_{ij}^{s(l+1)} by Eq. (2) and (3);
11:  update \mu_1^{0(l+1)}, a_{11}^{(l+1)}, w_{11}^{(l+1)} by Eq. (4), (5), (6);
12:  // Solve for \theta_1 by fixing a_{11}, w_{11} and \mu_1^0
13:  update \theta_{1k}^{(l+1)} by Eq. (10) with the gradients in Eq. (8), (9);
14: end for
15: Apply the same method to solve for \mu_2^0, \theta_2, a_{21}, w_{21}.

Related Work and Contribution

The Hawkes process dates back to (Hawkes 1971; Ogata 1988). The model partitions the rate of occurring events into background and self-excited components. The background events are statistically independent of one another, while the offspring events are triggered by prior events. Its applicability to time-series and event sequence data has attracted attention across diverse disciplines, e.g. seismology (Ogata 1988; 1998), finance (Weber and Chehrazi 2012), criminology (Lewis et al. 2010; Mohler et al. 2011) and asset management (Yan et al. 2013b; Ertekin, Rudin, and McCormick 2013) and the references therein. In contrast to the above work focusing on the one-dimensional Hawkes process, this paper seeks a comprehensive formulation and effective algorithm for profile-specific multi-dimensional Hawkes processes, a relatively new topic with several very recent works (Liniger 2009; Zhou, Zha, and Song 2013a; 2013b; Li and Zha 2014; 2013; Li et al. 2014).

Technical innovation. Compared with the above-mentioned work on the Hawkes process, all parameters in our model are assumed unknown and are estimated by our proposed algorithm. In contrast, (Zhou, Zha, and Song 2013a) assumes the bandwidth of the self(mutual)-exciting kernel w_{ij} is known, and the background intensity \mu_u is a constant parameter for all samples, which ignores the heterogeneity of \mu_u in real-world problems. This simplification is also used in (Zhou, Zha, and Song 2013b; Li and Zha 2014), where the latter work instead parameterizes the mutual influence a_{ij} via latent variables to reduce the model space induced by a large number of dimensions in their social infectivity analysis. In contrast, we address the inherent heterogeneity by parameterizing lead attributes in a tractable optimization scheme. Note that for a given lead with known profile attributes, our model assumes the background is a stationary point process, i.e. a Poisson process. This is because our practical problem is confined to a relatively short business period, e.g. one quarter, so a secular trend rarely exists. Thus we do not need to perform background model fitting under the various non-stationary assumptions used in (Lewis et al. 2010).

Impact on real-world problems. As far as we know, this is the first work to establish a modern machine learning paradigm, i.e. profile-specific two-dimensional Hawkes processes with a learning algorithm, for application to sales pipeline prediction. Though there are a few precedent statistical methods (Zliobaite, Bakker, and Pechenizkiy; Chen et al. 2010) for sales analytics, these methods and applications differ significantly from ours in that the historical event sequences (sales interactions) are not captured. For instance, one straightforward way is collecting basic statistics of events over a certain time window, such as sum, variance, etc. However, this aggregation causes information loss, which hurts the potential for more advanced predictive modeling. Furthermore, our method can also be easily generalized to other practical problems. For instance, in asset management, given a sequence of different types of failure events associated with an asset, \{a_{ij}, w_{ij}\} can model the mutual impact between different failure types, and \mu_u(x) can model the background failure rate related to the asset profile x and failure type u. We have seen the early success of recent work on predictive maintenance for urban pipe networks (Yan et al. 2013b) and grids (Ertekin, Rudin, and McCormick 2013), where only a one-dimensional Hawkes process is adopted with a constant background rate, which ignores the type of failures and the diversity of each sample. The proposed model in this paper is more promising as it is more flexible in incorporating rich failure types (e.g. leak and burst for pipe failures), as well as in handling the heterogeneity of the background rate with a parameterized profile model (e.g. considering the diversity of material type, diameter and age for each pipe). Other potential applications include client purchase life-cycle analysis, where each type of item can take one dimension and the background rate is personalized by the customer profile features.

Deployment and Evaluation

We perform our study on a Fortune 500 multinational technology company in the B2B market environment. Throughout this section, due to the sensitivity of the proprietary company-owned selling data, we de-identify the brand name and other profile information, leaving only relative metrics such as the AUC score. Our model was finished at the end of 2013Q2. To make an unbiased performance evaluation, the model was evaluated in 2013Q3 with blind testing data

Table 2: AUC for win prediction on blind test data 2013Q3.
The score for each lead is generated by integrating its win
intensity win over the next two weeks. BL denotes Busi-
ness Line, HW(SW) for Hardware(Software).
Market BL Lead # Win% Sales Logit Cox TKL Alg.1
New HW 200K 15.7% .608 .675 .678 .617 .707
New SW 100K 12.5% .586 .659 .661 .614 .701
Mature HW 200K 19.5% .665 .711 .718 .687 .741
Mature SW 150K 14.8% .649 .703 .709 .671 .732

Figure 4: Web portal for next-two-week win probability


(Zhou, Zha, and Song 2013a) that assumes the homogeneity
scoring. Sellers are able to view the latest scoring report.
of background rate for all leads, shows worse performance
compared with the above two models, even it incorporates
in that quarter. In 2013Q4 it was released to sales team to exciting effect. However, when it is combined by the base
impact sellers decision, with the aim of transforming the intensity personalization as solved in our model, it shows
selling ecosystem and methodology. For evaluation reported significant performance improvement. We argue that the rel-
in this paper, due to business sensitivity, we randomly chose atively poor performance of TKL model further comes from
a subset lead set (100K-200K) for each quarter. The exact the biased estimation of the background rate, which induces
overall win rate is less than 20% and the win rate on the additional noise for learning the exciting parameters. Due to
sampled set are disclosed in Table 2 and Table 3. this limitation, the TKL model would also probably cause
Application tools Before jumping into the detailed per- biased estimation in other applications such as (Yan et al.
formance evaluation, we first present two downstream appli- 2013b; Ertekin, Rudin, and McCormick 2013), which can be
cation tools derived by the proposed model. Fig.3 shows the solved by our model by personalizing the background rate.
pipeline quality and gap analysis given the quarterly quota Interference test We release the scoring report to sales
target. This heating map, which covers various areas and team in 2013Q4. Separate sales teams are receiving scoring
product lines, by calculating the overall expected won rev- reports generated by different models, and the performance
enue, is mainly used by sales leaders. Fig.4 illustrates an- is computed separately. For sales teams, they can make their
other tool tailored for individual sellers, especially freshmen, who need some guidance to prioritize their workload.

Blind test  As mentioned earlier, we use 2013Q2 data as the training set and 2013Q3 as the testing set. In particular, in this case study we chose two product lines across both the mature market and the emerging market. Apart from the baseline of subjective sales ratings, several machine learning methods are taken into consideration in our evaluation: i) a Logistic model, which extracts the sum and variance of past interaction events (over the whole timeline so far and over the last five weeks) as additional input features besides profiles; ii) a Cox point process model, where only profile information is used to model the hazard rate; iii) a constant background rate model similar to the recent Triggering Kernel Learning (TKL) work (Zhou, Zha, and Song 2013a), which models the background rate with a constant parameter; and iv) our proposed Hawkes model with a profile-specific background rate.

Table 2 reports the AUC of the ROC curve for these peer methods¹. One can observe that the machine learning methods all outperform the subjective ratings, especially in the emerging market, owing to the relatively young sales team there. The Logistic model and the Cox model perform closely, although the Cox model might be expected to be more suitable since it considers the observation window; in our analysis, this is because Cox does not consider the mutual-exciting effect between interactions and the win outcome. The simplified constant background rate model by TKL trails the other learning methods (as also seen in Table 3), which suggests that the profile-specific background rate matters.

Interference test  In the interference test on 2013Q4 data, sellers could see our scores and make their resource allocation decisions according to our predictions. The corresponding performance is reported in Table 3. Compared with the blind test, our model still outperforms, and it is likely that sellers' actions are influenced by the prediction: they may invest more resources in the high-propensity leads identified by our model, which induces regenerative effects between prediction and action. It is also worth noting that the sellers' own estimation improves compared with the blind-test data of 2013Q3. Apart from fluctuations across quarters due to other external factors, we did receive feedback from the sales team that some sellers would cross-check our report before entering their subjective ratings, which implies our reports help sellers better evaluate their leads. We leave the analysis of how prediction and action influence each other to our long-term research agenda, as it requires more data to calibrate the other external factors.

Further discussion  We believe we are still at the initial stage of advancing machine learning and AI in sales analytics, a complex real-world problem that is relatively new to the computer science research community. Our model can further benefit from other data sources such as salesperson profiles and activities, as well as marketing and promotion data. New performance metrics beyond ROC AUC shall be studied. More comprehensive methodologies beyond the scope of this paper can also be designed, such as reinforcement learning (Kober and Peters 2012), or specifically Markov decision processes (MDPs) (White III and White 1989), and graph matching for resource allocation optimization (Tian et al. 2012; Yan et al. 2013a; 2014).

¹ In fact, we evaluate the model for each week by comparing against the outcome in the next-two-week observation window. The average AUC over the 11 weeks in that quarter is reported in this paper.
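The footnote's evaluation protocol — score every open lead each week, label it positive if it is won within the next two weeks, compute the ROC AUC, and average over the quarter — can be sketched as below. The rank-based AUC implementation and the record layout (`weekly_scores`, `win_week`) are illustrative assumptions, not the paper's actual code.

```python
def roc_auc(scores, labels):
    """Rank-based ROC AUC: probability that a random positive outranks a
    random negative, counting ties as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def quarterly_auc(weekly_scores, win_week, horizon=2):
    """Average per-week AUC over a quarter. `weekly_scores[w]` maps lead id ->
    predicted win propensity at week w; `win_week` maps lead id -> the week it
    was won (absent if never won). A lead is labeled positive at week w if it
    is won within the next `horizon` weeks."""
    aucs = []
    for w, scores in enumerate(weekly_scores):
        labels = {lead: int(w < win_week.get(lead, -1) <= w + horizon)
                  for lead in scores}
        aucs.append(roc_auc(list(scores.values()), [labels[l] for l in scores]))
    return sum(aucs) / len(aucs)
```

In the paper's setting the outer loop would run over the 11 weeks of the quarter, with the per-week AUCs averaged into the single number reported per row of the tables.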
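As a concrete illustration of the Logistic baseline's inputs, the sketch below aggregates a lead's interaction history into the sum and variance features described above; the per-week count layout and function name are hypothetical, not the paper's schema.

```python
from statistics import pvariance

def interaction_features(weekly_counts):
    """Aggregate interaction events into features for the Logistic baseline:
    sum and variance over the whole timeline so far, and over the last five
    weeks. `weekly_counts` is a list of per-week event counts, oldest first."""
    recent = weekly_counts[-5:]
    return {
        "sum_all": sum(weekly_counts),
        "var_all": pvariance(weekly_counts) if len(weekly_counts) > 1 else 0.0,
        "sum_5w": sum(recent),
        "var_5w": pvariance(recent) if len(recent) > 1 else 0.0,
    }
```

These four features would then be concatenated with the lead's profile vector before fitting the logistic model.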
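In the same spirit, a minimal univariate sketch of a Hawkes intensity with a profile-specific background rate is given below. The exponential triggering kernel, the log-linear profile term, and all parameter values are simplifying assumptions for illustration; the paper's full model is two-dimensional and fitted from data.

```python
import math

def hawkes_intensity(t, event_times, profile, w, alpha=0.1, beta=1.0):
    """Conditional intensity lambda(t) = mu(x) + sum_{t_i < t} alpha*exp(-beta*(t - t_i)).
    The background rate mu(x) = exp(w . x) depends on the lead's profile x, so leads
    share triggering dynamics but keep personalized baselines."""
    mu = math.exp(sum(wi * xi for wi, xi in zip(w, profile)))
    excitation = sum(alpha * math.exp(-beta * (t - ti)) for ti in event_times if ti < t)
    return mu + excitation
```

Each past seller activity adds a decaying bump to the win intensity, which is the mutual-exciting effect between interactions and the win outcome that the constant background rate TKL variant does not capture at the profile level.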
Table 3: AUC on interference test data 2013Q4.

Market  BL  Lead #  Win%   Sales  Logit  Cox   TKL   Alg.1
New     HW  200K    18.3%  .628   .681   .680  .612  .729
New     SW  100K    15.1%  .618   .672   .664  .604  .715
Mature  HW  200K    21.3%  .689   .727   .721  .693  .751
Mature  SW  150K    18.2%  .680   .731   .712  .689  .749

Conclusion

We have presented a modern machine learning method for sales pipeline win prediction, which has been deployed in a multinational Fortune 500 B2B-selling company. The proposed method is applicable to other real-world problems due to its generality and flexibility, as discussed in the paper. We hope this paper raises timely and wide attention from industry, as selling is essential to most businesses.

References

Chen, C.-Y.; Lee, W.-I.; Kuo, H.-M.; Chen, C.-W.; and Chen, K.-H. 2010. The study of a forecasting sales model for fresh food. Expert Systems with Applications.
Cox, D. R., and Oakes, D. 1984. Analysis of Survival Data, volume 21. CRC Press.
Daley, D. J., and Vere-Jones, D. 1988. An Introduction to the Theory of Point Processes, volume 2. Springer.
Ertekin, S.; Rudin, C.; and McCormick, T. H. 2013. Reactive point processes: A new approach to predicting power failures in underground electrical systems.
Hawkes, A. G. 1971. Spectra of some self-exciting and mutually exciting point processes. Biometrika.
Hunter, D. R., and Lange, K. 2004. A tutorial on MM algorithms. The American Statistician 58(1):30–37.
Kawas, B.; Squillante, M. S.; Subramanian, D.; and Varshney, K. R. 2013. Prescriptive analytics for allocating sales teams to opportunities. In ICDM Workshop.
Kober, J., and Peters, J. 2012. Reinforcement learning in robotics: A survey. In Reinforcement Learning. Springer. 579–610.
Lawrence, R.; Perlich, C.; Rosset, S.; Khabibrakhmanov, I.; Mahatma, S.; Weiss, S.; Callahan, M.; Collins, M.; Ershov, A.; and Kumar, S. 2010. Operations research improves sales force productivity at IBM. Interfaces 40(1):33–46.
Lewis, E., and Mohler, G. 2011. A nonparametric EM algorithm for multiscale Hawkes processes. Journal of Nonparametric Statistics.
Lewis, E.; Mohler, G.; Brantingham, P. J.; and Bertozzi, A. 2010. Self-exciting point process models of insurgency in Iraq. UCLA CAM Reports 10-38.
Li, L., and Zha, H. 2013. Dyadic event attribution in social networks with mixtures of Hawkes processes. In CIKM, 1667–1672. ACM.
Li, L., and Zha, H. 2014. Learning parametric models for social infectivity in multi-dimensional Hawkes processes. In Twenty-Eighth AAAI Conference on Artificial Intelligence.
Li, L.; Deng, H.; Dong, A.; Chang, Y.; and Zha, H. 2014. Identifying and labeling search tasks via query-based Hawkes processes. In Proceedings of the 20th ACM SIGKDD Conference on Knowledge Discovery and Data Mining.
Liniger, T. J. 2009. Multivariate Hawkes Processes. Ph.D. thesis, Swiss Federal Institute of Technology, Zurich.
Linoff, G. S., and Berry, M. J. A. 2011. Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management. Indianapolis, IN, USA: Wiley Publishing.
Mohler, G. O.; Short, M. B.; Brantingham, P. J.; Schoenberg, F. P.; and Tita, G. E. 2011. Self-exciting point process modeling of crime. Journal of the American Statistical Association 106(493).
Ogata, Y. 1988. Statistical models for earthquake occurrences and residual analysis for point processes. J. Amer. Statist. Assoc. 83(401):9–27.
Ogata, Y. 1998. Space-time point-process models for earthquake occurrences. Annals of the Institute of Statistical Mathematics 50:379–402.
Ozaki, T. 1979. Maximum likelihood estimation of Hawkes' self-exciting point processes. Annals of the Institute of Statistical Mathematics 31(1):145–155.
Rubin, I. 1972. Regular point processes and their detection. IEEE Transactions on Information Theory 18(5):547–557.
Shivaswamy, P. K.; Chu, W.; and Jansche, M. 2007. A support vector approach to censored targets. In ICDM.
Tian, Y.; Yan, J.; Zhang, H.; Zhang, Y.; Yang, X.; and Zha, H. 2012. On the convergence of graph matching: Graduated assignment revisited. In ECCV.
Varshney, K. R., and Singh, M. 2013. Dose-response signal estimation and optimization for salesforce management. In SOLI.
Weber, T. A., and Chehrazi, N. 2012. Dynamic valuation of delinquent credit-card accounts. Technical report, EPFL-CDM-MTEI.
White III, C. C., and White, D. J. 1989. Markov decision processes. European Journal of Operational Research 39(1):1–16.
Yan, J.; Tian, Y.; Zha, H.; Yang, X.; Zhang, Y.; and Chu, S. 2013a. Joint optimization for consistent multiple graph matching. In ICCV.
Yan, J. C.; Wang, Y.; Zhou, K.; Huang, J.; Tian, C. H.; Zha, H. Y.; and Dong, W. S. 2013b. Towards effective prioritizing water pipe replacement and rehabilitation. In IJCAI.
Yan, J. C.; Li, Y.; Liu, W.; Zha, H. Y.; Yang, X. K.; and Chu, S. M. 2014. Graduated consistency-regularized optimization for multi-graph matching. In ECCV.
Zhou, K.; Zha, H.; and Song, L. 2013a. Learning social infectivity in sparse low-rank networks using multi-dimensional Hawkes processes. In AISTATS.
Zhou, K.; Zha, H.; and Song, L. 2013b. Learning triggering kernels for multi-dimensional Hawkes processes. In ICML.
Zliobaite, I.; Bakker, J.; and Pechenizkiy, M. 2009. Towards context aware food sales prediction. In ICDM Workshops (ICDMW '09).