Figure 1: Business scenario illustration for the proposed scoring model. Win propensity for the next two weeks is issued on a weekly basis within one quarter, until week 11. A new weekly scoring procedure then starts with a pipeline cleaning step as the new quarter begins.

[...] are not available to current pipeline systems.

From the domain application perspective, there is an extensive literature (Linoff and Berry 2011) in the field of marketing science in which various selling strategies are characterized and optimized, but the focus is on the business-to-consumer (B2C) domain rather than business-to-business (B2B). In fact, quantitative sales analytics in B2B selling has recently become an emerging and active topic in both industry and the research community (Lawrence et al. 2010; Varshney and Singh 2013). However, the above analysis is mostly performed in a retrospective data mining and knowledge discovery fashion. Specifically, (Lawrence et al. 2010) describes two practical solutions deployed at IBM tailored for identifying whitespace clients, i.e. OnTARGET and Market Alignment Program, by analyzing data from the external market. (Kawas et al. 2013) addresses the problem of efficient sales resource optimization in two steps. The first step involves using training samples of historically won sales opportunities to estimate the sales response function between the salesforce's full-time equivalent (FTE) effort and the expected revenue or profit (Varshney and Singh 2013); the second step involves finding the optimal salesforce allocation subject to business constraints. To our surprise, little work has been done or released on predictive modeling in sales pipeline analytics, especially on estimating the lead-wise win propensity.

This paper attempts to score the lead-level win-propensity for a given forward time window, by using the static profiles and dynamic clues from the sales pipeline. To this end, we propose a profile-specific two-dimensional Hawkes processes model tailored to the problem of estimating the lead-level win-propensity within a forward time window. Our model is able to incorporate static profile features such as lead revenue size, product offering, client industry etc., as well as to capture the dynamic influence from seller to lead, which is observed from their interaction activities including browsing and updating the client-visiting log. The model is implemented and deployed to a real sales pipeline business environment in a multinational Fortune 500 technology company across different product lines, and generated direct revenue impact which is estimated at up to $43.2 million via internal evaluation in year 2013. The research team received the Research Accomplishment Award in year 2013 due to the recognition from the business side.

Figure 2: A seller-lead interaction event sequence ending with a win event. The interaction exhibits temporal clustering patterns and is won after a short period.

Learning the Proposed Model

We specify our problem and practice under the referred company, whereby the goal is to issue and update, every week, the win likelihood of each lead within the time window of the next two weeks from the current week. In particular, the life-cycle of a sales lead is regarded as confined to a business cycle, which is one quarter in this paper. During one quarter (13 weeks), the lead is monitored and scored by the model until week 11. Leads not won by the end of the current quarter are treated in two ways: some of them are identified as garbage leads that are removed at the beginning of the next quarter; the rest are refreshed as new leads together with those newly created in the next quarter.

Motivation and Problem Formulation

For this weekly updated scoring model, the input features on one hand consist of static profile information such as deal size, geography, sector, product and other attributes, as exemplified in Table 1. On the other hand, there is an additional dynamic clue in the form of a seller-lead interaction event sequence associated with each lead within a censored time window. This time window is usually set to the end of a recent previous quarter when building a training dataset, and up to now when performing model scoring on testing data. We elaborate on how to build the training and testing datasets in the rest of the paper.

There are several ways to transform the referred business problem into a machine learning problem. One straightforward way is to use supervised binary classification. Given one quarter of historical weekly snapshot data, one can define the labeled training dataset by lead profile features \(f_1, f_2, \ldots, f_{11}\), including past interactions by sellers, where the subscript denotes the week number, and the corresponding win or no-win outcome within the corresponding two weeks ahead: \(o_{3,4}, o_{4,5}, \ldots, o_{12,13}\). Then a Logistic classification model or other models can be applied.

This approach suffers from several limitations: i) it truncates the observation window to an ad-hoc period which induces the label; ii) the binary classifier is not a dynamic model, and is unable to flexibly capture the dynamics of the lead life-cycle. To improve on this baseline approach, one way is to use a censored classification model, e.g. (Shivaswamy, Chu, and Jansche 2007), or a survival analysis model such as the Cox model (Cox and Oakes 1984) under the point process framework.

In this paper, we are motivated by the specific observation that a more indicative pattern comes from the interactions between sales and pipeline, where the interactions refer to different activities logged by the pipeline system when the seller visits the pipeline web portal, i.e. which lead (s)he
is browsing/updating. More concretely, we find that sellers usually focus on one or a few leads, and thus (s)he may actively interact with them frequently within a short time period. Furthermore, as shown in Fig. 2, such temporal clustering [...]

Table 1: Exemplary features and data types of sales leads.

Profile     Type         Remark or examples
geography   categorical  Greater China, Southeast Asia
deal size   categorical  expected deal size in USD
sector      categorical  general business, industry clients
industry    categorical  health-care, energy and utility
product     categorical  sub-brands of the main brand

Seller-pipeline Interaction Modeling

In the following, we show how to model the dependency of win outcomes on the interaction sequences using a two-dimensional point process model, see (Daley and Vere-Jones 1988) and the references therein. Specifically, we adopt the Hawkes process model (Hawkes 1971) to capture the temporal clustering dynamics. We start with a brief description of one-dimensional Hawkes processes, and then extend it to the multi-dimensional case. In particular, our main work lies in proposing a profile-specific two-dimensional Hawkes processes model and a tailored alternating optimization algorithm to learn the model parameters. For mathematical tractability, the exponential kernel is used in our seller-lead interaction modeling problem.

For the general Hawkes processes model, in its basic form as a one-dimensional point process, its conditional intensity can be expressed as (Hawkes 1971):

\[ \lambda(t) = \mu + a \sum_{i: t_i < t} g(t - t_i), \]

where \(\mu\) is the base intensity and \(t_i\) are the times of events in the process before time \(t\). \(g(t)\) is the kernel that mimics the influence from the previous events. Given an event sequence \(\{t_i\}_{i=1}^{n}\) observed in \([0, T]\), its log-likelihood estimator is

\[ L = \log \frac{\prod_{i=1}^{n} \lambda(t_i)}{\exp\left(\int_0^T \lambda(s)\,ds\right)} = \sum_{i=1}^{n} \log \lambda(t_i) - \int_0^T \lambda(t)\,dt. \]

Extending the above equation to the \(U\)-dimensional case, a multi-dimensional Hawkes process is defined by a \(U\)-dimensional point process, and its conditional intensity for the \(u\)-th dimension is (Zhou, Zha, and Song 2013b)

\[ \lambda_u(t) = \mu_u + \sum_{i: t_i < t} a_{u u_i}\, g_{u u_i}(t - t_i), \]

where \(\lambda_u(t)\) consists of a base intensity term \(\mu_u\) and an accumulative exciting term \(\sum_{i: t_i < t} a_{u u_i} g_{u u_i}(t - t_i)\). It can be interpreted as the instantaneous probability of point occurrence, depending on the previous events across different dimensions.

Now we show how to formulate the problem into a specific machine learning paradigm. Suppose we have \(m\) samples, i.e. \(m\) independent event sequences \(\{c_1, \ldots, c_m\}\) from the multi-dimensional Hawkes process, where each sample, in the form of \(c_s = \{(t_i^s, u_i^s)\}_{i=1}^{n_s}\), is an event sequence of length \(n_s\) occurring during the observation time window \([0, T_s]\). Each pair corresponds to an event occurring in dimension \(u_i^s\) at time \(t_i^s\). We use the following formula for the log-likelihood of general multi-dimensional Hawkes processes, whose parameters can be estimated via maximum likelihood estimation (Rubin 1972; Ozaki 1979):

\[ L = \sum_{s=1}^{m} \left( \sum_{i=1}^{n_s} \log \lambda_{u_i^s}(t_i^s) - \sum_{u=1}^{U} \int_0^{T_s} \lambda_u(t)\,dt \right). \]

Here we collect the parameters into vector-matrix formats, \(\mu = (\mu_u)\) for the base intensity, and \(a = (a_{uu'})\) for the mutually exciting coefficients for dimension \(u\) affected by \(u'\).

Learning Profile-specific Hawkes Processes

In this paper, we have \(U = 2\) processes to model: \(u = 1\), the interaction sequences, and \(u = 2\), the outcome event. In this context, the base term incorporates the inherent interaction intention of salespeople towards the leads: promising leads typically receive more attention from sales, and also have a better chance to win even when no seller interaction is observed. The exciting term is used to properly account for the contributions from much earlier interaction events, which may trigger subsequent interaction events that eventually lead to a win. Specifically, the exciting effects are modeled as decaying over time as \(g_{ij}(t - t') = w_{ij} e^{-w_{ij}(t - t')}\).

Furthermore, our problem at hand bears several more specific characters to explore: i) the mutual influence is only one-way, from the interaction dimension to the outcome dimension, rather than two-way, thus \(a_{12} = 0\) and \(w_{12} = 0\); ii) the self-exciting phenomenon only exists for the interaction events since the outcome event is one-off, thus \(a_{22} = 0\) and \(w_{22} = 0\); iii) for the given lead \(s\), the base intensity is assumed to be associated with its intrinsic attributes \(x^s = [x_1^s, x_2^s, \ldots, x_K^s]^T\), including deal size, channel, age, sales stage and other related profiles, which can be encoded by a parameter vector \(\theta_u = [\theta_{u0}, \theta_{u1}, \ldots, \theta_{uK}]^T\) where \(\theta_{u0}\) is a constant term. Therefore, the training set of leads with different profiles is heterogeneous with regard to the base intensity. We choose the widely used Logistic function scaled by a coefficient \(\mu_{0u}\), i.e. \(\mu_u^s = \frac{\mu_{0u}}{1 + \exp(-\theta_u^T x^s)}\), for both the interaction process (\(u = 1\)) and the final outcome process (\(u = 2\)). For the two dimensions, the parameters of the base intensity can be different, as modeled by \(\{\mu_{01}, \theta_1\}\) and \(\{\mu_{02}, \theta_2\}\) respectively. Note this parametrization only assumes different
leads may have different base intensity, but it still assumes the base intensity is constant over time for a given lead.

The two above facts decouple the mutual influence to a reduced parameter space, while the third character addresses the heterogeneous property of sales leads. Now we formulate the profile-specific decoupled two-dimensional exciting process as follows (by letting \(h_u^s \triangleq h_u(x^s) = \frac{1}{1 + \exp(-\theta_u^T x^s)}\)):

\[ L = L_1(\mu_{01}, \theta_1, a_{11}, w_{11}) + L_2(\mu_{02}, \theta_2, a_{21}, w_{21}) \]

\[ L_1 = \sum_{s=1}^{m} \left( \sum_{i=1}^{n_s - 1} \log\left( \mu_{01} h_1^s + a_{11} \sum_{t_j^s < t_i^s} g_{11}(t_i^s - t_j^s) \right) - T_s \mu_{01} h_1^s - a_{11} \sum_{j=1}^{n_s - 1} G_{11}(T_s - t_j^s) \right) \]

\[ L_2 = \sum_{s=1}^{m} \left( \log\left( \mu_{02} h_2^s + a_{21} \sum_{t_j^s < t_{n_s}^s} g_{21}(t_{n_s}^s - t_j^s) \right) - T_s \mu_{02} h_2^s - a_{21} G_{21}(0) \right) \qquad (1) \]

[...] algorithm (Hunter and Lange 2004) on the surrogate function:

\[ L_1(\mu, a) \geq \sum_{s=1}^{m} \left( \sum_{i=1}^{n_s - 1} \left( p_{ii}^s \log \frac{\mu_{01} h_1^s}{p_{ii}^s} + \sum_{j=1}^{i-1} p_{ij}^s \log \frac{a_{11}\, g_{11}(t_i^s - t_j^s)}{p_{ij}^s} \right) - T_s \mu_{01} h_1^s - a_{11} \sum_{j=1}^{n_s - 1} G_{11}(T_s - t_j^s) \right) \]

Given the lead \(s\), \(p_{ij}^s\) can be interpreted as the likelihood that the \(i\)-th event \((u_i, t_i)\) is affected by the previous \(j\)-th event \((u_j, t_j)\) in the interaction sequence associated with \(s\), and \(p_{ii}^s\) is the likelihood that the \(i\)-th event is sampled from the background intensity. Moreover, the advantage of the surrogate is that the parameters \(\mu_{01}\) and \(a_{11}\) can be solved in closed form, and the non-negativity constraint on \(\mu_{01}\) is automatically satisfied.

Zeroing the partial derivatives \(\frac{\partial L}{\partial \mu_{01}}\) and \(\frac{\partial L}{\partial a_{11}}\) leads to:

\[ \mu_{01}^{(l+1)} = \frac{\sum_{s=1}^{m} \sum_{i=1}^{n_s - 1} p_{ii}^{s\,(l+1)}}{\sum_{s=1}^{m} h_1^s T_s} \qquad (4) \]

\[ a_{11}^{(l+1)} = \frac{\sum_{s=1}^{m} \sum_{i=1}^{n_s - 1} \sum_{j<i} p_{ij}^{s\,(l+1)}}{\sum_{s=1}^{m} \sum_{j=1}^{n_s - 1} G_{11}^{(l)}(T_s - t_j^s)} \qquad (5) \]

Meanwhile, we solve the estimation of the exciting kernel scale parameter \(w_{11}\) in \(g(t - t_j) = w e^{-w(t - t_j)}\): note that \(e^{-w(T - t_i)} \approx 0\) when \(wT \gg 1\), as suggested in (Lewis and Mohler 2011), which shows that \(w\) can be approximated by:

\[ w_{11}^{(l+1)} = \frac{\sum_{s=1}^{m} \sum_{i>j} p_{ij}^{s\,(l)}}{\sum_{s=1}^{m} \sum_{i>j} (t_i - t_j)\, p_{ij}^{s\,(l)}} \qquad (6) \]
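The closed-form updates in Eqs. (4), (5) and (6) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the deployed implementation: the function names (`exp_kernel`, `exp_kernel_integral`, `mm_step`) are hypothetical; the responsibilities are taken in the standard form for Hawkes processes, i.e. background share \(p_{ii}^s = \mu_{01} h_1^s / \lambda(t_i^s)\) and triggering share \(p_{ij}^s = a_{11} g_{11}(t_i^s - t_j^s) / \lambda(t_i^s)\), which is assumed here because Eqs. (2) and (3) are not reproduced in this text; \(G_{11}\) is assumed to be the integral of \(g_{11}\) from 0; and the same responsibilities are reused for all three updates.

```python
import math


def exp_kernel(dt, w):
    """Exponential triggering kernel g(dt) = w * exp(-w * dt)."""
    return w * math.exp(-w * dt)


def exp_kernel_integral(dt, w):
    """G(dt): integral of g over [0, dt] (assumed definition of G)."""
    return 1.0 - math.exp(-w * dt)


def mm_step(sequences, horizons, h, mu0, a, w):
    """One update of (mu0, a, w) with the profile weights h held fixed.

    sequences: list of sorted interaction-time lists, one per lead
    horizons:  list of observation window lengths T_s
    h:         list of logistic profile weights h_1^s in (0, 1)
    """
    sum_pii = 0.0     # numerator of Eq. (4)
    sum_pij = 0.0     # numerator of Eq. (5) and of Eq. (6)
    sum_dt_pij = 0.0  # denominator of Eq. (6)
    sum_G = 0.0       # denominator of Eq. (5)
    for times, T, hs in zip(sequences, horizons, h):
        for i, ti in enumerate(times):
            base = mu0 * hs
            trig = [a * exp_kernel(ti - tj, w) for tj in times[:i]]
            lam = base + sum(trig)          # intensity at event i
            sum_pii += base / lam           # background responsibility
            for tj, g in zip(times[:i], trig):
                pij = g / lam               # triggering responsibility
                sum_pij += pij
                sum_dt_pij += (ti - tj) * pij
            sum_G += exp_kernel_integral(T - ti, w)
    mu0_new = sum_pii / sum(T * hs for T, hs in zip(horizons, h))
    a_new = sum_pij / sum_G if sum_G > 0 else a
    w_new = sum_pij / sum_dt_pij if sum_dt_pij > 0 else w
    return mu0_new, a_new, w_new


if __name__ == "__main__":
    # Two toy leads with made-up interaction times (in days).
    seqs = [[1.0, 1.2, 1.3, 20.0], [5.0, 5.5, 40.0]]
    print(mm_step(seqs, [90.0, 90.0], [0.6, 0.4], mu0=0.1, a=0.5, w=1.0))
```

Because the responsibilities of each event sum to one, the updated background mass plus the updated triggered mass equals the total number of observed events, which is a convenient sanity check when experimenting with such a sketch.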
Algorithm 1 Learning profile-specific decoupled two-dimensional Hawkes processes for lead win-propensity estimation
 1: Input:
 2: Observed training samples, i.e. leads \(\{c_s\}_{s=1}^{m}\), where each lead is associated with an interaction event sequence \(\{t_i\}_{i=1}^{n_s - 1}\) which is tailed with the won time stamp \(t_{n_s}\) if lead \(c_s\) is won within a certain period, e.g. a full quarter;
 3: Profile attributes \(x^s = [x_1^s, x_2^s, \ldots, x_K^s]^T\) associated with lead \(c_s\), as exemplified in Table 1;
 4: Initial values for \(\mu_{01}, \theta_1, a_{11}, w_{11}, \mu_{02}, \theta_2, a_{21}, w_{21}\); \(l = 0\);
 5: Iteration stopping threshold \(L_{max}\), gradient step-size \(\alpha\);
 6: Output: Learned parameters \(\mu_{01}, \theta_1, a_{11}, w_{11}\) for the self-exciting model, and \(\mu_{02}, \theta_2, a_{21}, w_{21}\) for the affecting model.
 7: Procedure:
 8: for \(l = 1 : L_{max}\) do
 9:   // Solving for \(a_{11}, w_{11}, \mu_{01}\) by fixing \(\theta_1\)
10:   Update \(p_{ii}^{s\,(l+1)}, p_{ij}^{s\,(l+1)}\) by Eq. (2) and (3);
11:   Update \(\mu_{01}^{(l+1)}, a_{11}^{(l+1)}, w_{11}^{(l+1)}\) by Eq. (4), (5), (6);
12:   // Solving for \(\theta_1\) by fixing \(a_{11}, w_{11}\) and \(\mu_{01}\)
13:   Update \(\theta_{1k}^{(l+1)}\) by Eq. (10) using the gradients in Eq. (8), (9);
14: end for
15: Apply the same method for solving \(\mu_{02}, \theta_2, a_{21}, w_{21}\).

Apply a gradient step to update \(\theta_{1k}\) (an ascent step, since \(L_1\) is a log-likelihood to be maximized):

\[ \theta_{1k}^{(l+1)} = \theta_{1k}^{(l)} + \alpha \frac{\partial L_1}{\partial \theta_{1k}}, \quad k = 0, 1, \ldots, K \qquad (10) \]

A similar iterative scheme can be performed for the term \(L_2\). Thus we finally obtain the estimates of \(\mu_{01}, \theta_1, a_{11}, w_{11}\) and \(\mu_{02}, \theta_2, a_{21}, w_{21}\) separately. The overall optimization algorithm is summarized in Algorithm 1.

Related Work and Contribution

The Hawkes process dates back to (Hawkes 1971; Ogata 1988). The model partitions the rate of events into background and self-excited components. The background events are statistically independent of one another, while the offspring events are triggered by prior events. Its applicability to time-series or event sequence data has attracted attention from diverse disciplines, e.g. seismology (Ogata 1988; 1998), finance (Weber and Chehrazi 2012), criminology (Lewis et al. 2010; Mohler et al. 2011) and asset management (Yan et al. 2013b; Ertekin, Rudin, and McCormick 2013) and the references therein. In contrast to the above work focusing on one-dimensional Hawkes processes, this paper aims to seek a comprehensive formulation and an effective algorithm for profile-specific multi-dimensional Hawkes processes, which is a relatively new topic with several very recent works (Liniger 2009; Zhou, Zha, and Song 2013a; 2013b; Li and Zha 2014; 2013; Li et al. 2014).

Technical innovation. Compared with the above mentioned work related to Hawkes processes, all parameters in our model are assumed unknown and estimated by our proposed algorithm. In contrast, (Zhou, Zha, and Song 2013a) assumes that the bandwidth \(w_{ij}\) of the self(mutual)-exciting kernel is known, and that the background intensity \(\mu_u\) is a constant parameter for all samples, which ignores the heterogeneity of \(\mu_u\) in real-world problems. This simplification is also used in (Zhou, Zha, and Song 2013b; Li and Zha 2014), and the latter work instead parameterizes the mutual influence \(a_{ij}\) via latent variables to reduce the model space induced by a large number of dimensions for their social infectivity analysis. In contrast, we address the inherent heterogeneity by parameterizing lead attributes in a tractable optimization scheme. Note that for a given lead with known profile attributes, our model assumes the background is a stationary point process equivalent to a Poisson process. This is because our practical problem is confined to a relatively short business period, e.g. one quarter, so a secular trend rarely exists. Thus we do not need to perform the background model fitting using different non-stationary assumptions as in (Lewis et al. 2010).

Impact on real-world problems. As far as we know, this is the first work to establish a modern machine learning paradigm, i.e. profile-specific two-dimensional Hawkes processes and a learning algorithm, for applications to sales pipeline prediction. Though there are a few precedent statistical methods (Zliobaite, Bakker, and Pechenizkiy; Chen et al. 2010) for sales analytics, these methods and applications differ significantly from ours in that the historical event sequences (sales interactions) are not captured. For instance, one straightforward way is to collect basic statistics of events over a certain time window, such as sum, variance etc. However, this aggregation causes information loss, which hurts the potential for more advanced predictive modeling. Furthermore, our method can also be easily generalized to other practical problems. For instance, in asset management, given a sequence of different types of failure events associated with the asset, \(\{a_{ij}, w_{ij}\}\) can model the mutual impact between different failure types, and \(\mu_u(x)\) can model the background failure rate related to the asset profile \(x\) and failure type \(u\). We have seen the early success of recent work on predictive maintenance for urban pipe networks (Yan et al. 2013b) and the power grid (Ertekin, Rudin, and McCormick 2013), whereby only a one-dimensional Hawkes process is adopted with a constant background rate, which ignores the type of failures and the diversity of each sample. The model proposed in this paper is more promising as it is flexible enough to incorporate the rich types of failures (e.g. leak, burst for pipe failure), as well as to handle the heterogeneity of the background rate with a parameterized profile model (e.g. considering the diversity of material type, diameter and age for each pipe). Other potential applications can also be found, such as client purchase life-cycle analysis, where each type of item can take one dimension and the background rate is personalized by the customer profile features.

Deployment and Evaluation

We perform our study on a Fortune 500 multinational technology company in the B2B market environment. Throughout this section, due to the sensitivity of the proprietary company-owned selling data, we de-identify the brand name and other profile information, and only report relative metrics such as the AUC score. Our model was finished at the end of 2013Q2. To make an unbiased performance evaluation, the model was evaluated in 2013Q3 with blind testing data
Table 2: AUC for win prediction on blind test data 2013Q3. The score for each lead is generated by integrating its win intensity \(\lambda_{win}\) over the next two weeks. BL denotes Business Line, HW (SW) Hardware (Software).

Market  BL  Lead #  Win%   Sales  Logit  Cox   TKL   Alg.1
New     HW  200K    15.7%  .608   .675   .678  .617  .707
New     SW  100K    12.5%  .586   .659   .661  .614  .701
Mature  HW  200K    19.5%  .665   .711   .718  .687  .741
Mature  SW  150K    14.8%  .649   .703   .709  .671  .732
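
The scoring rule in the caption of Table 2, integrating the win intensity over the next two weeks, can be sketched as follows for the exponential kernel. This is an illustrative sketch under stated assumptions rather than the deployed scorer: the function name `win_score` and all parameter values are hypothetical, time is measured in days, no further interactions are assumed to occur inside the forward window, and the conversion of the integrated intensity \(\Lambda\) into a probability via \(1 - e^{-\Lambda}\) is an added assumption (the usual first-event probability of a point process), not something stated in the text.

```python
import math


def win_score(interaction_times, now, horizon, mu02, h2, a21, w21):
    """Integrate the win intensity over [now, now + horizon].

    The win intensity is taken as
        mu02 * h2 + a21 * sum_j w21 * exp(-w21 * (t - t_j)),
    summed over interactions t_j observed up to `now`.
    Returns (integrated intensity, 1 - exp(-integrated intensity)).
    """
    # Constant background contribution over the window.
    integrated = mu02 * h2 * horizon
    # Closed-form integral of each exponential kernel over the window.
    for tj in interaction_times:
        if tj <= now:
            start = math.exp(-w21 * (now - tj))
            end = math.exp(-w21 * (now + horizon - tj))
            integrated += a21 * (start - end)
    return integrated, 1.0 - math.exp(-integrated)


if __name__ == "__main__":
    # A lead touched three times in the last week, scored for the next 14 days.
    print(win_score([24.0, 27.5, 29.0], now=30.0, horizon=14.0,
                    mu02=0.01, h2=0.5, a21=0.3, w21=0.2))
```

Under this sketch, a lead with recent seller interactions receives a higher score than an otherwise identical lead whose interactions are old, which is the temporal-clustering effect the model is meant to capture.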
Table 3: AUC on interference test data 2013Q4.

Market  BL  Lead #  Win%   Sales  Logit  Cox   TKL   Alg.1
New     HW  200K    18.3%  .628   .681   .680  .612  .729
New     SW  100K    15.1%  .618   .672   .664  .604  .715
Mature  HW  200K    21.3%  .689   .727   .721  .693  .751
Mature  SW  150K    18.2%  .680   .731   .712  .689  .749

Conclusion

We have presented a modern machine learning method for sales pipeline win prediction, which has been deployed in a multinational Fortune 500 B2B-selling company. The proposed method is applicable to other real-world problems due to its generality and flexibility, as discussed in the paper. We hope this paper draws timely and wide attention from industry, as selling is essential to most businesses.

References

Chen, C.-Y.; Lee, W.-I.; Kuo, H.-M.; Chen, C.-W.; and Chen, K.-H. 2010. The study of a forecasting sales model for fresh food. Expert Systems with Applications.

Cox, D. R., and Oakes, D. 1984. Analysis of survival data, volume 21. CRC Press.

Daley, D. J., and Vere-Jones, D. 1988. An introduction to the theory of point processes, volume 2. Springer.

Ertekin, S.; Rudin, C.; and McCormick, T. H. 2013. Reactive point processes: A new approach to predicting power failures in underground electrical systems.

Hawkes, A. G. 1971. Spectra of some self-exciting and mutually exciting point processes. Biometrika.

Hunter, D. R., and Lange, K. 2004. A tutorial on MM algorithms. The American Statistician 58(1):30-37.

Kawas, B.; Squillante, M. S.; Subramanian, D.; and Varshney, K. R. 2013. Prescriptive analytics for allocating sales teams to opportunities. In ICDM Workshop.

Kober, J., and Peters, J. 2012. Reinforcement learning in robotics: A survey. In Reinforcement Learning. Springer. 579-610.

Lawrence, R.; Perlich, C.; Rosset, S.; Khabibrakhmanov, I.; Mahatma, S.; Weiss, S.; Callahan, M.; Collins, M.; Ershov, A.; and Kumar, S. 2010. Operations research improves sales force productivity at IBM. Interfaces 40(1):33-46.

Lewis, E., and Mohler, E. 2011. A nonparametric EM algorithm for multiscale Hawkes processes. Journal of Nonparametric Statistics.

Lewis, E.; Mohler, G.; Brantingham, P. J.; and Bertozzi, A. 2010. Self-exciting point process models of insurgency in Iraq. UCLA CAM Reports 10-38.

Li, L., and Zha, H. 2013. Dyadic event attribution in social networks with mixtures of Hawkes processes. In CIKM, 1667-1672. ACM.

Li, L., and Zha, H. 2014. Learning parametric models for social infectivity in multi-dimensional Hawkes processes. In Twenty-Eighth AAAI Conference on Artificial Intelligence.

Li, L.; Deng, H.; Dong, A.; Chang, Y.; and Zha, H. 2014. Identifying and labeling search tasks via query-based Hawkes processes. In Proceedings of the 20th ACM SIGKDD Conference on Knowledge Discovery and Data Mining.

Liniger, T. J. 2009. Multivariate Hawkes processes. PhD thesis, Swiss Federal Institute of Technology, Zurich.

Linoff, G. S., and Berry, M. J. A. 2011. Data mining techniques: For marketing, sales, and customer relationship management. Indianapolis, IN, USA: Wiley Publishing.

Mohler, G. O.; Short, M. B.; Brantingham, P. J.; Schoenberg, F. P.; and Tita, G. E. 2011. Self-exciting point process modeling of crime. Journal of the American Statistical Association 106(493).

Ogata, Y. 1988. Statistical models for earthquake occurrences and residual analysis for point processes. J. Amer. Statist. Assoc. 83(401):9-27.

Ogata, Y. 1998. Space-time point-process models for earthquake occurrences. Annals of the Institute of Statistical Mathematics 50:379-402.

Ozaki, T. 1979. Maximum likelihood estimation of Hawkes' self-exciting point processes. Annals of the Institute of Statistical Mathematics 31(1):145-155.

Rubin, I. 1972. Regular point processes and their detection. Information Theory, IEEE Transactions on 18(5):547-557.

Shivaswamy, P. K.; Chu, W.; and Jansche, M. 2007. A support vector approach to censored targets. In ICDM.

Tian, Y.; Yan, J.; Zhang, H.; Zhang, Y.; Yang, X.; and Zha, H. 2012. On the convergence of graph matching: Graduated assignment revisited. In ECCV.

Varshney, K. R., and Singh, M. 2013. Dose-response signal estimation and optimization for salesforce management. In SOLI.

Weber, T. A., and Chehrazi, N. 2012. Dynamic valuation of delinquent credit-card accounts. Technical report, EPFL-CDM-MTEI.

White III, C. C., and White, D. J. 1989. Markov decision processes. European Journal of Operational Research 39(1):1-16.

Yan, J.; Tian, Y.; Zha, H.; Yang, X.; Zhang, Y.; and Chu, S. 2013a. Joint optimization for consistent multiple graph matching. In ICCV.

Yan, J. C.; Wang, Y.; Zhou, K.; Huang, J.; Tian, C. H.; Zha, H. Y.; and Dong, W. S. 2013b. Towards effective prioritizing water pipe replacement and rehabilitation. In IJCAI.

Yan, J. C.; Li, Y.; Liu, W.; Zha, H. Y.; Yang, X. K.; and Chu, S. M. 2014. Graduated consistency-regularized optimization for multi-graph matching. In ECCV.

Zhou, K.; Zha, H.; and Song, L. 2013a. Learning social infectivity in sparse low-rank networks using multi-dimensional Hawkes processes. In AISTATS.

Zhou, K.; Zha, H.; and Song, L. 2013b. Learning triggering kernels for multi-dimensional Hawkes processes. In ICML.

Zliobaite, I.; Bakker, J.; and Pechenizkiy, M. Towards context aware food sales prediction. In ICDMW09.