+

Week 4 – Predicting Loan Repayment and Customer Churn with Logistic Regression and LDA
IEOR 242 – Applications in Data Analysis
Spring 2020 – George Ng

IEOR 242, Spring 2020 - Week 4
+ 2

Week 3 Recap

n If the dependent variable is continuous → regression, such as linear regression

n If the dependent variable is binary (i.e. yes/no) → classification, such as logistic regression

n Business objectives will inform which performance measures are most relevant

n Many statistical measures to assess performance
n Each is related to business performance, but not a perfect match
n Often it is very difficult to cleanly translate the objective into a tractable optimization problem → general model quality is correlated with business objectives…

n Many different modelling approaches (and we’ll discuss many more) – each with pros and cons, and each works best under certain conditions

+ 3

Announcements

n Final Project Groups – need to finalize ASAP

n Project topic requires review by the Instructor or a GSI – one-pager due March 9

+ 4

Today’s Agenda

n Guest Lecture on Risk Modelling for Cyber Perils

n Predicting “Churn”

n ROC Analysis

n Linear Discriminant Analysis

+ 5

Customer Retention/Churn in Telecom Companies

n The US telecom market is pretty much saturated

n It is a lot harder to acquire a new customer than it is to retain an existing one

n The cost of acquiring a new customer is typically five times the cost of retaining a customer

+ 6

Churn Rates of the Top Four Carriers

Carrier          Q3 2015 Monthly Churn    2015 Annualized Churn
Verizon          1.21%                    13.6%
AT&T             1.33%                    14.8%
Sprint Nextel    2.76%                    28.5%
T-Mobile         2.41%                    25.4%

Source: FierceWireless

+ 7

Customer Retention

n Call Comcast at the end of your contract and request a discount for re-upping

n Often they’ll offer a discount instantly! (such as a $100 Visa gift card!)

+ 8

Price Reduction Decision Tree


n The Marketing division has determined that a 20% price reduction halves
the risk of churn

n If the customer churns, the company makes $0 from this customer in the future

n If the customer does not churn, the company makes $1000 (under no price
reduction) or $800 (if they give the price reduction)

Reduce Price:
  Churn (prob p/2) → $0
  No Churn (prob 1 - p/2) → $800
  E[profit] = 800 × (1 - p/2)

Don’t Reduce Price:
  Churn (prob p) → $0
  No Churn (prob 1 - p) → $1000
  E[profit] = 1000 × (1 - p)
+ 9

Identifying High-Risk Customers


n A customer is a high churn-risk if we expect to make money from
a price reduction (corresponds to a high probability of churn)

n A customer is a low churn-risk if we expect to lose money from a


price reduction (corresponds to a low probability of churn)

n From the decision tree, our expected profit with a price reduction is 800 × (1 - p/2), and our expected profit with no price reduction is 1000 × (1 - p)

n Setting 800 × (1 - p/2) = 1000 × (1 - p) and solving gives the break-even point p = 1/3 ≈ 0.333
n Customers with churn probability above 0.333 are dubbed high churn-risk
n Customers with churn probability below 0.333 are dubbed low churn-risk

n We will only offer price reductions to high churn-risk customers
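The break-even logic above can be sketched in a few lines. This is an illustrative Python sketch (the course itself uses R); the dollar amounts and the assumption that the discount halves churn risk come from the slide, not from any general rule.

```python
# Expected profit under each action, using the slide's assumptions:
# a 20% price cut halves the churn probability; profits are $800
# (with the cut) vs $1000 (without), and $0 if the customer churns.

def expected_profit_reduce(p):
    """Expected profit if we offer the price reduction (churn prob p/2)."""
    return 800 * (1 - p / 2)

def expected_profit_keep(p):
    """Expected profit if we keep the original price (churn prob p)."""
    return 1000 * (1 - p)

# Break-even: 800*(1 - p/2) = 1000*(1 - p)  =>  600p = 200  =>  p = 1/3
p_star = 200 / 600

def offer_discount(p):
    """Offer the discount only when it has the higher expected profit."""
    return expected_profit_reduce(p) > expected_profit_keep(p)
```

At `p_star` the two expected profits coincide; customers above it are exactly the high churn-risk group the slide describes.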


+ 10

Churn, continued

n More broadly, this is about Customer Relationship Management (CRM)

n Approaches to limit churn
n Special offers, etc.

n Our dataset:
n A telecom working with Watson Analytics
n Data on whether customers churn in a given year

+ 11

Customer Retention Data
+ 12

Customer Retention Data

n Dependent variable: whether the customer left after their contract (1 or 0)
n Independent variables:
n Monthly charges ($)
n Is the customer a senior citizen? (1 or 0)
n Payment method for account (electronic check,
mailed check, bank transfer, credit card)
n Internet service (DSL, Fiber, or None)
n Months as a customer
n Contract type (month-to-month, 1-year, or 2-year)

+ 13

Customer Retention Data


p = # of independent variables (p=6); n = # of observations (n=7032)
Churn MonthlyCharges SeniorCitizen PaymentMethod InternetService tenure Contract
1 0 29.85 0 Electronic check DSL 1 Month-to-month
2 0 56.95 0 Mailed check DSL 34 One year
3 1 53.85 0 Mailed check DSL 2 Month-to-month
4 0 42.30 0 Bank transfer DSL 45 One year
5 1 70.70 0 Electronic check Fiber optic 2 Month-to-month
6 1 99.65 0 Electronic check Fiber optic 8 Month-to-month
7 0 89.10 0 Credit card Fiber optic 22 Month-to-month
8 0 29.75 0 Mailed check DSL 10 Month-to-month
9 1 104.80 0 Electronic check Fiber optic 28 Month-to-month
10 0 56.15 0 Bank transfer DSL 62 One year
11 0 49.95 0 Mailed check DSL 13 Month-to-month
12 0 18.95 0 Credit card No 16 Two year
13 0 100.35 0 Credit card Fiber optic 58 One year
14 1 103.70 0 Bank transfer Fiber optic 49 Month-to-month
15 0 105.50 0 Electronic check Fiber optic 25 Month-to-month
16 0 113.25 0 Credit card Fiber optic 69 Two year
... ... ... ... ... ... ... ...
7028 0 84.80 0 Mailed check DSL 24 One year
7029 0 103.20 0 Credit card Fiber optic 72 One year
7030 0 29.60 0 Electronic check DSL 11 Month-to-month
7031 1 74.40 1 Mailed check Fiber optic 4 Month-to-month
7032 0 105.65 0 Bank transfer Fiber optic 66 Two year

+ 14

Training and Testing Set (Random Split)

Full Dataset (n=7,032):
Churn MonthlyCharges ... Contract
1 0 29.85 ... Month-to-month
2 0 56.95 ... One year
3 1 53.85 ... Month-to-month
4 0 42.30 ... One year
5 1 70.70 ... Month-to-month
6 1 99.65 ... Month-to-month
7 0 89.10 ... Month-to-month
8 0 29.75 ... Month-to-month
9 1 104.80 ... Month-to-month
10 0 56.15 ... One year
11 0 49.95 ... Month-to-month
12 0 18.95 ... Two year
13 0 100.35 ... One year
... ... ... ... ...
7031 1 74.40 ... Month-to-month
7032 0 105.65 ... Two year

70% Training Set (n=4,922):
Churn MonthlyCharges ... Contract
1 0 29.85 ... Month-to-month
2 0 56.95 ... One year
3 1 53.85 ... Month-to-month
5 1 70.70 ... Month-to-month
6 1 99.65 ... Month-to-month
7 0 89.10 ... Month-to-month
8 0 29.75 ... Month-to-month
10 0 56.15 ... One year
11 0 49.95 ... Month-to-month
13 0 100.35 ... One year
... ... ... ... ...
7031 1 74.40 ... Month-to-month

30% Testing Set (n=2,110):
Churn MonthlyCharges ... Contract
4 0 42.30 ... One year
9 1 104.80 ... Month-to-month
12 0 18.95 ... Two year
... ... ... ... ...
7032 0 105.65 ... Two year

+ 15

Logistic Regression for Churn


Probability Prediction
n The dependent variable is a Bernoulli random variable (think success or failure): Y = 0 or 1
n Y = 1 if “success” – customer churns
n Y = 0 if “failure” – customer does not churn

n How do we deal with variables like PaymentMethod and InternetService?

n We will walk through training this model in R during the discussion section
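Categorical variables like PaymentMethod enter the model as 0/1 dummy variables, one per level minus a reference level; R's glm() does this automatically for factors. A minimal hand-rolled sketch of that encoding (Python for illustration), using level names from this dataset:

```python
def dummy_encode(values, prefix, reference):
    """One-hot encode a categorical column, dropping the reference level
    (its effect is absorbed into the intercept, as in R's glm())."""
    levels = sorted(set(values) - {reference})
    return [{prefix + lvl: int(v == lvl) for lvl in levels} for v in values]

methods = ["Electronic check", "Mailed check", "Bank transfer", "Credit card"]
rows = dummy_encode(methods, prefix="PaymentMethod", reference="Bank transfer")
# Each row now carries three 0/1 columns, e.g. "PaymentMethodElectronic check",
# matching the dummy-variable names in the regression output on the next slide.
```

A customer at the reference level ("Bank transfer" here) gets all zeros, so its effect lives in the intercept.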

+ 16

Logistic Regression Output


Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.505894 0.208913 -2.422 0.015454 *
MonthlyCharges 0.001934 0.003601 0.537 0.591318
SeniorCitizen 0.337874 0.097374 3.470 0.000521 ***
PaymentMethodCredit card -0.217340 0.135296 -1.606 0.108186
PaymentMethodElectronic check 0.313616 0.110324 2.843 0.004474 **
PaymentMethodMailed check -0.106306 0.131246 -0.810 0.417954
InternetServiceFiber optic 0.979730 0.153601 6.378 1.79e-10 ***
InternetServiceNo -0.798349 0.178696 -4.468 7.91e-06 ***
tenure -0.032528 0.002569 -12.660 < 2e-16 ***
ContractOne year -0.701521 0.125113 -5.607 2.06e-08 ***
ContractTwo year -1.616726 0.213029 -7.589 3.22e-14 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 5699.5  on 4921  degrees of freedom
Residual deviance: 4208.7  on 4911  degrees of freedom
AIC: 4230.7

Number of Fisher Scoring iterations: 6

R call used to produce this output:
glm.fit <- glm(Churn ~ MonthlyCharges + SeniorCitizen + …, data = churnData.train, family = ‘binomial’)
summary(glm.fit)

+ 17

Logistic Regression Output after removing Monthly Charges
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.414419 0.120689 -3.434 0.000595 ***
SeniorCitizen 0.336613 0.097339 3.458 0.000544 ***
PaymentMethodCredit card -0.217518 0.135288 -1.608 0.107875
PaymentMethodElectronic check 0.314519 0.110313 2.851 0.004356 **
PaymentMethodMailed check -0.107838 0.131221 -0.822 0.411186
InternetServiceFiber optic 1.046986 0.089274 11.728 < 2e-16 ***
InternetServiceNo -0.856982 0.141254 -6.067 1.30e-09 ***
tenure -0.032076 0.002425 -13.227 < 2e-16 ***
ContractOne year -0.693253 0.124135 -5.585 2.34e-08 ***
ContractTwo year -1.604812 0.211843 -7.575 3.58e-14 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 5699.5  on 4921  degrees of freedom
Residual deviance: 4209.0  on 4912  degrees of freedom
AIC: 4229

Number of Fisher Scoring iterations: 6

R call used to produce this output:
glm.fit <- glm(Churn ~ SeniorCitizen + PaymentMethod + …, data = churnData.train, family = ‘binomial’)
summary(glm.fit)

+ 18

Logistic Regression Output after also removing PaymentMethodCredit card
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.520885 0.083921 -6.207 5.40e-10 ***
SeniorCitizen 0.338071 0.097318 3.474 0.000513 ***
ElectronicCheck 0.419207 0.080747 5.192 2.08e-07 ***
InternetServiceFiber optic 1.048240 0.088697 11.818 < 2e-16 ***
InternetServiceNo -0.855461 0.139527 -6.131 8.72e-10 ***
tenure -0.032051 0.002367 -13.542 < 2e-16 ***
ContractOne year -0.693605 0.124083 -5.590 2.27e-08 ***
ContractTwo year -1.607798 0.211735 -7.593 3.11e-14 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 5699.5  on 4921  degrees of freedom
Residual deviance: 4211.6  on 4914  degrees of freedom
AIC: 4227.6

Number of Fisher Scoring iterations: 6

R call used to produce this output:
glm.fit <- glm(Churn ~ SeniorCitizen + ElectronicCheck + …, data = churnData.train, family = ‘binomial’)
summary(glm.fit)

+ 19

Price Reduction Decision Tree


n The Marketing division has determined that a 20% price reduction halves
the risk of churn

n If the customer churns, the company makes $0 from this customer in the future

n If the customer does not churn, the company makes $1000 (under no price
reduction) or $800 (if they give the price reduction)

Reduce Price:
  Churn (prob p/2) → $0
  No Churn (prob 1 - p/2) → $800

Don’t Reduce Price:
  Churn (prob p) → $0
  No Churn (prob 1 - p) → $1000
+ 20

Using the Logistic Regression Model
n Recall that the break-even point is p = 0.333
n Customers with churn probability above 0.333 are
dubbed high churn-risk
n Customers with churn probability below 0.333 are
dubbed low churn-risk

n Let’s use the logistic regression model to identify high churn-risk and low churn-risk customers in the test set

+ 21

Identifying Churn-Risk Customers in the Test Set

High Churn-Risk: p = Pr(Yi = 1) > 0.333
Churn SeniorCitizen ... Contract Pr(Yi = 1) Churn-Risk
4 0 0 ... One year 0.066 0
9 1 0 ... Month-to-month 0.512 1
12 0 0 ... Two year 0.029 0
15 0 0 ... Month-to-month 0.536 1
28 1 0 ... Month-to-month 0.467 1
29 0 0 ... Two year 0.012 0
30 1 0 ... Month-to-month 0.256 0
37 1 0 ... Month-to-month 0.687 1
38 0 0 ... Month-to-month 0.279 0
39 1 0 ... Month-to-month 0.464 1
48 1 0 ... Month-to-month 0.707 1
49 0 0 ... Two year 0.022 0
52 1 0 ... Month-to-month 0.512 1
... ... ... ... ... ... ...
7017 0 0 ... Month-to-month 0.281 0
7021 0 1 ... One year 0.067 0
7025 0 0 ... Month-to-month 0.480 1
7030 0 0 ... Month-to-month 0.388 1
7032 0 0 ... Two year 0.039 0

+ 22

Confusion Matrix

Accuracy = (1238 + 418) / (1238 + 311 + 143 + 418) = 0.785

True Positive Rate (TPR) = True Positives / All Positives = 418 / (418 + 143) = 0.745

False Positive Rate (FPR) = False Positives / All Negatives = 311 / (1238 + 311) = 0.201

                               Model
                   Low Churn-Risk        High Churn-Risk
                   Pr(Yi = 1) < 0.333    Pr(Yi = 1) > 0.333
Reality
No Churn (Yi = 0)       1,238                  311
Churn (Yi = 1)            143                  418
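The three statistics on this slide follow directly from the four cells of the confusion matrix. A short sketch (Python for illustration), plugging in the counts above:

```python
def confusion_metrics(tn, fp, fn, tp):
    """Accuracy, TPR, and FPR computed from confusion-matrix counts."""
    accuracy = (tn + tp) / (tn + fp + fn + tp)
    tpr = tp / (tp + fn)   # true positives / all positives
    fpr = fp / (fp + tn)   # false positives / all negatives
    return accuracy, tpr, fpr

# Counts from the matrix above (threshold 0.333 on the test set).
acc, tpr, fpr = confusion_metrics(tn=1238, fp=311, fn=143, tp=418)
```

Note the denominators: TPR is normalized by the true churners (the bottom row), FPR by the true non-churners (the top row), not by the model's predictions.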

+ 23

Many Customer Retention Inducements

n The Marketing Division has many different customer retention inducements available to them:
n Smaller/larger discounts
n Free services (with limited-time offers)
n Options focused only on certain customers (e.g. fiber optic customers, or senior citizens, or …)

n Each option would require its own decision tree, break-even threshold calculation, and confusion matrix

n Let us now see how we can assess how well the model performs across all break-even thresholds

+ 24

Receiver Operating Characteristic (ROC) Analysis
+ 25

Receiver Operating Characteristic (ROC) Analysis

n The ROC curve captures the trade-off between false positive rates and true positive rates as the break-even threshold used to make the confusion matrix is varied

n Some history: the ROC curve was first used in World War II for the analysis of radar signals for aircraft. In the 1950s, ROC curves were used in signal detection more broadly

+ 26

Logistic Model with break-even threshold p = 0.333

                              Model
                  Low Churn-Risk       High Churn-Risk
                  Pr(Y = 1) < 0.333    Pr(Y = 1) > 0.333
Reality
No Churn (Y = 0)       1,238                 311
Churn (Y = 1)            143                 418

[ROC plot: single point at FPR = 0.201, TPR = 0.745]


+ 27

Logistic Model with break-even threshold p = 0.50

                              Model
                  Low Churn-Risk      High Churn-Risk
                  Pr(Y = 1) < 0.50    Pr(Y = 1) > 0.50
Reality
No Churn (Y = 0)       1,410                139
Churn (Y = 1)            278                283

[ROC plot: single point at FPR = 0.090, TPR = 0.504]


+ 28

Logistic Model with break-even threshold p = 0.99

                              Model
                  Low Churn-Risk      High Churn-Risk
                  Pr(Y = 1) < 0.99    Pr(Y = 1) > 0.99
Reality
No Churn (Y = 0)       1,549                  0
Churn (Y = 1)            561                  0

[ROC plot: single point at FPR = 0.000, TPR = 0.000]


+ 29

Logistic Model with break-even threshold p = 0.10

                              Model
                  Low Churn-Risk      High Churn-Risk
                  Pr(Y = 1) < 0.10    Pr(Y = 1) > 0.10
Reality
No Churn (Y = 0)         724                825
Churn (Y = 1)             31                530

[ROC plot: single point at FPR = 0.533, TPR = 0.945]


+ 30

Logistic Model with break-even threshold p = 0.001

                              Model
                  Low Churn-Risk       High Churn-Risk
                  Pr(Y = 1) < 0.001    Pr(Y = 1) > 0.001
Reality
No Churn (Y = 0)           0               1,549
Churn (Y = 1)              0                 561

[ROC plot: single point at FPR = 1.00, TPR = 1.00]


+ 31

ROC Curve
n The ROC curve plots the TPR and FPR for every break-even threshold p between 0.0 and 1.0

n The ROC curve can also be drawn for LDA, and for any method that predicts probabilities
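Tracing out the ROC curve is just the confusion-matrix computation repeated at every threshold. A small self-contained sketch (Python for illustration) with made-up probabilities, not the model's actual output:

```python
def roc_points(probs, labels, thresholds):
    """Return (FPR, TPR) pairs as the classification threshold is varied."""
    pos = sum(labels)              # number of true positives in the data
    neg = len(labels) - pos        # number of true negatives
    pts = []
    for t in thresholds:
        tp = sum(p > t and y == 1 for p, y in zip(probs, labels))
        fp = sum(p > t and y == 0 for p, y in zip(probs, labels))
        pts.append((fp / neg, tp / pos))
    return pts

# Hypothetical predicted churn probabilities and true labels.
probs = [0.05, 0.20, 0.40, 0.60, 0.90]
labels = [0, 0, 1, 0, 1]
pts = roc_points(probs, labels, [0.0, 0.333, 0.5, 1.0])
# Threshold 0 flags everyone as positive, giving the (1, 1) corner;
# threshold 1 flags no one, giving (0, 0) — the two ends of every ROC curve.
```

Sweeping the threshold densely between 0 and 1 fills in the curve between those two corners.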
[Figure: ROC curve of the logistic model – TPR vs. FPR over all thresholds]
+ 32

The Naïve Baseline Model


n Naïve baseline model:
n Choose a value of p between 0.0 and 1.0
n Then classify every observation as Success with probability p and as Failure with probability 1 - p

n This model does not consider any of the data
n (Analogous to the naïve baseline of the linear regression model)

n The naïve baseline model has true positive rate p, because a p proportion of the positive observations will be correctly classified as positive

n The naïve model has false positive rate p, because a p proportion of the negative observations will be incorrectly classified as positive

n What is the accuracy of the naïve baseline model?

+ 33

Naïve Baseline Model with p = 0.20

n TPR = 0.2
n FPR = 0.2

[ROC plot: point at (0.2, 0.2) on the 45-degree diagonal]
+ 34

Naïve Baseline Model with p = 0.50

n TPR = 0.5
n FPR = 0.5

[ROC plot: point at (0.5, 0.5) on the 45-degree diagonal]


+ 35

Naïve Baseline Model with p = 0.80

n TPR = 0.8
n FPR = 0.8

[ROC plot: point at (0.8, 0.8) on the 45-degree diagonal]


+ 36

ROC Curve and Naïve Baseline Curve

[Figure: the logistic model’s ROC curve plotted together with the naïve baseline’s 45-degree diagonal]


+ 37

Area Under the ROC Curve (AUC)

n We want ROC curves that simultaneously achieve a high TPR and a low FPR
n This corresponds to a high area under the ROC curve, which is called the AUC (Area Under the Curve)
n Maximum AUC: 1.000
n AUC of our model: 0.841
n AUC of naïve baseline: 0.500

n Similar in spirit to R² (and OSR²), the AUC is a unit-free measure of model quality that is used to evaluate the overall quality of a logistic regression model

+ 38

Interpreting the AUC

n Interpretation: given a randomly selected positive observation (a customer who churned) and a randomly selected negative observation (a customer who did not churn), the AUC is the likelihood that the model would correctly differentiate which is which
n That is, it is the likelihood that the model would assign a higher churn probability to the customer who churned

n The AUC measures the model’s discriminative ability
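The pairwise interpretation is directly computable: compare every (churner, non-churner) pair and count how often the churner receives the higher predicted probability, with ties counting one half. A sketch with toy numbers (Python for illustration):

```python
from itertools import product

def pairwise_auc(probs, labels):
    """AUC = Pr(a random positive is scored higher than a random negative)."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0
               for pp, pn in product(pos, neg))
    return wins / (len(pos) * len(neg))

# Toy scores: the model ranks one churner (0.35) below one non-churner (0.4),
# so 3 of the 4 (positive, negative) pairs are ordered correctly.
auc = pairwise_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

This pairwise count agrees with the area under the ROC curve; a model that ranks every churner above every non-churner attains AUC = 1.0.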

+ 39

Linear Discriminant Analysis
+ 40

Linear Discriminant Analysis

n Do we really need another model for predicting probabilities?
n Yes and no…

n Typically logistic regression is preferred

n Linear discriminant analysis has some advantages in certain situations
n LDA is popular for multiclass classification – when there are more than 2 response categories
n LDA is more “stable” in certain situations

+ 41

Linear Discriminant Analysis

n The dependent variable can take on one of K possible values: Y ∈ {1, …, K}

n Model:
n Assume that we know the prior marginal probability distribution of Y:
  π_k = Pr(Y = k)
n Given that Y = k, we know the distribution of the feature vector X
n (Say X has density function f_k(x))

+ 42

Linear Discriminant Analysis

n LDA assumes that the data was generated according to the following “generative process”:
n Sample Y according to the probabilities (π_1, π_2, …, π_K)
n Given Y = k, sample X from the density function f_k(x), i.e., from Pr(X | Y = k)

n This process is repeated for each data point

n Note that this process “flips around” how prediction works – we are usually given X and asked to predict Y
n This makes sense in some situations, e.g., when Y corresponds to a person’s biological sex assigned at birth

+ 44

Linear Discriminant Analysis

n Linear Discriminant Analysis makes the following additional assumptions:
n Given that Y = k, X is normally distributed with mean vector µ_k
n Regardless of which class we belong to (i.e., the value of Y), X has the same variance/correlations (covariance matrix)

n When p = 1, these assumptions are:
n Given Y = k, X ∼ N(µ_k, σ²)  (σ² does not depend on k)
n In other words,

  f_k(x) = (1 / (√(2π) σ)) · exp( −(1 / (2σ²)) (x − µ_k)² )
+ 45

Where does the “Linear” come from?

n Again, assume p = 1, and given Y = k, X ∼ N(µ_k, σ²)

n Define the discriminant functions δ_k(x)

n Then, if we are given X = x, choosing k to maximize

  Pr(Y = k | X = x) = π_k f_k(x) / Σ_{l=1}^{K} π_l f_l(x)

n is the same as choosing k to maximize δ_k(x)
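The discriminant-function formula on this slide did not survive extraction. For reference, the standard p = 1 LDA discriminant (this is the form given in ISLR, which the deck credits at the end) is:

```latex
\delta_k(x) = x \cdot \frac{\mu_k}{\sigma^2} \;-\; \frac{\mu_k^2}{2\sigma^2} \;+\; \log \pi_k
```

Maximizing π_k f_k(x) over k is equivalent to maximizing log(π_k f_k(x)); dropping the terms that do not depend on k leaves δ_k(x), which is linear in x – hence “Linear” Discriminant Analysis.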

+ 46

Linear Discriminant Analysis, Estimating the Parameters
n What are the parameters of the model?

n How do we estimate the parameters?

+ 47

Linear Discriminant Analysis, Estimating the Parameters

n Use maximum likelihood estimation (closed-form formulas in this case) to estimate:
n i) (π_1, π_2, …, π_K) (marginal probabilities)
n ii) µ = (µ_1, …, µ_K) (in-class means)
n iii) σ² (common variance)

+ 48

Linear Discriminant Analysis, Estimating the Parameters

n Use maximum likelihood estimation (closed-form formulas in this case) to estimate:
n i) (π_1, π_2, …, π_K) (marginal probabilities)
n ii) µ = (µ_1, …, µ_K) (in-class means)
n iii) σ² (common variance)

n This leads to somewhat intuitive formulas:
n π̂_k: the fraction of observations belonging to class k
n µ̂_k: the within-class sample averages
n σ̂²: a weighted average of the sample variances with bias correction
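The formulas themselves were images and did not survive extraction; the standard maximum-likelihood estimates matching the three descriptions above (with n_k denoting the number of observations in class k) are:

```latex
\hat{\pi}_k = \frac{n_k}{n}, \qquad
\hat{\mu}_k = \frac{1}{n_k} \sum_{i:\, y_i = k} x_i, \qquad
\hat{\sigma}^2 = \frac{1}{n - K} \sum_{k=1}^{K} \sum_{i:\, y_i = k} \left( x_i - \hat{\mu}_k \right)^2
```

The n − K divisor (rather than n) is the bias correction the slide mentions.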
+ 51

Linear Discriminant Analysis

n p = 1  [figure: class densities along x]

n p = 2  [figure: classes plotted in the (x1, x2) plane]
+ 52

Linear Decision Boundaries when p = 2 and K = 3

n LDA fits a linear decision boundary

[Figure: three classes in the (X1, X2) plane separated by linear decision boundaries]


+ 54

LDA vs. Logistic Regression

n Sometimes logistic regression can perform poorly (for example, if the two classes are “well separated”, or sometimes if n is small)

n Often both methods perform similarly
n Usually if this is the case, logistic regression is preferred
n Easier to interpret
n Nice statistical properties

n Confusion matrices, ROC curves, and related concepts all apply to LDA as well

+ 55

LDA vs. Logistic Regression for Churn Prediction

n Logistic = blue
n LDA = red

n Similar performance like this is typical

[Figure: test-set ROC curves for the logistic regression and LDA models]


+ 56

Analytics for Customer Retention

n Churn rates in telecom are generally going down

n Predictive analytics is used extensively in managing customer retention

Carrier          2008 Annualized Churn    2015 Annualized Churn
Verizon          Postpaid: 15.6%          Postpaid: 13.6%
AT&T             Postpaid: 18.4%          Postpaid: 14.8%
Sprint Nextel    Postpaid: 23.3%          Postpaid: 22.7%
                 Prepaid: 65.2%           Prepaid: 39.0%
T-Mobile         Overall: 43.9%           Postpaid: 15.5%
                                          Prepaid: 42.1%
+ 57

n Some of the figures in this presentation are taken from “An Introduction to Statistical Learning, with Applications in R” (Springer, 2013) with permission from the authors: G. James, D. Witten, T. Hastie and R. Tibshirani
