
Comparing Methods Assignment

This assignment will be completed in teams of students.


(Total: 30 pts).

Introduction
The purpose of this assignment is to demonstrate your knowledge and understanding of the analytical
techniques and tools learned in the course and to show how they relate to a business
scenario. This assignment is somewhat different from previous ones: I do not give you detailed
instructions on how to build your analytical process in RapidMiner. Instead, you are expected to do the
modeling, validation, and performance analysis on the given dataset yourselves, so that you can answer the
questions below and make recommendations for the business situation.

Submission Instructions
Perform the necessary tasks using RapidMiner, answer the questions below and prepare the required
screenshots.
Submit 2 files:
- a Word document file with the answers and screenshots for the lettered questions. (Make sure that the
lettering of the questions stays the same!) Place the team members' names at the top of the document. Name
your file Comparing Methods Assignment LastName1-LastName2.docx. (Warning: for full points,
make sure that you name documents correctly and keep the answers correctly numbered and lettered.)
- the RapidMiner process file, named Comparing Methods Assignment LastName1-LastName2.rmp.
(The process file can be generated from RapidMiner by going to File -> Export Process. Select the
destination folder and the name for the file. It will be saved as a .rmp file.)

Instructions
Download the mobile-churn.csv file posted on Canvas. The file contains a dataset collected by a phone
company about attrition, in other words, about customers who cancelled their services and possibly signed up
with another company. The company wants to know what it could do to keep customers and prevent them from
defecting. Look at the data and make some recommendations based on the findings of your analysis.
Here is the explanation of the variables in the dataset:
a. Gender_Female: whether the customer is female
b. PhoneService_Yes: whether the customer has phone service with the company
c. MultipleLines_Yes: whether the customer has multiple-line service
d. InternetService_DSL: whether the customer has DSL internet
e. InternetService_Fiber optic: whether the customer has fiber-optic internet
f. StreamingTV_Yes: whether the customer streams TV
g. StreamingMovies_Yes: whether the customer streams movies
h. Contract_One year: whether the customer has a one-year contract
i. Contract_Two year: whether the customer has a two-year contract
j. PaperlessBilling_Yes: whether the customer signed up for paperless billing
k. PaymentMethod_ Automatic: whether payment is set up to be automatic
l. Retired: 0 for no, 1 for yes
m. Tenure (months): how long the person has been a customer of the company, in months
n. MonthlyCharges: $ amount of monthly payments for the subscribed services
o. Churn: whether the customer churned (i.e., is no longer a customer)
1. As a first step, build 3 models using different classification techniques (Neural Net: use the default
settings; Decision Tree: use gini_index as the criterion; and Logistic Regression: use the default
settings) that can classify customers into 2 categories (churn/no churn). Use the X-Validation
operator from the start for each technique. Set the number of folds to 3 (this keeps process
runtimes shorter). To measure the performance of the 3 models, look at the following
performance measures: Accuracy, Kappa, Lift, F-measure, and AUC (NOT the optimistic or pessimistic AUC).
(Hint: use the Performance (Binominal Classification) operator to obtain all of these measures.)

For each of the 3 models, make readable screenshots of the following 3 items (9 screenshots; 9 pts):

- The top-level process
- The parameter settings of the technique inside the cross-validation operator
- The appropriate model output (Network, Tree, or Weights)
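Apart from AUC, which needs the models' confidence scores rather than counts, the measures listed in step 1 follow directly from the 2×2 confusion matrix. The Python sketch below assumes churn = yes is the positive class and uses common textbook formulas; the counts in the example are hypothetical, not results from mobile-churn.csv. RapidMiner may present some values differently (for example, lift scaled as a percentage), so treat this as a way to check your reading of the output, not as a description of how the operator works internally.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Counts for a binary classifier; 'positive' means churn = yes (assumption)."""
    tp: int  # churners predicted as churners
    fp: int  # non-churners predicted as churners
    fn: int  # churners predicted as non-churners
    tn: int  # non-churners predicted as non-churners

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn


def accuracy(cm: ConfusionMatrix) -> float:
    # Share of all customers classified correctly.
    return (cm.tp + cm.tn) / cm.total


def kappa(cm: ConfusionMatrix) -> float:
    # Cohen's kappa: observed agreement corrected for agreement expected by chance.
    p_observed = accuracy(cm)
    p_pred_pos = (cm.tp + cm.fp) / cm.total
    p_true_pos = (cm.tp + cm.fn) / cm.total
    p_chance = p_pred_pos * p_true_pos + (1 - p_pred_pos) * (1 - p_true_pos)
    return (p_observed - p_chance) / (1 - p_chance)


def f_measure(cm: ConfusionMatrix) -> float:
    # Harmonic mean of precision and recall for the positive (churn) class.
    precision = cm.tp / (cm.tp + cm.fp)
    recall = cm.tp / (cm.tp + cm.fn)
    return 2 * precision * recall / (precision + recall)


def lift(cm: ConfusionMatrix) -> float:
    # Common definition: precision among predicted churners / overall churn rate.
    # RapidMiner may display this value scaled as a percentage (assumption).
    precision = cm.tp / (cm.tp + cm.fp)
    base_rate = (cm.tp + cm.fn) / cm.total
    return precision / base_rate


if __name__ == "__main__":
    cm = ConfusionMatrix(tp=300, fp=150, fn=200, tn=1350)  # hypothetical counts
    print(f"accuracy={accuracy(cm):.3f} kappa={kappa(cm):.3f} "
          f"F={f_measure(cm):.3f} lift={lift(cm):.2f}")
```

To use it, replace the hypothetical counts with the values from each model's confusion matrix and compare the results with the RapidMiner output.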

2.
a. Make a screenshot of the confusion matrix output for each of the 3 methods. (3 pts)
b. Prepare a table reporting the 5 performance measures for the 3 models, with the models in the rows
and one column for each of the 5 measures. (3 pts)

Model                       Accuracy   Kappa   Lift   F-measure   AUC
NN (Neural Net)
DT (Decision Tree)
LR (Logistic Regression)

c. Discuss the performance of each of the three models based on the performance measures.
Relate the performances to the baseline model, i.e., one that always predicts the more frequent class
(calculate the a priori probabilities first!). (3 pts)
Prepare a visual evaluation of the 3 models by including a screenshot of the ROC comparison
chart. (Hint: use the Compare ROCs operator, with the same models and the same parameters
as in the runs above.) (2 pts)
d. Using the observed performance measures, compare the 3 models. Do they perform equally well?
Which one performs better or worse, and why? (2 pts)
e. Do the 3 models point to more or less the same important factors (variables)? If there are
differences, what are they? (2 pts)
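For part c, the Python sketch below computes the a priori class probabilities, the accuracy of a naive baseline that always predicts the more frequent class, and the AUC, interpreted as the probability that a randomly chosen churner receives a higher churn confidence than a randomly chosen non-churner (ties count as one half). The labels, the "yes"/"no" coding, and the confidence scores are hypothetical; in practice they would come from the data and from each model's predictions. The pairwise loop is quadratic in the number of customers, which is fine for illustration but slow on large datasets.

```python
from collections import Counter


def a_priori(labels: list[str]) -> dict[str, float]:
    # Share of each class in the data, e.g. {"yes": 0.25, "no": 0.75}.
    counts = Counter(labels)
    n = len(labels)
    return {label: count / n for label, count in counts.items()}


def baseline_accuracy(labels: list[str]) -> float:
    # Accuracy of a model that always predicts the most frequent class.
    return max(a_priori(labels).values())


def auc(labels: list[str], scores: list[float], positive: str = "yes") -> float:
    # Probability that a random positive outranks a random negative; ties count 0.5.
    pos = [s for lab, s in zip(labels, scores) if lab == positive]
    neg = [s for lab, s in zip(labels, scores) if lab != positive]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y = ["yes", "no", "no", "yes", "no", "no", "no", "no"]  # hypothetical churn labels
    conf = [0.9, 0.3, 0.6, 0.7, 0.2, 0.1, 0.4, 0.8]  # hypothetical P(churn = yes)
    print(a_priori(y))           # class proportions
    print(baseline_accuracy(y))  # majority-class baseline
    print(auc(y, conf))          # ranking quality of the scores
```

A model whose accuracy barely exceeds the baseline accuracy adds little over simply predicting the more frequent class, which is why kappa and AUC are worth reading alongside accuracy.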

3. Choose one of the models (ideally the best-performing one) and address the following questions: How
can you interpret the results of the model? Which attributes seem to matter the most, and how do you
know? Discuss their importance and/or effect sizes. (3 pts)
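If you pick Logistic Regression, one common way to express effect sizes is to convert each coefficient into an odds ratio, exp(β): the multiplicative change in the odds of churning for a one-unit increase in that attribute, holding the other attributes constant. The Python sketch below does only that conversion; the attribute names come from the dataset description, but the coefficient values are invented for illustration and are not results from mobile-churn.csv. If the attributes were normalized before training, "one unit" refers to the normalized scale, not to one month or one dollar.

```python
import math


def odds_ratio(coefficient: float) -> float:
    # Multiplicative change in the odds of churn per one-unit increase.
    return math.exp(coefficient)


def percent_change_in_odds(coefficient: float) -> float:
    # The same effect expressed as a percentage change in the odds.
    return (math.exp(coefficient) - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical coefficients, NOT fitted values from the assignment data.
    coefficients = {
        "Contract_Two year": -1.4,
        "Tenure (months)": -0.03,
        "InternetService_Fiber optic": 0.8,
    }
    for name, beta in coefficients.items():
        print(f"{name:30s} beta={beta:+.2f} "
              f"odds ratio={odds_ratio(beta):.2f} "
              f"change in odds={percent_change_in_odds(beta):+.1f}%")
```

An odds ratio below 1 means the attribute is associated with lower odds of churning, and above 1 with higher odds; comparing magnitudes across attributes is only meaningful when they are on comparable scales.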

4. How could the results of the model be useful for the telecommunications company? What business
recommendations can you make based on the results? (3 pts)
