Riley Brown

Individual Assignment 4
5/20/19

1. Revisiting the Conjoint Study for All-in-one Printers (3 Points)


Recall the Conjoint Analysis survey on All-in-one printers from Individual Assignment 1. Also
recall, that in that survey, ratings data (on a scale of 0 - 100) for the 16 profiles were collected
from 35 respondents. The relative attribute importance for the respondents, resulting from
their utility functions, are provided to you in the file, Printers_RAI.csv. Use this data to run a
Cluster Analysis (Don’t standardize the variables). Find k-means segments based on individuals’
relative attribute importance. Consider the three-segment solution. Describe the three
segments in plain English.

Based on the k-means segments:
Segment 1 finds brand attributes and fast speeds very important and places little importance on wireless capabilities.
Segment 2 places low importance on price and speed but finds wireless capabilities very important.
Segment 3 places low importance on brand attributes but finds the price of the printer very important.
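
To illustrate how k-means forms segments like these from unstandardized relative attribute importances, here is a minimal sketch of Lloyd's algorithm in plain Python. It is not the code used for the analysis: the six rows of importances (brand, speed, wireless, price) are hypothetical, and the starting centroids are passed in explicitly so the result is reproducible. A statistical package would normally try several random starts.

```python
def kmeans(points, init_centroids, max_iters=100):
    """Plain Lloyd's algorithm on raw (unstandardized) rows."""
    centroids = [list(c) for c in init_centroids]
    k = len(centroids)
    labels = [None] * len(points)
    for _ in range(max_iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:  # no respondent changed segment: converged
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical importances (brand, speed, wireless, price), NOT from Printers_RAI.csv.
rai = [
    (40, 35, 5, 20), (38, 37, 6, 19),    # brand- and speed-driven
    (15, 10, 55, 20), (14, 12, 54, 20),  # wireless-driven
    (10, 20, 15, 55), (8, 22, 14, 56),   # price-driven
]
labels, centroids = kmeans(rai, init_centroids=[rai[0], rai[2], rai[4]])
print(labels)
for c in centroids:
    print([round(v, 1) for v in c])
```

Reading each centroid row as the average importance profile of a segment is what supports a plain-English description such as the one above.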

Relevant Code:
2. Revisiting Consumers for the ConneCtor PDA (4 Points)

Recall the ConneCtor PDA problem from Individual Assignment 3. Suppose, based on the Needs of
the respondents in the survey (X1 – X10), they were divided into two broad segments. The segment
memberships of the respondents, and their demographic information (Z1 – Z5), are provided in the
file PDA_2seg.csv. Discuss how well you can use the demographic variables to discriminate between
the two segments (estimate a suitable binary logistic regression model to predict consumers’
segment membership using the Stepwise variable selection approach and report the classification
accuracy). Interpret the estimated model. Note: Split the data set into a ‘Training Sample’ (70%) and
a ‘Testing Sample’ (30%). Note: Use any five of the six occupation category dummies as potential
inputs in your regression model.

The model discriminates between the two segments with 68.89% classification accuracy. The difference
between the Testing Sample accuracy and the Training Sample accuracy could be due to the random split
producing biased samples, since overfitting does not seem to be the cause of the discrepancy.
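
A 70/30 split depends on the random draw, which is one reason training and testing accuracy can differ from run to run. The sketch below shows one way to make such a split reproducibly. The function name, the seed, and the 100 placeholder respondent IDs are assumptions for illustration, not details taken from PDA_2seg.csv.

```python
import random


def train_test_split(rows, train_frac=0.7, seed=42):
    """Shuffle a copy of the rows and cut it into training and testing parts."""
    rng = random.Random(seed)      # fixed seed so the split can be repeated
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(train_frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


respondent_ids = list(range(100))  # hypothetical IDs standing in for survey rows
train, test = train_test_split(respondent_ids)
print(len(train), len(test))
```

Re-running with a few different seeds and comparing the accuracies is a simple check on whether a gap comes from the particular split.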

Using the Stepwise variable selection approach, the estimated model is:
V = 1.7835 – 1.5474*(Z2) – 1.9457*(Z5_3) – 0.4933*(Z4),
where Z2 is education level (1=high school, 2=some college, 3=college graduate, 4=graduate degree),
Z5_3 is a dummy variable equal to 1 if the respondent works in sales, and Z4 is the amount of time
spent away from the office (rated on a scale of 1-7).

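Each coefficient is a change in the log-odds, so exp(coefficient) is the factor by which the odds
change, holding the other variables fixed. All three coefficients are negative. When education level
increases by 1 unit, the odds that the customer is in segment 1 are multiplied by 0.213 (a decrease of
about 79%). When a customer works in sales (Z5_3 goes from 0 to 1), the odds that they are in segment 1
are multiplied by 0.143; equivalently, a customer who does not work in sales has about 7.0 times the
odds. Lastly, when a customer’s amount of time away from the office increases by 1, the odds that they
are in segment 1 are multiplied by 0.611; equivalently, a decrease of 1 raises the odds by a factor of
1.638.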
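
The odds factors follow directly from exponentiating the reported coefficients. A short sketch of that arithmetic, where the helper name odds_ratio is an illustrative choice:

```python
import math

# Coefficients as reported in the estimated stepwise model.
coefs = {"Z2": -1.5474, "Z5_3": -1.9457, "Z4": -0.4933}


def odds_ratio(beta, delta=1.0):
    """Factor by which the odds change when the predictor changes by `delta`."""
    return math.exp(beta * delta)


odds_ratios = {name: odds_ratio(beta) for name, beta in coefs.items()}
for name, ratio in odds_ratios.items():
    print(f"{name}: x{ratio:.3f} per +1 unit, x{1 / ratio:.3f} per -1 unit")
```

A negative coefficient always gives a factor below 1 for an increase in the predictor, and the reciprocal factor for a decrease.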

Relevant Code:
3. Identifying Loyal Customers (5 Points)
Dermaglow, a skincare spa chain, wants to be able to identify its loyal customers (those who visit
only Dermaglow for their skincare needs). They have data for 348 customers who are known to
be either loyal or not loyal. The data is available in the file, Dermaglow.csv. It includes the
following variables:
Loyal: Coded as 1 if customer is known to be loyal and 0 if customer is not loyal
Avg.spent: Avg. amount spent per visit by the customer at a Dermaglow spa (in $)
Intervisit.time: Average time interval (in weeks) between visits to a Dermaglow spa
Mincome: Monthly income of customer (in $1000s)
Rating: Customer’s satisfaction rating for Dermaglow (on a scale of 1-100)

Note: Split the data set into a ‘Training Sample’ (70%) and a ‘Testing Sample’ (30%).
a. Estimate a binary logistic regression model to predict Dermaglow’s loyal customers using
all input variables in the model. Report the classification accuracy in the Training and
Testing samples.

The estimated regression using all input variables is:
V = -10.933042 + 0.002689*(Avg.spent) – 0.178322*(Intervisit.time) – 0.008495*(Mincome) +
0.178313*(Rating).
The classification accuracy in the Training Sample is 72.54%. The classification accuracy
in the Testing Sample is 71.15%.
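
Classification accuracy is the share of customers whose predicted class matches their known Loyal value. The sketch below scores the full model's reported coefficients on four hypothetical customers, assuming a 0.5 probability cutoff. The cutoff and the rows are assumptions for illustration, so the result does not reproduce the reported percentages.

```python
import math

# Coefficients of the full model as reported in the text.
INTERCEPT = -10.933042
COEFS = {"Avg.spent": 0.002689, "Intervisit.time": -0.178322,
         "Mincome": -0.008495, "Rating": 0.178313}


def predict_prob(row):
    """Predicted probability of Loyal = 1 from the linear predictor V."""
    v = INTERCEPT + sum(COEFS[name] * row[name] for name in COEFS)
    return 1.0 / (1.0 + math.exp(-v))


def accuracy(rows, cutoff=0.5):
    """Fraction of rows where (probability >= cutoff) agrees with Loyal."""
    hits = sum((predict_prob(r) >= cutoff) == bool(r["Loyal"]) for r in rows)
    return hits / len(rows)


# Hypothetical customers, NOT rows from Dermaglow.csv.
toy = [
    {"Avg.spent": 50, "Intervisit.time": 6, "Mincome": 6, "Rating": 70, "Loyal": 1},
    {"Avg.spent": 40, "Intervisit.time": 10, "Mincome": 4, "Rating": 50, "Loyal": 0},
    {"Avg.spent": 80, "Intervisit.time": 3, "Mincome": 8, "Rating": 90, "Loyal": 0},
    {"Avg.spent": 60, "Intervisit.time": 12, "Mincome": 5, "Rating": 60, "Loyal": 1},
]
print(accuracy(toy))
```

Applying the same function separately to the Training Sample and the Testing Sample gives the two accuracies being compared.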

Relevant Code:

b. Now, use the Stepwise Model Selection approach. Report the classification accuracy in the
Training and Testing samples for the selected model. Report and interpret the selected
model. Using this model, calculate the predicted probability that a customer that spends
an average of $50 per visit, visits once in six weeks, has a monthly income of $6,000, and
has a 70% satisfaction rating, will be a Loyal customer.
The classification accuracy for the Training Sample is 72.54%, and the classification
accuracy for the Testing Sample is 71.15%.
Using the Stepwise Selection approach, the model is: V = -10.74798 + 0.17687*(Rating) –
0.17813*(Intervisit.time). This means that for every 1-unit increase in the rating given,
the odds that a person is a loyal customer are multiplied by 1.19 (an increase of about 19%).
Furthermore, for every 1-week increase in the average time between visits, the odds of being a
loyal customer are multiplied by 0.837 (a decrease of about 16%); equivalently, a 1-week
decrease raises the odds by a factor of about 1.19.
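Using the given information and the model from the Stepwise Selection approach (only
Rating and Intervisit.time enter the model), we get:
V = -10.74798 + 0.17687*(70) – 0.17813*(6) = 0.56414. V is the log-odds, so the predicted
probability is 1/(1 + exp(-0.56414)) = 0.637; thus, there is a 63.7% likelihood
that the customer is loyal.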
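
The linear predictor V is on the log-odds scale, so it has to go through the logistic function to become a probability. A short sketch of the calculation with the reported stepwise coefficients, where the function name is an illustrative choice:

```python
import math


def loyal_probability(rating, intervisit_weeks):
    """Return (V, probability) for the reported stepwise model."""
    v = -10.74798 + 0.17687 * rating - 0.17813 * intervisit_weeks
    return v, 1.0 / (1.0 + math.exp(-v))


v, p = loyal_probability(rating=70, intervisit_weeks=6)
print(f"V = {v:.5f}, P(Loyal) = {p:.4f}")
```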

Relevant Code:
c. Plot the ROC curves, and report the AUC numbers for the models in 3(a) and 3(b).

AUC numbers and ROC curve for Model 3a:

AUC numbers and ROC curve for Model 3b:
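
For either model, the ROC curve and AUC are computed from the predicted probabilities and the known Loyal labels. The following is a minimal sketch that sweeps the cutoff over the observed scores and integrates with the trapezoid rule. The labels and scores are hypothetical and are not output from Model 3a or 3b.

```python
def roc_points(labels, scores):
    """(FPR, TPR) pairs from sweeping the cutoff over the distinct scores."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for cut in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= cut and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= cut and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC points by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical labels and predicted probabilities for six customers.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(toy_labels, toy_scores)
print(points)
print(round(auc(points), 4))
```

Plotting the points with FPR on the horizontal axis and TPR on the vertical axis gives the ROC curve; an AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation.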


Relevant Code:
