
ISM 645 - Assignment 2

Generating, Assessing and Scoring Predictive Models

Ritika Chhabra, Bala Durga Sridevi Kesavarapu, Brook Tester, Anthony Rodriguez
ABSTRACT
The goal of this assignment is to analyze collected data that includes whether or not customers have purchased organic products. Full Food Supermarket's management would like to determine which customers are likely to purchase its new line of organic products. The supermarket has a customer loyalty program; as an initial buyer incentive, it provided coupons for organic products to all of its loyalty program participants and collected the data that we will use for our analysis.

The rationale behind the analyses is to use StatExplore, replacement, imputation, decision trees, regression, neural networks, model comparison, a profit matrix, a score node, and related tools. The effectiveness and feasibility of the proposed modeling procedures were evaluated through three tasks: data exploration, descriptive analytics, and predictive analytics.

1) Data Exploration will identify whether any data needs to be removed, imputed, transformed, or otherwise massaged, with special attention to missing data, range, maximum, minimum, distribution, skewness, and outliers.
2) Descriptive Analytics will help us dig deeper into the data set, which can help management set rules and guidelines for the future.
3) Predictive Analytics will include data partitioning and the use of at least three different modeling techniques.
The model will make it easier for management to know whether people will buy from the new line of organic products or not.

As the results reveal, the proposed model built from predictive modeling tools is sufficiently interpretable to provide a rationale for any recommended actions.
Executive Summary

Full Food Supermarket (FFS) is interested in offering a new line of organic products. Prior to a full roll-out of the new organics line, FFS marketed this line of products to its customer loyalty program participants by offering coupons for organic products. The data gathered from the responses to this coupon marketing campaign provided valuable insight into the profiles of customers who are likely, or unlikely, to purchase organic products.

A cursory look at the data made several insights apparent:


● Organic products were bought most often
○ by customers with an affluence grade greater than 15 (scale of 1-30)
○ by customers under the age of 50
○ by customers with low loyalty program membership tenure
■ 0-10 with affluence grade > 15
■ 0-2 with age less than 40
■ 0-15 and aged 20-40
○ by female customers

● Organic products were bought least often
○ by customers aged 40-80
○ by males older than 50
○ by males with affluence grade < 12
○ by customers with affluence grade < 2, regardless of loyalty program tenure

The data was cleaned via imputation, replacement, and transformation before being used to train and validate 12 predictive analytics models (model configurations are detailed later in this report):
● 3 decision trees (1 auto-configured, 2 manually split)
● Regression (Linear, Stepwise, Polynomial)
● Neural Network
● Auto Neural Network
● LARS
● DM Neural
● Dmine Regression
● MBR

The best-performing model, as judged by the lowest misclassification rate, was the Decision Tree we manually configured to split on affluence grade; its misclassification rate was 18.4941%. A profit matrix was also created as part of this analytics project, and the model providing the highest profit to Full Food Supermarket was the auto-configured Neural Network model.
REPORT:
To begin analyzing the collected data and predicting which customers are likely to purchase organic products, the first step is to add the data source to the project. Once that is done, we build the process flow diagram used to analyze the data. The required steps are creating a library, a diagram, and a data source.

Step 1) Create a Project “assignment 2”

Step 2) Create a new library- “AASMNT”

Step 3) Create a data source using the data file 'Organics'. This data set has 13 variables and 22,223 observations. We'll set the Metadata Advisor to Advanced and proceed.

Step 4) The data source wizard appears with the column metadata and we will change the “TargetAmt”
role from target to rejected and the level from nominal to interval. We are doing this because two target
variables are listed, and this exercise concentrates on the binary variable – “TargetBuy.” We will limit
our target to only one.

Further, we keep the data source role as Raw, and SAS reports how many roles and levels are in the data set after the changes are made. Now that the data source is created, we move on to our diagram.

Step 5) Drag the data source 'Organics' onto the diagram workspace; the Properties panel changes accordingly. Now we will explore this data.

DATA EXPLORATION:
While exploring and inspecting the distributions of the Organics variables via the Variables window under the Properties panel, we found the following:

● The histogram (SCREENSHOT) for “TargetBuy” shows that 5,505 people bought organics
whereas 16,718 didn’t. The proportion of people who purchased organic products is
approximately 25%.
● The variable "DemClusterGroup" contains collapsed levels of the variable "DemCluster." Presume that, based on previous experience, "DemClusterGroup" is sufficient for this type of modeling effort, so the model role for "DemCluster" should be set to Rejected; this was already done by the Advanced Metadata Advisor.
● As noted above, only "TargetBuy" is used for this analysis and should have a role of Target. We also cannot use "TargetAmt" as an input, because it is measured at the same time as "TargetBuy" and therefore has no causal relationship to it.
This gives us a lot of essential information. Now from here we proceed to our descriptive analytics.

Using Explore-StatExplore from the SEMMA panel and analyzing the Variable Worth (SCREENSHOT) bar chart, we can see that the most important input variables for predicting whether a person buys organic products are "DemAge" (age in years), "DemAffl" (affluence grade on a scale from 1 to 30), and "DemGender" (M = Male, F = Female, U = Unknown). Inputs such as "DemClusterGroup", "DemTVReg", and "DemReg" matter much less.
Now having a look at the Output (SCREENSHOT) we see:

● 8 distinct levels of "DemClusterGroup", with 674 values missing.
● 4 distinct levels of "DemGender", with 2,512 values missing.
● 14 distinct levels of "DemTVReg", with 465 values missing.
● 4 distinct levels of "PromClass", with no values missing.
● 2 distinct levels of "TargetBuy", with no values missing.
Exhibit 4 represents a detailed look into characteristics of the data such as mean, median, mode,
skewness, kurtosis, etc.

Looking at the interval variable summary statistics (SCREENSHOT), we can see that people who buy
organic products have:
● High “DemAffl” mean value
● Low “DemAge” mean value
● Low “PromSpend“ mean value
● Low “PromTime“ mean value

Exhibit 6 tells us more about our data set. Gender is very important in predicting whether a person buys organic products because it has a high chi-square value, and it is also one of the most important input variables. Next, we apply replacement to this data and customize it using the Replacement Editor; the changes we made can be seen in exhibit 7.

After replacement, we connect Organics – Replacement – StatExplore. Some statistics change because we have limited the inputs to valid ranges, but the overall conclusions do not, as shown in exhibits 9 and 10. Comparing exhibit 4 with exhibit 10 gives a detailed view of the changes.

Since we still have empty data cells, let's proceed with imputation. If a class variable has missing data, we replace it with a tree surrogate; if an interval variable is missing, we replace it with the mean.

Now let’s run StatExplore again. Having a look at the Skewness of the input variables;

● Positively skewness/skewed right means many low score and few high scores
● Negatively skewed/skewed left means fewer low scores and many high scores.
Also, for the positively skewed inputs- mode is lowest, median is middle, and mean is highest whereas
for negatively skewed inputs- mean is lowest, median is in the middle and mode is the highest.
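As a quick numeric illustration of these relationships, here is a toy Python example (made-up numbers, not the Organics data):

    import pandas as pd

    # Toy positively skewed sample: many low values and a few high ones.
    x = pd.Series([1, 1, 1, 2, 2, 3, 4, 8, 15, 30])
    print(x.skew())                           # positive, i.e. skewed right
    print(x.mode()[0], x.median(), x.mean())  # mode (1) < median (2.5) < mean (6.7)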

DESCRIPTIVE ANALYTICS:

Descriptive analytics allows us to understand the past for the purpose of generating testable hypotheses for predicting the future. It helps us understand the relationships among the data elements. When we performed descriptive analytics on this data set, the objective was to help management understand who buys organic products and who doesn't.

The tools we used were based on graphs, measures of central tendency, measures of variability, relative position, and relationships.

We arrived at the following conclusions:

1) With the help of a scatter plot, we learned that those who bought organic products have a high affluence grade (15-20), and most of them are below age 50.

2) People with a Tin loyalty card ("PromClass") and lower "PromSpend" are more likely to buy organic products than other people.

3) People with a Platinum loyalty card and "PromSpend" around 230,000 buy organics.

4) People aged 40-80 don't buy many organic products.

5) People with an affluence grade below 2 refrain from buying organics no matter their loyalty card tenure.

6) People with an affluence grade above 15 and loyalty card tenure between 0-10 prefer organics.

7) People with a Silver loyalty card who spend less than 5,000 never buy organics.

8) People under age 40 with low loyalty card tenure (0-2) prefer to buy organics.

9) People with a high affluence grade tend to buy organics no matter what their total spend is.

10) Females tend to buy organic products.

11) People in the Ulster television region don't buy organics.

12) Females across cluster groups tend to buy organics.

13) People in the Southwest demographic region aged 60-80 don't buy organics.

14) Females under age 40 tend to buy more organics.

15) Males above age 50 don't buy organics.

16) People in the Southeast demographic region under age 40 tend to buy organics.

17) Males with an affluence grade below 12 don't buy organics.

18) Females with an affluence grade above 15 buy organics.

19) People from the North geographic region with less than 1,000 total spend don't buy organics.

20) People aged 20-40 with loyalty card tenure 0-15 tend to buy organics.
Based on the conclusions above, we can infer that factors such as affluence grade, gender, and loyalty card tenure influence whether a person will buy organics.

PREDICTIVE ANALYTICS:

Now that we have descriptive analytics done on our dataset, we can proceed with predictive analytics. It
allows us to predict with confidence what will happen next based on what has happened in the past so
that we can make smarter decisions and improve outcomes.

In our example, we are making a model to predict which customers are likely to purchase organic
products based upon the data collected that includes whether the customers purchased organic products
after they were given the coupons.

After the imputing stage, we partitioned the data into three parts of 40%, 30%, and 30% for training, validation, and testing respectively (a rough sketch of this split appears below), and then selected the best decision tree based on the misclassification rate using the Model Comparison node.

This was done in order to develop a set of rules and special guidelines for the management to follow.
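A minimal sketch of such a 40/30/30 stratified split, written in Python purely for illustration (the actual partitioning was done with the Data Partition node; the organics DataFrame and the random seed are assumptions):

    from sklearn.model_selection import train_test_split

    # organics is assumed to be a DataFrame holding the cleaned data set.
    train, rest = train_test_split(organics, train_size=0.40,
                                   stratify=organics["TargetBuy"], random_state=1)
    valid, test = train_test_split(rest, train_size=0.50,
                                   stratify=rest["TargetBuy"], random_state=1)
    print(len(train), len(valid), len(test))   # roughly 40% / 30% / 30%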

Decision tree

We made three decision trees using the Decision Tree predictive method, which can be seen in exhibits 13-17.

Decision tree 1: a Decision Tree using Enterprise Miner's default configuration without any changes. This tree contains 27 leaves and has a misclassification rate of 0.185091 (SCREENSHOT).

Decision tree 2: a Decision Tree split on the replacement age variable, with 27 leaves and a misclassification rate of 0.185691 (SCREENSHOT).

Decision tree 3: a Decision Tree split on the replacement affluence grade variable, with 28 leaves and a misclassification rate of 0.184941 (SCREENSHOT).

According to the Model Comparison node and the misclassification rate, the best of the three trees is Decision Tree 3.
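For readers without Enterprise Miner, a rough scikit-learn stand-in for a tree of this size is sketched below; the leaf limit and minimum leaf size are illustrative assumptions, and X_train, X_valid, y_train, y_valid are assumed to come from the partition above.

    from sklearn.tree import DecisionTreeClassifier

    # Illustrative tree roughly comparable in size to the 27-28 leaf trees above.
    tree = DecisionTreeClassifier(max_leaf_nodes=28, min_samples_leaf=50, random_state=1)
    tree.fit(X_train, y_train)
    misclassification = 1 - tree.score(X_valid, y_valid)
    print(f"validation misclassification rate: {misclassification:.4f}")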

Now we’ll use other predictive methods to analyze.


Regression:

Regressions offer a different approach to prediction compared to decision trees. Regressions, as parametric models, assume a specific association structure between inputs and target. By contrast, trees, as predictive algorithms, do not assume any association structure; they simply seek to isolate concentrations of cases with like-valued target measurements.

After running the StatExplore node, we observed that the class inputs and interval inputs have several missing values, which we have to impute. To do that, we add an Impute node from the Modify tab to the diagram and connect it to the Data Partition node. We set the node to impute "U" for unknown class variable values and the overall mean for unknown interval variable values, and we create imputation indicators for all imputed inputs (a rough Python sketch of this scheme appears after the step list below). Make the following changes to the Impute node:

● Select default input method in the class variables group as default constant value.
● Under the indicator variables select type as “unique” and role as “input.”
● Now add the regression node to the impute node and connect it.
● Run the regression node to view the results. (SCREENSHOT)
● The initial lines of the Output window summarize the roles of the variables used (or not used) by the Regression node.
● The training data set name, target variable name, number of target categories, and most
importantly, the number of model parameters can be observed in the following lines
(SCREENSHOT)
● Next, consider maximum likelihood procedure, overall model fit, and the Type 3 Analysis of
Effects.
● The Type 3 Analysis tests the statistical significance of adding the indicated input to a model that
already contains other listed inputs. A value near 0 in the Pr > ChiSq column approximately
indicates a significant input; a value near 1 indicates an extraneous input.(SCREENSHOT).
● We observe that the statistical significance measures range from <0.0001 (highly significant) to 0.9593 (highly dubious). Results such as this suggest that certain inputs can be eliminated without affecting the predictive prowess of the model.
● There appears to be some discrepancy between the values of these two statistics in the training
and validation data. This indicates a possible overfit of the model. It can be mitigated by using an
input selection procedure.
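A rough Python sketch of the imputation scheme described above (constant "U" for class inputs, the overall mean for interval inputs, plus missing-value indicators); the column lists and the scikit-learn implementation are assumptions, not the Impute node itself:

    from sklearn.impute import SimpleImputer

    # Assumed Organics input columns.
    class_cols = ["DemClusterGroup", "DemGender", "DemReg", "DemTVReg", "PromClass"]
    interval_cols = ["DemAffl", "DemAge", "PromSpend", "PromTime"]

    # Constant "U" for missing class values, the overall mean for missing interval
    # values; add_indicator adds a column for each input that had missing values.
    class_imputer = SimpleImputer(strategy="constant", fill_value="U", add_indicator=True)
    interval_imputer = SimpleImputer(strategy="mean", add_indicator=True)

    X_class = class_imputer.fit_transform(organics[class_cols])
    X_interval = interval_imputer.fit_transform(organics[interval_cols])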
Stepwise regression:

Stepwise selection combines elements from both the forward and backward selection procedures.
The method begins in the same way as the forward procedure, sequentially adding inputs with the
smallest p-value below the entry cutoff. After each input is added, however, the algorithm reevaluates
the statistical significance of all included inputs. If the p-value of any of the included inputs exceeds the
stay cutoff, the input is removed from the model and reentered into the pool of inputs that are available
for inclusion in a subsequent step. The process terminates when all inputs available for inclusion in the
model have p-values in excess of the entry cutoff and all inputs already included in the model have p-
values below the stay cutoff.
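As a purely illustrative sketch of this entry/stay logic, written in Python with statsmodels (the 0.05 cutoffs are assumed values, and this is not the Regression node's implementation):

    import pandas as pd
    import statsmodels.api as sm

    def stepwise_logistic(X, y, entry=0.05, stay=0.05):
        """Forward step adds the candidate with the smallest p-value below `entry`;
        backward step drops any included input whose p-value exceeds `stay`.
        X is assumed to be a DataFrame of numeric inputs, y a 0/1 target."""
        included = []
        while True:
            changed = False
            candidates = [c for c in X.columns if c not in included]
            pvals = pd.Series(dtype=float)
            for c in candidates:
                fit = sm.Logit(y, sm.add_constant(X[included + [c]])).fit(disp=0)
                pvals[c] = fit.pvalues[c]
            if len(pvals) > 0 and pvals.min() < entry:
                included.append(pvals.idxmin())
                changed = True
            if included:
                fit = sm.Logit(y, sm.add_constant(X[included])).fit(disp=0)
                worst = fit.pvalues.drop("const")
                if worst.max() > stay:
                    included.remove(worst.idxmax())
                    changed = True
            if not changed:
                return included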
We are implementing a sequential selection method in the regression node by making the
following changes to it:

● Under the model selection, select the model as “stepwise” and change the selection criterion to
“validation error”. Validation error, also known as Error Function, equals negative 2 log-
likelihood for logistic regression models and error sum of squares (SSE) for linear regression
models. We are selecting validation error as selection criterion because our predictions are
estimates.
● Now run the regression node to view the results.(SCREENSHOT)
● The stepwise procedure starts with Step 0, an intercept-only regression model. The value of the
intercept parameter is chosen so that the model predicts the overall target mean for every case.
The parameter estimate and the training data target measurements are combined in an objective
function. The objective function is determined by the model form and the error distribution of the
target. The value of the objective function for the intercept-only model is compared to the values
obtained in subsequent steps for more complex models. A large decrease in the objective
function for the more complex model indicates a significantly better model.(SCREENSHOT)
● Step 1 adds one input to the intercept-only model. The input and corresponding parameters are
chosen to produce the largest decrease in the objective function. To estimate the values of the
model parameters, the modeling algorithm makes an initial guess for their values. The initial
guess is combined with the training data measurements in the objective function. Based on
statistical theory, the objective function is assumed to take its minimum value at the correct
estimate for the parameters. The algorithm decides whether changing the values of the initial
parameter estimates can decrease the value of the objective function. If so, the parameter
estimates are changed to decrease the value of the objective function and the process iterates.
The algorithm continues iterating until changes in the parameter estimates fail to substantially
decrease the value of the objective function.(SCREENSHOT)
● The output next compares the model fit in Step 1 with the model fit in Step 0. The objective
functions of both models are multiplied by 2 and differenced. The difference is assumed to have
a chi-squared distribution with one degree of freedom. The hypothesis that the two models are
identical is tested. A large value for the chi-squared statistic makes this hypothesis unlikely.
● The hypothesis test is summarized in the next lines.(SCREENSHOT)
● The output summarizes an analysis of the statistical significance of individual model effects. For
the one input model, this is similar to the global significance test above.(SCREENSHOT)
● Finally, an analysis of individual parameter estimates is made.(SCREENSHOT)
● The stepwise selection process continues for eight steps. After the eighth step, neither adding nor
removing inputs from the model significantly changes the model fit statistic. At this point, the
Output window provides a summary of the stepwise procedure.(SCREENSHOT)
● In preparation for regression, we check whether any transformations of the data are warranted. To do this, we open the Variables window of the Regression node and select the imputed interval inputs. (SCREENSHOT)
● Click the Explore button. We observe that both card tenure and affluence grade have moderately skewed distributions. Applying a log transformation to these inputs might improve the model fit.
● Disconnect the Impute node from the Data Partition node.
● Regression models are sensitive to extreme or outlying values in the input space. Inputs with
highly skewed or highly kurtotic distributions can be selected over inputs that yield better overall
predictions. To avoid this problem, analysts often regularize the input distributions using a
simple transformation.
● Add a Transform Variables node from the Modify tab to the diagram and connect it to the Data
Partition node.
● Connect the Transform Variables node to the Impute node.
● Open the Variables window of the Transform Variables node and change the method to Log for the DemAffl and PromTime inputs. Click OK. (SCREENSHOT)
● Run the Transform Variables node and explore the exported training data to see whether the transformations result in less skewed distributions (a short sketch of this check appears after this list). To do this, update the Impute node and open its Variables window.
● Select the LOG_DemAffl and LOG_PromTime inputs and click Explore.
● We observe that the distributions are now nicely symmetric. (SCREENSHOT)
● Looking at the Output window, the log transformation increases the validation ASE slightly.
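The skewness check referenced in the list above can be approximated outside Enterprise Miner in a few lines; applying log1p (log of 1 + x, which tolerates zero tenures) directly to the raw columns is an assumption:

    import numpy as np

    for col in ["DemAffl", "PromTime"]:
        logged = np.log1p(organics[col])
        print(f"{col}: skewness {organics[col].skew():.2f} -> {logged.skew():.2f}")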
Polynomial Regression:

● Select the new Regression node from the model tab and drag it to the workspace. Rename it as
Poly regression and connect it to the impute node.
● To add polynomial terms to the model, you use the Term Editor. To use the Term Editor, you
need to enable User Terms.
● Change the main effects, two factor interactions and polynomial terms to yes.
● Change selection model to stepwise and selection criterion to validation error.
● Run the polynomial regression node.
● If we examine the Fit Statistics window, we can see that the additional terms reduce the validation ASE slightly. (SCREENSHOT)
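A hedged scikit-learn analogue of such a polynomial model is sketched below; the degree-2 expansion stands in for the two-factor interactions and polynomial terms, and the pipeline settings are assumptions, not the node's algorithm:

    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures, StandardScaler

    # Degree-2 expansion adds squares and pairwise interactions of the inputs.
    poly_model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                               StandardScaler(),
                               LogisticRegression(max_iter=1000))
    poly_model.fit(X_train, y_train)
    p_valid = poly_model.predict_proba(X_valid)[:, 1]
    print("validation ASE:", ((p_valid - y_valid) ** 2).mean())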
Neural Network

● A neural network can be thought of as a regression model on a set of derived inputs, called
hidden units. In turn, the hidden units can be thought of as regressions on the original inputs.
The hidden unit “regressions” include a default link function (in neural network language, an
activation function), the hyperbolic tangent.
● From the model tab, add a neural network tool to the diagram and connect it to the impute node.
● Set the model selection criterion to average error.
● Run the neural network node and examine the validation average squared error and fit statistics
window (SCREENSHOT)
● We observe that the validation ASE for the neural network model is slightly smaller than the
standard regression model, nearly the same as the polynomial regression, and slightly larger than
the decision tree's ASE.(SCREENSHOT)
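For illustration only, a network with a single layer of three hyperbolic-tangent hidden units (the Neural Network node's default size) can be sketched as follows; this is not the node's exact training algorithm:

    from sklearn.neural_network import MLPClassifier

    nnet = MLPClassifier(hidden_layer_sizes=(3,), activation="tanh",
                         max_iter=2000, random_state=1)
    nnet.fit(X_train, y_train)
    p_valid = nnet.predict_proba(X_valid)[:, 1]
    print("validation ASE:", ((p_valid - y_valid) ** 2).mean())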
Auto Neural

● The AutoNeural tool offers an automatic way to explore alternative network architectures and
hidden unit counts.
● From the model tab, drag the AutoNeural tool to the diagram workspace and connect it to the
regression node.
● Confirm that this setting is in effect: Train Action - Search. This configures the AutoNeural node
to sequentially increase the network complexity.
● Select Number of Hidden Units - 1. With this option, each iteration adds one hidden unit.
● Select Tolerance - Low. This prevents preliminary training from occurring.
● Select Direct- No. This deactivates direct connections between the inputs and the target.
● Select Normal- No. This deactivates the normal distribution activation function.
● Select Sine - No. This deactivates the sine activation function.
● Run the AutoNeural node and view the results (SCREENSHOT).
● The number of weights implies that the selected model has one hidden unit. The average squared
error and misclassification rates are quite low.
● By looking at the results, we can see that the various fit statistics versus training iteration uses a
single hidden unit network.
● Training stops at iteration 8 (based on an AutoNeural property setting). Validation
misclassification is used to select the best iteration, in this case. Weights from this iteration are
selected for use in the next step. (SCREENSHOT)
● A second hidden unit is added to the neural network model. All weights related to this new
hidden unit are set to zero. All remaining weights are set to the values obtained in iteration 3
above. In this way, the two-hidden-unit neural network and the one-hidden-unit neural network
have equal fit statistics.
● The final model shows the hidden units added at each step and the corresponding value of the
objective function (related to the likelihood). (SCREENSHOT)
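A toy version of this sequential search (grow the hidden layer one unit at a time and keep the size with the lowest validation misclassification) might look like the sketch below; the range of sizes and the use of a scikit-learn network are assumptions:

    from sklearn.neural_network import MLPClassifier

    best_units, best_misc = None, 1.0
    for units in range(1, 6):
        net = MLPClassifier(hidden_layer_sizes=(units,), activation="tanh",
                            max_iter=2000, random_state=1).fit(X_train, y_train)
        misc = 1 - net.score(X_valid, y_valid)
        if misc < best_misc:
            best_units, best_misc = units, misc
    print("selected hidden units:", best_units, "misclassification:", round(best_misc, 4))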

LARS model:

Forward selection provides good motivation for understanding how the Least Angle Regression (LARS) node works. In forward selection, the model-building process starts with an intercept. The best candidate input (the variable whose parameter estimate is most significantly different from zero) is added to create a one-variable model in the first step. The best candidate input from the remaining subset of input variables is added to create a two-variable model in the second step, and so on. Notice that an equivalent way to phrase what happens in Step 2 is that the candidate input variable most highly correlated with the residuals of the one-variable model is added to create the two-variable model.

The LARS algorithm generalizes forward selection in the following way:


● Weight estimates are initially set to a value of zero.
● The slope estimate in the one variable model grows away from zero until some other candidate
input has an equivalent correlation with the residuals of that model.
● Growth of the slope estimate on the first variable stops, and growth on the slope estimate of the
second variable begins.
● This process continues until the least squares solutions for the weights are attained, or some
stopping criterion is optimized.
● This process can be constrained by putting a threshold on the aggregate magnitude of the
parameter estimates. The LARS node provides an option to use the LASSO (Least Absolute
Shrinkage and Selection Operator) method for variable subset selection.
● The LARS node optimizes the complexity of the model using a penalized best fit criterion. The
Schwarz's Bayesian Criterion (SBC) is the default.
● From the Model tab drag the LARS node to the diagram workspace and connect it to the impute
node.
● Run the LARS node and view the results.
● The selected model, based on SBC, is the model at step 4.(SCREENSHOT).
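A rough open-source counterpart is scikit-learn's LassoLarsIC, which traces the LAR/LASSO path and keeps the step minimizing BIC (close in spirit to SBC). Treating the 0/1 target as interval here, and assuming X_train is a DataFrame of numeric inputs, are simplifications for illustration:

    from sklearn.linear_model import LassoLarsIC

    lars_model = LassoLarsIC(criterion="bic")
    lars_model.fit(X_train, y_train)
    selected = {name: coef for name, coef in zip(X_train.columns, lars_model.coef_)
                if coef != 0.0}
    print("inputs kept by the BIC-selected step:", selected)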
DM Neural node:

The DMNeural tool is designed to provide flexible target prediction using an algorithm with some similarities to a neural network. A multi-stage prediction formula scores new cases. The problem of selecting useful inputs is circumvented by a principal components method, and model complexity is controlled by choosing the number of stages in the multi-stage prediction formula.

The DMNeural prediction proceeds as follows:

● Up to three principal components (PCs) with the highest target R-square are selected.
● One of eight continuous transformations is selected and applied to the selected PCs.
● The process is repeated three times, each time using the residuals from the previous stage.

The algorithm starts by transforming the original inputs into principal components, which are orthogonal
linear transformations of the original variables. The three principal components with the highest target
correlation are selected for the next step.

● Output (SCREENSHOT)
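Only the first stage of this idea (principal components ranked by target R-square) is sketched below; the eight candidate transformations and the later residual stages are not shown, and the scikit-learn pieces are stand-ins:

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    pcs = PCA().fit_transform(StandardScaler().fit_transform(X_train))
    r2 = np.array([np.corrcoef(pcs[:, j], y_train)[0, 1] ** 2
                   for j in range(pcs.shape[1])])
    top3 = np.argsort(r2)[-3:][::-1]     # three PCs with the highest target R-square
    print("stage-1 components:", top3, "R-square:", r2[top3].round(4))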

Dmine Regression:

Dmine regression is designed to provide a regression model with more flexibility than a standard
regression model. It should be noted that with increased flexibility comes an increased potential of
overfitting. A regression-like prediction formula is used to score new cases. Forward selection chooses
the inputs. Model complexity is controlled by a minimum R-square statistic.
● The main distinguishing feature of Dmine regression versus traditional regression is its grouping
of categorical inputs and binning of continuous inputs.
● The levels of each categorical input are systematically grouped together using an algorithm that
is reminiscent of a decision tree. Both the original and grouped inputs are made available for
subsequent input selection.
● All interval inputs are broken into a maximum of 16 bins in order to accommodate nonlinear
associations between the inputs and the target. The levels of the maximally binned interval inputs
are grouped using the same algorithm for grouping categorical inputs. These binned-and-grouped
inputs and the original interval inputs are made available for input selection.
● A forward selection algorithm selects from the original, binned, and grouped inputs. Only inputs
with an R square of 0.005 or above are eligible for inclusion in the forward selection process.
Forward selection on eligible inputs stops when no input improves the R square of the model by
the default value 0.0005.
● Output (SCREENSHOT).
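The binning-and-eligibility idea can be approximated as below; KBinsDiscretizer and r_regression are stand-ins for the node's grouping algorithm, the column list is assumed, and ordinal coding of the bins is a simplification:

    from sklearn.feature_selection import r_regression
    from sklearn.preprocessing import KBinsDiscretizer

    interval_cols = ["DemAffl", "DemAge", "PromSpend", "PromTime"]   # assumed inputs
    binner = KBinsDiscretizer(n_bins=16, encode="ordinal", strategy="quantile")
    X_binned = binner.fit_transform(X_train[interval_cols])

    # Keep only inputs whose individual R-square with the target reaches 0.005.
    r2 = r_regression(X_binned, y_train) ** 2
    eligible = [c for c, v in zip(interval_cols, r2) if v >= 0.005]
    print("inputs eligible for forward selection:", eligible)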

MBR

● Memory Based Reasoning (MBR) is the implementation of k-nearest neighbor prediction in SAS Enterprise Miner. MBR predictions should be limited to decisions rather than rankings or estimates. Decisions are made based on the prevalence of each target level among the nearest k cases (16 by default): a majority of primary-outcome cases results in a primary decision, and a majority of secondary-outcome cases results in a secondary decision. Nearest neighbor algorithms have no means to select inputs and can easily fall victim to the curse of dimensionality. Model complexity can be adjusted by changing the number of neighbors considered for prediction.
● By default, the prediction decision made for each point in the input space is determined by the
target values of the 16 nearest training data cases. Increasing the number of cases involved in the
decision tends to smooth the prediction boundaries, which can be quite complex for small
neighbor counts.
● Unlike other prediction tools where scoring is performed by DATA step functions independent
of the original training data, MBR scoring requires both the original training data set and a
specialized SAS procedure, PMBR.
● Output (SCREENSHOT)
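A scikit-learn sketch of the same idea (16 nearest neighbors on standardized inputs, since k-NN is scale sensitive) is shown below; it is a stand-in for the MBR node, not the PMBR procedure:

    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    mbr = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=16))
    mbr.fit(X_train, y_train)
    print("validation misclassification:", round(1 - mbr.score(X_valid, y_valid), 4))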

Model Comparison

To compare the models, a control point was set up to route the model results to additional Model Comparison nodes using different selection statistics. By default, the initial comparison uses fit statistics based on the misclassification rate for binary targets, since it makes sense to select the best model by minimizing misclassifications. In the model comparison, Tree 2 and Tree 3 had the same misclassification rate of 18.731%, although all trees were within a 0.2% misclassification range. (SCREENSHOT)
● With a binary target, using the misclassification rate as the selection statistic makes sense. All three decision trees, along with the AutoNeural, Neural, DM Neural, and Dmine Regression models, are below the 20% misclassification range, so the accuracy of these models is extremely close. The only models with a 20% or higher misclassification rate, and therefore the least accurate, are the DMNeural and MBR models.
● Misclassification is typically the best model comparison statistic here, as our goal is to predict whether an individual will purchase organic products. Given that target, a misclassification has the most negative impact: predicting a purchase that turns out to be a non-purchase means the accuracy of the model is not up to par. Hence, actual versus predicted outcomes need to match as closely as possible when predicting a Boolean flag.

Profit Matrix

For the profit matrix, the SCOREORG data source was added to the project so that the best models could be applied to a score data set containing input variables only. Note that the prior probabilities for the default model are a 25% target-buy purchase rate and 75% non-purchase rate (SCREENSHOT); we assume these proportions are correct and not distorted by oversampling. Additional assumptions made for the profit matrix are values that would typically be provided by a business analyst, marketing resource, or other subject matter expert. Because we are missing a total purchase count, it is difficult to gauge how much of each purchase is organics only; if a total item count were included, it would be possible to generate a more accurate average organics amount. Based on the data provided, however, it would be dangerous to assume that if a customer purchases organics, the total purchase consists entirely of organic products, and to weight the decision on total spend.

To generate a profit matrix, the ORGANICS_PROFIT data source, based on hypothetical business analyst data, was created. The assumption is that if a customer purchases organic products, the average number of organic items purchased (TargetAmt) in the sample is 1.183. We assume an average organic item price of $5, so the weighted revenue for an organic buyer is $5.915; if a customer does not purchase an organic item, the promotion costs the store $0.50. (SCREENSHOT) As seen in the exhibit, decision 2 has been zeroed out and decision 1 is weighted with a target-buy value of 5.415 (5.915 - 0.50) and a non-target-buy value of -0.50.
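Applying this matrix to model scores amounts to a one-line expected-value calculation: a customer with predicted purchase probability p has expected profit p × 5.415 + (1 − p) × (−0.50) and is worth soliciting when that value is positive. A sketch, with best_model and X_score as assumed placeholders for the chosen model and the score data:

    profit_buy, profit_nobuy = 5.415, -0.50        # values from the matrix above

    p = best_model.predict_proba(X_score)[:, 1]    # predicted purchase probability
    expected_profit = p * profit_buy + (1 - p) * profit_nobuy
    solicit = expected_profit > 0                  # target only profitable cases
    print("customers to target:", int(solicit.sum()),
          "expected campaign profit:", round(float(expected_profit[solicit].sum()), 2))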

● The Neural Network is the best model in the model comparison given the profit matrix defined. The Model Comparison node was updated to use average profit/loss as the selection statistic. For this model, the cumulative total expected profit tells us that we will max out our expected profit of $6,000 by targeting the best 50% of individuals (validation results) for an organics purchase (SCREENSHOT). Even though this model does not have the best misclassification rate, it provides the highest profit given the parameters (SCREENSHOT). Remember also that this is based on the sample set, so the actual profit after the campaign may be significantly higher.
● The model selected based on the profit matrix is the one we would use, because it targets the most profit per organic purchaser based on the organics profit data source defined.
Recommendations

Based on the analysis of the data provided by the coupon marketing campaign and the predictive
analytics as supplied by the various models used in the research, we can identify key indicators of future
organic product purchases. When the new organic product line is rolled out, targeted marketing should
be applied to the following demographics:

● Females
● Consumers under the age of 40
● Consumers with a high affluence grade (> 15)
● Consumers living in the Southeast region

The following demographics are less likely to purchase organics based on the original coupon
marketing campaign:

● Consumers aged 40-80
● Consumers with affluence grade <= 2
● Males with affluence grade < 12 or age > 50
It is possible that consumers older than 40 could be persuaded to purchase organics via a marketing campaign centered on the immediate and long-term health benefits of eating organic foods.

Targeting individuals with a lower affluence grade may not be feasible regardless of the
marketing campaign due to the higher cost of organic products and their likely inability to afford such
products.
Appendix

Exhibit 1
Exhibit 2
Exhibit 3
Exhibit 4
Exhibit 5
Exhibit 6
Exhibit 7
Exhibit 8
Exhibit 9
Exhibit 10
Exhibit 11
Exhibit 12
Exhibit 13
Exhibit 14
Exhibit 15
Exhibit 16
Exhibit 17
Exhibit 18
Exhibit 19
Exhibit 20
Exhibit 21
Exhibit 22
Exhibit 23
Exhibit 24
Exhibit 25
Exhibit 26
Exhibit 27
Exhibit 28
Exhibit 29
Exhibit 30
Exhibit 31
Exhibit 32
Exhibit 33
Exhibit 34
Exhibit 35
Exhibit 36
Exhibit 37
Exhibit 38
Exhibit 39
Exhibit 40
Files

Enterprise Miner Diagrams
