
Goodbelly Marketing: Analysis and Recommendations

Cody Wild & Luba Gloukhov


October 7, 2014

List of Tables

1  Univariate Summary Statistics
2  Breakdown of Observations By Occurrence of Marketing Techniques
3  Estimated Revenue Benefit of Marketing Technique By Region
4  Region Abbreviations and Names
5  Summary of Units Sold and Price By Region
6  Breakdown of Observations When Endcap == 1
7  Breakdown of Observations for Endcap == 0


List of Figures

1   Units Sold vs. Date (Model A1)
2   Residual Distribution (Model B5)
3   Residual vs. Fitted Values (Model B5)
4   Increase Due to Endcap in Areas With and Without Sales Rep
5   Distribution of Units Sold
6   Distribution of Average Retail Price
7   Distribution of Revenue
8   Distribution of Total Units Sold Per Store
9   Model C2: Residual Distribution
10  Model C2: Residuals vs Fitted values
11  Model C2: Residuals vs Average Retail Price
12  Impact of Demo on Revenue (given Endcap = 0)

1 Executive Summary

Understood most generally, this report supports continuation of both the endcap and demonstration
programs, since both are associated with large and statistically significant boosts to sales and,
consequently, revenue.
Where the endcap campaign is successful, it is linked to increases of between $1,850 and $2,050 in
weekly store revenue. The success of this campaign, however, is strongly influenced by the presence
of a regional sales representative interfacing with the store. In regions without a regional sales
representative, the effect of an endcap was essentially nonexistent. In the short term, this result
suggests that endcaps should be curtailed in stores without a sales representative and expanded
in stores with one. In the long term, more research should be done to determine which factors
underlying the presence of a sales representative are driving this dynamic.
Demonstrations are more consistently beneficial, leading to average revenue boosts of $430 in
the week they are conducted and $300 in each of the five weeks thereafter. This addresses and
assuages one of the fears laid out in our briefing: that the benefits of a demonstration recede
quickly. In fact, our analysis suggests that the positive effects of a demo continue at roughly 70%
of initial strength for up to five weeks after it is conducted.
Although definitive judgments on continuation would require cost information, based on the
available data we believe marketing funds have by and large been spent effectively in the past
and, incorporating this report's insights as well as additional research, can be deployed even more
effectively in the future.

2 Introduction

2.1 Problem Statement

Given the financial exigencies imposed by the recession and consequent market constriction in
2008, the question of whether Goodbelly's current marketing projects are leading to justifiably
large returns is of particular relevance. With that as a backdrop, this report investigates whether,
and to what degree, two specific marketing campaigns targeting Goodbelly sales at Whole Foods
have been successful.
The first of these campaigns involved the use of marketing funds to set up two parallel incentive
structures. One of these incentivized sales representatives to convince more Whole Foods stores
to present Goodbelly products at the ends of aisles. The other incentivized Whole Foods stores
to create the most decorative endcaps to attract customers. The second marketing campaign paid
for part-time individuals to set up a demonstration stand in the Whole Foods and offer samples
to customers.

2.2 Data Description

The dataset provided for this analysis includes 1,386 observations, each of which represents one
store in one of the eleven weeks between May 4 and July 13 of 2010 (126 stores x 11 weeks), across
12 variables:
Date - Date corresponding to the first day of the week
Region - Two-letter code corresponding to geographic region (a full list of abbreviations and
corresponding region names can be found in the Appendix)
Store - Name of individual Whole Foods store
Units.Sold - Goodbelly units sold during the week
Average.Retail.Price - Average price of Goodbelly products
Sales.Rep - Binary variable set to 1 if a regional, rather than national, sales rep was responsible
for the store, and 0 otherwise
Endcap - Binary variable set to 1 if the store had a Goodbelly endcap installed during the
week, and 0 otherwise
Demo - Binary variable set to 1 if the store conducted a demo during this week, 0 otherwise
Demo1.3 - Binary variable set to 1 if the store conducted a demo during the 1-3 weeks prior
to this week, 0 otherwise
Demo4.5 - Binary variable set to 1 if the store conducted a demo during the 4-5 weeks prior
to this week, 0 otherwise
Natural - Number of natural food stores located within five miles of this Whole Foods store
Fitness - Number of fitness centers located within five miles of this Whole Foods store
Revenue - Not included in the initial dataset; this variable, set equal to Units.Sold ×
Average.Retail.Price, was created for ease of analysis
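
The derived Revenue variable can be sketched in R as below. This is a minimal illustration only: the two-row data frame is a made-up stand-in for the report's `gbData`, and only the column names are taken from the variable list above.

```r
# Hypothetical two-row stand-in for the report's data frame (illustrative values only).
gbData <- data.frame(
  Units.Sold = c(200, 350),
  Average.Retail.Price = c(4.10, 3.90)
)

# Revenue is the product of weekly units sold and average retail price.
gbData$Revenue <- gbData$Units.Sold * gbData$Average.Retail.Price
print(gbData)
```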

Table 1: Univariate Summary Statistics

Variable               Min     Median   Mean     Max
Units.Sold             47.6    236.7    253.8    1041
Average.Retail.Price   2.9     4.1      4.1      6.3
Revenue                204.2   967.3    1041.0   4252.0
Endcap                 0       0        .038     1
Demo                   0       0        .058     1
Demo1.3                0       0        .157     1
Demo4.5                0       0        .076     1

Both Units.Sold and Revenue follow unimodal, right-skewed distributions. Average.Retail.Price
follows a bimodal, right-skewed distribution. Histograms of all three continuous variables can
be found in the Appendix (Figures 5-7). As shown in the above table, Demos occurred in 5.8% of the
observations within this dataset. The mean values for Demo1.3 and Demo4.5 are somewhat higher, at
15.7% and 7.6% respectively, reflecting both the presence of multiple weeks within these intervals
and demos that occurred during the weeks prior to the window covered by the dataset. Demos
appeared at some point in the time spanning the dataset in 69 of the 126 stores. Endcaps appeared
in just 12 of the 126 stores.

Table 2: Breakdown of Observations By Occurrence of Marketing Techniques

Demo  Demo1.3  Demo4.5  Endcap     n   Units.Sold (mean ± SD)   Average.Retail.Price (mean ± SD)
0     0        0        0        992   213.27 ± 57.03           4.09 ± 0.44
0     0        0        1         28   497.78 ± 257.91          4.11 ± 0.53
0     0        1        0         82   304.88 ± 57.88           4.25 ± 0.53
0     0        1        1          3   443.86 ± 244.68          3.71 ± 0.88
0     1        0        0        171   308.53 ± 47.38           4.09 ± 0.52
0     1        0        1         14   662.26 ± 229.49          3.72 ± 0.47
0     1        1        0         15   366.30 ± 70.76           4.31 ± 0.40
0     1        1        1          0   NaN ± NA                 NaN ± NA
1     0        0        0         57   334.21 ± 55.51           4.19 ± 0.45
1     0        0        1          5   710.74 ± 238.72          4.00 ± 0.43
1     0        1        0          2   394.86 ± 47.37           4.63 ± 0.85
1     0        1        1          0   NaN ± NA                 NaN ± NA
1     1        0        0         12   427.50 ± 38.52           4.39 ± 0.50
1     1        0        1          2   905.95 ± 28.49           3.50 ± 0.32
1     1        1        0          2   532.47 ± 34.05           4.22 ± 0.53
1     1        1        1          1   1041.20 ± NA             4.08 ± NA
All                             1386   253.82 ± 111.00          4.11 ± 0.46

3 Model Construction

3.1 Initial Exploration

Before examining models that directly addressed the effect of Goodbelly's marketing, our team
started with more basic models that we hoped would reveal patterns in the data. Since the data was
structured as a time series, our first exploratory model, Model A1, regressed Units.Sold against
Date. Although this simple linear regression had low explanatory power, with an adjusted $R^2$
($R_a^2$) of just over .05, Date was a highly significant variable, with p < 2e-16, indicating
that the data contained a chronological trend, with Units.Sold increasing as observations moved
forward in time.
Since the response under consideration in this analysis is connected to customer purchasing
decisions, basic economic theory suggested that Average.Retail.Price (hereafter Price, for brevity)
may explain variation in units sold, with customers buying less at higher prices. Despite this
intuition, Model A2, which regressed Units.Sold against both Date and Price, found the effect of
Price not to be significantly different from zero, with p = .32. Given this lack of significance of
Price as an explanatory variable here, and further given the similarity in the distributions of
Units.Sold and Revenue, our team decided to continue the process of model construction by removing
Price as an explanatory variable and using Revenue, rather than Units.Sold, as the dependent
variable. This decision was motivated by the fact that, ultimately, the question of whether these
marketing programs are cost-effective is determined by whether the revenue they bring in is greater
than the cost they incur. Therefore, modeling Revenue as the dependent variable is more directly in
line with the questions under consideration in this report.

Figure 1: Units Sold vs. Date (Model A1)

It was at this point, with Model A3, that the explanatory variables of interest, the indicators of
the presence of each marketing technique during a given observation, were added to the regression.
With this addition, the explanatory power of our model increased almost tenfold, growing from
$R_a^2 = .054$ to $R_a^2 = .523$. All of the marketing variables (i.e. Demo, Demo1.3, Demo4.5, and
Endcap) have positive and significant coefficients in this model: the first concrete sign of a notable
benefit associated with these marketing strategies. Notably, once these indicators are added to
the model, Date ceases to be significant at $\alpha = .05$, suggesting that the upward chronological
trend observed in more simplistic models and graphics is largely explained by the greater presence
of marketing efforts as time goes on.
Once the most general explanatory variables (Date and Price) and the targeted explanatory
variables had been considered, the next step was to include information that differentiates stores
from one another. At the most aggregate level, stores are distinguished from one another by their
geographical location, which Model A4 incorporated by adding Region as an explanatory variable
(see footnote 3) in place of the no-longer-significant Date. This model was a substantial
improvement on its predecessor, with a jump of roughly 15 percentage points in explanatory power,
from $R_a^2 = .523$ to $R_a^2 = .668$.
Given this solid core of a model, our team continued to build upwards with the variables
on hand, which now distinguish individual stores within a region. Models A5 and A6, which add
Fitness and Natural respectively to the baseline established by A4, find neither to add significant
predictive value, with p-values of .56 and .58. In Model A7, which adds Sales.Rep, we see that
while it doesn't add substantial predictive power (a change of only .001 in $R_a^2$), its coefficient
is significantly different from zero, and thus it was kept in subsequent models.
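
The add-one-variable-and-compare procedure used for Models A4-A7 can be sketched in R roughly as follows. This is a minimal illustration on a small synthetic data frame (`sim`, with made-up effect sizes and noise), not the report's `gbData`; only the variable names are borrowed from the report.

```r
# Synthetic stand-in for the report's data; all values are made up for illustration.
set.seed(1)
n <- 210
sim <- data.frame(
  Demo    = rep(c(1, 0, 0, 0), length.out = n),
  Endcap  = rep(c(0, 0, 1, 0, 0), length.out = n),
  Fitness = rep(0:6, length.out = n)
)
# Fitness has no true effect in this simulation.
sim$Revenue <- 950 + 480 * sim$Demo + 1150 * sim$Endcap + rnorm(n, sd = 250)

base_fit <- lm(Revenue ~ Demo + Endcap, data = sim)
ext_fit  <- update(base_fit, . ~ . + Fitness)  # add one candidate variable

# Compare adjusted R-squared and inspect the p-value of the added variable.
adj_r2 <- c(base     = summary(base_fit)$adj.r.squared,
            extended = summary(ext_fit)$adj.r.squared)
p_fitness <- summary(ext_fit)$coefficients["Fitness", "Pr(>|t|)"]
print(adj_r2)
print(p_fitness)
```

A candidate that barely moves the adjusted $R^2$ and has a large p-value would be dropped under this procedure, while one with a small p-value (as with Sales.Rep in Model A7) would be retained.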

3.2 Regional Interaction Models

Prior to adding any interaction terms, the benchmarked current best model is specified as follows:
$$\text{Revenue} = \beta_0 + \beta_1 \text{Region} + \beta_2 \text{Endcap} + \beta_3 \text{Demo} + \beta_4 \text{Demo1.3} + \beta_5 \text{Demo4.5} + \beta_6 \text{Sales.Rep}$$

Footnote 3: In interpreting output from models involving Region, be aware that the reference level
of Region, i.e. the one dropped by R and absorbed into the intercept to ensure invertibility, was
deliberately set to New England (NE), since that region exhibited revenue closest to the average
revenue across all regions.
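
Setting the reference level of a factor can be done with base R's `relevel()`. The sketch below uses a short made-up vector of region codes rather than the report's data, and shows one way the choice of NE described in the footnote might be made.

```r
# Made-up vector of region codes for illustration.
region <- factor(c("FL", "MA", "NE", "NC", "NE", "RM"))

# By default, R orders levels alphabetically and treats the first as the reference.
print(levels(region))

# Make New England the reference level, so its effect is absorbed into the intercept.
region <- relevel(region, ref = "NE")
print(levels(region))
```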

Given that the field of possibly fruitful single variables has been considered, we now move on to
interaction terms, which allow the effect of one variable to vary according to the levels
of another. Starting again from the most macro view of possible differentiation, we considered
Region as a possible category by which the effects of a marketing technique might differ. Based on
this, we constructed Model B1, which adds a Region × Endcap interaction term to the base model
above.
The addition of this interaction boosted our model's $R_a^2$ value, which had plateaued at roughly
.67 in Models A4-A7, up to .78. It should be noted that a majority of the levels of Region have
NA coefficients for their interaction with Endcap. This is because only five regions ever experience
an endcap: Florida, Mid-Atlantic, Northern California, Pacific Northwest, and Rocky Mountain.
Since only these five regions experience endcaps, R defines interaction coefficients for only four
of them, leaving the last to be encompassed in the value of the uninteracted Endcap coefficient.
In this case, that region is Rocky Mountain, meaning that the effect of Endcap in the Rocky
Mountain region is 1573, with all other coefficients expressed relative to that one.
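
The NA pattern described above can be reproduced on a toy example. The sketch below uses synthetic data with four regions, of which only two ever have an endcap; the numbers are made up, and the region codes are borrowed from the report purely for readability.

```r
# Synthetic data: four regions, ten weeks each; all values are made up for illustration.
set.seed(2)
d <- expand.grid(week = 1:10, Region = c("NE", "FL", "MW", "RM"),
                 stringsAsFactors = FALSE)
d$Region <- factor(d$Region, levels = c("NE", "FL", "MW", "RM"))  # NE is the reference level

# Endcaps occur only in FL and RM (weeks 8-10); NE and MW never have one.
d$Endcap <- as.integer(d$Region %in% c("FL", "RM") & d$week > 7)

# In this simulation only RM endcaps raise revenue.
d$Revenue <- 1000 + 1500 * d$Endcap * (d$Region == "RM") + rnorm(nrow(d), sd = 50)

fit <- lm(Revenue ~ Region * Endcap, data = d)
print(coef(fit))
```

In this toy fit, `RegionMW:Endcap` is NA because MW never has an endcap, and `RegionRM:Endcap` is NA because it is linearly determined by the other endcap columns. The plain `Endcap` coefficient therefore carries the RM effect, and `RegionFL:Endcap` is expressed relative to it.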
Although this will be explored in considerably more depth in later sections, it is worth noticing
at this juncture the stark divide between the effect of an Endcap on Revenue in NC, PN, and RM,
where it is in the range of $1,400-$2,050, and in FL and MA, where it is between -$100 and $30.
Since the success of Region's addition suggested that the effect of Endcap may vary significantly
by geographical area, Model B2 moved from a more aggregated geographical breakdown to a more
minute one, adding an interaction term between Store and Endcap. Although the nominal $R^2$
increased by .02 once this interaction was added, the $R_a^2$ increased by far less, with a change
of .002. Given that information, our team elected to continue model construction without a
breakdown by individual store, for two reasons. First, and most straightforwardly, this small
boost in explanatory power was completely out of proportion to the increase in complexity
triggered by this interaction's addition, which grew our coefficient vector from 20 to over 200.
Second, we opted to continue without Store-level interactions because we believe that, all else
equal, a more general model is a more powerful one, as it carries more easily abstractable insights
and is less likely to overfit to this specific dataset.
Moving back to the starting point of a single interaction between Region and Endcap, Model
B3 takes the Region interaction and applies it to the Demo variable. In this case, however, the
interaction between Region and Demo is insignificant, with p-values on individual coefficients
ranging from .23 to .92, and a partial F-test on the interaction as a whole returning a p-value of
.83. So, while the effect of Endcap appears to vary greatly and significantly by Region, there isn't
support, at any reasonable level of significance, for the hypothesis that Demo does likewise.
While Model B3 provided strong evidence that the effect of a Demo on revenue doesn't vary by
region, Model B4 addresses the question of whether the effect of a Demo could vary depending
on whether previous Demos had occurred at the same store. To address this, it incorporates
the interactions Demo:Demo1.3, Demo1.3:Demo4.5, and Demo:Demo4.5. Of these, only
the first was significant at $\alpha = .05$, with the partial effect of a Demo heightened when it was
preceded by another Demo in the prior 1-3 weeks.
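After consideration of these various interactions as possible additions to our model, our benchmark
model is Model B5, specified as follows:
$$\text{Revenue} = \beta_0 + \beta_1 \text{Region} + \beta_2 \text{Endcap} + \beta_3 \text{Demo} + \beta_4 \text{Demo1.3} + \beta_5 \text{Demo4.5} + \beta_6 \text{Region} \times \text{Endcap} + \beta_7 \text{Demo} \times \text{Demo1.3} + \beta_8 \text{Sales.Rep}$$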
Given that this is under consideration as a potential final model, tests of the underlying
assumptions of homoskedasticity and normality of error terms are now necessary. The Breusch-Pagan
test returned a p-value of .11 on this model, which gives us insufficient evidence to reject the null
hypothesis of homoskedastic errors. However, the Shapiro-Wilk normality test returned a p-value
of .007, leading us to reject the null hypothesis of normality at $\alpha = .05$.
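
Both diagnostics can be sketched in base R as below. The data are synthetic (made-up coefficients, well-behaved errors), and the Breusch-Pagan statistic is computed by hand in its studentized form so that the example needs no extra packages. The report does not say which implementation the authors used; the `bptest()` function in the `lmtest` package is a common shortcut.

```r
# Synthetic regression with well-behaved errors; values are made up for illustration.
set.seed(3)
n <- 300
d <- data.frame(x = runif(n))
d$y <- 2 + 3 * d$x + rnorm(n)
fit <- lm(y ~ x, data = d)

# Breusch-Pagan (studentized form): regress the squared residuals on the predictors;
# n * R^2 of that auxiliary regression is compared to a chi-squared distribution
# with one degree of freedom per predictor (one here).
d$res2 <- resid(fit)^2
aux <- lm(res2 ~ x, data = d)
bp_stat <- n * summary(aux)$r.squared
bp_p <- pchisq(bp_stat, df = 1, lower.tail = FALSE)

# Shapiro-Wilk test of normality on the residuals.
sw <- shapiro.test(resid(fit))

print(c(breusch_pagan_p = bp_p, shapiro_wilk_p = sw$p.value))
```

In both tests a small p-value is evidence against the assumption (constant variance or normality), which is how the .11 and .007 reported above are read.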
Figure 2: Residual Distribution (Model B5)

Figure 3: Residual vs. Fitted Values (Model B5)

Since neither the metric of homoskedasticity nor that of normality is ideal, a Box-Cox estimation
was performed to determine whether a transformation of Y might correct some of these deficiencies.
This estimation returned a value of $\lambda = .75$, leading us to Model B6, which is identical in
its parameters to B5 but with this power transformation applied to Y. As a result of this
transformation, $R_a^2$ decreases slightly, from .785 to .75. However, there are also strong
compensatory improvements on the metrics of homoskedasticity and normality, with Breusch-Pagan and
Shapiro-Wilk tests returning p-values of .91 and .54 respectively. One minor note is that in this
transformed model, Demo:Demo1.3, which had previously been borderline significant, ceases to be
significant at any reasonable level, with a p-value of .211. With all Demo:Demo interactions removed
from the current model, this implies that the effect of a Demo at any given point in time is not
significantly different whether or not prior Demos have been performed. In simpler terms, the
effects of subsequent Demos are simply additive rather than both additive and multiplicative.
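
The Box-Cox step described above can be sketched with `boxcox()` from the `MASS` package. The example below is an assumption-laden illustration: the data are synthetic and deliberately constructed so that the 0.75 power of the response is linear in the predictor, and the grid of candidate $\lambda$ values is an arbitrary choice.

```r
library(MASS)  # provides boxcox()

set.seed(4)
n <- 400
x <- runif(n, 1, 5)
# Synthetic response constructed so that y^0.75 is linear in x (made-up numbers).
d <- data.frame(x = x, y = (10 + 5 * x + rnorm(n))^(1 / 0.75))

fit <- lm(y ~ x, data = d, y = TRUE, qr = TRUE)

# Profile the Box-Cox log-likelihood over a grid of lambda values and take the maximizer.
bc <- boxcox(fit, lambda = seq(0, 2, by = 0.05), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]
print(lambda_hat)

# Refit with the response raised to the estimated power.
d$y_t <- d$y^lambda_hat
fit_t <- lm(y_t ~ x, data = d)
```

A $\lambda$ near 1 means no transformation is suggested, which is the outcome reported later for Model C1.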
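The adoption of this transformation, as well as the removal of the Demo:Demo1.3 interaction,
leads to Model B7, as specified here:
$$\text{Revenue}^{.75} = \beta_0 + \beta_1 \text{Region} + \beta_2 \text{Endcap} + \beta_3 \text{Demo} + \beta_4 \text{Demo1.3} + \beta_5 \text{Demo4.5} + \beta_6 \text{Region} \times \text{Endcap} + \beta_7 \text{Sales.Rep}$$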
This model was a strong candidate in many ways. It balanced parsimony and explanatory power,
incorporated only minimal transformation, and exhibited no obvious deficiencies in the way of
nonconstant error variance or non-normally distributed error terms.
However, there remained one fundamental problem with this model: the most compelling
and unexpected story it told was one that we had no clear way of explaining. This story is the
divergence previously mentioned, between the strikingly large benefit associated with an Endcap
promotion in three regions that had experienced Endcaps (NC, RM, PN) and the nonexistent
or even negative effect in the other two (FL and MA). Model B7 succeeded at identifying this
divergence, but it didn't clarify what factors made these two groups of regions different from one
another. This leaves us without a position of strength from which to answer the question: among
the six regions where Endcaps have not previously been employed, where are they likelier to behave
like the successful group (Group 1), and where are they likelier to behave like the unsuccessful one
(Group 2)?
In an attempt to determine the key factor distinguishing these groups of regions, the data was
subset to include only these five regions, and a binary indicator variable (ecSuccess) was created,
set to 1 if the observation belonged to a region in Group 1 and to 0 if it belonged to a region in
Group 2. We then ran a logistic regression with this indicator as the response and variables we
thought could be characteristic of regions (Fitness, Natural, Average.Retail.Price, and Sales.Rep)
as explanatory variables. The output of this regression isn't included because, when it was run, it
failed to converge, warning of a likely error due to an essentially perfect fit. This unexpected
result triggered another round of examining our data, which resulted in an elegant and, ultimately,
logically consistent realization: among stores that had experienced Endcaps, Sales.Rep, which
indicates whether the store interacted with a regional sales representative or just a national one,
was a perfect predictor of whether a store belonged to a Group 1 region or a Group 2 region. In
other words, out of the Endcap stores, all observations in the unsuccessful regions of MA and FL
lacked a regional sales rep, while all observations in the successful regions of NC, PN, and RM
had one.
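
The kind of check that exposes a perfect predictor can be sketched with a simple cross-tabulation. The eight rows below are invented for illustration (only the region codes and variable names come from the report); in the real data the same table would be built from the endcap observations.

```r
# Invented endcap observations for illustration; not the report's data.
d <- data.frame(
  Region    = c("NC", "NC", "PN", "RM", "FL", "FL", "MA", "MA"),
  Sales.Rep = c(1, 1, 1, 1, 0, 0, 0, 0)
)

# Group 1 (successful endcap regions) is coded 1, Group 2 is coded 0.
d$ecSuccess <- as.integer(d$Region %in% c("NC", "PN", "RM"))

# Cross-tabulate the indicator against the candidate predictor.
xt <- table(ecSuccess = d$ecSuccess, Sales.Rep = d$Sales.Rep)
print(xt)

# If every observation falls on the diagonal, Sales.Rep separates the groups perfectly.
perfect <- all(d$ecSuccess == d$Sales.Rep)
print(perfect)
```

When a predictor separates the two classes this cleanly, the logistic likelihood has no finite maximum, which is consistent with the convergence failure described above.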

3.3 Sales Rep Interaction Models

Given this realization, it occurred to us that since the categorization of a region into a successful
or unsuccessful group depended on Sales.Rep, the ideal interaction may be not between Region
and Endcap but between Sales.Rep and Endcap. This interaction is shown visually in Figure 4.
This led to C1, the first in our final set of models, which replaced the Region × Endcap
interaction with a Sales.Rep × Endcap interaction. As we had hoped, this new interaction term was
significant, and the model's $R^2$ of roughly .77 was on par with the prior best model. However,
this new model did have clear problems, as shown by the Breusch-Pagan and Shapiro-Wilk tests
yielding p-values of < 2.2e-16 and .0018 respectively.

Figure 4: Increase Due to Endcap in Areas With and Without Sales Rep

A Box-Cox estimation was performed on Model C1, but it didn't generate a useful result,
with a suggested lambda parameter of one. At this point, we reviewed the list of variables that
had previously been excluded from the model, to see if reincorporating them might
correct this problem. After several unsuccessful attempts, we found one that worked: switching the
dependent variable back to Units.Sold and reintegrating Average.Retail.Price as an explanatory
variable.
This resulted in our final model, Model C2.

4 Final Model: Implications

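Model C2, our final recommended model, is specified as:
$$\text{Units.Sold} = \beta_0 + \beta_1 \text{Sales.Rep} + \beta_2 \text{Region} + \beta_3 \text{Average.Retail.Price} + \beta_4 \text{Endcap} + \beta_5 \text{Demo} + \beta_6 \text{Demo1.3} + \beta_7 \text{Demo4.5} + \beta_8 \text{Sales.Rep} \times \text{Endcap}$$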
When this model is run, it generates the following coefficients and t-values:
                      Estimate Std. Error t value Pr(>|t|)
(Intercept)           290.4688    18.4468  15.746  < 2e-16 ***
Demo                  106.4176     5.7614  18.471  < 2e-16 ***
Demo1.3                72.6323     3.8370  18.929  < 2e-16 ***
Demo4.5                72.5876     5.1404  14.121  < 2e-16 ***
Endcap                 -1.1611    12.4999  -0.093   0.9260
RegionFL              -13.2980    10.5617  -1.259   0.2082
RegionMA              -12.6337     9.9816  -1.266   0.2058
RegionMW                7.6899     6.6410   1.158   0.2471
RegionNA              -19.2996     9.9375  -1.942   0.0523 .
RegionNC                7.1791     6.9733   1.030   0.3034
RegionPN                0.2981     7.5342   0.040   0.9684
RegionRM                1.7499     7.3517   0.238   0.8119
RegionSO              -15.7381    10.7988  -1.457   0.1452
RegionSP                7.9846     7.0766   1.128   0.2594
RegionSW              -20.8752    10.6241  -1.965   0.0496 *
Sales.Rep              39.4663    10.5825   3.729   0.0002 ***
Average.Retail.Price  -21.7311     3.7075  -5.861 5.74e-09 ***
Endcap:Sales.Rep      457.4982    15.3309  29.842  < 2e-16 ***
---
Multiple R-squared: 0.8071, Adjusted R-squared: 0.8047

While the $R^2$ value for this model (.81) is close to that of Model C1 (roughly .77), this model
minimizes the problems posed by heteroskedasticity and non-normality, with Breusch-Pagan and
Shapiro-Wilk p-values of .26 and .73 respectively. The coefficient on Endcap when Sales.Rep == 0
isn't significantly different from zero, a result that aligns with previous findings of Endcap having
a marginal or even negative effect within the regions lacking a regional sales representative. By
contrast, when Sales.Rep == 1, the Endcap:Sales.Rep interaction adds a highly significant 457.5
units.
The effect of Demo on Units.Sold is more consistent, as interaction terms both between Demo
and Region and between Demo and the prior-Demo indicators turned out not to be significant. When a
Demo is conducted in the week of observation, that fact is associated with an increase of 106.4 in
Units.Sold. When a Demo has been conducted further in the past, either 1-3 or 4-5 weeks earlier,
that fact is associated with an increase of 72.6 in Units.Sold in both cases.
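
As a worked reading of the coefficient table, the estimated effect of an Endcap in Model C2 is the sum of the main Endcap coefficient and, where a regional sales rep is present, the interaction coefficient. The same table also gives the share of the same-week Demo effect that persists in later weeks. The arithmetic below uses only the estimates printed above; the $\Delta$ notation is introduced here for illustration.

```latex
\begin{aligned}
\Delta_{\text{Endcap}} &= \beta_4 + \beta_8 \,\text{Sales.Rep} \\
\text{Sales.Rep} = 0: \quad \Delta_{\text{Endcap}} &= -1.1611 \\
\text{Sales.Rep} = 1: \quad \Delta_{\text{Endcap}} &= -1.1611 + 457.4982 = 456.3371 \\
\frac{\beta_6}{\beta_5} &= \frac{72.6323}{106.4176} \approx 0.68,
\qquad
\frac{\beta_7}{\beta_5} = \frac{72.5876}{106.4176} \approx 0.68
\end{aligned}
```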
The coefficient on Price, which in prior models had lacked significance, fits the expectations of
economic theory by being significant and negative in this model. This suggests that, in prior
models, high correlation between Price and a variable not yet in the model was a source of
endogeneity and thus bias in our estimates.
Although Region adds relatively little predictive power to this model, its inclusion helped
stem problems of heteroskedasticity and non-normality that were present otherwise, and so it was
included for its normalizing effect.
The table below summarizes the average estimated revenue boosts associated with each of
the marketing techniques in each Region, using Price = the average Price in that region and
Sales.Rep set equal to 1 if there are stores within that region that interface with a regional rather
than national sales representative. Revenue benefit values that lie within the range of the dataset
(i.e. where we have observations for that combination of Region and Sales.Rep) are indicated
with a marker.
Table 3: Estimated Revenue Benefit of Marketing Technique By Region

Region  Sales.Rep  meanPrice    n   Endcap     Demo     Demo1.3  Demo4.5
NE      0          $4.33       30   $2.69      $462.69  $318.00  $323.13
NE      1          $4.09       80   $1860.84   $437.15  $300.45  $305.29
FL      0          $4.15       88   $2.57      $443.05  $304.50  $309.41
MA      0          $3.61      209   $2.24      $385.80  $265.15  $269.43
MW      1          $3.94      176   $1789.96   $420.50  $289.00  $293.66
NA      0          $4.13      143   $2.57      $441.40  $303.37  $308.25
NC      1          $4.53      165   $2058.67   $483.62  $332.39  $337.74
PN      1          $4.05       99   $1841.52   $432.61  $297.33  $302.12
RM      1          $4.40      110   $2001.14   $470.11  $323.10  $328.30
SO      1          $3.80       77   $2.36      $405.61  $278.77  $283.26
SP      1          $4.40      132   $1999.50   $469.72  $322.83  $328.04
SW      0          $4.20       77   $2.60      $448.18  $308.03  $312.99

5 Recommendations

Based on the results of our final model, we can provide an analytically sound answer to the
questions posed in our problem statement: namely, whether marketing techniques undertaken by
Goodbelly resulted in substantial and long-lived increases in sales.
Within this data, in-store demonstrations were shown to have a consistent, positive, and
significant effect that didn't vary significantly between regions or as a result of an additional
demonstration having taken place during a prior week. This positive effect was predictably highest,
with an average boost of 106 units sold and $430 in increased revenue, when the demonstration had
occurred that week. The effect then tapered to 72.6 additional units sold in the week after the
demonstration, and stayed elevated at this level for at least five weeks thereafter. This result has
a few key implications for future marketing decisions. Firstly, this analysis contradicts, at least
with regard to the five weeks following a demonstration, the worry that effects of marketing are
short-lived, as the positive effect continues at roughly 70% strength (72.6 of 106.4 units) for up
to five weeks, and potentially longer. Secondly, since no interactions between this week's
demonstration and prior weeks' demonstrations were ultimately found to be significant, our results
suggest that, in order to operate most cost-effectively and allow the impact of a demo to take full
effect, repeat demos in the same store should be spaced at least five weeks apart.
The campaign of constructing decorative endcaps had a stronger positive effect on sales, but
a less universal one. In stores and in regions with a regional sales representative, endcaps were
associated with an average estimated sales boost of 457.5 units, translating to $1,860-$2,050 in
additional revenue. However, in stores without a sales representative, that boost was essentially
nonexistent. On the most basic level, this result strongly recommends that the endcap
campaign be continued in regions (NC, PN, RM) where it has found success, and minimized or
eliminated in regions (MA and FL) where it has not. Projecting slightly further from the results
of this analysis, this result suggests that the expansion of endcaps into new stores and regions is
likeliest to lead to strong positive results in areas with a regional sales representative. By this
criterion, Midwest, New England, and South Pacific would be ideal expansion targets.
However, it is worth taking a moment to examine why the presence of a sales representative is
associated with such a strong positive effect. Maybe regional sales representatives
have a deeper body of knowledge about their particular region, and are better able to target
endcap installation to stores where they suspect it will be successful. Perhaps the closer
connections that are formed between regional representatives and store employees motivate stores to
construct endcaps that are better decorated and located more prominently within the store. It could
even be that the same criteria that determine whether a regional sales representative is appointed
also determine whether an endcap campaign is successful. The primary takeaway is that
while this report finds an association, if Goodbelly wants to understand on a deeper level
the causal mechanics at work, further research would be necessary. This could potentially
take the form of interviewing both regional and national sales representatives to ascertain the time
spent on their endcap campaigns, or gathering more qualitative information about the endcaps
themselves, such as their location within the store.
It should be noted that, since information pertaining to the costs of each technique was not
available, all of these recommendations are made on the basis of increased revenue alone, rather
than a more clear-cut cost-benefit analysis. However, given this caveat, if the revenue increases
estimated by this report do indeed outstrip costs, then we recommend continuing or expanding all
demonstration campaigns and a subset of endcap campaigns, and hope that this analysis can lead to
more confident and ultimately successful decision-making by Goodbelly.

6 Appendix

6.1 Univariate Statistics

Table 4: Region Abbreviations and Names

Abbreviation  Name
NE            New England
FL            Florida
MA            Mid Atlantic
MW            Midwest
NA            North Atlantic
NC            North California
PN            Pacific Northwest
RM            Rocky Mountain
SO            South
SP            South Pacific
SW            Southwest


Figure 5: Distribution of Units Sold

Figure 6: Distribution of Average Retail Price


Figure 7: Distribution of Revenue

Figure 8: Distribution of Total Units Sold Per Store


6.2 Model Specification & Output


Model A1
(Date SLR)
$$\text{Units.Sold} = \beta_0 + \beta_1 \text{Date}$$
lm(formula = Units.Sold ~ Date, data = gbData)
Residuals:
    Min      1Q  Median      3Q     Max
-239.15  -59.45  -14.32   38.88  770.94

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -17090.576   1934.447  -8.835   <2e-16 ***
Date             1.175      0.131   8.966   <2e-16 ***
---
Residual standard error: 107.9 on 1384 degrees of freedom
Multiple R-squared: 0.0549, Adjusted R-squared: 0.05421
F-statistic: 80.39 on 1 and 1384 DF, p-value: < 2.2e-16

Model A2
(A1 + Price)
$$\text{Units.Sold} = \beta_0 + \beta_1 \text{Date} + \beta_2 \text{Average.Retail.Price}$$
lm(formula = Units.Sold ~ Date + Average.Retail.Price, data = gbData)
Residuals:
    Min      1Q  Median      3Q     Max
-241.03  -59.08  -13.65   38.83  770.74

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)
(Intercept)          -1.749e+04  1.975e+03  -8.853   <2e-16 ***
Date                  1.200e+00  1.334e-01   8.992   <2e-16 ***
Average.Retail.Price  6.330e+00  6.369e+00   0.994     0.32
---
Residual standard error: 107.9 on 1383 degrees of freedom
Multiple R-squared: 0.05557, Adjusted R-squared: 0.05421
F-statistic: 40.69 on 2 and 1383 DF, p-value: < 2.2e-16

Model A3
(A1 + Explanatory Marketing Variables)
$$\text{Revenue} = \beta_0 + \beta_1 \text{Date} + \beta_2 \text{Demo} + \beta_3 \text{Demo1.3} + \beta_4 \text{Demo4.5} + \beta_5 \text{Endcap}$$
lm(formula = Revenue ~ Date + Demo + Demo1.3 + Demo4.5 + Endcap,
    data = gbData)
Residuals:
    Min      1Q  Median      3Q     Max
-1703.0  -179.9    -8.6   172.9  1437.4

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) 11170.0531  6114.5655   1.827   0.0679 .
Date           -0.6975     0.4141  -1.684   0.0923 .
Demo          582.4088    36.9297  15.771   <2e-16 ***
Demo1.3       405.0617    24.9277  16.249   <2e-16 ***
Demo4.5       395.0120    32.5316  12.142   <2e-16 ***
Endcap       1188.0596    45.6155  26.045   <2e-16 ***
---
Residual standard error: 320.4 on 1380 degrees of freedom
Multiple R-squared: 0.5247, Adjusted R-squared: 0.523
F-statistic: 304.7 on 5 and 1380 DF, p-value: < 2.2e-16

Model A4
(A3 - Date + Region)
$$\text{Revenue} = \beta_0 + \beta_1 \text{Demo} + \beta_2 \text{Demo1.3} + \beta_3 \text{Demo4.5} + \beta_4 \text{Region} + \beta_5 \text{Endcap}$$
lm(formula = Revenue ~ Demo + Demo1.3 + Demo4.5 + Region + Endcap,
    data = gbData)
Residuals:
     Min       1Q   Median       3Q      Max
-1354.74  -152.39     1.84   144.90  1146.75

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   943.40      25.99  36.299  < 2e-16 ***
Demo          487.39      31.32  15.562  < 2e-16 ***
Demo1.3       286.96      20.83  13.777  < 2e-16 ***
Demo4.5       275.80      27.94   9.871  < 2e-16 ***
RegionFL     -311.57      38.91  -8.008 2.47e-15 ***
RegionMA     -268.78      31.53  -8.524  < 2e-16 ***
RegionMW       40.85      32.56   1.255  0.20985
RegionNA     -197.40      34.29  -5.757 1.05e-08 ***
RegionNC      237.42      33.14   7.164 1.28e-12 ***
RegionPN      106.41      37.69   2.823  0.00482 **
RegionRM      116.23      36.36   3.196  0.00142 **
RegionSO     -214.23      40.05  -5.349 1.04e-07 ***
RegionSP      138.16      34.70   3.981 7.21e-05 ***
RegionSW     -196.59      40.05  -4.908 1.03e-06 ***
Endcap       1147.04      39.33  29.166  < 2e-16 ***
---
Residual standard error: 267.4 on 1371 degrees of freedom
Multiple R-squared: 0.6712, Adjusted R-squared: 0.6678
F-statistic: 199.9 on 14 and 1371 DF, p-value: < 2.2e-16

Model A5 (A4 + Fitness)
$\text{Revenue} = \beta_0 + \beta_1\,\text{Region} + \beta_2\,\text{Demo} + \beta_3\,\text{Demo1.3} + \beta_4\,\text{Demo4.5} + \beta_5\,\text{Endcap} + \beta_6\,\text{Fitness}$
lm(formula = Revenue ~ Region + Demo + Demo1.3 + Demo4.5 + Endcap +
    Fitness, data = gbData)
Residuals:
     Min       1Q   Median       3Q      Max
-1356.96  -150.23     2.45   147.41  1147.65

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  937.605     27.902  33.603  < 2e-16 ***
RegionFL    -312.657     38.964  -8.024 2.18e-15 ***
RegionMA    -269.371     31.557  -8.536  < 2e-16 ***
RegionMW      36.683     33.378   1.099  0.27196
RegionNA    -197.495     34.294  -5.759 1.04e-08 ***
RegionNC     236.555     33.184   7.129 1.64e-12 ***
RegionPN     108.168     37.828   2.860  0.00431 **
RegionRM     115.365     36.404   3.169  0.00156 **
RegionSO    -216.248     40.217  -5.377 8.89e-08 ***
RegionSP     136.798     34.790   3.932 8.84e-05 ***
RegionSW    -197.436     40.088  -4.925 9.46e-07 ***
Demo         487.504     31.328  15.561  < 2e-16 ***
Demo1.3      287.071     20.835  13.778  < 2e-16 ***
Demo4.5      276.034     27.951   9.876  < 2e-16 ***
Endcap      1147.327     39.341  29.164  < 2e-16 ***
Fitness        2.734      4.785   0.571  0.56783
---
Residual standard error: 267.5 on 1370 degrees of freedom
Multiple R-squared: 0.6712, Adjusted R-squared: 0.6676
F-statistic: 186.5 on 15 and 1370 DF, p-value: < 2.2e-16
Model A6 (A4 + Natural)
$\text{Revenue} = \beta_0 + \beta_1\,\text{Region} + \beta_2\,\text{Endcap} + \beta_3\,\text{Demo} + \beta_4\,\text{Demo1.3} + \beta_5\,\text{Demo4.5} + \beta_6\,\text{Natural}$
lm(formula = Revenue ~ Region + Demo + Demo1.3 + Demo4.5 + Endcap +
    Natural, data = gbData)
Residuals:
    Min      1Q  Median      3Q     Max
-1350.2  -152.0     1.4   146.8  1141.3

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  949.905     28.514  33.313  < 2e-16 ***
RegionFL    -312.316     38.941  -8.020 2.24e-15 ***
RegionMA    -268.474     31.545  -8.511  < 2e-16 ***
RegionMW      41.724     32.609   1.280  0.20092
RegionNA    -197.883     34.305  -5.768 9.88e-09 ***
RegionNC     238.591     33.217   7.183 1.12e-12 ***
RegionPN     104.911     37.800   2.775  0.00559 **
RegionRM     115.064     36.434   3.158  0.00162 **
RegionSO    -216.473     40.264  -5.376 8.93e-08 ***
RegionSP     138.295     34.710   3.984 7.12e-05 ***
RegionSW    -199.442     40.389  -4.938 8.86e-07 ***
Demo         486.983     31.336  15.540  < 2e-16 ***
Demo1.3      286.370     20.861  13.727  < 2e-16 ***
Demo4.5      275.511     27.953   9.856  < 2e-16 ***
Endcap      1149.041     39.503  29.087  < 2e-16 ***
Natural       -4.263      7.674  -0.556  0.57859
---
Residual standard error: 267.5 on 1370 degrees of freedom
Multiple R-squared: 0.6712, Adjusted R-squared: 0.6676
F-statistic: 186.5 on 15 and 1370 DF, p-value: < 2.2e-16
Model A7 (A4 + Sales.Rep)
$\text{Revenue} = \beta_0 + \beta_1\,\text{Region} + \beta_2\,\text{Endcap} + \beta_3\,\text{Demo} + \beta_4\,\text{Demo1.3} + \beta_5\,\text{Demo4.5} + \beta_6\,\text{Sales.Rep}$
lm(formula = Revenue ~ Region + Demo + Demo1.3 + Demo4.5 + Endcap +
    Sales.Rep, data = gbData)
Residuals:
     Min       1Q   Median       3Q      Max
-1353.41  -151.92     1.87   146.13  1146.04

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  852.559     49.100  17.364  < 2e-16 ***
RegionFL    -220.911     56.922  -3.881 0.000109 ***
RegionMA    -177.677     52.336  -3.395 0.000706 ***
RegionMW       7.119     36.014   0.198 0.843338
RegionNA    -106.560     53.940  -1.976 0.048408 *
RegionNC     203.121     36.647   5.543 3.57e-08 ***
RegionPN      72.437     40.743   1.778 0.075640 .
RegionRM      82.567     39.463   2.092 0.036600 *
RegionSO    -123.390     57.766  -2.136 0.032854 *
RegionSP     104.067     38.019   2.737 0.006276 **
RegionSW    -105.750     57.766  -1.831 0.067368 .
Demo         484.688     31.302  15.484  < 2e-16 ***
Demo1.3      284.642     20.828  13.666  < 2e-16 ***
Demo4.5      279.623     27.958  10.001  < 2e-16 ***
Endcap      1147.759     39.276  29.223  < 2e-16 ***
Sales.Rep    125.128     57.414   2.179 0.029472 *
---
Residual standard error: 267 on 1370 degrees of freedom
Multiple R-squared: 0.6723, Adjusted R-squared: 0.6687
F-statistic: 187.4 on 15 and 1370 DF, p-value: < 2.2e-16

Model B1 (A7 + Region:Endcap Interaction)
$\text{Revenue} = \beta_0 + \beta_1\,\text{Region} + \beta_2\,\text{Demo} + \beta_3\,\text{Demo1.3} + \beta_4\,\text{Demo4.5} + \beta_5\,\text{Endcap} + \beta_6\,(\text{Region} \times \text{Endcap}) + \beta_7\,\text{Sales.Rep}$
lm(formula = Revenue ~ Region + Demo + Demo1.3 + Demo4.5 + Endcap +
    Region * Endcap + Sales.Rep, data = gbData)
Residuals:
    Min      1Q  Median      3Q     Max
-676.54 -146.08   -6.82  136.29  886.60

Coefficients: (6 not defined because of singularities)
                Estimate Std. Error t value Pr(>|t|)
(Intercept)       846.00      39.65  21.336  < 2e-16 ***
RegionFL          -74.62      46.59  -1.602  0.10948
RegionMA         -137.67      42.34  -3.251  0.00118 **
RegionMW            7.40      29.08   0.254  0.79919
RegionNA         -100.00      43.56  -2.296  0.02184 *
RegionNC          118.21      29.94   3.948 8.27e-05 ***
RegionPN           22.68      33.97   0.668  0.50446
RegionRM           64.69      32.09   2.016  0.04402 *
RegionSO         -116.83      46.65  -2.505  0.01238 *
RegionSP          100.22      30.70   3.264  0.00112 **
RegionSW          -99.19      46.65  -2.126  0.03365 *
Demo              478.18      25.38  18.840  < 2e-16 ***
Demo1.3           288.81      16.93  17.061  < 2e-16 ***
Demo4.5           311.01      22.66  13.724  < 2e-16 ***
Endcap           1573.98     109.94  14.316  < 2e-16 ***
Sales.Rep         129.32      46.36   2.789  0.00535 **
RegionFL:Endcap -1549.78     130.09 -11.913  < 2e-16 ***
RegionMA:Endcap -1676.08     141.65 -11.833  < 2e-16 ***
RegionMW:Endcap       NA         NA      NA       NA
RegionNA:Endcap       NA         NA      NA       NA
RegionNC:Endcap   475.65     124.56   3.819  0.00014 ***
RegionPN:Endcap  -130.99     124.15  -1.055  0.29158
RegionRM:Endcap       NA         NA      NA       NA
RegionSO:Endcap       NA         NA      NA       NA
RegionSP:Endcap       NA         NA      NA       NA
RegionSW:Endcap       NA         NA      NA       NA
---
Residual standard error: 215.6 on 1366 degrees of freedom
Multiple R-squared: 0.7869, Adjusted R-squared: 0.784
F-statistic: 265.5 on 19 and 1366 DF, p-value: < 2.2e-16
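With the Region:Endcap interaction in the model, the endcap effect for a given region is the `Endcap` main effect plus that region's interaction coefficient. The sketch below performs that addition using coefficients copied from the Model B1 output; the variable names are hypothetical. One plausible reading of the `NA` rows, given that Table 6 shows endcap observations only in FL, MA, NC, PN, and RM, is that the main effect is effectively identified by the RM observations alone, so it should not be extrapolated to regions with no endcap data.

```r
# Coefficients copied from the Model B1 output (not refitted from gbData).
endcap_main      <- 1573.98
endcap_by_region <- c(FL = -1549.78, MA = -1676.08, NC = 475.65, PN = -130.99)

# Net endcap effect on Revenue = main effect + region-specific interaction.
net_effect <- endcap_main + endcap_by_region
print(round(net_effect, 2))
```

Under this reading, the estimated endcap lift is close to zero in FL, slightly negative in MA, and large in NC and PN.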
Model B2 (B1 + Store + Store:Endcap Interaction)
$\text{Revenue} = \beta_0 + \beta_1\,\text{Region} + \beta_2\,\text{Endcap} + \beta_3\,\text{Demo} + \beta_4\,\text{Demo1.3} + \beta_5\,\text{Demo4.5} + \beta_6\,(\text{Region} \times \text{Endcap}) + \beta_7\,\text{Store} + \beta_8\,(\text{Store} \times \text{Endcap}) + \beta_9\,\text{Sales.Rep}$
N.B.: Full regression output is omitted because of its size (more than 250 coefficients). Summary model statistics and ANOVA output are provided instead.
Residual standard error: 214.6 on 1245 degrees of freedom
Multiple R-squared: 0.8076, Adjusted R-squared: 0.786
F-statistic: 37.33 on 140 and 1245 DF, p-value: < 2.2e-16

Analysis of Variance Table
Response: Revenue
                Df   Sum Sq  Mean Sq   F value  Pr(>F)
Region          10 91338349  9133835  198.2805 < 2e-16 ***
Demo             1 21652374 21652374  470.0373 < 2e-16 ***
Demo1.3          1 19032981 19032981  413.1746 < 2e-16 ***
Demo4.5          1  7225538  7225538  156.8545 < 2e-16 ***
Endcap           1 60823921 60823921 1320.3869 < 2e-16 ***
Store          115 21645750   188224    4.0860 < 2e-16 ***
Region:Endcap    4 18230088  4557522   98.9363 < 2e-16 ***
Endcap:Store     7   803136   114734    2.4907 0.01528 *
Residuals     1245 57351206    46065
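As a reading aid for the ANOVA table, each F value is the term's mean square divided by the residual mean square. A worked check for the Endcap:Store row, using only the figures printed above (the small discrepancy in the last digit comes from the rounded residual mean square):

```latex
\[
F_{\text{Endcap:Store}}
  = \frac{\text{MS}_{\text{Endcap:Store}}}{\text{MS}_{\text{Residuals}}}
  = \frac{114734}{46065}
  \approx 2.49
\]
```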

Model B3 (B1 + Region:Demo Interaction)
$\text{Revenue} = \beta_0 + \beta_1\,\text{Region} + \beta_2\,\text{Endcap} + \beta_3\,\text{Demo} + \beta_4\,\text{Demo1.3} + \beta_5\,\text{Demo4.5} + \beta_6\,(\text{Region} \times \text{Endcap}) + \beta_7\,(\text{Region} \times \text{Demo}) + \beta_8\,\text{Sales.Rep}$
lm(formula = Revenue ~ Region + Demo + Demo1.3 + Demo4.5 + Endcap +
    Region * Endcap + Demo * Region + Sales.Rep, data = gbData)
Residuals:
    Min      1Q  Median      3Q     Max
-678.40 -148.33   -3.63  135.76  883.78

Coefficients: (10 not defined because of singularities)
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)       846.028     39.700  21.311  < 2e-16 ***
RegionFL          -74.648     46.650  -1.600 0.109791
RegionMA         -135.086     42.512  -3.178 0.001518 **
RegionMW            6.454     30.381   0.212 0.831791
RegionNA         -100.029     43.612  -2.294 0.021964 *
RegionNC          116.347     30.996   3.754 0.000182 ***
RegionPN           15.798     34.845   0.453 0.650342
RegionRM           61.191     33.800   1.810 0.070458 .
RegionSO         -116.859     46.705  -2.502 0.012463 *
RegionSP           94.353     32.130   2.937 0.003374 **
RegionSW          -99.219     46.705  -2.124 0.033817 *
Demo              441.396     91.736   4.812 1.66e-06 ***
Demo1.3           288.567     17.020  16.955  < 2e-16 ***
Demo4.5           310.970     22.785  13.648  < 2e-16 ***
Endcap           1573.435    110.209  14.277  < 2e-16 ***
Sales.Rep         132.113     46.895   2.817 0.004915 **
RegionFL:Endcap -1549.231    130.350 -11.885  < 2e-16 ***
RegionMA:Endcap -1670.111    142.159 -11.748  < 2e-16 ***
RegionMW:Endcap        NA         NA      NA       NA
RegionNA:Endcap        NA         NA      NA       NA
RegionNC:Endcap   476.420    124.825   3.817 0.000141 ***
RegionPN:Endcap  -159.159    126.302  -1.260 0.207834
RegionRM:Endcap        NA         NA      NA       NA
RegionSO:Endcap        NA         NA      NA       NA
RegionSP:Endcap        NA         NA      NA       NA
RegionSW:Endcap        NA         NA      NA       NA
RegionFL:Demo          NA         NA      NA       NA
RegionMA:Demo     -10.569    112.004  -0.094 0.924835
RegionMW:Demo      18.031    107.049   0.168 0.866268
RegionNA:Demo          NA         NA      NA       NA
RegionNC:Demo      19.998    117.956   0.170 0.865398
RegionPN:Demo     148.704    125.202   1.188 0.235153
RegionRM:Demo      42.249    108.858   0.388 0.697994
RegionSO:Demo          NA         NA      NA       NA
RegionSP:Demo      68.583    111.279   0.616 0.537788
RegionSW:Demo          NA         NA      NA       NA
---
Residual standard error: 215.9 on 1360 degrees of freedom
Multiple R-squared: 0.7874, Adjusted R-squared: 0.7835
F-statistic: 201.5 on 25 and 1360 DF, p-value: < 2.2e-16
Model B4 (B1 + Demo:Demo Interactions)
$\text{Revenue} = \beta_0 + \beta_1\,\text{Region} + \beta_2\,\text{Endcap} + \beta_3\,\text{Demo} + \beta_4\,\text{Demo1.3} + \beta_5\,\text{Demo4.5} + \beta_6\,(\text{Region} \times \text{Endcap}) + \beta_7\,(\text{Demo} \times \text{Demo1.3}) + \beta_8\,(\text{Demo1.3} \times \text{Demo4.5}) + \beta_9\,(\text{Demo} \times \text{Demo4.5}) + \beta_{10}\,\text{Sales.Rep}$
lm(formula = Revenue ~ Region + Demo + Demo1.3 + Demo4.5 + Endcap +
    Region * Endcap + Demo * Demo1.3 + Demo1.3 * Demo4.5 + Demo *
    Demo4.5 + Sales.Rep, data = gbData)
Residuals:
    Min      1Q  Median      3Q     Max
-679.03 -149.14   -5.51  135.62  882.60

Coefficients: (6 not defined because of singularities)
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)       846.479     39.669  21.339  < 2e-16 ***
RegionFL          -75.097     46.573  -1.612 0.107093
RegionMA         -135.224     42.291  -3.197 0.001418 **
RegionMW            5.895     29.105   0.203 0.839528
RegionNA         -100.480     43.562  -2.307 0.021227 *
RegionNC          116.688     29.969   3.894 0.000104 ***
RegionPN           18.954     33.980   0.558 0.577072
RegionRM           61.579     32.120   1.917 0.055427 .
RegionSO         -117.310     46.642  -2.515 0.012013 *
RegionSP          100.923     30.680   3.290 0.001029 **
RegionSW          -99.670     46.642  -2.137 0.032781 *
Demo              443.985     28.766  15.435  < 2e-16 ***
Demo1.3           282.093     18.337  15.383  < 2e-16 ***
Demo4.5           310.871     25.160  12.356  < 2e-16 ***
Endcap           1531.290    112.207  13.647  < 2e-16 ***
Sales.Rep         132.849     46.315   2.868 0.004189 **
RegionFL:Endcap -1507.089    131.910 -11.425  < 2e-16 ***
RegionMA:Endcap -1627.203    144.128 -11.290  < 2e-16 ***
RegionMW:Endcap        NA         NA      NA       NA
RegionNA:Endcap        NA         NA      NA       NA
RegionNC:Endcap   519.498    126.706   4.100 4.38e-05 ***
RegionPN:Endcap   -90.004    126.495  -0.712 0.476882
RegionRM:Endcap        NA         NA      NA       NA
RegionSO:Endcap        NA         NA      NA       NA
RegionSP:Endcap        NA         NA      NA       NA
RegionSW:Endcap        NA         NA      NA       NA
Demo:Demo1.3      127.918     63.508   2.014 0.044186 *
Demo1.3:Demo4.5   -58.548     60.823  -0.963 0.335919
Demo:Demo4.5      145.612    110.244   1.321 0.186785
---
Residual standard error: 215.3 on 1363 degrees of freedom
Multiple R-squared: 0.7881, Adjusted R-squared: 0.7847
F-statistic: 230.4 on 22 and 1363 DF, p-value: < 2.2e-16
Model B5 (B1 + Demo:Demo1.3 Interaction)
$\text{Revenue} = \beta_0 + \beta_1\,\text{Region} + \beta_2\,\text{Endcap} + \beta_3\,\text{Demo} + \beta_4\,\text{Demo1.3} + \beta_5\,\text{Demo4.5} + \beta_6\,(\text{Region} \times \text{Endcap}) + \beta_7\,(\text{Demo} \times \text{Demo1.3}) + \beta_8\,\text{Sales.Rep}$
lm(formula = Revenue ~ Region + Demo + Demo1.3 + Demo4.5 + Endcap +
    Region * Endcap + Demo * Demo1.3 + Sales.Rep, data = gbData)
Residuals:
    Min      1Q  Median      3Q     Max
-678.14 -148.91   -2.61  135.85  881.66

Coefficients: (6 not defined because of singularities)
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)       847.529     39.594  21.405  < 2e-16 ***
RegionFL          -76.052     46.525  -1.635 0.102352
RegionMA         -135.528     42.283  -3.205 0.001381 **
RegionMW            4.063     29.071   0.140 0.888878
RegionNA         -101.530     43.495  -2.334 0.019725 *
RegionNC          115.068     29.921   3.846 0.000126 ***
RegionPN           18.268     33.973   0.538 0.590857
RegionRM           62.904     32.052   1.963 0.049900 *
RegionSO         -118.360     46.579  -2.541 0.011162 *
RegionSP          100.991     30.656   3.294 0.001012 **
RegionSW         -100.720     46.579  -2.162 0.030765 *
Demo              448.034     28.507  15.716  < 2e-16 ***
Demo1.3           277.345     17.615  15.745  < 2e-16 ***
Demo4.5           307.204     22.686  13.541  < 2e-16 ***
Endcap           1549.236    110.293  14.047  < 2e-16 ***
Sales.Rep         132.734     46.314   2.866 0.004221 **
RegionFL:Endcap -1525.130    130.319 -11.703  < 2e-16 ***
RegionMA:Endcap -1642.973    142.147 -11.558  < 2e-16 ***
RegionMW:Endcap        NA         NA      NA       NA
RegionNA:Endcap        NA         NA      NA       NA
RegionNC:Endcap   503.161    124.928   4.028 5.94e-05 ***
RegionPN:Endcap  -108.683    124.331  -0.874 0.382195
RegionRM:Endcap        NA         NA      NA       NA
RegionSO:Endcap        NA         NA      NA       NA
RegionSP:Endcap        NA         NA      NA       NA
RegionSW:Endcap        NA         NA      NA       NA
Demo:Demo1.3      143.276     62.069   2.308 0.021128 *
---
Residual standard error: 215.3 on 1365 degrees of freedom
Multiple R-squared: 0.7878, Adjusted R-squared: 0.7847
F-statistic: 253.3 on 20 and 1365 DF, p-value: < 2.2e-16

studentized Breusch-Pagan test
data: test
BP = 35.2751, df = 26, p-value = 0.1058

Shapiro-Wilk normality test
data: test$residuals
W = 0.9969, p-value = 0.006932
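The two diagnostics reported for Model B5 test the residuals for constant variance (Breusch-Pagan) and normality (Shapiro-Wilk). The sketch below shows how such checks can be run in base R on simulated data; the data frame `sim` and the helper `studentized_bp` are hypothetical and are not the authors' code. The report's output header resembles what `bptest` from the `lmtest` package prints, but that is an assumption. The helper computes the studentized statistic by hand as n times the R-squared of a regression of squared residuals on the model's regressors, and it takes its degrees of freedom from the rank of the design matrix, which may differ from the convention behind the df values printed in the report when some coefficients are aliased.

```r
set.seed(42)
n <- 300
sim <- data.frame(x = runif(n, 0, 10))        # hypothetical predictor
sim$y <- 5 + 2 * sim$x + rnorm(n, sd = 1.5)   # hypothetical response
fit <- lm(y ~ x, data = sim)

# Studentized (Koenker) Breusch-Pagan statistic computed by hand:
# regress squared residuals on the design matrix, then use n * R^2.
studentized_bp <- function(model) {
  e2   <- residuals(model)^2
  X    <- model.matrix(model)
  aux  <- lm.fit(X, e2)
  r2   <- 1 - sum(aux$residuals^2) / sum((e2 - mean(e2))^2)
  stat <- length(e2) * r2
  df   <- aux$rank - 1                        # regressors excluding intercept
  c(BP = stat, df = df, p.value = pchisq(stat, df, lower.tail = FALSE))
}

bp <- studentized_bp(fit)
sw <- shapiro.test(residuals(fit))            # normality of residuals
print(bp)
print(sw)
```

A small p-value in the first test suggests heteroskedasticity; a small p-value in the second suggests non-normal residuals, which is the pattern Model B5 shows (p = 0.006932).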
Model B6 (B5 with $Y' = Y^{0.75}$ transform)
$\text{Revenue}^{0.75} = \beta_0 + \beta_1\,\text{Region} + \beta_2\,\text{Endcap} + \beta_3\,\text{Demo} + \beta_4\,\text{Demo1.3} + \beta_5\,\text{Demo4.5} + \beta_6\,(\text{Region} \times \text{Endcap}) + \beta_7\,(\text{Demo} \times \text{Demo1.3}) + \beta_8\,\text{Sales.Rep}$
lm(formula = Revenue^0.75 ~ Region + Demo + Demo1.3 + Demo4.5 +
    Endcap + Region * Endcap + Demo * Demo1.3 + Sales.Rep, data = gbData)
Residuals:
     Min       1Q   Median       3Q      Max
-101.968  -19.252    0.226   18.607  108.774

Coefficients: (6 not defined because of singularities)
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)      156.5346     5.2827  29.631  < 2e-16 ***
RegionFL         -10.9843     6.2074  -1.770 0.077028 .
RegionMA         -18.5830     5.6415  -3.294 0.001013 **
RegionMW           0.4917     3.8788   0.127 0.899139
RegionNA         -14.8584     5.8032  -2.560 0.010563 *
RegionNC          14.7704     3.9922   3.700 0.000224 ***
RegionPN           2.4013     4.5328   0.530 0.596361
RegionRM           8.0439     4.2764   1.881 0.060188 .
RegionSO         -17.1849     6.2147  -2.765 0.005766 **
RegionSP          12.9378     4.0901   3.163 0.001595 **
RegionSW         -14.8376     6.2147  -2.387 0.017099 *
Demo              56.5647     3.8035  14.872  < 2e-16 ***
Demo1.3           35.9593     2.3503  15.300  < 2e-16 ***
Demo4.5           38.8173     3.0268  12.824  < 2e-16 ***
Endcap           170.9696    14.7156  11.618  < 2e-16 ***
Sales.Rep         18.1382     6.1793   2.935 0.003388 **
RegionFL:Endcap -167.5761    17.3875  -9.638  < 2e-16 ***
RegionMA:Endcap -180.4750    18.9656  -9.516  < 2e-16 ***
RegionMW:Endcap        NA         NA      NA       NA
RegionNA:Endcap        NA         NA      NA       NA
RegionNC:Endcap   56.5342    16.6682   3.392 0.000714 ***
RegionPN:Endcap   -7.5310    16.5885  -0.454 0.649908
RegionRM:Endcap        NA         NA      NA       NA
RegionSO:Endcap        NA         NA      NA       NA
RegionSP:Endcap        NA         NA      NA       NA
RegionSW:Endcap        NA         NA      NA       NA
Demo:Demo1.3      10.3542     8.2814   1.250 0.211403
---
Residual standard error: 28.72 on 1365 degrees of freedom
Multiple R-squared: 0.7534, Adjusted R-squared: 0.7498
F-statistic: 208.5 on 20 and 1365 DF, p-value: < 2.2e-16

studentized Breusch-Pagan test
data: test
BP = 16.8634, df = 26, p-value = 0.9132

Shapiro-Wilk normality test
data: test$residuals
W = 0.9989, p-value = 0.5397
Model B7 (B6 - Demo:Demo1.3)
$\text{Revenue}^{0.75} = \beta_0 + \beta_1\,\text{Region} + \beta_2\,\text{Endcap} + \beta_3\,\text{Demo} + \beta_4\,\text{Demo1.3} + \beta_5\,\text{Demo4.5} + \beta_6\,(\text{Region} \times \text{Endcap}) + \beta_7\,\text{Sales.Rep}$
lm(formula = Revenue^0.75 ~ Region + Demo + Demo1.3 + Demo4.5 +
    Endcap + Region * Endcap + Sales.Rep, data = gbData)
Residuals:
     Min       1Q   Median       3Q      Max
-101.852  -19.374   -0.111   18.657  109.131

Coefficients: (6 not defined because of singularities)
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)      156.4244     5.2831  29.609  < 2e-16 ***
RegionFL         -10.8812     6.2082  -1.753 0.079875 .
RegionMA         -18.7376     5.6414  -3.321 0.000919 ***
RegionMW           0.7329     3.8748   0.189 0.850010
RegionNA         -14.7481     5.8037  -2.541 0.011159 *
RegionNC          14.9972     3.9889   3.760 0.000177 ***
RegionPN           2.7203     4.5266   0.601 0.547957
RegionRM           8.1731     4.2761   1.911 0.056167 .
RegionSO         -17.0746     6.2154  -2.747 0.006090 **
RegionSP          12.8824     4.0907   3.149 0.001673 **
RegionSW         -14.7274     6.2154  -2.370 0.017950 *
Demo              58.7430     3.3818  17.370  < 2e-16 ***
Demo1.3           36.7875     2.2554  16.311  < 2e-16 ***
Demo4.5           39.0925     3.0194  12.947  < 2e-16 ***
Endcap           172.7580    14.6489  11.793  < 2e-16 ***
Sales.Rep         17.8919     6.1775   2.896 0.003836 **
RegionFL:Endcap -169.3573    17.3326  -9.771  < 2e-16 ***
RegionMA:Endcap -182.8674    18.8727  -9.690  < 2e-16 ***
RegionMW:Endcap        NA         NA      NA       NA
RegionNA:Endcap        NA         NA      NA       NA
RegionNC:Endcap   54.5464    16.5956   3.287 0.001039 **
RegionPN:Endcap   -9.1429    16.5418  -0.553 0.580549
RegionRM:Endcap        NA         NA      NA       NA
RegionSO:Endcap        NA         NA      NA       NA
RegionSP:Endcap        NA         NA      NA       NA
RegionSW:Endcap        NA         NA      NA       NA
---
Residual standard error: 28.73 on 1366 degrees of freedom
Multiple R-squared: 0.7531, Adjusted R-squared: 0.7497
F-statistic: 219.3 on 19 and 1366 DF, p-value: < 2.2e-16

studentized Breusch-Pagan test
data: test
BP = 17.1345, df = 25, p-value = 0.8769

Shapiro-Wilk normality test
data: test$residuals
W = 0.9988, p-value = 0.4771
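Because Models B6 and B7 are fitted to Revenue raised to the power 0.75, their coefficients are not in dollars. A fitted value can be returned to the revenue scale by raising it to the power 4/3. The sketch below does this with two coefficients copied from the Model B7 output; the function name `back_transform` is hypothetical, and the baseline is assumed to be the reference region (the one with no `Region` coefficient) with every marketing indicator and `Sales.Rep` set to zero. A back-transformed fitted value is only an approximation to the expected revenue, since the transform is nonlinear.

```r
# Undo the Y' = Y^0.75 transform: Y = Y'^(1 / 0.75) = Y'^(4 / 3).
back_transform <- function(y75) y75^(4 / 3)

# Coefficients copied from the Model B7 output (transformed scale).
intercept <- 156.4244
demo      <- 58.7430

baseline  <- back_transform(intercept)         # reference region, no marketing
with_demo <- back_transform(intercept + demo)  # same setting with Demo = 1
print(round(c(baseline = baseline, with_demo = with_demo,
              difference = with_demo - baseline), 1))
```

The effect of a predictor on the revenue scale depends on the starting level, so the difference printed here applies only to this baseline.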

Model C1
$\text{Revenue} = \beta_0 + \beta_1\,\text{Sales.Rep} + \beta_2\,\text{Endcap} + \beta_3\,\text{Demo} + \beta_4\,\text{Demo1.3} + \beta_5\,\text{Demo4.5} + \beta_6\,(\text{Sales.Rep} \times \text{Endcap})$
lm(formula = Revenue ~ Demo + Demo1.3 + Demo4.5 + Endcap + Sales.Rep *
    Endcap + Sales.Rep, data = gbData)
Residuals:
    Min      1Q  Median      3Q     Max
-940.18 -152.37   -9.58  143.23  827.31

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)       740.043      9.229  80.190   <2e-16 ***
Demo              453.726     26.151  17.350   <2e-16 ***
Demo1.3           275.410     17.199  16.013   <2e-16 ***
Demo4.5           331.758     23.104  14.359   <2e-16 ***
Endcap             -9.924     55.406  -0.179    0.858
Sales.Rep         294.568     12.993  22.672   <2e-16 ***
Endcap:Sales.Rep 1736.923     67.450  25.751   <2e-16 ***
---
Residual standard error: 225 on 1379 degrees of freedom
Multiple R-squared: 0.7659, Adjusted R-squared: 0.7649
F-statistic: 751.9 on 6 and 1379 DF, p-value: < 2.2e-16

studentized Breusch-Pagan test
data: test
BP = 116.0507, df = 6, p-value < 2.2e-16

Shapiro-Wilk normality test
data: test$residuals
W = 0.9962, p-value = 0.001782
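In Model C1 the estimated revenue change from an endcap depends on whether the store has a sales representative. A worked reading of the coefficients printed above, holding the demo variables fixed:

```latex
\[
\Delta\text{Revenue}_{\text{Endcap}} \mid (\text{Sales.Rep} = 0)
  = \beta_2 = -9.924
\]
\[
\Delta\text{Revenue}_{\text{Endcap}} \mid (\text{Sales.Rep} = 1)
  = \beta_2 + \beta_6 = -9.924 + 1736.923 = 1726.999
\]
```

The first quantity is not statistically distinguishable from zero (p = 0.858), so under this model the endcap lift appears only where a sales representative is present.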
Model C2
$\text{Units.Sold} = \beta_0 + \beta_1\,\text{Sales.Rep} + \beta_2\,\text{Average.Retail.Price} + \beta_3\,\text{Endcap} + \beta_4\,\text{Demo} + \beta_5\,\text{Demo1.3} + \beta_6\,\text{Demo4.5} + \beta_7\,(\text{Sales.Rep} \times \text{Endcap})$
lm(formula = Units.Sold ~ Demo + Demo1.3 + Demo4.5 + Endcap +
    Region + Sales.Rep * Endcap + Average.Retail.Price + Sales.Rep,
    data = gbData)
Residuals:
     Min       1Q   Median       3Q      Max
-171.010  -33.577    0.677   33.349  179.660

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)          290.4688    18.4468  15.746  < 2e-16 ***
Demo                 106.4176     5.7614  18.471  < 2e-16 ***
Demo1.3               72.6323     3.8370  18.929  < 2e-16 ***
Demo4.5               72.5876     5.1404  14.121  < 2e-16 ***
Endcap                -1.1611    12.4999  -0.093   0.9260
RegionFL             -13.2980    10.5617  -1.259   0.2082
RegionMA             -12.6337     9.9816  -1.266   0.2058
RegionMW               7.6899     6.6410   1.158   0.2471
RegionNA             -19.2996     9.9375  -1.942   0.0523 .
RegionNC               7.1791     6.9733   1.030   0.3034
RegionPN               0.2981     7.5342   0.040   0.9684
RegionRM               1.7499     7.3517   0.238   0.8119
RegionSO             -15.7381    10.7988  -1.457   0.1452
RegionSP               7.9846     7.0766   1.128   0.2594
RegionSW             -20.8752    10.6241  -1.965   0.0496 *
Sales.Rep             39.4663    10.5825   3.729   0.0002 ***
Average.Retail.Price -21.7311     3.7075  -5.861 5.74e-09 ***
Endcap:Sales.Rep     457.4982    15.3309  29.842  < 2e-16 ***
---
Residual standard error: 49.05 on 1368 degrees of freedom
Multiple R-squared: 0.8071, Adjusted R-squared: 0.8047
F-statistic: 336.7 on 17 and 1368 DF, p-value: < 2.2e-16

studentized Breusch-Pagan test
data: yz
BP = 20.3246, df = 17, p-value = 0.258

Shapiro-Wilk normality test
data: yz$residuals
W = 0.9991, p-value = 0.7279

6.3 Region & Sales.Rep Interaction Effects


Figure 9: Model C2: Residual Distribution

Figure 10: Model C2: Residuals vs Fitted values


Figure 11: Model C2: Residuals vs Average Retail Price

Figure 12: Impact of Demo on Revenue (given Endcap = 0)

6.4 Summary Tables
Table 5: Summary of Units Sold and Price By Region

                Units.Sold        Average.Retail.Price
Region     n     mean      sd       mean      sd
FL        88   188.49   48.31       4.15    0.24
MA       209   222.40   62.41       3.61    0.25
MW       176   282.52   71.67       3.94    0.37
NA       143   181.32   50.97       4.13    0.36
NC       165   312.36  145.47       4.53    0.31
NE       110   254.37   67.46       4.16    0.50
PN        99   343.80  207.35       4.05    0.44
RM       110   302.39  126.00       4.40    0.53
SO        77   192.16   49.01       3.80    0.28
SP       132   285.79   63.37       4.40    0.33
SW        77   178.36   50.39       4.20    0.32
All     1386   253.82  111.00       4.11    0.46

Table 6: Breakdown of Observations When Endcap == 1

Demo            0    0    0    0    1    1    1    1
Demo1.3         0    0    1    1    0    0    1    1
Demo4.5         0    1    0    1    0    1    0    1
Region    n  (number of observations in each combination)
FL       11    11    0    0    0    0    0    0    0
MA        6     0    2    3    0    1    0    0    0
NC       15    10    1    3    0    1    0    0    0
PN       17     5    0    7    0    3    0    2    0
RM        4     2    0    1    0    0    0    0    1
All      53    28    3   14    0    5    0    2    1


Table 7: Breakdown of Observations for Endcap == 0

Demo              0    0    0    0    1    1    1    1
Demo1.3           0    0    1    1    0    0    1    1
Demo4.5           0    1    0    1    0    1    0    1
Region     n  (number of observations in each combination)
FL        77     75    2    0    0    0    0    0    0
MA       203    148   12   32    0   11    0    0    0
MW       176    119    6   30    4   12    0    5    0
NA       143    143    0    0    0    0    0    0    0
NC       150     96   17   23    6    5    0    2    1
NE       110     74   11   19    0    6    0    0    0
PN        82     69    4    6    0    2    0    1    0
RM       106     53    7   27    4    9    2    3    1
SO        77     77    0    0    0    0    0    0    0
SP       132     61   23   34    1   12    0    1    0
SW        77     77    0    0    0    0    0    0    0
All     1333    992   82  171   15   57    2   12    2
