
Advanced Marketing Analytics

Individual Assignment

Submitted By:
Shreya Kumar
2018PGP355
Section : A

Q1. Use regression data. This dataset describes the sales of different cereals and the amount of calories, protein, fat, etc. in each cereal. It also records where each cereal is placed (variable shelf), the advertising amount spent on it, its weight, and the cups available. (Total points=40)
a. Estimate and interpret a regression model with sales as DV and shelf, calories, protein, fat, sodium, fiber, carbo, sugars, potass, vitamins, weight, cups, and adv as IVs (consider 0.05 significance level). Report the significance and the performance of the model.
b. How is “fat” related to sales? (consider 0.1 significance level).
c. Is there any difference in sales if a product is kept on a different shelf?
d. Please provide the R code for (a), (b), and (c).

1.(a) To estimate a regression model with sales as the DV, the following program needs to be run.

Required Program:

# Load the cereal data
data = read.csv("C:/Users/Shreya/Documents/R/AMA/Assignment (regression data) Q1.csv", header = TRUE)
# Multiple regression of sales on all 13 predictors
cereals = lm(sales ~ shelf + calories + protein + fat + sodium + fiber + carbo + sugars + potass + vitamins + weight + cups + adv, data = data)
summary(cereals)

Results from R:

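The fitted relation between sales and the significant predictors (protein, calories, fat, carbo and sugars), showing only their coefficients, is:

$$\widehat{\text{sales}} = 101834.2\,\text{protein} - 12187.0\,\text{calories} + 149428.6\,\text{fat} + 84887.3\,\text{carbo} + 84335.0\,\text{sugars}$$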

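At the 0.05 significance level, the coefficients of sugars, fat, calories, protein and carbo are significant (p < 0.05), so these predictors are significantly related to sales, and on this basis the model is treated as significant. (The significance of the model as a whole is formally tested by the overall F-statistic and its p value reported at the bottom of summary(cereals).)
The performance of the model is low: the adjusted R² is 0.06082, so only about 6.082% of the variation in sales is accounted for by this model.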

Hence, the model is significant but does not forecast cereal sales well.
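As a sketch of how the significance and performance figures above can be pulled from a fitted model in code, the helper below (the name `report_lm` is hypothetical, not part of any package) returns the coefficients significant at a chosen level, the p value of the overall F-test, and the adjusted R², and recomputes the adjusted R² from its textbook formula as a check. The built-in `mtcars` data stands in for the cereal data; with the assignment's model the call would be `report_lm(cereals)`.

```r
# Hypothetical helper: summarise significance and fit of an lm() model.
report_lm <- function(fit, alpha = 0.05) {
  s <- summary(fit)
  coefs <- s$coefficients
  # Overall F-test: are all slope coefficients jointly zero?
  f <- s$fstatistic
  f_p <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)
  # Adjusted R^2 by hand: 1 - (1 - R^2) * (n - 1) / (n - k - 1)
  n <- length(residuals(fit))
  k <- f[["numdf"]]
  adj_manual <- 1 - (1 - s$r.squared) * (n - 1) / (n - k - 1)
  list(
    significant   = coefs[coefs[, "Pr(>|t|)"] < alpha, , drop = FALSE],
    f_p_value     = f_p,
    adj_r2        = s$adj.r.squared,
    adj_r2_manual = adj_manual
  )
}

# mtcars stands in for the cereal data
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
res <- report_lm(fit)
res$significant  # coefficient rows with p < 0.05
res$f_p_value    # p value of the overall F-test (model significance)
res$adj_r2       # adjusted R^2 reported by summary() (model performance)
```

A model can have individually significant coefficients and still a low adjusted R², which is exactly the situation described for the cereal model.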

1.(b) To estimate the relation between fat and sales, the following program needs to be run.

Required Program:

# Keep only the two variables of interest
var = c("sales", "fat")
data1 = data[var]
View(data1)
names(data1)
library(dplyr)
library(Hmisc)
# Pearson correlation matrix with p values
corr = rcorr(as.matrix(data1))
corr

Results from R code:

The correlation between sales and fat is low (r = 0.13), and it is not significant because its p value, 0.2779, is greater than 0.10. This weak correlation indicates that, taken on its own, fat shows little linear association with sales.
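For a single pair of variables, base R's `cor.test()` should report the same Pearson correlation and two-sided p value as `Hmisc::rcorr`, so it can serve as a quick cross-check of the 0.10 decision rule used above. The sketch below uses the built-in `mtcars` data as a stand-in; for the assignment the call would be `cor.test(data$fat, data$sales)`.

```r
# Pearson correlation test between two numeric vectors (mtcars as a stand-in)
ct <- cor.test(mtcars$mpg, mtcars$drat, method = "pearson")

r     <- unname(ct$estimate)  # correlation coefficient
p     <- ct$p.value           # two-sided p value (H0: true correlation = 0)
alpha <- 0.10                 # significance level used in question 1.(b)

cat(sprintf("r = %.3f, p = %.4f, significant at %.2f: %s\n",
            r, p, alpha, p < alpha))
```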
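
In the multiple regression model from 1.(a), however, the coefficient of fat (149428.6) has a p value of 0.01618, which is significant at the 0.10 level: holding the other predictors constant, fat has a positive relationship with sales. The simple correlation ignores the other variables, whereas the regression coefficient measures the partial effect of fat, so the two results can differ. The R² of the regression model remains low because the predictors explain little of the variation in sales.
Hence, fat has no significant simple correlation with sales at the 0.10 level, although its partial effect in the multiple regression is significant and positive.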
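Why a weak simple correlation can coexist with a significant regression coefficient: by the Frisch-Waugh-Lovell theorem, a coefficient in a multiple regression equals the slope obtained after removing the influence of the other predictors from both the outcome and that predictor. The sketch below checks this on the built-in `mtcars` data, where `wt` plays the role of fat and `hp` stands in for the other predictors; it illustrates the idea and does not reproduce the cereal results.

```r
# Full model: effect of wt on mpg, holding hp fixed
full <- lm(mpg ~ wt + hp, data = mtcars)

# Residualise both mpg and wt on the other predictor (hp)
y_res <- resid(lm(mpg ~ hp, data = mtcars))
x_res <- resid(lm(wt ~ hp, data = mtcars))

# Slope of the residual-on-residual regression equals the partial coefficient
partial_slope <- unname(coef(lm(y_res ~ x_res))[["x_res"]])
full_slope    <- unname(coef(full)[["wt"]])

# The simple correlation, by contrast, ignores hp entirely
simple_r <- cor(mtcars$mpg, mtcars$wt)

c(full = full_slope, residual = partial_slope, simple_r = simple_r)
```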

1.(c) To test whether sales differ depending on the shelf on which a product is kept, the following program needs to be run.

Required Program/ R code:

cereals2 = lm(sales~shelf,data=data)
summary(cereals2)

Results from R:

Since the p value of shelf is not significant (even though the intercept is), there is no significant difference in sales between the low, middle and high shelves.
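If shelf is stored as a number (for example 1, 2, 3), `lm(sales ~ shelf)` fits a single straight-line trend across shelves rather than comparing each shelf separately. A sketch of the alternative, assuming that numeric coding, treats shelf as a factor and runs a one-way ANOVA; `mtcars$cyl` (three levels) stands in for shelf, and for the assignment the model would be `lm(sales ~ factor(shelf), data = data)`.

```r
# Treat the grouping variable as categorical (one dummy per non-baseline level)
fit_factor <- lm(mpg ~ factor(cyl), data = mtcars)

# One-way ANOVA F-test: H0 = all group means are equal
tab     <- anova(fit_factor)
p_group <- tab[["Pr(>F)"]][1]

# Group means, to see which level differs
group_means <- tapply(mtcars$mpg, mtcars$cyl, mean)

p_group
group_means
```

With a factor, a significant F-test says that at least one shelf's mean sales differ, without assuming the effect grows evenly from shelf to shelf.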

1.(d) R code for the above solutions

data = read.csv("C:/Users/Shreya/Documents/R/AMA/Assignment (regression data) Q1.csv", header = TRUE)
# 1.(a) Multiple regression of sales on all predictors
cereals = lm(sales ~ shelf + calories + protein + fat + sodium + fiber + carbo + sugars + potass + vitamins + weight + cups + adv, data = data)
summary(cereals)
# 1.(b) Correlation between sales and fat
var = c("sales", "fat")
data1 = data[var]
View(data1)
names(data1)
library(dplyr)
library(Hmisc)
corr = rcorr(as.matrix(data1))
corr
# 1.(c) Effect of shelf on sales
cereals2 = lm(sales ~ shelf, data = data)
summary(cereals2)

Q2. Use logit data. (Total points=20)


a. Estimate and interpret a logit model (Model a) where dv=coke_selection and rest are IVs.

b. Estimate a logit model (Model b) with dv=coke_selection and IVs=gender, occupation, country_of_origin, price, distribution, and adv_ratio. Compare the performance of Model a and Model b.

c. Please provide the R code.

2.(a) To estimate and interpret Model a, where dv = coke_selection and the rest are IVs, the following code needs to be run.

Required Code:

# Load the logit data
data = read.csv("C:/Users/Shreya/Documents/R/AMA/Assignment (logit data) Q2.csv", header = TRUE)
View(data)
library(aod)
names(data)
library(ggplot2)
# Model a: binary logistic regression of coke selection on all other variables
logit = glm(coke.selection ~ gender + occupation + country_of.origin + price + distribution + adv_ratio + satisfaction_avg + competition + storevisit_perweek + health.conciousness, data = data, family = "binomial")
summary(logit)

Results from R:

Required Code:

exp(coef(logit))
Results from R:

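The Model a is

$$\ln\left(\frac{p}{1-p}\right) = 0.7125 + 2.9010\,\text{gender} + 0.8143\,\text{occupation} - 2.6236\,\text{country\_of.origin} - 0.1598\,\text{price} + 0.4770\,\text{distribution} - 0.3347\,\text{adv\_ratio} - 0.6086\,\text{satisfaction\_avg} + 0.2463\,\text{competition} - 0.4684\,\text{storevisit\_perweek} - 0.3605\,\text{health.conciousness}$$

where $p$ is the probability of coke selection.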

Gender, occupation and country_of.origin have significant p values.
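Since the coefficients of the logit model are on the log-odds scale, exponentiating them gives odds ratios, which is what `exp(coef(logit))` prints. For the three significant predictors of Model a the arithmetic is as follows; the interpretation assumes each predictor is numerically coded, so that a one-unit increase is meaningful (for a 0/1 dummy it is the switch from 0 to 1).

```latex
\begin{aligned}
e^{2.9010}  &\approx 18.19  &&\text{gender}\\
e^{0.8143}  &\approx 2.26   &&\text{occupation}\\
e^{-2.6236} &\approx 0.0725 &&\text{country\_of.origin}
\end{aligned}
```

For example, holding the other variables fixed, a one-unit increase in gender multiplies the odds of selecting Coke by about 18, while a one-unit increase in country_of.origin multiplies them by about 0.07, i.e. lowers them by roughly 93%.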


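Exponentiating the right-hand side of the formula gives the odds of coke selection, p/(1 - p); the probability itself is odds/(1 + odds). The exponentiated coefficients from exp(coef(logit)) are odds ratios.
The log odds of coke selection depend significantly on country of origin, gender and occupation.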
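A minimal sketch of turning a logit model's linear predictor into odds and probabilities, using the built-in `mtcars` data with the 0/1 variable `am` as a stand-in for coke_selection. With the assignment's model, `predict(logit, type = "response")` would give the fitted probabilities directly.

```r
# Binary logit model (mtcars$am is 0/1, standing in for coke_selection)
fit <- glm(am ~ wt, data = mtcars, family = "binomial")

log_odds <- predict(fit, type = "link")  # linear predictor: b0 + b1 * wt
odds     <- exp(log_odds)                # odds = p / (1 - p)
prob     <- odds / (1 + odds)            # p = odds / (1 + odds)

# R's own fitted probabilities, for comparison
prob_r <- predict(fit, type = "response")

# Odds ratios: multiplicative change in the odds per one-unit increase
odds_ratios <- exp(coef(fit))
odds_ratios
```

The manual conversion and `type = "response"` agree; `plogis(log_odds)` is a built-in shortcut for the same inverse-logit step.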

2.(b) To estimate a logit model (Model b) with dv = coke_selection and IVs = gender, occupation, country_of_origin, price, distribution and adv_ratio, and to compare the performance of Model a and Model b, the following code needs to be run.

Required R Code:

# Model b: reduced logit model with six predictors
logit2 = glm(coke.selection ~ gender + occupation + country_of.origin + price + distribution + adv_ratio, data = data, family = "binomial")
summary(logit2)

Results from R:

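The Model b is

$$\ln\left(\frac{p}{1-p}\right) = 0.1272 + 2.8555\,\text{gender} + 0.8065\,\text{occupation} - 2.6284\,\text{country\_of.origin} - 0.1793\,\text{price} + 0.5174\,\text{distribution} - 0.2997\,\text{adv\_ratio}$$

where $p$ is the probability of coke selection.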

Required Code for Comparison of Model a & Model b:

# Likelihood-ratio (chi-square) test of Model b against Model a
anova(logit2, logit, test = "Chisq")

Results from R:

The AIC of Model a is 592.67 and the AIC of Model b is 590.32. AIC measures goodness of fit with a penalty for the number of estimated parameters, and a lower value is better. Model b reaches a slightly lower AIC with four fewer predictors, so Model b performs better.
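AIC is defined as -2 log L + 2k, where k is the number of estimated coefficients, so it rewards fit but penalises extra predictors. For nested models such as Model b inside Model a, `anova(..., test = "Chisq")` performs a likelihood-ratio test on the drop in deviance. The sketch below verifies both on the built-in `mtcars` data, with `am` standing in for coke_selection and a one-predictor model nested in a two-predictor model; it illustrates the mechanics only.

```r
# Nested binary logit models (mtcars$am stands in for coke_selection)
reduced <- glm(am ~ wt,      data = mtcars, family = "binomial")
full    <- glm(am ~ wt + hp, data = mtcars, family = "binomial")

# AIC by hand: -2 * log-likelihood + 2 * (number of estimated coefficients)
ll <- logLik(full)
aic_manual <- -2 * as.numeric(ll) + 2 * attr(ll, "df")

# Likelihood-ratio test: drop in deviance ~ chi-square, df = extra terms
lrt      <- anova(reduced, full, test = "Chisq")
dev_drop <- deviance(reduced) - deviance(full)
df_extra <- df.residual(reduced) - df.residual(full)
p_manual <- pchisq(dev_drop, df = df_extra, lower.tail = FALSE)

# Lower AIC is preferred
c(reduced = AIC(reduced), full = AIC(full))
```

AIC can compare any models fitted to the same data, while the chi-square test applies only when one model is nested in the other, as Model b is in Model a.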

2.(c) R code for the above solutions

data = read.csv("C:/Users/Shreya/Documents/R/AMA/Assignment (logit data) Q2.csv", header = TRUE)
View(data)
library(aod)
names(data)
library(ggplot2)
# 2.(a) Model a: all other variables as IVs
logit = glm(coke.selection ~ gender + occupation + country_of.origin + price + distribution + adv_ratio + satisfaction_avg + competition + storevisit_perweek + health.conciousness, data = data, family = "binomial")
summary(logit)
# Odds ratios for Model a
exp(coef(logit))
# 2.(b) Model b: reduced set of IVs
logit2 = glm(coke.selection ~ gender + occupation + country_of.origin + price + distribution + adv_ratio, data = data, family = "binomial")
summary(logit2)
# Compare Model b with Model a (likelihood-ratio test)
anova(logit2, logit, test = "Chisq")
