Individual Assignment
Submitted By:
Shreya Kumar
2018PGP355
Section : A
1|Page
1.(a)
Use regression data. This dataset describes the sales of different cereals and the amount of calories,
protein, fat, etc. in each cereal. It also shows where each cereal is located (variable shelf) and the
amount spent on advertising it, as well as each cereal's weight and cups. (Total points=40)
a. Estimate and interpret a regression model with sales as DV and shelf, calories, protein, fat, sodium,
fiber, carbo, sugars, potass, vitamins, weight, cups, and adv as IVs (consider 0.05 significance level).
Report the significance and the performance of the model.
b. How is “fat” related to sales? (consider 0.1 significance level).
c. Is there any difference in sales if a product is kept on a different shelf?
d. Please provide the R code for (a), (b), and (c).
To estimate a regression model with sales as the DV, the following program needs to be run.
Required Program:
Results from R:
The relation between cereal sales and sugars, fat, calories, protein and carbo:
The model is significant because, at the 0.05 significance level (95% confidence), the coefficients of sugars,
fat, calories, protein and carbo have significant p-values.
Performance of the model is low: the adjusted R² is 0.06082, so only 6.082% of the variation in sales is
accounted for by this model.
Hence, the model is significant but does not perform well in forecasting cereal sales.
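As a minimal sketch of the 1.(a) regression (not the original program, whose listing is not reproduced here), the code below fits sales on all thirteen IVs and extracts the quantities used in the interpretation: the coefficients significant at the 0.05 level, the overall F-test p-value, and the adjusted R². The data frame `cereal` is a simulated placeholder assumed to use the column names listed in the question; it is not the assignment data, so its printed numbers will not match the results reported above. The overall significance of a regression is usually judged by the F-test that all slopes are zero, which complements the coefficient-level p-values.

```r
# Minimal sketch for 1(a): multiple regression of sales on the 13 IVs.
# ASSUMPTION: `cereal` is simulated with hypothetical values purely so the
# code runs; replace it with the real data frame.
set.seed(1)
n <- 100
cereal <- data.frame(
  shelf = sample(1:3, n, replace = TRUE),
  calories = rnorm(n, 100, 20), protein = rnorm(n, 2.5, 1),
  fat = rnorm(n, 1, 0.5), sodium = rnorm(n, 160, 80),
  fiber = rnorm(n, 2, 1), carbo = rnorm(n, 14, 4),
  sugars = rnorm(n, 7, 4), potass = rnorm(n, 95, 70),
  vitamins = sample(c(0, 25, 100), n, replace = TRUE),
  weight = rnorm(n, 1, 0.1), cups = rnorm(n, 0.8, 0.2),
  adv = rnorm(n, 50, 10)
)
cereal$sales <- 200 + 0.5 * cereal$calories + rnorm(n, 0, 30)

fit <- lm(sales ~ shelf + calories + protein + fat + sodium + fiber + carbo +
            sugars + potass + vitamins + weight + cups + adv, data = cereal)
s <- summary(fit)

# IVs whose coefficients are significant at the 0.05 level (intercept excluded)
coefs <- s$coefficients
sig_vars <- setdiff(rownames(coefs)[coefs[, "Pr(>|t|)"] < 0.05], "(Intercept)")

# Overall model significance: F-test of H0 "all slopes are zero"
f <- s$fstatistic
model_p <- pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)

# Performance: share of the variance in sales explained, adjusted for the number of IVs
adj_r2 <- s$adj.r.squared

print(sig_vars)
cat("F-test p-value:", model_p, " adjusted R^2:", adj_r2, "\n")
```

A low adjusted R² alongside a small F-test p-value is exactly the "significant but weak" pattern described above.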
1.(b) To estimate the relation between fat and sales, the following program needs to be run.
Required Program:
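# Keep only the two variables of interest
var = c("sales", "fat")
data1 = data[var]
View(data1)   # opens the data viewer (RStudio)
names(data1)
# rcorr() from Hmisc returns Pearson correlations (r), pair counts (n) and p-values (P)
library(Hmisc)
corr = rcorr(as.matrix(data1))
corr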
The correlation between sales and fat is low (only 0.13) and insignificant, because its p-value of 0.2779
is greater than 0.10. This low correlation signifies that sales change little when fat changes.
In the regression model of 1.(a), however, the coefficient of fat has a p-value of 0.01618, so it is
significant once the other IVs are held constant. Because the individual variables are only weakly
correlated with sales, the R² of the regression model is very low.
The large p-value of the correlation (0.2779) indicates that there is no significant correlation between fat
and sales on their own.
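A minimal sketch of the two checks used in 1.(b), on simulated data (the data frame `dat` and the single control variable `calories` are assumptions, not the cereal data): base R's `cor.test()` gives the bivariate Pearson correlation and its p-value, which should match what `Hmisc::rcorr()` reports for the same pair, while `lm()` gives the p-value of fat with another IV held constant. Both are compared against the 0.1 level.

```r
# Minimal sketch for 1(b): bivariate correlation vs. partial effect of fat.
# ASSUMPTION: simulated data; only sales, fat and one control (calories) are modelled.
set.seed(2)
n <- 100
calories <- rnorm(n, 100, 20)
fat <- 0.02 * calories + rnorm(n, 0, 0.5)
sales <- 300 + 2 * calories - 40 * fat + rnorm(n, 0, 20)
dat <- data.frame(sales, fat, calories)

alpha <- 0.10  # significance level from question 1(b)

# Pearson correlation test (t-test on r with n - 2 degrees of freedom)
ct <- cor.test(dat$sales, dat$fat, method = "pearson")
corr_significant <- ct$p.value < alpha

# Partial effect of fat, holding calories constant
fit <- lm(sales ~ fat + calories, data = dat)
fat_p <- summary(fit)$coefficients["fat", "Pr(>|t|)"]
coef_significant <- fat_p < alpha

cat("r =", round(ct$estimate, 3), " p =", round(ct$p.value, 4), "\n")
cat("fat coefficient p =", signif(fat_p, 3), "\n")
```

A weak bivariate correlation can coexist with a significant partial coefficient, because the regression coefficient measures the effect of fat after the other IVs are accounted for.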
1.(c) To find whether sales differ when a product is kept on different shelves, the following program needs
to be run.
Required Program/ R code:
cereals2 = lm(sales~shelf,data=data)
summary(cereals2)
Results from R:
Sales do not differ significantly between the low, middle and high shelves: the p-value of shelf is
insignificant, even though the intercept is significant.
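If `shelf` is stored as numeric codes (for example 1 = low, 2 = middle, 3 = high, an assumed coding), `lm(sales~shelf, data=data)` fits one linear slope across the codes rather than comparing the shelves directly. A minimal sketch of the alternative, on simulated data with a hypothetical data frame `dat`, converts shelf to a factor and reads the one-way ANOVA F-test, whose null hypothesis is equal mean sales on every shelf.

```r
# Minimal sketch for 1(c): compare mean sales across shelves.
# ASSUMPTION: shelf coded 1 = low, 2 = middle, 3 = high (hypothetical coding);
# data are simulated so the code runs on its own.
set.seed(3)
n <- 90
dat <- data.frame(shelf = rep(1:3, each = n / 3))
dat$sales <- 250 + rnorm(n, 0, 40)   # no true shelf effect in this simulation

# As a factor, shelf gets a separate mean per level instead of a single slope
dat$shelf_f <- factor(dat$shelf, levels = 1:3,
                      labels = c("low", "middle", "high"))
fit <- lm(sales ~ shelf_f, data = dat)

# One-way ANOVA F-test: H0 = mean sales are equal on all three shelves
tab <- anova(fit)
shelf_p <- tab["shelf_f", "Pr(>F)"]

print(tapply(dat$sales, dat$shelf_f, mean))
cat("shelf F-test p-value:", signif(shelf_p, 3), "\n")
```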
b. Estimate a logit model (Model b) with dv=coke_selection and IVs=gender, occupation, country_of_origin,
price, distribution, and adv_ratio. Compare the performance of Model a and Model b.
2.(a) To estimate and interpret Model a, where dv = coke_selection and the rest of the variables are IVs, the
following code needs to be run.
Required Code:
Results from R:
Required Code:
exp(coef(logit))
Results from R:
The Model a is
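A minimal sketch of the Model a fit and the `exp(coef(logit))` step, on simulated data. The data frame `coke`, its columns, and the extra IV `age` are hypothetical stand-ins for "the rest" of the variables; the formula `coke_selection ~ .` uses every column except the DV as an IV. Exponentiating logit coefficients turns them into odds ratios.

```r
# Minimal sketch for 2(a): logit model with every other column as an IV.
# ASSUMPTION: `coke` and its columns are hypothetical, simulated stand-ins;
# `age` is an invented extra IV.
set.seed(4)
n <- 400
coke <- data.frame(
  gender = factor(sample(c("F", "M"), n, replace = TRUE)),
  occupation = factor(sample(c("student", "employed", "other"), n, replace = TRUE)),
  country_of_origin = factor(sample(c("domestic", "foreign"), n, replace = TRUE)),
  price = runif(n, 1, 3),
  distribution = runif(n, 0, 1),
  adv_ratio = runif(n, 0, 1),
  age = rnorm(n, 30, 8)
)
eta <- -1 + 1.5 * coke$distribution - 0.5 * coke$price + 0.02 * coke$age
coke$coke_selection <- rbinom(n, 1, plogis(eta))

# `coke_selection ~ .` uses all remaining columns as IVs
logit <- glm(coke_selection ~ ., family = binomial(link = "logit"), data = coke)
summary(logit)

# Odds ratios: multiplicative change in the odds of selecting coke for a
# one-unit increase in a numeric IV (or vs. the reference level of a factor),
# holding the other IVs constant
odds_ratios <- exp(coef(logit))
print(round(odds_ratios, 3))
```

An odds ratio above 1 means the IV raises the odds of selecting coke; below 1 means it lowers them.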
2.(b) To estimate a logit model (Model b) with dv=coke_selection and IVs=gender, occupation,
country_of_origin, price, distribution, and adv_ratio, and to compare the performance of Model a and
Model b, the following code needs to be run.
Required R Code:
Results from R:
The Model b is
anova(logit2,mylogit,test="Chisq")
The AIC of Model a is 592.67 and the AIC of Model b is 590.32. AIC balances goodness of fit against the
number of estimated parameters, and the model with the lower AIC is preferred. Hence Model b performs
better.
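A minimal sketch of the comparison in 2.(b), again on simulated, hypothetical data (`coke`, with an assumed extra IV `age` that only Model a uses). It computes both AICs, checks the definition AIC = −2 log-likelihood + 2k by hand, and runs the same kind of likelihood-ratio test as the `anova(logit2,mylogit,test="Chisq")` call. The chi-square test is only valid when one model is nested in the other, which holds here by construction.

```r
# Minimal sketch for 2(b): compare Model a and Model b by AIC and an LR test.
# ASSUMPTION: simulated stand-in data; `age` is a hypothetical extra IV used
# only in Model a, so Model b is nested in Model a.
set.seed(5)
n <- 400
coke <- data.frame(
  gender = factor(sample(c("F", "M"), n, replace = TRUE)),
  occupation = factor(sample(c("student", "employed", "other"), n, replace = TRUE)),
  country_of_origin = factor(sample(c("domestic", "foreign"), n, replace = TRUE)),
  price = runif(n, 1, 3),
  distribution = runif(n, 0, 1),
  adv_ratio = runif(n, 0, 1),
  age = rnorm(n, 30, 8)
)
eta <- -1 + 1.5 * coke$distribution - 0.5 * coke$price
coke$coke_selection <- rbinom(n, 1, plogis(eta))

model_a <- glm(coke_selection ~ ., family = binomial, data = coke)
model_b <- glm(coke_selection ~ gender + occupation + country_of_origin +
                 price + distribution + adv_ratio,
               family = binomial, data = coke)

# AIC = -2 * log-likelihood + 2 * (number of estimated parameters); lower is better
aic_manual <- function(m) {
  ll <- logLik(m)
  -2 * as.numeric(ll) + 2 * attr(ll, "df")
}
aics <- c(model_a = AIC(model_a), model_b = AIC(model_b))
best <- names(which.min(aics))

# Likelihood-ratio (chi-square) test of the extra IV in Model a
lr <- anova(model_b, model_a, test = "Chisq")

print(aics)
print(lr)
cat("Model with the lower AIC:", best, "\n")
```

AIC and the likelihood-ratio test can disagree: AIC charges a fixed penalty of 2 per extra parameter, while the LR test asks whether the extra parameters reduce the deviance more than chance would.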