Multi Regression Model

Regression model-FMCG data
Reading the data

fmcg<-read.csv("F:/fmcg.csv")
media_spends<-read.csv("F:/Media_spends.csv")
fmcg[is.na(fmcg)]<-0
Description of the data

The data is from the FMCG domain. We have data on volume sales, value sales and
the number of stores for different SKUs of different brands in a particular
category. Data on total TV media spends is also available for leading brands.
The following question has to be answered from the data:

Brand-434 is a popular brand for use in bodypart-1. It has recently launched a
variant called “variant-1163”. The brand is interested in increasing price of
“variant-1163” by 10% in the next couple of months. Would you recommend it?
Subsetting media spends for variant-1163

var_1163_media<-media_spends[1:24,c(1,6)]
Subsetting the volume sales

var_1163<-subset(fmcg,Variant=="variant-1163")
var_1163<-var_1163[1:10,c(6,59:82)]
var_1163<-as.data.frame(t(var_1163))
var_1163<-var_1163[2:25,]
var_1163[,1:10]<-sapply(var_1163[, 1:10], as.character)
var_1163[,1:10]<-sapply(var_1163[, 1:10], as.numeric)
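The transpose-then-convert step above can be tried on a small made-up table. This is only a sketch: the data frame `toy` and its columns are hypothetical and do not come from the FMCG file.

```r
# Hypothetical wide table: one row per SKU, one column per month.
toy <- data.frame(SKU = c("a", "b"),
                  Jan = c(10, 20),
                  Feb = c(11, 21),
                  Mar = c(12, 22),
                  stringsAsFactors = FALSE)

# t() on a data frame with mixed column types returns a character matrix,
# so every cell is text after the transpose.
toy_t <- as.data.frame(t(toy), stringsAsFactors = FALSE)

# Drop the first row (the SKU labels) and convert the rest back to numbers.
toy_t <- toy_t[-1, ]
toy_t[] <- lapply(toy_t, as.numeric)

str(toy_t)
```

After the conversion each row is a month and each column is an SKU, which is the layout the totals are computed on.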
Calculating the total sales for variant-1163

# Row-wise sum over the 10 SKU columns gives the total volume per month
var_1163$total<-rowSums(var_1163[,1:10])
Subsetting value sales and calculating total value sales

var_1163_val<-subset(fmcg,Variant=="variant-1163")
var_1163_val<-var_1163_val[1:10,c(23:46)]
var_1163_val<-as.data.frame(t(var_1163_val))
var_1163_val[,1:10]<-sapply(var_1163_val[, 1:10], as.character)
var_1163_val[,1:10]<-sapply(var_1163_val[, 1:10], as.numeric)
# Row-wise sum over the 10 SKU columns gives the total value per month
var_1163_val$total<-rowSums(var_1163_val[,1:10])
var_1163_media$vol<-var_1163$total
var_1163_media$val<-var_1163_val$total
Calculating price per L for variant-1163

var_1163_media$price=var_1163_media$val/var_1163_media$vol
var_1163_media[is.na(var_1163_media)]<-0
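Why the `is.na()` replacement is needed after the division can be seen on a few made-up numbers. The vectors below are hypothetical; the point is only that a month with zero value and zero volume yields `NaN`, which `is.na()` also catches.

```r
# Hypothetical monthly totals; the first month has no sales at all.
val <- c(0, 900, 1800)     # value sales
vol <- c(0, 1, 2)          # volume sales in L

price <- val / vol         # 0/0 gives NaN for the empty month
price[is.na(price)] <- 0   # is.na() is TRUE for NaN as well as NA

price
```

Note that a non-zero value divided by a zero volume would give `Inf`, which this replacement does not touch.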
Subsetting Number of stores and calculating total number of stores

var_1163_str<-subset(fmcg,Variant=="variant-1163")
var_1163_str<-var_1163_str[,95:118]
var_1163_str<-as.data.frame(t(var_1163_str))
var_1163_str[,1:10]<-sapply(var_1163_str[, 1:10], as.character)
var_1163_str[,1:10]<-sapply(var_1163_str[, 1:10], as.numeric)
# Row-wise sum over the 10 SKU columns gives the total number of stores per month
var_1163_str$total<-rowSums(var_1163_str[,1:10])
var_1163_media$stores<-var_1163_str$total
var_1163_media$stores<-var_1163_media$stores*1000
var_1163_media$price<-var_1163_media$price*1000
Correlation between volume sales and other factors

cor(var_1163_media[5:24,2:6])
##                         brand.434..variant.1163        vol        val
## brand.434..variant.1163               1.0000000  0.2424643  0.2761167
## vol                                   0.2424643  1.0000000  0.9951129
## val                                   0.2761167  0.9951129  1.0000000
## price                                -0.0669006 -0.7315885 -0.6688280
## stores                                0.3547530  0.9244893  0.9520305
##                              price     stores
## brand.434..variant.1163 -0.0669006  0.3547530
## vol                     -0.7315885  0.9244893
## val                     -0.6688280  0.9520305
## price                    1.0000000 -0.4702280
## stores                  -0.4702280  1.0000000
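A correlation coefficient can be turned into a t statistic to judge whether it differs from zero. The sketch below does this for the price/volume correlation, assuming n = 20 observations (rows 5:24 of var_1163_media). The result should land close to the t value reported for the price coefficient in the simple regression that follows, since the two tests are equivalent when there is a single predictor.

```r
r <- -0.7315885   # cor(price, vol) taken from the matrix above
n <- 20           # rows 5:24 of var_1163_media

# t statistic for H0: true correlation is zero, with n - 2 degrees of freedom
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_val  <- 2 * pt(-abs(t_stat), df = n - 2)   # two-sided p-value

round(t_stat, 3)
signif(p_val, 3)
```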
Regression for 1163 volume and price
reg_vol<-lm(var_1163_media$vol[var_1163_media$vol!=0]~var_1163_media$price[var_1163_media$price!=0],data = var_1163_media)
summary(reg_vol)
##
## Call:
## lm(formula = var_1163_media$vol[var_1163_media$vol != 0] ~ var_1163_media$price[var_1163_media$price !=
##     0], data = var_1163_media)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12076 -1162 -3 1614 5354
##
## Coefficients:
## Estimate Std. Error
## (Intercept) 114727.83 22806.67
## var_1163_media$price[var_1163_media$price != 0] -120.12 26.38
## t value Pr(>|t|)
## (Intercept) 5.030 8.69e-05 ***
## var_1163_media$price[var_1163_media$price != 0] -4.553 0.000247 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3643 on 18 degrees of freedom
## Multiple R-squared: 0.5352, Adjusted R-squared: 0.5094
## F-statistic: 20.73 on 1 and 18 DF, p-value: 0.0002466
anova(reg_vol)
## Analysis of Variance Table

##
## Response: var_1163_media$vol[var_1163_media$vol != 0]
## Df Sum Sq Mean Sq
## var_1163_media$price[var_1163_media$price != 0] 1 275106415 275106415
## Residuals 18 238898168 13272120
## F value Pr(>F)
## var_1163_media$price[var_1163_media$price != 0] 20.728 0.0002466 ***
## Residuals
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The price coefficient is significant at the 0.1% level (p = 0.000247), so price
has a statistically significant effect on volume sales. The fitted equation is
Y = 114727.83 - 120.12x, where Y is volume in L and x is price, with a residual
standard error of 3643 L. Price accounted for 53.52% of the variation in volume
sales.
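The share of variation explained can be cross-checked from the ANOVA table: R-squared is the regression sum of squares divided by the total sum of squares. A minimal sketch using the two Sum Sq values printed above (the variable names are only illustrative):

```r
ss_price <- 275106415   # Sum Sq for price, from anova(reg_vol)
ss_resid <- 238898168   # Sum Sq for residuals, from anova(reg_vol)

# R-squared = explained variation / total variation
r_squared <- ss_price / (ss_price + ss_resid)

round(100 * r_squared, 2)   # percentage of variation explained
```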
Plot
plot(var_1163_media$price[var_1163_media$price!=0],var_1163_media$vol[var_1163_media$vol!=0],
ylab="Volume sold in L",xlab="Price",
main="Variant-1163(Volume vs price)")
abline(reg_vol)
Regression for 1163 volume vs number of stores

reg_stores<-lm(var_1163_media$vol[var_1163_media$vol!=0]~var_1163_media$stores[var_1163_media$stores!=0],data = var_1163_media)
summary(reg_stores)
##
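## Call:
## lm(formula = var_1163_media$vol[var_1163_media$vol != 0] ~ var_1163_media$stores[var_1163_media$stores !=
##     0], data = var_1163_media)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2620.2 -1078.1 -326.7 367.8 6685.6
##
## Coefficients:
## Estimate Std. Error
## (Intercept) -3175.6875 1447.3520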
## var_1163_media$stores[var_1163_media$stores != 0] 0.1131 0.0110
## t value Pr(>|t|)
## (Intercept) -2.194 0.0416 *
## var_1163_media$stores[var_1163_media$stores != 0] 10.289 5.75e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2037 on 18 degrees of freedom
## F-statistic: 105.9 on 1 and 18 DF, p-value: 5.75e-09
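anova(reg_stores)
## Analysis of Variance Table
##
## Response: var_1163_media$vol[var_1163_media$vol != 0]
## Df Sum Sq Mean Sq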
## var_1163_media$stores[var_1163_media$stores != 0] 1 439309713 439309713
## Residuals 18 74694870 4149715
## F value Pr(>F)
## Residuals
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
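The store-distribution coefficient is significant at the 0.1% level
(p = 5.75e-09), so the number of stores has a statistically significant effect
on volume sales. The fitted equation is Y = -3175.69 + 0.1131x, where Y is
volume in L and x is the number of stores (equivalently, 113.1 L per thousand
stores), with a residual standard error of 2037 L. The number of stores
accounted for 85.47% of the variation in volume sales.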
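Two figures for this model can be recomputed from the printed output. The residual standard error is the square root of the residual Mean Sq in the ANOVA table, and the slope can be quoted per store or per thousand stores depending on the unit chosen for x. A small sketch with illustrative variable names:

```r
ms_resid <- 4149715            # residual Mean Sq from anova(reg_stores)
rse <- sqrt(ms_resid)          # residual standard error, in L

slope_per_store <- 0.1131      # stores coefficient as printed by summary()
slope_per_1000  <- slope_per_store * 1000   # same slope per thousand stores

round(rse)
slope_per_1000
```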
plot(var_1163_media$stores[var_1163_media$stores!=0],var_1163_media$vol[var_1
163_media$vol!=0],
xlab="Number of stores",ylab="Volume",
main="Variant-1163(Volume vs No of stores)")
abline(reg_stores)
Multiple regression
reg_mult<-lm(var_1163_media$vol[var_1163_media$vol!=0]~
var_1163_media$stores[var_1163_media$stores!=0]+
var_1163_media$price[var_1163_media$price!=0],
data = var_1163_media)
summary(reg_mult)
##
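## Call:
## lm(formula = var_1163_media$vol[var_1163_media$vol != 0] ~ var_1163_media$stores[var_1163_media$stores !=
##     0] + var_1163_media$price[var_1163_media$price != 0], data = var_1163_media)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2197.8 -563.4 -117.1 616.2 1850.2
##
## Coefficients:
## Estimate Std. Error
## (Intercept) 5.363e+04 7.379e+03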
## var_1163_media$stores[var_1163_media$stores != 0] 9.120e-02 6.032e-03
## var_1163_media$price[var_1163_media$price != 0] -6.258e+01 8.093e+00
## t value Pr(>|t|)
## (Intercept) 7.267 1.32e-06 ***
## var_1163_media$price[var_1163_media$price != 0] -7.733 5.78e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 986.2 on 17 degrees of freedom
## F-statistic: 255.7 on 2 and 17 DF, p-value: 2.057e-13
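anova(reg_mult)
## Analysis of Variance Table
##
## Response: var_1163_media$vol[var_1163_media$vol != 0]
## Df Sum Sq Mean Sq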
## var_1163_media$stores[var_1163_media$stores != 0] 1 439309713 439309713
## var_1163_media$price[var_1163_media$price != 0] 1 58159316 58159316
## Residuals 17 16535553 972680
## F value Pr(>F)
## var_1163_media$price[var_1163_media$price != 0] 59.793 5.780e-07 ***
## Residuals
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Store distribution and price both have a statistically significant effect on
volume sales (overall F-test p = 2.057e-13; price coefficient p = 5.78e-07).
The fitted equation is Y = 53626.006 + 0.0912 x1 - 62.583 x2, where Y is volume
in L, x1 is the number of stores and x2 is price, with a residual standard
error of 986.2 L. The number of stores and price together accounted for 96.78%
of the variation in volume sales.
The F-statistic (255.7 on 2 and 17 degrees of freedom) exceeds the tabulated
critical value, so the null hypothesis that all slope coefficients are zero is
rejected. The model as a whole is therefore significant.
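The comparison with the table value can be reproduced with the F distribution functions in base R. The sketch below assumes a 5% significance level, which the text does not state explicitly.

```r
f_stat <- 255.7   # F-statistic reported by summary(reg_mult)
df1 <- 2          # number of predictors
df2 <- 17         # residual degrees of freedom

f_crit <- qf(0.95, df1, df2)                        # critical value at the 5% level
p_val  <- pf(f_stat, df1, df2, lower.tail = FALSE)  # upper-tail p-value

f_stat > f_crit   # TRUE means the null hypothesis is rejected
```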
To check independence of error

library(lmtest)
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':

##
## as.Date, as.Date.numeric
dwtest(reg_mult)
##
## Durbin-Watson test
##
## data: reg_mult
## DW = 1.5166, p-value = 0.0718
## alternative hypothesis: true autocorrelation is greater than 0
The Durbin-Watson test checks independence of errors, i.e. that there is no
autocorrelation between the residuals. The statistic always lies between 0 and
4, and values close to 2 indicate no autocorrelation. Here DW = 1.5166 with a
p-value of 0.0718, which is above 0.05, so the null hypothesis of no positive
autocorrelation is not rejected at the 5% level.
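The Durbin-Watson statistic itself is simple to compute by hand: it is the sum of squared successive differences of the residuals divided by the sum of squared residuals. The sketch below uses simulated data, not the FMCG series, purely to illustrate the formula.

```r
set.seed(1)                 # made-up data, only to have residuals to work with
x <- 1:20
y <- 3 + 2 * x + rnorm(20)

fit <- lm(y ~ x)
e <- residuals(fit)

# DW = sum((e_t - e_{t-1})^2) / sum(e_t^2); roughly 2 * (1 - lag-1 autocorrelation)
dw <- sum(diff(e)^2) / sum(e^2)
dw
```

Values well below 2 point to positive autocorrelation and values well above 2 to negative autocorrelation.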
For homoscedasticity
bptest(reg_mult)
##
## studentized Breusch-Pagan test
##
## data: reg_mult
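## BP = 0.47847, df = 2, p-value = 0.7872
The null hypothesis of the Breusch-Pagan test is that the error variance is
constant. The p-value (0.7872) is well above 0.05, so there is no evidence of
heteroscedasticity. A plot of residuals against fitted values gives a visual
check.
plot(predict(reg_mult),residuals(reg_mult))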
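For reference, the studentized Breusch-Pagan statistic can be sketched by hand: regress the squared residuals on the predictors and multiply the R-squared of that auxiliary regression by the sample size. The data below are simulated and hypothetical. The last line only re-derives the p-value from the BP value and degrees of freedom printed above.

```r
set.seed(2)                          # made-up data for illustration
x1 <- runif(30)
x2 <- runif(30)
y  <- 1 + 2 * x1 - x2 + rnorm(30)

fit <- lm(y ~ x1 + x2)

# Auxiliary regression of squared residuals on the same predictors
aux <- lm(residuals(fit)^2 ~ x1 + x2)
bp  <- length(y) * summary(aux)$r.squared        # n * R-squared
p_val <- pchisq(bp, df = 2, lower.tail = FALSE)  # df = number of predictors

# p-value implied by the reported statistic (BP = 0.47847, df = 2)
p_reported <- pchisq(0.47847, df = 2, lower.tail = FALSE)
```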
Normality of residuals
shapiro.test(residuals(reg_mult))
##
## Shapiro-Wilk normality test
##
## data: residuals(reg_mult)
## W = 0.96833, p-value = 0.7194
The residuals should be normally distributed. The Shapiro-Wilk test is used to
check this. Its null hypothesis is that the residuals are normally distributed,
and the alternative hypothesis is that they are not. The p-value (0.7194) is
greater than the usual alpha of 0.05, so we fail to reject the null hypothesis:
the residuals are consistent with a normal distribution. This can be further
supported by graphs.
# Standardized residuals; qqnorm() draws a normal Q-Q plot (not a P-P plot)
pp_plot<-rstandard(reg_mult)
qqnorm(pp_plot,
ylab="Standardized Residuals",xlab="Normal Scores",
main="Normal Q-Q Plot of Standardized Residuals")
qqline(pp_plot)
hist(residuals(reg_mult))
CONCLUSION:
Using the multiple regression equation, if the price is increased by 10%, i.e.
from 864.555 (December 2013) to 951.011, while store distribution is kept at
its December 2013 level, volume sales are predicted to fall by 38.45% (to
9987.327 ± 986.2 L). Hence increasing the price of the product is not
recommended.
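The part of this calculation that follows directly from the regression coefficients can be sketched as below. It reproduces only the new price and the change in volume implied by the price coefficient with stores held constant. The December 2013 store count and observed volume, which would be needed to reproduce 9987.327 L and 38.45%, are not printed in this document.

```r
b_price   <- -62.583            # price coefficient from the multiple regression
price_old <- 864.555            # December 2013 price
price_new <- 1.10 * price_old   # 10% increase

# With stores held constant, the change in predicted volume is
# the price coefficient times the price change.
delta_vol <- b_price * (price_new - price_old)

round(price_new, 3)
round(delta_vol, 1)             # change in predicted volume, in L
```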
