Regression model-FMCG data

Reading the data


fmcg<-read.csv("F:/fmcg.csv")
media_spends<-read.csv("F:/Media_spends.csv")
fmcg[is.na(fmcg)]<-0

Description about the data


The data is from the FMCG domain. We have data on volume sales, value sales and
the number of stores for different SKUs of different brands in a particular
category. Data on total TV media spend is also available for the leading brands.

The following question has to be answered from the data:


Brand-434 is a popular brand for use in bodypart-1. It has recently launched a
variant called “variant-1163”. The brand is interested in increasing price of
“variant-1163” by 10% in the next couple of months. Would you recommend it?

Subsetting media spends for variant-1163


var_1163_media<-media_spends[1:24,c(1,6)]

Subsetting the volume sales


var_1163<-subset(fmcg,Variant=="variant-1163")
var_1163<-var_1163[1:10,c(6,59:82)]

var_1163<-as.data.frame(t(var_1163))
var_1163<-var_1163[2:25,]
var_1163[,1:10]<-sapply(var_1163[, 1:10], as.character)
var_1163[,1:10]<-sapply(var_1163[, 1:10], as.numeric)

Calculating the total sales for variant-1163


var_1163$total<-0
for(i in 1:length(var_1163[,1])){
for (j in 1:10){
var_1163$total[i]<-var_1163$total[i]+var_1163[i,j]
}
}
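
The nested loop above adds up the ten SKU columns row by row. As a minimal sketch, the same total can be obtained with `rowSums`. The data frame `sku_sales` below is a hypothetical stand-in filled with random numbers, not the FMCG data.

```r
# Hypothetical stand-in for the transposed SKU table: 3 months x 10 SKUs.
set.seed(1)
sku_sales <- as.data.frame(matrix(runif(30, 0, 100), nrow = 3, ncol = 10))

# Loop version, mirroring the report.
total_loop <- numeric(nrow(sku_sales))
for (i in seq_len(nrow(sku_sales))) {
  for (j in 1:10) {
    total_loop[i] <- total_loop[i] + sku_sales[i, j]
  }
}

# Vectorised version: one call sums across the 10 SKU columns of each row.
total_vec <- rowSums(sku_sales[, 1:10])

# Compare the two approaches (all.equal allows for floating-point rounding).
all.equal(total_loop, unname(total_vec))
```

The same pattern would apply to the value sales and store totals computed later with identical loops.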

Subsetting value sales and calculating total value sales


var_1163_val<-subset(fmcg,Variant=="variant-1163")
var_1163_val<-var_1163_val[1:10,c(23:46)]
var_1163_val<-as.data.frame(t(var_1163_val))
var_1163_val[,1:10]<-sapply(var_1163_val[, 1:10], as.character)
var_1163_val[,1:10]<-sapply(var_1163_val[, 1:10], as.numeric)
var_1163_val$total<-0
var_1163_media$vol<-var_1163$total
for(i in 1:length(var_1163_val[,1])){
for (j in 1:10){
var_1163_val$total[i]<-var_1163_val$total[i]+var_1163_val[i,j]
}
}
var_1163_media$val<-var_1163_val$total

Calculating price/l for variant-1163


var_1163_media$price=var_1163_media$val/var_1163_media$vol
var_1163_media[is.na(var_1163_media)]<-0

Subsetting Number of stores and calculating total number of stores


var_1163_str<-subset(fmcg,Variant=="variant-1163")
var_1163_str<-var_1163_str[,95:118]
var_1163_str<-as.data.frame(t(var_1163_str))
var_1163_str[,1:10]<-sapply(var_1163_str[, 1:10], as.character)
var_1163_str[,1:10]<-sapply(var_1163_str[, 1:10], as.numeric)

var_1163_str$total<-0
for(i in 1:length(var_1163_str[,1])){
for (j in 1:10){
var_1163_str$total[i]<-var_1163_str$total[i]+var_1163_str[i,j]
}
}
var_1163_media$stores<-var_1163_str$total
var_1163_media$stores<-var_1163_media$stores*1000
var_1163_media$price<-var_1163_media$price*1000

Correlation between volume sales and other factors


cor(var_1163_media[5:24,2:6])

##                         brand.434..variant.1163        vol        val
## brand.434..variant.1163               1.0000000  0.2424643  0.2761167
## vol                                   0.2424643  1.0000000  0.9951129
## val                                   0.2761167  0.9951129  1.0000000
## price                                -0.0669006 -0.7315885 -0.6688280
## stores                                0.3547530  0.9244893  0.9520305
##                              price     stores
## brand.434..variant.1163 -0.0669006  0.3547530
## vol                     -0.7315885  0.9244893
## val                     -0.6688280  0.9520305
## price                    1.0000000 -0.4702280
## stores                  -0.4702280  1.0000000

Regression for 1163 volume and price

reg_vol<-lm(var_1163_media$vol[var_1163_media$vol!=0]~var_1163_media$price[var_1163_media$price!=0],data = var_1163_media)
summary(reg_vol)

##
## Call:
## lm(formula = var_1163_media$vol[var_1163_media$vol != 0] ~ var_1163_media$price[var_1163_media$price !=
## 0], data = var_1163_media)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12076 -1162 -3 1614 5354
##
## Coefficients:
## Estimate Std. Error
## (Intercept) 114727.83 22806.67
## var_1163_media$price[var_1163_media$price != 0] -120.12 26.38
## t value Pr(>|t|)
## (Intercept) 5.030 8.69e-05 ***
## var_1163_media$price[var_1163_media$price != 0] -4.553 0.000247 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3643 on 18 degrees of freedom
## Multiple R-squared: 0.5352, Adjusted R-squared: 0.5094
## F-statistic: 20.73 on 1 and 18 DF, p-value: 0.0002466

anova(reg_vol)

## Analysis of Variance Table


##
## Response: var_1163_media$vol[var_1163_media$vol != 0]
## Df Sum Sq Mean Sq
## var_1163_media$price[var_1163_media$price != 0] 1 275106415 275106415
## Residuals 18 238898168 13272120
## F value Pr(>F)
## var_1163_media$price[var_1163_media$price != 0] 20.728 0.0002466 ***
## Residuals
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The price coefficient is statistically significant (p-value = 0.000247), so we
can say that price has an effect on volume sales. The relationship is described
by the equation [Y = 114727.83 - 120.12x], with a residual standard error of
3643 L. Price accounts for 53.52% of the variation in volume sales.
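
The `lm` call above filters the response and the predictor separately inside the formula, which is why the coefficient names in the output are so long. A minimal sketch of an alternative, assuming the zero rows are the same months in both columns, is to filter the data frame once and use plain column names. The data below is synthetic and the names `demo` and `launched` are hypothetical.

```r
# Synthetic monthly data (NOT the FMCG figures); zero rows mimic months before launch.
set.seed(42)
price <- c(0, 0, 0, 0, runif(20, 800, 950))
vol   <- ifelse(price == 0, 0, 115000 - 120 * price + rnorm(24, sd = 3000))
demo  <- data.frame(vol = vol, price = price)

# Filter once so response and predictor always come from the same rows.
launched <- subset(demo, vol != 0 & price != 0)

fit <- lm(vol ~ price, data = launched)
coef(fit)                # intercept and slope with short, readable names
summary(fit)$r.squared   # share of the variance in vol explained by price
```

With this form the `data` argument is actually used, and `predict(fit, newdata = ...)` works with a data frame that has a `price` column.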

Plot

plot(var_1163_media$price[var_1163_media$price!=0],var_1163_media$vol[var_1163_media$vol!=0],
ylab="Volume sold in L",xlab="Price",
main="Variant-1163(Volume vs price)")
abline(reg_vol)

Regression for 1163 volume vs number of stores


reg_stores<-lm(var_1163_media$vol[var_1163_media$vol!=0]~var_1163_media$stores[var_1163_media$stores!=0],data = var_1163_media)
summary(reg_stores)

##
## Call:
## lm(formula = var_1163_media$vol[var_1163_media$vol != 0] ~ var_1163_media$stores[var_1163_media$stores !=
## 0], data = var_1163_media)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2620.2 -1078.1 -326.7 367.8 6685.6
##
## Coefficients:
## Estimate Std. Error
## (Intercept) -3175.6875 1447.3520
## var_1163_media$stores[var_1163_media$stores != 0] 0.1131 0.0110
## t value Pr(>|t|)
## (Intercept) -2.194 0.0416 *
## var_1163_media$stores[var_1163_media$stores != 0] 10.289 5.75e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2037 on 18 degrees of freedom
## Multiple R-squared: 0.8547, Adjusted R-squared: 0.8466
## F-statistic: 105.9 on 1 and 18 DF, p-value: 5.75e-09

anova(reg_stores)

## Analysis of Variance Table


##
## Response: var_1163_media$vol[var_1163_media$vol != 0]
## Df Sum Sq Mean Sq
## var_1163_media$stores[var_1163_media$stores != 0] 1 439309713 439309713
## Residuals 18 74694870 4149715
## F value Pr(>F)
## var_1163_media$stores[var_1163_media$stores != 0] 105.86 5.75e-09 ***
## Residuals
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The store coefficient is statistically significant (p-value = 5.75e-09), so we
can say that store distribution has an effect on volume sales. The relationship
is described by the equation [Y = -3175.69 + 0.1131x], with a residual standard
error of 2037 L. The number of stores accounts for 85.47% of the variation in
volume sales.

plot(var_1163_media$stores[var_1163_media$stores!=0],var_1163_media$vol[var_1163_media$vol!=0],
xlab="Number of stores",ylab="Volume",
main="Variant-1163(Volume vs No of stores)")
abline(reg_stores)

Multiple regression

reg_mult<-lm(var_1163_media$vol[var_1163_media$vol!=0]~
var_1163_media$stores[var_1163_media$stores!=0]+
var_1163_media$price[var_1163_media$price!=0],
data = var_1163_media)
summary(reg_mult)

##
## Call:
## lm(formula = var_1163_media$vol[var_1163_media$vol != 0] ~ var_1163_media$stores[var_1163_media$stores !=
## 0] + var_1163_media$price[var_1163_media$price != 0], data = var_1163_media)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2197.8 -563.4 -117.1 616.2 1850.2
##
## Coefficients:
## Estimate Std. Error
## (Intercept) 5.363e+04 7.379e+03
## var_1163_media$stores[var_1163_media$stores != 0] 9.120e-02 6.032e-03
## var_1163_media$price[var_1163_media$price != 0] -6.258e+01 8.093e+00
## t value Pr(>|t|)
## (Intercept) 7.267 1.32e-06 ***
## var_1163_media$stores[var_1163_media$stores != 0] 15.120 2.73e-11 ***
## var_1163_media$price[var_1163_media$price != 0] -7.733 5.78e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 986.2 on 17 degrees of freedom
## Multiple R-squared: 0.9678, Adjusted R-squared: 0.964
## F-statistic: 255.7 on 2 and 17 DF, p-value: 2.057e-13

anova(reg_mult)

## Analysis of Variance Table


##
## Response: var_1163_media$vol[var_1163_media$vol != 0]
## Df Sum Sq Mean Sq
## var_1163_media$stores[var_1163_media$stores != 0] 1 439309713 439309713
## var_1163_media$price[var_1163_media$price != 0] 1 58159316 58159316
## Residuals 17 16535553 972680
## F value Pr(>F)
## var_1163_media$stores[var_1163_media$stores != 0] 451.649 1.107e-13 ***
## var_1163_media$price[var_1163_media$price != 0] 59.793 5.780e-07 ***
## Residuals
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The store and price coefficients are both statistically significant (p-values
below 0.001), so we can say that store distribution and price both have an
effect on volume sales. The relationship is described by the equation
[Y = 53626.006 + 0.0912 x1 - 62.583 x2], where x1 is the number of stores and
x2 is the price, with a residual standard error of 986.2 L. The number of
stores and price together account for 96.78% of the variation in volume sales.
The F-value (255.7, p-value = 2.057e-13) is greater than the table value, so
the null hypothesis that all slope coefficients are zero is rejected.
So we can say that our model is a significant model.
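
As a check on how these figures relate, the R-squared, adjusted R-squared and F-statistic can be recomputed from the sums of squares in the ANOVA table. This is a sketch of the standard formulas using the printed values; the variable names are chosen here for illustration.

```r
# Sums of squares copied from the anova(reg_mult) output.
ss_stores   <- 439309713
ss_price    <- 58159316
ss_residual <- 16535553
n <- 20   # observations: 17 residual df + 2 predictors + 1 intercept
p <- 2    # number of predictors

ss_model <- ss_stores + ss_price
ss_total <- ss_model + ss_residual

r2     <- ss_model / ss_total                            # multiple R-squared
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)           # adjusted R-squared
f_stat <- (ss_model / p) / (ss_residual / (n - p - 1))   # overall F-statistic

round(c(r2 = r2, adj_r2 = adj_r2, f_stat = f_stat), 4)
```

The results agree with the 0.9678, 0.964 and 255.7 reported by `summary(reg_mult)`.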

To check independence of error


library(lmtest)

## Loading required package: zoo

##
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':


##
## as.Date, as.Date.numeric

dwtest(reg_mult)

##
## Durbin-Watson test
##
## data: reg_mult
## DW = 1.5166, p-value = 0.0718
## alternative hypothesis: true autocorrelation is greater than 0

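The Durbin-Watson test is done to check independence of errors, i.e. that there
is no autocorrelation between the errors. The Durbin-Watson statistic always
lies between 0 and 4; a value near 2 indicates no autocorrelation, and a value
well below 2 indicates positive autocorrelation. Here the value is 1.5166 with
a p-value of 0.0718, which is greater than 0.05. So the null hypothesis of no
autocorrelation between the errors is not rejected at the 5% level.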
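
To make the statistic less of a black box, the sketch below computes the Durbin-Watson value directly from residuals: the sum of squared successive differences divided by the sum of squared residuals. It uses a small synthetic regression, not `reg_mult`, and base R only.

```r
# Synthetic example (NOT the FMCG model): a simple linear fit with random noise.
set.seed(7)
x <- 1:20
y <- 5 + 2 * x + rnorm(20)
fit <- lm(y ~ x)

# Durbin-Watson statistic: sum of squared successive residual differences
# divided by the residual sum of squares.
e  <- residuals(fit)
dw <- sum(diff(e)^2) / sum(e^2)
dw
```

Applied to `residuals(reg_mult)`, the same expression should give the DW value printed by `dwtest`; the p-value still needs `lmtest`.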

For homoscedasticity
bptest(reg_mult)

##
## studentized Breusch-Pagan test
##
## data: reg_mult
## BP = 0.47847, df = 2, p-value = 0.7872

plot(predict(reg_mult),residuals(reg_mult))

The null hypothesis of the Breusch-Pagan test is that the error variance is
constant. The p-value (0.7872) is greater than 0.05, so there is no evidence
of heteroscedasticity.

Normality of residuals
shapiro.test(residuals(reg_mult))
##
## Shapiro-Wilk normality test
##
## data: residuals(reg_mult)
## W = 0.96833, p-value = 0.7194

The errors should be normally distributed. The Shapiro-Wilk test is done to
check normality. Here the null hypothesis is that the residuals are normally
distributed and the alternative hypothesis is that the residuals are not
normally distributed. The p-value (0.7194) is greater than the conventional
alpha of 0.05, so we fail to reject the null hypothesis; the residuals are
consistent with a normal distribution. This can further be supported by graphs.

# qqnorm() draws a normal Q-Q plot of the standardized residuals
pp_plot<-rstandard(reg_mult)
qqnorm(pp_plot,
ylab="Standardized Residuals",xlab="Normal Scores",
main="Normal Q-Q Plot of Regression standardized Residual")
qqline(pp_plot)

hist(residuals(reg_mult))

CONCLUSION:
Using the multiple regression equation above, increasing the price by 10%, i.e.
from 864.555 (December 2013) to 951.011, while keeping the store distribution
the same as in December 2013, decreases volume sales by 38.45% (9987.327 ± 986.2 L).

Hence it is not recommended to increase the price of the product.
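
As a rough illustration of the price term alone, the sketch below multiplies the price coefficient from the multiple regression by the proposed price change. It does not reproduce the 38.45% figure, which also depends on the December 2013 store count and baseline volume; those values are not stated in the text, so they are left out here.

```r
# Values quoted in the report.
b_price   <- -62.583   # price coefficient from reg_mult
price_old <- 864.555   # December 2013 price

# Proposed 10% increase.
price_new   <- price_old * 1.10
delta_price <- price_new - price_old

# Change in predicted volume (in L) from the price term alone,
# holding the number of stores fixed.
delta_vol <- b_price * delta_price
c(price_new = price_new, delta_vol = delta_vol)
```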
