
# ASSIGNMENT 3 CORRELATION AND REGRESSION ANALYSIS

Correlation

The hypotheses are formulated as follows:

- H0: There is no significant relationship between sales and the i-th factor
- H1: There is a significant relationship between sales and the i-th factor

where i = 1, 2, 3, 4, 5, 6.

Level of significance (α) = 0.05

The following is the SPSS output displaying Karl Pearson's correlation coefficients of sales with each of the independent variables, and also among the independent variables themselves.

It can be observed that in the majority of cases there is a high degree of positive correlation between the variables. The p-values for the correlation coefficients between sales and each of the factors Del Boys, Ad Cost, Outlets, Variants and No of Ext Customer are less than 0.05, so there is sufficient evidence to reject the null hypothesis.

We therefore conclude that, at the 95% confidence level, there is a significant relationship between sales and each of the aforementioned variables.

The p-value for the correlation coefficient between Sales and Comp. Int, however, is greater than 0.05, so we do not have sufficient evidence to reject the null hypothesis. Thus, at the 95% confidence level, the data show no significant relationship between sales and the variable Comp. Int. Applying the same hypothesis test to the correlations among the six independent variables shows significant inter-relationships between them, which is evidence of a possibly high degree of multicollinearity.
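The significance test behind each correlation coefficient can be sketched as follows. This is a minimal illustration, not the SPSS procedure itself: the data values are hypothetical placeholders (the assignment's raw data are not reproduced here), the function names are made up for this sketch, and the critical value 2.160 is the two-sided 5% point of the t distribution with 13 degrees of freedom taken from standard tables. The sample size of 15 matches the total degrees of freedom (14) reported in the ANOVA tables.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_t_statistic(r, n):
    """t statistic for H0: no correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Hypothetical data for illustration only (NOT the assignment's data set).
outlets = [10, 12, 15, 11, 18, 20, 14, 16, 22, 25, 13, 17, 19, 21, 24]
sales = [14, 18, 25, 15, 33, 38, 21, 27, 41, 50, 20, 30, 34, 40, 47]

n = len(sales)
r = pearson_r(outlets, sales)
t = correlation_t_statistic(r, n)

# Two-sided 5% critical value of t with n - 2 = 13 degrees of freedom.
T_CRITICAL = 2.160

print(f"r = {r:.3f}, t = {t:.3f}")
print("reject H0" if abs(t) > T_CRITICAL else "fail to reject H0")
```

SPSS reports the p-value directly; comparing |t| with the tabulated critical value leads to the same decision at the 0.05 level.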

Regression

The steps involved in designing the regression model are as follows.

Step 1: F-test of whether the model is a good fit to the given data

Hypothesis formulation:

- H0: The regression model is not a good fit
- H1: The regression model is a good fit

Level of significance (α) = 0.05

ANOVA (a)

| Model | | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 6408.864 | 6 | 1068.144 | 27.254 | .000 (b) |
| | Residual | 313.536 | 8 | 39.192 | | |
| | Total | 6722.400 | 14 | | | |

a. Dependent Variable: Sales
b. Predictors: (Constant), No of Ext Customer, Comp. Int, Variants, Del Boys, Outlets, Ad cost

In the ANOVA table above the p-value is less than 0.05, so we have enough evidence to reject the null hypothesis and conclude that the regression model is a good fit to the given data.
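As a cross-check, the F statistic can be recomputed from the sums of squares and degrees of freedom reported in the ANOVA output. The sketch below uses only those reported figures; the function name is an assumption made for illustration.

```python
def f_statistic(ss_regression, df_regression, ss_residual, df_residual):
    """Return (mean square regression, mean square residual, F)."""
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    return ms_regression, ms_residual, ms_regression / ms_residual

# Figures taken from the ANOVA output for the six-predictor model.
ms_reg, ms_res, f_value = f_statistic(6408.864, 6, 313.536, 8)

print(f"Mean square (regression) = {ms_reg:.3f}")
print(f"Mean square (residual)   = {ms_res:.3f}")
print(f"F                        = {f_value:.3f}")
```

The recomputed values agree with the Mean Square and F columns of the SPSS table to three decimal places.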

Step 2: Checking the coefficient of determination (R²)

The following is the SPSS output displaying R².

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .976 (a) | .953 | .918 | 6.260 |

a. Predictors: (Constant), No of Ext Customer, Comp. Int, Variants, Del Boys, Outlets, Ad cost
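The Model Summary figures follow directly from the ANOVA sums of squares. A minimal sketch, assuming the usual definitions of R², adjusted R² and the standard error of the estimate (the function name is illustrative):

```python
import math

def fit_summary(ss_regression, ss_total, n_obs, n_predictors):
    """Return (R, R squared, adjusted R squared, std. error of the estimate)."""
    ss_residual = ss_total - ss_regression
    df_residual = n_obs - n_predictors - 1
    r_squared = ss_regression / ss_total
    adjusted = 1 - (1 - r_squared) * (n_obs - 1) / df_residual
    std_error = math.sqrt(ss_residual / df_residual)
    return math.sqrt(r_squared), r_squared, adjusted, std_error

# 15 observations (total df = 14) and six predictors, as in the ANOVA output.
r, r_squared, adjusted, std_error = fit_summary(6408.864, 6722.400, 15, 6)

print(f"R = {r:.3f}, R^2 = {r_squared:.3f}")
print(f"Adjusted R^2 = {adjusted:.3f}, Std. error = {std_error:.3f}")
```

Adjusted R² penalises the number of predictors, which is why it (.918) is noticeably lower than R² (.953) with six predictors and only 15 observations.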

R² = 0.953, or 95.3%; that is, the six independent variables chosen explain 95.3% of the variation in the dependent variable, sales. This further shows that the fitted model is a good one.

Step 3: t-test to identify which of the independent variables are significant in the regression model

Hypothesis formulation:

- H0: There is no significant effect of the i-th factor on sales
- H1: There is a significant effect of the i-th factor on sales

where i = 1, 2, 3, 4, 5, 6.

Level of significance (α) = 0.05

The following is the SPSS output displaying the test statistic values for the t-test and the corresponding p-values.

Coefficients (a)

| Model 1 | B (Unstandardized) | Std. Error (Unstandardized) | Beta (Standardized) | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 6.372 | 32.586 | | .196 | .850 |
| Del Boys | .919 | .910 | .189 | 1.010 | .342 |
| Ad cost | .699 | 1.303 | .152 | .537 | .606 |
| Outlets | 1.620 | .618 | .617 | 2.621 | .031 |
| Variants | -1.978 | 2.310 | -.147 | -.856 | .417 |
| Comp. Int | .067 | 2.211 | .003 | .030 | .977 |
| No of Ext Customer | .242 | .299 | .182 | .808 | .442 |

a. Dependent Variable: Sales
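Each t value in the coefficients output is the unstandardized coefficient divided by its standard error. The sketch below recomputes them from the reported B and Std. Error columns; because those inputs are already rounded to three decimals, the results can differ from the SPSS t column in the last digit. The critical value 2.306 is the two-sided 5% point of the t distribution with 8 residual degrees of freedom from standard tables, used here in place of the exact p-values.

```python
# (B, Std. Error) pairs as reported in the coefficients output.
coefficients = {
    "(Constant)": (6.372, 32.586),
    "Del Boys": (0.919, 0.910),
    "Ad cost": (0.699, 1.303),
    "Outlets": (1.620, 0.618),
    "Variants": (-1.978, 2.310),
    "Comp. Int": (0.067, 2.211),
    "No of Ext Customer": (0.242, 0.299),
}

# Two-sided 5% critical value of t with 8 residual degrees of freedom.
T_CRITICAL = 2.306

t_values = {name: b / se for name, (b, se) in coefficients.items()}
significant = [name for name, t in t_values.items() if abs(t) > T_CRITICAL]

for name, t in t_values.items():
    print(f"{name:<20} t = {t:7.3f}")
print("Significant at the 0.05 level:", significant)
```

Only Outlets exceeds the critical value, which matches the single Sig. value below 0.05 in the table.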

The p-value for the regression coefficient of Outlets is less than 0.05, so we have sufficient evidence to reject the null hypothesis. Thus, we may conclude that at the 95% confidence level there is a significant effect of the variable Outlets on sales.

The p-values for each of the factors Del Boys, Ad Cost, Variants, Comp. Int and No of Ext Customer are greater than 0.05, so we do not have sufficient evidence to reject the null hypothesis. Thus, at the 95% confidence level, none of these variables shows a significant effect on sales.

The regression model is given by

Y = 6.372 + 0.919X1 + 0.699X2 + 1.620X3 - 1.978X4 + 0.067X5 + 0.242X6

where

- Y = Sales
- X1 = Del Boys
- X2 = Ad Cost
- X3 = Outlets
- X4 = Variants
- X5 = Comp. Int
- X6 = No of Ext Customer

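If we keep only Outlets, the one significant variable, from this model, the regression equation becomes => Y = 6.372 + 1.620X3. This is only a rough simplification, because the intercept and slope change once the model is refitted with fewer predictors.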

Step-wise Regression
We now perform a stepwise regression analysis to examine the effect of each variable on sales individually.

1) Forward Step-wise Regression

Here we start with the single most significant variable in the model and keep adding one variable at a time, in order of significance, until the optimum model is reached.

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .953 (a) | .908 | .900 | 6.913 |
| 2 | .970 (b) | .940 | .930 | 5.789 |

a. Predictors: (Constant), Outlets
b. Predictors: (Constant), Outlets, Del Boys

ANOVA (a)

| Model | | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 6101.176 | 1 | 6101.176 | 127.676 | .000 (b) |
| | Residual | 621.224 | 13 | 47.786 | | |
| | Total | 6722.400 | 14 | | | |
| 2 | Regression | 6320.215 | 2 | 3160.108 | 94.288 | .000 (c) |
| | Residual | 402.185 | 12 | 33.515 | | |
| | Total | 6722.400 | 14 | | | |

a. Dependent Variable: Sales
b. Predictors: (Constant), Outlets
c. Predictors: (Constant), Outlets, Del Boys

Following the same steps as in the main regression model, we get the following equations:

(i) With only Outlets as the independent variable (not yet the optimum model):

Y = -13.013 + 2.503X3, R² = 90.8%

(ii) With Outlets and Del Boys as the independent variables (optimum model):

Y = -11.817 + 1.753X3 + 1.640X1, R² = 94.0%
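The gain from adding Del Boys in the second step can be quantified with a partial F statistic built from the residual sums of squares of the two forward models. This is a sketch of the general idea behind an entry test, not a reproduction of SPSS's internal entry criterion; the function name is illustrative, and the critical value of about 4.75 for F(1, 12) at the 0.05 level comes from standard tables.

```python
def partial_f(ss_res_reduced, df_res_reduced, ss_res_full, df_res_full):
    """Partial F for the extra predictors in the full model."""
    extra_df = df_res_reduced - df_res_full
    extra_ss = ss_res_reduced - ss_res_full
    return (extra_ss / extra_df) / (ss_res_full / df_res_full)

# Residual sums of squares and df from the forward step-wise ANOVA output.
f_enter = partial_f(
    ss_res_reduced=621.224, df_res_reduced=13,  # Model 1: Outlets
    ss_res_full=402.185, df_res_full=12,        # Model 2: Outlets, Del Boys
)

# Approximate 5% critical value of F with (1, 12) degrees of freedom.
F_CRITICAL = 4.75

print(f"Partial F for adding Del Boys = {f_enter:.3f}")
print("Del Boys improves the model significantly:", f_enter > F_CRITICAL)
```

A partial F above the critical value indicates that the drop in residual variation from 621.224 to 402.185 is larger than would be expected by chance, which is consistent with Del Boys being retained in the optimum model.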

2) Backward Step-wise Regression

Here we start with all the independent variables in the model and keep removing one variable at a time, least significant first, until the optimum model is reached.

Model Summary

| Model | R |
|---|---|
| 1 | .976 (a) |
| 2 | .976 (b) |
| 3 | .975 (c) |
| 4 | .971 (d) |
| 5 | .970 (e) |

a. Predictors: (Constant), No of Ext Customer, Comp. Int, Variants, Del Boys, Outlets, Ad cost
b. Predictors: (Constant), No of Ext Customer, Variants, Del Boys, Outlets, Ad cost
c. Predictors: (Constant), No of Ext Customer, Variants, Del Boys, Outlets
d. Predictors: (Constant), No of Ext Customer, Del Boys, Outlets
e. Predictors: (Constant), Del Boys, Outlets

ANOVA (a)

| Model | | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 6408.864 | 6 | 1068.144 | 27.254 | .000 (b) |
| | Residual | 313.536 | 8 | 39.192 | | |
| | Total | 6722.400 | 14 | | | |
| 2 | Regression | 6408.828 | 5 | 1281.766 | 36.789 | .000 (c) |
| | Residual | 313.572 | 9 | 34.841 | | |
| | Total | 6722.400 | 14 | | | |
| 3 | Regression | 6393.881 | 4 | 1598.470 | 48.657 | .000 (d) |
| | Residual | 328.519 | 10 | 32.852 | | |
| | Total | 6722.400 | 14 | | | |
| 4 | Regression | 6339.099 | 3 | 2113.033 | 60.640 | .000 (e) |
| | Residual | 383.301 | 11 | 34.846 | | |
| | Total | 6722.400 | 14 | | | |
| 5 | Regression | 6320.215 | 2 | 3160.108 | 94.288 | .000 (f) |
| | Residual | 402.185 | 12 | 33.515 | | |
| | Total | 6722.400 | 14 | | | |

a. Dependent Variable: Sales
b. Predictors: (Constant), No of Ext Customer, Comp. Int, Variants, Del Boys, Outlets, Ad cost
c. Predictors: (Constant), No of Ext Customer, Variants, Del Boys, Outlets, Ad cost
d. Predictors: (Constant), No of Ext Customer, Variants, Del Boys, Outlets
e. Predictors: (Constant), No of Ext Customer, Del Boys, Outlets
f. Predictors: (Constant), Del Boys, Outlets
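How little each removed variable contributed can be seen by computing a partial F statistic for every elimination step from the residual sums of squares in the backward ANOVA output. The sketch below is an illustration of that comparison under the usual partial F definition; it does not claim to reproduce the exact removal criterion SPSS applied, and the labels and variable names are chosen for this example.

```python
# (label, residual sum of squares, residual df) for backward models 1 to 5.
models = [
    ("Model 1: all six predictors", 313.536, 8),
    ("Model 2: Comp. Int removed", 313.572, 9),
    ("Model 3: Ad cost removed", 328.519, 10),
    ("Model 4: Variants removed", 383.301, 11),
    ("Model 5: No of Ext Customer removed", 402.185, 12),
]

removal_f = []
# Compare each model with the next smaller one in the elimination sequence.
for (_, ss_full, df_full), (label, ss_reduced, df_reduced) in zip(models, models[1:]):
    extra_ss = ss_reduced - ss_full   # variation no longer explained
    extra_df = df_reduced - df_full   # one predictor dropped per step
    f_remove = (extra_ss / extra_df) / (ss_full / df_full)
    removal_f.append(f_remove)
    print(f"{label:<38} partial F = {f_remove:.4f}")
```

All four partial F values are small (each below 2), meaning that dropping the corresponding variable barely increases the unexplained variation.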

The optimum regression model is again the one found using the forward step-wise regression technique (including only Outlets and Del Boys):

Y = -11.817 + 1.753X3 + 1.640X1, R² = 94.0%

Thus, we can say that only the variables Outlets and Del Boys have a significant effect on sales. Adding further variables improves the regression model only marginally (R² rises only from 94.0% with two variables to 95.3% with all six).

Recommendations: Using only two variables, Outlets and Del Boys, we can estimate sales to quite a large extent. The owner of Pizza Corner, Gurgaon can therefore do away with the remaining variables (Ad Cost, Variants, Comp. Int and No of Ext Customer), which would save time and money because less data would have to be collected.
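Applying the two-variable model is then a one-line calculation. The sketch below wraps the fitted equation in a hypothetical helper, `predict_sales`; the input values are made up for illustration, and the units of sales are whatever units the original data used, which are not stated here.

```python
def predict_sales(outlets, del_boys):
    """Estimate sales from the optimum two-variable regression model."""
    intercept = -11.817
    coef_outlets = 1.753    # coefficient of X3 (Outlets)
    coef_del_boys = 1.640   # coefficient of X1 (Del Boys)
    return intercept + coef_outlets * outlets + coef_del_boys * del_boys

# Hypothetical inputs, chosen only to show how the equation is used.
estimate = predict_sales(outlets=20, del_boys=10)
print(f"Estimated sales: {estimate:.3f}")
```

Predictions are most trustworthy for input values within the range covered by the 15 observations used to fit the model.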