
ANSWERS TO TEST EXERCISE 1

(a) Make the scatter diagram with sales on the vertical axis and advertising on the
horizontal axis. What do you expect to find if you would fit a regression line to these
data?

[Figure: scatter diagram with SALES on the vertical axis (scale 20 to 55) and ADVERTISING on the horizontal axis (scale 4 to 18).]

In the scatter diagram above, sales are plotted on the vertical axis and advertising on the
horizontal axis.
The diagram shows a positive, roughly linear relationship between sales and advertising.
There is, however, one outlier: a week with sales of 50 at an advertising level of just 6.
This observation is abnormal, and the reason for such unusually high sales should be
investigated.

If a regression line is fitted to these data, the outlier will distort the results of the
OLS regression.

The scatter diagram with the fitted regression line is shown below.


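[Figure: the same scatter diagram with the fitted regression line; SALES on the vertical axis (scale 20 to 55), ADVERTISING on the horizontal axis (scale 4 to 18).]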

More observations lie below the regression line than above it. The distance of the individual
observations from the line is small, except for the one observation identified as an outlier.

(b) Estimate the coefficients a and b in the simple regression model with sales as
dependent variable and advertising as explanatory factor. Also compute the
standard error and t-value of b. Is b significantly different from 0?

The following table shows the results of the simple regression model with sales
as the dependent variable and advertising as the explanatory factor.

Dependent Variable: SALES
Method: Least Squares
Date: 01/14/18   Time: 13:42
Sample: 1 20
Included observations: 20

Variable        Coefficient    Std. Error    t-Statistic    Prob.
ADVERTISING     -0.324575      0.458911      -0.707272      0.4885
C               29.62689       4.881527       6.069185      0.0000

R-squared             0.027039     Mean dependent var       26.30000
Adjusted R-squared   -0.027014     S.D. dependent var       5.759203
S.E. of regression    5.836474     Akaike info criterion    6.460770
Sum squared resid     613.1598     Schwarz criterion        6.560344
Log likelihood       -62.60770     Hannan-Quinn criter.     6.480208
F-statistic           0.500234     Durbin-Watson stat       1.993831
Prob(F-statistic)     0.488454

The estimated coefficients are a = 29.62689 and b = -0.324575. The standard error
of b is 0.458911 and its t-value is -0.707272, with a p-value of 0.4885. Since
p > 0.05, advertising does not have a significant effect on sales; in other words,
b is not significantly different from 0.

The R-squared is very low: only 2.7% of the variation in sales is explained by
the regression model. Moreover, the F-statistic is not significant (p = 0.4885),
which indicates that the model as a whole has no significant explanatory power.
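
The reported statistics can be cross-checked by hand. The sketch below uses only figures printed in the table above and assumes the standard textbook relations for a simple regression with an intercept: t = b / se(b), F = t squared, and adjusted R-squared = 1 - (1 - R-squared)(n - 1)/(n - 2). The variable names are illustrative.

```python
# Figures copied from the regression output for the full sample (20 weeks).
n = 20
b = -0.324575          # coefficient of ADVERTISING
se_b = 0.458911        # standard error of b
r2 = 0.027039          # R-squared

# t-value of b: the estimate divided by its standard error.
t_b = b / se_b

# With a single regressor, the F-statistic equals the squared t-value.
f_stat = t_b ** 2

# Adjusted R-squared corrects for the two estimated coefficients (a and b).
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)

print(f"t = {t_b:.4f}, F = {f_stat:.4f}, adjusted R2 = {adj_r2:.4f}")
```

Up to rounding, the three values agree with the t-Statistic, F-statistic and Adjusted R-squared entries of the table.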

(c) Compute the residuals and draw a histogram of these residuals. What
conclusion do you draw from this histogram?

The graph below is the histogram of the residuals from the regression model
estimated in part (b).

[Figure: histogram of the residuals (series RESID, sample 1 20, 20 observations); horizontal axis from -5 to 20, frequency axis from 0 to 8.]

Mean            4.33e-15
Median         -1.381144
Maximum         22.32056
Minimum        -4.679444
Std. Dev.       5.680807
Skewness        3.178012
Kurtosis        13.28260

Jarque-Bera     121.7758
Probability     0.000000

The histogram clearly shows that the residuals are not normally distributed. All
residuals lie between -5 and 5 except one, which is above 20 (the maximum residual is
22.32). This residual belongs to the observation identified earlier as the outlier.

The null hypothesis of the Jarque-Bera test is that the residuals are normally distributed.
As the p-value is less than 0.01, the null hypothesis is rejected, and we conclude that the
residuals are not normally distributed.
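
The Jarque-Bera statistic can be reproduced from the skewness and kurtosis printed next to the histogram. The following is a minimal sketch, assuming the usual formula JB = n/6 * (S^2 + (K - 3)^2 / 4) and the asymptotic chi-squared distribution with 2 degrees of freedom under the null; the function name `jarque_bera` is illustrative and not part of the original output.

```python
import math

def jarque_bera(n, skewness, kurtosis):
    """Return the Jarque-Bera statistic and its asymptotic p-value.

    Under the null of normality, JB is approximately chi-squared with
    2 degrees of freedom, whose upper-tail probability is exp(-x / 2).
    """
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)
    return jb, p_value

# Skewness and kurtosis as reported for the 20 residuals.
jb, p = jarque_bera(20, 3.178012, 13.28260)
print(f"JB = {jb:.4f}, p-value = {p:.6f}")
```

Both the large skewness and the kurtosis far above 3 contribute to the statistic, which is why a single extreme residual is enough to reject normality.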

(d) Apparently, the regression result of part (b) is not satisfactory. Once you
realize that the large residual corresponds to the week with opening hours
during the evening, how would you proceed to get a more satisfactory
regression model?

As observed in parts (b) and (c), the regression shows no significant effect of
advertising, and the residuals are not normally distributed. These unsatisfactory
results can be attributed to the outlier in the data, that is, the week with opening
hours during the evening. Hence, it is advisable to re-run the regression after
excluding this observation.

(e) Delete this special week from the sample and use the remaining 19 weeks to
estimate the coefficients a and b in the simple regression model with sales as
dependent variable and advertising as explanatory factor. Also compute the
standard error and t-value of b. Is b significantly different from 0?

The following table shows the results of the regression after deleting the outlier observation from the data
set.

Dependent Variable: SALES
Method: Least Squares
Date: 01/14/18   Time: 14:26
Sample: 1 19
Included observations: 19

Variable        Coefficient    Std. Error    t-Statistic    Prob.
ADVERTISING     0.375000       0.088196      4.251873       0.0005
C               21.12500       0.954848      22.12394       0.0000

R-squared             0.515372     Mean dependent var       25.05263
Adjusted R-squared    0.486864     S.D. dependent var       1.470967
S.E. of regression    1.053705     Akaike info criterion    3.041803
Sum squared resid     18.87500     Schwarz criterion        3.141217
Log likelihood       -26.89713     Hannan-Quinn criter.     3.058628
F-statistic           18.07842     Durbin-Watson stat       1.749172
Prob(F-statistic)     0.000538

The regression results reveal that advertising has a significant positive effect on sales.
The estimated coefficients are a = 21.12500 and b = 0.375000; the standard error of b is
0.088196 and its t-value is 4.251873. As the p-value of b (0.0005) is less than 0.05, b is
significantly different from 0.
The R-squared is 0.515 (adjusted R-squared 0.487), so about half of the variation in sales is
explained by the regression model. The F-statistic is also significant (p < 0.05), which
indicates that the model as a whole has significant explanatory power.
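
The significance of b can also be read from an interval estimate. The sketch below builds an approximate 95% interval for b from the figures in the table above. The critical value 2.110 is an assumption taken from standard Student t tables (two-sided 5% level, 17 degrees of freedom); it does not appear in the regression output.

```python
# Figures copied from the regression output for the 19-week sample.
b = 0.375000
se_b = 0.088196
df = 19 - 2            # observations minus the two estimated coefficients

# Assumption: two-sided 5% critical value of Student's t with 17 degrees
# of freedom, taken from standard t-tables (approximately 2.110).
t_crit = 2.110

lower = b - t_crit * se_b
upper = b + t_crit * se_b
print(f"95% interval for b ({df} df): [{lower:.3f}, {upper:.3f}]")

# b is significant at the 5% level exactly when 0 lies outside the interval.
significant = not (lower <= 0.0 <= upper)
```

Because the whole interval lies above zero, the conclusion matches the p-value reported in the table.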

Diagnosis of the residuals


[Figure: histogram of the residuals (series RESID, sample 1 19, 19 observations); horizontal axis from -2.5 to 2.0, frequency axis from 0 to 6.]

Mean            3.15e-15
Median          3.55e-15
Maximum         1.750000
Minimum        -2.250000
Std. Dev.       1.024017
Skewness       -0.251657
Kurtosis        2.933101

Jarque-Bera     0.204092
Probability     0.902988

The histogram shows no strong concentration of the residuals in any single range; they are
spread over the whole range from -2.25 to 1.75. Moreover, the null hypothesis of the Jarque-Bera
test, that the residuals are normally distributed, is not rejected (p = 0.903 > 0.05). So one of the
important assumptions of the classical linear regression model (CLRM), normality of the error
terms, is consistent with the data for this model.
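
Note that the "Std. Dev." of the residuals (1.024017) is slightly smaller than the "S.E. of regression" in the estimation output (1.053705), although both are computed from the same residuals. A short check, assuming the usual definitions (the sample standard deviation divides by n - 1, the regression standard error divides by n - 2 because a and b are estimated), reproduces both numbers from the sum of squared residuals:

```python
import math

n = 19
ssr = 18.87500   # "Sum squared resid" from the 19-week regression output

# The residuals have mean zero, so their sample variance is SSR / (n - 1).
resid_std = math.sqrt(ssr / (n - 1))

# The regression standard error divides by the degrees of freedom n - 2.
se_regression = math.sqrt(ssr / (n - 2))

print(f"Std. Dev. of residuals = {resid_std:.6f}")
print(f"S.E. of regression     = {se_regression:.6f}")
```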

(f) Discuss the differences between your findings in parts (b) and (e). Describe in
words what you have learned from these results.

The t-statistics of both advertising and the constant improve remarkably once the outlier is
removed. For advertising, the estimate b changes sign, from -0.324575 to 0.375000, while its
standard error falls sharply, from 0.458911 to 0.088196. The outlier thus inflates the standard
error; a larger standard error lowers the t-statistic, and a low t-statistic makes the explanatory
variable appear insignificant.

In addition, R-squared rises from 0.027 to 0.515 and adjusted R-squared from -0.027 to 0.487 when
the outlier is deleted, and the F-statistic becomes significant.

From this discussion, it can be concluded that a single outlier can distort the results of a
regression model, both by shifting the coefficient estimates and by inflating the standard errors.
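
The effect can be illustrated with a small example. The following sketch fits a simple regression with and without one extreme observation. The data are HYPOTHETICAL, invented purely for illustration, and are not the data of this exercise; the function name `simple_ols` is likewise an assumption. It uses the standard least-squares formulas b = Sxy / Sxx, a = mean(y) - b * mean(x) and se(b) = sqrt(s^2 / Sxx) with s^2 = SSR / (n - 2).

```python
import math

def simple_ols(x, y):
    """Fit y = a + b*x by least squares; return a, b, se(b) and t(b)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    ssr = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    s2 = ssr / (n - 2)              # estimated error variance
    se_b = math.sqrt(s2 / sxx)      # standard error of the slope
    return {"a": a, "b": b, "se_b": se_b, "t_b": b / se_b}

# HYPOTHETICAL data, invented for illustration only; NOT the exercise data.
advertising = [6, 8, 10, 12, 14, 16, 6]
sales = [23, 25, 24, 26, 27, 28, 50]   # the last week is the outlier

with_outlier = simple_ols(advertising, sales)
without_outlier = simple_ols(advertising[:-1], sales[:-1])

for label, fit in (("with outlier", with_outlier),
                   ("without outlier", without_outlier)):
    print(f"{label}: b = {fit['b']:.3f}, "
          f"se(b) = {fit['se_b']:.3f}, t = {fit['t_b']:.2f}")
```

In this made-up sample the pattern mirrors parts (b) and (e): with the outlier the slope is negative and insignificant, and without it the slope is positive with a much smaller standard error.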
