
BUSI3007

BUSINESS RESEARCH METHODS

LECTURE 10: CORRELATION AND LINEAR REGRESSION
REFERENCE

Statistical Techniques in Business and Economics, Lind, Marchal and Wathen, 16th edition

Chapter 13 (pg. 427 - 454)

OVERVIEW
Correlation Analysis
Dependent and Independent Variables
The Correlation Coefficient
Testing the Significance of the Correlation Coefficient
Regression Analysis
Least Squares Principle
Testing the Significance of the Slope
The Standard Error of Estimate
The Coefficient of Determination
ANOVA Table in Regression Analysis

CORRELATION ANALYSIS
Correlation analysis refers to a group of techniques used to measure the relationship between two variables.
Scatter diagram
Correlation coefficient

DEPENDENT AND INDEPENDENT VARIABLES
The Dependent Variable is the variable being predicted or estimated.

The Independent Variable provides the basis for estimation. It is the predictor variable.

CORRELATION ANALYSIS
The sales manager of Copier Sales of America has a large sales force throughout the United States and Canada and wants to determine whether there is a relationship between the number of sales calls made in a month and the number of copiers sold that month. The manager selects a random sample of 15 representatives and determines the number of sales calls each representative made last month and the number of copiers sold. Determine if the number of sales calls and copiers sold are correlated.

CORRELATION ANALYSIS
To report the relationship between the two variables, the usual first step is to plot the data in a scatter diagram.
We refer to the number of sales calls as the independent variable and the number of copiers sold as the dependent variable.

There appears to be a positive relationship between the two variables.

THE CORRELATION COEFFICIENT
The Coefficient of Correlation (r) is a measure of the strength of the linear relationship between two variables.
The sample correlation coefficient is identified by the lowercase letter r.
It shows the direction and strength of the linear relationship.
It ranges from -1 to +1, inclusive.
A value near 0 indicates there is little linear relationship between the variables.
A value near +1 indicates a direct or positive linear relationship between the variables.
A value near -1 indicates an inverse or negative linear relationship between the variables.

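THE CORRELATION COEFFICIENT

Correlation Coefficient:

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y}$$

where x̄ and ȳ are the sample means, s_x and s_y are the sample standard deviations, and n is the sample size.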

THE CORRELATION COEFFICIENT
Using the Copier Sales of America data, compute the correlation coefficient.

Let the number of sales calls be x and the number of copiers sold be y.
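The calculation can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the data are the 15 observations tabulated in this lecture's standard error example, and the function names `sample_std` and `correlation` are hypothetical helpers chosen for this sketch.

```python
import math

# Copier Sales of America sample (15 representatives), as tabulated in this lecture.
sales_calls = [96, 40, 104, 128, 164, 76, 72, 80, 36, 84, 180, 132, 120, 44, 84]
copiers_sold = [41, 41, 51, 60, 61, 29, 39, 50, 28, 43, 70, 56, 45, 31, 30]


def sample_std(values):
    """Sample standard deviation (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def correlation(x, y):
    """Sample correlation: sum of cross-deviations over (n - 1) * s_x * s_y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return cross / ((n - 1) * sample_std(x) * sample_std(y))


r = correlation(sales_calls, copiers_sold)
print(f"r = {r:.3f}")
```

Rounded to three decimals, the result agrees with the 0.865 quoted in the lecture.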

THE CORRELATION COEFFICIENT

What does a correlation of 0.865 mean?

There is a positive relationship between the number of sales calls and the number of copiers sold. The value of 0.865 is fairly close to 1.00, so we conclude that the association is strong.

TESTING THE SIGNIFICANCE OF THE CORRELATION COEFFICIENT
H0: ρ = 0 (the correlation in the population is 0)
H1: ρ ≠ 0 (the correlation in the population is not 0)

t test for the correlation coefficient:

$$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$$

with n - 2 degrees of freedom

TESTING THE SIGNIFICANCE OF THE CORRELATION COEFFICIENT
Using the copier sales example, can we conclude that the correlation in the population is different from 0? Use a 0.05 significance level.

Step 1: State the null and alternate hypotheses.

H0: ρ = 0 (the correlation in the population is 0)
H1: ρ ≠ 0 (the correlation in the population is not 0)

Step 2: Select the level of significance.

The 0.05 significance level is selected.

TESTING THE SIGNIFICANCE OF THE CORRELATION COEFFICIENT
Step 3: Determine the appropriate test statistic.
The test statistic follows the t distribution with n - 2 degrees of freedom.

Step 4: Formulate a decision rule.

df = n - 2 = 15 - 2 = 13
Reject H0 if t > 2.160 or t < -2.160

TESTING THE SIGNIFICANCE OF THE CORRELATION COEFFICIENT
Step 5: Compute the value of t and make a decision.

$$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} = \frac{0.865\sqrt{15 - 2}}{\sqrt{1 - 0.865^2}} = 6.216$$

The t-test statistic, 6.216, is greater than 2.160. Therefore, reject the null hypothesis that the population correlation is zero.
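As a check on Step 5, the test statistic can be recomputed with a short Python sketch. The function name `correlation_t_statistic` is a hypothetical helper; the critical value 2.160 is taken from the decision rule in the lecture rather than computed.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r = 0.865           # sample correlation from the lecture
n = 15              # number of sales representatives sampled
critical_t = 2.160  # two-tailed critical value, df = 13, alpha = 0.05 (from the lecture)

t = correlation_t_statistic(r, n)
reject_h0 = abs(t) > critical_t  # two-tailed decision rule
print(f"t = {t:.3f}, reject H0: {reject_h0}")
```

The computed statistic is about 6.216, well beyond the critical value, so the two-tailed rule rejects H0.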

TESTING THE SIGNIFICANCE OF THE CORRELATION COEFFICIENT
Step 6: Interpret the result.
The data indicate that there is a significant correlation between the number of sales calls and copiers sold. We can also observe that the correlation coefficient is 0.865, which indicates a strong, positive relationship. In other words, more sales calls are strongly related to more copier sales. Please note that this statistical analysis does not provide any evidence of a causal relationship. Another type of study is needed to test that hypothesis.

REGRESSION ANALYSIS
In regression analysis we use the independent variable (x) to estimate the dependent variable (y).
The relationship between the variables is linear.
The least squares criterion is used to determine the equation.

REGRESSION EQUATION: An equation that expresses the linear relationship between two variables.

REGRESSION ANALYSIS
General Form of Linear Regression Equation:

$$\hat{y} = a + bx$$

where
ŷ is the estimated value of the y variable for a selected x value.
a is the y-intercept. It is the estimated value of y when x = 0.
b is the slope of the line, or the average change in ŷ for each change of one unit in the independent variable x.
x is any value of the independent variable that is selected.

REGRESSION ANALYSIS
In regression analysis, our objective is to use the data to position a line that best represents the relationship between the two variables.

LEAST SQUARES PRINCIPLE
The least squares principle is used to obtain a and b.

LEAST SQUARES PRINCIPLE: Determining a regression equation by minimizing the sum of the squares of the vertical distances between the actual y values and the predicted values of y.
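To make the principle concrete, the sketch below fits a line to the copier data and compares its sum of squared vertical distances with that of a few nearby lines. It is an illustration under stated assumptions: the slope is computed with the textbook-standard closed form (sum of cross-deviations over sum of squared x deviations), and the helper names and the particular alternative lines are hypothetical choices for this example.

```python
# Copier Sales of America sample (15 representatives), as tabulated in this lecture.
sales_calls = [96, 40, 104, 128, 164, 76, 72, 80, 36, 84, 180, 132, 120, 44, 84]
copiers_sold = [41, 41, 51, 60, 61, 29, 39, 50, 28, 43, 70, 56, 45, 31, 30]


def sum_squared_errors(a, b, x, y):
    """Sum of squared vertical distances between y and the line a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


def least_squares_line(x, y):
    """Return (a, b) minimizing the sum of squared errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


a, b = least_squares_line(sales_calls, copiers_sold)
best_sse = sum_squared_errors(a, b, sales_calls, copiers_sold)
print(f"least squares line: a = {a:.4f}, b = {b:.4f}, SSE = {best_sse:.2f}")

# Shifting the intercept or tilting the slope always increases the SSE.
alternatives = [(a + 1.0, b), (a - 1.0, b), (a, b + 0.01), (a, b - 0.01)]
alternative_sses = [
    sum_squared_errors(alt_a, alt_b, sales_calls, copiers_sold)
    for alt_a, alt_b in alternatives
]
for (alt_a, alt_b), sse in zip(alternatives, alternative_sses):
    print(f"a = {alt_a:.4f}, b = {alt_b:.4f}, SSE = {sse:.2f}")
```

Every alternative line produces a larger sum of squared errors than the fitted line, which is what the least squares principle requires.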

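COMPUTING THE SLOPE OF THE LINE AND THE Y-INTERCEPT

Slope of the regression line:

$$b = r \frac{s_y}{s_x}$$

Y-intercept:

$$a = \bar{y} - b\bar{x}$$

where r is the correlation coefficient, s_y and s_x are the sample standard deviations of y and x, and ȳ and x̄ are the sample means.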

REGRESSION EQUATION
Recall the example involving Copier Sales of America. The sales manager gathered information on the number of sales calls made and the number of copiers sold for a random sample of 15 sales representatives. Use the least squares method to determine a linear equation to express the relationship between the two variables.
What is the expected number of copiers sold by a representative who made 100 calls?

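REGRESSION EQUATION
Step 1: Find the slope (b) of the line.

$$b = r \frac{s_y}{s_x} = 0.865 \left(\frac{12.89}{42.76}\right) = 0.2608$$

Step 2: Find the y-intercept (a).

$$a = \bar{y} - b\bar{x} = 45 - 0.2608(96) = 19.9632$$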

REGRESSION EQUATION
Hence the regression equation is

$$\hat{y} = 19.9632 + 0.2608x$$

When x = 100,

$$\hat{y} = 19.9632 + 0.2608(100) = 46.0432$$

Hence if a salesperson makes 100 calls, he or she can expect to sell 46.0432 copiers, or about 46.
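The slope, intercept, and prediction can be reproduced with a short Python sketch that uses only the standard library. It assumes the slope is obtained as r times s_y over s_x; `predict` is a hypothetical helper. Because the code carries full precision, its results (slope about 0.2606, intercept about 19.98) differ slightly from the lecture's values, which appear to come from rounded intermediate figures.

```python
import statistics

# Copier Sales of America sample (15 representatives), as tabulated in this lecture.
sales_calls = [96, 40, 104, 128, 164, 76, 72, 80, 36, 84, 180, 132, 120, 44, 84]
copiers_sold = [41, 41, 51, 60, 61, 29, 39, 50, 28, 43, 70, 56, 45, 31, 30]

n = len(sales_calls)
x_bar = statistics.mean(sales_calls)
y_bar = statistics.mean(copiers_sold)
s_x = statistics.stdev(sales_calls)   # sample standard deviation of x
s_y = statistics.stdev(copiers_sold)  # sample standard deviation of y

# Sample correlation coefficient.
cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(sales_calls, copiers_sold))
r = cross / ((n - 1) * s_x * s_y)

b = r * s_y / s_x      # slope
a = y_bar - b * x_bar  # y-intercept


def predict(calls):
    """Estimated copiers sold for a given number of sales calls."""
    return a + b * calls


print(f"slope b = {b:.4f}, intercept a = {a:.4f}")
print(f"predicted copiers for 100 calls = {predict(100):.2f}")
```

The prediction for 100 calls still comes out at roughly 46 copiers, matching the lecture's conclusion.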

TESTING THE SIGNIFICANCE OF THE SLOPE
H0: β = 0 (the slope of the linear model is 0)
H1: β ≠ 0 (the slope of the linear model is not 0)

t test for the slope:

$$t = \frac{b - 0}{s_b}$$

with n - 2 degrees of freedom

b is the estimate of the regression line's slope calculated from the sample information
s_b is the standard error of the slope estimate

TESTING THE SIGNIFICANCE OF THE SLOPE
Using the previous result of the copier sales example, and assuming the standard error of the slope is 0.042, can we conclude that the slope of the regression line is greater than zero at the 0.05 significance level?

TESTING THE SIGNIFICANCE OF THE SLOPE
Step 1: State the null and alternate hypotheses.
H0: β ≤ 0
H1: β > 0

Step 2: Select the level of significance.

The 0.05 significance level is selected.

Step 3: Determine the appropriate test statistic.

The test statistic follows the t distribution with n - 2 degrees of freedom.

TESTING THE SIGNIFICANCE OF THE SLOPE
Step 4: Formulate a decision rule.
df = n - 2 = 15 - 2 = 13
Reject H0 if t > 1.771

Step 5: Compute the value of t and make a decision.

$$t = \frac{b - 0}{s_b} = \frac{0.2608 - 0}{0.042} = 6.210$$

Since t = 6.210 > 1.771, we can reject H0.
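The lecture takes the standard error of the slope as given. The sketch below shows one way it could be obtained from the data, assuming the standard simple-regression formula (standard error of estimate divided by the square root of the sum of squared x deviations), which the slides do not state. Variable names are illustrative, and the critical value 1.771 is copied from the decision rule above rather than computed.

```python
import math

# Copier Sales of America sample (15 representatives), as tabulated in this lecture.
sales_calls = [96, 40, 104, 128, 164, 76, 72, 80, 36, 84, 180, 132, 120, 44, 84]
copiers_sold = [41, 41, 51, 60, 61, 29, 39, 50, 28, 43, 70, 56, 45, 31, 30]

n = len(sales_calls)
x_bar = sum(sales_calls) / n
y_bar = sum(copiers_sold) / n

# Least squares slope and intercept.
s_xx = sum((x - x_bar) ** 2 for x in sales_calls)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sales_calls, copiers_sold))
b = s_xy / s_xx
a = y_bar - b * x_bar

# Standard error of estimate, then standard error of the slope (assumed formula).
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(sales_calls, copiers_sold))
s_yx = math.sqrt(sse / (n - 2))
s_b = s_yx / math.sqrt(s_xx)

t = (b - 0) / s_b
critical_t = 1.771  # one-tailed critical value, df = 13, alpha = 0.05 (from the lecture)
reject_h0 = t > critical_t
print(f"s_b = {s_b:.3f}, t = {t:.3f}, reject H0: {reject_h0}")
```

With unrounded inputs the standard error is about 0.042 and t is about 6.2, so the decision to reject H0 is unchanged.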

TESTING THE SIGNIFICANCE OF THE SLOPE
Step 6: Interpret the result.
Based on the sample evidence, we can conclude that the slope of the regression line is greater than zero. The independent variable, number of sales calls, is useful in estimating copier sales.

THE STANDARD ERROR OF ESTIMATE
The standard error of estimate measures the scatter, or dispersion, of the observed values around the line of regression for a given value of x.
Formula used to compute the standard error:

$$s_{y \cdot x} = \sqrt{\frac{\sum (y - \hat{y})^2}{n - 2}}$$

THE STANDARD ERROR OF ESTIMATE
Recall the example involving Copier Sales of America. The sales manager determined the least squares regression equation to be

$$\hat{y} = 19.9632 + 0.2608x$$

Determine the standard error of estimate as a measure of how well the values fit the regression line.

Sales Calls (x)   Copiers Sold (y)   ŷ         (y - ŷ)²

 96               41                 45.000     16.000
 40               41                 30.395    112.462
104               51                 47.086     15.316
128               60                 53.346     44.281
164               61                 62.734      3.008
 76               29                 39.784    116.295
 72               39                 38.741      0.067
 80               50                 40.827     84.140
 36               28                 29.352      1.828
 84               43                 41.870      1.276
180               70                 66.907      9.565
132               56                 54.389      2.596
120               45                 51.259     39.178
 44               31                 31.438      0.192
 84               30                 41.870    140.906

Total                                          587.111

THE STANDARD ERROR OF ESTIMATE
The standard error of estimate is computed as:

$$s_{y \cdot x} = \sqrt{\frac{\sum (y - \hat{y})^2}{n - 2}} = \sqrt{\frac{587.111}{15 - 2}} = 6.720$$
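The residual column of the table and the resulting standard error can be reproduced with the following Python sketch. It assumes the rounded coefficients 19.9632 and 0.2608, which are consistent with the fitted values in the table; the function name `standard_error_of_estimate` is a hypothetical helper.

```python
import math

# Copier Sales of America sample (15 representatives), as tabulated in this lecture.
sales_calls = [96, 40, 104, 128, 164, 76, 72, 80, 36, 84, 180, 132, 120, 44, 84]
copiers_sold = [41, 41, 51, 60, 61, 29, 39, 50, 28, 43, 70, 56, 45, 31, 30]

# Rounded regression coefficients (assumed; consistent with the table's fitted values).
intercept = 19.9632
slope = 0.2608


def standard_error_of_estimate(x, y, a, b):
    """Square root of the sum of squared residuals divided by n - 2."""
    residual_ss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(residual_ss / (len(x) - 2))


fitted = [intercept + slope * x for x in sales_calls]
squared_residuals = [(y - y_hat) ** 2 for y, y_hat in zip(copiers_sold, fitted)]
s_yx = standard_error_of_estimate(sales_calls, copiers_sold, intercept, slope)

print(f"sum of squared residuals = {sum(squared_residuals):.3f}")
print(f"standard error of estimate = {s_yx:.3f}")
```

The sum of squared residuals comes out at about 587.11 and the standard error at about 6.720, in line with the table total.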

COEFFICIENT OF DETERMINATION
The coefficient of determination (r²) is the proportion of the total variation in the dependent variable (y) that is explained or accounted for by the variation in the independent variable (x). It is the square of the coefficient of correlation.

COEFFICIENT OF DETERMINATION
Determine the coefficient of determination for the Copier Sales of America example.

r = 0.865,
the coefficient of determination is r² = (0.865)² = 0.748

This is a proportion or a percent; we can say that 74.8 percent of the variation in the number of copiers sold is explained, or accounted for, by the variation in the number of sales calls.

ANOVA TABLE IN REGRESSION ANALYSIS
Regression analysis is usually conducted using regression software, and the output includes an ANOVA table.

ANOVA TABLE IN REGRESSION ANALYSIS
Regression Sum of Squares = SSR = Σ(ŷ - ȳ)² = 1738.89
Residual or Error Sum of Squares = SSE = Σ(y - ŷ)² = 587.11
Total Sum of Squares = SS Total = Σ(y - ȳ)² = 2326.00
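These three sums of squares can be checked directly from the data. The sketch below is a minimal illustration that assumes an unrounded least squares fit; it also shows that SSR divided by SS Total gives the coefficient of determination. Variable names are illustrative.

```python
# Copier Sales of America sample (15 representatives), as tabulated in this lecture.
sales_calls = [96, 40, 104, 128, 164, 76, 72, 80, 36, 84, 180, 132, 120, 44, 84]
copiers_sold = [41, 41, 51, 60, 61, 29, 39, 50, 28, 43, 70, 56, 45, 31, 30]

n = len(sales_calls)
x_bar = sum(sales_calls) / n
y_bar = sum(copiers_sold) / n

# Unrounded least squares fit.
s_xx = sum((x - x_bar) ** 2 for x in sales_calls)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sales_calls, copiers_sold))
b = s_xy / s_xx
a = y_bar - b * x_bar
fitted = [a + b * x for x in sales_calls]

# ANOVA decomposition: SS Total = SSR + SSE.
ssr = sum((y_hat - y_bar) ** 2 for y_hat in fitted)
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(copiers_sold, fitted))
ss_total = sum((y - y_bar) ** 2 for y in copiers_sold)

r_squared = ssr / ss_total
print(f"SSR = {ssr:.2f}, SSE = {sse:.2f}, SS Total = {ss_total:.2f}")
print(f"r squared = {r_squared:.3f}")
```

The decomposition reproduces the figures above, and the ratio of about 0.748 agrees with the coefficient of determination found earlier from r.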
