Regression and Correlation

Correlation Coefficient
Testing the Correlation Coefficient
Simple Linear Regression

Correlation Coefficient
The linear correlation coefficient, denoted by $\rho$, is a measure of the strength of the
linear relationship existing between two variables, say X and Y, that is independent
of their respective scales of measurement.

Correlation Coefficient
To visualize the possible underlying linear relationship between X and Y, we can plot
individual pairs of observations on a two-dimensional graph called a scatter
diagram.

Correlation Coefficient
Properties:
A linear correlation coefficient can only assume values between -1 and 1,
inclusive of endpoints.
The sign of $\rho$ describes the direction of the linear relationship.
A positive value means that the line slopes upward to the right, and so as X
increases, the value of Y increases.
A negative value means that the line slopes downward to the right, and so as X
increases, the value of Y decreases.

If $\rho = 0$, then there is no linear correlation between X and Y. However, this does
not mean a lack of association.
It is possible to obtain a zero correlation even if X and Y are related, when their
relationship is nonlinear, such as a quadratic relationship.
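
The last property can be illustrated with a small made-up data set. The sketch below is not from the slides: the helper `pearson_r` and the five points are hypothetical, and the function uses the usual computational formula for the sample correlation coefficient r. Y is fully determined by X (Y = X^2), yet the computed correlation is zero.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient via the computational formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    sum_y2 = sum(b * b for b in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical data: symmetric X values with a purely quadratic relationship.
xs = [-2, -1, 0, 1, 2]
ys = [a ** 2 for a in xs]

r_quadratic = pearson_r(xs, ys)
print(r_quadratic)  # 0.0
```

The zero arises because the X values are symmetric about 0, so the upward and downward halves of the parabola cancel in the cross-product sum.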

Correlation Coefficient
Properties:
When $\rho$ is -1 or 1, there is a perfect linear relationship between X and Y and all
the points (x, y) fall on a straight line. A $\rho$ that is close to -1 or 1 indicates a strong
linear relationship.
A strong linear relationship does not necessarily imply that X causes Y or Y causes
X. It is possible that a third variable may have caused the change in both X and Y,
producing the observed relationship.

Correlation Coefficient
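A point estimator of $\rho$ is the Pearson product moment correlation coefficient.
The Pearson product moment correlation coefficient between X and Y, denoted by
r, is defined as:

$$
r = \frac{n\sum_{i=1}^{n} X_i Y_i - \left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}
{\sqrt{\left(n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2\right)\left(n\sum_{i=1}^{n} Y_i^2 - \left(\sum_{i=1}^{n} Y_i\right)^2\right)}}
$$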

Example: Hypothetical Data


x    1     2     2     2     3     3     3     3     4     4
y    3.5   4.5   4     3.5   6.5   8     6     7     7.9   7

x    5     6     6     6     7     7     7     8     8     8
y    9.4   9.3   11    10.5  12.4  11.5  10    15    11    13.7

Answer: r = 0.9511
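
The reported value can be checked with a short script. The following is a minimal sketch in Python (not part of the original slides; the variable names are illustrative). It assumes the data set consists of 20 (x, y) pairs, including the pair (3, 7), and applies the computational formula for r directly.

```python
from math import sqrt

# Hypothetical data from the example, read as 20 (x, y) pairs.
x = [1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8]
y = [3.5, 4.5, 4, 3.5, 6.5, 8, 6, 7, 7.9, 7,
     9.4, 9.3, 11, 10.5, 12.4, 11.5, 10, 15, 11, 13.7]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

# r = [n*Sxy - Sx*Sy] / sqrt([n*Sxx - Sx^2] * [n*Syy - Sy^2])
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(round(r, 4))  # 0.9511
```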

Testing the Correlation Coefficient


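Tests of Hypotheses for $\rho$

Ho: $\rho = \rho_o$

Test Statistic:

$$
t = \frac{(r - \rho_o)\sqrt{n-2}}{\sqrt{1 - r^2}}
$$

Ha                     Region of Rejection
$\rho < \rho_o$        $t < -t_{\alpha}(v = n-2)$
$\rho > \rho_o$        $t > t_{\alpha}(v = n-2)$
$\rho \neq \rho_o$     $|t| > t_{\alpha/2}(v = n-2)$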

Example: Use the hypothetical data. Suppose that the linear correlation coefficient
between X and Y in the past is 0.90. We want to determine if the correlation has
significantly increased compared to the past. Test at 5% level of significance.
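
One way to carry out this example is sketched below in Python. It is an illustration, not the slides' own solution. It assumes n = 20 pairs and r = 0.9511 for the hypothetical data, the right-tailed alternative Ha: $\rho > 0.90$ (since the question asks about an increase), and a critical value of 1.734 for $t_{0.05}$ with v = 18 read from a standard t table.

```python
from math import sqrt

r = 0.9511      # sample correlation of the hypothetical data
rho_o = 0.90    # correlation in the past (hypothesized value)
n = 20          # assumed number of (x, y) pairs
alpha = 0.05    # level of significance

# Test statistic: t = (r - rho_o) * sqrt(n - 2) / sqrt(1 - r^2)
t_stat = (r - rho_o) * sqrt(n - 2) / sqrt(1 - r ** 2)

# Assumption: t_{0.05}(v = 18) taken from a standard t table.
t_crit = 1.734

# Right-tailed test: reject Ho when t > t_alpha(v = n - 2).
reject_ho = t_stat > t_crit

print(round(t_stat, 2), reject_ho)
```

Under these assumptions the statistic is roughly 0.70, which is below the critical value, so Ho is not rejected at the 5% level: the data do not show a significant increase over 0.90.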

Simple Linear Regression


The simple linear regression model is given by the equation

$$
Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i
$$

where
$Y_i$ is the value of the response variable for the ith element
$X_i$ is the value of the explanatory variable for the ith element
$\beta_0$ is a regression coefficient that gives the Y-intercept of the regression line
$\beta_1$ is a regression coefficient that gives the slope of the line
$\varepsilon_i$ is the random error term for the ith element, where the $\varepsilon_i$'s are independent,
normally distributed with mean 0 and variance $\sigma^2$ for i = 1, 2, ..., n
n is the number of elements

Estimation using the Method of Least Squares


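Formulas for $b_0$ (estimate for $\beta_0$) and $b_1$ (estimate for $\beta_1$):

$$
b_1 = \frac{n\sum_{i=1}^{n} X_i Y_i - \left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}
{n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}
$$

$$
b_0 = \bar{Y} - b_1 \bar{X}
$$

Thus, the estimated regression equation is given by $\hat{Y} = b_0 + b_1 X$.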

Estimation using the Method of Least Squares


Example: Use the hypothetical data.
Answer: $\hat{Y} = 2.016 + 1.383X$
We can see that as X increases by one unit, the mean of the response variable, Y, is
estimated to increase by 1.383.
The intercept 2.016 has no meaningful interpretation because X = 0 is not within the range of
values we used in estimation.
To obtain predicted values, substitute x = 4 and x = 5 into the estimated equation.
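
The least squares estimates and the two substitutions can be reproduced with the sketch below. It is an illustrative Python script, not part of the slides; `predict` is a hypothetical helper, and the data are assumed to be the 20 (x, y) pairs of the hypothetical data set, including the pair (3, 7).

```python
# Hypothetical data from the example, read as 20 (x, y) pairs.
x = [1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8]
y = [3.5, 4.5, 4, 3.5, 6.5, 8, 6, 7, 7.9, 7,
     9.4, 9.3, 11, 10.5, 12.4, 11.5, 10, 15, 11, 13.7]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)

# Slope: b1 = [n*Sxy - Sx*Sy] / [n*Sxx - Sx^2]
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
# Intercept: b0 = Ybar - b1 * Xbar
b0 = sum_y / n - b1 * sum_x / n

def predict(x_new):
    """Estimated mean response at x_new from the fitted line."""
    return b0 + b1 * x_new

print(round(b0, 3), round(b1, 3))                  # 2.016 1.383
print(round(predict(4), 3), round(predict(5), 3))  # 7.548 8.931
```

Both x = 4 and x = 5 lie inside the observed range of X (1 to 8), so these predictions do not involve extrapolation.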

Estimation using the Method of Least Squares


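A $100(1-\alpha)\%$ CI estimate for $\beta_1$ is

$$
\left(b_1 - t_{\alpha/2}(v = n-2)\,S_{b_1},\; b_1 + t_{\alpha/2}(v = n-2)\,S_{b_1}\right)
$$

A $100(1-\alpha)\%$ CI estimate for $\beta_0$ is

$$
\left(b_0 - t_{\alpha/2}(v = n-2)\,S_{b_0},\; b_0 + t_{\alpha/2}(v = n-2)\,S_{b_0}\right)
$$

To determine whether or not there is a significant linear relationship between Y and X,
we test Ho: $\beta_1 = 0$ against Ha: $\beta_1 \neq 0$.
To assess whether $\beta_0$ is different from 0, we test Ho: $\beta_0 = 0$ against Ha: $\beta_0 \neq 0$.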
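
As an illustration, the sketch below computes a 95% CI for the slope and the t statistic for Ho: $\beta_1 = 0$ on the hypothetical data. It rests on assumptions that the slides do not state: the standard error of the slope is taken as sqrt(MSE / Sxx), with MSE = SSE / (n - 2) (the usual textbook formula), the critical value $t_{0.025}$ with v = 18 is taken as 2.101 from a standard t table, and the data are the 20 (x, y) pairs including (3, 7). All variable names are illustrative.

```python
from math import sqrt

# Hypothetical data from the example, read as 20 (x, y) pairs.
x = [1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8]
y = [3.5, 4.5, 4, 3.5, 6.5, 8, 6, 7, 7.9, 7,
     9.4, 9.3, 11, 10.5, 12.4, 11.5, 10, 15, 11, 13.7]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((a - x_bar) ** 2 for a in x)
sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))

# Least squares estimates (deviation form, equivalent to the sum formulas).
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residual sum of squares and mean square error with n - 2 degrees of freedom.
sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
mse = sse / (n - 2)

# Assumption: standard error of the slope, S_b1 = sqrt(MSE / Sxx).
s_b1 = sqrt(mse / sxx)

# Assumption: t_{0.025}(v = 18) from a standard t table.
t_crit = 2.101

lower = b1 - t_crit * s_b1
upper = b1 + t_crit * s_b1
t_stat = b1 / s_b1  # test statistic for Ho: beta_1 = 0

print(round(lower, 2), round(upper, 2), t_stat > t_crit)
```

Under these assumptions the interval is roughly (1.16, 1.61). It excludes 0, which agrees with rejecting Ho: $\beta_1 = 0$ in the two-sided test at the 5% level.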

Estimation using the Method of Least Squares


The coefficient of determination, denoted by $R^2$, is defined as the proportion of
the variability in the observed values of the response variable that can be explained
by the explanatory variable through their linear relationship.
In simple linear regression $R^2 = r^2$, so $R^2$ will be between 0 and 1 because $-1 \le r \le 1$.
If a model has perfect predictability, then $R^2 = 1$. If a model has no predictive capability, then $R^2 = 0$.
Example: Use the hypothetical data.
$R^2 = (0.9511)^2 = 0.9046$
90.46% of the variability in Y can be explained by X through the simple linear regression model (SLRM).
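
The "proportion of variability explained" reading can be checked directly. The sketch below is illustrative and not from the slides. It computes $R^2$ as 1 - SSE/SST, where SST is the total sum of squares of Y about its mean and SSE is the residual sum of squares of the fitted line (both names are introduced here, not in the slides), using the 20 (x, y) pairs of the hypothetical data, including the pair (3, 7).

```python
# Hypothetical data from the example, read as 20 (x, y) pairs.
x = [1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8]
y = [3.5, 4.5, 4, 3.5, 6.5, 8, 6, 7, 7.9, 7,
     9.4, 9.3, 11, 10.5, 12.4, 11.5, 10, 15, 11, 13.7]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((a - x_bar) ** 2 for a in x)
sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))

# Fitted least squares line.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Total variability in Y and the part left unexplained by the line.
sst = sum((b - y_bar) ** 2 for b in y)
sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))

r_squared = 1 - sse / sst
print(round(r_squared, 4))
```

The result should agree with $(0.9511)^2 = 0.9046$ up to rounding, since the two definitions coincide in simple linear regression.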