MODULE 3: CORRELATION & REGRESSION ANALYSIS
Name of Institution
CORRELATION
When the relationship between two variables is quantitative in nature, the statistical tool used to discover and measure the relationship and to express it in a brief formula is known as correlation.
The measure of correlation, called the coefficient of correlation, indicates the strength and direction of the relationship between two variables.
The correlation coefficient between two variables x and y is denoted by $r$, $r_{xy}$ or $\rho(x, y)$.
It lies between −1 and +1.
If r = 0, the variables are said to be uncorrelated, i.e. there is no linear relationship between them. Independent variables always have r = 0, but r = 0 alone does not prove independence.
TYPES OF CORRELATION
I) Based on Direction:
Positive Correlation: an increase (decrease) in the value of one variable is accompanied by a corresponding increase (decrease) in the value of the other variable.
Negative Correlation: an increase (decrease) in the value of one variable is accompanied by a corresponding decrease (increase) in the value of the other variable.
II) Based on Degree:
High
Moderate
Low
METHODS OF STUDYING CORRELATION
1) Scatter Diagram Method.
2) Karl Pearson's Coefficient of Correlation.
3) Spearman's Rank Correlation Coefficient.
SCATTER DIAGRAM
The simplest method for studying the correlation between two variables is a special type of dot chart called a Dotogram or Scatter Diagram.
In this method, each pair of (X, Y) values is plotted as a dot.
The more widely the plotted points are scattered over the chart, the lower the degree of relationship between the two variables.
The more closely the points cluster around a line, the higher the degree of relationship.
[Figure: scatter diagrams of Y against X showing perfect negative correlation (r = −1), no correlation (r = 0), perfect positive correlation (r = +1) and no correlation (r = 0).]
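The plotting step can be imitated in plain text. The sketch below uses a hypothetical helper, `ascii_scatter` (not from the slides), that scales each (X, Y) pair onto a character grid and marks it with `*`. Points falling in the same cell overlap, so, like the method itself, it gives only a rough picture.

```python
def ascii_scatter(xs, ys, width=40, height=12):
    """Return a rough text scatter diagram with one '*' per (x, y) pair."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    x_span = (x_max - x_min) or 1  # avoid division by zero when all x are equal
    y_span = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        grid[height - 1 - row][col] = "*"  # flip so larger Y is drawn higher up
    lines = ["|" + "".join(cells) for cells in grid]  # "|" marks the Y axis
    lines.append("+" + "-" * width)                     # X axis
    return "\n".join(lines)


# Data from Ques 1 of the Karl Pearson section: the dots rise from left to right.
print(ascii_scatter([12, 9, 8, 10, 11, 13, 7], [14, 8, 6, 9, 11, 12, 3], width=30, height=10))
```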
Advantages:
1. It is readily comprehensible and gives a rough idea of the nature of the relationship between the two variables x and y.
2. It is not affected by extreme observations.
Disadvantages:
1. It is not a suitable method if the number of observations is fairly large.
2. It is only a rough measure of correlation; the exact degree of correlation cannot be known.
KARL PEARSON'S COEFFICIENT OF CORRELATION
Also known as the Pearsonian Coefficient of Correlation.
It describes the degree and direction of the relationship between two variables X and Y.
It is denoted by the symbol r.
The value of Pearson's coefficient of correlation lies between −1 and +1.
If X and Y are independent variables, then the coefficient of correlation is zero.
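PEARSON FORMULA
The correlation coefficient r is given by any of the following equivalent forms:
First form:
$$r = \frac{\operatorname{Cov}(x, y)}{\sigma_x\,\sigma_y} = \frac{\operatorname{Cov}(x, y)}{\sqrt{\operatorname{var}(x)\,\operatorname{var}(y)}}$$
Second form:
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2\,\sum (y - \bar{y})^2}}$$
Third form:
$$r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)\left(\sum y^2 - \dfrac{(\sum y)^2}{n}\right)}}$$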
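The three forms are algebraically equivalent. This sketch (the function name `pearson_r_forms` and the sample data are hypothetical) computes r each way so the agreement can be checked. The first form uses divide-by-n moments here; the divisor cancels in the ratio, so divide-by-(n − 1) moments would give the same r.

```python
import math


def pearson_r_forms(x, y):
    """Return r computed by the first, second and third forms of Pearson's formula."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n

    # First form: Cov(x, y) / (sigma_x * sigma_y), using divide-by-n moments
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    first = cov / (sx * sy)

    # Second form: sums of deviations from the means
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    second = sxy / math.sqrt(sxx * syy)

    # Third form: raw sums only, no means required
    sum_x, sum_y = sum(x), sum(y)
    num = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    den = math.sqrt((sum(a * a for a in x) - sum_x**2 / n)
                    * (sum(b * b for b in y) - sum_y**2 / n))
    third = num / den
    return first, second, third


print(pearson_r_forms([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```

The third form is usually the most convenient by hand when the means are not whole numbers, since it avoids fractional deviations.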
Ques 1. Calculate Karl Pearson's coefficient of correlation.
X Y
12 14
9 8
8 6
10 9
11 11
13 12
7 3
Ques 2. A financial analyst wanted to find out whether inventory turnover influences a company's earnings per share. A random sample of 7 companies listed on the stock exchange was selected, and the following data were recorded for each. Find the correlation coefficient.
Company  Inventory turnover  Earnings per share (%)
A 4 11
B 5 9
C 7 13
D 8 7
E 6 13
F 3 8
G 5 8
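Here the means (38/7 and 69/7) are not whole numbers, so a sketch using the third (raw-sum) form of the formula is the convenient route. It gives r ≈ 0.134, i.e. only a weak positive correlation between inventory turnover and earnings per share in this sample.

```python
from math import sqrt

turnover = [4, 5, 7, 8, 6, 3, 5]  # X: inventory turnover, companies A to G
eps = [11, 9, 13, 7, 13, 8, 8]    # Y: earnings per share (%)
n = len(turnover)

sum_x, sum_y = sum(turnover), sum(eps)              # 38 and 69
sum_xy = sum(a * b for a, b in zip(turnover, eps))  # 378
sum_x2 = sum(a * a for a in turnover)               # 224
sum_y2 = sum(b * b for b in eps)                    # 717

# Third form of Pearson's formula: raw sums only, no deviations needed
numerator = sum_xy - sum_x * sum_y / n
denominator = sqrt((sum_x2 - sum_x**2 / n) * (sum_y2 - sum_y**2 / n))
r = numerator / denominator
print(round(r, 3))
```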
Ques 3. The following table gives the indices of industrial production and the number of registered unemployed people (in lakhs). Calculate Karl Pearson's coefficient of correlation.
Index of production  No. of unemployed (lakhs)
100 15
102 12
104 13
107 11
105 12
112 12
103 19
99 26
SPEARMAN CORRELATION
Rank X and Y separately.
The largest value gets rank 1, the second largest gets rank 2, and so on.
Tied values each receive the average of the ranks they would otherwise occupy.
Formula is:
$$\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}, \qquad \text{where } d = \text{Rank } X - \text{Rank } Y$$
For tied ranks:
$$\rho = 1 - \frac{6\left[\sum d^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \cdots\right]}{n(n^2 - 1)}$$
Here, m is the number of times a value is repeated.
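A minimal sketch of the formula, including the tie correction, assuming tied values receive the average of the ranks they occupy. `spearman_rho` and `avg_ranks` are hypothetical names. Ranking from the largest value downwards (as above) or from the smallest upwards gives the same ρ, provided X and Y are ranked the same way, because every d only changes sign.

```python
from collections import Counter


def avg_ranks(values):
    """Rank 1 = largest value; tied values share the mean of the ranks they occupy."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]


def spearman_rho(x, y):
    """Spearman's rank correlation with the (m^3 - m)/12 correction for tied ranks."""
    n = len(x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(avg_ranks(x), avg_ranks(y)))
    # One correction term per group of m equal values; groups with m = 1 add 0.
    correction = sum((m**3 - m) / 12 for v in (x, y) for m in Counter(v).values())
    return 1 - 6 * (sum_d2 + correction) / (n * (n**2 - 1))


print(spearman_rho([10, 20, 30, 40], [1, 3, 2, 4]))
```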
Question 1) Calculate the coefficient of correlation for the following heights (in inches) of fathers (X) and sons (Y).
X Y
65 67
66 68
67 65
67 68
68 72
69 72
70 69
72 71
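Since this question sits in the rank-correlation part of the module, here is a sketch of it worked with the tie-corrected Spearman formula (largest value = rank 1, ties averaged; `avg_ranks` is a hypothetical helper). In X the height 67 occurs twice; in Y both 68 and 72 occur twice, so the correction is 3 × (2³ − 2)/12 = 1.5. With Σd² = 26, ρ = 1 − 6(26 + 1.5)/(8 × 63) ≈ 0.673.

```python
from collections import Counter


def avg_ranks(values):
    # Rank 1 = largest value; tied values share the mean of the ranks they occupy.
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]


fathers = [65, 66, 67, 67, 68, 69, 70, 72]  # X
sons = [67, 68, 65, 68, 72, 72, 69, 71]     # Y
n = len(fathers)

sum_d2 = sum((a - b) ** 2 for a, b in zip(avg_ranks(fathers), avg_ranks(sons)))
# Tie correction: (m^3 - m)/12 for every group of m equal values
cf = sum((m**3 - m) / 12 for v in (fathers, sons) for m in Counter(v).values())
rho = 1 - 6 * (sum_d2 + cf) / (n * (n**2 - 1))
print(sum_d2, cf, round(rho, 4))
```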
Question 2) Find rank correlation coefficient between x and y.
X Y
85 18.3
91 20.8
56 16.9
72 15.7
95 19.2
76 18.1
89 17.5
51 14.9
59 18.9
90 15.4
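This data set has no repeated values, so the basic formula applies without any correction. A short sketch (the helper `ranks` is hypothetical) gives Σd² = 74 and ρ = 1 − 6(74)/(10 × 99) ≈ 0.552, a moderate positive rank correlation.

```python
def ranks(values):
    # Rank 1 = largest value; there are no ties in this data, so no averaging is needed.
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


x = [85, 91, 56, 72, 95, 76, 89, 51, 59, 90]
y = [18.3, 20.8, 16.9, 15.7, 19.2, 18.1, 17.5, 14.9, 18.9, 15.4]
n = len(x)

sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
rho = 1 - 6 * sum_d2 / (n * (n**2 - 1))
print(sum_d2, round(rho, 4))
```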
Question 3) Obtain the rank correlation coefficient for the following data.
X Y
68 62
64 58
75 68
50 45
64 81
80 60
75 68
40 48
55 50
64 70
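This question needs the tie correction: in X, 75 appears twice and 64 three times; in Y, 68 appears twice. The corrections are (2³ − 2)/12 = 0.5, (3³ − 3)/12 = 2 and 0.5, totalling 3. With Σd² = 72, ρ = 1 − 6(72 + 3)/990 ≈ 0.545. The sketch below reproduces this (`spearman_rho` and `avg_ranks` are hypothetical names).

```python
from collections import Counter


def avg_ranks(values):
    # Rank 1 = largest value; e.g. the three 64s share ranks 5, 6, 7 and each get 6.
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]


def spearman_rho(x, y):
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(avg_ranks(x), avg_ranks(y)))
    correction = sum((m**3 - m) / 12 for v in (x, y) for m in Counter(v).values())
    return sum_d2, correction, 1 - 6 * (sum_d2 + correction) / (n * (n**2 - 1))


x = [68, 64, 75, 50, 64, 80, 75, 40, 55, 64]
y = [62, 58, 68, 45, 81, 60, 68, 48, 50, 70]
sum_d2, correction, rho = spearman_rho(x, y)
print(sum_d2, correction, round(rho, 4))
```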
REGRESSION
Regression analysis provides a mathematical model of
the relationship between two variables, in which one is
independent and one is dependent.
If X and Y are two variables, then we have two
regression lines:
(a) Regression line of X on Y.
(b) Regression line of Y on X.
Regression line X on Y.
The regression line of X on Y is given by:
X= a + b Y
where b is called the regression coefficient of X on Y, denoted by $b_{xy}$.
Here, Y is the independent variable and X is dependent
variable.
Normal equations to estimate a and b are:
$$\sum X = na + b\sum Y$$
$$\sum XY = a\sum Y + b\sum Y^2$$
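Solving the two normal equations simultaneously gives b = (nΣXY − ΣXΣY)/(nΣY² − (ΣY)²) and then a = (ΣX − bΣY)/n. The sketch below (hypothetical helper `fit_by_normal_equations`, made-up data) does this for X on Y; swapping the arguments gives the line of Y on X, whose normal equations have the same shape with the roles of X and Y exchanged.

```python
def fit_by_normal_equations(dep, indep):
    """Solve  sum(dep) = n*a + b*sum(indep)  and
    sum(dep*indep) = a*sum(indep) + b*sum(indep**2)  for (a, b)."""
    n = len(dep)
    s_i, s_d = sum(indep), sum(dep)
    s_id = sum(i * d for i, d in zip(indep, dep))
    s_ii = sum(i * i for i in indep)
    b = (n * s_id - s_i * s_d) / (n * s_ii - s_i**2)  # eliminate a, solve for b
    a = (s_d - b * s_i) / n                             # back-substitute into the first equation
    return a, b


X = [2, 4, 5, 7]  # hypothetical data
Y = [1, 2, 3, 4]
a, b = fit_by_normal_equations(X, Y)  # regression line of X on Y: X = a + bY
print(a, b)
```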
Another form of regression equation X on Y is:
$$X - \bar{X} = r\,\frac{\sigma_x}{\sigma_y}\,(Y - \bar{Y})$$
Here, $b_{xy} = r\,\dfrac{\sigma_x}{\sigma_y}$.
Regression line Y on X.
The regression line of Y on X is given by:
Y= a + b X
where b is called the regression coefficient of Y on X, denoted by $b_{yx}$.
Here, X is the independent variable and Y is dependent
variable.
Normal equations to estimate a and b are:
$$\sum Y = na + b\sum X$$
$$\sum XY = a\sum X + b\sum X^2$$
Another form of regression equation Y on X is:
$$Y - \bar{Y} = r\,\frac{\sigma_y}{\sigma_x}\,(X - \bar{X})$$
Here, $b_{yx} = r\,\dfrac{\sigma_y}{\sigma_x}$.
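The mean-and-standard-deviation form describes the same least-squares line as the normal equations, since r σy/σx = Σ(X − X̄)(Y − Ȳ)/Σ(X − X̄)². The check below uses made-up data (not from the slides) and divide-by-n standard deviations; any common divisor cancels.

```python
from math import sqrt

x = [2, 4, 5, 7]  # hypothetical data
y = [1, 2, 3, 4]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sd_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
sd_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
r = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n * sd_x * sd_y)

b_yx = r * sd_y / sd_x  # slope in  Y - mean_y = b_yx (X - mean_x)
b_xy = r * sd_x / sd_y  # slope in  X - mean_x = b_xy (Y - mean_y)

# Slope of Y on X obtained from the normal equations, for comparison
sum_xy = sum(a * b for a, b in zip(x, y))
b_yx_normal = (n * sum_xy - sum(x) * sum(y)) / (n * sum(a * a for a in x) - sum(x) ** 2)
print(round(b_yx, 4), round(b_yx_normal, 4), round(b_xy, 4))
```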
Properties of regression lines and coefficients
Both regression lines pass through the point $(\bar{x}, \bar{y})$.
The correlation coefficient is the geometric mean of the two regression coefficients $b_{xy}$ and $b_{yx}$, taking their common sign, i.e.
$$r = \pm\sqrt{b_{xy}\,b_{yx}}$$
If one of the regression coefficients is numerically greater than 1, the other must be numerically less than 1 (since $b_{xy}\,b_{yx} = r^2 \le 1$).
$b_{xy}$, $b_{yx}$ and the correlation coefficient (r) have the same sign.
For example, if $b_{xy} = 0.664$ and $b_{yx} = 0.234$, then $r = (0.664 \times 0.234)^{1/2} = 0.394$.
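The geometric-mean property translates directly into a small check. This sketch (the function name `r_from_regression_coefficients` is hypothetical) recovers r from the two regression coefficients, rejecting pairs that violate the sign rule or the requirement r² ≤ 1, and reproduces the example above.

```python
from math import copysign, sqrt


def r_from_regression_coefficients(b_xy, b_yx):
    """r as the geometric mean of b_xy and b_yx, carrying their common sign."""
    product = b_xy * b_yx  # equals r squared
    if product < 0:
        raise ValueError("b_xy and b_yx must have the same sign")
    if product > 1:
        raise ValueError("b_xy * b_yx = r^2 cannot exceed 1")
    return copysign(sqrt(product), b_xy)


print(round(r_from_regression_coefficients(0.664, 0.234), 3))
```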
QUESTION 1) You are given the following information about advertising expenditure and sales.
Advertisement (x)  Sales (y)
A.M. 10 90
S.D. 3 12
and r = 0.8.
(a) Obtain the two regression lines.
(b) Find the likely sales when the advertisement budget is Rs 15 lakhs.
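A sketch of Question 1 using the mean-and-standard-deviation form of the regression lines: b_yx = 0.8 × 12/3 = 3.2 and b_xy = 0.8 × 3/12 = 0.2, so the lines are y = 58 + 3.2x and x = −8 + 0.2y. The likely sales for an advertisement budget of 15 come from the line of y on x: 90 + 3.2(15 − 10) = 106.

```python
# Given summary figures (advertisement x, sales y)
mean_x, mean_y = 10, 90
sd_x, sd_y = 3, 12
r = 0.8

b_yx = r * sd_y / sd_x  # regression coefficient of y on x
b_xy = r * sd_x / sd_y  # regression coefficient of x on y

# (a) y - mean_y = b_yx (x - mean_x)  and  x - mean_x = b_xy (y - mean_y), rearranged
print(f"Y on X: y = {mean_y - b_yx * mean_x:.2f} + {b_yx:.2f}x")
print(f"X on Y: x = {mean_x - b_xy * mean_y:.2f} + {b_xy:.2f}y")

# (b) Predict sales from advertising with the line of y on x
likely_sales = mean_y + b_yx * (15 - mean_x)
print(f"Likely sales at x = 15: {likely_sales:.2f}")
```

The line of y on x is the right one for (b) because sales are being predicted from a known advertisement budget.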
QUESTION 2) The two regression lines are given by:
3X + 12Y = 19
9X + 3Y = 46
and $\sigma_x = 4$.
Obtain:
(a) the mean values of X and Y;
(b) the value of the correlation coefficient;
(c) the standard deviation of Y.
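A sketch of Question 2. Since both lines pass through (x̄, ȳ), solving them together gives the means: x̄ = 5, ȳ = 1/3. To decide which line is which, take the first as Y on X (b_yx = −1/4) and the second as X on Y (b_xy = −1/3); their product 1/12 is at most 1, so this assignment is valid, whereas the opposite one gives (−4)(−3) = 12 > 1. Then r = −√(1/12) ≈ −0.289 and σy = b_yx σx / r = 2√3 ≈ 3.464.

```python
from math import copysign, sqrt

# Line 1: 3X + 12Y = 19,  Line 2: 9X + 3Y = 46
# (a) Both lines pass through (mean_x, mean_y): solve them by Cramer's rule.
det = 3 * 3 - 9 * 12
mean_x = (19 * 3 - 46 * 12) / det
mean_y = (3 * 46 - 9 * 19) / det

# (b) Treat line 1 as Y on X and line 2 as X on Y.
b_yx = -3 / 12  # from Y = (19 - 3X) / 12
b_xy = -3 / 9   # from X = (46 - 3Y) / 9
# The opposite assignment gives b_xy = -12/3 and b_yx = -9/3, whose product 12 > 1 is impossible.
assert b_yx * b_xy <= 1
r = copysign(sqrt(b_yx * b_xy), b_yx)  # r takes the common sign of the coefficients

# (c) b_yx = r * sd_y / sd_x  =>  sd_y = b_yx * sd_x / r
sd_x = 4
sd_y = b_yx * sd_x / r
print(mean_x, round(mean_y, 4), round(r, 4), round(sd_y, 4))
```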
Question 3. For the following data, obtain the two regression equations and hence find the correlation coefficient.
X 1 2 3 4 5
Y 2 5 3 8 7
Question 4. The following data gives the ages and blood pressure of 10 women.
(i) Find the correlation coefficient between age and blood pressure.
(ii) Determine the regression equation of blood pressure on age.
(iii) Estimate the blood pressure of a woman whose age is 45 years.
Age 56 42 36 47 49 42 60 72 63 55
B.P 147 125 118 128 145 140 155 160 149 150