
QUANTITATIVE METHODS

LESSON 30: CORRELATION & REGRESSION CORRELATION


In the last lecture we discussed what correlation is and why it is needed. Today we will discuss different methods for finding correlation.

Data for the example on the two-way table (marks in Statistics, marks in Economics): (14, 12), (0, 2), (1, 5), (7, 3), (15, 9), (2, 8), (12, 18), (9, 11), (5, 3), (17, 13), (19, 18), (11, 7), (10, 13), (13, 16), (16, 14), (6, 10), (4, 1), (11, 14), (8, 3), (9, 15), (13, 11), (14, 17), (10, 10), (11, 7), (15, 15). Prepare a two-way table taking the magnitude of each class interval as 4 marks, the first being equal to 0 and less than 4.
Solution

Computation of Correlation Coefficient


We can find out correlation by one or more of the following methods: 1. Two-way Frequency Table, 2. Scatter Diagram, 3. Co-variance Method, 4. Rank Method, 5. Concurrent Deviation Method.

1. Two-way Table

This is the simplest method of judging association between two variables. The two-way frequency table is prepared by indicating one variable in the rows (horizontal) and the other variable in the columns (vertical). The frequencies are shown in the respective squares, which number m x n, where m and n are the numbers of classes in the rows and columns respectively. The pattern of concentration of frequencies in the various squares reveals the type of correlation: positive or negative. As regards the degree of correlation, the table gives only a very rough idea, and it is difficult to indicate the degree of correlation quantitatively. Therefore, it is not considered to be a scientific method. Given below is a two-way frequency table with marks in Economics in the rows and marks in Statistics in the columns.

2. Scatter Diagram
This is a method of representing bivariate data graphically, and it is particularly useful for ungrouped data. The procedure for drawing a scatter diagram is as follows. On squared paper, draw two axes at right angles, one axis corresponding to the variable X and the other to the variable Y. To each item of the data there corresponds a pair of values (X, Y), which in turn corresponds to a point whose abscissa on the diagram is X and whose ordinate is Y. Thus the population, when represented in this way, gives a scatter of points on the diagram. We can interpret the ways in which these points cluster or scatter as properties of the relationship between the two variables. The graphical representation of the data by points in a square region, with the two variables on the X and Y axes, is called the scatter diagram. The following are some scatter diagrams depicting various cases of the sample correlation coefficient, viz. r = +1, r = -1, r = +0.95, r = -0.57 and r = 0.
[Figure: scatter diagrams of Y against X for (a) r = +1 and (b) r = -1.]

Marks in Economics 0-10 10-20 20-30 30-40 40-50 Total

010 6

1020 8 6

Marks in Statistics 20- 30- 4030 40 50 10 6 4 3 3 10

Total 14 16 10 6 4 50

3 1 4

14

16

The scatter of values from top left to bottom right reveals that lower marks in Economics are associated with lower marks in Statistics.
Example

The marks scored by 25 students in Statistics and Economics are given below. The first figure in brackets indicates the marks in Statistics and the second marks in Economics.
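The tabulation asked for in this example can be done mechanically. The following is a minimal sketch, assuming the 25 (Statistics, Economics) pairs listed earlier in this lesson and five classes of width 4 (0-4, 4-8, ..., 16-20, each including its lower limit and excluding its upper limit). The names `MARKS` and `two_way_table` are illustrative only, and putting Statistics in the rows is an assumption, since the example does not fix the orientation.

```python
from collections import Counter

# (Statistics, Economics) marks of the 25 students, as listed in this lesson.
MARKS = [
    (14, 12), (0, 2), (1, 5), (7, 3), (15, 9),
    (2, 8), (12, 18), (9, 11), (5, 3), (17, 13),
    (19, 18), (11, 7), (10, 13), (13, 16), (16, 14),
    (6, 10), (4, 1), (11, 14), (8, 3), (9, 15),
    (13, 11), (14, 17), (10, 10), (11, 7), (15, 15),
]
CLASS_WIDTH = 4   # magnitude of each class interval
NUM_CLASSES = 5   # 0-4, 4-8, 8-12, 12-16, 16-20


def two_way_table(pairs, width=CLASS_WIDTH, num_classes=NUM_CLASSES):
    """Return table[i][j] = students in Statistics class i and Economics class j."""
    # Integer division by the class width gives the class index of a mark.
    counts = Counter((s // width, e // width) for s, e in pairs)
    return [[counts[(i, j)] for j in range(num_classes)]
            for i in range(num_classes)]


table = two_way_table(MARKS)
row_totals = [sum(row) for row in table]        # Statistics marginal totals
col_totals = [sum(col) for col in zip(*table)]  # Economics marginal totals

labels = [f"{k * CLASS_WIDTH}-{(k + 1) * CLASS_WIDTH}" for k in range(NUM_CLASSES)]
print("Stat\\Econ " + " ".join(f"{lab:>6}" for lab in labels) + "  Total")
for lab, row, total in zip(labels, table, row_totals):
    print(f"{lab:>9} " + " ".join(f"{c:>6}" for c in row) + f"  {total:>5}")
print(f"{'Total':>9} " + " ".join(f"{c:>6}" for c in col_totals)
      + f"  {sum(row_totals):>5}")
```

The frequencies gather along the diagonal running from the low-low corner to the high-high corner, which is the pattern the two-way table method reads as positive correlation.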

11.502

Copy Right: Rai University


[Figure: scatter diagrams of Y against X for (c) r = +0.95, (d) r = -0.57, (e) r = 0 and (f) r = 0.]

The method is only a very rough measure of correlation, and the exact magnitude cannot be known from it. To remedy this weakness, we have two other methods.

3. Karl Pearson's Co-Variance Method

Co-variance measures the joint variation of two variables. The formula for ungrouped data is

\[
\text{Co-variance} = \frac{\sum xy}{n}
\]

where x signifies the deviation from the mean of the variable X and y signifies the deviation from the mean of the variable Y, i.e.

\[
\text{Co-variance} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n}
\]

This is an absolute measure. For a relative measure, we divide it by the product of the standard deviations; the result is called the coefficient of correlation. Its value lies between +1 and -1. The formula given by Karl Pearson is given below.

\[
r = \frac{\sum xy / n}{\sigma_x \sigma_y} = \frac{\sum xy}{n\,\sigma_x \sigma_y}
\]

[$\sigma_x$ is the standard deviation of the variable X and $\sigma_y$ is the standard deviation of the variable Y; x and y are the deviations from the respective means.] Since $\sigma_x = \sqrt{\sum x^2 / n}$ and $\sigma_y = \sqrt{\sum y^2 / n}$, the formula reduces to $r = \sum xy / \sqrt{\sum x^2 \sum y^2}$. We will study the use of this method for an individual distribution and for a grouped distribution. First we take the case of an individual distribution.

Example

Find the correlation between sales and advertising expenditure from the following data:

| Sales (Rs. lakh) | 65 | 66 | 67 | 67 | 68 | 69 | 70 | 72 |
|---|---|---|---|---|---|---|---|---|
| Advertising expenditure (Rs. '000) | 67 | 68 | 65 | 68 | 72 | 72 | 69 | 71 |

Solution

| Sales (Rs. lakh) X | Advertising expenditure (Rs. '000) Y | Deviation from average sales x = X - X̄ | Deviation from average advertising expenditure y = Y - Ȳ | x² | y² | xy |
|---|---|---|---|---|---|---|
| 65 | 67 | -3 | -2 | 9 | 4 | 6 |
| 66 | 68 | -2 | -1 | 4 | 1 | 2 |
| 67 | 65 | -1 | -4 | 1 | 16 | 4 |
| 67 | 68 | -1 | -1 | 1 | 1 | 1 |
| 68 | 72 | 0 | +3 | 0 | 9 | 0 |
| 69 | 72 | +1 | +3 | 1 | 9 | 3 |
| 70 | 69 | +2 | 0 | 4 | 0 | 0 |
| 72 | 71 | +4 | +2 | 16 | 4 | 8 |
| 544 | 552 | 0 | 0 | 36 | 44 | 24 |

\[
\bar{X} = \frac{\sum X}{n} = \frac{544}{8} = 68, \qquad \bar{Y} = \frac{\sum Y}{n} = \frac{552}{8} = 69
\]

As per Karl Pearson, the coefficient of correlation is

\[
r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = \frac{24}{\sqrt{36 \times 44}} \approx 0.60
\]

Thus, there is a fair degree of positive correlation between the volume of sales and the advertising expenditure.

Example

With the following data for 6 cities, calculate the coefficient of correlation by Pearson's method between the density of population and the death rate.

| City | Area (sq. km) | Population ('000) | No. of deaths |
|---|---|---|---|
| A | 150 | 30 | 300 |
| B | 180 | 90 | 1440 |
| C | 100 | 40 | 560 |
| D | 60 | 42 | 840 |
| E | 120 | 72 | 1224 |
| F | 80 | 24 | 312 |

Solution

Using

\[
\text{Density} = \frac{\text{Population}}{\text{Area}} \quad \text{and} \quad \text{Death rate} = \frac{\text{No. of deaths}}{\text{Population}}
\]

we find the following data:

| City | Density X (per sq. km) | Death rate Y (per thousand) |
|---|---|---|
| A | 200 | 10 |
| B | 500 | 16 |
| C | 400 | 14 |
| D | 700 | 20 |
| E | 600 | 17 |
| F | 300 | 13 |

| X | Y | x = (X - X̄)/10 | y = Y - Ȳ | x² | y² | xy |
|---|---|---|---|---|---|---|
| 200 | 10 | -25 | -5 | 625 | 25 | 125 |
| 500 | 16 | 5 | 1 | 25 | 1 | 5 |
| 400 | 14 | -5 | -1 | 25 | 1 | 5 |
| 700 | 20 | 25 | 5 | 625 | 25 | 125 |
| 600 | 17 | 15 | 2 | 225 | 4 | 30 |
| 300 | 13 | -15 | -2 | 225 | 4 | 30 |
| 2700 | 90 | 0 | 0 | 1750 | 60 | 320 |
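Both worked examples in this section (sales against advertising expenditure, and density of population against death rate) can be checked numerically. The following is a minimal sketch of Pearson's deviation formula r = Σxy / √(Σx² Σy²); the function name `pearson_r` and the variable names are illustrative only and are not part of the lesson.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the actual means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dev_x = [x - mean_x for x in xs]   # x = X - mean of X
    dev_y = [y - mean_y for y in ys]   # y = Y - mean of Y
    sum_xy = sum(a * b for a, b in zip(dev_x, dev_y))
    sum_x2 = sum(a * a for a in dev_x)
    sum_y2 = sum(b * b for b in dev_y)
    return sum_xy / sqrt(sum_x2 * sum_y2)


# Sales (Rs. lakh) and advertising expenditure (Rs. '000) from the first example.
sales = [65, 66, 67, 67, 68, 69, 70, 72]
advertising = [67, 68, 65, 68, 72, 72, 69, 71]
r_sales = pearson_r(sales, advertising)

# Six cities: area (sq. km), population (thousands), number of deaths.
area = [150, 180, 100, 60, 120, 80]
population_thousands = [30, 90, 40, 42, 72, 24]
deaths = [300, 1440, 560, 840, 1224, 312]

# Density per sq. km and death rate per thousand of population.
density = [1000 * p / a for p, a in zip(population_thousands, area)]
death_rate = [d / p for d, p in zip(deaths, population_thousands)]
r_cities = pearson_r(density, death_rate)

print(round(r_sales, 3), round(r_cities, 3))
```

Because r is a pure number, dividing the density deviations by 10, as the worked table does, leaves the result unchanged; the sketch therefore works with the unscaled deviations.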


\[
\bar{X} = \frac{2700}{6} = 450, \qquad \bar{Y} = \frac{90}{6} = 15
\]

Being a pure number, the correlation coefficient is not affected by shift of origin and change of scale. Therefore, the required coefficient of correlation is given by

\[
r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = \frac{320}{\sqrt{1750 \times 60}} \approx 0.988
\]

There is thus a significant positive correlation between the two variables.

In each of the above two examples, the average of each variable was an integer and therefore there was no difficulty in calculating deviations of the individual items from their respective means. If, however, the average is not an integer, we make use of an assumed average. When deviations are taken from the assumed average and not the actual average, the corrected formula is as follows:

\[
r = \frac{\sum d_x d_y - \dfrac{(\sum d_x)(\sum d_y)}{N}}{\sqrt{\sum d_x^2 - \dfrac{(\sum d_x)^2}{N}}\;\sqrt{\sum d_y^2 - \dfrac{(\sum d_y)^2}{N}}}
\]

or, multiplying numerator and denominator by N,

\[
r = \frac{N\sum d_x d_y - (\sum d_x)(\sum d_y)}{\sqrt{\{N\sum d_x^2 - (\sum d_x)^2\}\,\{N\sum d_y^2 - (\sum d_y)^2\}}}
\]

Let us take an example to illustrate the use of this formula.

Example

Calculate the coefficient of correlation between X and Y:

| X | 78 | 89 | 97 | 69 | 59 | 79 | 68 | 61 |
|---|---|---|---|---|---|---|---|---|
| Y | 125 | 137 | 156 | 112 | 107 | 136 | 123 | 108 |

Solution

| X | Y | dx | dy | dx² | dy² | dx dy |
|---|---|---|---|---|---|---|
| 78 | 125 | +9 | +13 | 81 | 169 | +117 |
| 89 | 137 | +20 | +25 | 400 | 625 | +500 |
| 97 | 156 | +28 | +44 | 784 | 1936 | +1232 |
| 69 | 112 | 0 | 0 | 0 | 0 | 0 |
| 59 | 107 | -10 | -5 | 100 | 25 | +50 |
| 79 | 136 | +10 | +24 | 100 | 576 | +240 |
| 68 | 123 | -1 | +11 | 1 | 121 | -11 |
| 61 | 108 | -8 | -4 | 64 | 16 | +32 |
| N = 8 | N = 8 | +48 | +108 | +1530 | +3468 | +2160 |

(dx is the deviation from the assumed average 69 and dy is the deviation from the assumed average 112.) Substituting the values in the above formula, we have

\[
r = \frac{8 \times 2160 - 48 \times 108}{\sqrt{\{8 \times 1530 - (48)^2\}\,\{8 \times 3468 - (108)^2\}}} \approx 0.957
\]

Example

In two sets of variables X and Y with 50 observations each, the following data were observed: $\bar{X} = 10$, $\bar{Y} = 6$, S.D. of X = 3, S.D. of Y = 2, and the coefficient of correlation between X and Y is 0.3. However, on subsequent verification it was found that one value of X (= 10) and one value of Y (= 6) were inaccurate and hence weeded out. With the remaining 49 pairs of values, how is the original value of the correlation coefficient affected?

Solution

We know that

\[
r = \frac{1}{n} \cdot \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sigma_x \sigma_y}
\]

or,

\[
\frac{1}{n} \sum (X - \bar{X})(Y - \bar{Y}) = r\,\sigma_x \sigma_y
\]

or,

\[
\sum (X - \bar{X})(Y - \bar{Y}) = n\,r\,\sigma_x \sigma_y = 50 \times 0.3 \times 3 \times 2 = 90
\]

or,

\[
\sum XY - \bar{X}\sum Y - \bar{Y}\sum X + \sum \bar{X}\bar{Y} = 90
\]

\[
\sum XY = 90 + n\bar{X}\bar{Y} = 90 + 50 \times 10 \times 6 = 3090
\]

[since $\sum Y = n\bar{Y}$, $\sum X = n\bar{X}$ and $\sum \bar{X}\bar{Y} = n\bar{X}\bar{Y}$]

Corrected $\sum XY = 3090 - 10 \times 6 = 3030$

Corrected $\sum X = n\bar{X} - 10 = 50 \times 10 - 10 = 490$

Corrected $\sum Y = n\bar{Y} - 6 = 50 \times 6 - 6 = 294$


From the given data, we find


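\[
\sum X^2 = n(\sigma_x^2 + \bar{X}^2) = 50\,(9 + 100) = 5450, \qquad \sum Y^2 = n(\sigma_y^2 + \bar{Y}^2) = 50\,(4 + 36) = 2000
\]

Corrected $\sum X^2 = 5450 - 10^2 = 5350$

Corrected $\sum Y^2 = 2000 - 6^2 = 1964$

With the actual values of the variables X, Y, the correlation coefficient is found from

\[
r = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{n}}{\sqrt{\left\{\sum X^2 - \dfrac{(\sum X)^2}{n}\right\}\left\{\sum Y^2 - \dfrac{(\sum Y)^2}{n}\right\}}}
\]

Corrected value of the correlation coefficient with the remaining 49 pairs of values is therefore given by

\[
r = \frac{3030 - \dfrac{490 \times 294}{49}}{\sqrt{\left\{5350 - \dfrac{(490)^2}{49}\right\}\left\{1964 - \dfrac{(294)^2}{49}\right\}}}
= \frac{3030 - 2940}{\sqrt{(5350 - 4900)(1964 - 1764)}}
= \frac{90}{\sqrt{450 \times 200}}
= \frac{90}{\sqrt{90000}}
= \frac{9}{30} = 0.3
\]

Therefore, the original value of r remains unaffected. Note: It may be noted that the formula used above does not make use of deviations of items. It makes use of the actual values of the variables.

Example

Calculate correlation coefficient from the following results:

Solution

Since the correlation coefficient r is not affected by change of origin of reference, $r_{xy} = r_{uv}$, where u = x - 10 and v = y - 15.

\[
r_{uv} = \frac{n\sum uv - (\sum u)(\sum v)}{\sqrt{\{n\sum u^2 - (\sum u)^2\}\,\{n\sum v^2 - (\sum v)^2\}}}
= \frac{10 \times 60 - 40 \times 0}{\sqrt{\{10 \times 180 - (40)^2\}\,\{10 \times 215 - (0)^2\}}}
\]

\[
= \frac{600 - 0}{\sqrt{(1800 - 1600)(2150 - 0)}} = \frac{600}{\sqrt{200 \times 2150}} = \frac{6}{\sqrt{2 \times 21.5}} = \frac{6}{\sqrt{43}} = \frac{6}{6.557} = 0.915
\]

Now we take an example of a grouped distribution of two variables. The formula given earlier is used here as well, but the method of finding Σxy is somewhat tedious.

Example

Find the coefficient of correlation between the two variables from the grouped distribution presented in the form of a two-way frequency table:

| Sales (in lakh rupees) \ Profit (in 0000 rupees) | 50-55 | 55-60 | 60-65 | 65-70 | Total |
|---|---|---|---|---|---|
| 80-90 | 1 | 2 | 1 | - | 4 |
| 90-100 | 3 | 4 | 5 | 3 | 15 |
| 100-110 | 7 | 10 | 12 | 8 | 37 |
| 110-120 | 5 | 7 | 10 | 6 | 28 |
| 120-130 | 2 | 4 | 7 | 3 | 16 |
| Total | 18 | 27 | 35 | 20 | 100 |

Solution

Calculation can conveniently be done by forming a table as given in Page no.66. Pearson's formula with frequencies of both the variables is:

\[
r = \frac{N\sum f d_x d_y - (\sum f d_x)(\sum f d_y)}{\sqrt{\{N\sum f d_x^2 - (\sum f d_x)^2\}\,\{N\sum f d_y^2 - (\sum f d_y)^2\}}}
\]

\[
r = \frac{31 \times 100 - 37 \times 57}{\sqrt{\{123 \times 100 - (37)^2\}\,\{133 \times 100 - (57)^2\}}} = \frac{991}{\sqrt{10931 \times 10051}} \approx 0.095
\]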
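The assumed-average formula, the formula on actual values and the grouped-data formula all share one shape, so the four results in this part of the lesson can be checked with a single helper. The following is a minimal sketch; `r_from_sums` and the other names are illustrative only. It assumes that the empty cell of the sales and profit table (sales 80-90, profit 65-70) is a zero frequency, and it uses the step deviations d_x = (mid-point - 105)/10 for sales and d_y = (mid-point - 57.5)/5 for profit.

```python
from math import sqrt


def r_from_sums(n, sum_uv, sum_u, sum_v, sum_u2, sum_v2):
    """r = (n*Suv - Su*Sv) / sqrt((n*Su2 - Su^2) * (n*Sv2 - Sv^2))."""
    numerator = n * sum_uv - sum_u * sum_v
    denominator = sqrt((n * sum_u2 - sum_u ** 2) * (n * sum_v2 - sum_v ** 2))
    return numerator / denominator


# Deviations from the assumed averages 69 and 112 (eight pairs of X, Y).
r_assumed_average = r_from_sums(8, 2160, 48, 108, 1530, 3468)

# Corrected sums of the actual values for the remaining 49 pairs.
r_49_pairs = r_from_sums(49, 3030, 490, 294, 5350, 1964)

# Sums in u = x - 10 and v = y - 15 from the "following results" example.
r_results = r_from_sums(10, 60, 40, 0, 180, 215)

# Grouped data: rows are sales classes 80-90 ... 120-130,
# columns are profit classes 50-55 ... 65-70 (empty cell taken as 0).
FREQ = [
    [1, 2, 1, 0],
    [3, 4, 5, 3],
    [7, 10, 12, 8],
    [5, 7, 10, 6],
    [2, 4, 7, 3],
]
DX = [-2, -1, 0, 1, 2]   # step deviations of the sales mid-points
DY = [-1, 0, 1, 2]       # step deviations of the profit mid-points

n = sum(sum(row) for row in FREQ)
sum_dx = sum(f * DX[i] for i, row in enumerate(FREQ) for f in row)
sum_dy = sum(f * DY[j] for row in FREQ for j, f in enumerate(row))
sum_dx2 = sum(f * DX[i] ** 2 for i, row in enumerate(FREQ) for f in row)
sum_dy2 = sum(f * DY[j] ** 2 for row in FREQ for j, f in enumerate(row))
sum_dxdy = sum(f * DX[i] * DY[j]
               for i, row in enumerate(FREQ) for j, f in enumerate(row))
r_grouped = r_from_sums(n, sum_dxdy, sum_dx, sum_dy, sum_dx2, sum_dy2)

for name, value in [("assumed average", r_assumed_average),
                    ("49 pairs", r_49_pairs),
                    ("results", r_results),
                    ("grouped", r_grouped)]:
    print(f"{name}: {value:.4f}")
```

The grouped sums computed from the frequency table come out as N = 100, Σf·d_x = 37, Σf·d_y = 57, Σf·d_x² = 123, Σf·d_y² = 133 and Σf·d_x·d_y = 31, the same figures that are substituted in the formula.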

Computation table for the sales and profit example. The step deviations are d_x = (mid-point - 105)/10 for sales and d_y = (mid-point - 57.5)/5 for profit. Each square gives the frequency f followed, in brackets, by the product f·d_x·d_y.

| Profit class (mid-point, d_y) \ Sales class (mid-point, d_x) | 80-90 (85, -2) | 90-100 (95, -1) | 100-110 (105, 0) | 110-120 (115, +1) | 120-130 (125, +2) | f | f·d_y | f·d_y² | f·d_x·d_y |
|---|---|---|---|---|---|---|---|---|---|
| 50-55 (52.5, -1) | 1 (2) | 3 (3) | 7 (0) | 5 (-5) | 2 (-4) | 18 | -18 | 18 | -4 |
| 55-60 (57.5, 0) | 2 (0) | 4 (0) | 10 (0) | 7 (0) | 4 (0) | 27 | 0 | 0 | 0 |
| 60-65 (62.5, +1) | 1 (-2) | 5 (-5) | 12 (0) | 10 (10) | 7 (14) | 35 | 35 | 35 | 17 |
| 65-70 (67.5, +2) | - | 3 (-6) | 8 (0) | 6 (12) | 3 (12) | 20 | 40 | 80 | 18 |
| f | 4 | 15 | 37 | 28 | 16 | 100 | 57 | 133 | 31 |
| f·d_x | -8 | -15 | 0 | 28 | 32 | 37 | | | |
| f·d_x² | 16 | 15 | 0 | 28 | 64 | 123 | | | |
| f·d_x·d_y | 0 | -8 | 0 | 17 | 22 | 31 | | | |

Thus correlation is not significant. The tabular method of calculation presented here is convenient for finding the coefficient of correlation between two variables given as a frequency distribution. The combined group frequencies are in the central squares of the table. On the right-hand side and at the bottom are given the respective frequencies for the pair of values. Each square also carries the product of the deviations of both the variables and the product of the frequency and the two deviations (the bracketed figure above). These products are summed up in the extreme column and row designated as f·d_x·d_y. With this data we can use the usual formula to calculate the coefficient of correlation.

Example

For 1000 marriages, the ages of bridegroom (X) and bride (Y) in years are grouped in the following correlation table with class intervals of 5 years for each. Let x and y denote the mid-points of the class intervals. Make the further transformations

\[
u = \frac{x - 27.5}{5} \quad \text{and} \quad v = \frac{y - 27.5}{5}
\]

It is not necessary to use the same change of location and scale for both the variables. All the entries in the table are self-explanatory, perhaps excepting U and V. These are explained below:

U entries are
(-2) 11 + (-1) 62 + (0) 19 + (1) 3 + (2) 1 = -79
(-2) 6 + (-1) 220 + (0) 190 + (1) 34 + (2) 6 + (3) 2 = -180, etc.

and V entries are
(-2) 11 + (-1) 6 = -28
(-2) 62 + (-1) 220 + (0) 46 + (1) 3 + (2) 1 = -339, etc.

Further, from the table

\[
\bar{u} = \frac{124}{1000} = 0.124, \qquad \bar{v} = \frac{-371}{1000} = -0.371
\]

\[
\sigma_u^2 = \frac{1834}{1000} - (0.124)^2 = 1.8186, \qquad \sigma_u = 1.349
\]

\[
\sigma_v^2 = \frac{1531}{1000} - (-0.371)^2 = 1.3934, \qquad \sigma_v = 1.18
\]

Using the formula

\[
r = \frac{\dfrac{1}{n}\sum uv - \bar{u}\,\bar{v}}{\sigma_u \sigma_v}
\]

\[
r = \frac{1.109 - (0.124)(-0.371)}{1.349 \times 1.18} = 0.726
\]

In case switching back to the original variables x and y is needed, it is possible without any complications as follows:

\[
\bar{X} = 27.5 + 5\,(0.124) = 28.12, \qquad \bar{Y} = 27.5 + 5\,(-0.371) = 25.645
\]

\[
\sigma_x = 5 \times 1.349 = 6.745, \qquad \sigma_y = 5 \times 1.18 = 5.90
\]

It can easily be checked that the value of r remains the same, i.e. r(x, y) = r(u, v).
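The arithmetic of the marriage example can be re-traced from the totals quoted in the text. The following is a minimal sketch, assuming Σu = 124, Σv = -371, Σu² = 1834, Σv² = 1531 and (1/n)Σuv = 1.109 for n = 1000, as those figures appear in the worked solution; all variable names are illustrative only.

```python
from math import sqrt

n = 1000
sum_u, sum_v = 124, -371       # totals of the coded ages u and v
sum_u2, sum_v2 = 1834, 1531    # totals of u squared and v squared
mean_uv = 1.109                # (1/n) * sum of u*v, as quoted in the text

mean_u = sum_u / n
mean_v = sum_v / n
sigma_u = sqrt(sum_u2 / n - mean_u ** 2)
sigma_v = sqrt(sum_v2 / n - mean_v ** 2)

# Coefficient of correlation in the coded variables.
r_uv = (mean_uv - mean_u * mean_v) / (sigma_u * sigma_v)

# Switching back to years: x = 27.5 + 5u and y = 27.5 + 5v.
origin, width = 27.5, 5
mean_x = origin + width * mean_u
mean_y = origin + width * mean_v
sigma_x = width * sigma_u
sigma_y = width * sigma_v

print(round(r_uv, 3), round(mean_x, 3), round(mean_y, 3),
      round(sigma_x, 3), round(sigma_y, 3))
```

The change of origin drops out of every deviation and the common factor 5 cancels between the co-variance and the product of the standard deviations, which is why r(x, y) = r(u, v).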
