
Regression And Correlation

Regression
The process of finding out (predicting) the values of one variable, called the dependent variable, from given values of another variable or group of related variables, called the independent variable(s), is known as regression.
Simple linear regression
The regression is said to be simple linear regression if there is one independent and one dependent variable and the observed values tend to cluster around a straight line.
Difference between regressand and regressor
In regression analysis, the independent variable is called the regressor and the dependent variable is called the regressand.
SIMPLE LINEAR REGRESSION EQUATIONS
The following are the two equations of simple linear regression:
• Regression line Y on X
• Regression line X on Y
Regression Line Y on X
Equation of regression line Y on X is
Y = a + bX
Where
Y → Dependent variable (OR) Regressand
X → Independent variable (OR) Regressor
b → Regression coefficient (OR) Slope of the line
a → Y-intercept (the value of Y at X = 0)
Regression Line X on Y
Equation of regression line X on Y is
X = c + dY
Where
X → Dependent variable (OR) Regressand
Y → Independent variable (OR) Regressor
d → Regression coefficient (OR) Slope of the line
c → X-intercept (the value of X at Y = 0)
Properties of regression line
The following are the properties of the least-squares regression line:

• The least-squares regression line always passes through the point of means, i.e. $(\bar{X}, \bar{Y})$.
• The sum of the residuals (errors) between the observed values and the expected (fitted) values is always equal to zero, i.e. $\sum (Y_i - \hat{Y}_i) = 0$.
Results: (i) $\sum (Y_i - \hat{Y}_i) = 0 \;\Rightarrow\; \sum Y_i = \sum \hat{Y}_i$
(ii) Dividing both sides by n, we get $\dfrac{\sum Y_i}{n} = \dfrac{\sum \hat{Y}_i}{n} \;\Rightarrow\; \bar{Y} = \overline{\hat{Y}}$
• The sum of squares of the residuals (errors) between the observed values and the expected (fitted) values is a minimum (least), i.e. $\sum (Y_i - \hat{Y}_i)^2$ is least.
Resource Person: M. Rashad Younus
• It is the line of best fit in the least-squares sense, and “a” and “b” are unbiased estimators of the population parameters α and β.
Regression coefficient
The regression coefficient indicates the expected change in the dependent variable associated with a one-unit increase in the value of the independent variable.
Properties of regression coefficient
The following are the properties of the regression coefficients:
• The regression coefficients are not symmetrical w.r.t. X and Y, i.e. in general $b_{yx} \neq b_{xy}$.
• The regression coefficients are independent of change of origin but not of scale.
• The G.M. of the two regression coefficients is equal to the correlation coefficient in absolute value, i.e. $|r| = \sqrt{b \times d}$; r takes the common sign of b and d.
• The sign of the regression coefficients shows the direction of the relationship between the two variables.
• Both regression coefficients have the same sign.
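The properties above can be checked numerically. This is a minimal sketch using the marks data of Problems 1 and 3; the helper `slope` and the shift and scale constants (40, 28 and 10) are illustrative choices, not values from the notes.

```python
import math

# Marks data of Problems 1 and 3: Statistics (X) and Economics (Y)
X = [30, 34, 26, 49, 60, 62, 65, 51, 44]
Y = [27, 18, 34, 28, 26, 30, 32, 30, 28]

def slope(xs, ys):
    """Least-squares slope of ys on xs: (nΣxy − ΣxΣy) / (nΣx² − (Σx)²)."""
    n = len(xs)
    num = n * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys)
    return num / (n * sum(x * x for x in xs) - sum(xs) ** 2)

b = slope(X, Y)   # regression coefficient of Y on X
d = slope(Y, X)   # regression coefficient of X on Y
n = len(X)
r = (n * sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y)) / math.sqrt(
    (n * sum(x * x for x in X) - sum(X) ** 2) * (n * sum(y * y for y in Y) - sum(Y) ** 2))

print(round(b, 4), round(d, 4))                              # 0.0639 0.6381 (not symmetric)
print((b > 0) == (d > 0))                                    # True: same sign
print(math.isclose(math.copysign(math.sqrt(b * d), b), r))   # True: r is the G.M. of b and d
print(math.isclose(slope([x - 40 for x in X], [y - 28 for y in Y]), b))  # True: origin shift
print(math.isclose(slope([x / 10 for x in X], Y), 10 * b))   # True: dividing X by 10 multiplies b by 10
```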

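FORMULAE OF REGRESSION COEFFICIENTS

In each pair below, b is the regression coefficient of Y on X and d is the regression coefficient of X on Y.

$$b = \frac{n\sum XY - \sum X\sum Y}{n\sum X^2 - \left(\sum X\right)^2}, \qquad d = \frac{n\sum XY - \sum X\sum Y}{n\sum Y^2 - \left(\sum Y\right)^2}$$

$$b = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2}, \qquad d = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum Y^2 - n\bar{Y}^2}$$

$$b = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{\sum (X-\bar{X})^2}, \qquad d = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{\sum (Y-\bar{Y})^2}$$

$$b = \frac{n\sum D_x D_y - \sum D_x\sum D_y}{n\sum D_x^2 - \left(\sum D_x\right)^2}, \qquad d = \frac{n\sum D_x D_y - \sum D_x\sum D_y}{n\sum D_y^2 - \left(\sum D_y\right)^2}$$

where $D_x = X - A$ and $D_y = Y - B$ (A and B are arbitrary constants, e.g. assumed means).

$$b = \frac{k}{h}\cdot\frac{n\sum UV - \sum U\sum V}{n\sum U^2 - \left(\sum U\right)^2}, \qquad d = \frac{h}{k}\cdot\frac{n\sum UV - \sum U\sum V}{n\sum V^2 - \left(\sum V\right)^2}$$

where $U = \dfrac{X - A}{h}$ and $V = \dfrac{Y - B}{k}$. The factors k/h and h/k are needed because the regression coefficients depend on the scale.

$$b = \frac{S_{xy}}{S_x^2}, \qquad d = \frac{S_{xy}}{S_y^2}$$

where
$$S_{xy} = \frac{1}{n}\left[\sum XY - \frac{\sum X\sum Y}{n}\right], \quad S_x^2 = \frac{1}{n}\left[\sum X^2 - \frac{\left(\sum X\right)^2}{n}\right] \quad\text{and}\quad S_y^2 = \frac{1}{n}\left[\sum Y^2 - \frac{\left(\sum Y\right)^2}{n}\right]$$

$$b = r\times\frac{S_y}{S_x}, \qquad d = r\times\frac{S_x}{S_y}$$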
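To see that these formulae are interchangeable, the sketch below evaluates b five ways on the cotton/wool price indices of Problem 2. The coding constants A, h, B, k are arbitrary illustrative values (not from the notes); the last line shows that the coded slope differs from b unless it is multiplied by k/h, which is why the coded formula carries that factor.

```python
import math

# Price indices from Problem 2: cotton (X) and wool (Y)
X = [78, 77, 85, 88, 83, 83, 82, 78, 76, 83, 97, 98]
Y = [84, 80, 82, 83, 88, 90, 88, 91, 83, 89, 78, 96]
n = len(X)

def raw_slope(xs, ys):
    """(nΣxy − ΣxΣy) / (nΣx² − (Σx)²): slope of ys on xs from raw sums."""
    num = n * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys)
    return num / (n * sum(x * x for x in xs) - sum(xs) ** 2)

# 1) raw-sum form
b1 = raw_slope(X, Y)

# 2) deviation-from-mean form
x_bar, y_bar = sum(X) / n, sum(Y) / n
b2 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
      / sum((x - x_bar) ** 2 for x in X))

# 3) coded form: U = (X - A)/h, V = (Y - B)/k, then multiply by k/h
A, h, B, k = 83, 5, 88, 2          # illustrative constants, not from the notes
U = [(x - A) / h for x in X]
V = [(y - B) / k for y in Y]
b_vu = raw_slope(U, V)             # slope in coded units
b3 = (k / h) * b_vu                # converted back to the original units

# 4) variance/covariance form and 5) r·Sy/Sx form
Sxy = sum(x * y for x, y in zip(X, Y)) / n - x_bar * y_bar
Sx = math.sqrt(sum(x * x for x in X) / n - x_bar ** 2)
Sy = math.sqrt(sum(y * y for y in Y) / n - y_bar ** 2)
b4 = Sxy / Sx ** 2
b5 = (Sxy / (Sx * Sy)) * Sy / Sx

print([round(v, 4) for v in (b1, b2, b3, b4, b5)])  # all five agree: 0.0958
print(round(b_vu, 4))  # 0.2395: without k/h the coded slope is not b
```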

Method of least squares
A procedure in which the regression equation is obtained by minimizing the sum of squares of the residuals (errors). The parameter values obtained in this way are called least-squares estimates.
Normal equations of the least-squares regression line
The equation of the least-squares regression line Y on X is
$$Y = a + bX \qquad (1)$$
Multiplying equation (1) by the coefficient of “a” (which is 1):
$$Y = a + bX$$
Applying summation on both sides, we get
$$\sum Y = na + b\sum X$$
This is the normal equation for “a”.
Multiplying equation (1) by the coefficient of “b” (which is X):
$$XY = aX + bX^2$$
Applying summation on both sides, we get
$$\sum XY = a\sum X + b\sum X^2$$
This is the normal equation for “b”.
NOTE: We can write the regression coefficients as $b = b_{yx}$ and $d = b_{xy}$.
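The two normal equations form a 2 × 2 linear system in a and b. A minimal sketch of solving them by Cramer's rule with exact fractions follows; the function name `solve_normal_equations` is hypothetical, and the data are the first table of the positive-correlation numerical example later in these notes.

```python
from fractions import Fraction

def solve_normal_equations(xs, ys):
    """Solve  ΣY = na + bΣX  and  ΣXY = aΣX + bΣX²  for (a, b) by Cramer's rule.

    Fractions keep the arithmetic exact, so the result can be compared with
    hand calculations before any rounding.
    """
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    det = n * sxx - sx ** 2                    # determinant of [[n, ΣX], [ΣX, ΣX²]]
    a = Fraction(sy * sxx - sx * sxy, det)
    b = Fraction(n * sxy - sx * sy, det)
    return a, b, (n, sx, sy, sxy, sxx)

# First data set of the positive-correlation numerical example
X = [12, 15, 20, 27, 30, 39, 44]
Y = [8, 10, 12, 19, 25, 35, 40]
a, b, (n, sx, sy, sxy, sxx) = solve_normal_equations(X, Y)

# Both normal equations hold exactly
print(sy == n * a + b * sx, sxy == a * sx + b * sxx)   # True True
print(a, b)                                             # -39043/6016 6255/6016
print(round(float(a), 3), round(float(b), 3))
```

Cramer's rule is used here only because the system is 2 × 2; the closed-form b and a in the formulae above are the same solution written out.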

Error
The deviation of an observed value from the corresponding value given by the regression line, i.e. $e = Y - \hat{Y}$, is called an error (residual).

PROBLEM 1
Fit a regression line Y on X to the percentages of marks scored by 9 students in Statistics (X) & Economics (Y):
X 30 34 26 49 60 62 65 51 44
Y 27 18 34 28 26 30 32 30 28
SOLUTION
Equation of regression line Y on X is
Y = a + bX
We have $$a = \frac{\sum Y - b\sum X}{n} \quad\text{and}\quad b = \frac{n\sum XY - \sum X\sum Y}{n\sum X^2 - \left(\sum X\right)^2}$$

X Y XY X²
30 27 810 900
34 18 612 1156
26 34 884 676
49 28 1372 2401
60 26 1560 3600
62 30 1860 3844
65 32 2080 4225
51 30 1530 2601
44 28 1232 1936
Total: ΣX = 421, ΣY = 253, ΣXY = 11940, ΣX² = 21339


$$b = \frac{9(11940) - (421)(253)}{9(21339) - (421)^2}$$
$$b = \frac{107460 - 106513}{192051 - 177241}$$
$$b = \frac{947}{14810} \;\Rightarrow\; b = 0.064$$
$$a = \frac{253 - 0.064(421)}{9} \;\Rightarrow\; a = 25.12$$
Hence
The fitted regression line Y on X is
Ŷ = 25.12 + 0.064X
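The arithmetic of Problem 1 can be reproduced with exact fractions, which shows that b = 947/14810 before rounding. This sketch uses only the Python standard library; the variable names are illustrative.

```python
from fractions import Fraction

# Problem 1 data: Statistics (X) and Economics (Y), n = 9
X = [30, 34, 26, 49, 60, 62, 65, 51, 44]
Y = [27, 18, 34, 28, 26, 30, 32, 30, 28]
n = len(X)

num = n * sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y)   # 107460 - 106513
den = n * sum(x * x for x in X) - sum(X) ** 2                  # 192051 - 177241
b = Fraction(num, den)
a = (sum(Y) - b * sum(X)) / n

print(num, den, b)                               # 947 14810 947/14810
print(round(float(b), 3), round(float(a), 2))    # 0.064 25.12
```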
PROBLEM 2
Price indices of cotton (X) and wool (Y) are given below for the 12 months of a year. Obtain the equation of the regression line Y on X between the indices:
Price index of cotton (X) 78 77 85 88 83 83 82 78 76 83 97 98
Price index of Wool (Y) 84 80 82 83 88 90 88 91 83 89 78 96
SOLUTION
X Y XY X²
78 84 6552 6084
77 80 6160 5929
85 82 6970 7225
88 83 7304 7744
83 88 7304 6889
83 90 7470 6889
82 88 7216 6724
78 91 7098 6084
76 83 6308 5776
83 89 7387 6889
97 78 7566 9409
98 96 9408 9604
Total: ΣX = 1008, ΣY = 1032, ΣXY = 86743, ΣX² = 85246
Equation of regression line Y on X is
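Y = a + bX
We have $$a = \bar{Y} - b\bar{X} \quad\text{and}\quad b = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2}$$
$$\bar{X} = \frac{\sum X}{n} = \frac{1008}{12} \;\Rightarrow\; \bar{X} = 84$$
$$\bar{Y} = \frac{\sum Y}{n} = \frac{1032}{12} \;\Rightarrow\; \bar{Y} = 86$$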
Now $$b = \frac{86743 - 12(84)(86)}{85246 - 12(84)^2}$$

$$b = \frac{55}{574} \;\Rightarrow\; b = 0.096$$
and a = 86 – 0.096(84)
a = 77.936
Hence

The fitted regression line Y on X is Ŷ = 77.936 + 0.096X
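The intercept 77.936 comes from rounding b to 0.096 before computing a. The sketch below repeats Problem 2 in the mean-deviation form and compares that with the intercept obtained from the unrounded slope; it is only an illustration of how early rounding shifts a.

```python
# Price indices from Problem 2: cotton (X) and wool (Y)
X = [78, 77, 85, 88, 83, 83, 82, 78, 76, 83, 97, 98]
Y = [84, 80, 82, 83, 88, 90, 88, 91, 83, 89, 78, 96]
n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n                       # 84.0, 86.0
sxy = sum(x * y for x, y in zip(X, Y)) - n * x_bar * y_bar  # ΣXY − nX̄Ȳ = 55.0
sxx = sum(x * x for x in X) - n * x_bar ** 2                # ΣX² − nX̄² = 574.0
b = sxy / sxx

a_rounded_b = y_bar - round(b, 3) * x_bar   # b rounded to 0.096 first, as in the notes
a_exact_b = y_bar - b * x_bar               # b kept unrounded
print(round(b, 4), round(a_rounded_b, 3), round(a_exact_b, 3))   # 0.0958 77.936 77.951
```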


PROBLEM 3
Fit a regression line X on Y to the percentages of marks scored by 9 students in Statistics (X) & Economics (Y):
X 30 34 26 49 60 62 65 51 44
Y 27 18 34 28 26 30 32 30 28
SOLUTION
Equation of regression line X on Y is
X = c + dY
We have $$c = \frac{\sum X - d\sum Y}{n} \quad\text{and}\quad d = \frac{n\sum XY - \sum X\sum Y}{n\sum Y^2 - \left(\sum Y\right)^2}$$
X Y XY Y²
30 27 810 729
34 18 612 324
26 34 884 1156
49 28 1372 784
60 26 1560 676
62 30 1860 900
65 32 2080 1024
51 30 1530 900
44 28 1232 784
Total: ΣX = 421, ΣY = 253, ΣXY = 11940, ΣY² = 7277
$$d = \frac{9(11940) - (421)(253)}{9(7277) - (253)^2}$$
$$d = \frac{107460 - 106513}{65493 - 64009}$$
$$d = \frac{947}{1484} \;\Rightarrow\; d = 0.64$$
$$c = \frac{421 - 0.64(253)}{9} \;\Rightarrow\; c = 28.79$$
Hence
The fitted regression line X on Y is

X̂ = 28.79 + 0.64Y
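The line of X on Y is not obtained by solving the line of Y on X for X. This sketch uses the Problem 1 and 3 data and, at an illustrative value Y = 30, compares X from the inverted Y-on-X line with X from the least-squares X-on-Y line. Both lines still meet at the point of means.

```python
# Marks data of Problems 1 and 3
X = [30, 34, 26, 49, 60, 62, 65, 51, 44]
Y = [27, 18, 34, 28, 26, 30, 32, 30, 28]
n = len(X)
sx, sy = sum(X), sum(Y)
num = n * sum(x * y for x, y in zip(X, Y)) - sx * sy
b = num / (n * sum(x * x for x in X) - sx ** 2)   # slope of Y on X
d = num / (n * sum(y * y for y in Y) - sy ** 2)   # slope of X on Y
a, c = (sy - b * sx) / n, (sx - d * sy) / n

y0 = 30                           # illustrative value of Y
x_from_inverting = (y0 - a) / b   # solving Y = a + bX for X
x_from_x_on_y = c + d * y0        # least-squares line of X on Y
print(round(x_from_inverting, 2), round(x_from_x_on_y, 2))   # 76.32 47.98

# Both lines pass through (X̄, Ȳ)
x_bar, y_bar = sx / n, sy / n
print(abs(a + b * x_bar - y_bar) < 1e-9, abs(c + d * y_bar - x_bar) < 1e-9)   # True True
```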
PROBLEM 4
Price indices of cotton (X) and wool (Y) are given below for the 12 months of a year. Obtain the equation of the regression line X on Y between the indices:

Price index of cotton (X) 78 77 85 88 83 83 82 78 76 83 97 98
Price index of Wool (Y) 84 80 82 83 88 90 88 91 83 89 78 96

SOLUTION
X Y XY Y²
78 84 6552 7056
77 80 6160 6400
85 82 6970 6724
88 83 7304 6889
83 88 7304 7744
83 90 7470 8100
82 88 7216 7744
78 91 7098 8281
76 83 6308 6889
83 89 7387 7921
97 78 7566 6084
98 96 9408 9216
Total: ΣX = 1008, ΣY = 1032, ΣXY = 86743, ΣY² = 89048
Equation of regression line X on Y is
X = c + dY
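We have
$$c = \bar{X} - d\bar{Y} \quad\text{and}\quad d = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum Y^2 - n\bar{Y}^2}$$
$$\bar{X} = \frac{\sum X}{n} = \frac{1008}{12} \;\Rightarrow\; \bar{X} = 84$$
$$\bar{Y} = \frac{\sum Y}{n} = \frac{1032}{12} \;\Rightarrow\; \bar{Y} = 86$$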
Now $$d = \frac{86743 - 12(84)(86)}{89048 - 12(86)^2} = \frac{55}{296} \;\Rightarrow\; d = 0.186$$
and c = 84 – 0.186(86)
c = 68
Hence
The fitted regression line X on Y is
X̂ = 68 + 0.186Y

Correlation
• The interdependence of two or more related variables is called correlation.
(OR)
• Correlation measures the closeness of the relationship between the variables.

TYPES OF CORRELATION

1) Positive or direct correlation
2) Negative or inverse correlation
3) Zero or null correlation
Positive or Direct Correlation
When both variables move in the same direction, the correlation is said to be positive, i.e. if one variable increases the other also increases, and if one variable decreases the other also decreases. The value of the correlation coefficient for positive correlation lies between 0 and 1, i.e. 0 < r ≤ 1.
Examples
• The relationship between lung cancer and smoking habits.
• An increase in temperature in summer increases the sale of room coolers.
• The length of an iron bar increases as the temperature increases.
• An increase in the heights of children is accompanied by an increase in their weights.
NOTE:
In the case of positive correlation, the least-squares regression line has a positive slope.
Numerical Example
Here the two variables X & Y move in the same direction.
X 12 15 20 27 30 39 44
Y 8 10 12 19 25 35 40

X 40 35 32 30 26 23 18
Y 38 30 26 23 20 18 14
Negative or Inverse Correlation
When the two variables move in opposite directions, the correlation is said to be negative, i.e. if one variable increases the other decreases, and vice versa. The value of the correlation coefficient for negative correlation lies between −1 and 0, i.e. −1 ≤ r < 0.
Examples
• The volume of a gas decreases as the pressure increases.
• An increase in the supply of a commodity decreases its price.
• A decrease in temperature in winter increases the sale of overcoats.
NOTE
In the case of negative correlation, the least-squares regression line has a negative slope.
Numerical Example
Here the two variables X and Y move in opposite directions.

X 15 18 25 30 33 39 43
Y 40 35 30 25 20 16 10
Zero or Null Correlation
The absence of any linear relationship between the variables is called zero correlation; in this situation r = 0. Independent variables, for example, have zero correlation.
Example
The amount of rainfall and people's head sizes.
NOTE
In the case of zero correlation, one least-squares regression line is horizontal and the other least-squares regression line is vertical.
Numerical Example
Here a change in variable X has no effect on variable Y.
X 1 2 3 4 5 6 7
Y 7 7 7 7 7 7 7
Limits of the correlation coefficient
The correlation coefficient always lies between −1 and +1.
• If r = +1, the correlation is perfect positive.
• If r = −1, the correlation is perfect negative.
• If r = 0, there is no linear relationship between X and Y (this does not by itself imply that X and Y are independent).
Correlation Coefficient
A measure of the degree of correlation is given by the sample correlation coefficient, denoted by “r”. It measures the degree of linear interdependence between the variables.
The value of the sample correlation coefficient is an estimate of the population correlation coefficient, denoted by ρ (rho).
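Properties of Correlation coefficient
• The correlation coefficient is symmetrical w.r.t. X and Y, i.e. $r_{xy} = r_{yx}$.
• The correlation coefficient is the G.M. of the two regression coefficients, i.e.
$$r = \pm\sqrt{b \times d} \quad\text{OR}\quad r = \pm\sqrt{b_{xy} \times b_{yx}}$$
with the sign of r taken to be the common sign of the two regression coefficients.
• For two independent random variables, the correlation coefficient is zero.
• The correlation coefficient is independent of change of origin and scale (unit of measurement), i.e. $r_{xy} = r_{uv}$ where $U = \frac{X - A}{h}$ and $V = \frac{Y - B}{k}$, provided h and k have the same sign.
• The correlation coefficient lies between −1 and +1, i.e. $-1 \le r \le +1$.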

Scatter Diagram
If we plot the values of the variables on a graph, measuring one variable along the X-axis and the other along the Y-axis, the resulting set of points is called a scatter diagram.
PROBLEM 7
Calculate the correlation coefficient between the percentages of marks scored by 9 students in Statistics (X) & Economics (Y):
X 30 34 26 49 60 62 65 51 44
Y 27 18 34 28 26 30 32 30 28
SOLUTION

Correlation coefficient between X and Y is
$$r = \frac{n\sum XY - \sum X\sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$
X Y XY X² Y²
30 27 810 900 729
34 18 612 1156 324
26 34 884 676 1156
49 28 1372 2401 784
60 26 1560 3600 676
62 30 1860 3844 900
65 32 2080 4225 1024
51 30 1530 2601 900
44 28 1232 1936 784
Total: ΣX = 421, ΣY = 253, ΣXY = 11940, ΣX² = 21339, ΣY² = 7277

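$$r = \frac{9(11940) - (421)(253)}{\sqrt{\left[9(21339) - (421)^2\right]\left[9(7277) - (253)^2\right]}}$$
$$r = \frac{107460 - 106513}{\sqrt{(192051 - 177241)(65493 - 64009)}}$$
$$r = \frac{947}{\sqrt{(14810)(1484)}}$$
$$r = \frac{947}{\sqrt{21978040}} \;\Rightarrow\; r = 0.202$$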
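A short check of Problem 7 in plain Python: it recomputes r from the column totals and confirms that it equals the geometric mean of the two regression coefficients found in Problems 1 and 3 (b = 947/14810 and d = 947/1484).

```python
import math

# Marks data of Problems 1, 3 and 7
X = [30, 34, 26, 49, 60, 62, 65, 51, 44]
Y = [27, 18, 34, 28, 26, 30, 32, 30, 28]
n = len(X)
num = n * sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y)   # 947
den_x = n * sum(x * x for x in X) - sum(X) ** 2                # 14810
den_y = n * sum(y * y for y in Y) - sum(Y) ** 2                # 1484
r = num / math.sqrt(den_x * den_y)
print(round(r, 3))                          # 0.202

# r is the G.M. of the regression coefficients b (Y on X) and d (X on Y)
b, d = num / den_x, num / den_y
print(math.isclose(r, math.sqrt(b * d)))    # True
```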

PROBLEM 8
Price indices of cotton (X) and wool (Y) are given below for the 12 months of a year. Obtain the correlation coefficient between X and Y, and obtain the equations of the lines of regression between the indices:

Price index of cotton (X) 78 77 85 88 83 83 82 78 76 83 97 98
Price index of wool (Y) 84 80 82 83 88 90 88 91 83 89 78 96

SOLUTION
Correlation coefficient between X and Y is

$$r = \frac{\sum XY - n\bar{X}\bar{Y}}{\sqrt{\left(\sum X^2 - n\bar{X}^2\right)\left(\sum Y^2 - n\bar{Y}^2\right)}}$$

X Y XY X² Y²
78 84 6552 6084 7056
77 80 6160 5929 6400
85 82 6970 7225 6724
88 83 7304 7744 6889
83 88 7304 6889 7744
83 90 7470 6889 8100
82 88 7216 6724 7744
78 91 7098 6084 8281
76 83 6308 5776 6889
83 89 7387 6889 7921
97 78 7566 9409 6084
98 96 9408 9604 9216
Total: ΣX = 1008, ΣY = 1032, ΣXY = 86743, ΣX² = 85246, ΣY² = 89048
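$$\bar{X} = \frac{\sum X}{n} = \frac{1008}{12} \;\Rightarrow\; \bar{X} = 84$$
$$\bar{Y} = \frac{\sum Y}{n} = \frac{1032}{12} \;\Rightarrow\; \bar{Y} = 86$$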
Now
$$r = \frac{86743 - 12(84)(86)}{\sqrt{\left[85246 - 12(84)^2\right]\left[89048 - 12(86)^2\right]}}$$
$$r = \frac{55}{\sqrt{(574)(296)}}$$
$$r = \frac{55}{\sqrt{169904}} \;\Rightarrow\; r = 0.133$$
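A sketch that reruns Problem 8 with the mean form of the formula: 55/√169904 ≈ 0.1334, which rounds to 0.133 at three decimals. The same sums also give the two regression coefficients of Problems 2 and 4.

```python
import math

# Price indices: cotton (X) and wool (Y)
X = [78, 77, 85, 88, 83, 83, 82, 78, 76, 83, 97, 98]
Y = [84, 80, 82, 83, 88, 90, 88, 91, 83, 89, 78, 96]
n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n                        # 84.0, 86.0
sxy = sum(x * y for x, y in zip(X, Y)) - n * x_bar * y_bar   # 55.0
sxx = sum(x * x for x in X) - n * x_bar ** 2                 # 574.0
syy = sum(y * y for y in Y) - n * y_bar ** 2                 # 296.0
r = sxy / math.sqrt(sxx * syy)
print(round(r, 4))   # 0.1334

# The same sums give both regression coefficients (Problems 2 and 4)
b, d = sxy / sxx, sxy / syy
print(round(b, 3), round(d, 3))   # 0.096 0.186
```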
PROBLEM 9
Calculate the coefficient of correlation between the percentages of marks scored by 12 students in Statistics and Economics.

Marks in Statistics (X) 50 54 56 59 60 61 62 65 67 71 71 74
Marks in Economics (Y) 22 25 34 28 26 30 32 30 28 34 36 60

SOLUTION

Correlation coefficient between X and Y is
$$r = \frac{\sum XY - \dfrac{\sum X\sum Y}{n}}{\sqrt{\left[\sum X^2 - \dfrac{\left(\sum X\right)^2}{n}\right]\left[\sum Y^2 - \dfrac{\left(\sum Y\right)^2}{n}\right]}}$$

X Y XY X² Y²
50 22 1100 2500 484
54 25 1350 2916 625
56 34 1904 3136 1156
59 28 1652 3481 784
60 26 1560 3600 676
61 30 1830 3721 900
62 32 1984 3844 1024
65 30 1950 4225 900
67 28 1876 4489 784
71 34 2414 5041 1156
71 36 2556 5041 1296
74 60 4440 5476 3600
Total: ΣX = 750, ΣY = 385, ΣXY = 24616, ΣX² = 47470, ΣY² = 13385
$$r = \frac{24616 - \dfrac{(750)(385)}{12}}{\sqrt{\left[47470 - \dfrac{(750)^2}{12}\right]\left[13385 - \dfrac{(385)^2}{12}\right]}}$$
$$r = \frac{24616 - 24062.5}{\sqrt{(47470 - 46875)(13385 - 12352.08)}}$$
$$r = \frac{553.5}{\sqrt{(595)(1032.92)}} \;\Rightarrow\; r = 0.706$$

PROBLEM 10
Calculate the coefficient of correlation between supply (X) and demand (Y) from the following data:
Supply 400 200 700 100 500 300 600
Demand 60 30 70 10 40 20 50
SOLUTION
Correlation coefficient between X and Y is
$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \cdot \sum (Y - \bar{Y})^2}}$$

X Y (X − X̄) (X − X̄)² (Y − Ȳ) (Y − Ȳ)² (X − X̄)(Y − Ȳ)
400 60 0 0 20 400 0
200 30 −200 40000 −10 100 2000
700 70 300 90000 30 900 9000
100 10 −300 90000 −30 900 9000
500 40 100 10000 0 0 0
300 20 −100 10000 −20 400 2000
600 50 200 40000 10 100 2000
Total: ΣX = 2800, ΣY = 280, Σ(X − X̄) = 0, Σ(X − X̄)² = 280000, Σ(Y − Ȳ) = 0, Σ(Y − Ȳ)² = 2800, Σ(X − X̄)(Y − Ȳ) = 24000
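$$\bar{X} = \frac{\sum X}{n} = \frac{2800}{7} \;\Rightarrow\; \bar{X} = 400$$
$$\bar{Y} = \frac{\sum Y}{n} = \frac{280}{7} \;\Rightarrow\; \bar{Y} = 40$$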
Now
$$r = \frac{24000}{\sqrt{(280000)(2800)}} = \frac{24}{28} \;\Rightarrow\; r = 0.857$$
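The deviation table of Problem 10 can be generated directly from the data. This sketch builds the (X − X̄, Y − Ȳ) pairs, forms the three column totals and then r; the variable names are illustrative.

```python
import math

supply = [400, 200, 700, 100, 500, 300, 600]   # X
demand = [60, 30, 70, 10, 40, 20, 50]          # Y
n = len(supply)
x_bar, y_bar = sum(supply) / n, sum(demand) / n   # 400.0, 40.0

dev = [(x - x_bar, y - y_bar) for x, y in zip(supply, demand)]
s_xy = sum(dx * dy for dx, dy in dev)        # Σ(X − X̄)(Y − Ȳ)
s_xx = sum(dx * dx for dx, _ in dev)         # Σ(X − X̄)²
s_yy = sum(dy * dy for _, dy in dev)         # Σ(Y − Ȳ)²
r = s_xy / math.sqrt(s_xx * s_yy)
print(s_xy, s_xx, s_yy, round(r, 3))   # 24000.0 280000.0 2800.0 0.857
```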

MISCELLANEOUS EXERCISE
Q. 1 Obtain the correlation coefficient between the X and Y series. Also obtain the line of regression of Y on X from the following series:
X 125 137 156 112 107 136 123 106
Y 78 89 97 69 59 79 68 53

Q. 2 Compute the coefficient of correlation “r” from the following X and Y series, and also the equation of the line of regression of X on Y:
X 80 45 55 56 58 60 65 68 70 75 85
Y 82 56 50 48 60 62 64 65 70 74 90

Q. 3 Compute the means X̄ and Ȳ, the standard deviations of the X and Y series, the correlation coefficient and the regression coefficient of Y on X. Also find the regression line of Y on X from the following data:
X 5 6 7 8 9 10 11 12 13 14 15
Y 9 7 10 3 13 11 14 10 14 12 18

Q. 4 Find the coefficient of linear correlation between the variables X and Y, and also obtain the equations of the lines of regression of X on Y and Y on X from the following data:
X 1 3 4 6 8 9 11 14
Y 1 2 4 4 5 7 8 9

Q. 5 Calculate the coefficient of correlation from the following data, obtain the equations of the two lines of regression, X on Y and Y on X, and predict the value of Y when X = 700:
X 400 200 700 100 500 300 600

Y 50 60 20 70 40 30 10

Q. 6 Given below are the heights (X) and weights (Y) of a group of ten students. Calculate the correlation between them, obtain the equation of the line of regression of Y on X and predict the value of Y when X = 80.
X ( inches) 58 59 59 60 61 63 65 66 68 70
Y (pounds) 108 106 101 111 115 108 116 118 122 125

Q. 7 Calculate the coefficient of correlation between exports (X) and imports (Y) and obtain the
equation of the line of regression of X on Y from the following:
X (Exports) 52.3 51.5 53.4 54.1 55.2 50.0 45.5 55.7 59.4 60.5
Y (Imports) 62.4 63.5 58.4 56.3 55.5 62.5 65.4 58.5 55.1 50.2

Q. 8 Find the correlation between X and Y and the equation of the line of regression of Y on X, and predict the value of Y when X = 75.
X 65 63 67 68 62 70 66 67 69 71
Y 68 66 68 65 69 66 68 71 67 68

Q. 9
X 80 82 86 91 83 85 89 96 93
Y 145 140 130 124 133 127 120 110 116
Required: Calculate the coefficient of correlation and also the line of regression Y on X.

Q. 10 Fit the line of regression of Y on X from the following data:


X 5 7 6 12 17 19 20 29
Y 22 14 11 9 9 8 6 2

ANSWERS
Miscellaneous Exercise
Q. 1 Ŷ = −27.95 + 0.813X, r = 0.95

Q. 2 X̂ = 10.84 + 0.83Y, r = 0.92

Q. 3 X̄ = 10, Ȳ = 11, Sx = 3.162, Sy = 3.789, r = 0.698, b = 0.84, Ŷ = 2.6 + 0.84X

Q. 4 Ŷ = 0.52 + 0.64X, X̂ = −0.5 + 1.5Y, r = 0.977

Q. 5 Ŷ = 74.4 − 0.086X, Ŷ = 14.20, X̂ = 742.8 − 8.57Y, r = −0.857

Q. 6 Ŷ = 26.47 + 1.39X, Ŷ = 137.67, r = 0.908

Q. 7 X̂ = 104.31 − 0.86Y, r = −0.916

Q. 8 Ŷ = 75.35 − 0.116X, Ŷ = 67 approximately, r = −0.197

Q. 9 Ŷ = 301.75 − 2.001X, r = −0.96

Q. 10 Ŷ = 18.90 − 0.61X


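FORMULAE OF CORRELATION COEFFICIENT

Correlation Coefficient “r”

$$r = \frac{n\sum XY - \sum X\sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$

$$r = \frac{\sum XY - n\bar{X}\bar{Y}}{\sqrt{\left(\sum X^2 - n\bar{X}^2\right)\left(\sum Y^2 - n\bar{Y}^2\right)}}$$

$$r = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{\sqrt{\sum (X-\bar{X})^2 \cdot \sum (Y-\bar{Y})^2}}$$

$$r = \frac{n\sum D_x D_y - \sum D_x\sum D_y}{\sqrt{\left[n\sum D_x^2 - \left(\sum D_x\right)^2\right]\left[n\sum D_y^2 - \left(\sum D_y\right)^2\right]}}$$

where $D_x = X - A$ and $D_y = Y - B$ (A and B are arbitrary constants, e.g. assumed means).

$$r = \frac{n\sum UV - \sum U\sum V}{\sqrt{\left[n\sum U^2 - \left(\sum U\right)^2\right]\left[n\sum V^2 - \left(\sum V\right)^2\right]}}$$

where $U = \dfrac{X - A}{h}$ and $V = \dfrac{Y - B}{k}$, with h and k of the same sign.

$$r = \frac{S_{xy}}{S_x S_y}$$

where
$$S_{xy} = \frac{1}{n}\left[\sum XY - \frac{\sum X\sum Y}{n}\right], \quad S_x^2 = \frac{1}{n}\left[\sum X^2 - \frac{\left(\sum X\right)^2}{n}\right] \quad\text{and}\quad S_y^2 = \frac{1}{n}\left[\sum Y^2 - \frac{\left(\sum Y\right)^2}{n}\right]$$

$$r = b\times\frac{S_x}{S_y}, \qquad r = d\times\frac{S_y}{S_x}$$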
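Unlike the regression coefficients, r needs no k/h correction when the data are coded. The sketch below evaluates r by five of the forms above on the cotton/wool price indices; the assumed means A, B and widths h, k are illustrative values, not from the notes, and the helper `r_raw` is hypothetical.

```python
import math

# Price indices: cotton (X) and wool (Y)
X = [78, 77, 85, 88, 83, 83, 82, 78, 76, 83, 97, 98]
Y = [84, 80, 82, 83, 88, 90, 88, 91, 83, 89, 78, 96]
n = len(X)

def r_raw(xs, ys):
    """r = (nΣxy − ΣxΣy) / sqrt([nΣx² − (Σx)²][nΣy² − (Σy)²])."""
    sx, sy = sum(xs), sum(ys)
    num = n * sum(x * y for x, y in zip(xs, ys)) - sx * sy
    den = (n * sum(x * x for x in xs) - sx ** 2) * (n * sum(y * y for y in ys) - sy ** 2)
    return num / math.sqrt(den)

A, B, h, k = 83, 88, 5, 2                 # illustrative assumed means and widths
Dx, Dy = [x - A for x in X], [y - B for y in Y]
U, V = [dx / h for dx in Dx], [dy / k for dy in Dy]

x_bar, y_bar = sum(X) / n, sum(Y) / n
Sx = math.sqrt(sum(x * x for x in X) / n - x_bar ** 2)
Sy = math.sqrt(sum(y * y for y in Y) / n - y_bar ** 2)
Sxy = sum(x * y for x, y in zip(X, Y)) / n - x_bar * y_bar
b = Sxy / Sx ** 2                         # regression coefficient of Y on X

forms = [r_raw(X, Y), r_raw(Dx, Dy), r_raw(U, V), Sxy / (Sx * Sy), b * Sx / Sy]
print([round(v, 4) for v in forms])       # all equal 0.1334
```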
