
Correlation Analysis

Definition
The extent to which two or more things are related to one another (co-related).
OR
If a change in one variable is accompanied by a change in the other variable, the variables are said to be correlated.
Correlation is a bivariate measure of the association (strength) of the linear relationship between two variables.
Thus, correlation analysis is a statistical tool used to describe the degree to which one variable is linearly related to another.
Univariate distribution: a distribution involving only one variable.
In some situations our focus is simultaneously on two (or more) variables; with two variables this is a bivariate distribution.
The movement in one variable is accompanied by movements in the other variable.
EXAMPLE: the ages of husbands and wives move together; so do the price and demand of a commodity.








Types of Correlation:

1. Positive and negative (based on the direction of change)
2. Simple, partial and multiple (based on the number of variables)
3. Linear and non-linear (based on the proportion of change)
Positive and negative correlation
If the two variables deviate in the same direction, i.e., if an increase (or decrease) in one variable results in a corresponding increase (or decrease) in the other, the correlation is said to be direct or positive.
EXAMPLE: income and expenditure;
height and weight of a group of persons.
If the two variables deviate in opposite directions, i.e., if an increase (or decrease) in one variable results in a corresponding decrease (or increase) in the other, the correlation is said to be inverse or negative.
EXAMPLE: price and demand of a commodity;
volume and pressure of a perfect gas.

Correlation is said to be perfect if the
deviation in one variable is followed by a
corresponding and proportional deviation in
the other.
Simple, Partial and Multiple correlation
Simple correlation: only two variables are involved, and the relationship between those two variables is studied.
Partial correlation: more than two variables are considered, but the relationship between two of them is studied while the other variables are held constant.
EXAMPLE: Production of wheat depends on many factors such as rainfall, quality of seed and manure. If the relationship between production of wheat per hectare and quality of seed is studied with rainfall and manure held constant, the correlation is said to be partial.
Multiple correlation: the relationship between one variable and two or more other variables, taken together, is studied simultaneously.
In the above example, if we study the relationship between production and all the other factors simultaneously, the relationship is called multiple correlation.
Linear and non-linear correlation
Linear correlation: when the two variables are plotted, the points lie along a straight line (the ratio of change between the two variables is constant).
Non-linear correlation: when the two variables are plotted, the points follow a curve (the ratio of change between the two variables is not constant).


Methods of studying correlation
Graphical method
Mathematical method

Graphical method
( scatter diagram)
Scatter diagram is mainly used to represent
bivariate data.
These diagrams indicate the existence of a
relationship, as well as the strength of that
relationship.
It is an easy way to highlight any relationship
that may exist and its type, whether direct or
inverse.

Steps in drawing a scatter diagram
Collect data on two variables, one independent and the other dependent.
Draw a diagram with the cause or independent variable labeled on the horizontal (X) axis and the effect or dependent variable labeled on the vertical (Y) axis.
Plot each (x, y) pair of observations as a point.
11-2 Scatter Plots - Example
Construct a scatter plot for the data obtained in a study of age and blood pressure of six randomly selected subjects:

Subject   Age, x   Pressure, y
A         43       128
B         48       120
C         56       135
D         61       143
E         67       141
F         70       152
[Figure: scatter plot of pressure (y) against age (x) for the six subjects, showing a positive relationship]
11-2 Scatter Plots - Other Examples
[Figure: scatter plot of final grade against number of absences, showing a negative relationship]
[Figure: scatter plot of y against x, showing no relationship]
Merits: Scatter diagram
Simple to draw and understand.
An attractive method of finding correlation.
Gives a rough idea at a glance of whether the correlation is positive or negative.
Not influenced by extreme items.
The first step in studying correlation.

Demerits: Scatter diagram
The exact degree of correlation cannot be calculated.




Mathematical method
(Correlation Coefficient)
Provides a numerical description of the strength or degree to which two variables are linearly related.
Karl Pearson's coefficient of correlation
Spearman's rank coefficient of correlation

Sample correlation coefficient: r.
Population correlation coefficient: ρ.

Karl Pearson's coefficient of correlation
It is a quantity that measures the amount of linear relationship between the variables.
Mathematically,

$$ r = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y} $$

where σ_X is the standard deviation of X, σ_Y is the standard deviation of Y, and cov(X, Y) is the covariance of X and Y:

$$ \operatorname{cov}(X, Y) = \frac{1}{n} \sum (x - \bar{x})(y - \bar{y}) $$
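As a minimal sketch of the definition above, the Python below computes cov(X, Y), σ_X and σ_Y with divisor n (the population form; this choice is an assumption, and the divisors cancel in r either way) for the age and blood pressure data from the scatter-plot example. The function name `pearson_r_from_definition` is hypothetical.

```python
import math

def pearson_r_from_definition(xs, ys):
    """r = cov(X, Y) / (sigma_X * sigma_Y), each computed with divisor n."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance: average product of deviations from the two means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    # Standard deviations (population form, divisor n)
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)

age = [43, 48, 56, 61, 67, 70]
pressure = [128, 120, 135, 143, 141, 152]
r = pearson_r_from_definition(age, pressure)
print(round(r, 3))  # 0.897
```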
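Formula for the Correlation Coefficient r

$$ r = \frac{n \sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n \sum x^2 - \left(\sum x\right)^2\right]\left[n \sum y^2 - \left(\sum y\right)^2\right]}} $$

where n is the number of data pairs.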
Range of Values for the Correlation Coefficient
-1 ≤ r ≤ +1
r close to -1: strong negative relationship
r close to 0: no linear relationship
r close to +1: strong positive relationship
Interpretation

Interpretation                    Value of r
High positive correlation         0.75 ≤ r < 1
Moderate positive correlation     0.50 ≤ r < 0.75
Low positive correlation          0 < r < 0.50

Interpretation                    Value of r
High negative correlation         -1 < r ≤ -0.75
Moderate negative correlation     -0.75 < r ≤ -0.50
Low negative correlation          -0.50 < r < 0
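The interpretation tables can be sketched as a small lookup. This is an illustrative helper (the name `interpret_r` is hypothetical); the labels for r = ±1 and r = 0 are taken from the "perfect correlation" and "no linear relationship" descriptions earlier in these notes.

```python
def interpret_r(r):
    """Map a correlation coefficient to the verbal bands in the tables above."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    if r == 1:
        return "Perfect positive correlation"
    if r == -1:
        return "Perfect negative correlation"
    if r == 0:
        return "No linear correlation"
    strength = abs(r)
    direction = "positive" if r > 0 else "negative"
    if strength >= 0.75:
        level = "High"
    elif strength >= 0.50:
        level = "Moderate"
    else:
        level = "Low"
    return f"{level} {direction} correlation"

print(interpret_r(0.897))  # High positive correlation
print(interpret_r(-0.6))   # Moderate negative correlation
```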
Karl Pearson's correlation technique works best with linear relationships.
It does not work well with curvilinear relationships (in which the relationship does not follow a straight line).
Example: age and health care.
They are related, but the relationship doesn't follow a straight line: young children and older people both tend to use much more health care than teenagers or young adults.
(Multiple regression is used to examine curvilinear relationships.)
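A small sketch of this limitation, using made-up illustrative values rather than data from these notes: the points below follow y = x² exactly, so y is completely determined by x, yet Pearson's r comes out as zero because the relationship is U-shaped rather than linear.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from sums of squared and cross-product deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]   # exact dependence, but along a curve
r_curve = pearson_r(xs, ys)
print(r_curve)  # 0.0
```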
Properties of correlation coefficient
The correlation coefficient measures the strength or degree of linear relationship.
The value of r lies between -1 and +1.
r is independent of change of origin (subtracting a constant from every value of X and Y) and of change of scale (dividing or multiplying every value of X and Y by a constant), provided the two scale constants have the same sign. That is,

$$ U = \frac{X - A}{I}, \qquad V = \frac{Y - B}{J} \quad \Longrightarrow \quad r(X, Y) = r(U, V) $$

The correlation coefficient is symmetric: the relation between x and y is the same as that between y and x,

$$ r_{xy} = r_{yx} $$
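These properties can be checked numerically on the age and blood pressure data. The constants A, I, B, J below are arbitrary illustrative choices (assumptions, not from the notes); the last check shows that scale constants of opposite sign flip the sign of r.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from sums of squared and cross-product deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

age = [43, 48, 56, 61, 67, 70]
pressure = [128, 120, 135, 143, 141, 152]
r_xy = pearson_r(age, pressure)

# Change of origin and scale: U = (X - A) / I, V = (Y - B) / J
A, I = 50, 5      # hypothetical constants
B, J = 130, 2
u = [(x - A) / I for x in age]
v = [(y - B) / J for y in pressure]
print(math.isclose(r_xy, pearson_r(u, v)))              # True

# Symmetry: r_xy = r_yx
print(math.isclose(r_xy, pearson_r(pressure, age)))     # True

# Opposite-sign scale constants reverse the sign of r
v_neg = [(y - B) / -J for y in pressure]
print(math.isclose(pearson_r(u, v_neg), -r_xy))         # True
```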

11-3 Correlation Coefficient - Example (Verify)
Compute the correlation coefficient for the age and blood pressure data.

$$ \sum x = 345, \quad \sum y = 819, \quad \sum xy = 47{,}634, \quad \sum x^2 = 20{,}399, \quad \sum y^2 = 112{,}443 $$

Substituting in the formula for r gives r = 0.897.
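The substitution can be written out step by step (n = 6) to show where 0.897 comes from:

$$
r = \frac{6(47634) - (345)(819)}{\sqrt{\left[6(20399) - 345^2\right]\left[6(112443) - 819^2\right]}}
  = \frac{285804 - 282555}{\sqrt{(122394 - 119025)(674658 - 670761)}}
  = \frac{3249}{\sqrt{3369 \times 3897}}
  \approx \frac{3249}{3623.4}
  \approx 0.897
$$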
Example 1
The following table shows 10 years of data on advertisement expenditure and sales of a company. Calculate the correlation coefficient between these two variables for this company.

S.No.   Ad. expenditure   Sales
1       50                700
2       50                650
3       50                600
4       40                500
5       30                450
6       20                400
7       20                300
8       15                250
9       10                210
10      5                 200
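A sketch of Example 1 using the raw-sum formula for r given earlier (n, Σx, Σy, Σxy, Σx², Σy²). The function name `pearson_r_raw_sums` is hypothetical.

```python
import math

def pearson_r_raw_sums(xs, ys):
    """r = [nΣxy - ΣxΣy] / sqrt([nΣx² - (Σx)²][nΣy² - (Σy)²])."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return numerator / denominator

ad_expenditure = [50, 50, 50, 40, 30, 20, 20, 15, 10, 5]
sales = [700, 650, 600, 500, 450, 400, 300, 250, 210, 200]
r1 = pearson_r_raw_sums(ad_expenditure, sales)
print(round(r1, 3))  # 0.976
```

Working with raw sums avoids computing deviations, which is why this form is convenient for hand calculation from a table.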
Example 2
Calculate the correlation coefficient from the following data:

Export of raw cotton (crores)   Export of manufactured goods (crores)
42                              56
44                              49
58                              53
55                              58
89                              65
98                              76
66                              58
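Correlation of Bivariate grouped data
Here, frequencies are involved. With class mid-points x and y, cell frequencies f and N = Σf:

$$ r = \frac{N \sum fxy - \left(\sum fx\right)\left(\sum fy\right)}{\sqrt{\left[N \sum fx^2 - \left(\sum fx\right)^2\right]\left[N \sum fy^2 - \left(\sum fy\right)^2\right]}} $$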
Example 3
Find the coefficient of correlation between age and sum assured from the following table:

                  Sum assured (in Rs.)
Age group   10000   20000   30000   40000   50000
20-30       4       6       3       7       1
30-40       2       8       15      7       1
40-50       3       9       12      6       2
50-60       8       4       2       -       -
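A sketch of the grouped-data formula applied to Example 3. It assumes each age class is represented by its mid-point (25, 35, 45, 55) and treats the "-" cells as zero frequency; the function name `grouped_pearson_r` is hypothetical. Because r is unaffected by change of origin and scale, using step deviations instead of raw mid-points would give the same result.

```python
import math

def grouped_pearson_r(x_mids, y_mids, freq):
    """Pearson's r for a bivariate frequency table.

    freq[i][j] is the frequency of the cell (x_mids[i], y_mids[j]); each cell
    is treated as f observations located at its class mid-point.
    """
    N = sfx = sfy = sfxy = sfx2 = sfy2 = 0
    for x, row in zip(x_mids, freq):
        for y, f in zip(y_mids, row):
            N += f
            sfx += f * x
            sfy += f * y
            sfxy += f * x * y
            sfx2 += f * x * x
            sfy2 += f * y * y
    numerator = N * sfxy - sfx * sfy
    denominator = math.sqrt((N * sfx2 - sfx ** 2) * (N * sfy2 - sfy ** 2))
    return numerator / denominator

age_mids = [25, 35, 45, 55]                        # mid-points of 20-30 ... 50-60
sum_assured = [10000, 20000, 30000, 40000, 50000]
freq = [
    [4, 6, 3, 7, 1],
    [2, 8, 15, 7, 1],
    [3, 9, 12, 6, 2],
    [8, 4, 2, 0, 0],   # "-" in the table treated as 0
]
r3 = grouped_pearson_r(age_mids, sum_assured, freq)
print(round(r3, 3))  # -0.256
```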
Example 4
Find the coefficient of correlation from the following bivariate frequency distribution:

Sales revenue   Advertising expenditure (Rs. 000)
(Rs. lakhs)     5-10   10-15   15-20   20-25
75-125          4      1       -       -
125-175         7      6       2       1
175-225         1      3       4       2
225-275         1      1       3       4
Coefficient of determination
The coefficient of determination represents the percentage of variation in the dependent variable explained by the independent variable.
The coefficient of determination is denoted by r².

$$ r^2 = \frac{\text{explained variance}}{\text{total variance}} $$

r² lies between 0 and 1.


Interpretation
If r = 0.7, then r² = 0.49.
This implies that 49% of the variation in the dependent variable can be attributed to the independent variable.
In other words, 49% of the variability has been explained and the remaining 51% is unaccounted for.
A value of r² close to one indicates that nearly all the variability in the dependent variable is accounted for by the independent variable.

In Example 1, r = 0.976.
Coefficient of determination: r² = 0.95.
This means 95% of the variation in sales is explained by advertising expenditure.

Coefficient of non-determination
The coefficient of non-determination is the ratio of unexplained variation to total variation.
It is denoted by K:

$$ K = 1 - r^2 = \frac{\text{unexplained variance}}{\text{total variance}} $$
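A short sketch computing r² and K for the two values of r used above (r = 0.7 from the interpretation example and r = 0.976 from Example 1). The helper name `determination` is hypothetical.

```python
def determination(r):
    """Return (r², K), where K = 1 - r² is the coefficient of non-determination."""
    r2 = r ** 2
    return r2, 1 - r2

for r in (0.7, 0.976):
    r2, k = determination(r)
    print(f"r = {r}: r^2 = {r2:.2f}, K = {k:.2f}")
# r = 0.7: r^2 = 0.49, K = 0.51
# r = 0.976: r^2 = 0.95, K = 0.05
```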
Standard error
If r is the correlation coefficient between two variables X and Y for a sample of n observations, the standard error of r is given by:

$$ SE(r) = \frac{1 - r^2}{\sqrt{n}} $$

Probable error
Definition: The probable error of the coefficient of correlation is an amount which, when added to and subtracted from the mean correlation coefficient, gives limits within which there is an even chance that the coefficient of correlation of a series selected at random will fall.

$$ PE(r) = 0.6745 \, \frac{1 - r^2}{\sqrt{N}} $$

where r = coefficient of correlation and N = number of pairs.
Limits of the population correlation coefficient are:

$$ \rho = r \pm PE(r) $$

where ρ is the population correlation coefficient.

Functions of probable error
If |r| > 6 PE(r), r is significant.
If |r| < PE(r), r is insignificant.


Example 5
A student calculates the value of r as 0.7 when N = 5 and concludes that r is highly significant. Is he correct? Also calculate the interval estimate for the population correlation coefficient.
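A sketch of Example 5 using the SE, PE and significance rules above. The notes give rules only for |r| > 6 PE(r) and |r| < PE(r), so the in-between case is reported plainly as "not significant by the 6 PE rule". The function names are hypothetical.

```python
import math

def standard_error(r, n):
    """SE(r) = (1 - r²) / sqrt(n)."""
    return (1 - r ** 2) / math.sqrt(n)

def probable_error(r, n):
    """PE(r) = 0.6745 * SE(r)."""
    return 0.6745 * standard_error(r, n)

r, n = 0.7, 5
pe = probable_error(r, n)
print(f"PE(r) = {pe:.4f}")  # PE(r) = 0.1538

if abs(r) > 6 * pe:
    verdict = "significant"
elif abs(r) < pe:
    verdict = "insignificant"
else:
    verdict = "not significant by the 6 PE rule"
print(verdict)
print(f"rho lies between {r - pe:.4f} and {r + pe:.4f}")
```

Since 6 PE(r) is about 0.92, which exceeds 0.7, the student's conclusion that r is highly significant does not hold.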

Rank Correlation
It is another measure of correlation.
It is used when the data cannot be measured quantitatively but can only be ranked in order on the basis of some attribute.
It helps to correlate two sets of qualitative observations that can be ranked, such as productivity ratings (poor, fair, good, very good, etc.) given to a group of workers by two independent observers.

Spearman's rank correlation coefficient is a distribution-free measure (it makes no assumptions about the parameters of the population), since no strict assumptions are made about the form of the population from which the sample observations are drawn.
Example
A large manufacturing firm wants to determine whether a relationship exists between the number of work-hours an employee misses per year and the employee's annual wages (in thousands of rupees). A sample of 15 employees produced the data shown in the following table.
Calculate Spearman's rank correlation coefficient as a measure of the strength of the relationship between work-hours missed and annual wages.

Employee   Hours   Wages
1          49      15.8
2          36      17.5
3          127     11.3
4          91      13.2
5          72      13.0
6          34      14.5
7          155     11.8
8          11      20.2
9          191     10.8
10         6       18.8
11         63      13.8
12         79      12.7
13         43      15.1
14         57      24.2
15         82      13.9
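When there are no tied ranks, Spearman's coefficient is usually computed with the standard formula R = 1 - 6Σd² / (n(n² - 1)), where d is the difference between each individual's two ranks. The sketch below applies it to the employee data (neither column has repeated values). Ranking both columns in ascending order is an assumption; ranking both in descending order gives the same R. The helper names are hypothetical.

```python
def ranks_no_ties(values):
    """Ascending ranks (1 = smallest); assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman_no_ties(xs, ys):
    """R = 1 - 6Σd² / (n(n² - 1))."""
    n = len(xs)
    rx, ry = ranks_no_ties(xs), ranks_no_ties(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

hours = [49, 36, 127, 91, 72, 34, 155, 11, 191, 6, 63, 79, 43, 57, 82]
wages = [15.8, 17.5, 11.3, 13.2, 13.0, 14.5, 11.8, 20.2, 10.8, 18.8,
         13.8, 12.7, 15.1, 24.2, 13.9]
rs_wages = spearman_no_ties(hours, wages)
print(round(rs_wages, 3))  # -0.854
```

Here Σd² = 1038, so R = 1 - 6228/3360, a strong negative rank correlation between hours missed and wages.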
Example
Ten competitors in a beauty contest were ranked by three judges in the following order:

First judge    1   6   5   10   3   2    4   9    7   8
Second judge   3   5   8   4    7   10   2   1    6   9
Third judge    6   4   9   8    1   2    3   10   5   7

Use the method of rank correlation to determine which pair of judges has the nearest approach to common tastes in beauty.
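Because the judges already supply ranks, R = 1 - 6Σd² / (n(n² - 1)) can be applied directly to each pair. The sketch below compares all three pairs and picks the largest R; the dictionary layout and function name are illustrative assumptions.

```python
from itertools import combinations

def spearman_from_ranks(r1, r2):
    """R = 1 - 6Σd² / (n(n² - 1)) for two lists of untied ranks."""
    n = len(r1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

judges = {
    "First": [1, 6, 5, 10, 3, 2, 4, 9, 7, 8],
    "Second": [3, 5, 8, 4, 7, 10, 2, 1, 6, 9],
    "Third": [6, 4, 9, 8, 1, 2, 3, 10, 5, 7],
}

results = {}
for a, b in combinations(judges, 2):
    results[(a, b)] = spearman_from_ranks(judges[a], judges[b])
    print(f"{a} & {b}: R = {results[(a, b)]:.3f}")

best = max(results, key=results.get)
print("Nearest approach to common tastes:", best)
```

The First and Third judges have the highest coefficient (about 0.636); the other two pairs are negative.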
Tied / Repeated Ranks
When two or more individuals get the same rank with respect to either of the two characteristics being studied, a common rank is assigned to the repeated observations.
This common rank is the average of the ranks these observations would have received if they had been different from one another.
EXAMPLE: If two observations are ranked equal at the fourth place, both are given the rank (4 + 5)/2 = 4.5, the next observation is ranked 6, and so on.
The rank correlation coefficient is then given by:

$$ R_s = 1 - \frac{6 \left\{ \sum d^2 + \frac{1}{12} m_1 (m_1^2 - 1) + \frac{1}{12} m_2 (m_2^2 - 1) + \cdots \right\}}{n (n^2 - 1)} $$

where d is the difference between the two ranks of each individual and m_i is the number of times the i-th repeated item is repeated, i = 1, 2, ...
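The averaging rule and the correction terms m(m² - 1)/12 can be sketched as two small helpers (hypothetical names). The average rank of a value v is the count of values smaller than v plus (number of copies of v + 1)/2, which reproduces the "(4 + 5)/2 = 4.5, next one ranked 6" example with illustrative values.

```python
from collections import Counter

def average_ranks(values):
    """Ascending ranks; tied values share the mean of the ranks they occupy."""
    return [sum(w < v for w in values) + (values.count(v) + 1) / 2
            for v in values]

def tie_correction(values):
    """Sum of m(m² - 1)/12 over each group of m repeated values."""
    return sum(m * (m * m - 1) / 12 for m in Counter(values).values())

scores = [10, 20, 30, 45, 45, 50]   # illustrative values with a tie at 4th place
print(average_ranks(scores))        # [1.0, 2.0, 3.0, 4.5, 4.5, 6.0]
print(tie_correction(scores))       # 0.5
```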
Example
The scores of 10 students on the midterm examination and the final examination are given. Compute the rank correlation coefficient.

Student   Midterm score   Final exam score
Neha      82              94
Chani     81              92
Aditi     80              85
Sumit     68              75
Aditya    70              73
Mohit     92              95
Reha      76              69
Rahul     80              86
Sakshi    86              90
Charu     62              69
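A sketch applying the tied-rank formula to the midterm and final scores. Each column has one pair of tied values (midterm 80, final 69), so the correction adds 0.5 for each. The helper names are hypothetical.

```python
from collections import Counter

def average_ranks(values):
    """Ascending ranks; tied values share the mean of the ranks they occupy."""
    return [sum(w < v for w in values) + (values.count(v) + 1) / 2
            for v in values]

def tie_correction(values):
    """Sum of m(m² - 1)/12 over each group of m repeated values."""
    return sum(m * (m * m - 1) / 12 for m in Counter(values).values())

def spearman_with_ties(xs, ys):
    """Rs = 1 - 6{Σd² + Σ m(m² - 1)/12} / (n(n² - 1))."""
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))

midterm = [82, 81, 80, 68, 70, 92, 76, 80, 86, 62]
final = [94, 92, 85, 75, 73, 95, 69, 86, 90, 69]
rs_scores = spearman_with_ties(midterm, final)
print(round(rs_scores, 3))  # 0.891
```

Here Σd² = 17 and the correction totals 1, so Rs = 1 - 108/990.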
Example
A firm held an examination for 8 applicants for a clerical post. From the marks obtained by the applicants in the accounts and statistics papers, compute the rank correlation coefficient.

Applicant   Marks in accounts   Marks in statistics
A           15                  40
B           20                  30
C           28                  50
D           12                  30
E           40                  20
F           60                  10
G           20                  30
H           80                  60
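The same tied-rank sketch (hypothetical helper names, repeated so the block is self-contained) handles this example, where accounts has one value repeated twice (20) and statistics has one value repeated three times (30), giving correction terms of 0.5 and 2.

```python
from collections import Counter

def average_ranks(values):
    """Ascending ranks; tied values share the mean of the ranks they occupy."""
    return [sum(w < v for w in values) + (values.count(v) + 1) / 2
            for v in values]

def tie_correction(values):
    """Sum of m(m² - 1)/12 over each group of m repeated values."""
    return sum(m * (m * m - 1) / 12 for m in Counter(values).values())

def spearman_with_ties(xs, ys):
    """Rs = 1 - 6{Σd² + Σ m(m² - 1)/12} / (n(n² - 1))."""
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))

accounts = [15, 20, 28, 12, 40, 60, 20, 80]
statistics_marks = [40, 30, 50, 30, 20, 10, 30, 60]
rs_applicants = spearman_with_ties(accounts, statistics_marks)
print(rs_applicants)  # 0.0
```

With Σd² = 81.5 and a total correction of 2.5, Rs = 1 - 6(84)/504 = 0, indicating no rank correlation between the two papers.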
