Analyzing Linear Relationships Between Variables

CORRELATION ANALYSIS
Introduction
• Correlation measures the LINEAR association between two random variables
• Correlation analysis shows us how to determine both the nature and strength of the relationship between two variables
• When variables are dependent on time, correlation is applied
• Correlation lies between -1 and +1

Examples
• Family income and expenditure on luxury items
• Yield of a crop and quantity of fertilizer used
• Sales revenue and expenses incurred on advertising
• Frequency of smoking and lung damage
• Weight and height of individuals

• A zero correlation indicates that there is no linear relationship between the variables
• A correlation of -1 indicates a perfect negative correlation
• A correlation of +1 indicates a perfect positive correlation
Types of Correlation
• There are three types of correlation classification: Type 1, Type 2 and Type 3

Type 1
Positive, Negative, No correlation, Perfect
• Positive: if two related variables are such that when one increases (decreases), the other also increases (decreases)
• Negative: if two variables are such that when one increases (decreases), the other decreases (increases)
• No correlation: if both the variables are independent
• Perfect: the correlation is exactly +1 or -1
Type 2
Linear, Non-linear
• Linear: when plotted on a graph, the points tend to lie along a straight line
• Non-linear: when plotted on a graph, the points do not follow a straight line

Type 3
Simple, Multiple, Partial
• Simple: one independent and one dependent variable
• Multiple: one dependent variable and more than one independent variable
• Partial: one dependent variable and more than one independent variable, but only one independent variable is considered and the other independent variables are held constant
Methods of Studying Correlation
• Scatter Diagram Method
• Karl Pearson’s Coefficient of Correlation Method
• Spearman’s Rank Correlation Method

Correlation: Linear Relationships
Strong relationship = good linear fit
[Figure: two scatter plots of Symptom Index against dose, one for Drug A (dose in mg), labelled "Very good fit", and one for Drug B (dose in mg), labelled "Moderate fit"]

Points clustered closely around a line show a strong correlation. The line is then a good predictor (good fit) for the data. The more spread out the points, the weaker the correlation and the poorer the fit. The line is a REGRESSION line (Y = bX + a).
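The regression line Y = bX + a is usually fitted by least squares. The sketch below is a minimal Python illustration of that idea, assuming ordinary least squares; the function name `fit_line` is hypothetical, and the sample data are the age and glucose readings from the Problem later in this section.

```python
def fit_line(xs, ys):
    """Return (b, a) for the least-squares line Y = bX + a."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope b: co-variation of X and Y divided by the variation of X
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all X values are equal, so the slope is undefined")
    b = sxy / sxx
    # Intercept a: the least-squares line passes through (mean_x, mean_y)
    a = mean_y - b * mean_x
    return b, a


ages = [43, 21, 25, 42, 57, 59]
glucose = [99, 65, 79, 75, 87, 81]
b, a = fit_line(ages, glucose)
print(f"Y = {b:.4f}X + {a:.4f}")
```

The tighter the points cluster around this line, the closer |r| is to 1.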
Coefficient of Correlation
• A measure of the strength of the linear relationship between two variables, defined as the (sample) covariance of the variables divided by the product of their (sample) standard deviations
• Represented by “r”
• r lies between -1 and +1, that is, -1 ≤ r ≤ +1
• r conveys both magnitude (strength) and direction
• The + and – signs are used for positive and negative linear correlations, respectively
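$$r_{xy} = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$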
The top (numerator) captures the shared variability of the X and Y variables; the bottom (denominator) captures the individual variability of the X and Y variables.
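The computational formula for r translates almost term for term into code. A minimal Python sketch (the name `pearson_r` is hypothetical, not a library function), applied to the age and glucose data from the Problem that follows:

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r via the raw-sums (computational) formula."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    # Numerator: shared variability of X and Y
    numerator = n * sum_xy - sum_x * sum_y
    # Denominator: individual variability of X and of Y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator


ages = [43, 21, 25, 42, 57, 59]
glucose = [99, 65, 79, 75, 87, 81]
print(round(pearson_r(ages, glucose), 4))  # 0.5298
```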
Problem
Find the value of the correlation coefficient from the following table:
Subject   Age (x)   Glucose level (y)
1         43        99
2         21        65
3         25        79
4         42        75
5         57        87
6         59        81
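Step 1: Make a chart. Use the given data, and add three more columns: xy, x², and y².
Subject   Age (x)   Glucose level (y)   xy      x²      y²
1         43        99                  4257    1849    9801
2         21        65                  1365    441     4225
3         25        79                  1975    625     6241
4         42        75                  3150    1764    5625
5         57        87                  4959    3249    7569
6         59        81                  4779    3481    6561
Σ         247       486                 20485   11409   40022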
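Substituting n = 6, Σx = 247, Σy = 486, Σxy = 20485, Σx² = 11409 and Σy² = 40022 (the column totals of the six data rows) into the formula for r:

$$r = \frac{6(20485) - (247)(486)}{\sqrt{\left[6(11409) - 247^2\right]\left[6(40022) - 486^2\right]}} = \frac{122910 - 120042}{\sqrt{(68454 - 61009)(240132 - 236196)}} = \frac{2868}{\sqrt{7445 \times 3936}} \approx \frac{2868}{5413.27} \approx 0.5298$$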
• The range of the correlation coefficient is from -1 to +1.
• Our result, r ≈ 0.5298, means the variables have a moderate positive correlation.
Interpreting Correlation Coefficient r
• Strong correlation: r > .70 or r < -.70
• Moderate correlation: r is between .30 and .70, or between -.30 and -.70
• Weak correlation: r is between 0 and .30, or between 0 and -.30
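These rule-of-thumb bands can be written as a small lookup. In this Python sketch the function name `describe_strength` is hypothetical, and because the bands share their endpoints (.30 and .70), the choice of which band owns each endpoint is an assumption.

```python
def describe_strength(r):
    """Label r using the strong / moderate / weak bands.

    Assumption: |r| > 0.70 is strong, 0.30 <= |r| <= 0.70 is moderate,
    and anything smaller is weak.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "zero (no linear relationship)"
    size = abs(r)
    if size > 0.70:
        strength = "strong"
    elif size >= 0.30:
        strength = "moderate"
    else:
        strength = "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


print(describe_strength(0.5298))  # moderate positive
```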
Spearman’s Rank Correlation
• Spearman’s rank correlation, represented by ρ or by r_R, is a nonparametric measure of the strength and direction of the association between two ranked variables.
• It determines the degree to which a relationship is monotonic.
• Monotonicity is less restrictive than linearity: in a monotonic relationship one variable consistently increases (or consistently decreases) as the other increases, but not necessarily at a constant rate.
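To see why monotonicity is less restrictive than linearity, take y = x³: y always rises with x, but not along a straight line. The Python sketch below (helper names are hypothetical; the simple ranking assumes no ties) computes Pearson's r on the raw values and on their ranks. Pearson's r computed on the ranks is Spearman's ρ.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Rank 1 = smallest value; assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]  # monotonic (always increasing) but curved

linear = pearson_r(xs, ys)                     # below 1: not a straight line
monotonic = pearson_r(ranks(xs), ranks(ys))    # Spearman's rho: exactly 1
print(f"Pearson r = {linear:.4f}, Spearman rho = {monotonic:.4f}")
```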

Computation of Rank Correlation
Spearman’s rank correlation coefficient ρ can be calculated when:
• Actual ranks are given
• Ranks are not given, but grades (values) are given and are not repeated
• Ranks are not given, and grades are given and are repeated (tied)
$$\rho = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$$
where n is the number of data points of the two variables and d_i is the difference in the ranks of the i-th element of each random variable considered.
The Spearman correlation coefficient, ρ, can take values from -1 to +1.
• A ρ of +1 indicates a perfect positive association of ranks
• A ρ of zero indicates no association between ranks
• A ρ of -1 indicates a perfect negative association of ranks
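The rank-difference formula can be sketched in Python as follows. The names `average_ranks` and `spearman_rho` are hypothetical; giving tied values the mean of their positions is a common convention assumed here, and with ties the 6Σd² formula is only an approximation (a correction term is usually added). The sample data are from the university example that follows.

```python
def average_ranks(values):
    """Ascending ranks (1 = smallest); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = shared
        i = j + 1
    return result


def spearman_rho(xs, ys):
    """Spearman's rho = 1 - 6 * sum(d^2) / (n(n^2 - 1)); exact only without ties."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


free_meals = [14.4, 7.2, 27.5, 33.8, 38.0, 15.9, 4.9]
high_cgpa = [54, 64, 44, 32, 37, 68, 62]
print(round(spearman_rho(free_meals, high_cgpa), 4))  # -0.7143
```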

Examples
• Question: The following table provides data about the percentage of students who have free university meals and their CGPA scores. Calculate the Spearman’s rank correlation between the two and interpret the result.

State University   % of students having free meals   % of students scoring above 8.5 CGPA
Pune               14.4                              54
Chennai            7.2                               64
Delhi              27.5                              44
Kanpur             33.8                              32
Ahmedabad          38.0                              37
Indore             15.9                              68
Guwahati           4.9                               62
Step 1
• Solution: Let us first assign the random variables to the required data:
• X: % of students having free meals
• Y: % of students scoring above 8.5 CGPA

Step 2
Rank each variable in ascending order (1 = lowest value), then take the difference of the ranks.
State University   dX = Rank of X   dY = Rank of Y   d = dX - dY   d²
Pune               3                4                -1            1
Chennai            2                6                -4            16
Delhi              5                3                2             4
Kanpur             6                1                5             25
Ahmedabad          7                2                5             25
Indore             4                7                -3            9
Guwahati           1                5                -4            16
Σd² = 96
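Carrying the calculation through with Spearman’s formula, using n = 7 and Σd² = 96:

$$\rho = 1 - \frac{6 \times 96}{7\left(7^2 - 1\right)} = 1 - \frac{576}{336} = -\frac{5}{7} \approx -0.714$$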
Interpretation
• Such a strong negative coefficient of correlation carries an important implication: the universities with the highest percentage of students consuming free meals tend to have the least successful results (and vice versa).

Coefficient of Determination
• The coefficient of determination lies between 0 and 1
• Represented by r²
• The coefficient of determination is a measure of how well the regression line represents the data
• If the regression line passed exactly through every point on the scatter plot, it would explain all of the variation
• The further the line is from the points, the less it is able to explain
• r² is useful because it gives the proportion of the variance (fluctuation) of one variable that is predictable from the other variable
• It allows us to determine how certain one can be in making predictions from a given model/graph
• The coefficient of determination is the ratio of the explained variation to the total variation
• The coefficient of determination satisfies 0 ≤ r² ≤ 1 and denotes the strength of the linear association between x and y
• The coefficient of determination represents the percentage of the variation in y that is explained by the line of best fit
• For example, if r = 0.922, then r² = 0.850
• This means that 85% of the total variation in y can be explained by the linear relationship between x and y (as described by the regression equation)
• The other 15% of the total variation in y remains unexplained
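The ratio of explained to total variation can be checked numerically. This Python sketch (the function name `r_squared` is hypothetical) fits the least-squares line, measures the total and the unexplained variation of y, and returns explained / total; for a simple linear regression the result equals the square of Pearson’s r. The sample data are the age and glucose readings from the earlier Problem.

```python
def r_squared(xs, ys):
    """Coefficient of determination for the least-squares line Y = bX + a."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)  # total variation in y
    if sxx == 0 or ss_tot == 0:
        raise ValueError("r squared is undefined when x or y is constant")
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Unexplained variation: squared distances of the points from the line
    ss_res = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))
    return (ss_tot - ss_res) / ss_tot  # explained / total


ages = [43, 21, 25, 42, 57, 59]
glucose = [99, 65, 79, 75, 87, 81]
print(round(r_squared(ages, glucose), 4))  # 0.2807
```

The output is the square of r ≈ 0.5298: under this linear model, roughly 28% of the variation in glucose level is explained by age, and about 72% remains unexplained.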
