
5801 Correlation

Learning objectives
1. Pearson correlation
2. Estimating the population Pearson correlation
3. Misleading correlations
4. Impact of range
5. Two limitations in using correlation to infer causality

Copyright 2013 by Ernest Kwan


Relationships
• An important goal in statistics is to describe "relationships" between two variables.
• By describing relationships in a sample, we estimate the relationship in the population.
• What does it mean to say that "there is a relationship" between two variables (e.g., X and Y), or to say that "X and Y are related"?
• There are different ways of answering this question.
• On a strictly quantitative level, we may say X and Y are related if there is a systematic pattern between values of X and Y.

Some common ways of saying "X and Y are related":
• "X and Y are associated", "an association between X and Y";
• "X and Y are correlated", "a correlation between X and Y".


Pearson correlation coefficient


• To measure the relationship between two continuous variables, there are different indices available (they measure different aspects of such relationships).
• We will focus on:

Pearson correlation coefficient:

\[
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}} = \frac{\operatorname{cov}(X, Y)}{\sqrt{\operatorname{var}(X)\,\operatorname{var}(Y)}}
\]

Important properties of r
• It can only range from -1 to 1.
• It measures the degree and direction of linear relationship between X and Y.
  • r = 0 implies there is no linear relationship.
  • The more r differs from 0, the greater the linear relationship.
  • r > 0 is called a positive relationship; r = 1 is a "perfect" positive linear relationship.
  • r < 0 is called a negative relationship; r = -1 is a "perfect" negative linear relationship.
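The Pearson formula can be sketched directly in code. This is a minimal illustration and not part of the original slides; the function name `pearson_r` and the small data sets are made up for the example.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation, computed directly from the sum formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    # Numerator: sum of cross-products of deviations from the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the sums of squared deviations.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has zero variance")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: points lying exactly on a rising and on a falling line.
x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2 * v + 1 for v in x])
r_down = pearson_r(x, [10 - 3 * v for v in x])
print(r_up, r_down)
```

Points that fall exactly on a rising line give r = 1, and points on a falling line give r = -1, whatever the slope.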


Examples

[Figure: example scatterplots of Y against X.]


[Figure: three scatterplots of Y against X, labeled r = -1.00, r = 0.00, and r = 1.00.]

negative relationship
• as X increases, Y decreases
• as X decreases, Y increases
• As the data look more and more like such a line, r will get closer and closer to -1.

positive relationship
• as X increases, Y increases
• as X decreases, Y decreases
• As the data look more and more like such a line, r will get closer and closer to 1.


Which correlation is stronger?

[Figure: two scatterplots of Y against X, one labeled r = 0.33 and the other r = -0.80.]

• r measures the degree (strength / magnitude) and direction of linear relationship.
• Degree of the relationship involves the absolute value of r.
• The more |r| differs from 0, the stronger the linear relationship.


Correlation in the population


• So far we have discussed r as an index of the linear relationship in the sample data; but thinking beyond the sample data, there is always a population.
• The linear relationship between X and Y in the population is referred to as ρ, "rho".
• So if we could observe every person's value on X and Y in the population, then that linear relationship is represented by ρ.
• r is the sample estimate of the parameter ρ.


Confidence intervals for ρ


• We previously discussed a CI for another population parameter.
• ρ is also a parameter, so accordingly, we could construct a CI for ρ based on r.
• The same interpretations and principles are at work.
• The CI for ρ, however, is more complicated to calculate.
• This is because the sampling distribution of r is not normal.
• Because of this complication, a CI for ρ may not necessarily be symmetric around r.
• Let's take a look at some interesting examples of correlations.
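One widely used way to build such an interval is the Fisher z-transformation. The slides do not say which method the course uses, so the sketch below is an assumption for illustration only; the function name `fisher_ci` and the inputs r = 0.8, n = 30 are hypothetical.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, level=0.95):
    """Approximate CI for rho via the Fisher z-transformation (an assumed method)."""
    if not -1 < r < 1 or n <= 3:
        raise ValueError("requires -1 < r < 1 and n > 3")
    z = math.atanh(r)              # transform r to an approximately normal scale
    se = 1 / math.sqrt(n - 3)      # approximate standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    # Build a symmetric interval on the z scale, then transform back to the r scale.
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

lower, upper = fisher_ci(0.8, 30)  # hypothetical sample result
print(round(lower, 2), round(upper, 2))  # roughly 0.62 and 0.90
```

The interval is symmetric on the transformed scale but not around r: here the lower limit lies farther from 0.8 than the upper limit does.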


Examples
Do you agree with the correlation coefficients?

[Figure: two scatterplots, each labeled r = 0.00.]


Examples
Is there a positive linear relationship here?

[Figure: two scatterplots, each posing the question above.]

• In one plot, a small cluster of data has clearly created the positive relationship.
• In the other, the overall relationship is positive, but the within-gender relationship is negative!
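The second situation, a positive overall correlation with negative within-group correlations, can be reproduced with a tiny invented data set. The two groups and all values below are hypothetical, chosen only to make the pattern exact.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation from the sums of deviations."""
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented data: within each group Y falls as X rises,
# but group B sits higher on both X and Y than group A.
group_a = [(1, 3), (2, 2), (3, 1)]
group_b = [(5, 7), (6, 6), (7, 5)]

def r_of(points):
    return pearson_r([p[0] for p in points], [p[1] for p in points])

r_a = r_of(group_a)                # within group A
r_b = r_of(group_b)                # within group B
r_all = r_of(group_a + group_b)    # both groups pooled
print(r_a, r_b, round(r_all, 2))
```

Within each group r is -1, yet pooling the groups gives a clearly positive r (about 0.71), because the difference between the groups dominates the pooled data.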


• Notice the "outliers" here are not outlying at all in terms of Y.
• These points are outliers in the sense of having undue influence on r.

[Figure: scatterplot of Y against X, labeled r = 0.40.]

So what is r for this sample? Is it 0.40 or 0.00?
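The influence of a single point can be sketched with invented numbers. The data are hypothetical and do not reproduce the plotted sample: a grid with no linear trend, plus one point that is extreme on X but ordinary on Y.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation from the sums of deviations."""
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented data: a 3-by-3 grid with no linear trend at all.
base = [(x, y) for x in (-1, 0, 1) for y in (-1, 0, 1)]
r_base = pearson_r([p[0] for p in base], [p[1] for p in base])

# One extra point, far out on X but with a Y value inside the usual range.
with_outlier = base + [(6, 1)]
r_outlier = pearson_r([p[0] for p in with_outlier],
                      [p[1] for p in with_outlier])

print(round(r_base, 2), round(r_outlier, 2))
```

Nine of the ten points show no linear relationship, but the tenth alone moves r from 0 to about 0.33.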


Using correlations in practice


• It is very easy to be misled by a correlation coefficient.
• Just because r = 0.0 doesn't mean there is no relationship, and r = 0.9 may not mean there is a strong linear relationship.
• What can we do to prevent ourselves from being misled?
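The first warning can be checked with a small invented example in which Y is completely determined by X and yet r is zero. The data are hypothetical; the point is only that r summarizes linear relationships.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation from the sums of deviations."""
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented data: Y is an exact function of X (a parabola), so X and Y
# are strongly related, but the relationship is not linear.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]
r_curve = pearson_r(x, y)
print(r_curve)
```

A scatterplot would show the curve immediately, which is one reason to look at the data before trusting the coefficient.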


Example: Height and weight

[Figure: two scatterplots of weight (kg) against height (m), each from a sample of n = 100. Various occupations: heights span 1.00 to 2.00 m. Basketball players: heights span 1.90 to 2.00 m.]


Example: Studying and grades


• Yes, the more you study for a test, the higher the grade.
• Does this mean 70 hours of studying will guarantee a perfect score?
• Based on the left-hand data, it is hard to speculate about what happens when you study far beyond 15 hours.

[Figure: two scatterplots of GRADE against HOUR (amount of studying). Left: sample data, 0 to 15 hours. Right: 0 to 25 hours.]


Problem of restricted range


• Previous examples illustrate the effect of range restriction on a correlation coefficient.
• An important issue to think about in the interpretation of your correlations: Do the data in fact contain the relevant range of the variable you want to make inferences about? For example, if you do want to assess the relationship between weight and height for basketball players, then there is nothing wrong with the data.
• Before comparing two correlation coefficients (assessing the same relationship), we should make sure the two data sets cover the same relevant range of interest.
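Range restriction can be illustrated with a small deterministic sketch. All numbers are invented: heights are in centimetres (to keep the arithmetic in integers), weight follows an assumed straight line, and an alternating ±5 kg offset stands in for individual variation.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation from the sums of deviations."""
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented data: one person per centimetre from 100 cm to 200 cm.
heights_cm = list(range(100, 201))
# Assumed trend of 0.6 kg per cm, plus an alternating +5 / -5 kg offset
# that plays the role of person-to-person scatter.
weights_kg = [0.6 * h - 40 + (5 if h % 2 == 0 else -5) for h in heights_cm]

r_full = pearson_r(heights_cm, weights_kg)

# Same people, but keeping only those who are at least 190 cm tall.
tall = [(h, w) for h, w in zip(heights_cm, weights_kg) if h >= 190]
r_tall = pearson_r([h for h, _ in tall], [w for _, w in tall])

print(round(r_full, 2), round(r_tall, 2))
```

The height-weight rule is identical in both subsets, yet r drops from about 0.96 over the full range to about 0.36 among the tallest people, simply because the scatter is large relative to the narrow range of heights.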


What does a relationship mean?


• At the level of measured variables (quantitative hypotheses), a relationship was previously defined as a systematic pattern between values of X and Y.

[Figure: scatterplot of GRADE against HOUR (amount of studying).]

• But unless we're doing statistics purely for the sake of statistics, a "relationship" has much more meaning to researchers.
• Let us now move beyond the statistical / quantitative level.


Beyond just the quantitative variables


• To social scientists, relationships have a substantive interpretation beyond just a systematic pattern between values of two variables.
• A relationship implies causality; to say there is a relationship between X and Y (or symbolically, X ↔ Y) can imply one of three things:

Very often, a relationship is further used for explanations: If X and Y are related, we say "X explains Y", or "Y explains X".

Both causality and explanation are actually complicated philosophical concepts:
• What does it mean exactly for A to cause B?
• How does an explanation work?

We will not venture too far into philosophy, but we will consider some limitations in trying to use relationships we observe (e.g., r) for the causations we hope to infer.


Limitation 1: Quality of measurement


• Our desired level of inference occurs at a more abstract level than that of the variables we can observe.
• Social phenomena involve variables that cannot be observed directly.

[Diagram: two unobserved constructs, each measured by an observed indicator (X1 and X2); r (ρ) is the correlation between the indicators.]

• Inference concerning how constructs are related depends on the quality of measurement; just how good are the indicators?
• Simply treating r (or ρ) as indicative of the relationship between the constructs without concern for the quality of measurement is a mistake.
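A small simulation can make the role of measurement quality concrete. Everything below is an assumption made for illustration: two constructs with a true correlation of 0.8, each observed through a noisy indicator, with a fixed random seed for reproducibility.

```python
import math
import random

def pearson_r(xs, ys):
    """Sample Pearson correlation from the sums of deviations."""
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(5801)   # arbitrary seed
n = 5000
true_corr = 0.8             # assumed construct-level correlation

construct_1, construct_2, x1, x2 = [], [], [], []
for _ in range(n):
    c1 = rng.gauss(0, 1)
    # Second construct correlates with the first at about true_corr.
    c2 = true_corr * c1 + math.sqrt(1 - true_corr ** 2) * rng.gauss(0, 1)
    construct_1.append(c1)
    construct_2.append(c2)
    # Each indicator is its construct plus independent measurement error.
    x1.append(c1 + rng.gauss(0, 1))
    x2.append(c2 + rng.gauss(0, 1))

r_constructs = pearson_r(construct_1, construct_2)
r_indicators = pearson_r(x1, x2)
print(round(r_constructs, 2), round(r_indicators, 2))
```

In this setup the indicators correlate far less strongly than the constructs they measure (in expectation about 0.4 versus 0.8), so r computed on poor indicators understates the construct-level relationship.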


Example: Air fresheners and Grades


A recent nation-wide survey collected data from university students across the country, with the aim of studying the consumer behavior of Canadian students. To the surprise of the researchers, a strong negative correlation was found between students' grades and the amount of money spent on air fresheners. These two variables are shown in the scatterplot below:
[Figure: scatterplot of grade (overall GPA), Y, against money spent on air fresheners, X, labeled r = 0.66.]


Limitation 2: The crud factor


Let's consider what other variables may be related to the two we have observed:
$ spent on air fresheners -- crowdedness of living conditions -- noise level of living conditions -- attentiveness during study -- grades

• Many social science variables can be related to each other through such a "chain".
• This is attributable to the complexity (or interrelatedness) of social phenomena.
• It is not difficult to find variables that have a strong relationship; sometimes, unexpected relationships will be stumbled upon.
• The prevalence of relationships between arbitrarily paired social variables was deemed the "crud factor" (e.g., Meehl, 1997).
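How a chain of related variables can produce a correlation between its two ends can be sketched with a simulation. The link strengths, their signs, the sample size, and the seed are all invented; generating each variable from the previous one is only a device for creating the chain, not a claim about what causes what.

```python
import math
import random

def pearson_r(xs, ys):
    """Sample Pearson correlation from the sums of deviations."""
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1997)   # arbitrary seed
n = 5000
# Assumed correlations between neighbours in the chain:
# spending - crowdedness - noise - attentiveness - grades.
links = [0.8, 0.8, -0.8, 0.8]

chain = [[rng.gauss(0, 1) for _ in range(n)]]      # spending (standardized)
for w in links:
    prev = chain[-1]
    # Next variable correlates with the previous one at about w.
    chain.append([w * v + math.sqrt(1 - w * w) * rng.gauss(0, 1) for v in prev])

r_neighbours = pearson_r(chain[0], chain[1])   # spending vs. crowdedness
r_ends = pearson_r(chain[0], chain[-1])        # spending vs. grades
print(round(r_neighbours, 2), round(r_ends, 2))
```

Spending and grades never interact directly in this simulation, yet they end up noticeably negatively correlated (in expectation about -0.41) purely through the intermediate links.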


How to refer to correlations


Be very careful then of how you talk about correlations.

Acceptable descriptions:
• X is related to Y
• X is associated with Y
• X is correlated with Y

Misleading descriptions (please avoid):
• X influences Y
• X causes Y
• X affects Y
• X determines Y

To be able to make such claims is a highly desirable goal in science. In the social sciences, however, we should be very cautious of claims of causality.

