Measures of Relationship

Chapter 5 of the textbook introduced you to the two most widely used measures of
relationship: the Pearson product-moment correlation and the Spearman rank-order
correlation. We will be covering these statistics in this section, as well as other
measures of relationship among variables.
What is a Relationship?
Correlation coefficients are measures of the degree of relationship between two or
more variables. When we talk about a relationship, we are talking about the manner in
which the variables tend to vary together. For example, if one variable tends to
increase at the same time that another variable increases, we would say there is a
positive relationship between the two variables. If one variable tends to decrease as
another variable increases, we would say that there is a negative relationship between
the two variables. It is also possible that the variables might be unrelated to one
another, so that there is no predictable change in one variable based on knowing about
changes in the other variable.
As a child grows from an infant into a toddler into a young child, both the child's
height and weight tend to change. Those changes are not always tightly locked to one
another, but they do tend to occur together. So if we took a sample of children from a
few weeks old to 3 years old and measured the height and weight of each child, we
would likely see a positive relationship between the two.
A relationship between two variables does not necessarily mean that one variable
causes the other. When we see a relationship, there are three possible causal
interpretations. If we label the variables A and B, A could cause B, B could cause A,
or some third variable (we will call it C) could cause both A and B. With the
relationship between height and weight in children, it is likely that the general growth
of children, which increases both height and weight, accounts for the observed
correlation. It is very foolish to assume that the presence of a correlation implies a
causal relationship between the two variables. There is an extended discussion of this
issue in Chapter 7 of the text.
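The third-variable interpretation can be illustrated with a small simulation. The
following is a minimal sketch in Python and is not part of the textbook materials: the
variable names, the sample size of 2,000, and the random seed are all assumptions
chosen for illustration. A and B are each built from C plus independent noise, so
neither one causes the other, yet by construction their population correlation is .50.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation computed from deviation scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / (ss_x * ss_y) ** 0.5


rng = random.Random(42)  # fixed seed (arbitrary) so the run is repeatable

# C is the hypothetical third variable (for example, general growth).
c = [rng.gauss(0, 1) for _ in range(2000)]

# A and B each depend on C plus their own independent noise.
# Neither A nor B has any influence on the other.
a = [ci + rng.gauss(0, 1) for ci in c]
b = [ci + rng.gauss(0, 1) for ci in c]

r_ab = pearson_r(a, b)
print(f"Correlation between A and B: {r_ab:.2f}")
```

The sample correlation will vary a little around .50 from one seed to the next, but it
will be clearly positive even though A and B are causally unconnected.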
Scatter Plots and Linear Relationships
A helpful way to visualize a relationship between two variables is to construct a
scatter plot, which you were briefly introduced to in our discussion of graphical
techniques. A scatter plot represents each set of paired scores on a two-dimensional
graph, in which the dimensions are defined by the variables. For example, if we
wanted to create a scatter plot of our sample of 100 children for the variables of height
and weight, we would start by drawing the X and Y axes, labeling one height and the
other weight, and marking off the scales so that the range on these axes is sufficient to
handle the range of scores in our sample. Let's suppose that our first child is 27 inches
tall and 21 pounds. We would find the point on the weight axis that represents 21
pounds and the point on the height axis that represents 27 inches. Where lines drawn
from these two points cross, we would put a dot that represents the combination of
height and weight for that child, as shown in the figure below.
We then continue the process for all of the other children in our sample, which might
produce the scatter plot illustrated below.
It is always a good idea to produce scatter plots for the correlations that you compute
as part of your research. Most will look like the scatter plot above, suggesting a linear
relationship. Others will show a distribution that is less organized and more scattered,
suggesting a weak relationship between the variables. But on rare occasions, a scatter
plot will indicate a relationship that is not a simple linear relationship, but rather
shows a complex relationship that changes at different points in the scatter plot. The
scatter plot below illustrates a nonlinear relationship, in which Y increases
as X increases, but only up to a point; after that point, the relationship reverses
direction. Using a simple correlation coefficient for such a situation would be a
mistake, because the correlation cannot capture accurately the nature of a nonlinear
relationship.
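To see why a simple correlation can mislead in this situation, consider a made-up set
of scores (an assumption for illustration, not the data in the figure) in which Y rises
as X goes from 0 to 5 and then falls symmetrically. The Python sketch below computes
the Pearson correlation for these scores. It works out to zero, even though Y is
completely determined by X.

```python
def pearson_r(xs, ys):
    """Pearson correlation computed from deviation scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / (ss_x * ss_y) ** 0.5


# Hypothetical scores: X runs from 0 to 10.
x = list(range(11))
# Y increases until X reaches 5 and decreases after that point.
y = [25 - (xi - 5) ** 2 for xi in x]

r = pearson_r(x, y)
print(f"Pearson r for the inverted-U data: {r:.2f}")
```

The rising half and the falling half of the curve cancel each other in the linear
index, which is why the scatter plot should be inspected before the coefficient is
interpreted.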
Pearson Product-Moment Correlation
The Pearson product-moment correlation was devised by Karl Pearson in 1895,
and it is still the most widely used correlation coefficient. The history behind
the mathematical development of this index is fascinating. Those interested in that
history can click on the link. But you need not know that history to understand how
the Pearson correlation works.
The Pearson product-moment correlation is an index of the degree of linear
relationship between two variables that are both measured on at least an interval scale
of measurement. The index is structured so that a correlation of 0.00 means that there
is no linear relationship, a correlation of +1.00 means that there is a perfect positive
relationship, and a correlation of -1.00 means that there is a perfect negative
relationship. As you move from zero to either end of this scale, the strength of the
relationship increases. You can think of the strength of a linear relationship as how
tightly the data points in a scatter plot cluster around a straight line. In a perfect
relationship, either negative or positive, the points all fall on a single straight line. We
will see examples of that later. The symbol for the Pearson correlation is a
lowercase r, which is often subscripted with the two variables. For example, rxy would
stand for the correlation between the variables X and Y.
The Pearson product-moment correlation was originally defined in terms of Z-scores.
In fact, you can compute the product-moment correlation as the average cross-product
of the Z-scores, as shown in the first equation below. But that is an equation that is
difficult to use to do computations. The more commonly used equation now is the second
equation below. Although this equation looks much more complicated and looks like
it would be much more difficult to compute, in fact, this second equation is by far the
easier of the two to use if you are doing the computations with nothing but a
calculator.
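Because the two equations do not survive in this text version, here is a hedged sketch
of the two standard forms in Python. We are assuming that the first equation is the
mean of the cross-products of the Z-scores (with standard deviations computed by
dividing by N), and that the second is the familiar raw-score formula,
r = [NΣXY - (ΣX)(ΣY)] / sqrt{[NΣX² - (ΣX)²][NΣY² - (ΣY)²]}. The function names and
the heights and weights are hypothetical.

```python
import math


def pearson_from_z_scores(xs, ys):
    """Average cross-product of Z-scores (standard deviations use N)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    z_x = [(x - mean_x) / sd_x for x in xs]
    z_y = [(y - mean_y) / sd_y for y in ys]
    return sum(zx * zy for zx, zy in zip(z_x, z_y)) / n


def pearson_from_raw_scores(xs, ys):
    """Raw-score computational formula; needs only five running sums."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


# Hypothetical heights (inches) and weights (pounds) for six children.
heights = [20, 24, 27, 30, 33, 36]
weights = [8, 14, 21, 24, 28, 33]

r_z = pearson_from_z_scores(heights, weights)
r_raw = pearson_from_raw_scores(heights, weights)
print(f"Z-score formula:   r = {r_z:.4f}")
print(f"Raw-score formula: r = {r_raw:.4f}")
```

The raw-score version never requires computing a mean or a Z-score, which is why it
is the easier of the two to carry out on a calculator.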
You can learn how to compute the Pearson product-moment correlation either by hand
or using SPSS for Windows by clicking on one of the buttons below. Use the browser's
return arrow key to return to this page.
Compute the Pearson product-moment correlation by hand
Compute the Pearson product-moment correlation using SPSS
Spearman Rank-Order Correlation
The Spearman rank-order correlation provides an index of the degree of linear
relationship between two variables that are both measured on at least an ordinal scale
of measurement. If one of the variables is on an ordinal scale and the other is on an
interval or ratio scale, it is always possible to convert the interval or ratio scale to an
ordinal scale. That process is discussed in the section showing you how to compute
this correlation by hand.
The Spearman correlation has the same range as the Pearson correlation, and the
numbers mean the same thing. A zero correlation means that there is no relationship,
whereas correlations of +1.00 and -1.00 mean that there are perfect positive and
negative relationships, respectively. The formula for computing this correlation is
shown below. Traditionally, the lowercase r with a subscript s is used to designate the
Spearman correlation (i.e., rs). The one term in the formula that is not familiar to you
is d, which is equal to the difference in the ranks for the two variables. This is
explained in more detail in the section that covers the manual computation of the
Spearman rank-order correlation.
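The formula itself does not survive in this text version. The sketch below, in Python,
assumes it is the usual rank-difference formula, rs = 1 - 6Σd² / [N(N² - 1)], which
applies when there are no tied ranks. The helper names and the scores are hypothetical,
and the simple ranking function shown here does not handle ties.

```python
def ranks(scores):
    """Rank scores from 1 (lowest) to N (highest); assumes no tied scores."""
    ordered = sorted(scores)
    return [ordered.index(score) + 1 for score in scores]


def spearman_rs(xs, ys):
    """Spearman rank-order correlation from rank differences (no ties)."""
    n = len(xs)
    # d is the difference between the two ranks for each individual.
    sum_d_squared = sum(
        (rank_x - rank_y) ** 2 for rank_x, rank_y in zip(ranks(xs), ranks(ys))
    )
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))


# Hypothetical scores for five individuals on two variables.
x = [12, 15, 9, 20, 17]
y = [30, 41, 28, 39, 45]

rs = spearman_rs(x, y)
print(f"Ranks of X: {ranks(x)}")
print(f"Ranks of Y: {ranks(y)}")
print(f"Spearman rs = {rs:.2f}")
```

For these scores the squared rank differences sum to 6, so rs = 1 - 36/120 = .70.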
Compute the Spearman rank-order correlation by hand
Compute the Spearman rank-order correlation using SPSS
The Phi Coefficient
The Phi coefficient is an index of the degree of relationship between two variables
that are measured on a nominal scale. Because variables measured on a nominal scale
are simply classified by type, rather than measured in the more general sense, there is
no such thing as a linear relationship. Nevertheless, it is possible to see if there is a
relationship. For example, suppose you want to study the relationship between
religious background and occupations. You have a classification system for religion
that includes Catholic, Protestant, Muslim, Other, and Agnostic/Atheist. You have also
developed a classification for occupations that includes Unskilled Laborer, Skilled
Laborer, Clerical, Middle Manager, Small Business Owner, and Professional/Upper
Management. You want to see if the distribution of religious preferences differs by
occupation, which is just another way of saying that there is a relationship between
these two variables.
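For the simplest case, in which each nominal variable has only two categories, the
idea can be sketched in a few lines of Python. This is an illustration under stated
assumptions rather than the procedure used in the text: it uses the standard 2 x 2
formula, phi = (ad - bc) / sqrt[(a + b)(c + d)(a + c)(b + d)], where a, b, c, and d
are the four cell counts, and the counts themselves are hypothetical. Tables with more
than two categories per variable, like the religion and occupation example, need the
more general procedures found in statistical software.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi for a 2 x 2 table with cell counts
           a  b
           c  d
    """
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator


# Hypothetical counts: rows are two groups, columns are two outcomes.
phi = phi_coefficient(30, 10, 10, 30)
print(f"Phi = {phi:.2f}")

# When the row variable tells us nothing about the column variable,
# the cross-products are equal and Phi is zero.
phi_unrelated = phi_coefficient(20, 20, 20, 20)
print(f"Phi for unrelated variables = {phi_unrelated:.2f}")
```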
The Phi coefficient is not used nearly as often as the Pearson and Spearman
correlations. Therefore, we will not be devoting space here to the computational
procedures. However, interested students can consult advanced statistics textbooks for
the details. You can compute Phi easily as one of the options in
the crosstabs procedure in SPSS for Windows. Click on the button below to see how.
Using Crosstabs in SPSS for Windows
Advanced Correlational Techniques
Correlational techniques are immensely flexible and can be extended dramatically to
solve various kinds of statistical problems. Covering the details of these advanced
correlational techniques is beyond the scope of this text and website. However, we
have included brief discussions of several advanced correlational techniques on
the Student Resource Website, including multidimensional scaling, path
analysis, taxonomic search techniques, and statistical analysis of neuroimages.
Nonlinear Correlational Procedures
The vast majority of correlational techniques used in psychology are linear
correlations. However, there are times when one can expect to find nonlinear
relationships and would like to apply statistical procedures to capture such complex
relationships. This topic is far too complex to cover here. The interested student will
want to consult advanced statistical textbooks that specialize in regression analyses.
There are two words of caution that we want to state about using such nonlinear
correlational procedures. The first is that, although it is relatively easy to do the
computations using modern statistical software, you should not use these procedures
unless you actually understand them and their pitfalls. It is easy to misuse the
techniques and to be fooled into believing things that are not true from a naive
analysis of the output of computer programs.
The second word of caution is that there should be a strong theoretical reason to
expect a nonlinear relationship if you are going to use nonlinear correlational
procedures. Many psychophysiological processes are by their nature nonlinear, so
using nonlinear correlations in studying those processes makes complete sense. But
for most psychological processes, there is no good theoretical reason to expect a
nonlinear relationship.
Linear Regression
As you learned in Chapters 5 and 7 of the text, the value of correlations is that they
can be used to predict one variable from another variable. This process is called linear
regression or simply regression. It involves mathematically fitting a straight line to
the data from a scatter plot. Below is a scatter plot from our discussion of
correlations. We have added a regression line to that scatter plot to illustrate how
regression works. We compute the regression line with formulas that we will present
to you shortly. The regression line is based on our data. Once we have the regression
line, we can then use it to predict Y from knowing X. The scatter plot below shows the
relationship of height and weight in young children (birth to three years old). The line
that runs through the data points is called the regression line. It is determined by an
equation, which we will discuss shortly. If we know the value of X (in this case,
weight) and we want to predict Y from X, we draw a line straight up from our value
of X until it intersects the regression line, and then we draw a line that is parallel to
the X-axis over to the Y-axis. We then read from the Y-axis our predicted value
for Y (in this case, height).
In order to fit a line mathematically, there must be some stated mathematical criterion
for what constitutes a good fit. In the case of linear regression, that mathematical
criterion is called the least squares criterion, which is shorthand for the line being
positioned so that the sum of the squared distances from the actual scores to the
predicted scores is as small as it can be. If you are predicting Y, you will compute a
regression line that minimizes the sum of the (Y - Y')² values. Traditionally, a
predicted score is referred to by using the letter of the score and adding a single
quotation mark after it (Y' is read Y prime or Y predicted). To illustrate this concept,
we removed most of the clutter of data points from the above scatter plot and showed
the distances that are involved in the least squares criterion. Note that it is the
vertical distance from the point to the prediction line, that is, the difference between
the predicted Y (along the regression line) and the actual Y (represented by the data
point). A common misconception is that you measure the shortest distance to the line,
which would be along a line through the point that is at right angles to the regression
line. It may not be immediately obvious, but if you were trying to predict X from Y,
you would be minimizing the sum of the squared distances X - X'. That means that the
regression line for predicting Y from X may not be the same as the regression line for
predicting X from Y. In fact, it is rare that they are exactly alike.
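The least squares criterion can be checked numerically. The Python sketch below is
an illustration under stated assumptions: the weights and heights are hypothetical,
and the line is fitted with the standard deviation-score formulas. It computes the sum
of the squared vertical distances (Y - Y')² for the fitted line and then for several
slightly shifted lines, each of which produces a larger sum.

```python
def least_squares_line(xs, ys):
    """Slope and intercept of the line for predicting Y from X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    slope = sp / ss_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_errors(xs, ys, slope, intercept):
    """Sum of squared vertical distances between actual and predicted Y."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data: X is weight (pounds), Y is height (inches).
weights = [8, 14, 21, 24, 28, 33]
heights = [20, 24, 27, 30, 33, 36]

slope, intercept = least_squares_line(weights, heights)
best = sum_squared_errors(weights, heights, slope, intercept)
print(f"Fitted line: Y' = {slope:.3f}X + {intercept:.3f}, sum of squares = {best:.3f}")

# Nudge the slope or the intercept in either direction and recompute.
shifted_sums = []
for d_slope, d_intercept in [(0.05, 0.0), (-0.05, 0.0), (0.0, 0.5), (0.0, -0.5)]:
    shifted = sum_squared_errors(
        weights, heights, slope + d_slope, intercept + d_intercept
    )
    shifted_sums.append(shifted)
    print(f"Shifted line: sum of squares = {shifted:.3f}")
```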
The first equation below is the basic form of the regression line. It is simply the
equation for a straight line, which you probably learned in high school math. The two
new notational items are byx and ayx, which are the slope and the intercept of the
regression line for predicting Y from X. The slope is how much the Y scores increase
per unit of X score increase. The slope in the figure above is approximately .80. For
every 10 units of movement along the line on the X axis, the Y axis moves about 8 units.
The intercept is the point at which the line crosses the Y axis (i.e., the point at
which X is equal to zero). The equations for computing the slope and intercept of the
line are listed as the second and third equations, respectively. If you want to
predict X from Y, simply replace all the Xs with Ys and the Ys with Xs in the equations.
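Since the three equations do not survive in this text version, the following Python
sketch shows one standard way to write them. We are assuming that the regression line
is Y' = byx(X) + ayx, that the slope is the correlation multiplied by the ratio of the
standard deviations, byx = r(sy / sx), and that the intercept is
ayx = (mean of Y) - byx(mean of X). The scores are hypothetical.

```python
import math

# Hypothetical paired scores.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Standard deviations (dividing by N; the ratio is the same with N - 1).
sd_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)
sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)

# Pearson correlation from deviation scores.
r = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n * sd_x * sd_y)

# Slope and intercept for predicting Y from X.
b_yx = r * (sd_y / sd_x)
a_yx = mean_y - b_yx * mean_x


def predict_y(x_value):
    """Predicted Y (Y') for a given X."""
    return b_yx * x_value + a_yx


print(f"r = {r:.3f}, slope = {b_yx:.2f}, intercept = {a_yx:.2f}")
print(f"Predicted Y for X = 6: {predict_y(6):.2f}")
```

For these scores the slope is .60 and the intercept is 2.20, so a person with an X
score of 6 has a predicted Y of 5.80. Swapping the roles of x and y in the code gives
the line for predicting X from Y.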
A careful inspection of these equations will reveal a couple of important ideas. First, if
you look at the first version of the equation for the slope (the one using the correlation
and the population variances), you will see that the slope is equal to the correlation if
the population variances are equal. That would be true either for
predicting X from Y or Y from X. What is less clear is when the two regression lines
themselves are identical. Equal variances make the two slope values equal, but the
lines drawn on the same scatter plot coincide only if the correlation is perfect
(+1.00 or -1.00). That is the ONLY situation in which the regression lines are the
same. Second, if the correlation is zero (i.e., no relationship between X and Y), then
the slope will be zero (look at the first part of the second equation). If you are
predicting Y from X, your regression line will be horizontal, and if you are
predicting X from Y, your regression line will be vertical. Furthermore, if you look at
the third equation, you will see that the horizontal line for predicting Y will be at the
mean of Y and the vertical line for predicting X will be at the mean of X. Think about
that for a minute. If X and Y are uncorrelated and you are trying to predict Y, the best
prediction that you can make is the mean of Y. If you have no useful information
about a variable and are asked to predict the score of a given individual, your best bet
is to predict the mean. To the extent that the variables are correlated, you can make a
better prediction by using the information from the correlated variable and the
regression equation.
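Two of these ideas are easy to verify with small made-up data sets. The Python sketch
below is an illustration only: the scores are hypothetical and were chosen so that the
first pair of variables is exactly uncorrelated and the second pair has equal
variances. The slope and intercept use the standard deviation-score formulas.

```python
def pearson_r(xs, ys):
    """Pearson correlation computed from deviation scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / (ss_x * ss_y) ** 0.5


def regression_line(xs, ys):
    """Slope and intercept for predicting Y from X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    slope = sp / ss_x
    return slope, mean_y - slope * mean_x


x = [1, 2, 3, 4, 5]

# Case 1: X and Y are uncorrelated (hypothetical scores).
y_uncorrelated = [5, 2, 6, 2, 5]
slope_0, intercept_0 = regression_line(x, y_uncorrelated)
mean_y = sum(y_uncorrelated) / len(y_uncorrelated)
print(f"Uncorrelated: slope = {slope_0:.2f}, intercept = {intercept_0:.2f}, "
      f"mean of Y = {mean_y:.2f}")

# Case 2: X and Y have equal variances (hypothetical scores).
y_equal_variance = [2, 1, 4, 3, 5]
r_equal = pearson_r(x, y_equal_variance)
slope_equal, _ = regression_line(x, y_equal_variance)
print(f"Equal variances: r = {r_equal:.2f}, slope = {slope_equal:.2f}")
```

In the first case the line is horizontal at the mean of Y, so every prediction is the
mean. In the second case the slope and the correlation are the same number.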