
QM3 (EBS2001) 2016-2017: SPEARMAN’S RANK CORRELATION

Spearman’s rank correlation coefficient

1. Introduction to nonparametric methods


With the exception of the chi-square tests discussed in Sharpe’s Chapter 14 (which apply to nominal
variables), all the inferential methods discussed in QM2 were parametric. Such methods are only legitimate if
two requirements hold:
i) the variable(s) of interest are quantitative, and
ii) their population distributions satisfy particular properties: always some form of normality (although this can
sometimes be overcome by a large enough sample size), and often some form of constant variance.
If, in a particular application, the required assumptions don’t hold, then our parametric methods are no
longer legitimate. E.g., the variable of interest might be ordinal rather than quantitative; or, while being
quantitative, it might badly violate the normality or the constant variance assumption. Fortunately, for such
situations, alternative nonparametric tools are available. These methods do not involve any assumptions about
the nature of the underlying population distribution(s) at all: the only requirement is that the variable(s) of
interest are not nominal, i.e. at least ordinal.
All these nonparametric methods are essentially based on ranking the sample values rather than on their
exact sizes. If the variable(s) of interest are ordinal, then the raw measurements are themselves already ranks
of a sort, so the applicability of tools based on ranking the data should be intuitively obvious. For quantitative
data, this ranking obviously implies that some information is lost. Consequently, if the required assumptions are
satisfied, the parametric methods will typically be more powerful (i.e., yield lower p-values) than the
corresponding nonparametric methods. So if the relevant assumptions seem plausible in a particular application,
we should in principle prefer a parametric method.
In QM3, we will only address one of these nonparametric methods: the Spearman rank correlation, which
provides a nonparametric alternative to the well-known Pearson correlation coefficient.

2. Spearman’s rank correlation coefficient


We’ll illustrate this tool in terms of the Electronics World Case. Electronics World, a chain of stores selling
audio and video equipment, wishes to study the relationship between a store’s sales volume in July of last year
(y, measured in thousands of dollars) and the number of households in its “Hinterland” (x, measured in
thousands). Data for a sample of n = 8 stores are listed in the table further below.
In Sharpe’s chapter 4, we have seen how the “degree of linear relationship” between two quantitative
variables x and y can be measured using Pearson’s correlation coefficient r. Sharpe’s basic definition of r
involves the standardized variables zx and zy (see the yellow formula at the end of p. 102), but the following
mathematically equivalent formula (see the start of p. 103) is more common:
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \cdot \sum_{i=1}^{n} (y_i - \bar{y})^2}}    (1)
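Formula (1) is easy to evaluate directly. The following Python sketch (using numpy and the eight (x, y) pairs
listed in the table further below) should reproduce the value reported next:

import numpy as np

# The eight (x, y) pairs of the Electronics World sample (see the table further below)
x = np.array([99, 164, 221, 179, 214, 101, 206, 205], dtype=float)
y = np.array([93.28, 153.51, 241.74, 201.54, 229.78, 135.22, 195.29, 197.82])

dx = x - x.mean()                 # deviations x_i - x-bar
dy = y - y.mean()                 # deviations y_i - y-bar
r = np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))
print(round(r, 3))                # should reproduce the reported r = 0.935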

In the case at hand, we get r = 0.935, indicating a very strong positive linear relation between x and y. By itself,
this calculation does not involve any distributional assumptions; Pearson’s correlation coefficient can be used
to measure the degree of linear relation between any pair of quantitative variables without further qualification.
However, the obvious next question is: what does the sample correlation r imply in terms of the population
correlation ρ? In the case at hand, where basic economic logic implies that the relation “should” be positive, we
are interested in the following pair of hypotheses:
H0: ρ = 0 vs. HA: ρ > 0 (2)
The corresponding test (see Sharpe’s section 15.2, pp. 489-490) is parametric; it requires the assumption that
the two variables x and y have a bivariate normal distribution in the population (a sketch of this test is given
below, after the list of advantages). If we fear that this assumption is badly violated, Spearman’s rank
correlation rS provides a nonparametric alternative to Pearson’s correlation.
Apart from avoiding the normality assumption, Spearman’s rank correlation has two further advantages:

1) unlike the Pearson correlation, Spearman’s correlation is able to detect monotonic but nonlinear patterns;
2) compared to Pearson’s correlation, Spearman’s correlation is less sensitive to outliers.
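For comparison, here is what the parametric test of (2) amounts to: it is based on the t statistic
t = r·√((n − 2)/(1 − r²)), referred to a t distribution with n − 2 degrees of freedom. This is the standard
textbook form of the test; the Python lines below are only an illustrative sketch, valid solely under the
bivariate-normality assumption discussed above.

import numpy as np
from scipy import stats

n, r = 8, 0.935                              # sample size and Pearson correlation from above
t_stat = r * np.sqrt((n - 2) / (1 - r**2))   # t statistic for H0: rho = 0
p_one_sided = stats.t.sf(t_stat, df=n - 2)   # upper-tail p-value, matching HA: rho > 0 in (2)
print(round(t_stat, 3), p_one_sided)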
The calculation of Spearman’s rank correlation is extremely simple, as the table below shows. Its upper half
repeats the x and y data introduced above. Next, we rank the eight values of x and y separately from 1 to 8. To
illustrate: the lowest x-value of 99 receives rank 1, the second-lowest x-value of 101 receives rank 2, et cetera
(if specific values occur more than once, we break such a tie by assigning the average of the consecutive ranks
that would otherwise be assigned, e.g. 6.5 instead of 6 and 7). Spearman’s rank correlation is now simply the
Pearson correlation between the ranks of x and y, rather than between the original values of x and y! In the
case at hand, the result is rS = 0.905; quite close to its parametric counterpart, and again indicating a strong
positive relation.

x        99      164     221     179     214     101     206     205
y        93.28   153.51  241.74  201.54  229.78  135.22  195.29  197.82
Ranks x  1       3       8       4       7       2       6       5
Ranks y  1       3       8       6       7       2       4       5
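As a quick check, both the ranking step and the “Pearson correlation of the ranks” can be reproduced in a few
lines of Python; scipy’s rankdata uses the same average-rank rule for ties that was described above:

import numpy as np
from scipy import stats

x = np.array([99, 164, 221, 179, 214, 101, 206, 205], dtype=float)
y = np.array([93.28, 153.51, 241.74, 201.54, 229.78, 135.22, 195.29, 197.82])

rank_x = stats.rankdata(x)                # ties (if any) receive the average of the tied ranks
rank_y = stats.rankdata(y)
r_s = np.corrcoef(rank_x, rank_y)[0, 1]   # Pearson correlation of the ranks = Spearman's r_s
print(rank_x, rank_y)                     # should match the "Ranks x" and "Ranks y" rows above
print(round(r_s, 3))                      # should reproduce r_s = 0.905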

Now consider the following pair of hypotheses in terms of the population rank correlation, ρS:
H0: ρS = 0 vs. HA: ρS > 0 (3)
This test can be conducted in a very straightforward fashion, using the critical values rα that are reported in the
appendices of most statistics books (but not in Sharpe). For n = 8, such tables show that r0.005 = 0.881; since our
rS is larger, we can reject the null against the stated one-sided alternative at the 0.5% significance level.
Equivalently, the two-sided p-value must be smaller than 1%. Clearly, there is very strong evidence against the
“no correlation” null hypothesis (3).
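Where do tabulated critical values such as r0.005 = 0.881 come from? Under H0, every ordering of the y-ranks is
equally likely, and for n = 8 there are only 8! = 40,320 such orderings, so the exact null distribution of rS can
be enumerated by brute force. The sketch below illustrates this permutation logic (it is not the table-based
procedure used above); its one-sided p-value should simply be consistent with the SPSS output discussed next.

from itertools import permutations
from math import factorial
import numpy as np

rank_x = np.array([1, 3, 8, 4, 7, 2, 6, 5], dtype=float)
rank_y = np.array([1, 3, 8, 6, 7, 2, 4, 5], dtype=float)

def spearman_of_ranks(a, b):
    # Spearman's r_s = Pearson correlation of two rank vectors
    return np.corrcoef(a, b)[0, 1]

observed = spearman_of_ranks(rank_x, rank_y)   # about 0.905

# Count the orderings of the y-ranks that give an r_s at least as large as the observed one
count = sum(1 for perm in permutations(rank_y)
            if spearman_of_ranks(rank_x, np.array(perm)) >= observed - 1e-12)
p_one_sided = count / factorial(8)
print(round(observed, 3), p_one_sided)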
To implement Spearman’s rank correlation in SPSS, use the menu path Analyze > Correlate > Bivariate to open
the Bivariate Correlations dialog box. Move both “households” and “sales” to the Variables box, just as you
would do for the ordinary Pearson correlation coefficient. In fact, “Pearson” is the default option in the
“Correlation Coefficients” field. All we have to do now is to check “Spearman” as well and press OK. We will
not discuss “Kendall’s tau-b” in this text.
Altogether, we get output consisting of two straightforward tables:
- Under the header “Correlations”, the first table shows the ordinary Pearson correlation coefficient, r = 0.935
as stated, together with the two-sided p-value for the “no correlation” null hypothesis (2).
- Under the header “Nonparametric Correlations”, a second table contains Spearman’s rank correlation
coefficient, rS = 0.905, together with a two-sided p-value of 0.002 for the “no correlation” null hypothesis (3).
This p-value is smaller than 1%, which is consistent with our manual analysis.
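For readers without SPSS, both tables can be cross-checked with scipy. Note that scipy.stats.spearmanr obtains
its p-value from a t-approximation, so for a sample as small as n = 8 it need not agree to all decimals with the
value of 0.002 reported by SPSS:

import numpy as np
from scipy import stats

x = np.array([99, 164, 221, 179, 214, 101, 206, 205], dtype=float)
y = np.array([93.28, 153.51, 241.74, 201.54, 229.78, 135.22, 195.29, 197.82])

r, p_pearson = stats.pearsonr(x, y)       # r should be about 0.935; two-sided p-value for (2)
r_s, p_spearman = stats.spearmanr(x, y)   # r_s should be about 0.905; two-sided p-value for (3)
print(r, p_pearson)
print(r_s, p_spearman)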
