Академический Документы
Профессиональный Документы
Культура Документы
Consider Table 1, which contains measurements on two variables for ten people: the
number of months the person has owned an exercise machine and the number of hours
the person spent exercising in the past week.
If you display these data pairs as points in a scatter plot (see Figure 1), you can see a
definite trend. The points appear to form a line that slants from the upper left to the
lower right. As you move along that line from left to right, the values on the vertical axis
(hours of exercise) get smaller, while the values on the horizontal axis (months owned)
get larger. Another way to express this is to say that the two variables are inversely
related: The more months the machine was owned, the less the person tended to
exercise.
These two variables are correlated. More than that, they are correlated in a particular
direction—negatively. For an example of a positive correlation, suppose that instead of
displaying “hours of exercise” on the vertical axis, you put the person's score on a test
that measures cardiovascular fitness (see Figure 2). The pattern of these data points
suggests a line that slants from lower left to upper right, which is the opposite of the
direction of slant in the first example. Figure 2 shows that the longer the person has
owned the exercise machine, the better his or her cardiovascular fitness tends to be.
This might be true in spite of the fact that time spent exercising decreases the longer
the machine has been owned because purchasers of exercise machines might be
starting from a point of low fitness, which may improve only gradually.
If two variables are positively correlated, as the value of one increases, so does the
value of the other. If they are negatively (or inversely) correlated, as the value of one
increases, the value of the other decreases.
A third possibility remains: that as the value of one variable increases, the value of the
other neither increases nor decreases. Figure 3 is a scatter plot of months the exercise
machine has been owned (horizontal axis) by the person's height (vertical axis). No line
trend can be seen in the plot. New owners of exercise machines may be short or tall,
and the same is true of people who have had their machines longer. These two
variables appear to be uncorrelated.
You can go even further in expressing the relationship between variables. Compare the
two scatter plots in Figure 4. Both plots show a positive correlation because, as the
values on one axis increase, so do the values on the other. But the data points in Figure
4(b) are more closely packed than the data points in Figure 4(a), which are more spread
out. If a line were drawn through the middle of the trend, the points in Figure 4(b) would
be closer to the line than the points in Figure 4. In addition to direction (positive or
negative), correlations also can have strength, which is a reflection of the closeness of
the data points to a perfect line. Figure 4(b) shows a stronger correlation than Figure
4(a).
Figure 4. (a) Weak and (b) strong correlations.
and
where Σ xy is the sum of the xy cross‐products (each x multiplied by its paired y), n is
the size of the sample (the number of data pairs), Σ x and Σ y are the sums of
the x and y values, s x and s y are the sample standard deviations
of x and y, and and are the means.
formula:
You might want to know how significant an r of –0.899 is. The formula to test the null
where r is the sample correlation coefficient, and n is the size of the sample (the number
of data pairs). The probability of t may be looked up in Table 3 (in "Statistics Tables")
using n − 2 degrees of freedom.
The probability of obtaining a t of –5.807 with 8 df (drop the sign when looking up the
value of t) is lower than the lowest listed probability of 0.0005. If the correlation between
months of exercise‐machine ownership and hours of exercise per week were actually 0,
you would expect an r of –0.899 or lower in fewer than one out of a thousand random
samples.
Bear in mind also that the correlation coefficient measures only straight‐line
relationships. Not all relationships between variables trace a straight line. Figure 5
shows a curvilinear relationship such that values of y increase along with values of x up
to a point, then decrease with higher values of x. The correlation coefficient for this plot
is 0, the same as for the plot in Figure 3. This plot, however, shows a relationship that
Figure 3 does not.
Correlation does not imply causation. The fact that Variable A and Variable B are
correlated does not necessarily mean that A caused B or that B caused A (though either
may be true). If you were to examine a database of demographic information, for
example, you would find that the number of churches in a city is correlated with the
number of violent crimes in the city. The reason is not that church attendance causes
crime, but that these two variables both increase as a function of a third variable:
population.