LEARNING GOALS
Be able to define correlation and recognize positive and
negative correlations on scatter diagrams.
Be aware of important cautions concerning the
interpretation of correlations.
Become familiar with the concept of a best-fit line for a
correlation, and recognize when such lines have predictive
value and when they may not.
Copyright 2009 Pearson Education, Inc.
Definition
A correlation exists between two variables when
higher values of one variable consistently go with
higher values of another variable or when higher
values of one variable consistently go with lower
values of another variable.
Scatter Diagrams
Definition
A scatter diagram (or scatterplot) is a graph in
which each point represents the values of two
variables.
Figure 7.2
Types of Correlation
Positive correlation: Both variables tend to increase (or
decrease) together.
Negative correlation: The two variables tend to change
in opposite directions, with one increasing while the other
decreases.
No correlation: There is no apparent (linear) relationship
between the two variables.
Nonlinear relationship: The two variables are related,
but the relationship results in a scatter diagram that does
not follow a straight-line pattern.
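As a rough illustration of these categories, the sketch below computes the correlation coefficient r for three small data sets. It assumes r is the usual Pearson correlation coefficient; the helper name pearson_r and all data values are hypothetical and are not taken from the slides.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y tends to rise with x (positive correlation)
r_positive = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 6])

# Hypothetical data: y tends to fall as x rises (negative correlation)
r_negative = pearson_r([1, 2, 3, 4, 5], [6, 4, 5, 4, 2])

# Hypothetical data: y = x**2, a nonlinear relationship symmetric about x = 0
r_nonlinear = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

print(r_positive, r_negative, r_nonlinear)
```

In the third data set y is completely determined by x, yet r comes out as zero because the pattern is not a straight line. This is one reason to look at the scatter diagram rather than rely on r alone.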
Beware of Outliers
If you calculate the correlation coefficient for the data in
Figure 7.10, you'll find that it is a relatively high r = 0.880,
suggesting a very strong correlation.
However, if you cover the data point in the upper right corner of
Figure 7.10, the apparent correlation disappears.
In fact, without this data point, the correlation coefficient is r = 0.
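The same effect can be reproduced with a small hypothetical data set (these are not the data plotted in Figure 7.10): four points with no correlation, plus one point far away in the upper right. The sketch again assumes r is the Pearson correlation coefficient, and the function name pearson_r is illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical cluster: the four corners of a square, which have r = 0
cluster_x = [1, 1, 2, 2]
cluster_y = [1, 2, 1, 2]

# Hypothetical outlier far to the upper right of the cluster
outlier = (10, 10)

r_without = pearson_r(cluster_x, cluster_y)
r_with = pearson_r(cluster_x + [outlier[0]], cluster_y + [outlier[1]])

print("r without the outlier:", r_without)
print("r with the outlier:   ", r_with)
```

A single distant point is enough to move r from zero to a value close to 1, even though the other four points show no pattern at all.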
Solution: (cont.)
We might therefore suspect that these two women either recorded
their data incorrectly or were not following their usual habits
during the two-week study. If we can confirm this suspicion, then
we would have reason to delete the two data points as invalid.
Figure 7.12 shows that the correlation is quite strong without those
two outlier points, and suggests that the number of calories consumed
rises by a little more than 500 calories for each hour of cycling.
Figure 7.12 The data from Figure 7.11 without the two outliers.
Of course, we should not remove the outliers without confirming our
suspicion that they were invalid data points, and we should report
our reasons for leaving them out.
Figure 7.14 These scatter diagrams show the same data as Figure 7.13,
separated into the two groups identified in Table 7.4.
Figure 7.15 Scatter diagram for the car weight and price data.
Definition
The best-fit line (or regression line) on a scatter
diagram is a line that lies closer to the data points
than any other possible line (according to a
standard statistical measure of closeness).
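The definition does not name the measure of closeness. A common choice, assumed here, is the sum of squared vertical distances between the data points and the line (least squares). The sketch below uses hypothetical data and illustrative function names to compare the least-squares line against a few nearby lines.

```python
def sum_squared_residuals(xs, ys, m, b):
    """Sum of squared vertical distances from the points to y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

def least_squares_line(xs, ys):
    """Slope and intercept of the line minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b

# Hypothetical data, not taken from the slides
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 6]

m, b = least_squares_line(xs, ys)
best = sum_squared_residuals(xs, ys, m, b)

# Nudge the slope and intercept; every nudged line should fit worse
others = [
    sum_squared_residuals(xs, ys, m + dm, b + db)
    for dm in (-0.2, 0, 0.2)
    for db in (-0.5, 0, 0.5)
    if (dm, db) != (0, 0)
]

print("best-fit line: slope", m, "intercept", b)
print("smaller than every nudged line:", all(best < other for other in others))
```

Under this assumed measure, "closer than any other possible line" means that no other choice of slope and intercept gives a smaller sum of squared residuals.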
You may recall that the equation of any straight line can be written
in the general form
y = mx + b
where m is the slope of the line and b is the y-intercept of the line.
The formulas for the slope and y-intercept of the best-fit line are as
follows:
$$\text{slope} = m = r \, \frac{s_y}{s_x}$$
$$\text{y-intercept} = b = \bar{y} - (m \times \bar{x})$$
In the above expressions, r is the correlation coefficient, $s_x$ denotes
the standard deviation of the x values (or the values of the first
variable), $s_y$ denotes the standard deviation of the y values, $\bar{x}$
represents the mean of the values of the variable x, and $\bar{y}$
represents the mean of the values of the variable y.
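The slope and y-intercept formulas above can be tried out on a small example. This is a minimal sketch with hypothetical data; it assumes r is the Pearson correlation coefficient and that the standard deviations are sample standard deviations (dividing by n - 1). The function name best_fit_line is illustrative.

```python
import statistics

def best_fit_line(xs, ys):
    """Return (r, m, b) using slope = r * s_y / s_x and intercept = mean_y - m * mean_x."""
    n = len(xs)
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    s_x = statistics.stdev(xs)  # sample standard deviation of the x values
    s_y = statistics.stdev(ys)  # sample standard deviation of the y values
    # Pearson correlation coefficient, written with the same s_x and s_y
    r = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    m = r * s_y / s_x
    b = mean_y - m * mean_x
    return r, m, b

# Hypothetical data, not taken from the slides
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 6]

r, m, b = best_fit_line(xs, ys)
print("r =", round(r, 3))
print("best-fit line: y =", round(m, 3), "* x +", round(b, 3))

# Value the line gives at x = 6, which lies just outside the observed x range
prediction = m * 6 + b
print("line value at x = 6:", round(prediction, 3))
```

The slope has the same sign as r, so a positive correlation gives a rising line and a negative correlation gives a falling one.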