Correlation and Regression

PROF. MYLLAH D. GARCIA

Objectives:

After completing this chapter, you should be able to:

Draw a scatter plot for a set of ordered pairs.
Compute the correlation coefficient.
Test the hypothesis that the population correlation coefficient is zero.
Compute the equation of the regression line.
Compute the coefficient of determination.
Compute the standard error of the estimate.
Find the prediction interval.
Be familiar with the concept of multiple regression.

Correlation

A correlation exists between two variables when the values of
one variable are somehow associated with the values of the
other variable.
Correlation analysis is a group of techniques for measuring the
strength of the association between two variables.

Scatter Diagram

A scatter diagram is a chart that portrays the relationship
between two variables.
Example: Construct a scatter plot for the data obtained in a
study of age and systolic blood pressure of six randomly
selected subjects.

Subject   Age (x)   Pressure (y)
A         43        128
B         48        120
C         56        135
D         61        143
E         67        141
F         70        152

Scatter Plot

[Figure: scatter plot titled "Age and Systolic Blood Pressure"; horizontal axis from 40 to 75, vertical axis from 0 to 160.]

Perfect Positive Association

The variables are positively related, since larger values of
the dependent variable are associated with larger values of the
independent variable, and vice versa. The values are on a
straight line, and therefore one can say that there is a
perfect positive association between the variables. Perfect
association rarely occurs when sample data are collected.

Perfect Negative Association

The scatter plot shows variables that are negatively related,
since larger values of the dependent variable are associated
with smaller values of the independent variable, and vice
versa. The values are on a straight line, and therefore one can
say that there is a perfect negative association between the
variables. Again, perfect association rarely occurs with sample
data.

Very Strong Positive Association

The scatter plot shows variables that are positively related.
The values are not on a straight line, but they are closely
packed together in a roughly linear pattern, and so one can say
that there is a very strong positive association between the
variables.

Very Strong Negative Association

The scatter plot shows variables that are negatively related.
The values are not on a straight line, but they are closely
packed together in a roughly linear pattern, and so one can say
that there is a very strong negative association between the
variables.

No Association

The scatter plot does not show any noticeable pattern. The
values are scattered around, and so one can say that there is
very little association between the variables.

Nonlinear Association

The scatter plot does not show any noticeable linear pattern.
Instead, it displays a nonlinear or curvilinear relationship.
We will not study such relationships in this text but will
concentrate only on linear relationships between two variables.

Correlation Coefficient

The correlation coefficient describes the strength of the
relationship between two variables.
The linear correlation coefficient r measures the strength of
the linear correlation between paired quantitative x- and
y-values in a sample.
The linear correlation coefficient is sometimes referred to as
the Pearson product moment correlation coefficient in honor of
Karl Pearson (1857-1936), who originally developed it.

Coefficient of Correlation
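
One common computational form of the sample correlation coefficient is sketched below. It is an assumption rather than a quotation: it is chosen because it uses exactly the sums of xy, x^2, and y^2 that the worked age and blood pressure example tabulates. Here n is the number of (x, y) pairs.

```latex
r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}
         {\sqrt{\left[n\sum x^{2} - \left(\sum x\right)^{2}\right]
                \left[n\sum y^{2} - \left(\sum y\right)^{2}\right]}}
```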

Properties of the Correlation Coefficient

The range of the correlation coefficient is from -1 to +1.
If there is a perfect positive linear relationship between
the variables, the value of r will be equal to +1.
If there is a perfect negative linear relationship between
the variables, the value of r will be equal to -1.
If there is a strong positive linear relationship between the
variables, the value of r will be close to +1.
If there is a strong negative linear relationship between the
variables, the value of r will be close to -1.
If there is little or no linear relationship between the
variables, the value of r will be close to 0.
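
The properties above can be checked numerically. The following is a minimal sketch, assuming the sums-of-products form of r; the function name `pearson_r` and the small data sets are made up for illustration and are not part of the lesson's data.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient via the sums-of-products formula."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx**2) * (n * syy - sy**2))

xs = [1, 2, 3, 4, 5]                              # hypothetical x-values
r_up = pearson_r(xs, [2 * x + 1 for x in xs])     # points on a rising line
r_down = pearson_r(xs, [10 - 3 * x for x in xs])  # points on a falling line
r_flat = pearson_r(xs, [2, 1, 3, 1, 2])           # no linear trend

print(r_up, r_down, r_flat)
```

Points lying exactly on a rising line give r = +1, points on a falling line give r = -1, and the third data set, which has no linear trend, gives r = 0.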

Range of the correlation coefficient r



Example

Compute the correlation coefficient for the following set of
observations for the independent variable x and the dependent
variable y.

Answer

The value of r suggests a strong negative relationship between
the independent variable x and the dependent variable y. That
is, the higher the value of x, the lower the value of y.

Example

Find the value of the linear correlation coefficient r for the
paired pizza / subway fare costs.

Answer

Example

Compute the value of the correlation coefficient for the data
obtained in the study of age and blood pressure.

Subject   Age (x)   Pressure (y)
A         43        128
B         48        120
C         56        135
D         61        143
E         67        141
F         70        152

Answer

Subject   Age (x)   Pressure (y)   xy       x^2     y^2
A         43        128            5,504    1,849   16,384
B         48        120            5,760    2,304   14,400
C         56        135            7,560    3,136   18,225
D         61        143            8,723    3,721   20,449
E         67        141            9,447    4,489   19,881
F         70        152            10,640   4,900   23,104

The correlation coefficient suggests a strong positive
relationship between age and blood pressure.
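
The column sums and the resulting value of r can be reproduced with a short script. This is a sketch that assumes the sums-of-products form of r; the variable names are illustrative only.

```python
import math

# Age (x) and systolic blood pressure (y) for subjects A to F.
ages = [43, 48, 56, 61, 67, 70]
pressures = [128, 120, 135, 143, 141, 152]

n = len(ages)
sum_x = sum(ages)
sum_y = sum(pressures)
sum_xy = sum(x * y for x, y in zip(ages, pressures))
sum_x2 = sum(x * x for x in ages)
sum_y2 = sum(y * y for y in pressures)

# r = [n*Sxy - Sx*Sy] / sqrt([n*Sxx - Sx^2] * [n*Syy - Sy^2])
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2))
r = numerator / denominator

print(f"sum x = {sum_x}, sum y = {sum_y}, sum xy = {sum_xy}")
print(f"sum x^2 = {sum_x2}, sum y^2 = {sum_y2}")
print(f"r = {r:.3f}")  # about 0.897 for these six pairs
```

A value of about 0.897 is close to +1, which agrees with the description of a strong positive relationship.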

The Significance of the Correlation Coefficient

Step 1: State the hypothesis.
Step 2: Find the critical values.
Step 3: Compute the test value.
Step 4: Make the decision.
Step 5: Summarize the results.

Population Correlation Coefficient

The population correlation coefficient ρ is the correlation
computed by using all possible pairs of data values (x, y)
taken from a population.

H0: ρ = 0
This null hypothesis means that there is no correlation between
x and y in the population.

H1: ρ ≠ 0
This alternative hypothesis means that there is a significant
correlation between the variables in the population.

Hypothesis Test for Correlation



Formula for the t Test for the Correlation Coefficient

$t = r\sqrt{\dfrac{n-2}{1-r^{2}}}$

with degrees of freedom equal to n - 2.

Example

Test the significance of the correlation coefficient in the
age and blood pressure problem at the chosen significance
level α.

Solution:

Step 1: State the hypotheses: H0: ρ = 0 and H1: ρ ≠ 0.
Step 2: From the t table with d.f. = n - 2 = 4, find the critical values.
Step 3: Compute the test value t from r and n = 6.
Step 4: Reject the null hypothesis.
Step 5: There is a significant relationship between the
variables of age and blood pressure.
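
The test for the age and blood pressure data can be sketched in code. Two inputs are assumptions and are not taken from the example itself: a significance level of α = 0.05, and the matching two-tailed critical value 2.776 for 4 degrees of freedom from a standard t table. The value r = 0.897 is the rounded result of applying the sums-of-products formula to the six data pairs, and the helper name `t_statistic` is illustrative.

```python
import math

def t_statistic(r, n):
    """Test value t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r**2))

r = 0.897          # rounded r for the age and blood pressure data
n = 6              # number of subjects
critical = 2.776   # ASSUMPTION: two-tailed t value for alpha = 0.05, d.f. = 4

t = t_statistic(r, n)
reject_h0 = abs(t) > critical   # reject H0 when t falls outside +/- critical

print(f"t = {t:.2f}, critical values = +/-{critical}, reject H0: {reject_h0}")
```

Under these assumptions the test value lies well beyond the critical values, which matches the decision to reject the null hypothesis.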

Example: Pizza / Subway Fare Costs

Use the paired pizza / subway fare data to test the claim that
there is a linear correlation between the costs of a slice of
pizza and the subway fares. Use the significance level α chosen
for the test.

Answer

Step 1: State the hypotheses: H0: ρ = 0 and H1: ρ ≠ 0.
Step 2: From the t table at n = 6 (d.f. = 4) and the chosen significance level, find the critical values.
Step 3: Compute the test value.
Step 4: Reject the null hypothesis.
Step 5: We conclude that there is sufficient evidence to
support the claim of a linear correlation between the costs of
a slice of pizza and subway fares.

Regression

Regression analysis is a technique used to develop the equation
of a straight line through the data and to make predictions
from it.
A regression equation is an equation that defines the
relationship between two variables.

Line of Best Fit

Given a scatter plot, one must be able to draw the line of best
fit. Best fit means that the sum of the squares of the vertical
distances from each point to the line is at a minimum.
The reason one needs a line of best fit is that the values of y
will be predicted from the values of x; hence, the closer the
points are to the line, the better the fit and the prediction
will be.

Least Squares Principle

The regression equation is determined by minimizing the sum of
the squares of the vertical distances between the actual y
values and the predicted values of y.
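
The principle can be illustrated numerically with the age and blood pressure data. The sketch below computes the least-squares slope and intercept, then compares the sum of squared vertical distances for that line against a handful of nearby lines. The nearby lines and the helper name `sum_squared_errors` are hypothetical choices made for illustration.

```python
ages = [43, 48, 56, 61, 67, 70]
pressures = [128, 120, 135, 143, 141, 152]

def sum_squared_errors(a, b):
    """Sum of squared vertical distances from each point to the line y' = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(ages, pressures))

n = len(ages)
sum_x, sum_y = sum(ages), sum(pressures)
sum_xy = sum(x * y for x, y in zip(ages, pressures))
sum_x2 = sum(x * x for x in ages)

# Least-squares slope and intercept.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x**2)
a = (sum_y - b * sum_x) / n

best = sum_squared_errors(a, b)

# Hypothetical nearby lines: shift the intercept and/or the slope a little.
nearby = [sum_squared_errors(a + da, b + db)
          for da in (-1.0, 0.0, 1.0)
          for db in (-0.05, 0.0, 0.05)
          if (da, db) != (0.0, 0.0)]

print(f"least-squares line: SSE = {best:.2f}")
print(f"smallest SSE among nearby lines = {min(nearby):.2f}")
```

Every perturbed line gives a larger sum of squared distances than the least-squares line, which is what "best fit" means here.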

Regression Equation

Given a collection of paired sample data, the regression
equation algebraically describes the relationship between the
two variables x and y. The graph of the regression equation is
called the regression line (or line of best fit, or
least-squares line).

Formulas for the Regression Line

$y' = a + bx$

where a is the y intercept and b is the slope of the line.
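
For a line written as y' = a + bx, one standard pair of least-squares formulas for the intercept and slope is given below. The exact form is an assumption; it is chosen because it reuses the sums of x, y, xy, and x^2 that appear in the worked example, with n the number of data pairs.

```latex
a = \frac{\left(\sum y\right)\left(\sum x^{2}\right) - \left(\sum x\right)\left(\sum xy\right)}
         {n\sum x^{2} - \left(\sum x\right)^{2}},
\qquad
b = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}
         {n\sum x^{2} - \left(\sum x\right)^{2}}
```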

Example: Age and Blood Pressure

Find the equation of the regression line for the data, and
graph the line on the scatter plot of the data.

Subject   Age (x)   Pressure (y)   xy       x^2     y^2
A         43        128            5,504    1,849   16,384
B         48        120            5,760    2,304   14,400
C         56        135            7,560    3,136   18,225
D         61        143            8,723    3,721   20,449
E         67        141            9,447    4,489   19,881
F         70        152            10,640   4,900   23,104

Answer
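
A minimal sketch of the computation, assuming the standard least-squares formulas for the intercept a and slope b of the line y' = a + bx; the variable names are illustrative.

```python
# Age (x) and systolic blood pressure (y) for subjects A to F.
ages = [43, 48, 56, 61, 67, 70]
pressures = [128, 120, 135, 143, 141, 152]

n = len(ages)
sum_x = sum(ages)
sum_y = sum(pressures)
sum_xy = sum(x * y for x, y in zip(ages, pressures))
sum_x2 = sum(x * x for x in ages)

denominator = n * sum_x2 - sum_x**2
a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator   # y intercept
b = (n * sum_xy - sum_x * sum_y) / denominator        # slope

print(f"y' = {a:.2f} + {b:.2f}x")
```

Rounded to two decimals, this gives an intercept of 81.05 and a slope of 0.96.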

Regression Line

[Figure: scatter plot of the age and blood pressure data with the fitted regression line f(x) = 0.96x + 81.05; horizontal axis from 40 to 75, vertical axis from 0 to 160.]

Example

Using the equation of the regression line found in the age and
blood pressure example, predict the blood pressure for a person
who is 50 years old.
Solution:
With the fitted line y' = 0.96x + 81.05, at x = 50 we get
y' = 0.96(50) + 81.05 = 129.05.
In other words, the predicted systolic blood pressure for a
50-year-old person is about 129.
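
The prediction step amounts to substituting x into the fitted line. The sketch below uses the rounded coefficients 81.05 and 0.96 from the regression line for the age and blood pressure data; the function name `predict_pressure` is hypothetical.

```python
def predict_pressure(age, intercept=81.05, slope=0.96):
    """Predicted systolic pressure from the fitted line y' = intercept + slope * age."""
    return intercept + slope * age

predicted = predict_pressure(50)
print(f"predicted pressure at age 50: {predicted:.2f}")
```

Because the coefficients are rounded, the result can differ slightly from a prediction made with the unrounded slope and intercept.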

Assumptions for Valid Predictions in Regression

1. For any specific value of the independent variable x, the
value of the dependent variable y must be normally distributed
about the regression line.
2. The standard deviation of each of the dependent variables
must be the same for each value of the independent variable.

Exercise 1: Testing for a Linear Correlation

Construct a scatter plot, find the value of the linear
correlation coefficient r, and find the critical values of r at
the given significance level. Determine whether there is
sufficient evidence to support a claim of a linear correlation
between the two variables.
Listed below are systolic blood pressure measurements (in
mm Hg) obtained from the same woman. Is there sufficient
evidence to conclude that there is a linear correlation between
right and left arm systolic blood pressure measurements?

Exercise

Finding the Equation of the Regression Line and Making
Predictions
Find the best predicted systolic blood pressure in the left arm
given that the systolic blood pressure in the right arm is
100 mm Hg.

Exercise 2: Testing for a Linear Correlation

Construct a scatter plot, find the value of the linear
correlation coefficient r, and find the critical values of r at
the given significance level. Determine whether there is
sufficient evidence to support a claim of a linear correlation
between the two variables.
Listed below are the brain sizes (cm^3) and Wechsler IQ scores
of subjects. Is there sufficient evidence to conclude that
there is a linear correlation between brain size and IQ score?
Does it appear that people with larger brains are more
intelligent?

Exercise

Brain Size and Intelligence

Find the best predicted IQ score of someone with a brain size
of 1,275 cm^3.
