# Correlation and Regression

PROF. MYLLAH D. GARCIA

Objectives:

After completing this chapter, you should be able to:

- Draw a scatter plot for a set of ordered pairs.
- Compute the correlation coefficient.
- Test the hypothesis $H_0: \rho = 0$.
- Compute the equation of the regression line.
- Compute the coefficient of determination.
- Compute the standard error of the estimate.
- Find the prediction interval.
- Be familiar with the concept of multiple regression.

Correlation

A correlation exists between two variables when the values of one variable are somehow associated with the values of the other variable.
Correlation analysis is a group of techniques used to measure the strength of the association between two variables.

Scatter Diagram

A chart that portrays the relationship between two variables.
Example: Construct a scatter plot for the data obtained in a study of age and systolic blood pressure of six randomly selected subjects.

| Subject | Age (x) | Pressure (y) |
|---|---|---|
| A | 43 | 128 |
| B | 48 | 120 |
| C | 56 | 135 |
| D | 61 | 143 |
| E | 67 | 141 |
| F | 70 | 152 |

Scatter Plot

[Figure: scatter plot of the six (x, y) pairs, with the horizontal axis running from 40 to 75 and the vertical axis from 0 to 160.]

Larger values of the dependent variable are associated with larger values of the independent variable, and vice versa. The values lie on a straight line, so there is a perfect positive association between the variables. Perfect association rarely occurs when sample data are collected.

Larger values of the dependent variable are associated with smaller values of the independent variable, and vice versa. The values lie on a straight line, so there is a perfect negative association between the variables. Again, perfect association rarely occurs with sample data.

The values are not on a straight line, but they are closely packed together in a linear pattern, so there is a very strong positive association between the variables.

The values are not on a straight line, but they are relatively closely packed together in a roughly linear pattern, so there is a very strong negative association between the variables.

No association

The values are scattered around, so there is very little association between the variables.

Nonlinear association

The scatter plot displays a nonlinear or curvilinear relationship. We will not study such relationships in this text but will concentrate only on linear relationships between two variables.

Correlation Coefficient

A measure of the strength of the linear relationship between two variables.
The linear correlation coefficient r measures the strength of the linear correlation between paired quantitative x- and y-values in a sample.
The linear correlation coefficient is sometimes referred to as the Pearson product moment correlation coefficient in honor of Karl Pearson (1857-1936), who originally developed it.

Coefficient of Correlation
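
A common computational form of the sample correlation coefficient is sketched below. It is the standard Pearson formula written with raw sums, on the assumption that the deck follows the usual textbook notation; n is the number of (x, y) pairs.

```latex
\[
r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}
         {\sqrt{\left[n\sum x^{2} - \left(\sum x\right)^{2}\right]
                \left[n\sum y^{2} - \left(\sum y\right)^{2}\right]}}
\]
```

The sums of xy, x² and y² in this form are the reason worked solutions usually tabulate those three columns for each pair.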

The range of the correlation coefficient is from -1 to +1.

- If there is a perfect positive linear relationship between the variables, the value of r will be equal to +1.
- If there is a perfect negative linear relationship between the variables, the value of r will be equal to -1.
- If there is a strong positive linear relationship between the variables, the value of r will be close to +1.
- If there is a strong negative linear relationship between the variables, the value of r will be close to -1.
- If there is little or no linear relationship between the variables, the value of r will be close to 0.
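
The boundary cases of r can be checked numerically. The following is a minimal sketch, assuming the standard raw-sums form of the Pearson formula; the function name `pearson_r` and the three small data sets are hypothetical and chosen only to hit r = +1, r = -1 and r = 0.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient from raw sums (hypothetical helper)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator  # undefined (division by zero) if x or y is constant

xs = [1, 2, 3, 4, 5]
r_up = pearson_r(xs, [2 * x + 1 for x in xs])     # points on a rising straight line
r_down = pearson_r(xs, [10 - 3 * x for x in xs])  # points on a falling straight line
r_flat = pearson_r(xs, [2, 1, 3, 1, 2])           # made-up data with no linear trend
print(r_up, r_down, r_flat)
```

Points that lie exactly on a rising or falling line give the extreme values, while the third data set is constructed so that the numerator cancels to zero.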

Example

[...] set of observations for the independent variable x and the dependent variable y.

Answer

There is a negative relationship between the independent variable x and the dependent variable y. That is, the higher the value of x, the lower the value of y.

Example

Compute the correlation coefficient for the data obtained in the study of age and blood pressure.

| Subject | Age (x) | Pressure (y) |
|---|---|---|
| A | 43 | 128 |
| B | 48 | 120 |
| C | 56 | 135 |
| D | 61 | 143 |
| E | 67 | 141 |
| F | 70 | 152 |

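Answer

| Subject | Age (x) | Pressure (y) | xy | x² | y² |
|---|---|---|---|---|---|
| A | 43 | 128 | 5,504 | 1,849 | 16,384 |
| B | 48 | 120 | 5,760 | 2,304 | 14,400 |
| C | 56 | 135 | 7,560 | 3,136 | 18,225 |
| D | 61 | 143 | 8,723 | 3,721 | 20,449 |
| E | 67 | 141 | 9,447 | 4,489 | 19,881 |
| F | 70 | 152 | 10,640 | 4,900 | 23,104 |

The correlation coefficient suggests a strong positive relationship between age and blood pressure.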
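
The arithmetic behind this answer can be reproduced as follows. This is a sketch that assumes the usual raw-sums Pearson formula; the variable names are illustrative, and the data are the six (age, pressure) pairs from the example.

```python
from math import sqrt

ages = [43, 48, 56, 61, 67, 70]
pressures = [128, 120, 135, 143, 141, 152]

n = len(ages)
sum_x = sum(ages)                                     # 345
sum_y = sum(pressures)                                # 819
sum_xy = sum(x * y for x, y in zip(ages, pressures))  # 47,634
sum_x2 = sum(x * x for x in ages)                     # 20,399
sum_y2 = sum(y * y for y in pressures)                # 112,443

# Raw-sums form of the Pearson correlation coefficient.
r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
print(round(r, 3))  # 0.897
```

A value this close to +1 is what supports the description of the relationship as strong and positive.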

Step 1: State the hypotheses.
Step 2: Find the critical values.
Step 3: Compute the test value.
Step 4: Make the decision.
Step 5: Summarize the results.

The Population Correlation Coefficient $\rho$ is the correlation computed by using all possible pairs of data values (x, y) taken from a population.

$H_0: \rho = 0$. This null hypothesis means that there is no correlation between x and y in the population.

$H_1: \rho \neq 0$. This alternative hypothesis means that there is a significant correlation between the variables in the population.

Coefficient
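
The test value for the significance of a correlation coefficient is commonly computed with a t statistic. The form below is the standard textbook one and is given here as an assumption about what the deck uses; n is the number of pairs and r is the sample correlation coefficient.

```latex
\[
t = r\sqrt{\frac{n-2}{1-r^{2}}}, \qquad \text{d.f.} = n - 2
\]
```

The null hypothesis of no correlation is rejected when t falls beyond the critical values read from the t table at n - 2 degrees of freedom.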

Example

Test the significance of the correlation coefficient found in the age / blood pressure problem. Use [...].

Solution:

Step 1: $H_0: \rho = 0$ and $H_1: \rho \neq 0$
Step 2: From the t table, the critical values are [...]
Step 3: [...]
Step 4: Reject the null hypothesis.
Step 5: There is a significant relationship between the variables of age and blood pressure.
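
The five steps can be traced numerically. The sketch below rests on two assumptions that are not stated in the surviving text: a significance level of 0.05, and r = 0.897 as obtained from the six (age, pressure) pairs. The critical value 2.776 is the two-tailed t table entry for 4 degrees of freedom at that level, and `t_statistic` is a hypothetical helper name.

```python
from math import sqrt

def t_statistic(r, n):
    """Test value for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r ** 2))

r, n = 0.897, 6    # r computed from the age / blood pressure data
critical = 2.776   # ASSUMED alpha = 0.05, two-tailed, d.f. = n - 2 = 4

t = t_statistic(r, n)
reject_h0 = abs(t) > critical  # decision rule: reject if t lies beyond +/- critical
print(round(t, 2), reject_h0)
```

Under these assumptions the test value is about 4.06, which lies beyond the critical value, in line with the decision to reject the null hypothesis.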

Test the claim that there is a linear correlation between the costs of a slice of pizza and the subway fares. Use a [...] significance level.

Answer

Step 1: $H_0: \rho = 0$ and $H_1: \rho \neq 0$
Step 2: From the t table at n = 6 and [...]
Step 3: [...]
Step 4: Reject the null hypothesis.
Step 5: We conclude that there is sufficient evidence to support the claim of a linear correlation between the costs of a slice of pizza and subway fares.

Regression

[...] equation for the straight line and make these predictions.
Regression Equation: an equation that defines the relationship between two variables.

Given a scatter plot, one must be able to draw the line of best fit. Best fit means that the sum of the squares of the vertical distances from each point to the line is at a minimum.
The reason one needs a line of best fit is that the values of y will be predicted from the values of x; hence, the closer the points are to the line, the better the fit and the prediction will be.

The least-squares principle determines the regression equation by minimizing the sum of the squares of the vertical distances between the actual Y values and the predicted values of Y.
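
The minimizing property can be checked directly on the age and blood pressure data. This is a sketch under stated assumptions: the slope and intercept use the deviation form of the least-squares formulas, and the perturbation sizes (1 for the intercept, 0.05 for the slope) are arbitrary choices for illustration.

```python
ages = [43, 48, 56, 61, 67, 70]
pressures = [128, 120, 135, 143, 141, 152]

def sse(intercept, slope):
    """Sum of squared vertical distances between actual and predicted y."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(ages, pressures))

n = len(ages)
mean_x = sum(ages) / n
mean_y = sum(pressures) / n

# Least-squares slope and intercept (deviation form).
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, pressures))
         / sum((x - mean_x) ** 2 for x in ages))
intercept = mean_y - slope * mean_x

best = sse(intercept, slope)
# Every nearby line obtained by nudging the intercept and/or the slope.
nearby = [sse(intercept + da, slope + db)
          for da in (-1, 0, 1) for db in (-0.05, 0, 0.05)
          if (da, db) != (0, 0)]
print(round(best, 2), round(min(nearby), 2))
```

Each of the eight perturbed lines has a larger sum of squared distances than the fitted line, which is what "best fit" means here.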

Regression equation

Given a collection of paired sample data, the regression equation $\hat{y} = b_0 + b_1 x$ algebraically describes the relationship between the two variables x and y. The graph of the regression equation is called the regression line (or line of best fit, or least-squares line).
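
For reference, the least-squares slope and intercept can be written in raw-sums form. The notation (slope $b_1$, intercept $b_0$, line $\hat{y} = b_0 + b_1 x$) is an assumption; some textbooks write the same line as $y' = a + bx$.

```latex
\[
b_1 = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}
           {n\sum x^{2} - \left(\sum x\right)^{2}},
\qquad
b_0 = \bar{y} - b_1\,\bar{x}
\]
```

The numerator of $b_1$ is the same as the numerator of the correlation coefficient r, so the slope and r always share the same sign.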

line.

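Find the equation of the regression line for the age and blood pressure data, and graph the line on the scatter plot of the data.

| Subject | Age (x) | Pressure (y) | xy | x² | y² |
|---|---|---|---|---|---|
| A | 43 | 128 | 5,504 | 1,849 | 16,384 |
| B | 48 | 120 | 5,760 | 2,304 | 14,400 |
| C | 56 | 135 | 7,560 | 3,136 | 18,225 |
| D | 61 | 143 | 8,723 | 3,721 | 20,449 |
| E | 67 | 141 | 9,447 | 4,489 | 19,881 |
| F | 70 | 152 | 10,640 | 4,900 | 23,104 |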

Answer
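
One way to obtain the coefficients is sketched below, assuming the standard raw-sums least-squares formulas; the variable names are illustrative and the data are the six (age, pressure) pairs.

```python
ages = [43, 48, 56, 61, 67, 70]
pressures = [128, 120, 135, 143, 141, 152]

n = len(ages)
sum_x, sum_y = sum(ages), sum(pressures)
sum_xy = sum(x * y for x, y in zip(ages, pressures))
sum_x2 = sum(x * x for x in ages)

# Least-squares slope, then the intercept from the two sample means.
slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
intercept = sum_y / n - slope * (sum_x / n)

print(f"y = {intercept:.3f} + {slope:.3f}x")  # prints: y = 81.048 + 0.964x
```

Rounded to two decimals, these values agree with the equation f(x) = 0.96x + 81.05 shown on the regression line chart.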

Regression Line

[Figure: scatter plot of the age and blood pressure data with the fitted regression line f(x) = 0.96x + 81.05.]

Example

Using the equation of the regression line found in the age/blood pressure example, predict the blood pressure for a person who is 50 years old.
Solution: at x = 50, $f(50) = 0.96(50) + 81.05 = 129.05$.
In other words, the predicted systolic blood pressure for a 50-year-old person is 129.
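
The prediction can be expressed as a small function. This is a sketch that takes the rounded coefficients from the chart equation f(x) = 0.96x + 81.05 as defaults; `predict_pressure` is a hypothetical name.

```python
def predict_pressure(age, intercept=81.05, slope=0.96):
    """Predicted systolic blood pressure from the fitted line f(x) = 0.96x + 81.05."""
    return intercept + slope * age

predicted = predict_pressure(50)
print(round(predicted))  # 129
```

The sampled ages run from 43 to 70, so a prediction at age 50 stays inside the range the line was fitted on.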

1. For any specific value of the independent variable x, the value of the dependent variable y must be normally distributed about the regression line.
2. The standard deviation of each of the dependent variables must be the same for each value of the independent variable.

Find the linear correlation coefficient r, and find the critical value of r using [...]. Determine whether there is sufficient evidence to support a claim of a linear correlation between the two variables.
Listed below are systolic blood pressure measurements (in mm Hg) obtained from the same woman. Is there sufficient evidence to conclude that there is a linear correlation between right and left arm systolic blood pressure measurements?

Exercise

## Finding the Equation of the Regression Line and Making Predictions

Find the best predicted systolic blood pressure in the left arm given that the systolic blood pressure in the right arm is 100 mm Hg.

Find the linear correlation coefficient r, and find the critical value of r using [...]. Determine whether there is sufficient evidence to support a claim of a linear correlation between the two variables.
Listed below are the brain sizes (cm³) and Wechsler IQ scores of subjects. Is there sufficient evidence to conclude that there is a linear correlation between brain size and IQ score? Does it appear that people with larger brains are more intelligent?

Exercise

## Brain Size and Intelligence

Find the best predicted IQ score of someone with a brain size of 1,275 cm³.
