
Correlation and Regression
Elementary Statistics
Larson Farber
Chapter 9
[Figure: scatter plot of Accidents versus Hours of Training]

Ch. 9 Larson/Farber 2
Correlation
A relationship between two variables.

x: Explanatory (Independent) Variable    y: Response (Dependent) Variable
Hours of Training                        Number of Accidents
Shoe Size                                Height
Cigarettes smoked per day                Lung Capacity
Score on SAT                             Grade Point Average
Height                                   IQ

What type of relationship exists between the two variables, and is the correlation significant?

Scatter Plots and Types of Correlation
[Figure: scatter plot of Accidents (y-axis, 0 to 60) versus Hours of Training (x-axis, 0 to 20)]
x = hours of training
y = number of accidents
Negative Correlation: as x increases, y decreases

Scatter Plots and Types of Correlation
[Figure: scatter plot of GPA (y-axis, 1.50 to 4.00) versus Math SAT (x-axis, 300 to 800)]
x = SAT score
y = GPA
Positive Correlation: as x increases, y increases

Scatter Plots and Types of Correlation
[Figure: scatter plot of IQ (y-axis, 80 to 160) versus Height (x-axis, 60 to 80)]
x = height
y = IQ
No linear correlation

Application
Example data:

Absences (x)   Final Grade (y)
 8             78
 2             92
 5             90
12             58
15             43
 9             74
 6             81

[Figure: scatter plot of Final Grade (y-axis, 40 to 95) versus Absences (x-axis, 0 to 16)]

Correlation Coefficient
A measure of the strength and direction of a linear relationship between two variables.

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} $$

The range of r is from -1 to 1.
If r is close to -1, there is a strong negative correlation.
If r is close to 0, there is no linear correlation.
If r is close to 1, there is a strong positive correlation.
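As a rough illustration, the correlation coefficient formula can be sketched in Python. The function name `correlation_coefficient` is an assumption made for this example, and the data are the absences and final grades used in this chapter's Application.

```python
import math

def correlation_coefficient(xs, ys):
    """Sample correlation coefficient r from the sums-based formula."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Absences (x) and final grades (y) from the Application data
absences = [8, 2, 5, 12, 15, 9, 6]
grades = [78, 92, 90, 58, 43, 74, 81]

r = correlation_coefficient(absences, grades)
print(round(r, 3))  # -0.975
```

The sketch does not guard against a zero denominator, which occurs when all x values or all y values are equal.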

Computation of r

        x     y     xy    x^2    y^2
1       8    78    624     64   6084
2       2    92    184      4   8464
3       5    90    450     25   8100
4      12    58    696    144   3364
5      15    43    645    225   1849
6       9    74    666     81   5476
7       6    81    486     36   6561
Sum    57   516   3751    579  39898

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} = \frac{7(3751) - (57)(516)}{\sqrt{7(579) - (57)^2}\,\sqrt{7(39898) - (516)^2}} = \frac{-3155}{\sqrt{804}\,\sqrt{13030}} = -0.975 $$

Test for the Significance of r
r is the correlation coefficient for the sample. The correlation coefficient for the population is ρ (rho).
Hypothesis test for the significance of r:

H0: ρ ≥ 0   No significant negative correlation
Ha: ρ < 0   Significant negative correlation (left tail)

H0: ρ ≤ 0   No significant positive correlation
Ha: ρ > 0   Significant positive correlation (right tail)

H0: ρ = 0   No significant correlation
Ha: ρ ≠ 0   Significant correlation (two tail)

The sampling distribution for r is a t-distribution with n-2 degrees of freedom.
Standardized test statistic:
$$ t = \frac{r}{\sigma_r} = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} $$
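A minimal Python sketch of the standardized test statistic follows. The function name `t_statistic` is an assumption for illustration, and the inputs are the r = -0.975 and n = 7 used in this chapter's absences example.

```python
import math

def t_statistic(r, n):
    """Standardized test statistic for r, with n - 2 degrees of freedom."""
    standard_error = math.sqrt((1 - r ** 2) / (n - 2))
    return r / standard_error

t = t_statistic(-0.975, 7)
print(round(t, 2))  # -9.81
```

The function is undefined for r = 1 or r = -1 (division by zero) and for n of 2 or less.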

Test for Significance of r
In finding the correlation between the number of times absent and a final grade, you used seven pairs of data to find r = -0.975. Test the significance of this correlation. Use α = 0.01.
1. Write the null and alternative hypotheses
H0: ρ = 0   No significant correlation
Ha: ρ ≠ 0   Significant correlation (two tail)
2. State the level of significance
α = 0.01
3. Identify the sampling distribution
A t-distribution with n - 2 = 7 - 2 = 5 degrees of freedom.
4. Find the critical values
Critical values: -t0 = -4.032 and t0 = 4.032
5. Find the rejection regions
Rejection regions: t < -4.032 or t > 4.032
6. Find the test statistic
$$ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{-0.975}{\sqrt{\dfrac{1 - (-0.975)^2}{7 - 2}}} = -9.811 $$
7. Make your decision
t = -9.811 falls in the rejection region. Reject the null hypothesis.
8. Interpret your decision
There is a significant correlation between the number of times absent and final grades.
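The decision rule for the two-tailed test can be sketched as below. The function name `two_tailed_decision` is hypothetical, and the critical value is treated as an input read from a t-table. The value 4.032 used here is the two-tailed critical value for α = 0.01 with n - 2 = 7 - 2 = 5 degrees of freedom.

```python
import math

def two_tailed_decision(r, n, critical_value):
    """Return the test statistic and whether H0: rho = 0 is rejected."""
    t = r / math.sqrt((1 - r ** 2) / (n - 2))
    reject_null = abs(t) > critical_value
    return t, reject_null

# r and n from the absences example; 4.032 is a table lookup (alpha = 0.01, d.f. = 5)
t, reject_null = two_tailed_decision(-0.975, 7, 4.032)
print(reject_null)  # True
```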

[Figure: scatter plot of revenue (y-axis, 180 to 260) versus Ad $ (x-axis, 1.5 to 3.0) with a fitted line]
$(x_i, y_i)$ = a data point
$(x_i, \hat{y}_i)$ = a point on the line with the same x-value
$d_i = y_i - \hat{y}_i$, called a residual
$\sum d^2$ is a minimum

The Line of Regression
Once you know there is a significant linear correlation, you can write an equation describing the relationship between the x and y variables. This equation is called the line of regression or least squares line.
From algebra, the equation of a line may be written as
y = mx + b
where m is the slope of the line and b is the y-intercept.
The line of regression is:
$$ \hat{y} = mx + b $$
The slope m is found by
$$ m = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} $$
The y-intercept is
$$ b = \bar{y} - m\bar{x} $$
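The slope and intercept formulas for the line of regression can be sketched in Python as follows. The function name `regression_line` is an assumption for this example, and the data are the absences and final grades from this chapter's Application.

```python
def regression_line(xs, ys):
    """Slope m and y-intercept b of the least squares line."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = sum_y / n - m * (sum_x / n)  # b = y-bar minus m times x-bar
    return m, b

absences = [8, 2, 5, 12, 15, 9, 6]
grades = [78, 92, 90, 58, 43, 74, 81]

m, b = regression_line(absences, grades)
print(round(m, 3))  # -3.924
print(round(b, 2))  # 105.67
```

Because the code carries full precision, the intercept can differ in the third decimal place from a hand calculation that rounds m and the means first.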

Calculate m and b
Write the equation of the line of regression with x = number of times absent and y = final grade.

        x     y     xy    x^2    y^2
1       8    78    624     64   6084
2       2    92    184      4   8464
3       5    90    450     25   8100
4      12    58    696    144   3364
5      15    43    645    225   1849
6       9    74    666     81   5476
7       6    81    486     36   6561
Sum    57   516   3751    579  39898

$$ m = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \frac{7(3751) - (57)(516)}{7(579) - (57)^2} = -3.924 $$
$$ b = \bar{y} - m\bar{x} = 73.714 - (-3.924)(8.143) = 105.667 $$
The line of regression is:
$$ \hat{y} = -3.924x + 105.667 $$

Line of Regression
[Figure: scatter plot of Final Grade (y-axis, 40 to 95) versus Absences (x-axis, 0 to 16) with the line of regression]
m = -3.924 and b = 105.667
The line of regression is:
$$ \hat{y} = -3.924x + 105.667 $$
Note that the point (8.143, 73.714) is on the line.

Predicting Values
The regression line can be used to predict values of y for values of x within the range of the data.
The regression equation for number of times absent and final grade is:
$$ \hat{y} = -3.924x + 105.667 $$
Use this equation to predict the expected grade for a student with
(a) 3 absences
(b) 12 absences
(a) $$ \hat{y} = -3.924(3) + 105.667 = 93.895 $$
(b) $$ \hat{y} = -3.924(12) + 105.667 = 58.579 $$
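The two predictions can be reproduced with a short Python sketch. The function name `predict_grade` and its range check are assumptions added for illustration. The bounds 2 and 15 are the smallest and largest absence counts in the data, reflecting the advice to predict only within the range of the data.

```python
def predict_grade(absences, m=-3.924, b=105.667, x_min=2, x_max=15):
    """Predict a final grade from the regression line, within the data range."""
    if not (x_min <= absences <= x_max):
        raise ValueError("x is outside the range of the data")
    return m * absences + b

grade_a = predict_grade(3)
grade_b = predict_grade(12)
print(round(grade_a, 3))  # 93.895
print(round(grade_b, 3))  # 58.579
```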

The Coefficient of Determination
The coefficient of determination, r², is the ratio of the explained variation to the total variation.
$$ r^2 = \frac{\text{Explained variation}}{\text{Total variation}} $$
The correlation coefficient of number of times absent and final grade is r = -0.975. Then the coefficient of determination is r² = (-0.975)² = 0.9506.
Interpretation: About 95% of the variation in final grades can be explained by the number of times a student is absent. The other 5% is unexplained and can be due to sampling error or other variables such as intelligence, amount of time studied, etc.
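To make the ratio concrete, the sketch below computes the explained and total variation directly for the absences data. It assumes the usual definitions: explained variation is the sum of squared deviations of the predicted values from the mean of y, and total variation is the sum of squared deviations of the observed values from that mean. The variable names are illustrative only.

```python
absences = [8, 2, 5, 12, 15, 9, 6]
grades = [78, 92, 90, 58, 43, 74, 81]

n = len(absences)
x_bar = sum(absences) / n
y_bar = sum(grades) / n

# Least squares slope and intercept from the sums-based formulas
sum_xy = sum(x * y for x, y in zip(absences, grades))
sum_x2 = sum(x * x for x in absences)
m = (n * sum_xy - sum(absences) * sum(grades)) / (n * sum_x2 - sum(absences) ** 2)
b = y_bar - m * x_bar

predicted = [m * x + b for x in absences]
explained_variation = sum((p - y_bar) ** 2 for p in predicted)
total_variation = sum((y - y_bar) ** 2 for y in grades)

r_squared = explained_variation / total_variation
print(round(r_squared, 2))  # 0.95
```

The unrounded ratio is slightly below the 0.9506 obtained by squaring the rounded r = -0.975, which is why the two agree only to about two decimal places.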

The Standard Error of Estimate
The standard error of estimate, s_e, is the standard deviation of the observed y_i values about the predicted value.
$$ s_e = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}} $$

        x     y    ŷ         (y - ŷ)^2
1       8    78    74.275    13.8756
2       2    92    97.819    33.8608
3       5    90    86.047    15.6262
4      12    58    58.579     0.3352
5      15    43    46.807    14.4932
6       9    74    70.351    13.3152
7       6    81    82.123     1.2611
Sum                          92.767

$$ s_e = \sqrt{\frac{92.767}{5}} = 4.307 $$
Prediction Intervals
Given a specific linear regression equation and x_0, a specific value of x, a c-prediction interval for y is:
$$ \hat{y} - E < y < \hat{y} + E $$
where
$$ E = t_c s_e \sqrt{1 + \frac{1}{n} + \frac{n(x_0 - \bar{x})^2}{n\sum x^2 - (\sum x)^2}} $$
The point estimate is $\hat{y}$ and E is the maximum error of estimate.
Use a t-distribution with n-2 degrees of freedom.
Application
Construct a 90% prediction interval for a final grade when a student has been absent 6 times.
1. Find the point estimate:
$$ \hat{y} = -3.924x + 105.667 = -3.924(6) + 105.667 = 82.123 $$
The point (6, 82.123) is the point on the regression line with x-coordinate of 6.
2. Find E
$$ E = t_c s_e \sqrt{1 + \frac{1}{n} + \frac{n(x_0 - \bar{x})^2}{n\sum x^2 - (\sum x)^2}} = 2.015(4.307)\sqrt{1 + \frac{1}{7} + \frac{7(6 - 8.14)^2}{7(579) - (57)^2}} = 2.015(4.307)\sqrt{1.18273} = 9.438 $$
At the 90% level of confidence, the maximum error of estimate is 9.438.
3. Find the endpoints
$$ \hat{y} - E = 82.123 - 9.438 = 72.685 $$
$$ \hat{y} + E = 82.123 + 9.438 = 91.561 $$
$$ 72.685 < y < 91.561 $$
When x = 6, the 90% prediction interval is from 72.685 to 91.561.
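The interval calculation can be sketched in Python as follows. The function name `prediction_interval` is hypothetical. The critical value t_c = 2.015 is taken from a t-table (90% level, n - 2 = 5 degrees of freedom) rather than computed, and m, b, and s_e are the rounded values from the earlier slides.

```python
import math

def prediction_interval(x0, xs, m, b, s_e, t_c):
    """Endpoints of the prediction interval for y at x = x0."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    x_bar = sum_x / n
    y_hat = m * x0 + b  # point estimate on the regression line
    margin = t_c * s_e * math.sqrt(
        1 + 1 / n + n * (x0 - x_bar) ** 2 / (n * sum_x2 - sum_x ** 2)
    )
    return y_hat - margin, y_hat + margin

absences = [8, 2, 5, 12, 15, 9, 6]

low, high = prediction_interval(6, absences, -3.924, 105.667, 4.307, 2.015)
print(round(low, 2), round(high, 2))  # 72.68 91.56
```

The endpoints agree with the hand calculation to about two decimal places. The small difference arises because the code uses the unrounded mean of x instead of 8.14.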

Minitab Output
Regression Analysis


The regression equation is
y = 106 - 3.92 x

Predictor Coef StDev T P
Constant 105.668 3.655 28.91 0.000
x -3.9241 0.4019 -9.76 0.000

S = 4.307 R-Sq = 95.0% R-Sq(adj) = 94.0%
