
SIMPLE LINEAR REGRESSION AND CORRELATION

The Regression Model


$$\mu_{y|x} = \alpha + \beta x$$
$$y = \alpha + \beta x + e$$
$$e = y - (\alpha + \beta x) = y - \mu_{y|x}$$

Example 8.3.1.
A team of professional mental health workers in a long-stay
psychiatric hospital wished to measure the level of response of
withdrawn patients to a program of remotivation therapy.
A standardized test was available for this purpose, but it was
expensive and time-consuming to administer.

To overcome this obstacle, the team developed a test that was
much easier to administer.
To test the usefulness of the new instrument for measuring the
level of patient response, the team decided to examine the
relationship between scores made on the new test and scores made
on the standardized test.
The objective was to use the new test if it could be shown that it
was a good predictor of a patient's score on the standardized
test.

The team was interested only in carrying out the analysis for
standardized scores between 50 and 100, since a score below 50
did not represent a significant level of response and scores
above 100 were seldom made by the type of patient under
consideration.
The team also felt that the use of scores in increments of 5
would give good coverage of the range of scores between 50
and 100.
Accordingly, 11 patients who had made scores on the new test of
50, 55, 60, 65, 70, 75, 80, 85, 90, 95, and 100, respectively, were
selected to take the standardized test.
The independent and dependent variables, respectively, are scores
made on the new test and scores made on the standardized test.

Construct a regression model and analyze the data.

Patient number   Score on new test (X)   Score on standardized test (Y)
 1                50                      61
 2                55                      61
 3                60                      59
 4                65                      71
 5                70                      80
 6                75                      76
 7                80                      90
 8                85                     106
 9                90                      98
10                95                     100
11               100                     114

The Sample Regression Equation



$$\hat{y} = a + bx$$

Normal equations:

$$\sum y_i = na + b\sum x_i$$
$$\sum x_i y_i = a\sum x_i + b\sum x_i^2$$

Patient number   Score on new test (x)   Score on standardized test (y)   x²      y²      xy
 1                50                      61                               2500    3721    3050
 2                55                      61                               3025    3721    3355
 3                60                      59                               3600    3481    3540
 4                65                      71                               4225    5041    4615
 5                70                      80                               4900    6400    5600
 6                75                      76                               5625    5776    5700
 7                80                      90                               6400    8100    7200
 8                85                     106                               7225   11236    9010
 9                90                      98                               8100    9604    8820
10                95                     100                               9025   10000    9500
11               100                     114                              10000   12996   11400
Total            825                     916                              64625   80076   71790

$$916 = 11a + 825b$$
$$71790 = 825a + 64625b$$

$$a = -0.9973 \qquad b = 1.1236$$

$$\hat{y} = -0.9973 + 1.1236x$$

$$x = 50: \quad \hat{y} = -0.9973 + 1.1236(50) = 55.1827$$
$$x = 100: \quad \hat{y} = -0.9973 + 1.1236(100) = 111.3627$$

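$$b = \frac{n\sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{n\sum x_i^2 - \left(\sum x_i\right)^2} \qquad a = \frac{\sum y_i - b\sum x_i}{n}$$

$$b = \frac{11(71790) - (825)(916)}{11(64625) - (825)^2} = 1.1236$$

$$a = \frac{916 - 1.1236(825)}{11} = -0.9973$$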
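The closed-form estimates can be checked with a few lines of Python. This is a minimal sketch; the function name `least_squares` is an illustrative choice, not something from the slides. Because it keeps full precision for b, it gives a ≈ -1.0000 rather than the slides' -0.9973, which comes from rounding b to 1.1236 before computing a.

```python
# Data from Example 8.3.1: new-test scores (x) and standardized-test scores (y).
x = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
y = [61, 61, 59, 71, 80, 76, 90, 106, 98, 100, 114]

def least_squares(x, y):
    """Return (a, b) for the fitted line y_hat = a + b*x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # Slope and intercept from the solved normal equations.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b

a, b = least_squares(x, y)
print(f"a = {a:.4f}, b = {b:.4f}")
```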

Evaluating The Regression Equation

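$$\underbrace{(y_i - \bar{y})}_{\text{total deviation}} = \underbrace{(\hat{y}_i - \bar{y})}_{\text{explained deviation}} + \underbrace{(y_i - \hat{y}_i)}_{\text{unexplained deviation}}$$

$$\underbrace{\sum (y_i - \bar{y})^2}_{\text{total sum of squares}} = \underbrace{\sum (\hat{y}_i - \bar{y})^2}_{\text{explained sum of squares}} + \underbrace{\sum (y_i - \hat{y}_i)^2}_{\text{unexplained sum of squares}}$$

$$SST = SSR + SSE$$

$$3798.1822 \approx 3471.8116 + 326.1455$$
$$3798.1822 \approx 3797.9571$$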

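$$SST = \sum (y_i - \bar{y})^2 = \sum y_i^2 - \frac{\left(\sum y_i\right)^2}{n}$$

$$SST = 61^2 + 61^2 + \dots + 114^2 - \frac{(916)^2}{11} = 3798.1818$$

$$SSR = \sum (\hat{y}_i - \bar{y})^2 = b^2 \sum (x_i - \bar{x})^2 = b^2\left[\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}\right]$$

$$SSR = (1.1236)^2\left[50^2 + 55^2 + \dots + 100^2 - \frac{(825)^2}{11}\right] = 3471.8116$$

$$SSE = SST - SSR = 3798.1818 - 3471.8116 = 326.3702$$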
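The three sums of squares can also be computed directly from their definitions. The sketch below is illustrative only (variable names are my own). It keeps b at full precision, so SSR comes out near 3472.04 and SSE near 326.1455, slightly different from the 3471.8116 and 326.3702 obtained with b rounded to 1.1236.

```python
# Data from Example 8.3.1.
x = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
y = [61, 61, 59, 71, 80, 76, 90, 106, 98, 100, 114]

n = len(y)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares fit in deviation form.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b = s_xy / s_xx
a = y_bar - b * x_bar
y_hat = [a + b * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)                # total
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # unexplained

print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}")
```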

ȳ = 83.2727, ŷ = -0.9973 + 1.1236x

Patient   y_i   ŷ          y_i - ȳ     (y_i - ȳ)²   y_i - ŷ    (y_i - ŷ)²   ŷ - ȳ     (ŷ - ȳ)²
 1         61    55.1827   -22.2727    496.07317     5.8173     33.841      -28.09    789.0481
 2         61    60.8007   -22.2727    496.07317     0.1993      0.0397     -22.472   504.9908
 3         59    66.4187   -24.2727    589.16397    -7.4187     55.037      -16.854   284.0573
 4         71    72.0367   -12.2727    150.61917    -1.0367      1.0747     -11.236   126.2477
 5         80    77.6547    -3.2727     10.710565    2.3453      5.5004      -5.618    31.56192
 6         76    83.2727    -7.2727     52.892165   -7.2727     52.892        0         0
 7         90    88.8907     6.7273     45.256565    1.1093      1.2305       5.618    31.56192
 8        106    94.5087    22.7273    516.53017    11.4913    132.05        11.236   126.2477
 9         98   100.1267    14.7273    216.89337    -2.1267      4.5229      16.854   284.0573
10        100   105.7447    16.7273    279.80257    -5.7447     33.002       22.472   504.9908
11        114   111.3627    30.7273    944.16697     2.6373      6.9554      28.09    789.0481
Total     916                          3798.1818                326.15                3471.812
                                       (SST)                    (SSE)                 (SSR)

Sample coefficient of determination

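$$r^2 = \frac{\sum (\hat{y}_i - \bar{y})^2}{\sum (y_i - \bar{y})^2} = \frac{b^2\left[\sum x_i^2 - \left(\sum x_i\right)^2 / n\right]}{\sum y_i^2 - \left(\sum y_i\right)^2 / n} = \frac{SSR}{SST}$$

$$r^2 = \frac{3471.8116}{3798.1818} = 0.91$$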

Testing H0: β = 0 with the F statistic

$$H_0: \beta = 0 \qquad H_A: \beta \neq 0 \qquad \alpha = 0.05$$

Source of variation   SS    d.f.    MS            V.R.
Linear regression     SSR   1       SSR/1         MSR/MSE
Residual              SSE   n - 2   SSE/(n - 2)
Total                 SST   n - 1

Source of variation   SS          d.f.   MS          V.R.
Linear regression     3471.8116   1      3471.8116   95.74
Residual               326.3702   9        36.2634
Total                 3798.1818   10
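The ANOVA entries follow mechanically from SSR, SSE, and n. A small sketch, assuming the sums of squares reported in the table; the helper name `anova_f` is hypothetical.

```python
def anova_f(ssr, sse, n):
    """Return (MSR, MSE, variance ratio) for simple linear regression."""
    msr = ssr / 1          # regression has 1 degree of freedom
    mse = sse / (n - 2)    # residual has n - 2 degrees of freedom
    return msr, mse, msr / mse

# Sums of squares from the worked example (n = 11 patients).
msr, mse, vr = anova_f(ssr=3471.8116, sse=326.3702, n=11)
print(f"MSR = {msr:.4f}, MSE = {mse:.4f}, V.R. = {vr:.2f}")
```

The variance ratio is then compared with the tabulated F value for 1 and n - 2 degrees of freedom at the chosen α.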

Population coefficient of determination


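An estimator of the population coefficient of determination ρ²:

$$\tilde{r}^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2 / (n-2)}{\sum (y_i - \bar{y})^2 / (n-1)}$$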


$$\tilde{r}^2 = 1 - \frac{326.3702 / 9}{3798.1818 / 10} = 1 - \frac{36.2634}{379.81818} = 0.9045$$

$$r^2 = 1 - \frac{326.3702}{3798.1818} = 1 - 0.08593 = 0.91407$$
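The two coefficients differ only in whether the sums of squares are divided by their degrees of freedom. A brief sketch using the example's SSE and SST; the function names are illustrative.

```python
def r_squared(sse, sst):
    """Sample coefficient of determination, 1 - SSE/SST."""
    return 1 - sse / sst

def r_squared_adjusted(sse, sst, n):
    """Degrees-of-freedom-adjusted estimate of the population coefficient."""
    return 1 - (sse / (n - 2)) / (sst / (n - 1))

r2 = r_squared(326.3702, 3798.1818)
r2_adj = r_squared_adjusted(326.3702, 3798.1818, n=11)
print(f"r^2 = {r2:.5f}, adjusted r^2 = {r2_adj:.4f}")
```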

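Testing H0: β = 0 with the t statistic

The sampling distributions of a and b are each normally distributed, with
means and variances as follows:

$$\mu_a = \alpha \qquad \sigma_a^2 = \frac{\sigma^2_{y|x} \sum x_i^2}{n \sum (x_i - \bar{x})^2}$$

$$\mu_b = \beta \qquad \sigma_b^2 = \frac{\sigma^2_{y|x}}{\sum (x_i - \bar{x})^2}$$

If σ²_{y|x} is known, we use the z statistic; otherwise we use the t statistic.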

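$$H_0: \beta = 0 \qquad H_A: \beta \neq 0 \qquad \alpha = 0.05$$

$$z = \frac{b - \beta_0}{\sigma_b} \qquad t = \frac{b - \beta_0}{s_b}$$

$$s^2_{y|x} = \frac{\sum (y_i - \hat{y}_i)^2}{n - 2} = \frac{n-1}{n-2}\left(s_y^2 - b^2 s_x^2\right)$$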

$$s_y^2 = 379.8182 \qquad s_x^2 = 275.0000$$

$$s^2_{y|x} = \frac{n-1}{n-2}\left(s_y^2 - b^2 s_x^2\right) = \frac{10}{9}\left[379.8182 - (1.1236)^2 (275.0000)\right] = 36.2634$$

$$s_b^2 = \frac{s^2_{y|x}}{\sum (x_i - \bar{x})^2} = \frac{s^2_{y|x}}{\sum x_i^2 - \left(\sum x_i\right)^2 / n}$$

$$s_b^2 = \frac{36.2634}{50^2 + 55^2 + \dots + 100^2 - (825)^2 / 11} = 0.013$$

$$t = \frac{1.1236 - 0}{\sqrt{0.013}} = 9.85$$
The table value of t for n - 2 = 9 degrees of freedom and
α = 0.05 (two-sided) is 2.2622.
Since the calculated value is much larger than the table value,
we reject the null hypothesis and conclude that the slope of the
true regression line is not zero.
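The slope test can be reproduced as follows. This is a sketch under stated assumptions: the inputs are the rounded values from the slides, the critical value 2.2622 is the table value quoted in the text, and `slope_t` is a hypothetical helper name. Without rounding s_b² to 0.013 the statistic comes out near 9.78 rather than 9.85; the conclusion is unchanged.

```python
import math

def slope_t(b, s2_yx, sxx, beta0=0.0):
    """Return (t, s_b) for testing H0: beta = beta0."""
    s_b = math.sqrt(s2_yx / sxx)   # standard error of the slope
    return (b - beta0) / s_b, s_b

# b, s^2_{y|x}, and sum of squared x-deviations from the example.
t_stat, s_b = slope_t(b=1.1236, s2_yx=36.2634, sxx=2750.0)

T_CRIT = 2.2622  # table value for 9 d.f., alpha = 0.05 (from the text)
reject_h0 = abs(t_stat) > T_CRIT
print(f"t = {t_stat:.3f}, s_b = {s_b:.4f}, reject H0: {reject_h0}")
```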

Confidence interval for β

Estimator ± (reliability factor) × (standard error of the estimate)

If σ²_{y|x} is known, we use σ_b; otherwise we use s_b:

$$\sigma_b = \sqrt{\frac{\sigma^2_{y|x}}{\sum (x_i - \bar{x})^2}} \qquad s_b = \sqrt{\frac{s^2_{y|x}}{\sum (x_i - \bar{x})^2}}$$

$$b \pm t_{1-\alpha/2}\, s_b$$

For the example:

$$1.1236 \pm 2.2622\sqrt{0.013} = 1.1236 \pm 0.2579$$
$$(0.8657,\ 1.3815)$$
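The interval arithmetic is short enough to check in code. The sketch below plugs in the slides' rounded s_b² = 0.013 so that it reproduces the printed limits; `slope_ci` is an illustrative name.

```python
import math

def slope_ci(b, s_b, t_crit):
    """Return (lower, upper) limits b -/+ t_crit * s_b."""
    half_width = t_crit * s_b
    return b - half_width, b + half_width

# Rounded values from the text: s_b^2 = 0.013, t table value = 2.2622.
lower, upper = slope_ci(b=1.1236, s_b=math.sqrt(0.013), t_crit=2.2622)
print(f"95% CI for beta: ({lower:.4f}, {upper:.4f})")
```

The interval excludes zero, which agrees with the rejection of H0: β = 0 in the t test.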

Using The Regression Equation


Predicting Y from a given X
Suppose a patient makes a score of 70 on the new test and we
want to predict his score on the standardized test.

$$x = 70: \quad \hat{y} = -0.9973 + 1.1236(70) \approx 78$$

If σ²_{y|x} is unknown, the 100(1 - α) percent prediction
interval for Y is:

$$\hat{y} \pm t_{1-\alpha/2}\, s_{y|x} \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$$

Here x_p is the particular value of X at which we wish to obtain
a prediction interval for Y.

$$78 \pm 2.2622\sqrt{36.2634}\,\sqrt{1 + \frac{1}{11} + \frac{(70 - 75)^2}{2750}}$$

$$78 \pm 2.2622\,(6.02)(1.0488)$$

$$78 \pm 14$$
$$(64,\ 92)$$
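A short sketch of the prediction-interval calculation, using the rounded prediction ŷ = 78 and the other quantities from the text. The function name `prediction_interval` and its parameter names are assumptions made for illustration.

```python
import math

def prediction_interval(y_hat, x_p, x_bar, sxx, s2_yx, n, t_crit):
    """Return (lower, upper, half_width) for a new Y observed at x_p."""
    spread = math.sqrt(1 + 1 / n + (x_p - x_bar) ** 2 / sxx)
    half_width = t_crit * math.sqrt(s2_yx) * spread
    return y_hat - half_width, y_hat + half_width, half_width

lower, upper, half_width = prediction_interval(
    y_hat=78.0, x_p=70, x_bar=75.0, sxx=2750.0,
    s2_yx=36.2634, n=11, t_crit=2.2622,
)
print(f"78 +/- {half_width:.2f} -> ({lower:.1f}, {upper:.1f})")
```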

Estimating mean of Y for a given X


If σ²_{y|x} is unknown, the 100(1 - α) percent confidence
interval for μ_{y|x} is:

$$\hat{y} \pm t_{1-\alpha/2}\, s_{y|x} \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$$

For x_p = 70:

$$\hat{y} = -0.9973 + 1.1236(70) \approx 78$$

$$78 \pm 2.2622\sqrt{36.2634}\,\sqrt{\frac{1}{11} + \frac{(70 - 75)^2}{2750}}$$

$$78 \pm 4$$
$$(74,\ 82)$$

Table 8.5.1. 95% confidence limits for μ_{y|x} for each value of X

x      ŷ          Lower limit   Upper limit
 50     55.1827    47.49842      62.86698
 55     60.8007    54.17769      67.42371
 60     66.4187    60.75701      72.08039
 65     72.0367    67.17674      76.89666
 70     77.6547    73.3468       81.9626
 75     83.2727    79.16528      87.38012
 80     88.8907    84.5828       93.1986
 85     94.5087    89.64874      99.36866
 90    100.127     94.46501     105.7884
 95    105.745     99.12169     112.3677
100    111.363    103.6784      119.047
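The rows of Table 8.5.1 can be regenerated from the fitted line and the interval formula for the mean. This sketch assumes the rounded constants from the worked example; the helper `mean_limits` is a hypothetical name.

```python
import math

# Values taken from the worked example in the text.
A, B = -0.9973, 1.1236            # fitted intercept and slope
S2_YX = 36.2634                   # residual variance s^2_{y|x}
N, X_BAR, SXX = 11, 75.0, 2750.0  # sample size, mean of x, sum of squared x-deviations
T_CRIT = 2.2622                   # table value, 9 d.f., alpha = 0.05

def mean_limits(x_p):
    """Return (y_hat, lower, upper) for the mean of Y at x_p."""
    y_hat = A + B * x_p
    half = T_CRIT * math.sqrt(S2_YX * (1 / N + (x_p - X_BAR) ** 2 / SXX))
    return y_hat, y_hat - half, y_hat + half

for x_p in range(50, 101, 5):
    y_hat, lo, hi = mean_limits(x_p)
    print(f"{x_p:>3}  {y_hat:9.4f}  {lo:9.4f}  {hi:9.4f}")
```

The limits are narrowest at x = 75 (the mean of X) and widen toward both ends of the range.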

The Correlation Coefficient

Example 8.7.1.

Patient number   Method I   Method II
 1               132        130
 2               138        134
 3               144        132
 4               146        140
 5               148        150
 6               152        144
 7               158        150
 8               130        122
 9               162        160
10               168        150
11               172        160
12               174        178
13               180        168
14               180        174
15               188        186
16               194        172
17               194        182
18               200        178
19               200        196
20               204        188
21               210        180
22               210        196
23               216        210
24               220        190
25               220        202

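$$4172 = 25a + 4440b$$
$$757{,}276 = 4440a + 808{,}408b$$

$$a = 20.8928 \qquad b = 0.8220$$

$$\hat{y} = 20.8928 + 0.8220x$$

$$r^2 = \frac{b^2\left[\sum x_i^2 - \left(\sum x_i\right)^2 / n\right]}{\sum y_i^2 - \left(\sum y_i\right)^2 / n}$$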

Method I (x)   Method II (y)   x²       y²       xy
132            130             17424    16900    17160
138            134             19044    17956    18492
144            132             20736    17424    19008
146            140             21316    19600    20440
148            150             21904    22500    22200
152            144             23104    20736    21888
158            150             24964    22500    23700
130            122             16900    14884    15860
162            160             26244    25600    25920
168            150             28224    22500    25200
172            160             29584    25600    27520
174            178             30276    31684    30972
180            168             32400    28224    30240
180            174             32400    30276    31320
188            186             35344    34596    34968
194            172             37636    29584    33368
194            182             37636    33124    35308
200            178             40000    31684    35600
200            196             40000    38416    39200
204            188             41616    35344    38352
210            180             44100    32400    37800
210            196             44100    38416    41160
216            210             46656    44100    45360
220            190             48400    36100    41800
220            202             48400    40804    44440
Total 4440     4172            808408   710952   757276


$$r^2 = \frac{(0.8220)^2\left[808{,}408 - (4440)^2 / 25\right]}{710{,}952 - (4172)^2 / 25} = 0.9112713$$

$$r = \sqrt{r^2} = \sqrt{0.9112713} = 0.954605 \approx 0.95$$

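Alternative approach

$$r = \frac{n\sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{\sqrt{n\sum x_i^2 - \left(\sum x_i\right)^2}\,\sqrt{n\sum y_i^2 - \left(\sum y_i\right)^2}}$$

$$r = \frac{25(757{,}276) - (4440)(4172)}{\sqrt{25(808{,}408) - (4440)^2}\,\sqrt{25(710{,}952) - (4172)^2}} = 0.95$$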
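The computational formula needs only the five column totals. A minimal sketch using the totals from the table for Example 8.7.1; the function name `pearson_r_from_sums` is illustrative.

```python
import math

def pearson_r_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Pearson correlation coefficient from raw sums."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Column totals from the Method I / Method II table.
r = pearson_r_from_sums(n=25, sum_x=4440, sum_y=4172,
                        sum_x2=808408, sum_y2=710952, sum_xy=757276)
print(f"r = {r:.4f}")
```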

Example 8.7.2.

$$H_0: \rho = 0 \qquad H_A: \rho \neq 0$$

$$t = r\sqrt{\frac{n-2}{1-r^2}}$$

$$t = 0.954605\sqrt{\frac{23}{1 - 0.9112713}} = 15.37$$

The table value is 2.8, so we reject the null hypothesis and conclude
that the two variables are correlated.
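The test statistic for H0: ρ = 0 depends only on r and n. A brief sketch with the example's values; `correlation_t` is a hypothetical helper name, and the critical value 2.8 is the table value quoted in the text.

```python
import math

def correlation_t(r, n):
    """t statistic with n - 2 degrees of freedom for H0: rho = 0."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

t_stat = correlation_t(r=0.954605, n=25)
T_CRIT = 2.8  # table value quoted in the text
print(f"t = {t_stat:.2f}, reject H0: {abs(t_stat) > T_CRIT}")
```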

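When ρ is a nonzero value in the hypothesis setting of H0

$$z_r = \frac{1}{2}\ln\frac{1+r}{1-r} \qquad z_\rho = \frac{1}{2}\ln\frac{1+\rho}{1-\rho}$$

Estimated standard deviation of z_r:

$$\sigma_{z_r} = \frac{1}{\sqrt{n-3}}$$

$$Z = \frac{z_r - z_\rho}{1/\sqrt{n-3}}$$

$$H_0: \rho = 0.98 \qquad H_A: \rho \neq 0.98$$

$$r = 0.95: \quad z_r = 1.83178$$
$$\rho = 0.98: \quad z_\rho = 2.29756$$

$$Z = \frac{1.83178 - 2.29756}{1/\sqrt{25-3}} = -2.18$$

Here the table value of z is -1.96, so we reject the null hypothesis and
conclude that the population correlation coefficient is not equal to 0.98.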
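The transformation ½ ln((1 + r)/(1 - r)) is the inverse hyperbolic tangent, so Python's `math.atanh` computes it directly. The sketch below repeats the test of H0: ρ = 0.98; the function name `fisher_z_test` is an assumption for illustration.

```python
import math

def fisher_z_test(r, rho0, n):
    """Return (z_r, z_rho, Z) for testing H0: rho = rho0."""
    z_r = math.atanh(r)        # 0.5 * ln((1 + r) / (1 - r))
    z_rho = math.atanh(rho0)
    z_stat = (z_r - z_rho) * math.sqrt(n - 3)  # divide by 1/sqrt(n - 3)
    return z_r, z_rho, z_stat

z_r, z_rho, z_stat = fisher_z_test(r=0.95, rho0=0.98, n=25)
print(f"z_r = {z_r:.5f}, z_rho = {z_rho:.5f}, Z = {z_stat:.2f}")
```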

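When the sample size is less than 25

$$z^* = z_r - \frac{3z_r + r}{4n}$$

$$\zeta^* = z_\rho - \frac{3z_\rho + \rho}{4n} \qquad (\zeta \text{ is pronounced "zeta"})$$

$$Z^* = \frac{z^* - \zeta^*}{1/\sqrt{n-1}}$$

$$z^* = 1.83178 - \frac{3(1.83178) + 0.95}{4(25)} = 1.76733$$

$$\zeta^* = 2.29756 - \frac{3(2.29756) + 0.98}{4(25)} = 2.21883$$

$$Z^* = (1.76733 - 2.21883)\sqrt{25 - 1} = -2.21$$

So the same conclusion is reached here.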
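The small-sample correction can be sketched the same way. The code follows the formulas as reconstructed from the slides; the helper names `corrected` and `small_sample_z` are hypothetical.

```python
import math

def corrected(z, rho_or_r, n):
    """Apply the small-sample correction z - (3z + r) / (4n)."""
    return z - (3 * z + rho_or_r) / (4 * n)

def small_sample_z(r, rho0, n):
    """Return (z_star, zeta_star, Z_star) for testing H0: rho = rho0."""
    z_star = corrected(math.atanh(r), r, n)
    zeta_star = corrected(math.atanh(rho0), rho0, n)
    z_stat = (z_star - zeta_star) * math.sqrt(n - 1)
    return z_star, zeta_star, z_stat

z_star, zeta_star, z_stat_star = small_sample_z(r=0.95, rho0=0.98, n=25)
print(f"z* = {z_star:.5f}, zeta* = {zeta_star:.5f}, Z* = {z_stat_star:.2f}")
```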

Confidence interval for z_ρ:

$$z_r \pm z\,\frac{1}{\sqrt{n-3}}$$

$$1.83178 \pm 1.96\,\frac{1}{\sqrt{25-3}} = 1.83178 \pm 0.41787$$
$$(1.41391,\ 2.24965)$$
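The limits above are on the transformed z scale. As a sketch that goes one step beyond the slides, they can be mapped back to the correlation scale with the hyperbolic tangent, the inverse of the z transformation. The back-transformation step and the function name `rho_interval` are additions for illustration, not part of the original example.

```python
import math

def rho_interval(r, n, z_crit=1.96):
    """Return the z-scale limits and the back-transformed limits for rho."""
    z_r = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    z_limits = (z_r - half_width, z_r + half_width)
    rho_limits = (math.tanh(z_limits[0]), math.tanh(z_limits[1]))
    return z_limits, rho_limits

z_limits, rho_limits = rho_interval(r=0.95, n=25)
print(f"z scale:   ({z_limits[0]:.5f}, {z_limits[1]:.5f})")
print(f"rho scale: ({rho_limits[0]:.3f}, {rho_limits[1]:.3f})")
```

The upper limit on the ρ scale falls just below 0.98, which is consistent with the rejection of H0: ρ = 0.98.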

The following are the weights (kg) and blood glucose levels (mg/100 mL)
of 16 apparently healthy adult males:

a. Find the simple linear regression equation and plot a scatter graph of
the data.
b. Test H0: β = 0 using the t test.
c. What is the predicted glucose level for a man who weighs 95 kg?
Let α = 0.05 for all the tests.
Sample   Weight (kg)   Glucose (mg/100 mL)
 1       62             82
 2       75             95
 3       73             91
 4       82            102
 5       90            108
 6       96            114
 7       59             79
 8       93            110
 9       82            101
10       79             98
11       77             97
12       81            100
13       87            106
14       69             87
15       65             85
16       78             97
