100310PLG500 L11-Simple Linear Regression

PLG 500 STATISTICAL REASONING IN EDUCATION
Lecture 11: Simple Linear Regression
Contents
1) Simple Linear Regression
2) Obtaining the Regression Line Using SPSS
3) Obtaining the Regression Equation Using SPSS
4) Errors in Prediction
5) Standard Error of Estimate
6) Obtaining the Standard Error of Estimate Using SPSS
7) Testing the Regression Coefficient for Statistical Significance
1. Simple Linear Regression
Simple linear regression is the process of predicting or estimating scores on one variable (Y) from knowledge of scores on another variable (X), provided that Y and X are correlated.
Y: the dependent, target or criterion variable
X: the independent, regressor or predictor variable
Example 1: Predicting scores of Y from scores of X when the correlation between Y and X is perfect (r = 1)
Suppose you are interested in predicting scores on Y, based on knowledge of scores on X, using the following hypothetical data:
X:   2   4   6   8  10  12  14
Y:   3   4   5   6   7   8   9
a) Predict the score of Y when X = 16 (Y = 10)
b) Predict the score of Y when X = 125.5 (Y = ?)
Predicting the score of Y when X = 125.5
1. Plot a scatterplot
2. Draw a straight line that best fits the data
3. Determine the equation of the straight line
4. Use the equation of the straight line to predict the score of Y when X = 125.5
Step 1: Plot a scatterplot

[Figure: scatterplot of Y against X for the Example 1 data]
Step 2: Draw a straight line that best fits the data
Step 3: Determine the equation of the straight line
The equation of a straight line: Y = mX + c
where m = gradient (slope) of the straight line:
$m = \dfrac{\text{vertical distance}}{\text{horizontal distance}}$
and c = Y-intercept, that is, the value of Y where the straight line intercepts the Y-axis
m = ?, c = ?
For the Example 1 data:
c = Y-intercept = 2
$m = \dfrac{\text{vertical distance}}{\text{horizontal distance}} = \dfrac{1}{2} = 0.5$
Y = mX + c
Y = 0.5X + 2
m = 0.5 indicates that an increase of 0.5 units in Y is associated with an increase of 1 unit in X
Step 4: Use the equation of the straight line to predict the score of Y when X = 125.5
The equation of the straight line: Y = 0.5X + 2
When X = 125.5, Y = 0.5(125.5) + 2 = 64.75
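The four steps of Example 1 can be retraced in a few lines of Python. This is only a sketch: the names `xs`, `ys`, `m`, `c` and `predict` are chosen here for illustration and do not come from the lecture, and taking the gradient from the first two points works only because the correlation is perfect.

```python
# Example 1 data: the points lie exactly on one straight line (r = 1).
xs = [2, 4, 6, 8, 10, 12, 14]
ys = [3, 4, 5, 6, 7, 8, 9]

# Gradient from two points: vertical distance / horizontal distance.
m = (ys[1] - ys[0]) / (xs[1] - xs[0])   # (4 - 3) / (4 - 2) = 0.5
# Y-intercept: rearrange Y = mX + c using the first point.
c = ys[0] - m * xs[0]                   # 3 - 0.5 * 2 = 2.0


def predict(x):
    """Predicted Y for a given X on the line Y = mX + c."""
    return m * x + c


# Every observed point satisfies the equation because r = 1.
assert all(predict(x) == y for x, y in zip(xs, ys))

print(predict(16))     # 10.0
print(predict(125.5))  # 64.75
```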
Example 2: Predicting scores of Y from scores of X when the correlation between Y and X is not perfect (r ≠ 1)
Suppose you are interested in predicting students' scores on creativity (Y), based on knowledge of their scores on logical reasoning (X), using the following hypothetical data for 20 students:
X:  15  10   7  18   5  10   7  17  15   9   8  15  11  17   8  11  12  13  18   7
Y:  12  13   9  18   7   9  14  16  10  12   7  13  14  19  10  16  12  16  19  11
Predict the creativity score for a student with a logical reasoning score of 25. X = 25, Y = ?
Predicting the creativity score for a student with a logical reasoning score of 25
1. Plot a scatterplot
2. Draw a straight line that best fits the data
3. Determine the equation of the straight line
4. Use the equation of the straight line to predict the creativity score for a student with a logical reasoning score of 25
Step 1: Plot a scatterplot

[Figure: scatterplot of Creativity (Y) against Logical reasoning (X) for the 20 students]
Step 2: Draw a straight line that best fits the data (the line of best fit)
Which is the line of best fit? The method of least squares
[Figure: scatterplot of Creativity (Y) against Logical reasoning (X) with candidate straight lines drawn through the points]
The method of least squares
The method of least squares fits the straight line in such a way that the sum of squares of the differences between the actual values of Y and the predicted values of Y (Y') is a minimum, that is,
$\sum (Y - Y')^2$ is a minimum
The line of best fit is called the regression line.
[Figure: scatterplot of Creativity (Y) against Logical Reasoning (X) with the regression line Y' = 0.65X + 5.28 (regression equation). For each point, Y - Y' is the vertical distance between Y (actual value of Y) and Y' (predicted value of Y) on the line. For the two students with X = 17, Y' = 16.33: 19 - 16.33 = 2.67 and 16 - 16.33 = -0.33]
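To make the least-squares criterion concrete, the sketch below compares the sum of squared differences for the lecture's regression line with that of a few other straight lines. The alternative lines in `candidates` are hypothetical choices made here purely for comparison; only the data and the line Y' = 0.65X + 5.28 come from the lecture.

```python
X = [15, 10, 7, 18, 5, 10, 7, 17, 15, 9, 8, 15, 11, 17, 8, 11, 12, 13, 18, 7]
Y = [12, 13, 9, 18, 7, 9, 14, 16, 10, 12, 7, 13, 14, 19, 10, 16, 12, 16, 19, 11]


def sum_squared_errors(b, a):
    """Sum of (Y - Y')^2 for the straight line Y' = bX + a."""
    return sum((y - (b * x + a)) ** 2 for x, y in zip(X, Y))


regression_line = (0.65, 5.28)  # slope and intercept from the lecture
candidates = {                  # hypothetical lines, for comparison only
    "flatter line": (0.50, 7.00),
    "steeper line": (0.80, 3.50),
    "horizontal line at the mean of Y": (0.00, 12.85),
}

best = sum_squared_errors(*regression_line)
print(f"regression line: {best:.2f}")  # 116.59
for name, (b, a) in candidates.items():
    print(f"{name}: {sum_squared_errors(b, a):.2f}")
```

Each alternative gives a larger sum of squares than the regression line, which is what "is a minimum" means in practice.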
How to draw the regression line?
The regression line is drawn so that $\sum (Y - Y') = 0$ and $\sum (Y - Y')^2$ is a minimum, where Y is the actual value of Y and Y' is the predicted value of Y.
For the first student: Y' = 0.65(15) + 5.28 = 15.03 and Y - Y' = 12 - 15.03 = -3.03

  X     Y     Y' = 0.65X + 5.28     Y - Y'     (Y - Y')^2
 15    12          15.03            -3.03         9.18
 10    13          11.78             1.22         1.49
  7     9           9.83            -0.83         0.69
 18    18          16.98             1.02         1.04
  5     7           8.53            -1.53         2.34
 10     9          11.78            -2.78         7.73
  7    14           9.83             4.17        17.39
 17    16          16.33            -0.33         0.11
 15    10          15.03            -5.03        25.30
  9    12          11.13             0.87         0.76
  8     7          10.48            -3.48        12.11
 15    13          15.03            -2.03         4.12
 11    14          12.43             1.57         2.46
 17    19          16.33             2.67         7.13
  8    10          10.48            -0.48         0.23
 11    16          12.43             3.57        12.74
 12    12          13.08            -1.08         1.17
 13    16          13.73             2.27         5.15
 18    19          16.98             2.02         4.08
  7    11           9.83             1.17         1.37

$\sum X = 233$, $\sum Y = 257$, $\sum (Y - Y') = 0.00$*, $\sum (Y - Y')^2 = 116.59$
* The tabled values give $\sum (Y - Y') = -0.05$, which does not equal zero because of rounding errors (Hinkle et al., 2003).
Step 3: Determine the equation of the regression line
To draw the regression line, we need to determine its equation, which is called the regression equation. The regression equation is defined as follows:
Y' = bX + a
where
Y' = predicted score of Y
b = gradient of the regression line (regression coefficient)
X = the score used to predict the score of Y
a = Y-intercept (regression constant)
Regression coefficient, b
The value of b, which is the gradient of the regression line, is called the regression coefficient.
The regression coefficient shows the amount of change in Y that is associated with a unit change in X.
The formula for finding b is as follows:
$b = \dfrac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2}$
Regression constant, a
The value of a, which is the Y-intercept of the regression line, is called the regression constant.
The regression constant shows the value of Y where the regression line intercepts the Y-axis, that is, the value of Y when X equals 0.
The formula for finding a is as follows:
$a = \dfrac{\sum Y - b\sum X}{n}$ or $a = \bar{Y} - b\bar{X}$
  X     Y     XY     X^2
 15    12    180    225
 10    13    130    100
  7     9     63     49
 18    18    324    324
  5     7     35     25
 10     9     90    100
  7    14     98     49
 17    16    272    289
 15    10    150    225
  9    12    108     81
  8     7     56     64
 15    13    195    225
 11    14    154    121
 17    19    323    289
  8    10     80     64
 11    16    176    121
 12    12    144    144
 13    16    208    169
 18    19    342    324
  7    11     77     49

n = 20, $\sum X = 233$, $\sum Y = 257$, $\sum XY = 3{,}205$, $\sum X^2 = 3{,}037$
Computing the value of the regression coefficient, b
n = 20, $\sum X = 233$, $\sum Y = 257$, $\sum XY = 3{,}205$, $\sum X^2 = 3{,}037$
$b = \dfrac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2} = \dfrac{20(3{,}205) - (233)(257)}{20(3{,}037) - 233^2} = 0.65$
The positive value of b, 0.65, shows that a 0.65-unit increase in Y is associated with a 1-unit increase in X
Computing the value of the regression constant, a
n = 20, $\sum X = 233$, $\sum Y = 257$
$a = \dfrac{\sum Y - b\sum X}{n} = \dfrac{257 - (0.65)(233)}{20} = 5.28$
The positive value of a, 5.28, shows that the regression line intercepts the Y-axis at 5.28
Determining the regression equation for Example 2:
Therefore, the regression equation of the regression line is:
Y' = bX + a
Y' = 0.65X + 5.28
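As a check on the hand calculation, the sums and the two formulas can be evaluated directly from the raw scores. The sketch below assumes nothing beyond the data and the formulas for b and a given in the lecture; the variable names are illustrative.

```python
X = [15, 10, 7, 18, 5, 10, 7, 17, 15, 9, 8, 15, 11, 17, 8, 11, 12, 13, 18, 7]
Y = [12, 13, 9, 18, 7, 9, 14, 16, 10, 12, 7, 13, 14, 19, 10, 16, 12, 16, 19, 11]

n = len(X)
sum_x = sum(X)
sum_y = sum(Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
sum_x2 = sum(x * x for x in X)

# b = (n*SumXY - SumX*SumY) / (n*SumX^2 - (SumX)^2)
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
# a = (SumY - b*SumX) / n
a = (sum_y - b * sum_x) / n

print(n, sum_x, sum_y, sum_xy, sum_x2)  # 20 233 257 3205 3037
print(f"b = {b:.3f}, a = {a:.3f}")      # b = 0.654, a = 5.231
```

Carrying b at full precision gives a = 5.231. The hand calculation rounds b to 0.65 before substituting it into the formula for a, which is why it arrives at 5.28.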
The regression equation for Example 2
[Figure: scatterplot of Creativity (Y) against Logical Reasoning (X) with the regression line Y' = 0.65X + 5.28 (regression equation), marking the Y-intercept a = 5.28 and the gradient b = 0.65 (a 0.65-unit increase in Y for a 1-unit increase in X)]
Step 4: Use the regression equation to predict the creativity score (Y) of a student when his or her logical reasoning score (X) is 25
The regression equation of the regression line: Y' = 0.65X + 5.28
When X = 25, Y' = 0.65(25) + 5.28 = 21.53
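The prediction step is a single substitution into the regression equation. A minimal sketch follows, with the hypothetical helper `predict_creativity` defaulting to the rounded coefficients used above; the second call uses three-decimal coefficients (0.654 and 5.231) to show how much the rounding matters.

```python
def predict_creativity(x, b=0.65, a=5.28):
    """Predicted creativity score Y' = bX + a for a logical reasoning score x."""
    return b * x + a


print(f"{predict_creativity(25):.2f}")                    # 21.53
print(f"{predict_creativity(25, b=0.654, a=5.231):.2f}")  # 21.58
```

Note that X = 25 is larger than any logical reasoning score in the data (the largest is 18), so this prediction extends the line beyond the observed scores.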
2. Obtaining the Regression Line Using SPSS
i. Obtaining the scatterplot using SPSS
ii. Obtaining the regression line using SPSS
i. Obtaining the scatterplot using SPSS
1. Create a file for the data set
2. Click Graphs > Scatter
3. Click Simple and then click Define
4. Click on the Y variable and click the arrow button to place it in the Y Axis box
5. Click on the X variable and click the arrow button to place it in the X Axis box
6. Click OK
ii. Obtaining the regression line using SPSS
1. Double-click on the scatterplot
2. Click on any point in the scatterplot
3. Click Chart > Add Chart Element > Fit Line at Total
4. Click Linear > Apply > Close
5. Exit Chart Editor
SPSS scatterplot with regression line
3. Obtaining the Regression Equation Using SPSS

1. Create a file for the data set
2. Click Analyze > Regression > Linear
3. Click on the Y variable and click the arrow button to place it in the Dependent: box
4. Click on the X variable and click the arrow button to place it in the Independent(s): box
5. Click OK

SPSS output: Coefficients(a)

                          Unstandardized Coefficients    Standardized Coefficients
Model                     B            Std. Error        Beta                         t        Sig.
(Constant)                5.231 (a)    1.746                                          2.996    .008
Logical Reasoning (X)      .654 (b)     .142              .736                        4.615    .000

a Dependent Variable: Creativity (Y)
Y' = bX + a
Y' = 0.654X + 5.231
4. Errors in Prediction
Errors in prediction are the differences between the actual scores of Y and the predicted scores of Y (Y').
The formula for the calculation of the error in prediction (e) is as follows:
e = Y - Y'
[Figure: scatterplot of Creativity (Y) against Logical Reasoning (X) with the regression line Y' = 0.65X + 5.28 (regression equation). The error e = Y - Y' is the vertical distance between Y (actual value of Y) and Y' (predicted value of Y); for example, 19 - 16.33 = 2.67 and 16 - 16.33 = -0.33]
For the first student: Y' = 0.65(15) + 5.28 = 15.03 and e = 12 - 15.03 = -3.03

  X     Y     Y' = 0.65X + 5.28     e = Y - Y'     e^2 = (Y - Y')^2
 15    12          15.03              -3.03              9.18
 10    13          11.78               1.22              1.49
  7     9           9.83              -0.83              0.69
 18    18          16.98               1.02              1.04
  5     7           8.53              -1.53              2.34
 10     9          11.78              -2.78              7.73
  7    14           9.83               4.17             17.39
 17    16          16.33              -0.33              0.11
 15    10          15.03              -5.03             25.30
  9    12          11.13               0.87              0.76
  8     7          10.48              -3.48             12.11
 15    13          15.03              -2.03              4.12
 11    14          12.43               1.57              2.46
 17    19          16.33               2.67              7.13
  8    10          10.48              -0.48              0.23
 11    16          12.43               3.57             12.74
 12    12          13.08              -1.08              1.17
 13    16          13.73               2.27              5.15
 18    19          16.98               2.02              4.08
  7    11           9.83               1.17              1.37

$\sum X = 233$, $\sum Y = 257$, $\sum e = 0.00$*, $\sum e^2 = 116.59$
* The tabled values give $\sum e = -0.05$, which does not equal zero because of rounding errors (Hinkle et al., 2003).
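The table of errors can be regenerated from the data and the rounded regression equation. The following sketch prints one row per student and the two column totals; the names `predicted` and `errors` are illustrative.

```python
X = [15, 10, 7, 18, 5, 10, 7, 17, 15, 9, 8, 15, 11, 17, 8, 11, 12, 13, 18, 7]
Y = [12, 13, 9, 18, 7, 9, 14, 16, 10, 12, 7, 13, 14, 19, 10, 16, 12, 16, 19, 11]
b, a = 0.65, 5.28  # rounded coefficients used in the lecture's table

predicted = [b * x + a for x in X]                       # Y'
errors = [y - y_hat for y, y_hat in zip(Y, predicted)]   # e = Y - Y'

print("  X   Y     Y'      e    e^2")
for x, y, y_hat, e in zip(X, Y, predicted, errors):
    print(f"{x:3d} {y:3d} {y_hat:6.2f} {e:6.2f} {e * e:6.2f}")

sum_e = sum(errors)
sum_e2 = sum(e * e for e in errors)
print(f"sum of e   = {sum_e:.2f}")   # -0.05
print(f"sum of e^2 = {sum_e2:.2f}")  # 116.59
```

The errors sum to about -0.05 rather than exactly zero only because b and a were rounded to two decimal places before the predictions were computed.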
5. Standard Error of Estimate

The standard deviation of the distribution of errors in prediction is called the standard error of estimate.
The standard error of estimate is an overall measure of the extent to which the predicted Y values deviate from the actual Y values, and is represented by $s_e$.
The formula for the calculation of the standard error of estimate ($s_e$) is as follows:
$s_e = \sqrt{\dfrac{\sum (Y - Y')^2}{n - 2}}$ or $s_e = \sqrt{\dfrac{\sum e^2}{n - 2}}$
The values of Y', Y - Y' and (Y - Y')^2 for the 20 students are those in the table of errors in prediction (Section 4), which gives $\sum (Y - Y')^2 = 116.59$ with n = 20.
The standard error of estimate for the creativity score is:
$s_e = \sqrt{\dfrac{\sum (Y - Y')^2}{n - 2}} = \sqrt{\dfrac{116.59}{20 - 2}} = 2.55$
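The final step is a single square root, sketched below with the hypothetical helper `standard_error_of_estimate`, which takes the sum of squared errors and the number of cases.

```python
import math


def standard_error_of_estimate(sum_sq_errors, n):
    """s_e = sqrt(sum of squared errors / (n - 2))."""
    return math.sqrt(sum_sq_errors / (n - 2))


s_e = standard_error_of_estimate(116.59, 20)
print(f"{s_e:.3f}")  # 2.545
print(f"{s_e:.2f}")  # 2.55
```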
The stronger the correlation between Y and X, the smaller the standard error of estimate and the greater the accuracy of prediction.
In the extreme case of a perfect correlation (r = 1), the standard error of estimate is zero ($s_e$ = 0) and the prediction is 100% accurate.
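The two examples from Section 1 illustrate this relationship. The sketch below fits each data set with the lecture's formulas for b and a and then computes the standard error of estimate; `fit_and_se` is a hypothetical helper written for this comparison.

```python
import math


def fit_and_se(X, Y):
    """Return (b, a, s_e) for the least-squares line Y' = bX + a."""
    n = len(X)
    sum_x, sum_y = sum(X), sum(Y)
    sum_xy = sum(x * y for x, y in zip(X, Y))
    sum_x2 = sum(x * x for x in X)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    sse = sum((y - (b * x + a)) ** 2 for x, y in zip(X, Y))
    return b, a, math.sqrt(sse / (n - 2))


# Example 1: perfect correlation (r = 1).
b1, a1, se1 = fit_and_se([2, 4, 6, 8, 10, 12, 14], [3, 4, 5, 6, 7, 8, 9])
# Example 2: imperfect correlation.
b2, a2, se2 = fit_and_se(
    [15, 10, 7, 18, 5, 10, 7, 17, 15, 9, 8, 15, 11, 17, 8, 11, 12, 13, 18, 7],
    [12, 13, 9, 18, 7, 9, 14, 16, 10, 12, 7, 13, 14, 19, 10, 16, 12, 16, 19, 11],
)

print(f"Example 1: b = {b1:.2f}, a = {a1:.2f}, s_e = {se1:.3f}")  # s_e = 0.000
print(f"Example 2: b = {b2:.3f}, a = {a2:.3f}, s_e = {se2:.3f}")  # s_e = 2.545
```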
6. Obtaining the standard error of estimate using SPSS

SPSS output: Model Summary

R          R Square    Adjusted R Square    Std. Error of the Estimate
.736(a)    .542        .517                 2.545

a Predictors: (Constant), X, Logical Reasoning (Penakulan Logik)
$s_e$ = 2.545
7. Testing the regression coefficient for statistical significance
The purpose of the test is to determine whether the predictor variable (X) is a statistically significant predictor of the criterion variable (Y), that is, whether knowledge of scores on the X variable will enhance the prediction of scores on the Y variable.
Assumptions underlying the significance test for the regression coefficient:
1. The scores for each variable are normally distributed
2. The cases represent a random sample from the population
3. Both variables are independent
Steps for the significance test:
1. State the null and alternative hypotheses
2. Set the criterion for rejecting the null hypothesis
3. Carry out the analysis using SPSS
4. Make a decision by applying the criterion for rejecting the null hypothesis
5. Make a conclusion in the context of the problem
Example 1: Suppose you are interested in predicting students' scores on creativity (Y), based on knowledge of their scores on logical reasoning (X), using the following hypothetical data for 20 students. Test whether logical reasoning is a statistically significant predictor of creativity at the 0.01 level of significance.
X:  15  10   7  18   5  10   7  17  15   9   8  15  11  17   8  11  12  13  18   7
Y:  12  13   9  18   7   9  14  16  10  12   7  13  14  19  10  16  12  16  19  11
Step 1: State the null and alternative hypotheses
$H_0: \beta = 0$
(Logical reasoning is not a statistically significant predictor of creativity in the population)
$H_1: \beta \neq 0$
(Logical reasoning is a statistically significant predictor of creativity in the population)
$\beta$ (beta) is the population regression coefficient
Step 2: Set the criterion for rejecting the null hypothesis
Reject Ho if p < 0.01
p < 0.01 (The probability of committing a Type I error, that is, the likelihood of rejecting the null hypothesis when it is true, is less than 0.01)
0.01 is the level of significance ($\alpha$)
Step 3: Carry out the analysis using SPSS

SPSS output: Coefficients(a)

                          Unstandardized Coefficients    Standardized Coefficients
Model                     B            Std. Error        Beta                         t        Sig.
(Constant)                5.231 (a)    1.746                                          2.996    .008
Logical Reasoning, X       .654 (b)     .142              .736                        4.615    .000

a Dependent Variable: Creativity, Y
p = .000 (< 0.01)
Y' = bX + a
Y' = 0.654X + 5.231
Step 4: Make a decision by applying the criterion for rejecting the null hypothesis

From the SPSS output, p = .000, which is how SPSS displays a value below .001 (The probability of committing a Type I error, that is, the likelihood of rejecting the null hypothesis when it is true, is less than 0.001)
Therefore, reject Ho because p < 0.01
Step 5: Make a conclusion in the context of the problem
Logical reasoning is a statistically significant predictor of creativity in the population, t(18) = 4.615, p < .01 (df = N - 2 = 18, because two values, a and b, are estimated from the data)
The regression equation is as follows:
Y' = 0.654X + 5.231
Predicted creativity = 0.654 (logical reasoning) + 5.231
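The t value and the standard error of b in the SPSS output can be reproduced by hand. The sketch below uses the standard textbook formula for the standard error of the slope, the standard error of estimate divided by the square root of the sum of squared deviations of X from its mean; that formula is not shown in the lecture and is brought in here as an assumption. The p value itself needs the t distribution, which the Python standard library does not provide, so it is left to SPSS.

```python
import math

X = [15, 10, 7, 18, 5, 10, 7, 17, 15, 9, 8, 15, 11, 17, 8, 11, 12, 13, 18, 7]
Y = [12, 13, 9, 18, 7, 9, 14, 16, 10, 12, 7, 13, 14, 19, 10, 16, 12, 16, 19, 11]

n = len(X)
mean_x = sum(X) / n
mean_y = sum(Y) / n
s_xx = sum((x - mean_x) ** 2 for x in X)                       # sum of (X - mean)^2
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))

b = s_xy / s_xx            # same value as the lecture's formula for b
a = mean_y - b * mean_x    # a = mean of Y - b * mean of X

sse = sum((y - (b * x + a)) ** 2 for x, y in zip(X, Y))
s_e = math.sqrt(sse / (n - 2))      # standard error of estimate
se_b = s_e / math.sqrt(s_xx)        # standard error of b (textbook formula)
t = b / se_b                        # test statistic for H0: beta = 0

print(f"b = {b:.3f}, SE(b) = {se_b:.3f}, t = {t:.3f}")
# b = 0.654, SE(b) = 0.142, t = 4.615
```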
Coefficient of determination, $r^2$
SPSS output: Model Summary

R          R Square    Adjusted R Square    Std. Error of the Estimate
.736(a)    .542        .517                 2.545

a Predictors: (Constant), X, Logical Reasoning (Penakulan Logik)
$r^2$ = 0.542
54.2% of the variance in creativity scores can be associated with (explained by) the variance in logical reasoning scores [or, put the other way, only 45.8% of the variance in creativity scores cannot be associated with (explained by) the variance in logical reasoning scores]
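The R and R Square values can be checked against the raw scores. The sketch below uses the usual computational formula for the Pearson correlation coefficient, which is assumed here because it is not restated in this lecture, and then squares it.

```python
import math

X = [15, 10, 7, 18, 5, 10, 7, 17, 15, 9, 8, 15, 11, 17, 8, 11, 12, 13, 18, 7]
Y = [12, 13, 9, 18, 7, 9, 14, 16, 10, 12, 7, 13, 14, 19, 10, 16, 12, 16, 19, 11]

n = len(X)
sum_x, sum_y = sum(X), sum(Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
sum_x2 = sum(x * x for x in X)
sum_y2 = sum(y * y for y in Y)

# Pearson r from raw-score sums.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator
r2 = r ** 2

print(f"r = {r:.3f}, r^2 = {r2:.3f}")             # r = 0.736, r^2 = 0.542
print(f"explained:   {r2:.1%} of the variance")      # 54.2%
print(f"unexplained: {1 - r2:.1%} of the variance")  # 45.8%
```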
Example 2:
The following table shows the order in which 14 students turn in their test papers and their test scores. Test whether the order in which students turn in their test papers is a statistically significant predictor of their test scores.
Order in which students turn in their test papers (X):   1   2   3   4   5   6   7   8   9  10  11  12  13  14
Test score (Y):                                          80  65  88  63  75  65  60  85  87  75  76  72  85  60
Step 1: State the null and alternative hypotheses
$H_0: \beta = 0$
(The order in which students turn in their test papers is not a statistically significant predictor of their test scores in the population)
$H_1: \beta \neq 0$
(The order in which students turn in their test papers is a statistically significant predictor of their test scores in the population)
$\beta$ (beta) is the population regression coefficient
Step 2: Set the criterion for rejecting the null hypothesis
Reject Ho if p < 0.05
p < 0.05 (The probability of committing a Type I error, that is, the likelihood of rejecting the null hypothesis when it is true, is less than 0.05)
0.05 is the level of significance ($\alpha$)
Step 3: Carry out the analysis using SPSS

SPSS output: Coefficients(a)

                                                             Unstandardized Coefficients    Standardized Coefficients
Model                                                        B          Std. Error          Beta                         t         Sig.
(Constant)                                                   74.033     5.939                                            12.466    .000
The order in which students turn in their test papers, X     -.004       .697               -.002                         -.006    .995

a Dependent Variable: Test score, Y
p = .995 (> 0.05)
Step 4: Make a decision by applying the criterion for rejecting the null hypothesis

From the SPSS output, p = 0.995 (The probability of committing a Type I error, that is, the likelihood of rejecting the null hypothesis when it is true, is 0.995)
Therefore, fail to reject Ho because p > 0.05
Step 5: Make a conclusion in the context of the problem
The order in which students turn in their test papers is not a statistically significant predictor of their test scores in the population, t(12) = -.006, p > .05 (df = N - 2 = 12). That is, knowledge of the order in which students turn in their test papers does not enhance the prediction of their test scores.
Coefficient of determination, $r^2$
SPSS output: Model Summary

R          R Square    Adjusted R Square    Std. Error of the Estimate
.002(a)    .000        -.083                10.520

a Predictors: (Constant), The order in which students turn in their test papers, X
$r^2$ = 0.000
0% of the variance in the test scores can be associated with (explained by) the variance in the order in which students turn in their test papers [or, put the other way, 100% of the variance in the test scores cannot be associated with (explained by) the variance in the order in which students turn in their test papers]
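For comparison with the SPSS output, the whole of Example 2 can be worked through in one short script. This is a sketch under two assumptions: it uses the deviation-score form of the formula for b, which is algebraically the same as the lecture's formula, and the textbook formula for the standard error of b, which the lecture does not show.

```python
import math

order = list(range(1, 15))  # order in which papers were turned in (X)
score = [80, 65, 88, 63, 75, 65, 60, 85, 87, 75, 76, 72, 85, 60]  # test score (Y)

n = len(order)
mean_x = sum(order) / n
mean_y = sum(score) / n
s_xx = sum((x - mean_x) ** 2 for x in order)
s_yy = sum((y - mean_y) ** 2 for y in score)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(order, score))

b = s_xy / s_xx                      # regression coefficient
a = mean_y - b * mean_x              # regression constant
sse = sum((y - (b * x + a)) ** 2 for x, y in zip(order, score))
s_e = math.sqrt(sse / (n - 2))       # standard error of estimate
t = b / (s_e / math.sqrt(s_xx))      # t statistic for H0: beta = 0
r2 = s_xy ** 2 / (s_xx * s_yy)       # coefficient of determination

print(f"a = {a:.3f}, b = {b:.3f}")      # a = 74.033, b = -0.004
print(f"s_e = {s_e:.3f}, t = {t:.3f}")  # s_e = 10.520, t = -0.006
print(f"r^2 = {r2:.3f}")                # r^2 = 0.000
```

A slope this close to zero means the regression line is almost horizontal at the mean test score of 74, so knowing the order adds essentially nothing to the prediction.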
Thank you for your attention
