
PLG 500 STATISTICAL REASONING IN EDUCATION

Lecture 11: Simple Linear Regression

Contents
1) Simple Linear Regression
2) Obtaining the Regression Line Using SPSS
3) Obtaining the Regression Equation Using SPSS
4) Errors in Prediction
5) Standard Error of Estimate
6) Obtaining the Standard Error of Estimate Using SPSS
7) Testing the Regression Coefficient for Statistical Significance

1. Simple Linear Regression

Simple linear regression is the process of predicting or estimating scores on one variable (Y) based on knowledge of scores on another variable (X), when Y and X are correlated.

Y: the dependent, target or criterion variable
X: the independent, regressor or predictor variable

Example 1: Predicting scores of Y from scores of X when the correlation between Y and X is perfect (r = 1)

Suppose you are interested in predicting scores on Y, based on knowledge of scores on X, using the following hypothetical data:
X: 2 4 6 8 10 12 14
Y: 3 4 5 6 7 8 9

a) Predict the score of Y when X = 16: Y = 10
b) Predict the score of Y when X = 125.5: Y = ?

Predicting the score of Y when X = 125.5


1. Plot a scatterplot.
2. Draw a straight line that best fits the data.
3. Determine the equation of the straight line.
4. Use the equation of the straight line to predict the score of Y when X = 125.5.

Step 1: Plot a scatterplot


[Figure: scatterplot of Y against X for the Example 1 data]

Step 2: Draw a straight line that best fits the data

Step 3: Determine the equation of the straight line


The equation of a straight line is $Y = mX + c$, where
$m$ = gradient (slope) of the straight line $= \dfrac{\text{vertical distance}}{\text{horizontal distance}}$, and
$c$ = Y-intercept, that is, the value of Y where the straight line intercepts the Y-axis.
What are the values of $m$ and $c$?

$c$ = Y-intercept = 2

$m = \dfrac{\text{vertical distance}}{\text{horizontal distance}} = \dfrac{1}{2} = 0.5$

Therefore, the equation of the straight line is $Y = 0.5X + 2$.

$m = 0.5$ indicates that an increase of 0.5 units in Y is associated with an increase of 1 unit in X.

Step 4: Use the equation of the straight line to predict the score of Y when X = 125.5

The equation of the straight line is $Y = 0.5X + 2$. When X = 125.5, $Y = 0.5(125.5) + 2 = 64.75$.
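As a cross-check, the minimal Python sketch below recovers the same line from the Example 1 data. It assumes the line is found by least squares (the method introduced later for Example 2) rather than read off the graph; `fit_line` and `predict` are illustrative helper names, not part of the lecture.

```python
# Example 1 data (perfect correlation, r = 1)
X = [2, 4, 6, 8, 10, 12, 14]
Y = [3, 4, 5, 6, 7, 8, 9]


def fit_line(xs, ys):
    """Return (m, c) for the least-squares straight line Y = mX + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products, and sum of squares of X, about the means
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    m = sp / ss_x
    c = mean_y - m * mean_x
    return m, c


def predict(x, m, c):
    """Value of Y on the line Y = mX + c at the given X."""
    return m * x + c


m, c = fit_line(X, Y)
print(m, c)                   # 0.5 2.0
print(predict(16, m, c))      # 10.0
print(predict(125.5, m, c))   # 64.75
```

Because r = 1 here, every data point lies exactly on the line, so the graphical answer and the least-squares answer coincide.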

Example 2: Predicting scores of Y from scores of X when the correlation between Y and X is not perfect (r ≠ 1)

Suppose you are interested in predicting students' scores on creativity (Y) based on knowledge of their scores on logical reasoning (X), using the following hypothetical data for 20 students:

Logical reasoning (X): 15 10 7 18 5 10 7 17 15 9 8 15 11 17 8 11 12 13 18 7
Creativity (Y): 12 13 9 18 7 9 14 16 10 12 7 13 14 19 10 16 12 16 19 11

Predict the creativity score for a student with a logical reasoning score of 25. X = 25, Y = ?

Predicting the creativity score for a student with a logical reasoning score of 25
1. Plot a scatterplot.
2. Draw a straight line that best fits the data.
3. Determine the equation of the straight line.
4. Use the equation of the straight line to predict the creativity score for a student with a logical reasoning score of 25.
X = 25, Y = ?

Step 1: Plot a scatterplot


[Figure: scatterplot of creativity (Y) against logical reasoning (X) for the 20 students]

Step 2: Draw a straight line that best fits the data (the line of best fit)

Which line is the line of best fit? The one given by the method of least squares.

[Figure: scatterplot of creativity (Y) against logical reasoning (X)]

The method of least squares

The method of least squares fits the straight line in such a way that the sum of the squared differences between the actual value of Y and the predicted value of Y (Y') is a minimum, that is,

$$\sum (Y - Y')^2 \text{ is a minimum}$$
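The least-squares criterion can be made concrete with a short Python sketch: it defines a hypothetical helper `sse` that evaluates $\sum (Y - Y')^2$ for any candidate line $Y' = bX + a$, computes the least-squares line from the Example 2 data, and compares it with a few other lines chosen arbitrarily for illustration.

```python
# Example 2 data: logical reasoning (X) and creativity (Y) for 20 students
X = [15, 10, 7, 18, 5, 10, 7, 17, 15, 9, 8, 15, 11, 17, 8, 11, 12, 13, 18, 7]
Y = [12, 13, 9, 18, 7, 9, 14, 16, 10, 12, 7, 13, 14, 19, 10, 16, 12, 16, 19, 11]


def sse(b, a, xs, ys):
    """Sum of (Y - Y')^2 for the straight line Y' = bX + a."""
    return sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))


# Least-squares coefficients, kept unrounded
n = len(X)
mean_x, mean_y = sum(X) / n, sum(Y) / n
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
ss_x = sum((x - mean_x) ** 2 for x in X)
b_ls = sp_xy / ss_x
a_ls = mean_y - b_ls * mean_x
best = sse(b_ls, a_ls, X, Y)
print("least-squares line:", round(best, 2))

# Some other (arbitrarily chosen) lines for comparison
for b, a in [(0.70, 5.28), (0.65, 6.00), (0.50, 7.00)]:
    print(f"Y' = {b}X + {a}:", round(sse(b, a, X, Y), 2))
```

Every alternative line gives a larger sum of squared differences; no choice of b and a can go below the least-squares value.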

[Figure: scatterplot of creativity (Y) against logical reasoning (X) with the line of best fit]

The line of best fit is called the regression line.

[Figure: the regression line $Y' = 0.65X + 5.28$ (regression equation) on the scatterplot; for each student, the vertical distance $Y - Y'$ is the difference between the actual value of Y and the predicted value of Y, e.g. 19 - 16.33 = 2.67 and 16 - 16.33 = -0.33 for the two students with X = 17]

How to draw the regression line?

The table below lists, for every student, the actual value of Y, the predicted value of Y (Y'), and the difference between them. For the regression line, the differences $Y - Y'$ sum to zero and $\sum (Y - Y')^2$ is a minimum.

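| X | Y | Y' = 0.65X + 5.28 | Y - Y' | (Y - Y')² |
|---|---|---|---|---|
| 15 | 12 | 15.03 | -3.03 | 9.18 |
| 10 | 13 | 11.78 | 1.22 | 1.49 |
| 7 | 9 | 9.83 | -0.83 | 0.69 |
| 18 | 18 | 16.98 | 1.02 | 1.04 |
| 5 | 7 | 8.53 | -1.53 | 2.34 |
| 10 | 9 | 11.78 | -2.78 | 7.73 |
| 7 | 14 | 9.83 | 4.17 | 17.39 |
| 17 | 16 | 16.33 | -0.33 | 0.11 |
| 15 | 10 | 15.03 | -5.03 | 25.30 |
| 9 | 12 | 11.13 | 0.87 | 0.76 |
| 8 | 7 | 10.48 | -3.48 | 12.11 |
| 15 | 13 | 15.03 | -2.03 | 4.12 |
| 11 | 14 | 12.43 | 1.57 | 2.46 |
| 17 | 19 | 16.33 | 2.67 | 7.13 |
| 8 | 10 | 10.48 | -0.48 | 0.23 |
| 11 | 16 | 12.43 | 3.57 | 12.74 |
| 12 | 12 | 13.08 | -1.08 | 1.17 |
| 13 | 16 | 13.73 | 2.27 | 5.15 |
| 18 | 19 | 16.98 | 2.02 | 4.08 |
| 7 | 11 | 9.83 | 1.17 | 1.37 |

For example, for the first student, $Y' = 0.65(15) + 5.28 = 15.03$ and $Y - Y' = 12 - 15.03 = -3.03$.

Totals: $\sum X = 233$, $\sum Y = 257$, $\sum (Y - Y') = 0.00^*$, $\sum (Y - Y')^2 = 116.59$

\* The column actually sums to $\sum (Y - Y') = -0.05$; it does not equal zero because of rounding errors (Hinkle et al., 2003).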
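The table can be rebuilt with a few lines of Python. This sketch uses the rounded coefficients b = 0.65 and a = 5.28, as the table does, and rounds each entry to two decimals; the variable names are illustrative.

```python
# Example 2 data
X = [15, 10, 7, 18, 5, 10, 7, 17, 15, 9, 8, 15, 11, 17, 8, 11, 12, 13, 18, 7]
Y = [12, 13, 9, 18, 7, 9, 14, 16, 10, 12, 7, 13, 14, 19, 10, 16, 12, 16, 19, 11]

b, a = 0.65, 5.28  # rounded coefficients used in the table

rows = []
for x, y in zip(X, Y):
    y_pred = round(b * x + a, 2)   # Y' = 0.65X + 5.28
    e = round(y - y_pred, 2)       # Y - Y'
    rows.append((x, y, y_pred, e, round(e ** 2, 2)))

for row in rows:
    print(*row)

sum_e = round(sum(r[3] for r in rows), 2)
sum_e2 = round(sum(r[4] for r in rows), 2)
print("sum of (Y - Y'):", sum_e)
print("sum of (Y - Y')^2:", sum_e2)
```

The small non-zero total of the Y - Y' column comes entirely from rounding b and a; with unrounded coefficients the errors sum to zero.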

Step 3: Determine the equation of the regression line


To draw the regression line, we need to determine its equation, which is called the regression equation. The regression equation is defined as follows:

$$Y' = bX + a$$

where
Y' = predicted score of Y
b = gradient of the regression line (regression coefficient)
X = the score used to predict the score of Y
a = Y-intercept (regression constant)

Regression coefficient, b
The value of b, the gradient of the regression line, is called the regression coefficient. It shows the amount of change in Y that is associated with a one-unit change in X. The formula for b is:

$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2}$$

Regression constant, a
The value of a, the Y-intercept of the regression line, is called the regression constant. It is the value of Y where the regression line intercepts the Y-axis, that is, the predicted value of Y when X equals 0. The formula for a is:

$$a = \frac{\sum Y - b\sum X}{n} \quad \text{or} \quad a = \bar{Y} - b\bar{X}$$

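| X | Y | XY | X² |
|---|---|---|---|
| 15 | 12 | 180 | 225 |
| 10 | 13 | 130 | 100 |
| 7 | 9 | 63 | 49 |
| 18 | 18 | 324 | 324 |
| 5 | 7 | 35 | 25 |
| 10 | 9 | 90 | 100 |
| 7 | 14 | 98 | 49 |
| 17 | 16 | 272 | 289 |
| 15 | 10 | 150 | 225 |
| 9 | 12 | 108 | 81 |
| 8 | 7 | 56 | 64 |
| 15 | 13 | 195 | 225 |
| 11 | 14 | 154 | 121 |
| 17 | 19 | 323 | 289 |
| 8 | 10 | 80 | 64 |
| 11 | 16 | 176 | 121 |
| 12 | 12 | 144 | 144 |
| 13 | 16 | 208 | 169 |
| 18 | 19 | 342 | 324 |
| 7 | 11 | 77 | 49 |

Totals: $n = 20$, $\sum X = 233$, $\sum Y = 257$, $\sum XY = 3{,}205$, $\sum X^2 = 3{,}037$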
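The column totals are easy to verify in Python; this short sketch recomputes n and the four sums from the raw Example 2 scores.

```python
# Example 2 data
X = [15, 10, 7, 18, 5, 10, 7, 17, 15, 9, 8, 15, 11, 17, 8, 11, 12, 13, 18, 7]
Y = [12, 13, 9, 18, 7, 9, 14, 16, 10, 12, 7, 13, 14, 19, 10, 16, 12, 16, 19, 11]

n = len(X)
sum_x = sum(X)                                # ΣX
sum_y = sum(Y)                                # ΣY
sum_xy = sum(x * y for x, y in zip(X, Y))     # ΣXY
sum_x2 = sum(x * x for x in X)                # ΣX²
print(n, sum_x, sum_y, sum_xy, sum_x2)        # 20 233 257 3205 3037
```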

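Computing the value of the regression coefficient, b

With $n = 20$, $\sum X = 233$, $\sum Y = 257$, $\sum XY = 3{,}205$ and $\sum X^2 = 3{,}037$:

$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2} = \frac{20(3{,}205) - (233)(257)}{20(3{,}037) - 233^2} = \frac{64{,}100 - 59{,}881}{60{,}740 - 54{,}289} = \frac{4{,}219}{6{,}451} = 0.65$$

The positive value of b (0.65) shows that a 1-unit increase in X is associated with a 0.65-unit increase in Y.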

Computing the value of the regression constant, a

With $n = 20$, $\sum X = 233$, $\sum Y = 257$ and $b = 0.65$:

$$a = \frac{\sum Y - b\sum X}{n} = \frac{257 - (0.65)(233)}{20} = \frac{257 - 151.45}{20} = 5.28$$

The positive value of a (5.28) shows that the regression line intercepts the Y-axis at 5.28.

Determining the regression equation for Example 2: the regression equation of the regression line is $Y' = bX + a$, that is,

$$Y' = 0.65X + 5.28$$
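The two formulas translate directly into Python. This sketch starts from the column totals, computes b, and then computes a twice: once after rounding b to two decimals (as the hand calculation does) and once with b unrounded, which shows where the small difference from the SPSS constant comes from.

```python
# Column totals for the Example 2 data
n, sum_x, sum_y, sum_xy, sum_x2 = 20, 233, 257, 3205, 3037

# Regression coefficient b
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

# Regression constant a, computed the way the hand calculation does it:
# b is rounded to two decimals first
b_rounded = round(b, 2)
a = (sum_y - b_rounded * sum_x) / n

# The same constant with b left unrounded
a_unrounded = (sum_y - b * sum_x) / n

print(round(b, 3), b_rounded)                 # 0.654 0.65
print(round(a, 2), round(a_unrounded, 3))     # 5.28 5.231
```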

The regression equation for Example 2

[Figure: the regression line $Y' = 0.65X + 5.28$ (regression equation) on the scatterplot of creativity (Y) against logical reasoning (X); the line crosses the Y-axis at a = 5.28 and has gradient b = 0.65, so a 1-unit increase in X is associated with a 0.65-unit increase in Y]

Step 4: Use the regression equation to predict the creativity score (Y) of a student whose logical reasoning score (X) is 25

The regression equation is $Y' = 0.65X + 5.28$. When X = 25, $Y' = 0.65(25) + 5.28 = 21.53$.
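Step 4 amounts to evaluating the regression equation at X = 25. The hypothetical helper below does exactly that, with the hand-calculated coefficients as defaults and an option to plug in the SPSS values instead. Note that X = 25 lies outside the range of the observed logical reasoning scores (5 to 18), so the prediction assumes the straight-line relationship continues beyond the data.

```python
def predict_creativity(logical_reasoning, b=0.65, a=5.28):
    """Predicted creativity score Y' = bX + a.

    Defaults are the hand-calculated (rounded) coefficients; pass b=0.654,
    a=5.231 to use the SPSS values instead.
    """
    return b * logical_reasoning + a


print(round(predict_creativity(25), 2))                     # 21.53
print(round(predict_creativity(25, b=0.654, a=5.231), 2))   # 21.58
```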

2. Obtaining the Regression Line Using SPSS

i. Obtaining the scatterplot using SPSS
ii. Obtaining the regression line using SPSS

i. Obtaining the scatterplot using SPSS


1. Create a file for the data set.
2. Click Graphs > Scatter.
3. Click Simple and then click Define.
4. Click on the Y variable and click the arrow button to place it in the Y Axis box.
5. Click on the X variable and click the arrow button to place it in the X Axis box.
6. Click OK.

ii. Obtaining the Regression Line Using SPSS


1. Double-click on the scatterplot to open the Chart Editor.
2. Click on any point in the scatterplot.
3. Click Chart > Add Chart Element > Fit Line at Total.
4. Click Linear > Apply > Close.
5. Exit the Chart Editor.

[Figure: SPSS scatterplot with regression line]

3. Obtaining the Regression Equation Using SPSS


1. Create a file for the data set.
2. Click Analyze > Regression > Linear.
3. Click on the Y variable and click the arrow button to place it in the Dependent: box.
4. Click on the X variable and click the arrow button to place it in the Independent(s): box.
5. Click OK.

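SPSS output:

Coefficients(a)

| Model | Unstandardized B | Std. Error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 5.231 | 1.746 | | 2.996 | .008 |
| Logical Reasoning (X) | .654 | .142 | .736 | 4.615 | .000 |

a. Dependent Variable: Creativity (Y)

From the B column: $Y' = bX + a = 0.654X + 5.231$

(The hand-calculated values b = 0.65 and a = 5.28 differ slightly from these because b was rounded to two decimal places before a was computed.)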
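The B, Std. Error and t columns of the Coefficients table can be reproduced from the raw data. The sketch below assumes the standard simple-regression formulas $SE_b = s_e / \sqrt{SS_X}$ and $SE_a = s_e\sqrt{1/n + \bar{X}^2 / SS_X}$, where $SS_X = \sum (X - \bar{X})^2$ and $s_e$ is the standard error of estimate (Section 5); the variable names are illustrative, not SPSS's.

```python
import math

# Example 2 data
X = [15, 10, 7, 18, 5, 10, 7, 17, 15, 9, 8, 15, 11, 17, 8, 11, 12, 13, 18, 7]
Y = [12, 13, 9, 18, 7, 9, 14, 16, 10, 12, 7, 13, 14, 19, 10, 16, 12, 16, 19, 11]

n = len(X)
mean_x, mean_y = sum(X) / n, sum(Y) / n
ss_x = sum((x - mean_x) ** 2 for x in X)
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))

# Unrounded least-squares coefficients (column B)
b = sp_xy / ss_x
a = mean_y - b * mean_x

# Standard error of estimate, then the standard errors of b and a
sse = sum((y - (b * x + a)) ** 2 for x, y in zip(X, Y))
s_e = math.sqrt(sse / (n - 2))
se_b = s_e / math.sqrt(ss_x)
se_a = s_e * math.sqrt(1 / n + mean_x ** 2 / ss_x)

print(f"(Constant): B = {a:.3f}, Std. Error = {se_a:.3f}, t = {a / se_a:.3f}")
print(f"X:          B = {b:.3f}, Std. Error = {se_b:.3f}, t = {b / se_b:.3f}")
```

The printed values should agree with the SPSS table to the three decimals it reports.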

4. Errors in Prediction

Errors in prediction are the differences between the actual scores of Y and the predicted scores of Y (Y'). The error in prediction (e) is calculated as follows:

$$e = Y - Y'$$
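A short Python sketch applies $e = Y - Y'$ to every student using the unrounded least-squares coefficients. With unrounded coefficients, the errors sum to zero up to floating-point error, which confirms that any non-zero total in a hand-calculated table is due to rounding.

```python
# Example 2 data
X = [15, 10, 7, 18, 5, 10, 7, 17, 15, 9, 8, 15, 11, 17, 8, 11, 12, 13, 18, 7]
Y = [12, 13, 9, 18, 7, 9, 14, 16, 10, 12, 7, 13, 14, 19, 10, 16, 12, 16, 19, 11]

n = len(X)
mean_x = sum(X) / n
mean_y = sum(Y) / n

# Unrounded least-squares coefficients
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
ss_x = sum((x - mean_x) ** 2 for x in X)
b = sp_xy / ss_x
a = mean_y - b * mean_x

# e = Y - Y' for every student
errors = [y - (b * x + a) for x, y in zip(X, Y)]

print("sum of errors:", round(sum(errors), 10))
print("sum of squared errors:", round(sum(e * e for e in errors), 2))
```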

[Figure: the regression line $Y' = 0.65X + 5.28$ on the scatterplot, with each error $e = Y - Y'$ shown as the vertical distance between the actual value Y and the predicted value Y', e.g. 19 - 16.33 = 2.67 and 16 - 16.33 = -0.33]

Applying $e = Y - Y'$ to all 20 students gives the Y - Y' column of the table in Section 1; for example, $e = 12 - 15.03 = -3.03$ for the first student. The errors sum to $\sum e = 0.00$ (apart from rounding error), and the squared errors sum to $\sum e^2 = 116.59$.

5. Standard Error of Estimate


The standard deviation of the distribution of errors in prediction is called the standard error of estimate, represented by $s_e$. It is an overall measure of the extent to which the predicted Y values deviate from the actual Y values.

The formula for the standard error of estimate ($s_e$) is:

$$s_e = \sqrt{\frac{\sum (Y - Y')^2}{n - 2}} \quad \text{or} \quad s_e = \sqrt{\frac{\sum e^2}{n - 2}}$$


The standard error of estimate for the creativity scores, with $\sum (Y - Y')^2 = 116.59$ and $n = 20$, is:

$$s_e = \sqrt{\frac{\sum (Y - Y')^2}{n - 2}} = \sqrt{\frac{116.59}{20 - 2}} = 2.55$$
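The calculation above is a one-line function. The divisor is n - 2 rather than n - 1 because two quantities, a and b, are estimated from the data; `standard_error_of_estimate` is an illustrative name.

```python
import math


def standard_error_of_estimate(sum_squared_errors, n):
    """s_e = sqrt( sum of (Y - Y')^2 / (n - 2) )."""
    return math.sqrt(sum_squared_errors / (n - 2))


s_e = standard_error_of_estimate(116.59, 20)
print(round(s_e, 2))  # 2.55
```

To three decimals the result is 2.545, the value SPSS reports as the Std. Error of the Estimate.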

The stronger the correlation between Y and X, the smaller the standard error of estimate, and the greater the accuracy of prediction. For example, when the correlation is perfect (r = 1), $s_e = 0$ and prediction is 100% accurate.
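The link between r and $s_e$ can be written as $s_e = s_Y\sqrt{(1 - r^2)(n - 1)/(n - 2)}$, where $s_Y$ is the standard deviation of Y; this follows from $\sum (Y - Y')^2 = (1 - r^2)\sum (Y - \bar{Y})^2$. The sketch below uses that identity (the helper name `se_from_r` is hypothetical) to reproduce 2.545 for Example 2 and to show that $s_e = 0$ when r = 1.

```python
import math


def se_from_r(s_y, r, n):
    """Standard error of estimate written in terms of r:
    s_e = s_Y * sqrt((1 - r^2) * (n - 1) / (n - 2))."""
    return s_y * math.sqrt((1 - r ** 2) * (n - 1) / (n - 2))


# Example 2 data
X = [15, 10, 7, 18, 5, 10, 7, 17, 15, 9, 8, 15, 11, 17, 8, 11, 12, 13, 18, 7]
Y = [12, 13, 9, 18, 7, 9, 14, 16, 10, 12, 7, 13, 14, 19, 10, 16, 12, 16, 19, 11]

n = len(X)
mean_x, mean_y = sum(X) / n, sum(Y) / n
ss_x = sum((x - mean_x) ** 2 for x in X)
ss_y = sum((y - mean_y) ** 2 for y in Y)
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
r = sp_xy / math.sqrt(ss_x * ss_y)   # Pearson correlation
s_y = math.sqrt(ss_y / (n - 1))      # standard deviation of Y

print(round(se_from_r(s_y, r, n), 3))   # 2.545
print(se_from_r(s_y, 1.0, n))           # 0.0 (perfect correlation)
```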

6. Obtaining the standard error of estimate using SPSS


1. Create a file for the data set.
2. Click Analyze > Regression > Linear.
3. Click on the Y variable and click the arrow button to place it in the Dependent: box.
4. Click on the X variable and click the arrow button to place it in the Independent(s): box.
5. Click OK.

SPSS output:

Model Summary

| R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|
| .736(a) | .542 | .517 | 2.545 |

a. Predictors: (Constant), Logical Reasoning (X)

$s_e = 2.545$
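Each entry of the Model Summary can be recomputed from the data. This sketch assumes the usual one-predictor formulas: R is the Pearson correlation, Adjusted R Square is $1 - (1 - R^2)(n - 1)/(n - 2)$, and the Std. Error of the Estimate is $s_e$ as defined in Section 5.

```python
import math

# Example 2 data
X = [15, 10, 7, 18, 5, 10, 7, 17, 15, 9, 8, 15, 11, 17, 8, 11, 12, 13, 18, 7]
Y = [12, 13, 9, 18, 7, 9, 14, 16, 10, 12, 7, 13, 14, 19, 10, 16, 12, 16, 19, 11]

n = len(X)
mean_x, mean_y = sum(X) / n, sum(Y) / n
ss_x = sum((x - mean_x) ** 2 for x in X)
ss_y = sum((y - mean_y) ** 2 for y in Y)
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))

r = sp_xy / math.sqrt(ss_x * ss_y)             # R
r2 = r ** 2                                    # R Square
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)      # Adjusted R Square (one predictor)
s_e = math.sqrt(ss_y * (1 - r2) / (n - 2))     # Std. Error of the Estimate

print(f"R = {r:.3f}, R Square = {r2:.3f}, "
      f"Adjusted R Square = {adj_r2:.3f}, Std. Error = {s_e:.3f}")
```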

7. Testing the regression coefficient for statistical significance

The purpose of this test is to determine whether the predictor variable (X) is a statistically significant predictor of the criterion variable (Y), that is, whether knowledge of scores on the X variable enhances the prediction of scores on the Y variable.

Assumptions underlying the significance test for the regression coefficient:
1. The scores for each variable are normally distributed.
2. The cases represent a random sample from the population.
3. The observations are independent of one another.

Steps for the significance test:
1. State the null and alternative hypotheses.
2. Set the criterion for rejecting the null hypothesis.
3. Carry out the analysis using SPSS.
4. Make a decision by applying the criterion for rejecting the null hypothesis.
5. Make a conclusion in the context of the problem.

Example 1: Suppose you are interested in predicting students' scores on creativity (Y) based on knowledge of their scores on logical reasoning (X), using the following hypothetical data for 20 students. Test whether logical reasoning is a statistically significant predictor of creativity at the 0.01 level of significance.

Logical reasoning (X): 15 10 7 18 5 10 7 17 15 9 8 15 11 17 8 11 12 13 18 7
Creativity (Y): 12 13 9 18 7 9 14 16 10 12 7 13 14 19 10 16 12 16 19 11

Step 1: State the null and alternative hypotheses

$H_0: \beta = 0$ (Logical reasoning is not a statistically significant predictor of creativity in the population)

$H_1: \beta \neq 0$ (Logical reasoning is a statistically significant predictor of creativity in the population)

where $\beta$ (beta) is the population regression coefficient.

Step 2: Set the criterion for rejecting the null hypothesis

Reject $H_0$ if p < 0.01.

0.01 is the level of significance ($\alpha$): the probability of committing a Type I error, that is, of rejecting the null hypothesis when it is true, that we are willing to accept.

Step 3: Carry out the analysis using SPSS


1. Create a file for the data set.
2. Click Analyze > Regression > Linear.
3. Click on the Y variable and click the arrow button to place it in the Dependent: box.
4. Click on the X variable and click the arrow button to place it in the Independent(s): box.
5. Click OK.

SPSS output:

Coefficients(a)

| Model | Unstandardized B | Std. Error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 5.231 | 1.746 | | 2.996 | .008 |
| Logical Reasoning, X | .654 | .142 | .736 | 4.615 | .000 |

a. Dependent Variable: Creativity, Y

p = .000 (< 0.01)

$Y' = bX + a = 0.654X + 5.231$

Step 4: Make a decision by applying the criterion for rejecting the null hypothesis

From the SPSS output, p = .000 (SPSS rounds to three decimal places, so p < .001: if $H_0$ were true, a t value at least this far from zero would occur with probability less than .001). Therefore, reject $H_0$ because p < 0.01.

Step 5: Make a conclusion in the context of the problem

Logical reasoning is a statistically significant predictor of creativity in the population, t(18) = 4.615, p < .01. The regression equation is as follows:

$Y' = 0.654X + 5.231$

Predicted creativity = 0.654(logical reasoning) + 5.231

(df = n - 2 = 20 - 2 = 18, because two parameters, a and b, are estimated from the data)
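SPSS reports only the p-value (Sig.), but the same two-tailed p can be recomputed from t and df = n - 2. The sketch below assumes the standard Student's t density and approximates the area under it with Simpson's rule, a numerical method swapped in here so that the example needs only the Python standard library; with SciPy installed, `2 * scipy.stats.t.sf(abs(t), df)` computes the same quantity. The function names are illustrative.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=10_000):
    """Two-tailed p-value = 1 - area under the t curve between -|t| and |t|.

    The area from 0 to |t| is approximated with Simpson's rule (steps is even).
    """
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half_area = total * h / 3
    return 1 - 2 * half_area


n = 20
t = 4.615      # t for Logical Reasoning (X) in the SPSS Coefficients table
df = n - 2     # two parameters (a and b) are estimated
p = two_tailed_p(t, df)
print(f"t({df}) = {t}, p = {p:.5f}")
print("Reject H0" if p < 0.01 else "Fail to reject H0")
```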

Coefficient of determination, $r^2$

Model Summary

| R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|
| .736(a) | .542 | .517 | 2.545 |

a. Predictors: (Constant), Logical Reasoning (X)

$r^2 = 0.542$

Step 5: Make a conclusion in the context of the problem


$r^2 = 0.542$: 54.2% of the variance in creativity scores can be associated with (explained by) the variance in logical reasoning scores. [Equivalently, the remaining 45.8% of the variance in creativity scores cannot be associated with (explained by) the variance in logical reasoning scores.]

7. Testing the regression coefficient for statistical significance

Example 2:
The following table shows the order in which 14 students turned in their test papers and their test scores. Test whether the order in which students turn in their test papers is a statistically significant predictor of their test scores.

| Order in which students turn in their test papers (X) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Test score (Y) | 80 | 65 | 88 | 63 | 75 | 65 | 60 | 85 | 87 | 75 | 76 | 72 | 85 | 60 |

Step 1: State the null and alternative hypotheses

$H_0: \beta = 0$ (The order in which students turn in their test papers is not a statistically significant predictor of their test scores in the population)

$H_1: \beta \neq 0$ (The order in which students turn in their test papers is a statistically significant predictor of their test scores in the population)

where $\beta$ (beta) is the population regression coefficient.

Step 2: Set the criterion for rejecting the null hypothesis

Reject $H_0$ if p < 0.05.

0.05 is the level of significance ($\alpha$): the probability of committing a Type I error, that is, of rejecting the null hypothesis when it is true, that we are willing to accept.

Step 3: Carry out the analysis using SPSS


1. Create a file for the data set.
2. Click Analyze > Regression > Linear.
3. Click on the Y variable and click the arrow button to place it in the Dependent: box.
4. Click on the X variable and click the arrow button to place it in the Independent(s): box.
5. Click OK.

SPSS output:

Coefficients(a)

| Model | Unstandardized B | Std. Error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|
| 1 (Constant) | 74.033 | 5.939 | | 12.466 | .000 |
| The order in which students turn in their test papers, X | -.004 | .697 | -.002 | -.006 | .995 |

a. Dependent Variable: Test score, Y

p = .995 (> 0.05)

Step 4: Make a decision by applying the criterion for rejecting the null hypothesis

From the SPSS output, p = .995 (if $H_0$ were true, a t value at least this far from zero would occur with probability .995). Therefore, fail to reject $H_0$ because p > 0.05.

Step 5: Make a conclusion in the context of the problem

The order in which students turn in their test papers is not a statistically significant predictor of their test scores in the population, t(12) = -.006, p > .05 (df = n - 2 = 14 - 2 = 12). That is, knowledge of the order in which students turn in their test papers does not enhance the prediction of their test scores.
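The whole Example 2 analysis can be reproduced from the raw scores. This sketch recomputes b, a, t with df = n - 2, the two-tailed p-value and $r^2$; as in the earlier sketch, the p-value is approximated with Simpson's rule over the t density (a swapped-in numerical method, not SPSS's algorithm), and all names are illustrative.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=10_000):
    """1 - area between -|t| and |t|, using Simpson's rule on [0, |t|]."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * total * h / 3


order = list(range(1, 15))                                         # X
score = [80, 65, 88, 63, 75, 65, 60, 85, 87, 75, 76, 72, 85, 60]   # Y

n = len(order)
mean_x, mean_y = sum(order) / n, sum(score) / n
ss_x = sum((x - mean_x) ** 2 for x in order)
ss_y = sum((y - mean_y) ** 2 for y in score)
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(order, score))

b = sp_xy / ss_x                   # regression coefficient
a = mean_y - b * mean_x            # regression constant
sse = ss_y - b * sp_xy             # sum of squared errors
s_e = math.sqrt(sse / (n - 2))     # standard error of estimate
se_b = s_e / math.sqrt(ss_x)       # standard error of b
t = b / se_b
df = n - 2
p = two_tailed_p(t, df)
r2 = sp_xy ** 2 / (ss_x * ss_y)    # coefficient of determination

print(f"b = {b:.3f}, a = {a:.3f}, t({df}) = {t:.3f}, p = {p:.3f}, r^2 = {r2:.3f}")
```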

Coefficient of determination, $r^2$

Model Summary

| R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|
| .002(a) | .000 | -.083 | 10.520 |

a. Predictors: (Constant), The order in which students turn in their test papers, X

$r^2 = 0.000$

Step 5: Make a conclusion in the context of the problem


$r^2 = 0.000$: 0% of the variance in the test scores can be associated with (explained by) the variance in the order in which students turn in their test papers. [Equivalently, 100% of the variance in the test scores cannot be associated with (explained by) the variance in the order in which students turn in their test papers.]

Thank you for your attention
