Contents
1) Simple Linear Regression
2) Obtaining the Regression Line Using SPSS
3) Obtaining the Regression Equation Using SPSS
4) Errors in Prediction
5) Standard Error of Estimate
6) Obtaining the Standard Error of Estimate Using SPSS
7) Testing the Regression Coefficient for Statistical Significance
Simple linear regression is the process of predicting or estimating scores on one variable (Y) based on knowledge of scores on another variable (X), provided that Y and X are correlated.
Y - the dependent, target or criterion variable
X - the independent, regressor or predictor variable
Example 1: Predicting scores of Y from scores of X when the correlation between Y and X is perfect (r = 1)
Suppose you are interested in predicting scores on Y, based on knowledge of scores on X, using the following hypothetical data:
X   2   4   6   8   10   12   14
Y   3   4   5   6    7    8    9
1. Plot a scatterplot
2. Draw a straight line that best fits the data
3. Determine the equation of the straight line
4. Use the equation of the straight line to predict the score of Y when X = 125.5
m = 0.5 indicates that an increase of 0.5 units in Y is associated with an increase of 1 unit in X
Step 4: Use the equation of the straight line to predict the score of Y when X = 125.5
The equation of the straight line: Y = 0.5X + 2
When X = 125.5, Y = 0.5(125.5) + 2 = 64.75
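The steps of Example 1 can be checked with a short Python sketch. The names `xs`, `ys`, `m`, `c` and `predict` are illustrative and do not come from the slides. The gradient is taken from the first and last data points, which works here only because the correlation is perfect and every point lies on the same line.

```python
# Data for Example 1 (perfect correlation, r = 1)
xs = [2, 4, 6, 8, 10, 12, 14]
ys = [3, 4, 5, 6, 7, 8, 9]

# With a perfect linear relationship, any two points determine the line.
m = (ys[-1] - ys[0]) / (xs[-1] - xs[0])  # gradient
c = ys[0] - m * xs[0]                    # Y-intercept

def predict(x):
    """Return the predicted Y for a given X using Y = mX + c."""
    return m * x + c

# Every observed point lies exactly on the line.
on_line = all(predict(x) == y for x, y in zip(xs, ys))

print(m, c)            # 0.5 2.0
print(on_line)         # True
print(predict(125.5))  # 64.75
```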
Example 2: Predicting scores of Y from scores of X when the correlation between Y and X is not perfect (r ≠ 1)
Suppose you are interested in predicting students' scores on creativity (Y), based on knowledge of their scores on logical reasoning (X), using the following hypothetical data for 20 students:
X   15  10   7  18   5  10   7  17  15   9   8  15  11  17   8  11  12  13  18   7
Y   12  13   9  18   7   9  14  16  10  12   7  13  14  19  10  16  12  16  19  11
Predict the creativity score for a student with a logical reasoning score of 25 (X = 25, Y = ?):
1. Plot a scatterplot
2. Draw a straight line that best fits the data
3. Determine the equation of the straight line
4. Use the equation of the straight line to predict the creativity score for a student with a logical reasoning score of 25
[Figure: Step 1, scatterplot of creativity (Y) against logical reasoning (X)]
Step 2: Draw the straight line that best fits the data (the line of best fit)
[Figure: scatterplot of creativity (Y) against logical reasoning (X) with the line of best fit drawn through the points]
The method of least squares fits the straight line in such a way that the sum of squares of the differences between the actual values of Y and the predicted values of Y (Y') is a minimum, that is, $\sum (Y - Y')^2$ is a minimum.
[Figure: scatterplot of creativity (Y) against logical reasoning (X) with the regression line. The vertical distance from each point to the line is Y - Y', the difference between the actual value of Y and the predicted value of Y; for example, an actual score of 16 against a predicted score of 16.33.]
For the regression line, $\sum (Y - Y') = 0$ and $\sum (Y - Y')^2$ is a minimum.
 i    X    Y    Y' = 0.65X + 5.28    Y - Y'    (Y - Y')^2
 1   15   12        15.03            -3.03        9.18
 2   10   13        11.78             1.22        1.49
 3    7    9         9.83            -0.83        0.69
 4   18   18        16.98             1.02        1.04
 5    5    7         8.53            -1.53        2.34
 6   10    9        11.78            -2.78        7.73
 7    7   14         9.83             4.17       17.39
 8   17   16        16.33            -0.33        0.11
 9   15   10        15.03            -5.03       25.30
10    9   12        11.13             0.87        0.76
11    8    7        10.48            -3.48       12.11
12   15   13        15.03            -2.03        4.12
13   11   14        12.43             1.57        2.46
14   17   19        16.33             2.67        7.13
15    8   10        10.48            -0.48        0.23
16   11   16        12.43             3.57       12.74
17   12   12        13.08            -1.08        1.17
18   13   16        13.73             2.27        5.15
19   18   19        16.98             2.02        4.08
20    7   11         9.83             1.17        1.37
Sum 233  257                          0.00*     116.59

For the first student: Y' = 0.65(15) + 5.28 = 15.03 and Y - Y' = 12 - 15.03 = -3.03.
* With the rounded coefficients, $\sum (Y - Y') = -0.05$; it does not equal zero because of rounding errors (Hinkle et al., 2003).
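The least-squares property can be illustrated numerically. The sketch below is not a proof: it only compares the sum of squared errors for the line Y' = 0.65X + 5.28 with a few other lines. The four alternative lines and the helper name `sum_sq_errors` are hypothetical choices made for this illustration.

```python
# Data for Example 2 (20 students)
xs = [15, 10, 7, 18, 5, 10, 7, 17, 15, 9, 8, 15, 11, 17, 8, 11, 12, 13, 18, 7]
ys = [12, 13, 9, 18, 7, 9, 14, 16, 10, 12, 7, 13, 14, 19, 10, 16, 12, 16, 19, 11]

def sum_sq_errors(b, a):
    """Sum of squared differences between actual Y and predicted Y' = bX + a."""
    return sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))

# Line used in the slides (coefficients rounded to two decimals)
fitted = sum_sq_errors(0.65, 5.28)
print(round(fitted, 2))  # 116.59

# Hypothetical alternative lines: steeper, flatter, higher, lower.
alternatives = [(0.75, 5.28), (0.55, 5.28), (0.65, 6.28), (0.65, 4.28)]
for b_alt, a_alt in alternatives:
    # Each alternative gives a larger sum of squared errors.
    print(b_alt, a_alt, sum_sq_errors(b_alt, a_alt) > fitted)
```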
Y' = bX + a
where
Y' = predicted score of Y
b = gradient of the regression line (regression coefficient)
X = the score used to predict the score of Y
a = Y-intercept (regression constant)
Regression coefficient, b
The value of b, which is the gradient of the regression line, is called the regression coefficient.
The regression coefficient shows the amount of change in Y that is associated with a unit change in X.
The formula for finding b is as follows:
$$b = \frac{n \sum XY - \sum X \sum Y}{n \sum X^2 - \left(\sum X\right)^2}$$
Regression constant, a
The value of a, which is the Y-intercept of the regression line, is called the regression constant.
The regression constant shows the value of Y where the regression line intercepts the Y-axis, that is, the value of Y when X equals 0.
The formula for finding a is as follows:
$$a = \frac{\sum Y - b \sum X}{n} \quad \text{or} \quad a = \bar{Y} - b\bar{X}$$
$$b = \frac{n \sum XY - \sum X \sum Y}{n \sum X^2 - \left(\sum X\right)^2} \qquad a = \frac{\sum Y - b \sum X}{n}$$

  X     Y     XY     X^2
 15    12    180     225
 10    13    130     100
  7     9     63      49
 18    18    324     324
  5     7     35      25
 10     9     90     100
  7    14     98      49
 17    16    272     289
 15    10    150     225
  9    12    108      81
  8     7     56      64
 15    13    195     225
 11    14    154     121
 17    19    323     289
  8    10     80      64
 11    16    176     121
 12    12    144     144
 13    16    208     169
 18    19    342     324
  7    11     77      49

ΣX = 233, ΣY = 257, ΣXY = 3 205, ΣX² = 3 037, n = 20
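Substituting the column totals into the two formulas can be sketched in Python as follows. The variable names are illustrative. The unrounded result is printed to three decimals alongside the hand calculation, which rounds b to 0.65 before computing a.

```python
# Data for Example 2 (20 students)
xs = [15, 10, 7, 18, 5, 10, 7, 17, 15, 9, 8, 15, 11, 17, 8, 11, 12, 13, 18, 7]
ys = [12, 13, 9, 18, 7, 9, 14, 16, 10, 12, 7, 13, 14, 19, 10, 16, 12, 16, 19, 11]

n = len(xs)
sum_x = sum(xs)
sum_y = sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
print(sum_x, sum_y, sum_xy, sum_x2)  # 233 257 3205 3037

# Regression coefficient: b = (nΣXY - ΣXΣY) / (nΣX² - (ΣX)²)
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
# Regression constant: a = (ΣY - bΣX) / n
a = (sum_y - b * sum_x) / n
print(round(b, 3), round(a, 3))  # 0.654 5.231

# Hand calculation: round b to two decimals first, then compute a.
b_hand = round(b, 2)
a_hand = round((sum_y - b_hand * sum_x) / n, 2)
print(b_hand, a_hand)  # 0.65 5.28
```

The small difference between 5.231 and 5.28 comes only from rounding b before it is used to compute a.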
The positive value of b, that is, 0.65, shows that a 0.65-unit increase in Y is associated with a 1-unit increase in X.
Determining the regression equation for Example 2:
Therefore, the regression equation of the regression line is:
Y' = bX + a
Y' = 0.65X + 5.28
[Figure: regression line on the scatterplot of creativity (Y) against logical reasoning (X), with Y-intercept a = 5.28 and gradient b = 0.65]
Step 4: Use the regression equation of the regression line to predict the creativity score (Y) of a student when his or her logical reasoning score (X) is 25
The regression equation of the regression line: Y' = 0.65X + 5.28
When X = 25, Y' = 0.65(25) + 5.28 = 21.53
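As a minimal sketch, the prediction step can be wrapped in a small function. The function name `predict_creativity` and its default arguments are hypothetical. The second call uses the three-decimal coefficients reported by SPSS elsewhere in these slides, to show how little the rounding changes the prediction.

```python
def predict_creativity(x, b=0.65, a=5.28):
    """Predicted creativity score Y' = bX + a for a logical reasoning score x."""
    return b * x + a

# Hand-calculated coefficients (two decimals)
hand = round(predict_creativity(25), 2)
print(hand)  # 21.53

# SPSS coefficients (three decimals)
spss = round(predict_creativity(25, b=0.654, a=5.231), 2)
print(spss)  # 21.58
```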
i. Obtaining the scatterplot using SPSS
ii. Obtaining the regression line using SPSS
Coefficients
Model         B (Unstandardized)   Std. Error   Beta (Standardized)   t       Sig.
(Constant)    5.231  (a)           1.746                              2.996   .008
X             .654   (b)           .142         .736                  4.615   .000

Y' = bX + a
Y' = 0.654X + 5.231
4. Errors in Prediction
Errors in prediction are the differences between the actual scores of Y and the predicted scores of Y (Y').
The formula for the calculation of the error in prediction (e) is as follows:
e = Y - Y'
[Figure: scatterplot of creativity (Y) with the regression line; the vertical distance from each point to the line is the error in prediction, e = Y - Y']
  X    Y    Y' = 0.65X + 5.28    e = Y - Y'    e^2 = (Y - Y')^2
 15   12        15.03              -3.03            9.18
 10   13        11.78               1.22            1.49
  7    9         9.83              -0.83            0.69
 18   18        16.98               1.02            1.04
  5    7         8.53              -1.53            2.34
 10    9        11.78              -2.78            7.73
  7   14         9.83               4.17           17.39
 17   16        16.33              -0.33            0.11
 15   10        15.03              -5.03           25.30
  9   12        11.13               0.87            0.76
  8    7        10.48              -3.48           12.11
 15   13        15.03              -2.03            4.12
 11   14        12.43               1.57            2.46
 17   19        16.33               2.67            7.13
  8   10        10.48              -0.48            0.23
 11   16        12.43               3.57           12.74
 12   12        13.08              -1.08            1.17
 13   16        13.73               2.27            5.15
 18   19        16.98               2.02            4.08
  7   11         9.83               1.17            1.37

ΣX = 233, ΣY = 257, Σe = 0.00*, Σe² = 116.59
For the first student: Y' = 0.65(15) + 5.28 = 15.03 and e = 12 - 15.03 = -3.03.
* With the rounded coefficients, Σe = -0.05; it does not equal zero because of rounding errors (Hinkle et al., 2003).
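The footnote about rounding can be illustrated with a short sketch. The helper name `errors` is illustrative. With the two-decimal coefficients used in the table, the errors sum to about -0.05. With unrounded least-squares coefficients, they sum to zero up to floating-point precision.

```python
# Data for Example 2 (20 students)
xs = [15, 10, 7, 18, 5, 10, 7, 17, 15, 9, 8, 15, 11, 17, 8, 11, 12, 13, 18, 7]
ys = [12, 13, 9, 18, 7, 9, 14, 16, 10, 12, 7, 13, 14, 19, 10, 16, 12, 16, 19, 11]

def errors(b, a):
    """Errors in prediction e = Y - Y' for the line Y' = bX + a."""
    return [y - (b * x + a) for x, y in zip(xs, ys)]

# Coefficients rounded to two decimals, as in the table
e_rounded = errors(0.65, 5.28)
print(round(e_rounded[0], 2))    # -3.03
print(round(sum(e_rounded), 2))  # -0.05

# Unrounded least-squares coefficients
n = len(xs)
sum_x, sum_y = sum(xs), sum(ys)
b = (n * sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y) / (
    n * sum(x * x for x in xs) - sum_x ** 2
)
a = (sum_y - b * sum_x) / n
e_exact = errors(b, a)
print(abs(sum(e_exact)) < 1e-9)  # True
```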
The standard deviation of the distribution of errors in prediction is called the standard error of estimate. The standard error of estimate is an overall measure of the extent to which the predicted Y values deviate from the actual Y values, and is represented by $s_e$.
The formula for the calculation of the standard error of estimate ($s_e$) is as follows:
$$s_e = \sqrt{\frac{\sum (Y - Y')^2}{n - 2}} \quad \text{or} \quad s_e = \sqrt{\frac{\sum e^2}{n - 2}}$$
Using the errors in prediction tabulated above for the 20 students (ΣX = 233, ΣY = 257, Σ(Y - Y') = 0.00*, Σ(Y - Y')² = 116.59):
$$s_e = \sqrt{\frac{116.59}{20 - 2}} = 2.55$$
* With the rounded coefficients, $\sum (Y - Y') = -0.05$; it does not equal zero because of rounding errors (Hinkle et al., 2003).
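The same calculation can be sketched in Python. The function name `standard_error_of_estimate` is illustrative. The result is shown to three decimals so that it can be compared with both the hand-calculated 2.55 and the value SPSS reports.

```python
import math

# Data for Example 2 (20 students)
xs = [15, 10, 7, 18, 5, 10, 7, 17, 15, 9, 8, 15, 11, 17, 8, 11, 12, 13, 18, 7]
ys = [12, 13, 9, 18, 7, 9, 14, 16, 10, 12, 7, 13, 14, 19, 10, 16, 12, 16, 19, 11]

def standard_error_of_estimate(xs, ys, b, a):
    """s_e = sqrt(sum of squared errors / (n - 2)) for the line Y' = bX + a."""
    n = len(xs)
    sum_sq = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sum_sq / (n - 2))

s_e = standard_error_of_estimate(xs, ys, b=0.65, a=5.28)
print(round(s_e, 3))  # 2.545
```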
The stronger the correlation between Y and X (e.g., r = 1), the smaller the standard error of estimate (e.g., $s_e$ = 0) and the greater the accuracy of prediction (100% accurate).
1. Create a file for the data set
2. Click Analyze > Regression > Linear
3. Click on the Y variable and click the arrow button to place it in the Dependent: box
4. Click on the X variable and click the arrow button to place it in the Independent(s): box
5. Click OK
SPSS output
Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .736(a)   .542       .517                2.545

$s_e$ = 2.545
The purpose of the test is to determine whether the predictor variable (X) is a statistically significant predictor of the criterion variable (Y), that is, whether knowledge of scores on the X variable will enhance the prediction of scores on the Y variable.
Assumptions underlying the significance test for the regression coefficient:
1. The scores for each variable are normally distributed
2. The cases represent a random sample from the population
3. Both variables are independent
Steps for the significance test:
1. State the null and alternative hypotheses
2. Set the criterion for rejecting the null hypothesis
3. Carry out the analysis using SPSS
4. Make a decision by applying the criterion for rejecting the null hypothesis
5. Make a conclusion in the context of the problem
Example 1: Suppose you are interested in predicting students' scores on creativity (Y), based on knowledge of their scores on logical reasoning (X), using the following hypothetical data for 20 students. Test whether logical reasoning is a statistically significant predictor of creativity at the 0.01 level of significance.
X   15  10   7  18   5  10   7  17  15   9   8  15  11  17   8  11  12  13  18   7
Y   12  13   9  18   7   9  14  16  10  12   7  13  14  19  10  16  12  16  19  11
H0: β = 0
H1: β ≠ 0
Reject H0 if p < 0.01. (The probability of committing a Type I error, that is, the likelihood of rejecting the null hypothesis when it is true, is then less than 0.01.) 0.01 is the level of significance (or α).
1. Create a file for the data set
2. Click Analyze > Regression > Linear
3. Click on the Y variable and click the arrow button to place it in the Dependent: box
4. Click on the X variable and click the arrow button to place it in the Independent(s): box
5. Click OK
SPSS output
Coefficients(a)
Model         Unstandardized B   Std. Error   Standardized Beta   t       Sig.
(Constant)    5.231  (a)         1.746                            2.996   .008
X             .654   (b)         .142         .736                4.615   .000

Y' = bX + a
Y' = 0.654X + 5.231
Step 4: Make a decision by applying the criterion for rejecting the null hypothesis
From the SPSS output, Sig. = .000, that is, p < .0005 (SPSS displays the probability to three decimal places). If the null hypothesis were true, a t value at least this far from zero would be very unlikely. Therefore, reject H0 because p < 0.01.
Logical reasoning is a statistically significant predictor of creativity in the population, t(18) = 4.615, p < .01 (df = N - 2, as in the standard error of estimate).
The regression equation is as follows: Y' = 0.654X + 5.231
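The t value in the SPSS output is the regression coefficient divided by its standard error. The sketch below reproduces it from the raw data using the standard textbook expression SE(b) = s_e / sqrt(Σ(X - X̄)²). That expression is not given in the slides and is brought in here as an assumption about how the output is computed. The variable names are illustrative.

```python
import math

# Data for Example 1 of the significance test (20 students)
xs = [15, 10, 7, 18, 5, 10, 7, 17, 15, 9, 8, 15, 11, 17, 8, 11, 12, 13, 18, 7]
ys = [12, 13, 9, 18, 7, 9, 14, 16, 10, 12, 7, 13, 14, 19, 10, 16, 12, 16, 19, 11]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Sums of squares and cross-products about the means
ss_xx = sum((x - mean_x) ** 2 for x in xs)
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

b = sp_xy / ss_xx        # regression coefficient
a = mean_y - b * mean_x  # regression constant

# Standard error of estimate, with n - 2 in the denominator
sum_sq_err = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))
s_e = math.sqrt(sum_sq_err / (n - 2))

# Standard error of b and the t statistic for testing b against zero
se_b = s_e / math.sqrt(ss_xx)
t = b / se_b
print(round(b, 3), round(se_b, 3), round(t, 3))  # 0.654 0.142 4.615
```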
Coefficient of determination, r²
Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .736(a)   .542       .517                2.545

r² = 0.542: 54.2% of the variance in creativity scores can be associated with (explained by) the variance in logical reasoning scores [or, 45.8% of the variance in creativity scores cannot be associated with (explained by) the variance in logical reasoning scores].
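The Model Summary values can be reproduced from the raw data with the sketch below. The adjusted value uses the usual one-predictor formula 1 - (1 - r²)(n - 1)/(n - 2), which is not stated in the slides and is included here as an assumption about how SPSS obtains that column. The variable names are illustrative.

```python
import math

# Data for Example 1 of the significance test (20 students)
xs = [15, 10, 7, 18, 5, 10, 7, 17, 15, 9, 8, 15, 11, 17, 8, 11, 12, 13, 18, 7]
ys = [12, 13, 9, 18, 7, 9, 14, 16, 10, 12, 7, 13, 14, 19, 10, 16, 12, 16, 19, 11]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

ss_xx = sum((x - mean_x) ** 2 for x in xs)
ss_yy = sum((y - mean_y) ** 2 for y in ys)
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

r = sp_xy / math.sqrt(ss_xx * ss_yy)            # correlation between X and Y
r2 = r ** 2                                     # coefficient of determination
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)       # adjusted for one predictor
print(round(r, 3), round(r2, 3), round(adj_r2, 3))  # 0.736 0.542 0.517
```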
Example 2:
The following table shows the order in which 14 students turn in their test papers and their test scores. Test whether the order in which students turn in their test papers is a statistically significant predictor of their test scores.

Order (X)        1   2   3   4   5   6   7   8   9  10  11  12  13  14
Test score (Y)  80  65  88  63  75  65  60  85  87  75  76  72  85  60
H0: β = 0
(The order in which students turn in their test papers is not a statistically significant predictor of their test scores in the population)
H1: β ≠ 0
(The order in which students turn in their test papers is a statistically significant predictor of their test scores in the population)
Reject H0 if p < 0.05. (The probability of committing a Type I error, that is, the likelihood of rejecting the null hypothesis when it is true, is then less than 0.05.) 0.05 is the level of significance (or α).
1. Create a file for the data set
2. Click Analyze > Regression > Linear
3. Click on the Y variable and click the arrow button to place it in the Dependent: box
4. Click on the X variable and click the arrow button to place it in the Independent(s): box
5. Click OK
SPSS output
Coefficients(a)
Model                                                           Unstandardized B   Sig.
1   (Constant)                                                  74.033             .000
    The order in which students turn in their test papers, X    -.004              .995
Step 4: Make a decision by applying the criterion for rejecting the null hypothesis
From the SPSS output, p = 0.995. If the null hypothesis were true, a t value at least this far from zero would occur with probability 0.995. Therefore, fail to reject H0 because p > 0.05.
The order in which students turn in their test papers is not a statistically significant predictor of their test scores in the population, t(12) = -.006, p > .05 (df = N - 2). That is, knowledge of the order in which students turn in their test papers does not enhance the prediction of their test scores.
Coefficient of determination, r²
Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .002(a)   .000       -.083               10.520
a Predictors: (Constant), The order in which students turn in their test papers, X

r² = 0.000: 0% of the variance in the test scores can be associated with (explained by) the variance in the order in which students turn in their test papers [or, 100% of the variance in the test scores cannot be associated with (explained by) the variance in the order in which students turn in their test papers].
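The SPSS figures for Example 2 can be reproduced with the sketch below, under one assumption: the order values do not survive in the extracted table, so they are taken to be 1 to 14 in the sequence the scores are listed. That assumption gives the same constant, coefficient, t value and standard error of the estimate as the output above. The standard error of b uses the textbook expression s_e / sqrt(Σ(X - X̄)²), which is not given in the slides.

```python
import math

# Assumption: papers were handed in 1st, 2nd, ..., 14th in the listed order
order = list(range(1, 15))
scores = [80, 65, 88, 63, 75, 65, 60, 85, 87, 75, 76, 72, 85, 60]

n = len(order)
mean_x = sum(order) / n
mean_y = sum(scores) / n

ss_xx = sum((x - mean_x) ** 2 for x in order)
ss_yy = sum((y - mean_y) ** 2 for y in scores)
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(order, scores))

b = sp_xy / ss_xx        # regression coefficient
a = mean_y - b * mean_x  # regression constant
r2 = sp_xy ** 2 / (ss_xx * ss_yy)  # coefficient of determination

# Standard error of estimate and t statistic for b
sum_sq_err = sum((y - (b * x + a)) ** 2 for x, y in zip(order, scores))
s_e = math.sqrt(sum_sq_err / (n - 2))
t = b / (s_e / math.sqrt(ss_xx))

print(round(a, 3), round(b, 3))  # 74.033 -0.004
print(round(r2, 3))              # 0.0
print(round(s_e, 2))             # 10.52
print(round(t, 3))               # -0.006
```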