
Linear regression and the coefficient of determination in SPSS

Bro. David E. Brown, BYU–Idaho Dept. of Mathematics February 2, 2012

To use the following instructions, your data need to be two numeric (Scale) variables.

1. Start SPSS and enter your data or open your data file.

2. Make any necessary adjustments in the Variable View. Pay particular attention to the Measurement levels of your variables. They must be Scale for the following to work.

3. To make a scatterplot:

In the Graphs menu, click Legacy dialogs. A submenu will appear.

In the submenu, click Scatter/Dot, which is near the bottom.

The Scatter/Dot selector box appears.

Select Simple Scatter and click Define. The Simple Scatterplot dialog appears.

Move to the Y Axis: box the name of the variable you want on your vertical axis. (For association and correlation, it does not matter which variable you put on the y-axis and which you put on the x-axis.)

Move to the X Axis: box the name of the variable you want on your horizontal axis.

If you want, you can click Titles to add a title, subtitle, and so on; click Continue to return to the Simple Scatterplot dialog.

Click OK. The SPSS Output Viewer appears, with your scatterplot in it.

4. To calculate the regression coefficients (slope and intercept) and the coefficient of determination:

In the Analyze menu, click Regression. A submenu will appear.

In the submenu, click Linear. The Linear Regression dialog will appear.

Put in the Dependent: box the name of your response variable.

Put in the Independent(s): box the name of your explanatory variable.

Click Statistics. The Linear Regression: Statistics dialog appears.

To get the regression coefficients (slope and intercept), make sure the box for Estimates has a check in it.

To get the coefficients of correlation and determination, make sure the box for Model fit has a check mark in it.

If you want the mean and standard deviation of the explanatory and response variables, put a check mark next to Descriptives.

Click Continue. SPSS returns you to the Linear Regression dialog.

To get a plot of residuals versus fits:

Click Plots. The Linear Regression: Plots dialog appears.

Move *ZRESID to the Y: box. (“ZRESID” stands for “Standardized residuals.” SPSS won’t plot the actual residuals, but their z-scores, which is better in some ways.)

Move *ZPRED to the X: box. (Again, SPSS will use the z-scores of the fitted (or, predicted) values, and not the fitted values themselves.)

You will see check boxes for Histogram and Normal probability plot. If your instructor requires you to check the requirements for linear regression, make sure there is a check mark next to these. SPSS will give you a histogram and a P-P plot (which is just like a Q-Q plot) of the residuals. (You’ll need these when deciding whether linear regression is appropriate for your data.)

Click Continue. SPSS returns you to the Linear Regression dialog.

If your instructor wants you to save the residuals:

Click Save. The Linear Regression: Save dialog appears.

Put a check mark next to the type of residual your instructor has told you to save.

Click Continue. SPSS returns you to the Linear Regression dialog.

Click OK. The SPSS Output Viewer window appears.

First is the Descriptive Statistics table, with the means and standard deviations of your response and explanatory variables.

Next is a table of Correlations.

The first set of rows gives you the correlation coefficients for your variables.

The next set of rows gives you the corresponding P-values.

The final set of rows tells you how many data points were used.

Next is a table called Variables Entered/Removed. Kindly ignore this table.

The Model Summary table is next.

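Pearson’s correlation coefficient is under the column heading R. (SPSS always reports R as a nonnegative number here. If the slope of your regression line is negative, Pearson’s r is the negative of R.)

The coefficient of determination is $r^2$, and so is found in the R Square column.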

Please ignore the Adjusted R Square column.

The standard error of the estimate is in the rightmost column.
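To see where the Model Summary numbers come from, here is a minimal Python sketch that computes R, R Square, and the standard error of the estimate by hand. The data values and the function name `model_summary` are invented for illustration and are not from this handout or from SPSS. The sketch assumes simple linear regression with one explanatory variable.

```python
import math

# Hypothetical data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]   # explanatory variable
y = [2.0, 4.0, 5.0, 4.0, 5.0]   # response variable


def model_summary(x, y):
    """Return (r, r_squared, std_error) for a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross products about the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Pearson's correlation coefficient (may be negative).
    r = sxy / math.sqrt(sxx * syy)
    r_squared = r ** 2
    # The residual sum of squares is the share of syy the line does not explain.
    sse = syy * (1 - r_squared)
    # Slope and intercept are both estimated, so divide by n - 2.
    std_error = math.sqrt(sse / (n - 2))
    return r, r_squared, std_error


r, r_squared, std_error = model_summary(x, y)
print(f"r = {r:.3f}, R Square = {r_squared:.3f}, "
      f"Std. Error of the Estimate = {std_error:.3f}")
```

If you enter the same two columns in SPSS, the R Square and standard error values should agree with this sketch up to rounding.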

An ANOVA table is next. Please ignore it.

Next is the Coefficients table.

Under Model you’ll see that the first row is for the Constant, that is, the y-intercept, and the second row is for the explanatory variable. This really means that the second row is for the slope of the regression line.

Next is a pair of columns under Unstandardized Coefficients. The value of B in the first row is the y-intercept of your regression line. The value of B in the second row is the slope of the regression line.

Under Std. Error are the standard errors of the y-intercept and slope, respectively.

Please ignore the Standardized Coefficients column.

The t column gives the t-scores for the tests of $H_0$: intercept = 0 and $H_0$: slope = 0, respectively.

The rightmost column gives the P-values for the hypothesis tests just mentioned.
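The B, Std. Error, and t columns of the Coefficients table can be reproduced with the usual least-squares formulas. The following Python sketch is one way to do that, assuming simple linear regression. The data and the function name `coefficients_table` are hypothetical. P-values are left out because they need the t distribution, which Python's standard library does not provide.

```python
import math

# Hypothetical data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]   # explanatory variable
y = [2.0, 4.0, 5.0, 4.0, 5.0]   # response variable


def coefficients_table(x, y):
    """Return {row_name: (B, std_error, t)} for the constant and the slope."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Least-squares estimates of the slope and the y-intercept.
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Standard error of the estimate, with n - 2 degrees of freedom.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    # Standard errors of the two coefficients.
    se_slope = s / math.sqrt(sxx)
    se_intercept = s * math.sqrt(1 / n + mean_x ** 2 / sxx)
    # Each t-score tests H0: coefficient = 0.
    return {
        "constant": (intercept, se_intercept, intercept / se_intercept),
        "slope": (slope, se_slope, slope / se_slope),
    }


table = coefficients_table(x, y)
for name, (b, se, t) in table.items():
    print(f"{name:<9} B = {b:.3f}  Std. Error = {se:.3f}  t = {t:.3f}")
```

The rows are printed in the same order as in SPSS: the constant first, then the slope.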

Next, if you asked for it, is the Residual Statistics table, giving the min., max., mean, and standard deviation of the predicted values, the residuals, and the standardized predicted values and residuals.

Next is the histogram-with-normal-curve for the standardized residuals, if you asked for it.

The normal P-P plot of the standardized residuals is next, if you asked for it. (This is just like a Q-Q plot.)

Finally, you’ll see the plot of residuals versus fits (all standardized), if you requested it. It kind of looks like a scatterplot.
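The standardized values in the residuals-versus-fits plot can be approximated by hand. The Python sketch below rests on two assumptions that you should check against your own SPSS output: that a standardized residual is the residual divided by the standard error of the estimate, and that a standardized predicted value is the ordinary z-score of the fitted value. The data and the function name `standardized_values` are hypothetical.

```python
import math
import statistics

# Hypothetical data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]   # explanatory variable
y = [2.0, 4.0, 5.0, 4.0, 5.0]   # response variable


def standardized_values(x, y):
    """Return (zpred, zresid): standardized fitted values and residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    # Assumption: residuals are scaled by the standard error of the estimate.
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    zresid = [e / s for e in residuals]
    # Assumption: fitted values are scaled by their own sample standard deviation.
    mean_fit = statistics.mean(fitted)
    sd_fit = statistics.stdev(fitted)
    zpred = [(fi - mean_fit) / sd_fit for fi in fitted]
    return zpred, zresid


zpred, zresid = standardized_values(x, y)
for zp, zr in zip(zpred, zresid):
    print(f"ZPRED = {zp:7.3f}   ZRESID = {zr:7.3f}")
```

Plotting the ZRESID values against the ZPRED values gives the same kind of picture as the SPSS plot: you are looking for a patternless cloud of points around zero.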


In my classes, we now stop to check all the requirements for linear regression. That’s what the plots are for. We do not use the regression line for ANYTHING until we have verified that all the requirements are met.

As always, if you have questions, please ask them!
