Using Excel:
Correlation and Regression
To find the correlation coefficient:
1. Click the fx button and use the CORREL function. Enter the x data and the y data as the two arrays when prompted.
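For readers who want to check the worksheet result outside Excel, here is a minimal Python sketch of the same calculation. It assumes the usual Pearson formula; the function name pearson_r is illustrative and not part of Excel, and the data are the temperature and chirps columns from the worked example in this handout.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Temperature and chirps data from the worked example in this handout.
temperature = [53, 62, 57, 71, 78, 66, 84, 87, 96, 91, 94, 96]
chirps = [20, 32, 40, 60, 80, 100, 120, 140, 160, 180, 200, 220]

r = pearson_r(temperature, chirps)
print(f"r = {r:.4f}")  # the regression output in this handout reports 0.9357
```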
To make a scatterplot with (or without) a least-squares line:
1. Select the x and y data columns. Use the control button to select non-adjacent columns of data.
2. Click insert from the menu bar and choose scatter (no lines).
3. Click anywhere on the graph and under chart tools select design.
4. Choose a design that has axis titles and edit these accordingly.
5. Right-click any point on the graph and select add trendline.
6. In the window that opens, check linear and display equation.
7. Edit the resulting line and equation to your preference.
8. Note: You can start by inserting a scatterplot and then adding the data. This is a little trickier
but if you follow the directions you can get to the same endpoint.
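The equation that the linear trendline displays can be reproduced by hand. The sketch below assumes the standard least-squares formulas for slope and intercept; the function name least_squares_line is illustrative, and the data are the temperature and chirps columns from the worked example in this handout.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = m*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx               # slope
    b = mean_y - m * mean_x     # y-intercept
    return m, b

# Temperature and chirps data from the worked example in this handout.
temperature = [53, 62, 57, 71, 78, 66, 84, 87, 96, 91, 94, 96]
chirps = [20, 32, 40, 60, 80, 100, 120, 140, 160, 180, 200, 220]

m, b = least_squares_line(temperature, chirps)
print(f"y = {m:.4f} x + ({b:.4f})")
```

The slope and intercept should agree with the equation shown on the chart, up to the number of digits Excel displays.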
The following screenshots show the step-by-step process.
[Screenshot: Inserting a scatterplot]
[Screenshot: Improving the layout of the plot]
[Screenshot: Inserting a regression line and equation]

Conclusions:
1. If the data appears to be linearly related and
2. if there are no outliers that can mess up the regression equation and
3. if |r| is greater than the critical value of r from Table 4,
4. then you can use the regression equation ($\hat{y} = mx + b$) to make predictions about y given x.
More correlation and regression options are available in the Analysis ToolPak.
The Analysis ToolPak is available with all PC versions of Excel. Here is how to install the
Analysis ToolPak on a PC (see Mac notes below):
1. Open a blank Excel spreadsheet.
2. Click on the windows icon (pre 2010) or the file tab (2010+).
3. Choose Excel Options (pre 2010) or just options (2010+).
4. Choose add-ins.
5. In manage (bottom of window), choose Excel Add-ins and click Go.
6. Check the box that says Analysis ToolPak and click OK.
7. After you load the Analysis ToolPak, the Data Analysis command is available under the Data tab.
It should be the far right option.
Mac Notes: As of this writing, if you are running Excel 2008 or higher on a Mac, the Analysis ToolPak
is not available. There is an application called StatPlus:Mac LE, which is a free version of the full
StatPlus application. It can handle most of the tasks performed by the Analysis ToolPak, and the
full version is probably superior, but that version costs money.

To find the correlation coefficient(s) with the Data Analysis ToolPak:
1. Use the correlation option in the Data Analysis ToolPak. In this case you can put in an array
of columns and get back an array of correlation coefficients.
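As a rough illustration of what "an array of correlation coefficients" means, the sketch below builds a pairwise correlation table for several columns. It is an assumption-laden stand-in, not the ToolPak's own code: the function names are illustrative, and the Celsius column is derived from the Fahrenheit column purely to give a third column (it is not in the handout's data).

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Map each pair of column names to the correlation of those columns."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}

temperature_f = [53, 62, 57, 71, 78, 66, 84, 87, 96, 91, 94, 96]
columns = {
    "temperature (F)": temperature_f,
    # Hypothetical extra column, derived from the Fahrenheit values for illustration.
    "temperature (C)": [(f - 32) * 5 / 9 for f in temperature_f],
    "chirps (per minute)": [20, 32, 40, 60, 80, 100, 120, 140, 160, 180, 200, 220],
}

matrix = correlation_matrix(columns)
for row_name, row in matrix.items():
    print(row_name.ljust(20), "  ".join(f"{value:7.4f}" for value in row.values()))
```

Each diagonal entry is 1 because every column correlates perfectly with itself, and the table is symmetric.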
To use the regression option in the Data Analysis ToolPak (a worked example follows these notes):
1. Open up the Excel file with the data in it, or open a new file and put the data in labeled columns.
2. Click on the Data Analysis section in the Data tab in the menu.
3. Choose the regression option, and a regression window will open.
4. Select the data for the y-values (response or dependent variable), including the label.
5. Select the data for the x-values (explanatory, predictor, or independent variable), including the label.
6. Check the box that says labels. Leave the confidence level and intercept is zero boxes unchecked.
7. Select an out-of-the-way cell to put the results. This will need some space.
8. Check the box that says residual plots and leave all other boxes unchecked.
9. Move the residual plot next to the residual output table.
10. Format Columns to Autofit Column Width.
11. It should look pretty good.
What does the output mean:
1. Multiple R is just the correlation coefficient.
2. R-square is just $r^2$.
3. Adjusted R-square is a more appropriate value when the data comes from a sample.
4. Standard Error is the standard error $s_e = \sqrt{\dfrac{\sum (y - \hat{y})^2}{n - 2}}$ and is used in calculating a prediction
interval of y for a given value of x.

5. Significance F denotes the P-value of the test statistic used in a hypothesis test with $H_0: \rho = 0$.
If it is less than $\alpha$, we can conclude that there is a significant linear correlation.
6. Coefficients: The first number in that column represents b (the y-intercept of the regression
equation). The second number in that column represents m (the slope of the regression equation).
Then the regression equation is $\hat{y} = mx + b$.
7. The Residual Output table gives the residual $(y - \hat{y})$ for each value of x.
8. The Residual Plot displays a plot of the residuals with respect to each x-value. There should
be no discernable pattern in this plot. If there is, it means that the association is not linear and
hence linear regression is not a wise choice.
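To make the definitions above concrete, here is a minimal Python sketch that recomputes several of the summary numbers for a one-predictor regression. It assumes the standard textbook formulas; the function name regression_summary and the dictionary keys are illustrative. Significance F is left out because it needs the F distribution, which the Python standard library does not provide.

```python
import math

def regression_summary(xs, ys):
    """Summary statistics for simple (one-predictor) linear regression."""
    n = len(xs)
    k = 1  # number of predictors
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    predicted = [m * x + b for x in xs]
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))  # residual SS
    ss_tot = sum((y - mean_y) ** 2 for y in ys)                # total SS
    ss_reg = ss_tot - ss_res                                   # regression SS
    r_square = ss_reg / ss_tot
    return {
        "multiple_r": math.sqrt(r_square),
        "r_square": r_square,
        "adjusted_r_square": 1 - (1 - r_square) * (n - 1) / (n - k - 1),
        "standard_error": math.sqrt(ss_res / (n - 2)),
        "f": (ss_reg / k) / (ss_res / (n - k - 1)),
        "slope": m,
        "intercept": b,
    }

# Temperature and chirps data from the worked example in this handout.
temperature = [53, 62, 57, 71, 78, 66, 84, 87, 96, 91, 94, 96]
chirps = [20, 32, 40, 60, 80, 100, 120, 140, 160, 180, 200, 220]

summary = regression_summary(temperature, chirps)
for name, value in summary.items():
    print(f"{name:18s} {value:12.4f}")
```

The printed values can be compared line by line with the Regression Statistics and ANOVA blocks of the Excel output.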
Conclusions
1. If the data appears linearly related, and the residual plot shows no pattern, and
2. if there are no outliers that can mess up the regression equation, and
3. if the Significance F value is less than $\alpha$ (we generally use $\alpha = 0.05$),
4. then you can use the regression equation ($\hat{y} = mx + b$) to make predictions about y given x.
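The last two conditions can be sketched in code. The example below is only an illustration: the function name predict is hypothetical, the coefficients and Significance F value are copied from the example output in this handout, and the input of 80 degrees is an arbitrary choice. The first two conditions (linearity and outliers) still have to be judged from the scatterplot and residual plot.

```python
ALPHA = 0.05  # significance level used in this handout

def predict(x, slope, intercept, significance_f, alpha=ALPHA):
    """Return the predicted y for x, or None if the correlation is not significant."""
    if significance_f >= alpha:
        return None  # no significant linear correlation: do not use the equation
    return slope * x + intercept

# Values copied from the example regression output in this handout.
slope = 4.0669
intercept = -204.2138
significance_f = 0.0000077650

# 80 degrees F is an arbitrary illustrative input, not a value from the handout.
predicted_chirps = predict(80, slope, intercept, significance_f)
print(predicted_chirps)
```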

temperature (F)    chirps (per minute)
53                 20
62                 32
57                 40
71                 60
78                 80
66                 100
84                 120
87                 140
96                 160
91                 180
94                 200
96                 220

Click: Data → Data Analysis → Regression

This opens the Regression window, where you set up the output.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.9357
R Square             0.8755
Adjusted R Square    0.8631
Standard Error       25.2188
Observations         12

ANOVA
              df    SS            MS            F          Significance F
Regression    1     44738.7783    44738.7783    70.3452    0.0000077650
Residual      10    6359.8884     635.9888
Total         11    51098.6667

                   Coefficients    Standard Error    t Stat     P-value    Lower 95%    Upper 95%
Intercept          -204.2138       38.4764           -5.3075    0.00034    -289.9446    -118.4830
temperature (F)    4.0669          0.4849            8.3872     7.8E-06    2.9865       5.1473
RESIDUAL OUTPUT

Observation    Predicted chirps(per minute)    Residuals
1              11.33269663                     8.667303367
2              47.9349333                      -15.9349333
3              27.60035737                     12.39964263
4              84.53716997                     -24.53716997
5              113.0055763                     -33.00557627
6              64.20259404                     35.79740596
7              137.4070674                     -17.40706738
8              149.6078129                     -9.607812933
9              186.2100496                     -26.2100496
10             165.8754737                     14.12452633
11             178.0762192                     21.92378077
12             186.2100496                     33.7899504
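The predicted values and residuals can be recomputed as a check. This is a minimal sketch assuming the standard least-squares formulas; the variable names are illustrative. Listing the residuals in order of temperature is one simple way to look for a pattern without drawing the plot.

```python
# Temperature and chirps data from the worked example in this handout.
temperature = [53, 62, 57, 71, 78, 66, 84, 87, 96, 91, 94, 96]
chirps = [20, 32, 40, 60, 80, 100, 120, 140, 160, 180, 200, 220]

n = len(temperature)
mean_x = sum(temperature) / n
mean_y = sum(chirps) / n
sxx = sum((x - mean_x) ** 2 for x in temperature)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(temperature, chirps))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Predicted value and residual (observed minus predicted) for each observation.
predicted = [slope * x + intercept for x in temperature]
residuals = [y - p for y, p in zip(chirps, predicted)]

# Print residuals ordered by temperature, as they would appear left to right on the plot.
for x, res in sorted(zip(temperature, residuals)):
    print(f"{x:3d}  {res:9.4f}")
```

The residuals from a least-squares fit with an intercept always sum to zero (up to rounding), so what matters is their arrangement along the x-axis, not their total.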

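Key to the summary output:
Multiple R (0.9357): correlation coefficient
Significance F (0.0000077650): P-value of test statistic
Intercept coefficient (-204.2138): y-intercept of regression line
temperature (F) coefficient (4.0669): slope of regression line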

[Figure: temperature (F) Residual Plot, showing residuals (vertical axis, from -40 to 40) against temperature (F) (horizontal axis, from 0 to 120).]

There is a slight U-shaped pattern here, so a linear fit might not be best.
