
Correlation

Correlations Box
Take a look at the first box in your output file, called Correlations. You will see your
variable names in two rows. In this example, you can see the variable name ‘water’ in
the first row and the variable name ‘skin’ in the second row. You will also see the same
two variable names, ‘water’ and ‘skin’, in the columns on the right. On the right-hand
side you will see four boxes, each containing the statistics for one pairing of the
variables. For example, the top box on the right holds the pairing of the ‘water’
variable with the ‘skin’ variable, and the bottom box on the left holds the same
pairing. These are the two boxes that we are interested in. They contain identical
information, so we really only need to read from one. In these boxes, you will see a
value for Pearson’s r, a Sig. (2-tailed) value and a sample size (N).

Sig. (2-tailed) value


You can find this value in the Correlations box. This value will tell you if there is a
statistically significant correlation between your two variables. In our example, our
Sig. (2-tailed) value is 0.002.
• If the Sig. (2-tailed) value is greater than 0.05: you can conclude that there is
no statistically significant correlation between your two variables. That means
increases or decreases in one variable do not significantly relate to increases or
decreases in your second variable.
• If the Sig. (2-tailed) value is less than or equal to 0.05: you can conclude that
there is a statistically significant correlation between your two variables. That
means increases or decreases in one variable do significantly relate to
increases or decreases in your second variable.
• Warning about the Sig. (2-tailed) value: when you are computing Pearson’s r,
significance is a messy topic. When you have small samples, for example only
a few participants, moderate correlations may misleadingly fail to reach
significance. When you have large samples, for example many participants,
small correlations may misleadingly turn out to be significant (see the sketch
after this section’s example). Some researchers think that significance should
be reported but should perhaps receive less focus when it comes to Pearson’s r.
The Sig. (2-tailed) value in our example is 0.002. This value is less than 0.05. Because
of this, we can conclude that there is a statistically significant correlation between the
amount of water consumed (in glasses) and participants’ ratings of skin elasticity.
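
To see why the warning above matters, the sketch below holds Pearson’s r fixed at a moderate 0.30 and recomputes the two-tailed p-value for several sample sizes, using the standard t-transform of r with N − 2 degrees of freedom. The sample sizes are arbitrary; the point is only that the same r can be non-significant with few participants and highly significant with many.

```python
# For a fixed Pearson's r, the two-tailed p-value depends strongly on N.
# p is computed from t = r * sqrt((N - 2) / (1 - r^2)) with N - 2 df.
from scipy import stats

r = 0.30  # a moderate correlation, held constant
for n in (10, 30, 100, 1000):
    t = r * ((n - 2) / (1 - r**2)) ** 0.5
    p = 2 * stats.t.sf(abs(t), df=n - 2)
    print(f"N = {n:4d}: Sig. (2-tailed) = {p:.4f}")
```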

Pearson’s r
You can find the Pearson’s r statistic at the top of each box. The Pearson’s r for the
correlation between the water and skin variables in our example is 0.985.
• When Pearson’s r is close to 1 (in absolute value): this means that there is a
strong relationship between your two variables: changes in one variable are
strongly correlated with changes in the second variable. In our example,
Pearson’s r is 0.985. This number is very close to 1. For this reason, we can
conclude that there is a strong relationship between our water and skin
variables. However, we cannot draw any other conclusions about this
relationship from this number alone.
• When Pearson’s r is close to 0: this means that there is a weak relationship
between your two variables: changes in one variable are not correlated with
changes in the second variable. If our Pearson’s r were 0.01, we could
conclude that our variables were not strongly correlated.
• When Pearson’s r is positive (+): this means that as one variable increases in
value, the second variable also increases in value. Similarly, as one variable
decreases in value, the second variable also decreases in value. This is called a
positive correlation. In our example, our Pearson’s r value of 0.985 was
positive. We know this value is positive because SPSS did not put a negative
sign in front of it; positive is the default. Since our example Pearson’s r is
positive, we can conclude that when the amount of water increases (our first
variable), the participant skin elasticity rating (our second variable) also
increases.
• When Pearson’s r is negative (−): this means that as one variable increases in
value, the second variable decreases in value. This is called a negative
correlation. In our example, our Pearson’s r value of 0.985 was positive. But
what if SPSS had generated a Pearson’s r value of −0.985? If SPSS had
generated a negative Pearson’s r value, we could conclude that when the
amount of water increases (our first variable), the participant skin elasticity
rating (our second variable) decreases.
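
The rules above can be summarized in a few lines of code. The helper below is purely illustrative: the 0.7 and 0.3 cut-offs are a common rule of thumb for describing strength, not an SPSS convention.

```python
# Verbalize a Pearson's r the way this section does. The strength
# thresholds (0.7 and 0.3) are a rule of thumb, not SPSS output.
def describe_r(r: float) -> str:
    direction = "positive" if r >= 0 else "negative"
    magnitude = abs(r)
    strength = "strong" if magnitude >= 0.7 else "moderate" if magnitude >= 0.3 else "weak"
    return f"{strength} {direction} correlation (r = {r:+.3f})"

print(describe_r(0.985))   # strong positive correlation (r = +0.985)
print(describe_r(-0.985))  # strong negative correlation (r = -0.985)
print(describe_r(0.01))    # weak positive correlation (r = +0.010)
```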
The scatterplot
You can find your scatterplot in your output file. It will look something like the graph
below. You will see a bunch of dots. Your scatterplot can tell you about the
relationship between variables, just like Pearson’s r. With it, you can determine the
strength and direction of the relationship between your variables.

[Example scatterplot]
• Relationship strength: try to imagine a line that connects the dots in your
scatterplot. Is this an easy or a difficult task? This task can help you determine
the strength of the relationship between your two variables. If your variables
have a strong relationship, it will be easy for you to imagine a line connecting
all of the dots. In our example scatterplot, the dots seem to go together to form
a straight line. However, some scatterplots do not look like this. With some
scatterplots, the dots are scattered about so that it is very hard to imagine a
line connecting them. The dots are not densely positioned in one place;
instead, they are all over the place. When this is the case, your variables may
not have a strong relationship.
• Relationship direction: you can use your scatterplot to understand the direction
of your relationship. Your scatterplot can tell you if you have a positive,
negative or zero correlation.
• Positive correlation in a scatterplot: if the line that you imagine in your
graph slopes upward (from lower left to upper right), you can conclude that
you have a positive correlation between your variables. Increases in one
variable are correlated with increases in your other variable. Similarly,
decreases in one variable are correlated with decreases in your other variable.
• Negative correlation in a scatterplot: if the line that you imagine in your
graph starts high and gradually slopes downward (from upper left to lower
right), you can conclude that you have a negative correlation between your
variables. Increases in one variable are correlated with decreases in your
other variable.
• Zero correlation in a scatterplot: if the line that you imagine does not slope,
or you cannot imagine a line at all, you can conclude that you have a zero
correlation between your variables. That means that your variables are not
related to one another: increases or decreases in one variable are not
associated with increases or decreases in your second variable.
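
SPSS draws the scatterplot for you, but the sketch below shows how you could produce an equivalent plot in Python with matplotlib, again using invented water and skin values.

```python
# Draw a scatterplot like the one described above (illustrative data).
import matplotlib.pyplot as plt
import numpy as np

water = np.array([1, 2, 3, 4, 5, 6, 7, 8])  # glasses of water per day
skin = np.array([2, 3, 4, 4, 5, 6, 7, 8])   # skin elasticity rating

plt.scatter(water, skin)
plt.xlabel("Water consumed (glasses)")
plt.ylabel("Skin elasticity rating")
plt.title("Scatterplot of water vs. skin elasticity")
plt.show()
```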

Regression
Linear regression is the next step up after correlation. It is used when we want to
predict the value of a variable based on the value of another variable. The variable we
want to predict is called the dependent variable (or sometimes, the outcome variable).
The variable we are using to predict the other variable's value is called the
independent variable (or sometimes, the predictor variable). For example, you could
use linear regression to understand whether exam performance can be predicted based
on revision time; whether cigarette consumption can be predicted based on smoking
duration; and so forth. If you have two or more independent variables, rather than just
one, you need to use multiple regression.
SPSS Statistics will generate quite a few tables of output for a linear regression. In this
section, we show you only the three main tables required to understand your results
from the linear regression procedure, assuming that no assumptions have been
violated. A complete explanation of the output you have to interpret when checking
your data for the six assumptions required to carry out linear regression is provided in
our enhanced guide. This includes relevant scatterplots, histogram (with superimposed
normal curve), Normal P-P Plot, casewise diagnostics and the Durbin-Watson
statistic. Below, we focus on the results for the linear regression analysis only.
The first table of interest is the Model Summary table, as shown below:

[Model Summary table]
This table provides the R and R² values. The R value represents the simple correlation
and is 0.873 (the "R" column), which indicates a high degree of correlation. The R²
value (the "R Square" column) indicates how much of the total variation in the
dependent variable, Price, can be explained by the independent variable, Income. In
this case, 76.2% can be explained, which is very large.
The next table is the ANOVA table, which reports how well the regression equation
fits the data (i.e., predicts the dependent variable) and is shown below:

[ANOVA table]
This table indicates that the regression model predicts the dependent variable
significantly well. How do we know this? Look at the "Regression" row and go to the
"Sig." column. This indicates the statistical significance of the regression model that
was run. Here, p < 0.0005, which is less than 0.05, and indicates that, overall, the
regression model statistically significantly predicts the outcome variable (i.e., it is a
good fit for the data).
The Coefficients table provides us with the necessary information to predict price
from income, as well as to determine whether income contributes statistically
significantly to the model (by looking at the "Sig." column). Furthermore, we can use
the values in the "B" column under the "Unstandardized Coefficients" heading, as
shown below:

[Coefficients table]

to present the regression equation as: Price = 8287 + 0.564(Income)
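
If you want to see where these three tables come from, the sketch below fits the same kind of simple regression in Python with statsmodels. The Income and Price values are invented, so the output will not reproduce the 0.873, 76.2% or 8287 + 0.564(Income) figures from the SPSS example.

```python
# Fit a simple linear regression of Price on Income (illustrative data).
import numpy as np
import statsmodels.api as sm

income = np.array([20000, 30000, 40000, 50000, 60000, 70000])
price = np.array([19000, 25500, 30000, 36500, 42500, 47000])

X = sm.add_constant(income)          # adds the intercept term
model = sm.OLS(price, X).fit()

print(model.rsquared)                # the Model Summary's "R Square"
print(model.fvalue, model.f_pvalue)  # the ANOVA table's F and "Sig."
print(model.params)                  # the "B" column: constant, then slope
```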

Multiple regression
Interpretation
Begin your interpretation by examining the "Descriptive Statistics" table. This table
often appears first in your output, depending on your version of SPSS. The descriptive
statistics give you the means and standard deviations of the variables in your
regression model. For example, a regression that studies the effect of years of
education and years of experience on average annual income will report the means
and standard deviations of these three variables.
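
As a cross-check, the sketch below computes the same descriptive statistics in Python with pandas. The education, experience and income figures are invented for illustration.

```python
# Means and standard deviations, as in the Descriptive Statistics table.
import pandas as pd

df = pd.DataFrame({
    "education": [12, 16, 14, 18, 12, 20],                 # years of education
    "experience": [5, 3, 10, 7, 15, 2],                    # years of experience
    "income": [40000, 52000, 55000, 68000, 60000, 72000],  # annual income
})

print(df.agg(["mean", "std"]))
```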
Turn your attention to the correlations table, which follows the descriptive statistics.
Correlations measure the degree to which these variables are related. Correlations
range in absolute value from zero to one: the higher the absolute value, the stronger
the correlation. The values can be positive or negative, signifying positive or negative
correlation.
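
Continuing the same invented data (repeated here so the snippet runs on its own), the correlations table can be reproduced with a single pandas call:

```python
# Pairwise Pearson correlations; each value is signed and lies in [-1, 1].
import pandas as pd

df = pd.DataFrame({
    "education": [12, 16, 14, 18, 12, 20],
    "experience": [5, 3, 10, 7, 15, 2],
    "income": [40000, 52000, 55000, 68000, 60000, 72000],
})

print(df.corr())
```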
Review the model summary, paying particular attention to the value of R-square. This
statistic tells you how much of the variation in the value of the dependent variable is
explained by your regression model. For example, regressing average income on years
of education and years of experience may produce an R-square of 0.36, which
indicates that 36 percent of the variation in average incomes can be explained by
variability in a person's education and experience.
Determine the linear relationship among the variables in your regression by examining
the Analysis of Variance (ANOVA) table in your SPSS output. Note the value of the F
statistic and its significance level (denoted by the value of "Sig."). If the value of F is
statistically significant at a level of 0.05 or less, this suggests a linear relationship
among the variables. Statistical significance at the 0.05 level means that, if there were
in fact no relationship among the variables, a result at least this strong would arise by
chance less than 5 percent of the time. This has become the accepted significance
level in most research fields.
Study the coefficients table to determine the value of the constant. This table
summarizes the results of your regression equation. Column B in the table gives the
values of your regression coefficients and the constant, which is the expected value of
the dependent variable when the values of the independent variables equal zero.
Study the values of the independent variables in the coefficients table. The values in
column B represent the extent to which the value of that independent variable
contributes to the value of the dependent variable. For example, a B of 800 for years
of education suggests that each additional year of education raises average income by
an average of $800 a year. The t-values in the coefficients table indicate each
variable's statistical significance. In general, a t-value of about 2 or more in absolute
value indicates statistical significance.
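
Putting the model summary, ANOVA and coefficients steps together, the sketch below fits a multiple regression of income on education and experience with statsmodels. The data are invented values in the spirit of the example above, so the R-square of 0.36 and the B of 800 mentioned in the text are not expected to appear.

```python
# Multiple regression of income on education and experience (illustrative).
import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({
    "education": [12, 16, 14, 18, 12, 20, 15, 13],
    "experience": [5, 3, 10, 7, 15, 2, 8, 12],
    "income": [40000, 52000, 55000, 68000, 60000, 72000, 58000, 56000],
})

X = sm.add_constant(df[["education", "experience"]])  # predictors + constant
model = sm.OLS(df["income"], X).fit()

print(model.rsquared)                # R-square from the model summary
print(model.fvalue, model.f_pvalue)  # ANOVA F statistic and its Sig.
print(model.params)                  # B column: constant, then coefficients
print(model.tvalues)                 # t-values for constant and predictors
```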

Multicollinearity
When predictor variables are highly (but not perfectly) correlated with one another,
the program may warn you of multicollinearity. This problem is associated with a lack
of stability of the regression coefficients. In this case, were you randomly to obtain
another sample from the same population and repeat the analysis, there is a very good
chance that the results (the estimated regression coefficients) would be very different.
Multicollinearity is a problem when, for any predictor, the R² between that predictor
and the remaining predictors is very high. Upon request, SPSS will give you two
transformations of these squared multiple correlation coefficients. One is tolerance,
which is simply 1 minus that R². The second is VIF, the variance inflation factor,
which is simply the reciprocal of the tolerance. Very low values of tolerance (0.1 or
less) indicate a problem. Very high values of VIF (10 or more, although some would
say 5 or even 4) indicate a problem. As you can see in the table below, we have no
multicollinearity problem here.

[Coefficients table with tolerance and VIF]
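
The definitions above translate directly into code. The sketch below computes tolerance and VIF by hand: each predictor is regressed on the remaining predictors, and that R² gives tolerance = 1 − R² and VIF = 1 / tolerance. The data are invented; with real data, SPSS reports these values in the Coefficients table when you request collinearity diagnostics.

```python
# Tolerance and VIF, computed exactly as defined in the text above.
import numpy as np
import statsmodels.api as sm

predictors = {
    "education": [12, 16, 14, 18, 12, 20, 15, 13],
    "experience": [5, 3, 10, 7, 15, 2, 8, 12],
}
X = np.column_stack(list(predictors.values()))

for j, name in enumerate(predictors):
    others = np.delete(X, j, axis=1)  # all remaining predictors
    r2 = sm.OLS(X[:, j], sm.add_constant(others)).fit().rsquared
    tolerance = 1 - r2
    print(f"{name}: tolerance = {tolerance:.3f}, VIF = {1 / tolerance:.3f}")
```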
