
SPSS Manual for Bio 211 - General Ecology
Using SPSS version 14
Joel Elliott, Jennifer Burnaford, Stacey Weiss

SPSS is a program that is very easy to learn and is also very powerful. This manual is designed to
introduce you to the program; however, it is not meant to cover every single aspect of SPSS.
There will be situations in which you need to use the SPSS Help menu or Tutorial to learn how to
perform tasks that are not detailed here. You should turn to those resources any time you have
questions.

The following document provides some examples of common statistical tests used in Ecology. To
decide which test to use, consult your class notes, your Statistical Roadmap or the Statistics Coach
(under Help menu in SPSS).

Contents
Data entry
Descriptive statistics
Examining assumptions of parametric statistics
   Test for normality
   Test for homogeneity of variances
   Transformations
Comparative Statistics 1: Comparing means among groups
   Comparing two groups using parametric statistics
      Two-sample t-test
      Paired t-test
   Comparing two groups using non-parametric statistics
      Mann-Whitney U test
   Comparing three or more groups using parametric statistics
      One-way ANOVA and post-hoc tests
   Comparing three or more groups using non-parametric statistics
      Kruskal-Wallis test
   For studies with two independent variables
      Two-way ANOVA
      ANCOVA
Comparative Statistics 2: Comparing frequencies of events
   Chi Square Goodness of Fit
   Chi Square Test of Independence
Comparative Statistics 3: Relationships among continuous variables
   Correlation (no causation implied)
   Regression (causation implied)
Graphing your data
   Simple bar graph
   Clustered bar graph
   Box plot
   Scatter plot
Printing from SPSS

Data Entry

Start SPSS and, when the first box appears asking "What would you like to do?", click the button for
"Type in data".
A spreadsheet will appear. The set-up here is similar to Excel, but at the bottom of the window
you will notice two tabs. One is Data View. The other is Variable View. To enter your
data, you will need to switch back and forth between these pages by clicking on the tabs.

A data entry example
Suppose you are part of a biodiversity survey group working in the Galapagos Islands and you are
studying marine iguanas. After visiting a couple of islands you think that there may be higher
densities of iguanas on island A than on island B. To examine this hypothesis, you decide to
quantify the population densities of the iguanas on each island. You take 20 transects (100 m²) on
each island (A and B), counting the number of iguanas in each transect. Your data are shown below.

A 12 13 10 11 12 12 13 13 14 14 14 14 15 15 15 16 14 12 14 14
B 15 13 16 10 9 24 13 18 14 16 15 19 14 16 17 15 17 22 15 16

First define the variables to be used. Go to Variable View of the SPSS Data Editor window as
shown below.



The first column (Name) is where you name your variables. For example, you might name one
Location (you have 2 locations in your data set, Island A and Island B). You might name the
other one Density (this is your response variable, number of iguanas).
Other important columns are the Type, Label, Values, and Measure.
o For now, we will keep Type as Numeric but look to see what your options are. At
some point in the future, you may need to use one of these options.
o The Label column is very helpful. Here, you can expand the description of your variable
name. In the Name column you are restricted by the number & type of characters you can
use. In the Label column, there are no such restrictions. Type in labels for your iguana
data.
o In the Values column, you can assign numbers to represent the different locations (so
Island A will be 1 and Island B will be 2). To do this, you need to assign Values to
your categorical explanatory variable. Click on the cell in the Values column, and click
on the button that shows up. A dialog box will appear as below. Type 1 in the Value
cell and A in the Value Label cell, and then hit Add. Type 2 in the Value cell and
B in the Value Label cell. Hit Add again. Then hit OK.

o In the Measure column, you can tell the computer what type of variables these are. In this
example, island is a categorical variable. So in the Location row, go to the Measure
column (the far right) and click on the cell. There are 3 choices for variable types. You
want to pick Nominal. Iguana density is a continuous variable. Since Scale (meaning
continuous) is the default condition, you don't need to change anything.
Now switch to the Data View. You will see that your columns are now titled Location and
Density.
To make the value labels appear in the spreadsheet, pull down the View menu and choose Value
Labels. The labels will appear as you start to enter data.
You can now enter your data in the columns. Each row is a single observation. Since you have
chosen View > Value Labels and entered your Location value labels in the Variable View
window, when you type 1 in the Location column, the letter A will appear. After you've
entered all the values for Island A, enter the ones from Island B below them. The top of your
data table will eventually look like this:
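
If it helps to see the layout outside SPSS, the one-row-per-observation structure described above can be sketched in Python. This is only an illustration and not part of SPSS: the names island_a, island_b, value_labels, and rows are our own, and the sketch assumes the coding used in this manual (1 = Island A, 2 = Island B).

```python
# Iguana counts per 100 m^2 transect, copied from the example data.
island_a = [12, 13, 10, 11, 12, 12, 13, 13, 14, 14,
            14, 14, 15, 15, 15, 16, 14, 12, 14, 14]
island_b = [15, 13, 16, 10, 9, 24, 13, 18, 14, 16,
            15, 19, 14, 16, 17, 15, 17, 22, 15, 16]

# Value labels, as entered in the Values column of Variable View.
value_labels = {1: "A", 2: "B"}

# One row per transect: (Location code, Density).
# Island A rows come first, Island B rows are entered below them.
rows = [(1, d) for d in island_a] + [(2, d) for d in island_b]

# Show the top of the table with value labels switched on.
print("Location  Density")
for code, density in rows[:5]:
    print(f"{value_labels[code]:<9} {density}")
```

The grouping variable (Location) and the response variable (Density) each get one column, which is the layout the comparative tests later in this manual expect.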



Descriptive statistics
A quick view of your data in table and graphical formats

Once you have the data entered, you want to summarize the trends in the data. There are a variety of
statistical measures for summarizing your data, and you want to explore your data by making tables
and graphs. To help you do this you can use the Statistics Coach found under the Help menu in
SPSS, or you can go directly to the Analyze menu and choose the appropriate tests.

To get a quick view of what your data look like:
Pull down the Analyze menu and choose Descriptive statistics, then Frequencies. A new
window will appear. Put the Density variable in the box, then choose the statistics that you want
to use to explore your data by clicking on the Statistics and Charts buttons at the bottom of
the box (e.g., mean, median, mode, standard deviation, skewness, kurtosis). This will produce
summary statistics for the whole data set. Your results will show up in a new window.
SPSS can also produce statistics and plots for each of the islands separately. To do this, you
need to split the file. Pull down the Data menu and choose Split File. Click on Organize
output by groups and then select the Island [Location] variable as shown below. Click OK.


Now, if you repeat the Analyze > Descriptive Statistics > Frequencies steps and hit OK
again, your output will be similar to the following for each island.
Statistics(a)
Density
N  Valid                    20
   Missing                  0
Mean                        13.3500
Median                      14.0000
Mode                        14.00
Std. Deviation              1.49649
Variance                    2.239
Skewness                    -.463
Std. Error of Skewness      .512
Kurtosis                    -.045
Std. Error of Kurtosis      .992
Range                       6.00
Minimum                     10.00
Maximum                     16.00
a Island = A

Statistics(b)
Density
N  Valid                    20
   Missing                  0
Mean                        15.7000
Median                      15.5000
Mode                        15.00(a)
Std. Deviation              3.46562
Variance                    12.011
Skewness                    .475
Std. Error of Skewness      .512
Kurtosis                    1.302
Std. Error of Kurtosis      .992
Range                       15.00
Minimum                     9.00
Maximum                     24.00
a Multiple modes exist. The smallest value is shown.
b Island = B
[Figure: Histogram for Island A. x-axis: Density; y-axis: Frequency. Mean = 13.35, Std. Dev. = 1.49649, N = 20]

[Figure: Histogram for Island B. x-axis: Density; y-axis: Frequency. Mean = 15.70, Std. Dev. = 3.46562, N = 20]


From these summary statistics you can see that the mean density of iguanas on Island A is
smaller than that on Island B. Also, the variation patterns of the data are different on the two
islands, as shown by the frequency distributions of the data and their different dispersion
parameters. In each histogram, the normal curve indicates the expected frequency curve for a
normal distribution with the same mean and standard deviation as your data. The data for
Island A have a smaller range, a lower variance, and lower kurtosis than the data for Island B.
Also, the distribution for Island A is skewed to the left (negative skewness), whereas the
distribution for Island B is skewed to the right (positive skewness).
You could explore your data more by making box plots, stem-and-leaf plots, and error bar charts.
Use the functions under the Analyze and Graphs menus to do this.
After getting an impression of what your data look like you can now move on to determine
whether there is a significant difference between the mean densities of iguanas on the two
islands. To do this we have to use comparative statistics.

NOTE: Once you are done looking at your data for the two islands separately, you need to unsplit
the data. Go to Data > Split File and select Analyze all cases, do not create groups.
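
If you want to check the summary statistics by hand, the sketch below recomputes them for each island with Python's standard statistics module. It is an illustration only, not something SPSS runs: the function names skewness and describe are our own, and the skewness formula is the adjusted (sample) version, which we assume is what SPSS reports because it reproduces the -.463 shown for Island A.

```python
import statistics as st

island_a = [12, 13, 10, 11, 12, 12, 13, 13, 14, 14,
            14, 14, 15, 15, 15, 16, 14, 12, 14, 14]
island_b = [15, 13, 16, 10, 9, 24, 13, 18, 14, 16,
            15, 19, 14, 16, 17, 15, 17, 22, 15, 16]


def skewness(values):
    """Adjusted sample skewness (assumed to match the SPSS definition)."""
    n = len(values)
    mean = st.mean(values)
    sd = st.stdev(values)
    cubed = sum(((x - mean) / sd) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed


def describe(values):
    """Collect the same kinds of statistics shown in the Frequencies output."""
    return {
        "n": len(values),
        "mean": st.mean(values),
        "median": st.median(values),
        "modes": st.multimode(values),    # every mode, not only the smallest
        "std_dev": st.stdev(values),      # sample standard deviation (n - 1)
        "variance": st.variance(values),  # sample variance (n - 1)
        "skewness": skewness(values),
        "range": max(values) - min(values),
        "min": min(values),
        "max": max(values),
    }


# Equivalent of splitting the file by Location: one summary per island.
summary = {"A": describe(island_a), "B": describe(island_b)}
for island, stats in summary.items():
    print(f"Island {island}")
    for name, value in stats.items():
        print(f"  {name}: {value}")
```

Note that multimode returns both modes for Island B (15 and 16), whereas the SPSS table shows only the smallest one.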



Examining the assumptions of parametric statistics


As you know, parametric tests have two main assumptions: 1) approximately normally distributed
data, and 2) homogeneous variances among groups. Let's examine each of these assumptions.

Test for normality

Before you conduct any parametric tests you need to check that the data values come from an
approximately normal distribution. To do this, you can compare the frequency distribution of
your data values with those of a normalized version of these values (See Descriptive Statistics
section above). If the data are approximately normal, then the distributions should be similar. From
your initial descriptive data analysis you know that the distributions of data for Island A and B did
not appear to fit an expected normal distribution perfectly. However, to objectively determine
whether the distribution varies significantly from a normal distribution you have to conduct a
normality test. This test will provide you with a statistic that determines whether your data are
significantly different from normal. The null hypothesis is that the distribution of your data is NOT
different from a normal distribution.
For the marine iguana example, you want to know if the data from the Island A population are
normally distributed and if the data from Island B are normally distributed. Thus, your data
must be split (Data > Split File > Organize output by groups, split by Location). Don't
forget to unsplit when you are done!
To conduct a statistical test for normality on your split data, go to Analyze > Nonparametric
Tests > 1-Sample K-S. In the window that appears, put the response variable (in this case,
Density) into the box on the right. Click Normal in the Test Distribution check box
below. Then click OK.
The output shows a Kolmogorov-Smirnov (K-S) table for the data from each island. Your p-value
is the last line of the table: Asymp. Sig. (2-tailed).
If p > 0.05, you should conclude that the distribution of your data is not significantly different
from a normal distribution. (The p-value is the probability of seeing a departure from normality
at least as large as the one in your sample if the null hypothesis were true.)
If p < 0.05, you should conclude that the distribution of your data is significantly different
from normal. Note: always look at the p-value. Do not rely on the "Test distribution is Normal"
footnote below the table; it only names the distribution your data were tested against and says
nothing about the result.
If your data are not normal, you should inspect them for outliers, which can have a strong effect
on this test. Remove the extreme outliers and try again. If this does not work, then you must
either transform your data so that they are normally distributed, or use a nonparametric test.
Both of these options are discussed later.

One-Sample Kolmogorov-Smirnov Test(c)
                                               Density
N                                              20
Normal Parameters(a,b)     Mean                13.3500
                           Std. Deviation      1.49649
Most Extreme Differences   Absolute            .218
                           Positive            .132
                           Negative            -.218
Kolmogorov-Smirnov Z                           .975
Asymp. Sig. (2-tailed)                         .298
a Test distribution is Normal.
b Calculated from data.
c Island = A

One-Sample Kolmogorov-Smirnov Test(c)
                                               Density
N                                              20
Normal Parameters(a,b)     Mean                15.7000
                           Std. Deviation      3.46562
Most Extreme Differences   Absolute            .166
                           Positive            .166
                           Negative            -.120
Kolmogorov-Smirnov Z                           .740
Asymp. Sig. (2-tailed)                         .644
a Test distribution is Normal.
b Calculated from data.
c Island = B

For the iguana example, you should find that the data for both populations are not significantly
different from normal (p > 0.05). With a sample size of only N=20 the data would have to be
skewed much more or have some large outliers to vary significantly from normal.
If your data are not normally distributed, you should try to transform the data to meet this
important assumption. (See below.)
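
To see where the numbers in the K-S tables come from, here is a minimal Python sketch of a one-sample Kolmogorov-Smirnov test against a normal distribution whose mean and standard deviation are estimated from the data. It is our own illustration (the function name ks_normal is hypothetical) and assumes the asymptotic Kolmogorov formula for the p-value; under that assumption it reproduces the Absolute, Z, and Asymp. Sig. values shown above.

```python
import math
from statistics import NormalDist, mean, stdev

island_a = [12, 13, 10, 11, 12, 12, 13, 13, 14, 14,
            14, 14, 15, 15, 15, 16, 14, 12, 14, 14]
island_b = [15, 13, 16, 10, 9, 24, 13, 18, 14, 16,
            15, 19, 14, 16, 17, 15, 17, 22, 15, 16]


def ks_normal(values):
    """Return (D, Z, p) for a one-sample K-S test against a fitted normal."""
    n = len(values)
    fitted = NormalDist(mean(values), stdev(values))
    xs = sorted(values)
    # Largest amount by which the sample's cumulative proportion exceeds
    # the normal curve, and largest amount by which it falls below it.
    d_plus = max((i + 1) / n - fitted.cdf(x) for i, x in enumerate(xs))
    d_minus = max(fitted.cdf(x) - i / n for i, x in enumerate(xs))
    d = max(d_plus, d_minus)          # "Absolute" in the SPSS table
    z = math.sqrt(n) * d              # "Kolmogorov-Smirnov Z"
    # Asymptotic two-tailed p-value (series form of the Kolmogorov distribution).
    p = 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * z * z)
                for k in range(1, 101))
    return d, z, min(max(p, 0.0), 1.0)


for name, data in (("A", island_a), ("B", island_b)):
    d, z, p = ks_normal(data)
    print(f"Island {name}: D = {d:.3f}, Z = {z:.3f}, p = {p:.3f}")
```

Because p is well above 0.05 for both islands, the sketch leads to the same conclusion as the SPSS output.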

Test for homogeneity of variances

Another assumption of parametric tests is that the groups you are comparing have relatively
similar variances. Most of the comparative tests in SPSS will do this test
for you as part of the analysis. For example, when you run a t-test, the output will include columns
labeled Levene's Test for Equality of Variances. The p-value is labeled Sig. and will tell you
whether or not your data meet this assumption (p > 0.05 means the variances are not significantly
different).

If the variances are not homogeneous, then you must either transform your data (e.g., using a log
transformation) to see if you can equalize the variances, or you can use a nonparametric comparison
test that does not require this assumption.
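
For readers curious about what Levene's test actually computes, the sketch below shows one common version in Python: take the absolute deviation of each value from its own group mean, then run a one-way ANOVA on those deviations. This is an illustration under that assumption (the function name levene_f is our own), not a description of SPSS internals, although it does reproduce the F of 4.234 that appears in the t-test output for the iguana data.

```python
from statistics import mean

island_a = [12, 13, 10, 11, 12, 12, 13, 13, 14, 14,
            14, 14, 15, 15, 15, 16, 14, 12, 14, 14]
island_b = [15, 13, 16, 10, 9, 24, 13, 18, 14, 16,
            15, 19, 14, 16, 17, 15, 17, 22, 15, 16]


def levene_f(*groups):
    """Return (F, df1, df2) for Levene's test based on group means."""
    # Step 1: absolute deviation of every value from its own group mean.
    deviations = []
    for group in groups:
        center = mean(group)
        deviations.append([abs(x - center) for x in group])

    # Step 2: one-way ANOVA on the absolute deviations.
    all_devs = [d for devs in deviations for d in devs]
    grand_mean = mean(all_devs)
    dev_means = [mean(devs) for devs in deviations]
    ss_between = sum(len(devs) * (m - grand_mean) ** 2
                     for devs, m in zip(deviations, dev_means))
    ss_within = sum((d - m) ** 2
                    for devs, m in zip(deviations, dev_means) for d in devs)
    df1 = len(groups) - 1
    df2 = len(all_devs) - len(groups)
    return (ss_between / df1) / (ss_within / df2), df1, df2


f_value, df1, df2 = levene_f(island_a, island_b)
print(f"Levene F = {f_value:.3f} with df = {df1}, {df2}")
```

A large F means the spread of the data differs among groups; the matching p-value is the Sig. column that SPSS prints next to it.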

What do you do if your data do not meet the assumptions?
Transformations

If your data do not meet one or both of the above assumptions of parametric statistics, you may be
able to transform the data so that they do. You can use a variety of transformations to try and make
the variances of the different groups equal or normalize the data. If the transformed data meet the
assumptions of parametric statistics, you may proceed by running the appropriate test on the
transformed data. If, after a number of attempts, the transformed data do not meet the assumptions
of parametric statistics, you must run a non-parametric test.

If the variances were not homogeneous, look at how the variances change with the mean. The usual
case is that larger means have larger variances. If this is the case, a transformation such as common
log, natural log or square root often makes the variances homogeneous.

Whenever your data are percents (e.g., % cover) they will generally not be normally distributed. To
make percent data normal, you should do an arcsine-square root transformation of the percent data
(percents/100).

To transform your data:
Go to Transform > Compute. You will get the Compute Variable window.
In the Target Variable box, name your new transformed variable (for example,
Log_Density).
There are 3 ways you can transform your data: 1) using the calculator, 2) choosing functions
from lists on the right, or 3) typing the transformation in the Numeric Expression box.
For this example: In the Function Group box on the right, highlight Arithmetic by clicking on it
once. Various functions will show up in the Functions and Special Variables box below.
Choose the LG10 function. Double-click on it.
In the Numeric Expression box, it will now say LG10(?). Double-click on the name of the
variable you want to transform (e.g., Density) in the box on the lower left to make Density
replace the ?.
Click OK. SPSS will create a new column in your data sheet that has log-values of the iguana
densities.
NOTE: you might want to do a transformation such as LN(x + 1). Follow the directions as
above but choose LN instead of LG10 from the Functions and Special Variables box. Move
your variable into the parentheses to replace the ?. Then type +1 after your variable so it
reads, for example, LN(Density+1).
NOTE: for the arcsine-square root transformation, the composite function to be put into the
Numeric Expression box would look like: ARSIN(SQRT(percent/100)), where percent is the name of
your percent variable (SPSS calls the arcsine function ARSIN).
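
The three transformations mentioned above can be sketched in Python as follows. This is only an illustration of the arithmetic: the variable names and the cover_percent values are hypothetical, and SPSS itself does this work through the Compute Variable window.

```python
import math

island_a = [12, 13, 10, 11, 12, 12, 13, 13, 14, 14,
            14, 14, 15, 15, 15, 16, 14, 12, 14, 14]

# Common log, the equivalent of LG10(Density).
log_density = [math.log10(d) for d in island_a]

# Natural log of (x + 1), the equivalent of LN(Density+1).
# Adding 1 keeps the transformation defined when a count is zero.
ln_density_plus_1 = [math.log(d + 1) for d in island_a]


def arcsine_sqrt(percent):
    """Arcsine-square root transform of a percentage (0 to 100), in radians."""
    return math.asin(math.sqrt(percent / 100))


# Hypothetical percent-cover values, used only to show the function at work.
cover_percent = [5, 25, 50, 90]
cover_transformed = [arcsine_sqrt(p) for p in cover_percent]

print(f"Mean of log10 density, Island A: {sum(log_density) / len(log_density):.4f}")
print("Arcsine-square root of percent cover:",
      [round(v, 4) for v in cover_transformed])
```

The mean of the log-transformed Island A densities printed here should agree with the Log_Density mean that SPSS reports later in this manual.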

After you transform your data, redo the tests of normality and homogeneity of variances to see if
the transformed data now meet the assumptions of parametric statistics.

Again, if your data now meet the assumptions of the parametric test, conduct a parametric statistical
test using the transformed data. If the transformed data still do not meet the assumption, you can do
a nonparametric test instead, such as a Mann-Whitney U test on the original data. This test is
described later in this handout.


Comparative Statistics 1: Comparing means among groups


Comparing two groups using parametric statistics: two-sample and paired t-tests

Two-sample t-test

This test compares the means from two groups, such as the density data for the two different iguana
populations. To run a two-sample t-test on the data:
First, be sure that your data are unsplit (Data > Split File > Analyze all cases, do not create
groups).
Then, go to Analyze > Compare Means > Independent Samples T-test.
Put the Density variable in the Test Variable(s) box and the Location variable in the Grouping
Variable box as shown below.


Now, click on the Define Groups button and put in the names of the groups in each box as shown
below. Then click Continue and OK.



The output consists of two tables:

Group Statistics
          Island   N    Mean      Std. Deviation   Std. Error Mean
Density   A        20   13.3500   1.49649          .33462
          B        20   15.7000   3.46562          .77494


Independent Samples Test (Density)
                              Levene's Test for
                              Equality of Variances   t-test for Equality of Means
                                                                                                                   95% Confidence Interval
                                                                                    Mean         Std. Error        of the Difference
                              F       Sig.            t        df       Sig. (2-tailed)   Difference   Difference   Lower      Upper
Equal variances assumed       4.234   .047            -2.784   38       .008              -2.35000     .84410       -4.05879   -.64121
Equal variances not assumed                           -2.784   25.847   .010              -2.35000     .84410       -4.08557   -.61443

The first table shows the means, standard deviations, and standard errors of the two groups. The
second table shows the results of Levene's Test for Equality of Variances, the t-value of the
t-test, the degrees of freedom of the test, and the p-value, which is labeled Sig. (2-tailed).

Before you look at the results of the t-test, you need to make sure your data fit the
assumption of homogeneity of variances. Look at the columns labeled Levene's Test for
Equality of Variances. The p-value is labeled Sig.

In this example the data fail Levene's Test for Equality of Variances (p = 0.047, which is < 0.05),
so the data will have to be transformed in order to see if we can get them to meet this assumption
of the t-test. If you log-transformed the data and re-ran the test, you'd get the following output.

Group Statistics
              Island   N    Mean     Std. Deviation   Std. Error Mean
Log_Density   A        20   1.1228   .05052           .01130
              B        20   1.1856   .09817           .02195

Independent Samples Test (Log_Density)
                              Levene's Test for
                              Equality of Variances   t-test for Equality of Means
                                                                                                                   95% Confidence Interval
                                                                                    Mean         Std. Error        of the Difference
                              F       Sig.            t        df       Sig. (2-tailed)   Difference   Difference   Lower      Upper
Equal variances assumed       2.642   .112            -2.547   38       .015              -.06288      .02469       -.11286    -.01290
Equal variances not assumed                           -2.547   28.404   .017              -.06288      .02469       -.11342    -.01234

Now the variances of the two groups are not significantly different from each other (p = 0.112),
and you can focus on the results of the t-test. For the t-test, p = 0.015 (which is < 0.05), so you
can conclude that the two means are significantly different from each other. Thus, this statistical
test provides strong support for the hypothesis that iguana densities differ between Island A and
Island B. Note the direction of the difference: the mean density is higher on Island B, not on
Island A as you first suspected.
WHAT TO REPORT: Following a statement that describes the patterns in the data, you should
parenthetically report the t-value, df, and p. For example: Iguanas are significantly more dense
on Island B than on Island A (t=2.5, df=38, p<0.05).
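
The t-values and degrees of freedom in the Independent Samples Test table can be recomputed with a short Python sketch. The function name two_sample_t and the dictionary keys are our own; the sketch uses the standard pooled-variance formula for the "equal variances assumed" row and the Welch-Satterthwaite formula for the "equal variances not assumed" row. It does not compute p-values, which require the t distribution.

```python
import math
from statistics import mean, variance

island_a = [12, 13, 10, 11, 12, 12, 13, 13, 14, 14,
            14, 14, 15, 15, 15, 16, 14, 12, 14, 14]
island_b = [15, 13, 16, 10, 9, 24, 13, 18, 14, 16,
            15, 19, 14, 16, 17, 15, 17, 22, 15, 16]


def two_sample_t(x, y):
    """Return t and df for both rows of the Independent Samples Test table."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x), variance(y)
    diff = mean(x) - mean(y)

    # Equal variances assumed: pool the two sample variances.
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se_pooled = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

    # Equal variances not assumed: Welch standard error and adjusted df.
    se_welch = math.sqrt(v1 / n1 + v2 / n2)
    df_welch = (v1 / n1 + v2 / n2) ** 2 / (
        (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))

    return {
        "mean_difference": diff,
        "t_pooled": diff / se_pooled,
        "df_pooled": n1 + n2 - 2,
        "t_welch": diff / se_welch,
        "df_welch": df_welch,
    }


result = two_sample_t(island_a, island_b)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

With equal sample sizes the two t-values are identical, which is why both rows of the SPSS table show -2.784; only the degrees of freedom differ.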


Paired t-test

You should analyze your data with a paired t-test only if you paired your samples during data
collection. This analysis tests to see if the mean difference between samples in a pair is = 0. The
null hypothesis is that the difference is not different from zero.

For example, you may have done a study in which you investigated the effect of light intensity on
the growth of the plant Plantus speciesus. You took cuttings from source plants and for each source
plant, you grew 1 cutting in a high light environment and 1 cutting in a low-light environment. The
other conditions were kept constant between the groups. You measured growth by counting the
number of new leaves grown over the course of your experiment.

Your data look like this:
Plant        1   2   3   4   5   6   7   8   9   10
Low Light    2   4   1   3   2   5   4   1   3   4
High Light   3   6   2   4   5   6   5   2   5   5

Enter your data in 2 columns named Low and High. Each row in the spreadsheet should
have a pair of data. In Variable View, leave the Measure column on Scale. Leave Values as
None.
Go to Analyze > Compare Means > Paired Samples T-test.
Highlight both of your variables and hit the arrow to put them in the Paired Variables box.
They will show up as Low-High. Hit OK. The following output should be produced.
The output consists of 3 tables:
Paired Samples Statistics
                      Mean     N    Std. Deviation   Std. Error Mean
Pair 1   Low Light    2.9000   10   1.37032          .43333
         High Light   4.3000   10   1.49443          .47258

Paired Samples Correlations
                                  N    Correlation   Sig.
Pair 1   Low Light & High Light   10   .884          .001

Paired Samples Test
                                  Paired Differences
                                                                                 95% Confidence Interval
                                                                                 of the Difference
                                  Mean       Std. Deviation   Std. Error Mean    Lower      Upper      t        df   Sig. (2-tailed)
Pair 1   Low Light - High Light   -1.40000   .69921           .22111             -1.90018   -.89982    -6.332   9    .000


The first table shows the summary statistics for the 2 groups. The second table shows
information that you can ignore. The third table, the Paired Samples Test table, is the one you
want. It shows the mean difference between samples in a pair, the variation of the differences
around the mean, your t-value, your df, and your p-value (labeled as Sig. (2-tailed)). In this case,
the p-value reads 0.000, which means that it is very low: it is smaller than the program will
show in the default 3 decimal places. You can express this in your results section as p < 0.001.
WHAT TO REPORT: Following a statement that describes the patterns in the data, you should
parenthetically report the t-value, df, and p. For example: Plants in the high light treatment
added significantly more leaves than their counterpart plants in the low light treatment (t=6.3,
df=9, p<0.001).
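
The paired t-test works on the within-pair differences, which the following Python sketch makes explicit. The variable names are our own, and the sketch stops at the t-value and degrees of freedom (the p-value needs the t distribution, which SPSS supplies).

```python
import math
from statistics import mean, stdev

# New leaves per cutting; position i in each list is the same source plant.
low_light = [2, 4, 1, 3, 2, 5, 4, 1, 3, 4]
high_light = [3, 6, 2, 4, 5, 6, 5, 2, 5, 5]

# Step 1: one difference per pair (Low - High, matching the SPSS output).
differences = [low - high for low, high in zip(low_light, high_light)]

# Step 2: test whether the mean difference departs from zero.
n_pairs = len(differences)
mean_diff = mean(differences)
sd_diff = stdev(differences)
se_diff = sd_diff / math.sqrt(n_pairs)
t_value = mean_diff / se_diff
df = n_pairs - 1

print(f"Mean difference = {mean_diff:.2f}, SD = {sd_diff:.5f}, SE = {se_diff:.5f}")
print(f"t = {t_value:.3f}, df = {df}")
```

The negative sign of t only reflects the order of subtraction (Low minus High); every high-light cutting grew more leaves than its low-light partner.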


Comparing two groups using non-parametric statistics: Mann-Whitney U test

The t-test is a parametric test, meaning that it assumes that the sample mean is a valid measure of
center. While the mean is valid when the distance between all scale values is equal, it's a problem
when your test variable is ordinal because in ordinal scales the distances between the values are
arbitrary. Furthermore, because the variance is calculated using squared deviations from the mean,
it too is invalid if those distances are arbitrary. Finally, even if the mean is a valid measure of
center, the distribution of the test variable may be so non-normal that it makes you suspicious of any
test that assumes normality.
If any of these circumstances is true for your analysis, you should consider using the nonparametric
procedures designed to test for the significance of the difference between two groups. They are
called nonparametric because they make no assumptions about the parameters of a distribution, nor
do they assume that any particular distribution is being used.
A Mann-Whitney U test doesn't require normality or homogeneous variances, but it is slightly less
powerful than the t-test (which means the Mann-Whitney U test is less likely to show a significant
difference between your two groups). So, if you have approximately normal data, then you should
use a t-test.

To run a Mann-Whitney U test:
Go to Analyze > Nonparametric Tests > 2 Independent Samples and a dialog box will appear.
Put the variables in the appropriate boxes, define your groups, and confirm that the
Mann-Whitney U test type is checked. Then click OK.


The output consists of two tables. The first table shows the parameters used in the calculation of
the test. The second table shows the statistical significance of the test. The value of the U
statistic is given in the first row (Mann-Whitney U). The p-value is labeled as Asymp. Sig.
(2-tailed).

Ranks
          Island   N    Mean Rank   Sum of Ranks
Density   A        20   15.08       301.50
          B        20   25.93       518.50
          Total    40


Test Statistics(b)
                                  Density
Mann-Whitney U                    91.500
Wilcoxon W                        301.500
Z                                 -2.967
Asymp. Sig. (2-tailed)            .003
Exact Sig. [2*(1-tailed Sig.)]    .003(a)
a Not corrected for ties.
b Grouping Variable: Island

In the table above (for the marine iguana data), the p-value = 0.003, which means that the
densities of iguanas on the two islands are significantly different from each other (p < 0.05). So,
this statistical test again provides strong support for the conclusion that the iguana
densities are significantly different between the islands.
WHAT TO REPORT: Following a statement that describes the patterns in the data, you should
parenthetically report the U-value, the sample sizes, and p (the Mann-Whitney U test has no
degrees of freedom). For example: Iguanas are significantly more dense
on Island B than on Island A (U=91.5, n1=n2=20, p<0.01).
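
The U statistic, Z, and asymptotic p-value in the Test Statistics table can be recomputed with the Python sketch below. It is an illustration with our own function name (mann_whitney_u); it counts pairs directly rather than ranking, and it assumes the usual tie-corrected normal approximation for Z, which reproduces the values SPSS printed for the iguana data.

```python
import math
from collections import Counter
from statistics import NormalDist

island_a = [12, 13, 10, 11, 12, 12, 13, 13, 14, 14,
            14, 14, 15, 15, 15, 16, 14, 12, 14, 14]
island_b = [15, 13, 16, 10, 9, 24, 13, 18, 14, 16,
            15, 19, 14, 16, 17, 15, 17, 22, 15, 16]


def mann_whitney_u(x, y):
    """Return (U, Z, two-tailed asymptotic p) with a correction for ties."""
    n1, n2 = len(x), len(y)
    # Count the pairs in which the x value is larger; a tie counts as half.
    u_x = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    u = min(u_x, n1 * n2 - u_x)       # report the smaller of the two U values

    # Normal approximation; tied values shrink the variance of U.
    n = n1 + n2
    tie_term = sum(t ** 3 - t for t in Counter(list(x) + list(y)).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - n1 * n2 / 2) / math.sqrt(var_u)
    p = 2 * NormalDist().cdf(-abs(z))
    return u, z, p


u, z, p = mann_whitney_u(island_a, island_b)
print(f"U = {u}, Z = {z:.3f}, p = {p:.3f}")
```

Because the test uses only the ordering of the values, it gives the same answer for the raw and the log-transformed densities.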


Comparisons of three or more groups using parametric statistics:
One-way ANOVA and post-hoc tests

Let's now consider parametric statistics that compare three or more groups of data.

To continue the example using iguana population density data, let's add data from a series of 16
transects from a third island, Island C. Enter these data into your spreadsheet at the bottom of the
column Density.

Density (100 m²)
Island C: 15 13 10 14 12 12 13 13 14 14 11 14 15 12 15 16

To enter the Location for Island C, you must first edit the Value labels by going to Variable View:
add a third Value (3) and Value label (C). Then, back on Data View, type a 3 into the last cell of the
Location column, and copy the C and paste it into the rest of the cells below.

The appropriate parametric statistical test for continuous data with one independent variable and
more than two groups is the One-way analysis of variance (ANOVA). It tests whether there is a
significant difference among the means of the groups, but does not tell you which means are
different from each other. In order to find out which means are significantly different from each
other, you have to conduct post-hoc paired comparisons. They are called post-hoc, because you
conduct the tests after you have completed an ANOVA and it shows where significant differences lie
among the groups. One of the Post-hoc tests is the Fisher PLSD (Protected Least Sig. Difference)
test, which gives you a test of all pairwise combinations.

To run the ANOVA test:
Go to Analyze > Compare Means > One-way ANOVA.
In the dialog box put the Density variable in the Dependent List box and the Location variable in
the Factor box.
Click on the Post Hoc button, click on the LSD check box, and then click Continue.
Click on the Options button and check 2 boxes: Descriptive and Homogeneity of variance test.
Then click Continue and then OK.


The output will include four tables: descriptive statistics, the results of the Levene test, the
results of the ANOVA, and the results of the post-hoc tests.
The first table gives you some basic descriptive statistics for the three islands.
Descriptives
Density
                                                     95% Confidence Interval for Mean
        N    Mean      Std. Deviation   Std. Error   Lower Bound   Upper Bound   Minimum   Maximum
A       20   13.3500   1.49649          .33462       12.6496       14.0504       10.00     16.00
B       20   15.7000   3.46562          .77494       14.0780       17.3220       9.00      24.00
C       16   13.3125   1.62147          .40537       12.4485       14.1765       10.00     16.00
Total   56   14.1786   2.63616          .35227       13.4726       14.8845       9.00      24.00

The second table gives you the results of the Levene Test (which examines the assumption of
homogeneity of variances). You must assess the results of this test before looking at the
results of your ANOVA.
Test of Homogeneity of Variances
Density
Levene Statistic   df1   df2   Sig.
3.237              2     53    .047

In this case, your variances are not homogeneous (p < 0.05), so the data do not meet one of the
assumptions of the test. Thus, you cannot proceed to using the results of the ANOVA
comparisons of means. You have two main choices of what to do. You can either transform
your data to attempt to make the variances homogeneous, or you can run a test that does not
require homogeneity of variances (such as Welch's test for three or more groups, which is still a
parametric test, or a non-parametric test).
First, try transforming the data for each population (try a log transformation), and then run the
test again. The following tables are for the log-transformed data.

Descriptives
Log_Density
                                                    95% Confidence Interval for Mean
        N    Mean     Std. Deviation   Std. Error   Lower Bound   Upper Bound   Minimum   Maximum
A       20   1.1228   .05052           .01130       1.0991        1.1464        1.00      1.20
B       20   1.1856   .09817           .02195       1.1397        1.2316        .95       1.38
C       16   1.1211   .05472           .01368       1.0919        1.1503        1.00      1.20
Total   56   1.1447   .07729           .01033       1.1240        1.1654        .95       1.38

Test of Homogeneity of Variances
Log_Density
Levene Statistic   df1   df2   Sig.
1.902              2     53    .159

Now your variances are homogeneous (p>0.05), and you can continue with the assessment of the
ANOVA.
The third table gives you the results of the ANOVA test, which examined whether there were
any significant differences in mean density among the three island populations of marine
iguanas.
ANOVA
Log_Density
                 Sum of Squares   df   Mean Square   F       Sig.
Between Groups   .052             2    .026          4.989   .010
Within Groups    .277             53   .005
Total            .329             55

Look at the p-value in the ANOVA table (Sig.). If this p-value is > 0.05, then there are no
significant differences among any of the means. If the p-value is < 0.05, then at least one mean
is significantly different from the others. In this example, p = 0.01 in the ANOVA table, and
thus p < 0.05, so the mean densities are significantly different.
Now that you know the means are different, you want to find out which pairs of means are
different from each other. For example, is the density on Island A greater than on Island B? Is it
greater than on Island C? How do B and C compare with each other?
The post-hoc test used here, Fisher's LSD (Least Significant Difference), allows you to examine
all pairwise comparisons of means. The results are listed in the fourth table. Which groups are and are not
significantly different from each other? Look at the Sig. column for each comparison. B is
different from both A and C, but A and C are not different from each other.
Multiple Comparisons
Dependent Variable: Log_Density
LSD
                          Mean                                95% Confidence Interval
(I) Island   (J) Island   Difference (I-J)   Std. Error   Sig.   Lower Bound   Upper Bound
A            B            -.06288*           .02284       .008   -.1087        -.0171
             C             .00166            .02423       .946   -.0469         .0503
B            A             .06288*           .02284       .008    .0171         .1087
             C             .06453*           .02423       .010    .0159         .1131
C            A            -.00166            .02423       .946   -.0503         .0469
             B            -.06453*           .02423       .010   -.1131        -.0159
*. The mean difference is significant at the .05 level.
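
The Mean Difference and Std. Error columns of the Multiple Comparisons table can be checked by hand. The Python sketch below is an illustration outside SPSS; it assumes the usual LSD standard error, the square root of the within-groups mean square times (1/n_I + 1/n_J), and the helper name `lsd_comparison` is made up for this example. It uses the rounded means and sums of squares printed in the earlier tables, so the last digits may differ slightly from SPSS. The p-values in the Sig. column come from a t distribution with 53 df and are not reproduced here.

```python
import math

# Sketch: mean differences and LSD standard errors (not SPSS output).
n = {"A": 20, "B": 20, "C": 16}
mean = {"A": 1.1228, "B": 1.1856, "C": 1.1211}
# Within-groups mean square: Sum of Squares / df from the ANOVA table.
ms_within = 0.277 / 53


def lsd_comparison(i, j):
    """Return (mean difference I-J, standard error) for one pair of islands."""
    diff = mean[i] - mean[j]
    se = math.sqrt(ms_within * (1 / n[i] + 1 / n[j]))
    return diff, se


for i, j in [("A", "B"), ("A", "C"), ("B", "C")]:
    diff, se = lsd_comparison(i, j)
    print(f"{i} - {j}: difference = {diff:.4f}, SE = {se:.4f}")
```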

WHAT TO REPORT: Following a statement that describes the general patterns in the data,
you should parenthetically report the F-value, df, and p from the ANOVA. Following statements
that describe the differences between specific groups, you should report the p-value from the
post-hoc test only. (NOTE: there is no F-value or df associated with the post-hoc tests, only a
p-value!) For example: "Iguana density varies significantly across the three islands (F=5.0,
df=2,53, p=0.01). Iguana populations on Island B are significantly more dense than on Island A
(p<0.01) and on Island C (p=0.01), but populations on Islands A and C have similar densities
(p>0.90)."


Comparing three or more groups using non-parametric statistics:
Kruskal-Wallis test

Just as the Mann-Whitney U test is the non-parametric version of a t-test, the Kruskal-Wallis test is the
non-parametric version of an ANOVA. The test is used when you want to compare three or more
groups of data, and those data do not fit the assumptions of parametric statistics even after
attempting standard transformations. Remind yourself of the assumptions of parametric statistics
and the downside of using non-parametric statistics by reviewing the information on Page 11.

To run the Kruskal-Wallis test:
Go to Analyze > Nonparametric Tests > K Independent Samples.
Note: Remember that for the Mann-Whitney U test, you went to Nonparametric Tests >
2 Independent Samples. Now you have more than 2 groups, so you go to K Independent
Samples instead, where K just stands in for any number greater than 2.
Put your variables in the appropriate boxes, define your groups, and be sure the Kruskal-Wallis box
is checked in the Test Type box. Click OK.

The output consists of two tables. The first table shows the sample sizes and mean ranks used in the
calculation of the test. The second table shows you the statistical results of the test. As you will see,
the test statistic that gets calculated is a chi-square value, and it is reported in the first row of the second
table. The p-value is labeled Asymp. Sig.
Ranks
          Location   N    Mean Rank
density   A          20   23.15
          B          20   38.20
          C          16   23.06
          Total      56

Test Statistics(a,b)
              density
Chi-Square    11.279
df            2
Asymp. Sig.   .004
a Kruskal Wallis Test
b Grouping Variable: Location
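
To see how the Ranks table feeds into the test statistic, the Python sketch below computes the textbook Kruskal-Wallis H from the group sizes and mean ranks, and then converts a chi-square value with 2 df into a p-value. This is an illustration outside SPSS, and the function names are made up for this example. The formula used here has no correction for tied ranks, so it gives a somewhat smaller value than the 11.279 that SPSS reports; a tie correction would explain the gap, but the raw data needed to confirm that are not shown in this manual.

```python
import math

# Sketch: Kruskal-Wallis H from the Ranks table (no tie correction).
ranks = {"A": (20, 23.15), "B": (20, 38.20), "C": (16, 23.06)}  # (N, mean rank)


def kruskal_wallis_h(ranks):
    """Uncorrected H = 12 / (N (N + 1)) * sum(n_i * meanrank_i^2) - 3 (N + 1)."""
    n_total = sum(n for n, _ in ranks.values())
    weighted = sum(n * mean_rank ** 2 for n, mean_rank in ranks.values())
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


def chi_square_p_df2(x):
    """Upper-tail p-value of a chi-square statistic with exactly 2 df."""
    return math.exp(-x / 2)


h_uncorrected = kruskal_wallis_h(ranks)
print(f"H without tie correction = {h_uncorrected:.2f}")
print(f"p for SPSS's chi-square of 11.279 = {chi_square_p_df2(11.279):.3f}")
```

The shortcut in `chi_square_p_df2` only works for df = 2, which is the case whenever you compare exactly three groups.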

In the table above, the p-value = 0.004, which means that the densities on the three islands are
significantly different from each other (p < 0.01). So, this test also supports the hypothesis that
iguana densities differ among islands. We do not yet know which islands are different from
which other ones.
Unlike an ANOVA, a Kruskal-Wallis test does not have an easy way to do post-hoc analyses.
So, if you have a significant effect for the overall Kruskal-Wallis, you can follow that up with a
series of two-group comparisons using Mann-Whitney U tests. In this case, we would follow up
the Kruskal-Wallis with three Mann-Whitney U tests: Island A vs. Island B, Island B vs. Island
C, and Island C vs. Island A.
WHAT TO REPORT: Following a statement that describes the general patterns in the data,
you should parenthetically report the chi-square value, df, and p. For example: "Iguana density
varied significantly across the three islands (χ²=11.3, df=2, p=0.004)."


For studies with two independent variables: Two-way ANOVA & ANCOVA

In many studies, researchers are interested in examining the effect of more than one independent variable
(i.e., factor) on a given dependent variable. For example, say you want to know whether the bill size
of finches is different between males and females of two different species. In this example, you
have two factors (Species and Sex) and both are categorical. They can be examined simultaneously
in a Two-way ANOVA, a parametric statistical test. The two-way ANOVA will also tell you
whether the two factors have joint effects on the dependent variable (bill size), or whether they act
independently of each other (i.e., does bill size depend on sex in one species but not in the other
species?).

What if we wanted to know, for a single species, how sex and body size affect bill size? We still
have two factors, but now one of the factors is categorical (Sex) and one is continuous (Body Size).
In this case, we need to use an ANCOVA, an analysis of covariance.

Both tests require that the data are normally distributed and all of the groups have homogeneous
variances. So you need to check these assumptions first. If you want to compare means from two
(or more) grouping variables simultaneously, as ANOVA and ANCOVA do, there is no satisfactory
non-parametric alternative. So you may need to transform your data.


Two-Way ANOVA

Enter the data as shown to the right:
The two factors (Species and Sex) are put in two
separate columns. The dependent variable (Bill length)
is entered in a third column.





Before you run a two-way ANOVA, you might want to first
run a t-test on bill size just between species, then a t-test on
bill size just between sexes. Note the results. Do you think
these results accurately represent the data? This exercise
will show you how useful a two-way ANOVA can be in
telling you more about the patterns in your data.





Now run a two-way ANOVA on the same data. The procedure is much the same as for a One-way
ANOVA, with one added step to include the second variable in the analysis.

Go to Analyze > General Linear Model >
Univariate. A dialog box appears as below.
Your dependent variable goes in the Dependent
Variable box.
Your explanatory variables go in the Fixed Factor(s) box.
Now click Options. A new window will appear.
Click on the check boxes for Descriptive
Statistics and Homogeneity tests, then click Continue.














Click OK. The output will consist of three tables, which show descriptive statistics, the results of
Levene's test, and the results of the 2-way ANOVA.
From the descriptive statistics, it appears that the means may be different between the sexes and
also different between species.
Descriptive Statistics
Dependent Variable: Bill size
Sex      Species     Mean    Std. Deviation   N
Female   Species A   17.60   1.140            5
         Species B   23.00   1.581            5
         Total       20.30   3.129            10
Male     Species A   26.60   2.074            5
         Species B   16.60   2.074            5
         Total       21.60   5.621            10
Total    Species A   22.10   4.999            10
         Species B   19.80   3.795            10
         Total       20.95   4.478            20

From this second table, you know that your data meet the assumption of homogeneity of
variance. So, you are all clear to interpret the results of your 2-way ANOVA.
Levene's Test of Equality of Error Variances(a)
Dependent Variable: Bill size
F       df1   df2   Sig.
1.193   3     16    .344
Tests the null hypothesis that the error variance of the
dependent variable is equal across groups.
a. Design: Intercept+Sex+Species+Sex * Species

The ANOVA table shows the statistical significance of the differences among the means for each
of the independent variables (i.e., factors or main effects; here, they are Sex and Species) and
the interaction between the two factors (i.e., Sex * Species). Let's walk through how to interpret
this information.
Tests of Between-Subjects Effects
Dependent Variable: Bill size
                  Type III Sum
Source            of Squares     df   Mean Square   F          Sig.
Corrected Model   331.350(a)     3    110.450       35.629     .000
Intercept         8778.050       1    8778.050      2831.629   .000
Sex               8.450          1    8.450         2.726      .118
Species           26.450         1    26.450        8.532      .010
Sex * Species     296.450        1    296.450       95.629     .000
Error             49.600         16   3.100
Total             9159.000       20
Corrected Total   380.950        19
a. R Squared = .870 (Adjusted R Squared = .845)
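
To see how the Sex, Species, and Sex * Species rows relate to the Descriptive Statistics table, the Python sketch below rebuilds their sums of squares and F-values from the four cell means and standard deviations. It is an illustration outside SPSS and it assumes a balanced design (the same number of birds in every cell, here 5), which is the only case where these simple formulas match the Type III sums of squares. All variable names are made up for this example, and small differences from the SPSS table come from rounding in the printed standard deviations.

```python
# Sketch: balanced two-way ANOVA from cell summaries (not SPSS output).
# (sex, species) -> (mean, SD); every cell has n = 5 birds.
cells = {
    ("Female", "A"): (17.60, 1.140),
    ("Female", "B"): (23.00, 1.581),
    ("Male", "A"): (26.60, 2.074),
    ("Male", "B"): (16.60, 2.074),
}
n_per_cell = 5
sexes = ["Female", "Male"]
species = ["A", "B"]

grand = sum(m for m, _ in cells.values()) / len(cells)
sex_mean = {s: sum(cells[(s, sp)][0] for sp in species) / len(species) for s in sexes}
species_mean = {sp: sum(cells[(s, sp)][0] for s in sexes) / len(sexes) for sp in species}

# Main effects: spread of the marginal means around the grand mean.
ss_sex = n_per_cell * len(species) * sum((sex_mean[s] - grand) ** 2 for s in sexes)
ss_species = n_per_cell * len(sexes) * sum((species_mean[sp] - grand) ** 2 for sp in species)
# Interaction: what is left in each cell mean after removing both main effects.
ss_interaction = n_per_cell * sum(
    (cells[(s, sp)][0] - sex_mean[s] - species_mean[sp] + grand) ** 2
    for s in sexes
    for sp in species
)
# Error: pooled within-cell variation, on 4 * (5 - 1) = 16 df.
ss_error = sum((n_per_cell - 1) * sd ** 2 for _, sd in cells.values())
ms_error = ss_error / (len(cells) * (n_per_cell - 1))

# Each effect has 1 df here, so its mean square equals its sum of squares.
for name, ss in [("Sex", ss_sex), ("Species", ss_species), ("Sex * Species", ss_interaction)]:
    print(f"{name}: SS = {ss:.2f}, F = {ss / ms_error:.1f}")
```

Notice how large the interaction sum of squares is compared with the two main effects: most of the variation among the cell means comes from the sexes differing in opposite directions in the two species.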


Always look at the interaction term FIRST. The interaction term tests the null hypothesis
that the two factors act independently of each other. A small p-value means that different
combinations of the variables have different effects. In this bill-size example, the interaction
term shows a significant sex*species interaction (p < 0.001). This means that the effect of
sex on bill size differs between the two species. Simply looking at sex or species on their
own won't tell you the whole story.
To get a better idea of what the interaction term means, make a Bar Chart with error bars.
See the graphing section of the manual for instructions on how to do this.

If you look at the data, the interaction should become apparent. In Species A, bills are larger
in males than in females, but in Species B, bills are larger in females than in males. So
simply looking at sex doesn't tell us anything (as you saw when you did the t-test), and neither
sex has a consistently larger bill when considered across both species.
The main effects terms in a 2-way ANOVA basically ignore the interaction term and give similar
results to the t-tests you may have performed earlier. The p-value associated with each
independent variable (i.e., factor or main effect) tests the null hypothesis that the means of the
different groups of that variable are the same. So,
if p < 0.05, the groups of that variable are significantly different from each other. In this case, it
tests whether males and females are different from each other, disregarding the fact that we
have males and females from two different species in our data set. And it tests whether the two
species are different from each other, disregarding the fact that we have males and females
from each species in our data set.
The two-way ANOVA found that species was significant if you ignore the interaction. This
suggests that Species A has larger bills overall, mainly because of the large size of the males of
Species A, but Species A does not always have larger bills because bill size also depends on sex.
WHAT TO REPORT:
If there is a significant interaction term, the significance of the main effects cannot
be fully accepted because of differences in the trends among different combinations of
the variables. Thus, you only need to tell your reader about the interaction term of the
ANOVA table. Describe the pattern and parenthetically report the appropriate F-value,
df, and p. For example: "The way that sex affected bill size was different for the two
different species (F=95.6, df=1,16, p<0.001)." (Often, a result like this would be followed
up with two separate t-tests.)
If the interaction term is not significant, then the statistical results for the main
effects can be fully recognized. In this case, you need to tell your reader about the
interaction term and about each main effect term of the ANOVA table. Following a
statement that describes the general patterns for each of these terms, you should
parenthetically report the appropriate F-value, df, and p. For example: "Growth rates of
both the invasive and native grass species were significantly higher at low population
densities than at high population densities (F=107.1, df=1,36, p<0.001). However, the
invasive grass grew significantly faster than the native grass at both population densities
(F=89.7, df=1,36, p<0.001). There is no interaction between grass species and
population density on growth rate (F=1.2, df=1,36, p>0.20)."


ANCOVA

Remember, ANCOVA is used when you have 2 or more independent variables that are a mixture of
categorical and continuous variables. Our example here is a study investigating the effect of gender
(categorical) and body size (continuous) on bill size in a species of bird. Your data must be
normally distributed and have homogeneous variances to use this parametric statistical test.

Enter the data as shown to the right:
The two factors (Sex and Body Size) are put in two separate
columns. The dependent variable (Bill size) is entered in a third
column.

To run the ANCOVA:
Go to Analyze > General Linear Model > Univariate as you
did for the two-way ANOVA.
Put your dependent variable in the Dependent Variable box.
Put your categorical explanatory variable in the Fixed
Factor(s) box.
Put your continuous explanatory variable in the Covariate(s)
box.
Click on Options. A new window will appear. Click on the
check boxes for Descriptive Statistics and Homogeneity tests,
then click Continue.
Click on Model. A new window will appear. At the top middle
of the pop-up window, specify the model as Custom instead
of Full factorial. Highlight one of the factors shown on the left side of the pop-up window
(under Factors & Covariates) and click the arrow button. That variable should now show up
on the right side (under Model). Do the same with the second factor. Now, highlight the two
factors on the right simultaneously and click the arrow, making sure the option is set to
interaction. In the end, your Model pop-up window should look something like the image
below:














Click Continue and then click OK. The output will consist of four tables, which show the
categorical (between-subjects) variable groupings, some descriptive statistics, the results of
Levene's test, and the results of the ANCOVA.
From the first and second tables, it appears that males and females have similarly sized bills.

Between-Subjects Factors

             Value Label   N
sex   1.00   male          8
      2.00   female        8

Descriptive Statistics

Dependent Variable: bill_size
sex      Mean      Std. Deviation   N
male     21.2625   1.70791          8
female   21.6500   2.24817          8
Total    21.4563   1.93906          16

From the third table, you know that the data meet the assumption of homogeneity of variance.
So, you are clear to interpret the results of the ANCOVA (assuming your data are normal).

Levene's Test of Equality of Error Variances(a)

Dependent Variable: bill_size
F df1 df2 Sig.
.237 1 14 .634
Tests the null hypothesis that the error variance of the dependent variable is equal across groups.
a Design: Intercept+sex+body_size+sex * body_size


The ANCOVA results are shown in an ANOVA table, which is interpreted similarly to the table
from the two-way ANOVA. The statistical results for the two independent
variables (factors) and the interaction between the two factors (i.e., sex * body_size) are shown
on three separate rows of the table below.

Tests of Between-Subjects Effects

Dependent Variable: bill_size
                  Type III Sum
Source            of Squares     df   Mean Square   F        Sig.
Corrected Model   48.612(a)      3    16.204        24.970   .000
Intercept         10.555         1    10.555        16.265   .002
sex               .278           1    .278          .428     .525
body_size         44.322         1    44.322        68.299   .000
sex * body_size   .141           1    .141          .217     .649
Error             7.787          12   .649
Total             7422.330       16
Corrected Total   56.399         15
a R Squared = .862 (Adjusted R Squared = .827)
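
Each F-value in this table is simply the Mean Square for that row divided by the Mean Square for the Error row, and R Squared is the Corrected Model sum of squares divided by the Corrected Total sum of squares. The short Python sketch below checks this arithmetic using the numbers printed in the table. It is an illustration outside SPSS, the variable names are made up for this example, and the last digit can differ from SPSS because the printed values are rounded.

```python
# Sketch: rebuilding the F column of the ANCOVA table from SS and df.
effects = {  # source: (Type III sum of squares, df)
    "sex": (0.278, 1),
    "body_size": (44.322, 1),
    "sex * body_size": (0.141, 1),
}
ss_error, df_error = 7.787, 12
ms_error = ss_error / df_error  # Mean Square for the Error row

f_values = {name: (ss / df) / ms_error for name, (ss, df) in effects.items()}
for name, f in f_values.items():
    print(f"{name}: F(1,{df_error}) = {f:.3f}")

# R squared = Corrected Model SS / Corrected Total SS.
r_squared = 48.612 / 56.399
print(f"R squared = {r_squared:.3f}")
```

The two df values printed with each F (here 1 and 12) are the ones you report alongside it.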

As with the 2-way ANOVA, you must interpret the interaction term FIRST. In this example,
the interaction term shows up on the ANOVA table as a row labeled sex * body_size, and it tells
you whether or not the way that body size affects bill size is the same for males as it is for
females. The null hypothesis is that body size affects bill size in the same way for each of the two
sexes. In other words, the null hypothesis is that the two factors (body size and sex) do not
interact in the way they affect bill size.
Here, you can see that the interaction term is not significant (p=0.649). Therefore, you can go on
to interpret the two factors independently. You can see that there is no effect of Sex on bill size
(p=0.525). And, you can see that there is an effect of Body Size on bill size (p<0.001).
Let's see how this looks graphically. Make a scatterplot with the dependent variable (Bill Size)
on the y-axis and the continuous independent variable (Body Size) on the x-axis. To make the
Male and Female data show up as different shaped symbols on your graph, move the categorical
independent variable (Sex) into the box labeled Style as shown below:


[Figure: scatterplot of bill_size (y-axis) against body_size (x-axis), with different symbols for male and female birds.]

From the figure you can see 1) that the way that body size affects bill size is the same for males
as it is for females (i.e., there is no interaction between the two factors), 2) that males and females
do not differ in their mean bill size (there is clear overlap in the distributions of male and female
bill sizes), and 3) that body size and bill size are related to each other (as body size increases, bill
size also increases).
WHAT TO REPORT:
If there is a significant interaction term, the significance of the main effects cannot
be fully accepted because of differences in the trends among different combinations of
the variables. Thus, you only need to tell your reader about the interaction term from the
ANOVA table. Describe the pattern and parenthetically report the appropriate F-value,
df, and p. For example: "The way that prey size affected energy intake rate was different
for large and small fish (F=95.6, df=1,16, p<0.001)." (Typically, a result like this would
be followed up with two separate regressions (see pg. 27 below), one for large fish and
one for small fish.)
If the interaction term is not significant, then the statistical results for the main
effects can be fully recognized. In this case, you need to tell your reader about the
interaction term and about each main effect term of the ANOVA table. Following a
statement that describes the general patterns for each of these terms, you should
parenthetically report the appropriate F-value, df, and p. For example: "Males and
females have similar mean bill sizes (F=0.4, df=1,12, p>0.50), and for both sexes, bill size
increases as body size increases (F=68.3, df=1,12, p<0.001). There is no interaction
between gender and body size on bill size (F=0.2, df=1,12, p>0.60)."



Comparative Statistics 2:
Comparing frequencies of events


Chi Square Analysis: Goodness of Fit Test

This test allows you to compare observed to expected values within a single group of test subjects.
For example: Are guppies more likely to be found in predator or non-predator areas?

You are interested in whether predators influence guppy behavior. So you put guppies in a tank that
is divided into a predator-free refuge and an area with predators. The guppies can move between the
two sides, but the predators cannot. You count how many guppies are in the predator area and in
the refuge after 5 minutes.

Here are your data:
                     in predator area   in refuge
number of guppies    4                  16

Your null hypothesis for this test is that guppies are evenly distributed between the 2 areas.

To perform the Chi-Square Goodness of fit test:
Open a new data file in SPSS
In Variable View, name the first variable Location. In the Measure column, choose
Ordinal. Assign 2 values: one for Predator Area and one for Refuge. Then create a second
variable called Guppies. In the Measure column, choose Scale.
In Data View, enter the observed number of guppies in the 2 areas.
Go to Data > Weight Cases. In the window that pops up, click on Weight Cases by and select
Guppies. Hit OK.
Go to Analyze > Nonparametric Tests > Chi-square.
Your test variable is Location.
Under Expected Values click on Values. Enter the expected value for the refuge area first and hit
Add, then enter the expected value for the predator area and hit Add. Hit OK.
In the Location Table, check the values to make sure the test did what you thought it was going
to do. Are the observed and expected numbers for the 2 categories correct?
Your Chi-Square value, df, and p-value are displayed in the Test Statistics Table.
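
If you want to check the numbers in the Test Statistics Table by hand, the Python sketch below works through the goodness-of-fit calculation for the guppy data (4 in the predator area, 16 in the refuge), using the null hypothesis of an even split as the expected values. It is an illustration outside SPSS; the helper name `chi_square_p_df1` is made up for this example, and its shortcut for the p-value is only valid when df = 1, as it is here with two categories.

```python
import math

# Sketch: chi-square goodness-of-fit test by hand (not SPSS output).
observed = {"predator area": 4, "refuge": 16}
total = sum(observed.values())
# Null hypothesis: guppies are evenly distributed between the 2 areas.
expected = {area: total / len(observed) for area in observed}

chi_square = sum((observed[a] - expected[a]) ** 2 / expected[a] for a in observed)
df = len(observed) - 1


def chi_square_p_df1(x):
    """Upper-tail p-value of a chi-square statistic with exactly 1 df."""
    return math.erfc(math.sqrt(x / 2))


p_value = chi_square_p_df1(chi_square)
print(f"chi-square = {chi_square:.2f}, df = {df}, p = {p_value:.4f}")
```

Each category contributes (observed - expected)^2 / expected to the total, so categories that are far from their expected counts drive the test statistic.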

NOTE: Once you are done with this analysis, you will likely want to stop weighting cases. Go to
Data > Weight Cases and select Do not weight cases.

WHAT TO REPORT: You want to report the χ² value, df, and p, parenthetically, following a
statement that describes the patterns in the data.


Chi Square Analysis: Test of Independence

If you have 2 different test subject groups, you can compare their responses to the independent
variable. For example, you could ask the question: Do female guppies have the same response to
predators as male guppies?

The chi-square test of independence allows you to determine whether the response of your 2 groups
(in this case, female & male guppies) is the same or is different.

You are interested in whether male and female guppies have different responses to predators. So
you test 10 male and 10 female guppies in tanks that are divided into a predator-free refuge and an
area with predators. Guppies can move between the areas; predators cannot. You count how
many guppies are in the predator area and in the refuge after 5 minutes.

Here are the data: number of guppies
in predator area in refuge
male guppies 1 9
female guppies 3 7

Your null hypothesis is that guppy gender does not affect response to predators. In other words,
there will be no difference in the response of male and female guppies to predators: the effect
of predators will not depend on guppy gender.

To perform the test in SPSS:
In Variable View, set up two variables: Gender and Location. Both are categorical, so they
must be Nominal, and you need to set up Values.
Enter your data in 2 columns. Each row is a single fish.
Go to Analyze > Descriptive Statistics > Crosstabs.
In the pop-up window, move one of your variables into the Rows window and the other one
into the Column window.
Click on the Statistics button on the bottom of the Crosstabs window, then click Chi-square
in the new pop-up window.
Click Continue, then OK.
Your output should look like this:
Case Processing Summary
Cases
Valid Missing Total

N Percent N Percent N Percent
Gender * Location 20 100.0% 0 .0% 20 100.0%


Gender * Location Crosstabulation
                  Location
                  predators   refuge   Total
Gender   male     1           9        10
         female   3           7        10
Total             4           16       20

Chi-Square Tests
                                              Asymp. Sig.   Exact Sig.   Exact Sig.
                               Value      df  (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square             1.250(b)   1   .264
Continuity Correction(a)       .313       1   .576
Likelihood Ratio               1.297      1   .255
Fisher's Exact Test                                         .582         .291
Linear-by-Linear Association   1.188      1   .276
N of Valid Cases               20
a Computed only for a 2x2 table
b 2 cells (50.0%) have expected count less than 5. The minimum expected count is 2.00.
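
The Pearson Chi-Square line can be reproduced by hand from the crosstabulation. The Python sketch below is an illustration outside SPSS: it computes the expected count for each cell as (row total x column total) / grand total, sums (observed - expected)^2 / expected over the four cells, and converts the result to a p-value. The p-value shortcut using `math.erfc` is only valid for df = 1, which is the case for a 2x2 table.

```python
import math

# Sketch: Pearson chi-square test of independence for the 2x2 guppy table.
observed = {
    "male": {"predators": 1, "refuge": 9},
    "female": {"predators": 3, "refuge": 7},
}
row_totals = {g: sum(cols.values()) for g, cols in observed.items()}
col_totals = {}
for cols in observed.values():
    for location, count in cols.items():
        col_totals[location] = col_totals.get(location, 0) + count
grand_total = sum(row_totals.values())

chi_square = 0.0
for gender, cols in observed.items():
    for location, count in cols.items():
        # Expected count if gender and location were independent.
        expected = row_totals[gender] * col_totals[location] / grand_total
        chi_square += (count - expected) ** 2 / expected

df = (len(row_totals) - 1) * (len(col_totals) - 1)
# For df = 1 the upper-tail chi-square probability can be written with erfc.
p_value = math.erfc(math.sqrt(chi_square / 2))
print(f"chi-square = {chi_square:.3f}, df = {df}, p = {p_value:.3f}")
```

The expected counts for the two predator cells both come out to 2, which matches footnote b of the SPSS table (minimum expected count 2.00).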

How to interpret your output:
Ignore the 1st table.
The second table (Gender*Location Crosstabulation) has your observed values for each
category. You should check this table to make sure your data were entered correctly.
In this example, the table correctly reflects that there were 10 of each type of fish, and that 1
male and 3 females were in the predator side of their respective tanks.
In the 3rd table, look at the Pearson Chi-Square line. Your Chi-square value is χ² = 1.250.
Your p-value is p = 0.264. This suggests that the response to predators was not different
between male and female guppies.

WHAT TO REPORT: You want to report the χ² value, df, and p, parenthetically, following a
statement that describes the patterns in the data. For example: "Male and female guppies did not
differ in their response to predators (chi-square test of independence, χ²=1.25, df=1, p>0.20).
Regardless of gender, more guppies fed in the refuge areas than in the predator areas. Ninety
percent of males and seventy percent of females fed in the refuge areas."


Comparative Statistics 3:
Relationships among continuous variables


No causation implied: Correlation

If the values of two variables appear to be related to one another, but one is not dependent on the
other, they are considered to be correlated. For example, fish weight and egg production are
generally correlated, but neither variable is dependent on the other. No causation is implied,
meaning we have no reason to suspect that fish weight causes egg number or vice versa.

The correlation coefficient, r, provides a quantitative measurement of how closely two variables are
related. It ranges from -1 to 1: a value of 0 means no correlation, and a value of 1 or -1 means the
two variables are perfectly related, positively or negatively.

Let's examine the correlation between bird weight and bill length, using the data displayed below.

Bird #             1   2   3   4   5   6   7   8   9   10  11  12  13  14  15
bird weight (g)    15  13  10  14  12  12  9   17  14  14  11  13  16  12  15
bill length (mm)   43  45  35  41  42  39  39  47  44  48  41  43  42  45  45

Enter the data above in a new spreadsheet and name the columns Weight and Length.
To visualize what the correlation represents, make a scatterplot of the data. For instructions, go
to the graphing section of this manual.
The bird data listed above looks like this when graphed:

From this plot you can see that as weight increases
there is also an increase in bird bill length. Thus, these
two variables appear to be correlated.










To quantify the extent of the correlation and see if it is statistically significant:
Go to Analyze > Correlate > Bivariate.
[Figure: scatterplot of Bill length (mm) against Bird weight (g).]
In the dialog box, move your 2 variables to the box on the right. Click on the check box for
Pearson. Click OK.
Correlations
                               Weight   Length
Weight   Pearson Correlation   1        .666**
         Sig. (2-tailed)       .        .007
         N                     15       15
Length   Pearson Correlation   .666**   1
         Sig. (2-tailed)       .007     .
         N                     15       15
**. Correlation is significant at the 0.01 level (2-tailed).
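
To see what SPSS is calculating, the Python sketch below computes the Pearson correlation coefficient directly from the 15 pairs of bird weights and bill lengths listed earlier. It is an illustration outside SPSS, and the function name `pearson_r` is made up for this example. It does not compute the p-value, which SPSS obtains from a t distribution with n - 2 degrees of freedom.

```python
import math

# Sketch: Pearson correlation coefficient for the bird data (not SPSS output).
weight = [15, 13, 10, 14, 12, 12, 9, 17, 14, 14, 11, 13, 16, 12, 15]   # g
length = [43, 45, 35, 41, 42, 39, 39, 47, 44, 48, 41, 43, 42, 45, 45]  # mm


def pearson_r(x, y):
    """r = S_xy / sqrt(S_xx * S_yy), using deviations from the means."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


r = pearson_r(weight, length)
df = len(weight) - 2  # degrees of freedom reported alongside r
print(f"r = {r:.3f}, df = {df}")
```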

The first row in your correlation table gives you the Pearson correlation coefficient (r). In this
example, r = 0.666, which shows there is a positive correlation between Weight and Length.
The results of the statistical test show that it is a statistically significant correlation (p = 0.007,
which is < 0.05), and the ** indicates that the correlation is significant at the 0.01 level.
This means that the correlation coefficient is significantly different from zero. In other words,
there is a statistically significant positive relationship between the two variables.
WHAT TO REPORT: Following a statement that describes the patterns in the data, you should
parenthetically report the r, df, and p. For example: "Larger birds have significantly longer bills
(r=0.67, df=13, p<0.01)."


Causation implied: Regression

Regressions and correlations both test whether two variables are related to each other, and if so, how
closely they are related. Regression is used when you suspect that the two variables are causally
related, such that variation in one is causing the variation in the other. Regression is also used when
you want to know the degree to which a change in one variable can predict a change in the other.

A simple example is a one-to-one relationship between two variables, such as the relationship
between the age and the number of growth rings of a tree. Another example is the relationship
between the age and length of a fish.

Age (years) Length (cm)

1 12.2
2 14.3
3 15.7
4 16.1
5 18.8
6 19.0
7 20.4

The data consist of a value for the independent variable (x) and the associated value for the
dependent variable (y). Think of these as on an x-axis and a y-axis. In our example, given the age
of the fish (x) one can predict its length (y). Generally, the independent variable (x) is controlled or
standardized by the investigator, and the y variable is dependent on the value of x.

A regression calculates the equation of the best fitting straight line through the (x,y) points that the
data pairs define. In the equation of a line (y = a + bx), a is the y-intercept (where x=0) and b is the
slope. The output of a regression will give you estimates for both of these values. If we wanted to
predict the length of a fish at a given age, we could do so using the regression equation that best fits
these data.
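
As a concrete illustration of the line y = a + bx, the Python sketch below fits the least-squares line to the fish Age and Length data by hand and then uses it to predict a length. It is an illustration outside SPSS; the function name `fit_line` and the age of 4.5 years used for the prediction are made up for this example.

```python
# Sketch: least-squares line y = a + b*x for the fish data (not SPSS output).
age = [1, 2, 3, 4, 5, 6, 7]                            # years (x)
length = [12.2, 14.3, 15.7, 16.1, 18.8, 19.0, 20.4]    # cm (y)


def fit_line(x, y):
    """Return (a, b): the y-intercept and slope of the best-fitting line."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    b = s_xy / s_xx            # slope
    a = mean_y - b * mean_x    # y-intercept
    return a, b


a, b = fit_line(age, length)
print(f"Length (cm) = {a:.2f} + {b:.2f} * Age")

# Predicted length of a fish at a hypothetical age of 4.5 years.
predicted = a + b * 4.5
print(f"Predicted length at 4.5 years: {predicted:.1f} cm")
```

Predictions like this are only trustworthy within the range of ages you actually measured (here 1 to 7 years).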

Enter the data above into a new spreadsheet and
name the two data columns Age and Length.
To visualize the relationship between these two
variables, make a scatterplot of the data. See the
graphing section of this manual for instructions on
making scatterplots.

The graph shows that there is a strong positive
relationship between fish Age and Length. The
equation is for the regression line that best
describes the relationship between the two
variables.
The R-square (R²) value is the coefficient of determination, and can be interpreted as the
proportion of the variation in the dependent variable that is explained by variation in the
independent variable. R² ranges from 0 to 1. If it is close to 1, it means that your independent
variable has explained almost all of why the value of your dependent variable differs from
observation to observation. If R² is close to 0, it means that your independent variable has
explained almost none of the variation in your dependent variable.
In this example it appears that 97% of the variation in Length is explained by variation in
Age.
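
To make "proportion of the variation explained" concrete, the Python sketch below computes R-square for the fish data as 1 minus (unexplained variation / total variation). It is an illustration outside SPSS, and the variable names are made up for this example.

```python
# Sketch: R-square as explained variation / total variation (not SPSS output).
age = [1, 2, 3, 4, 5, 6, 7]
length = [12.2, 14.3, 15.7, 16.1, 18.8, 19.0, 20.4]

mean_x = sum(age) / len(age)
mean_y = sum(length) / len(length)
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(age, length)) / sum(
    (x - mean_x) ** 2 for x in age
)
intercept = mean_y - slope * mean_x

# Total variation in Length, and the part left unexplained by the line.
ss_total = sum((y - mean_y) ** 2 for y in length)
ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(age, length))
r_squared = 1 - ss_residual / ss_total
print(f"R-square = {r_squared:.3f}")
```

The quantities `ss_total` and `ss_residual` are the same Total and Residual sums of squares that SPSS prints in the ANOVA table of the regression output.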
Now what you want to do is determine whether the relationship is statistically significant.

To run a regression analysis:
Go to Analyze > Regression > Linear.
Your response variable goes in the Dependent variable box.
Your explanatory variable goes in the Independent variable box.
The output contains four tables. The first table simply tells you what variables were used in
what way.

Variables Entered/Removed(b)

Model   Variables Entered   Variables Removed   Method
1       Age(a)              .                   Enter
a All requested variables entered.
b Dependent Variable: Length


The model summary table provides the basic data for the analysis, along with the R² value.

[Figure: scatterplot of Length (cm) against Age (years) with the fitted regression line. Length (cm) = 11.34 + 1.33 * Age; R-Square = 0.97.]

Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .984(a)   .969       .963                .56208
a Predictors: (Constant), Age

The next table is an ANOVA table, and in fact, a regression analysis is very similar to an
ANOVA. (If the independent variable is categorical you use an ANOVA and if it is continuous
you use a regression.) The results of the ANOVA table indicate whether the relationship
between the two variables is significant. Here, p < 0.001 (in the Sig. column), so we can
conclude that age is a significant predictor of length.
ANOVA(b)

Model 1      Sum of Squares   df   Mean Square   F         Sig.
Regression   49.157           1    49.157        155.597   .000(a)
Residual     1.580            5    .316
Total        50.737           6
a Predictors: (Constant), Age
b Dependent Variable: Length
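The numbers in the Model Summary and ANOVA tables are tied together arithmetically. As a rough check, the short Python sketch below recomputes R Square, Adjusted R Square, F, and the Std. Error of the Estimate from the sums of squares and df printed in the ANOVA table. The variable names are ours, not SPSS's, and small differences in the last digits are expected because the printed values are rounded.

```python
import math

# Values copied from the ANOVA table above (SPSS output, rounded to 3 decimals).
ss_regression = 49.157
ss_residual = 1.580
ss_total = 50.737
df_regression = 1
df_residual = 5

# R Square: proportion of the total variation explained by the regression.
r_square = ss_regression / ss_total

# Adjusted R Square penalizes R Square for the number of predictors.
n = df_regression + df_residual + 1   # number of observations (total df + 1)
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / df_residual

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F is the ratio of the regression mean square to the residual mean square.
f_ratio = ms_regression / ms_residual

# The Std. Error of the Estimate is the square root of the residual mean square.
std_error_estimate = math.sqrt(ms_residual)

print(round(r_square, 3), round(adjusted_r_square, 3))
print(round(f_ratio, 1), round(std_error_estimate, 3))
```

The results should agree with the printed .969, .963, 155.597, and .56208 up to rounding. The p-value in the Sig. column comes from comparing F to an F distribution with 1 and 5 degrees of freedom, which this sketch does not attempt.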

The fourth table contains the Regression Coefficients which are the estimates of y-intercept (in
the row titled Constant) and the slope (in the row titled Age). A regression analysis tests
whether the y-intercept and slope of the best fit line are each significantly different from zero.
The p-value for each row allows you to assess this. If the p-value for the y-intercept is less than
0.05, then the y-intercept is significantly different from zero. If the p-value for the slope is less
than 0.05, then the slope is significantly different from zero.

Coefficients(a)

              Unstandardized Coefficients   Standardized Coefficients
Model 1       B         Std. Error          Beta                        t        Sig.
(Constant)    11.343    .475                                            23.878   .000
Age           1.325     .106                .984                        12.474   .000
a Dependent Variable: Length
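To see how the Coefficients table is used, here is a minimal Python sketch built only from the B and Std. Error values printed above. It recomputes each t statistic as the coefficient divided by its standard error, and uses the fitted line to predict the length of a fish of a given age. The function name predict_length is our own, and the age of 4 years is just an illustrative value that lies within the range shown on the scatterplot axis.

```python
# Values copied from the Coefficients table above.
intercept, intercept_se = 11.343, 0.475   # row "(Constant)"
slope, slope_se = 1.325, 0.106            # row "Age"

# Each t statistic is the coefficient divided by its standard error.
t_intercept = intercept / intercept_se
t_slope = slope / slope_se


def predict_length(age_years):
    """Predicted length (cm) from the fitted line: Length = intercept + slope * Age."""
    return intercept + slope * age_years


print(round(t_intercept, 2), round(t_slope, 2))
print(round(predict_length(4), 2))   # predicted length (cm) of a 4-year-old fish
```

The recomputed t values differ slightly from the printed 23.878 and 12.474 because the table rounds B and Std. Error to three decimals. Predictions are only trustworthy for ages within the range of the data used to fit the line.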


From the output, we can see that the very high R² value reveals that 97% of the variation in
length (dependent variable) can be explained by variation in age (independent variable).
NOTE: There is no p-value associated with an R²!
The very low p-value in the ANOVA table (Sig. = .000, which SPSS prints when p < 0.001)
indicates that the relationship is highly significant: a relationship this strong would be very
unlikely to arise by chance alone.
The output also indicates that the y-intercept (Constant) and the slope (Age) are significantly
different from zero.
These statistics support the strong relationship that is evident in the scatterplot shown on the
previous page.
In a paper describing your results you would include a scatterplot of your data along with the
equation for the regression line.
WHAT TO REPORT: Following a statement that describes the patterns in the data, you should
parenthetically report the F, df, and p from the ANOVA, as well as the R² value. Remember that
there is no p-value associated with an R²! For example: Fish age can significantly predict fish
length (F=155.6, df=1,5, p<0.001; R²=0.97).
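If you report many regressions, it can help to build the parenthetical statement the same way each time. The sketch below is one possible helper, written in Python; the function name format_regression_report is hypothetical and not part of SPSS. It writes R2 in plain text and turns a Sig. value of .000 into p<0.001, since a p-value is never exactly zero.

```python
def format_regression_report(f_value, df_model, df_residual, p_value, r_square):
    """Build the parenthetical report string for a regression result."""
    # SPSS prints very small p-values as .000; report these as p<0.001, never p=0.
    if p_value < 0.001:
        p_text = "p<0.001"
    else:
        p_text = f"p={p_value:.3f}"
    return (f"(F={f_value:.1f}, df={df_model},{df_residual}, "
            f"{p_text}; R2={r_square:.2f})")


# Values taken from the ANOVA and Model Summary tables of the fish example.
print(format_regression_report(155.597, 1, 5, 0.000, 0.969))
```

The F and df come from the ANOVA table and the R Square from the Model Summary table.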


Graphing your data


For studies that use parametric statistics to compare means of different groups, you will want
to present those data in the form of a bar graph that shows means and some measure of variation
around the means (standard deviation or standard error).

For studies that use non-parametric statistics to compare groups, the mean is likely not a good
representation of the typical member of the population, so instead you will want to present your
data in the form of a box plot which shows medians and interquartile ranges.

For studies that compare the relationships between two (or more) continuous variables, you
will want to present those data in the form of a scatterplot.


Simple bar graph

First, make sure your data are set up to be graphed:
Make sure that your data file is not split. (Go to Data > Split File and select "Analyze all
cases, do not create groups".)
Go to Variable View and give your data Labels (in the Labels column) if you haven't
already. Make sure that you include units for all quantities. For the iguana example, you
might give Density the label "Number of Iguanas". Or if you were graphing % cover,
you could label it "lichen % cover".
Identify your categorical variable (e.g. site, location, etc.) as being Nominal (in the
Measure column). Leave the response variable as Scale.
In the Decimals column, change the number of decimals associated with each scale
variable so that it accurately reflects the precision of your measurement.
Go to Graphs > Interactive > Bar.
On the right of the pop-up window is a graphic which represents your two axes. You want to
define the variables for each axis by dragging them into the appropriate cells.
Grab your categorical variable from the left window and drag it to the x axis.
Grab your response variable and drag it to the y axis. The default value for the y axis is
Count (for some unknown reason).
REMEMBER: you might do the analysis on transformed data, but you always want to
graph raw data.
At the bottom, check to make sure that under Bars Represent it says Means.
Choose the Bar Chart Options tab. Make sure neither box is checked under Bar Labels.
Choose the Error Bars tab. Click on the box for Display Error Bars. Under Confidence
Intervals, instead of Confidence Interval for Mean select Standard Error of the Mean as
your units. Make sure the box directly under the pull-down menu says 1.0.
[Figure: bar graph of mean Density of iguanas (y-axis) by Location (x-axis: Island A, Island B)]
Figure 1. Mean (+SE) density of iguanas on two small islands in the Galapagos archipelago.
Island B has significantly more iguanas per unit area than Island A (t=2.5, df=38, p<0.05).
Choose the Titles tab. Do not put a title or subtitle, but DO put in the complete text of your
figure legend in the box labeled caption. Read the note at the bottom of the Titles tab window
and do as it suggests!
Hit OK.
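The bar heights and error bars set up in the steps above are simply group means and standard errors of the mean. As an illustration of what is being drawn, here is a minimal Python sketch that computes both for two groups. The densities are made-up numbers, not the manual's iguana data, and the function name mean_and_se is our own.

```python
import math
import statistics

# Hypothetical iguana densities (made-up numbers, not the manual's data).
densities = {
    "Island A": [4, 6, 5, 7, 3],
    "Island B": [10, 12, 9, 14, 10],
}


def mean_and_se(values):
    """Return the mean and the standard error of the mean (SD / sqrt(n))."""
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean, se


# One (mean, SE) pair per group: the bar height and the error bar length.
summary = {site: mean_and_se(values) for site, values in densities.items()}
for site, (mean, se) in summary.items():
    print(f"{site}: mean = {mean:.2f}, SE = {se:.2f}")
```

Setting the multiplier under the pull-down menu to 1.0 corresponds to drawing error bars of exactly one standard error, as computed here.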

To pretty it up:
For presentation in a paper, you don't want any information on the sides; you want
info about error bars and sample size in the caption. Double click on any floating
information. Unclick the box that says "display key".
Double click on a bar and see your options for bar color, fill, width, etc. Resist the
temptation to use fancy colors, etc. As a rule, you should try to keep your figures as
simple as possible, so stick to black, white, and/or hatched bars if possible.

Sometimes you will want to put letters over
your bars (for example, to show the results of
post-hoc tests from an ANOVA).
Double click on your figure to select it.
Click on the Text Tool on the left hand
menu bar (a line w/ an a next to it).
Click above a bar and type the appropriate letter.
Do this for all of your bars.
You might need to change the y axis so your letter will fit above the bars. To do that, double
click on the y axis. The scale axis window will pop up. Unclick the Auto box next to
Maximum and manually change the value to what you need it to be. Click OK.


Clustered bar graph

If you have several different data categories (for example, if you were doing a 2-way ANOVA) you
would want to use a clustered bar graph, in which you plot the mean of the response variable
for each combination of levels of your categories.

Prepare your data for graphing as you did above.
Go to Graphs > Interactive > Bar. Drag the response variable to the y-axis and put one of the
independent variables on the x-axis. Now, drag the other factor and put it in the space for
Legend Variables: Color. Keep the selection to the right of this space as Cluster.
For the bird bill size by species and sex example on page 18, Species was put on the x-axis
and Sex was put in the space for Legend Variables: Color.
Follow the same directions as above for what to do on the Bar Chart Options, Error Bars, and
Titles tabs. Hit OK.
You will get a graph which has different colors for different groups. As with a regular bar graph,
make sure you include standard error bars and that you get rid of any floating information.
[Figure: Density of iguanas (y-axis) by Location (x-axis: Island A, Island B)]
Keep the key that shows which bars are identified by which colors! Make sure you use colors /
patterns that will print well in black and white. (Two dark solid colors will not work).


Box plot

Box plots show the median, the interquartile range, and any outliers in the data. They are a
common way to graphically represent data analyzed with non-parametric statistics.

Go to Graphs > Interactive > Boxplot.
Put your categorical variable on the x axis and your response variable on the y axis.
Click on the Boxes tab in the window. Make sure that the following boxes are all checked:
Outliers, Extremes, and Median Line.
Don't forget to add a caption.
To pretty it up, follow directions as above.

A boxplot has a number of nifty features. The line in
the middle of each box represents the median value of
the response variable in that category. The box covers
the middle 50% of observations in each category. The
whiskers outside the box extend between the highest
and lowest values in the sample that are within 1.5 box lengths from the edge of the box.
Individuals that are outside this limit are shown by circles.
What a boxplot can tell you: a) where the medians are in the 2 groups, b) how variable the
groups are. For example, iguana densities on Island B have a higher median value and are much
more variable than iguana densities on Island A.
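The parts of a box plot described above can be worked out by hand. The following Python sketch is one way to do so for a single group, using made-up densities rather than the manual's data. It applies the 1.5 box-length rule from the text; the quartile method ("inclusive" in Python's statistics module) is our assumption and may not match the rule SPSS uses exactly, and the function name boxplot_summary is our own.

```python
import statistics

# Hypothetical iguana densities for one island (made-up numbers).
values = [12, 13, 14, 15, 16, 17, 18, 30]


def boxplot_summary(data):
    """Return the median, quartiles, whisker ends, and outliers for a box plot."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    box_length = q3 - q1                      # the interquartile range
    low_limit = q1 - 1.5 * box_length
    high_limit = q3 + 1.5 * box_length
    # Whiskers reach the most extreme values that are still inside the limits.
    inside = [x for x in data if low_limit <= x <= high_limit]
    # Anything beyond the limits is plotted as an individual point.
    outliers = [x for x in data if x < low_limit or x > high_limit]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whiskers": (min(inside), max(inside)),
        "outliers": outliers,
    }


result = boxplot_summary(values)
print(result)
```

Here the value 30 lies more than 1.5 box lengths above the top of the box, so it is listed as an outlier and the upper whisker stops at 18.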


Scatter plot

Go to Graphs > Interactive > Scatterplot.
For a dataset being analyzed with correlation, it doesn't really matter which variable goes on
the x-axis and which goes on the y-axis.
Click and drag the variables where you want them. For the correlation example in this
manual, we put Length on the y-axis and Weight on the x-axis.
Click OK.
For a dataset being analyzed with regression, you must put the explanatory variable on the
x-axis and the response variable on the y-axis. In the regression example in this manual, we
put Age on the x-axis and Length on the y-axis.
Click on the tab for Fit and for the method choose Regression.
Click OK.
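Choosing Regression on the Fit tab draws the straight line that best fits the points. As a sketch of what such a line is, the Python example below computes an ordinary least-squares intercept and slope by hand. The ages and lengths are made-up numbers chosen to lie exactly on a line, not the manual's fish data, and the function name least_squares_line is our own.

```python
# Hypothetical ages (years) and lengths (cm); made-up numbers chosen to lie
# exactly on a line, not the manual's fish data.
ages = [1, 2, 3, 4, 5]
lengths = [12.0, 13.5, 15.0, 16.5, 18.0]


def least_squares_line(x, y):
    """Return (intercept, slope) of the ordinary least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = sum of cross-products / sum of squared x deviations.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = least_squares_line(ages, lengths)
print(f"Length = {intercept:.2f} + {slope:.2f} * Age")
```

Note that the explanatory variable (Age) plays the role of x and the response variable (Length) the role of y, matching the axis assignment required for regression.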




Printing from SPSS

To print table or graph output from SPSS, click on it, go to Print (under File or choose the icon) and
print the selection. Alternatively, you can copy it into MS Word and print from there. Some
students find it easier to produce figures in SPSS without figure legends, copy the figures to Word,
and add captions there using text boxes.
