How to Use and Apply SPSS

Version 15.0 for Windows

Benjamin Noblitt
Lead Tutor
Quantitative Skill Center

Notes

This mini guide/walkthrough is for SPSS 15.0 for Windows and was written in
the summer of 2010. It does not address any changes or updates released
afterwards. Additionally, if an earlier version (pre-15.0) is being used, this
document might not fully apply. That said, the concepts being explained should
still hold, so long as the functions of the program remain similar.
This guide is intended to give the reader a VERY basic understanding of how
to use SPSS. It is a crash course, and its short length is indicative of how
little depth it goes into; there is a lot that this document does not mention.
If you want to perform a very thorough analysis with in-depth statistics, you
can read the SPSS Survival Manual by Julie Pallant.

Table of Contents
Preparations
Getting Started
Entering Data
Output Window
Walkthrough
  Analysis
    Graphing
    Regression
    Correlation
  Testing
    One Sample T-Test
    Independent Sample T-Test
    Paired Sample T-Test
    One Way ANOVA
Hypothesis Testing Crash Course

Preparations
Know that SPSS is for analyzing data from a statistical researcher's
point of view. The name SPSS stands for Statistical Package for the
Social Sciences. This means that the program is best suited to
comparing data samples or surveys using statistics. Furthermore, it
is a very robust program that is much more difficult to use if you do not
know statistics very well; it is designed primarily for the researcher or
statistician. Essentially, if you need to use this program, you will need
to learn basic statistics and its application first.
Set up a survey or data collection mechanism that is numerical in
nature. SPSS works with numbers, so the data needs to be quantifiable.
(If it is qualitative, you will need to make it quantitative.)
Make sure you have a goal in your research. Have a question you want
to answer, and collect data that you think will help answer that
question. For example, do you think there is a correlation between
stress level and red blood cell count? Questions like this can be
answered numerically, making SPSS a good candidate for analyzing
the data.

Getting Started
1. Open up SPSS (it may be named PASW Statistics 18).
2. A screen will pop up that has 5 different options to choose from, as follows:
Figure 1:

a. Run the Tutorial: This is a very informative walkthrough made by
the authors of the program. The tutorial will only show you how to
operate the program, whereas the intention of this walkthrough is to
show you how to apply it.
b. Type in data: This will be the most commonly used option. If this is
selected, you can type in the data manually.
c. Skip "Run an existing query" and "Create a new query using
Database Wizard".
d. Open an existing data source: As the name implies, you can
open a file that already has data in it from this option.
3. Select "Type in data". You will be presented with a screen that looks a lot
like an Excel page. At the top of the work-area (the area of white cells) you
will see tabs that say "var"; these columns are the variables of your sample
data. The numbered rows, in turn, hold the data points for each variable in
your data.
Figure 2

4. At the bottom of the page there are two tabs: Data View and Variable View.
The Variable View tab brings you to the page where you define the variables in
the program. For example, if you have a range of age groups, this is where you
would enter the age groups and their numerical assignments. The Data View tab
is where you enter the raw data into the program.
Figure 3:

5. Open the file accidents.sav from SPSS/Tutorial/sample_files/accidents.sav. You
can see an example of how these tabs work by playing with them. If you look
at the variable view tab, you can see that there are four variables. If you click
on the data view tab, you can see these four variables at the top of the
work-area. A confusing aspect of SPSS is that the rows in the variable view
are the columns in the data view (see Figures 4 and 5).
Figure 4: (Variable View)

Figure 5: (Data View)

6. Look at the variables in variable view; each one has an abbreviated name
(with no spaces) that will show up in the data view tab in place of the "var"
that usually shows up. The Label column is the full name of the variable,
which you can enter with spaces. Under the Values column, you
can determine what numerical values represent certain data points. For
example, in the accidents.sav file, you can see that a 1 represents a female,
and a 0 represents a male. The Measure column is where you determine
the type of data being used. For example, gender is a categorical type of
variable, so it best fits the Nominal [1] option. Age category is
an ordinal variable, so the Ordinal [2] option would fit best.

7. In the data view tab, notice a small button that looks like a price tag with a
red end (see Figure 6). If you click on this button, it changes the data from
numbers to labels. You can see here that Females are denoted with a 1 and
males are denoted with a 0 as stated in the variable view. This makes
entering data into the work-area easier. Instead of writing female for all of
your data, you can just enter a 1 and the program knows to treat it as female.
Figure 6:

Entering Data
1. Open up the variable view tab.
2. You must first assign every category of a variable a numerical value. For
example, assign a 1 to "Under 21", a 2 to "21-25", and a 3 to "26-30". This is
the ONLY way for SPSS to quantify your data. Additionally, it is advised to at
least set the Label, Values, and Measure of each variable.
Figure 7:

1 Nominal: A type of data classification that categorizes data either by name or
category. Example: Hair color is Brown = Br, Black = Bk, Red = Rd, and
Blonde = Bl
2 Ordinal: A type of data classification that categorizes data by some order or rank.
Example: agree = 3, somewhat agree = 2, does not agree = 1


3. Open up the data view tab.
4. After you have entered all your variables, you need to enter the data into
the work-area. You must now number your data sources. For
example, if you have taken a survey of 6 people (each person is a data
source), each person should be assigned a number from 1 through 6 on paper
that will correspond to a row in SPSS. This allows SPSS to relate the data from
each data source (i.e., which person gave which responses).
5. Each data source can then be assigned to its respective row. This means that
data source 1 (the first person surveyed) will be entered in row 1,
data source 2 will be entered in row 2, and so on.
6. There are two ways to enter data in the data view tab:
a. If the Value Labels button is pressed, you can simply select the
response to each survey item from a drop-down menu in each cell. This can
be done once the cell that corresponds to the correct data source (row)
and variable (column) is selected (see Figure 8).
b. If the Value Labels button is not pressed, you must enter the data
as a number (with no drop-down menu) (see Figure 9).
Figure 8:

Figure 9:
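
The coding idea behind the Values column can be sketched outside SPSS. The Python snippet below is only an illustration: the AGE_CODES table mirrors the age-group example above, while the survey responses and the helper names encode and decode are hypothetical.

```python
# Hypothetical coding scheme that mirrors the age-group example in this guide.
AGE_CODES = {"Under 21": 1, "21-25": 2, "26-30": 3}


def encode(responses, codes):
    """Turn text responses into the numeric codes that get typed into Data View."""
    return [codes[response] for response in responses]


def decode(values, codes):
    """Turn numeric codes back into labels (what the Value Labels button shows)."""
    labels = {code: label for label, code in codes.items()}
    return [labels[value] for value in values]


# Made-up survey answers from four people (rows 1 through 4).
survey = ["21-25", "Under 21", "26-30", "21-25"]
coded = encode(survey, AGE_CODES)
print(coded)                     # [2, 1, 3, 2]
print(decode(coded, AGE_CODES))  # the original labels again
```

Each position in the list plays the role of one row (data source) in the data view tab.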

Output Window
Figure 10:

1. The Output window is a window that is separate from the main window which
shows you the results of your analysis. For example, if you want to have a
graph of some of your data, it will pop up in the output window.
2. There are two sections of the output window. The area on the left is a log of
everything you have generated in your study. The area on the right is the
visual output itself. For example, if you have made 5 different graphs, you
can click on any specific chart in the left area, and it will
appear (selected) in the right area. If you look at Figure 10, you can see that
the Log entry is selected, which highlights the log data in the right section.
3. Note: you can close the output window and NOT close SPSS; it is an auxiliary
feature.

Walkthrough
Analysis
1. Graphing is the most basic type of visual analysis and is presented first in
this guide.
a. Open SPSS/Tutorial/sample_files/accidents.sav. At the top of the screen,
select Graphs>Chart Builder.
b. You can select the different type of graphs you want to create from the
Gallery tab. Select Scatter/Dot and then Grouped Scatter by
clicking and dragging the thumbnail up into the empty area above the
tabs (see Figure 11).
Figure 11:

c. Next, you can see the variables that you have created in the upper left
side of the screen. Click and drag Age Category and place it as the
independent variable (x-axis), and then click and drag Accidents to
the dependent variable (y-axis). Next, click and drag Gender into the
Set Color area. This allows the graph to distinguish between the two
genders when graphing the two sets of data provided (see Figure 12).
Figure 12:

d. Click OK and look at the graph produced in the output window. Notice
that both males and females are graphed on the same scale. This
allows you to compare different sets of data on the same graph
(see Figure 13).

Figure 13:

e. Go back to the chart builder and experiment with the different
graphing options and see what you can gather from visual analysis of
the sample data. For example, look at using a simple bar graph, and
then look under the Basic Elements tab for the three dimensional
aspects, and then place gender on the Z axis instead of the Set Color
area. These are both very common ways to visually compare larger
sets of data.
Graphing can sometimes illustrate a large amount of data in a very compact
and elegant way if done effectively. Looking at the different types of graphs
available to use in SPSS can help familiarize you with the options that you
can use if you wish to use a graph. Also, a graph can act as a useful reference
in a report (always recommended).
2. Regression is a good trend analysis tool. In simple terms, a regression
allows you to model data with a linear equation (a straight line).
a. Go to File>Open>Data, and then go to
SPSS/Tutorial/sample_files/car_sales.sav. Notice that you now have
two main windows open. Opening a new file does not close
your existing project.
b. From the top of the window go to Analyze>Regression>Linear. Note
that this guide only covers linear regression for
now. In the screen that pops up, you can see all the variables on the
left hand side (lots of them!).
c. Select Horsepower and then click on the triangle under the
dependent area. This will place Horsepower as the dependent
variable in the regression. Next, select Price and click on the triangle
under the independent area. Note that the triangle acts as an arrow
showing you whether the variable will be moved into or out of the
area. See Figure 14 for a reference.
Figure 14:

d. Click OK. A new set of tables should appear in the Output window. In
the output window, there are a lot of tables that SPSS will make for
you. If you want to model the data with a linear equation, the bottom
table should be used. The equation in this example can be made in the
form of y=mx+b where y is the horsepower of the car, and x is the
price of the car.

Figure 15:

e. The slope is given by the unstandardized coefficient for price in
thousands: 3.323, and the intercept is given as the constant
(unstandardized coefficient constant): 94.670, so the final equation is:
y=3.323x+94.670. You can now use this equation to model the data
mathematically and to estimate horsepower from price (extrapolating
beyond the data set should be done with caution).
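
As a rough illustration of how the fitted equation can be used, the Python sketch below plugs prices into y=3.323x+94.670. The slope and intercept come from the output described above; the prices, the tiny data set passed to fit_line, and both function names are hypothetical. The fit_line helper shows the usual least-squares formulas for a slope and intercept, which is an assumption about the calculation rather than a description of SPSS internals.

```python
SLOPE = 3.323       # unstandardized coefficient for price in thousands (from the guide)
INTERCEPT = 94.670  # constant (from the guide)


def predict_horsepower(price_in_thousands):
    """Apply y = m*x + b with the coefficients reported in the guide."""
    return SLOPE * price_in_thousands + INTERCEPT


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical prices (in thousands), chosen only to show the arithmetic.
for price in (15, 25, 40):
    print(price, round(predict_horsepower(price), 1))

# Made-up data that lie exactly on y = 2x, so the fit recovers slope 2 and intercept 0.
print(fit_line([1, 2, 3], [2, 4, 6]))
```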

3. Correlation is a good analysis tool because it provides a numerical value
for how closely two variables are related to one another. (PLEASE note:
correlation is NOT causation.)
a. Go to File>Open>Data, and then go to
SPSS/Tutorial/sample_files/car_sales.sav.
b. At the top of either window (output or the main screen) go to
Analyze>Correlate>Bivariate. Click on Price in Thousands and
move it over to the variables box by clicking on the triangle (arrow).
Do the same with the Horsepower variable (see Figure 16).
Figure 16:

c. Note that the box next to Pearson is checked. This will produce a table
that has the Pearson correlation values for the two variables (basic
correlation). This tool is most handy when finding the correlation (not
causation) between large numbers of variables. Notice that the
correlation coefficient for the two variables is given in both the
regression output (see previous section) and the correlation table
(R=.840) (see Figure 17).
Figure 17:
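
For readers who want to see what the Pearson value measures, here is a minimal Python sketch of the textbook formula. The function name pearson_r and the five price/horsepower pairs are hypothetical and are not taken from car_sales.sav, so the result will not match the R=.840 reported above.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariation of x and y scaled by their spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up price (thousands) and horsepower values for five cars.
price = [10, 20, 30, 40, 50]
horsepower = [110, 150, 160, 210, 260]
print(round(pearson_r(price, horsepower), 3))
```

A value near 1 or -1 indicates a strong linear relationship, and a value near 0 indicates a weak one.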

Testing
In order to determine the validity of sample data, you need to test it. Testing
determines the likelihood of obtaining the sample results given a certain
assumption (the assumption is called the Null Hypothesis: H0). If you are
unfamiliar with hypothesis testing, please refer to the end of this walkthrough
for a brief crash course on hypothesis testing.

1. One Sample T-Test is a way to determine whether or not you are convinced
that your sample allows you to draw a conclusion from it (i.e., does it
reject H0?). This is done by comparing the t-critical value to the t-value. If
the t-critical value is larger than the t-value (using absolute values), then
there is insufficient proof to reject H0; however, if the t-value is larger
(also using absolute values), then H0 should be rejected in favor of the
alternative hypothesis: Ha.
a. Go to File>Open>Data, and then go to
SPSS/Tutorial/sample_files/callwait.sav.
b. We will see if the waiting time for being on hold is different than 9
minutes.
c. Go to Analyze>Compare Means>One-Sample T Test. A box should
open up that looks like Figure 18.

Figure 18:

d. Select Minutes to Respond and move it to the Test Variable(s) box.
Then type a 9 in the Test Value box to represent our 9 minute
benchmark. This means that SPSS is going to do a T-Test comparing the
mean of the data in the sample to 9. Click OK.
e. Figure 19 shows the output for the T-Test. Note that the significance
value (Sig. 2-tailed) is very small at .001. Since this value is below the
usual alpha value of .05, you would reject your Null Hypothesis
(mean waiting time = 9) in favor of your Alternative Hypothesis
(mean waiting time ≠ 9).
Figure 19:
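
The t-value in a one-sample test can be sketched by hand. The Python snippet below is a minimal illustration, assuming the standard formula t = (sample mean - test value) / (standard deviation / sqrt(n)); the waiting times and the function name one_sample_t are hypothetical and are not taken from callwait.sav. The significance value would still come from SPSS or a t-table.

```python
import math
import statistics


def one_sample_t(sample, test_value):
    """Return the t-value and degrees of freedom for a one-sample t-test."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)     # sample standard deviation (n - 1)
    standard_error = sd / math.sqrt(n)
    t_value = (mean - test_value) / standard_error
    return t_value, n - 1


# Made-up waiting times in minutes, tested against the 9 minute benchmark.
waits = [8, 10, 12, 11, 7, 13, 12, 10]
t_value, df = one_sample_t(waits, 9)
print(round(t_value, 3), df)
```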

2. Independent Samples T-Test is useful to compare two sets of sample data.
It is only useful when the data comes from two distinct groups, rather than
the same group of data (i.e., one sample from men, and the other from
women).
a. Go to File>Open>Data, and then go to
SPSS/Tutorial/sample_files/callwait.sav.
b. We will be looking to see if there is a significant difference between the
Monday (coded as 2) and Friday (coded as 6) waiting times. H0 is
that there is no difference, and Ha is that there is a difference.
c. Go to Analyze>Compare Means>Independent-Samples T Test.
Place Minutes to Respond as the Test Variable, and the
Grouping Variable will be the day of the week (see Figure 20).
Figure 20:

d. The Grouping Variable now needs to be defined. Click on Define
Groups and type in a 2 for Group 1 and then a 6 for Group 2 (see
Figure 21). This lets SPSS know that you are comparing only the
Monday and Friday data. Also, since this is a two sample test of
differences, determining which set of data is Group 1 or Group 2 is
arbitrary (it does not matter). Click on Continue and then hit OK.
Figure 21:

e. SPSS should now bring you to the output window with a table that
looks like Figure 22. First you must check whether equal variances can be
assumed. To do this, look at Levene's Test for Equality of Variances and
see if the significance is large (above .05 or so). If it is, then use the
upper row of data; if it is not, use the bottom row of data. In our case,
the significance of Levene's test is far too small to assume
equal variances, so we use the bottom row.
f. Look at the bottom row and see that the t-value of the test is -7.519
(very large in absolute value!), and the significance of the test
(Sig. 2-tailed) is .000, so there is sufficient evidence of a difference
between the mean call waiting times on Monday and Friday. Another way to
check this is to look at the 95% confidence interval of the difference; if 0
is within the range, there is not sufficient evidence of a difference. In our
interval, 0 is outside of the range, which supports our previous conclusion
(reject H0).

Figure 22:
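
The two rows of the output correspond to two versions of the t-value. The Python sketch below assumes the standard pooled formula for the "equal variances assumed" row and the Welch formula for the "equal variances not assumed" row; the Monday and Friday waiting times and the function names are hypothetical, so the numbers will not match Figure 22.

```python
import math
import statistics


def pooled_t(a, b):
    """t-value and df when equal variances are assumed (upper row)."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled_var = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se, na + nb - 2


def welch_t(a, b):
    """t-value and approximate df when equal variances are not assumed (bottom row)."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    se_squared = va / na + vb / nb
    t_value = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se_squared)
    df = se_squared ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t_value, df


# Made-up waiting times (minutes) for the two groups being compared.
monday = [4, 5, 6, 5, 4, 6]
friday = [8, 12, 9, 15, 7, 11]
print(pooled_t(monday, friday))
print(welch_t(monday, friday))
```

Swapping the two groups only flips the sign of the t-value, which is why the choice of Group 1 and Group 2 is arbitrary.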

3. Paired Samples T-Test is useful when the data is either from the same
group, or the data is paired up. For example, if two police officers each
write tickets every day for a week, the data would be paired, because each
day yields one ticket count from each of the same two officers.
a. Go to File>Open>Data, and then go to
SPSS/Tutorial/sample_files/dietstudy.sav.
b. We want to see if there is a difference in someone's weight before and
after a treatment plan. Therefore, H0 is that there is no difference,
and Ha is that there is a difference. To do this, we will look at a set
of paired data so that each test subject has two data points: an initial
weight (wgt0) and a final weight (wgt4).
c. Go to Analyze>Compare Means>Paired-Samples T Test. Select
both the initial weight and the final weight and move them over to the
Paired Variables box and click OK (see Figure 23). Note: you must
select BOTH variables before you can move them to the paired
variables box, because it treats them as one pair.
Figure 23:

d. In Figure 24 it can be seen that the t-value for this test is 11.175, with
a sig value of .000. This means that, if there were truly no difference
between the before and after weights, a result this extreme would be very
unlikely. Thus, we will reject our H0 in favor of the Ha (there is a
difference between the before and after populations).
Figure 24:
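
A paired test is a one-sample test on the differences within each pair, compared against 0. The Python sketch below assumes that standard formulation; the before and after weights and the function name paired_t are hypothetical and are not taken from dietstudy.sav.

```python
import math
import statistics


def paired_t(first, second):
    """t-value and df for paired data, computed on the pairwise differences."""
    if len(first) != len(second):
        raise ValueError("paired data must have the same number of values")
    differences = [a - b for a, b in zip(first, second)]
    n = len(differences)
    mean_diff = statistics.mean(differences)
    se = statistics.stdev(differences) / math.sqrt(n)
    return mean_diff / se, n - 1


# Made-up weights for five subjects before and after a treatment plan.
initial_weight = [198, 205, 187, 220, 176]
final_weight = [190, 199, 185, 209, 171]
t_value, df = paired_t(initial_weight, final_weight)
print(round(t_value, 3), df)
```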

4. One-Way ANOVA test is most useful when more than two means are being
compared. This is done by looking at the sample variances. ANOVA stands for
ANalysis Of VAriance.
a. Go to File>Open>Data, and then go to
SPSS/Tutorial/sample_files/demo.sav.
b. We want to see if there is a difference among income levels based on
education background, so we are going to compare the means of the
income levels for the different education backgrounds. Our H0:
there is no difference among the different mean income levels, and our
Ha: there is a difference. We will test this by using an analysis of
variance test (ANOVA).
c. Go to Analyze>Compare Means>One-Way ANOVA. There are two
terms that need to be clarified: the Dependent and the Factor. The
factor is what you are using to distinguish the groups from one
another. Our Factor is the Level of Education. To make this easy
to remember, the dependent "depends" on what factor we are looking
at (kind of). The dependent is what you are measuring as a result of
the change in factor. Our dependent is Household Income in
Thousands (see Figure 25).
Figure 25:

d. If the ANOVA test fails to reject H0, then we are ok to leave the test
as it is. However, if the test rejects H0 in favor of Ha, then we
need to conduct an additional Tukey test to determine which levels of
income are in fact different. This is because the ANOVA test only tells
us that there is a difference among the means; it does not tell us where
the difference is. The Tukey test will show where the
differences are, should we want to know. Click on the Post Hoc
button, and select the Tukey test as shown in Figure 26. Click
Continue.

Figure 26:

e. To further assess the validity of the assumptions used in the ANOVA
and Tukey test, we must also make sure that the variances are treated
appropriately. To do this, click on Options, and then select
Descriptive, Homogeneity of variance test, Brown-Forsythe,
Welch, and Means plot (see Figure 27). Click Continue and then
OK.
Figure 27:

f. The output window will have a lot of data provided! Don't worry,
because once it is explained, it is not too hard to follow. The first table
shows the basic descriptive statistics of the groups of data analyzed.
It is a good condensed overview of what the data looks like
(see Figure 28).
Figure 28:

g. Before the ANOVA test can be interpreted, we must see if the assumption of
equal variances is appropriate. The table of Homogeneity of Variances
looks at this very thing and tests it using Levene's test (see Figure 29).
Since the F value of the test is large (14.766), the corresponding
significance level (.000) is well below .05, meaning that equal
variances cannot be assumed. If the sig value were larger than .05, we
could simply move on to the ANOVA test.
Figure 29:

h. As a result of unequal variances, we need to be careful with
interpreting the ANOVA table. Because the variances are not equal, the
resulting F value and significance value might be off enough to sway
the outcome of the test (see Figure 30). The results of the ANOVA test
imply that we would reject our H0 because of the low significance
level; however, we need to verify this with the Welch and Brown-Forsythe
tests (Robust Tests of Equality of Means).
Figure 30:

i. If the Welch and Brown-Forsythe tests produce values that are above
.05, then this would contradict our ANOVA test and cause us to fail to
reject our H0. However, since both of these tests yield the same
results as our ANOVA test (significance levels of .000), we can use the
ANOVA results. See Figure 31.
Figure 31:

j. Now that we have sufficient evidence for rejecting H0, we need to
determine which groups are different from one another. The Post Hoc
table in Figure 32 provides the data needed to see which groups
have sufficient evidence to say that they are indeed different. Look at
"Did not complete high school" (I) compared to "High school
degree" (J). Notice that the sig level is above .05 and the 95% confidence
interval includes 0. This means that
there is not sufficient evidence to say that the income levels for "Did
not complete high school" and "High school degree" are different.
Conversely, look at "Did not complete high school" compared to
"Some college". Here the sig level is below .05 and the confidence interval
does not contain 0, so there is sufficient evidence to say that
this pair is likely to be different. Notice that the mean
differences that have an asterisk next to them indicate a significant
difference between values.

Figure 32:

k. We can now look at a graphical representation of the means. If we did
not use the ANOVA test, we could not infer the validity of this graph
(Figure 33), but now we can say which points contain significant
differences. For example, we see graphically that "Did not complete
high school" and "High school degree" have different means, but now
we can say that this difference is not enough proof to say that they
are in fact different (statistically). This illustrates the potentially
misleading information in a graph. Furthermore, the axis on the graph
can make small differences that are not significant appear to be large
differences since the graph zooms in, which can lead to faulty
analysis of a graph.

Figure 33:
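
The F value in the ANOVA table compares the variation between group means with the variation within groups. The Python sketch below assumes the standard one-way formula F = (SS between / df between) / (SS within / df within); the income figures, the group labels in the comments, and the function name one_way_anova_f are hypothetical and are not taken from demo.sav. It does not reproduce the Levene, Welch, Brown-Forsythe, or Tukey results.

```python
import statistics


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    k = len(groups)
    n_total = sum(len(group) for group in groups)
    grand_mean = sum(sum(group) for group in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (statistics.mean(group) - grand_mean) ** 2 for group in groups
    )
    # Variation of the individual values around their own group mean.
    ss_within = sum(
        sum((value - statistics.mean(group)) ** 2 for value in group)
        for group in groups
    )

    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Made-up household incomes (thousands) for three education levels.
incomes = [
    [28, 35, 31, 40],  # did not complete high school (hypothetical values)
    [33, 38, 36, 45],  # high school degree (hypothetical values)
    [52, 61, 58, 70],  # some college (hypothetical values)
]
print(one_way_anova_f(incomes))
```

A large F value means the group means differ by more than the spread inside the groups would suggest, which is what leads to a small significance value.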

Hypothesis Testing Super Crash Course!


A hypothesis is a claim that you need to test in order to "accept" [3] something.
Whenever you create a hypothesis, there are always two sides to it: the claim that
you want to prove and the claim that is already assumed to be true.

3 "Accept" is in quotes since the statistical terminology is "fail to reject". The
reason for this is that there are no absolutes in statistics.

The Alternative Hypothesis (Ha) is generally what you are trying to prove. For
example, if we want to show that trees are taller than shrubs, we would say that
the mean height for a shrub is less than the mean height for a tree.

Ha: mean height of a shrub (μs) < mean height of a tree (μt)

The Null Hypothesis (H0) is what is assumed to be true. For example, if you have
absolutely no prior knowledge of something, then this is what you would believe.
Using the previous case, if you never knew anything about trees or shrubs, you
would not know that there is a difference in their heights, or you could assume that
shrubs are taller than trees. That said, the Null Hypothesis is that the mean height
for shrubs is either equal to (or greater than) [4] the mean height of trees.

H0: mean height of a shrub (μs) = mean height of a tree (μt)


The alpha value (α) is the point at which you are convinced that your sample is
significant. When in doubt, let α = .05, or if you REALLY want to be sure, let
α = .01. If the probability of an event occurring (given the assumption of your
H0) is less than some arbitrary percentage (the alpha value, α), then it is
called significant. This means that there is sufficient evidence to say that
H0 is not likely, therefore you reject H0 in favor of the Ha.

Another way to find if a test is significant (when you reject H0) is to use a
t-test as explained previously in this document.
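
The decision rule described above can be written as a short Python sketch. The function name decide and the example significance values are hypothetical; the default alpha of .05 follows the "when in doubt" suggestion in this section.

```python
def decide(sig_value, alpha=0.05):
    """Compare a significance (p) value with alpha and state the decision."""
    if sig_value < alpha:
        return "reject H0"
    return "fail to reject H0"


print(decide(0.001))             # reject H0
print(decide(0.20))              # fail to reject H0
print(decide(0.03, alpha=0.01))  # fail to reject H0
```

Note that the stricter alpha of .01 turns a result that would be significant at .05 into one that is not.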

4 It is assumed that there is no difference between the trees and shrubs; however,
it is also IMPLIED that the shrubs could be taller, because that would still not
support the alternative hypothesis.