Johan Smits
E-mail:
j.smits@koalapress.com
Marketing Research
with SPSS
A practical approach
Table of Contents
1. Starting with the Research Process
Case
1.1 The Marketing Research Process
1.2 Measurement in Marketing Research
1.2.1 Question-Response Formats
1.2.2 Scale Characteristics
1.2.3 Levels of Measurement
1.3 Coding Data and the Data Code Book
9. Appendix
9.1 Adjustment of SPSS Settings
9.2 SPSS Distribution of Saxion
9.3 How to Create and Customise a Chart in EXCEL
Case
Suxes is the market leader on the Dutch catering market. The company is successful
in several market segments. For university campus restaurants it has developed a
concept that is appreciated by many students and lecturers. Main points are the
wishes and needs of the university and those of the users (students and lecturers).
The formula used by Suxes can be characterised as 'value for money' in an attractive
campus restaurant.
The Pandion University is situated in the eastern part of the Netherlands. For two
years it has been housed in a new building. All courses are given in this new building
with a large and brand-new campus restaurant for both students and lecturers.
From the start, all catering was contracted to Suxes. At Pandion University, Suxes
offers, in addition to the basic assortment (sandwiches, donuts, soup, coffee, tea,
milk), also a luxury assortment. This luxury assortment consists of nutritious soups,
products from the salad bar, extra luxury sandwiches, etcetera.
Now, two years after the start of Suxes' catering activities at Pandion University, the
governing board of Pandion wants to do a customer satisfaction survey among the
students and employees.
Market research is needed when decision makers must make a decision but do
not have the information they need to make it. Recognising this need is the first
step. The next step (step 2), defining the problem, is the most important step
in the research process: a firm may spend literally hundreds of thousands of dollars on
market research but, if it has not correctly identified the problem, those dollars will be
wasted. Often a form of exploratory research is needed to identify the problem clearly
so that proper research can be conducted. Research objectives are then set; although
related to and determined by the problem definition, they are formulated so that, when
achieved, they provide the information necessary to solve the problem. Some of the research
objectives can be met by means of desk research or qualitative field research such as
observation studies, depth interviews or focus group discussions. Other research
questions must be answered by means of quantitative field research. This is done by
constructing a questionnaire, collecting the data and analysing them. By using tables,
graphs and statistical tests, the research objectives can be answered.
Marketing Research with SPSS 18 vs 54.docx 16/03/2011
J. Smits, Saxion Hogeschool Enschede, June 2010 page 1
For the customer satisfaction survey in this case, we have to use a questionnaire to
gather information from the employees and students of Pandion University. This
questionnaire is based on the following problem definition and research objectives.
Research Objectives
(1) How many days per week does one use the restaurant?
(3) How does one assess the range of choice in the basic assortment and in the
luxury assortment?
(4) How does one assess the customer service offered by the staff?
(6) Is there a need for products which are not in the assortment at this moment?
(7) Who is the customer (gender, student or lecturer, number of study years)?
(8) What is the overall level of satisfaction of the catering services expressed as a
score on a scale from 1 up to 10?
(11) Is there a difference between students and lecturers in the amount spent?
(13) Is there a relationship between the score given by customers and the amount
spent?
(14)
For the field research a questionnaire is constructed on the basis of the research
questions. Generally speaking, manual processing of the survey results is not an
option. In general the quantity of data will be such that we need statistical software: SPSS.
Nowadays questionnaires are increasingly distributed via the Internet. There are
software products with which you can run web surveys. You can construct the
questionnaire with this software, put it online and invite people via email. During the
research, you can monitor the results daily, or even continuously. At the end you
can export all data to SPSS.
As an example, we will outline this process by means of the following questionnaire, which
the researchers have drawn up for the Suxes Customer Satisfaction Survey at
Pandion University.
5. Your satisfaction with the customer service of the staff of Suxes is ...
O Excellent
O Good
O Bad
O Very bad
7. How do you rate the university caterer Suxes on a scale from 1 to 10?
8. I would like to have the assortment extended with the following products:
................................................................................................................
10. How many years have you been registered as a student at Pandion University?
year(s)
Figure 1.1 The questionnaire for the Suxes Customer Satisfaction Survey for the students
and lecturers of Pandion University
Closed-Ended
Open-Ended with numerical response
Open-Ended with text response
Multiple response questions
We will discuss these different formats in the next sections.
5. Your satisfaction with the customer service of the staff of Suxes is ...
O Excellent
O Good
O Bad
O Very bad
By using these codes the data entry is limited to the entry of codes (1, 2, 3, 4) instead of
the entry of the literal answers (Excellent, Good, Bad, Very bad). We prefer to use
numbers because numbers are easier and faster to keystroke into a computer file.
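As an illustration outside SPSS, the coding of a closed-ended question can be sketched in a few lines of Python. The codes 1 to 4 follow the text above; the function name and the sample answers are assumptions made only for this example.

```python
# Codes for question 5, in the order of the answer options (1, 2, 3, 4 as in the text).
SERVICE_CODES = {"Excellent": 1, "Good": 2, "Bad": 3, "Very bad": 4}


def code_answer(answer, codes=SERVICE_CODES):
    """Return the numeric code for a literal answer, or None if it is not a listed option."""
    return codes.get(answer.strip())


# Hypothetical answers taken from four completed questionnaires.
answers = ["Good", "Excellent", "Very bad", "Good"]
coded = [code_answer(a) for a in answers]
print(coded)
```

Only the codes are typed in during data entry; the dictionary plays the role of the value labels that are defined later in SPSS.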
We can transform this type of question into one variable in SPSS. There is no need to code the
answers, because the response is already numerical and meaningful in itself.
10. How many years have you been registered as a student at Pandion University?
year(s)
8. I would like to have the assortment extended with the following products:
................................................................................................................
There are two ways to transform the question and the responses into a variable in SPSS:
1. We can take over the answer literally and type in the text. For this we have to
define a text variable in SPSS. We can use an extra module of SPSS, called SPSS
Text Analysis for Surveys, to process the data and count key words. This is
not dealt with in this book.
2. We can define a numerical variable in SPSS and code the responses afterwards. If
the number of answers is limited, this method is to be recommended. During
the data input you maintain a list of answers and their codes. This is the way we
have dealt with question 8 of the questionnaire.
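The second method, coding the text answers afterwards while maintaining a list of answers and their codes, can be sketched as follows. This is a minimal sketch, not the procedure used for the Suxes data: the function name, the normalisation (trimming and lower-casing) and the sample answers are assumptions.

```python
def build_code_list(responses):
    """Give each new answer the next free code; return (codes, code_list)."""
    code_list = {}  # answer text -> numeric code, in order of first appearance
    codes = []
    for text in responses:
        key = text.strip().lower()  # treat "Pizza" and "pizza " as the same answer
        if not key:
            codes.append(None)  # question left open: no code
            continue
        if key not in code_list:
            code_list[key] = len(code_list) + 1
        codes.append(code_list[key])
    return codes, code_list


# Hypothetical answers to question 8.
responses = ["Pizza", "fresh fruit", "pizza ", "", "Smoothies"]
codes, code_list = build_code_list(responses)
print(codes)
print(code_list)
```

The resulting code list can later be entered as value labels of the numerical variable.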
For each response category the respondent can only answer with yes or no. So each
response category transforms into a variable in SPSS. Of course, you are free to choose
your codes, but the standard approach is to have each response category option coded
with a 0 or a 1. The designation 0 will be used if the category is not checked, whereas
a 1 is used if it is checked by a respondent. Thus question 6 of the questionnaire is
processed in SPSS by defining nine variables each having a dichotomous response
structure: 1=yes, 0=no.
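The transformation of one multiple response question into several 0/1 variables can be illustrated with a short Python sketch. The three response options below are hypothetical stand-ins (question 6 itself has nine options, which are not reproduced here), and the function name is an assumption.

```python
# Hypothetical response options; question 6 of the survey has nine of them.
OPTIONS = ["sandwiches", "soup", "salad bar"]


def to_dichotomies(checked, options=OPTIONS):
    """Return one 0/1 value per response option: 1 = checked, 0 = not checked."""
    checked_set = set(checked)
    return {option: 1 if option in checked_set else 0 for option in options}


# A respondent who ticked two of the three boxes.
row = to_dichotomies(["soup", "salad bar"])
print(row)
```

Each key of the result corresponds to one dichotomous variable in the data file.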
Description
Description refers to the use of a unique descriptor, or label, to represent each
designation on the scale. For instance, yes and no, agree and disagree and the
number of years of a respondent's age are descriptors on a simple scale. All scales
include description in the form of characteristic labels that identify what is being
measured.
Order
Order refers to the relative sizes of the descriptors. Here, the key word is relative and
includes such descriptors as greater than, less than, and equal to. A respondent's
least-preferred brand is less than his or her most-preferred brand and respondents
who check the same income category are the same (equal to). Not all scales possess
order characteristics. For instance, is a buyer greater than or less than a nonbuyer?
We have no way of making a relative size distinction.
Distance
A scale has the characteristic of distance when absolute differences between the
descriptors are known and may be expressed in units. The respondent who purchases
three bottles of diet cola buys two more than the one who purchases only one bottle; a
three-car family owns one more automobile than a two-car family. Note that when the
characteristic of distance exists, we are also given order. We know not only that the
three-car family has more than the number of cars of the two-car family, but we also
know the distance between the two (one car).
Origin
A scale is said to have the characteristic of origin if there is a unique beginning or
true zero point for the scale. Thus, 0 is the
origin for an age scale just as it is for the number of miles travelled to the store or for
the number of bottles of soda consumed. Not all scales have a true zero point for the
property they are measuring. In fact, many scales used by market researchers have
arbitrary neutral points, but they do not possess origins. For instance, when a
respondent says "No opinion" to the question "Do you agree or disagree with the
statement 'The Lexus is the best car on the road today'?", we cannot say that the person
has a true zero level of agreement.
Perhaps you noticed that each scaling characteristic builds on the previous one. That
is, description is the most basic and is present on every scale. If a scale has order, it
also possesses description. If a scale has distance, it also possesses order and
description, and if a scale has origin, it also has distance, order, and description. In
other words, if a scale has a higher-level characteristic, it also has all lower-level
characteristics. But the opposite is not true, as is explained in the next section.
Table 1.1 also introduces two new concepts: categorical versus metric scales. A
categorical scale is one that is typically composed of a small number of distinct values
or categories such as male versus female, or married versus single versus
widowed. As you can see in the table, there are two categorical scale types: nominal
and ordinal. These will be described in detail in this section. The other concept is a
metric scale, which is composed of numbers or labels that have an underlying
measurement continuum. There are two metric scales that are also described in this
section, and they are interval and ratio scales.
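The way the scale characteristics build on each other, and how they lead to the four scale types, can be made explicit in a small Python sketch. The mapping used here is the customary one (description only: nominal; plus order: ordinal; plus distance: interval; plus origin: ratio) and is an assumption in so far as Table 1.1 is not reproduced in this text; the function names are hypothetical.

```python
def level_of_measurement(order, distance, origin):
    """Derive the scale type from its characteristics; description is always present."""
    # A higher-level characteristic implies all lower-level ones.
    if (origin and not distance) or (distance and not order):
        raise ValueError("a higher-level characteristic requires all lower-level ones")
    if origin:
        return "ratio"
    if distance:
        return "interval"
    if order:
        return "ordinal"
    return "nominal"


def is_metric(level):
    """Interval and ratio scales are metric; nominal and ordinal scales are categorical."""
    return level in ("interval", "ratio")


print(level_of_measurement(order=True, distance=False, origin=False))
```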
When you are interpreting the data by means of statistical analyses you have to come
up with answers to the research questions. The statistical method you can use depends
on the type of question in the questionnaire. To be more precise: the level of
measurement determines the statistical method to be used.
Just a small example to make this clear. An elementary data summarizing method is
calculating average values, e.g. computing mean values for variables. But it should not
surprise you that for some variables this is nonsense and for other variables this is
meaningful. It is no use talking about the mean Gender, but on the other hand, for
Expenditure we can calculate a meaningful mean value. This is clearly related to the
level of measurement of the variable involved. As said, we distinguish four levels of
measurement.
Your course at Pandion (Marketing, Int. Business and languages, Int. Business
administration, Management and law, Health studies, Security);
Smoker or non smoker (yes, no);
Choice of a supermarket (A&P, Wal-Mart, Sears, Aldi, other).
For variables with a nominal scale there are hardly any calculations available. You
cannot compute an average (mean) value. Calculating the median makes no sense
either. The only statistical activity is counting the frequencies. You might wonder how
SPSS is able to calculate a mean value for the variable gender. That is done on the basis
of the numbers (the codes) used for the values male and female. But the calculated
value of the mean is meaningless. Interpretation of the calculations done by SPSS is a
human activity.
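The point can be reproduced outside SPSS. In the Python sketch below the codes 1 = male and 2 = female, as well as the five observations, are assumptions chosen for illustration. The software computes a mean from the codes without complaint, but only the frequency count can be interpreted.

```python
from collections import Counter
from statistics import mean

# Hypothetical coding of a nominal variable.
GENDER_LABELS = {1: "male", 2: "female"}
gender = [1, 2, 2, 1, 2]

# Technically possible, because the codes are numbers, but without meaning.
meaningless_mean = mean(gender)

# The only sensible summary for a nominal variable: count the frequencies.
frequencies = Counter(GENDER_LABELS[code] for code in gender)

print(meaningless_mean)
print(frequencies)
```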
Ordinal Scales
Ordinal scales permit the researcher to rank-order the respondents or their responses.
For instance, if the respondent was asked to indicate his or her first, second, third, and
fourth choices of brands, the results would be ordinally scaled.
Similarly, if one respondent checked the category Buy every week or more often on a
purchase-frequency scale and another checked the category Buy once per month or
less, the result would be an ordinal measurement. Ordinal scales indicate only relative
size differences among objects. They possess description and order, but we do not
know how far apart the descriptors are on the scale because ordinal scales do not
possess distance or origin. Examples of ordinal-scaled questions are:
Please rank each brand in terms of your preference. Place a 1 by your first choice,
a 2 by your second choice, and so on.
__ Sony
__ Zenith
__ Philips
__ BasF
__ Grundig
Indicate your degree of agreement with the following statements by circling the
appropriate number.
                                 Strongly                    Strongly
Statement                        disagree                    agree
a. I always look for bargains        1      2      3      4      5
b. I enjoy being outdoors            1      2      3      4      5
c. I love to cook                    1      2      3      4      5
Please rate the Pontiac Firebird by checking the line that best corresponds to your
evaluation of each item listed.
Slow pickup ___ ___ ___ ___ Fast pickup
Good design ___ ___ ___ ___ Bad design
Low price ___ ___ ___ ___ High price
Ratio Scales
Ratio scales are ones in which a true zero origin exists, such as an actual number of
purchases in a certain time period, dollars spent, miles travelled, number of children,
or years of college education. This characteristic allows us to construct ratios when
comparing results of the measurement. One person may spend twice as much as
another or travel one-third as far. Such ratios are inappropriate for interval scales, so
we are not allowed to say that one store was one-half as friendly as another. Examples
of ratio-scaled questions are:
Approximately how many times in the last month have you purchased anything
over $5 in value at a convenience store?
0 1 2 3 4 5 More (specify: ___)
How much do you think a typical purchaser of a $100,000 term life insurance
policy pays per year for that policy?
$ ____
What is the probability that you will use a lawyer's services when you are ready to
make a will?
___ percent
Summary:
The level of measurement of a variable is determined by the way in which it is
measured. You have to take into account the possible responses. We distinguish
categorical scales (nominal and ordinal) and metric scales (interval and ratio). The
metric scale is denoted as Scale by SPSS.
So the data code book is a list of transformations of questions into variables, their
(variable) labels, the codes of the answers with their (value) labels, and the level of
measurement. Recall that we discussed in Section 1.2 that each question corresponds
with one variable except multiple response questions where we need as many variables
as there are response options.
Choose a short name for a variable in SPSS. The first character of the name must
be a letter; letters and numbers are allowed for the other characters. The
use of symbols like @, #, $ and _ is also allowed. Spaces are not allowed, and
we strongly advise against the use of a point or comma.
The name of the variable entered in SPSS is often a concise reproduction of the
characteristic measured in the questionnaire. This name of the variable is
extended with a (variable) label to provide SPSS with a full description. The
variable label is used as a title in tables and graphs. So this label must be very
clear and meaningful.
The codes you use for entering the data in SPSS must also be provided with labels.
SPSS will use these value labels in tables and graphs as well. So it is very important
to use clear and concise descriptions for value labels.
The level of measurement of a variable is especially important in the data
analysis phase. As we have explained, each statistical analysis is restricted to
variables with the required level of measurement. When displaying variable lists
within dialogs of statistical procedures, SPSS takes the level of measurement into
account.
In Figure 1.2 we have constructed the data code book for the Pandion survey.
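To make the structure of a data code book concrete, the sketch below holds a few entries in Python and checks the naming guidelines given above. It is an illustration only: the entries are not a copy of Figure 1.2, the variable name Service and its label are hypothetical, and the name check follows the advice of this section (so it also rejects a point or comma).

```python
import re
from dataclasses import dataclass, field

# First character a letter; then letters, digits or the symbols @, #, $ and _.
# Spaces, points and commas are rejected, following the guidelines in the text.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9@#$_]*")


def is_valid_name(name):
    """Check a variable name against the naming guidelines of this section."""
    return NAME_PATTERN.fullmatch(name) is not None


@dataclass
class Variable:
    name: str      # short variable name
    label: str     # variable label, used as a title in tables and graphs
    measure: str   # "Nominal", "Ordinal" or "Scale"
    value_labels: dict = field(default_factory=dict)  # code -> value label


# Illustrative entries; "Service" and its label are hypothetical.
codebook = [
    Variable("Service", "Satisfaction with customer service", "Ordinal",
             {1: "Excellent", 2: "Good", 3: "Bad", 4: "Very bad"}),
    Variable("Mark", "Score for the catering service of Suxes", "Scale"),
    Variable("YearStud", "Years registered as a student", "Scale"),
]

for variable in codebook:
    print(variable.name, is_valid_name(variable.name), variable.measure)
```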
1. Start SPSS.
If your version of SPSS starts with a dialog "What would you like to do?", select the
option box "Type in data" (halfway through the dialog) and also select the option "Don't
show this in the future" (at the bottom of the dialog) to prevent this dialog from
reappearing.
After clicking OK SPSS will show you the Data Editor window.
This window has a certain resemblance to EXCEL, a worksheet with columns (variables)
and rows (the data of each respondent). This sheet is called Data View. SPSS data files
are organized by cases (rows) and variables (columns). In our data files, cases
represent individual respondents to a survey. Variables represent questions asked in
the survey.
We will use the second sheet Variable View to enter the data code book. On this
sheet we can define the variables.
We will give a short description of the columns. The bold entries refer to the columns
of the data code book, which we discussed in Chapter 1 (see Figure 1.2).
Name      Stores the name of the variable (must start with a letter).
Type      Sets the data type for a variable. Most often you will use Numeric.
          Other data types will be discussed in Section 2.3.
Width     Specifies the maximum number of characters for a variable value.
          Leave this at 8.
Decimals  Sets the number of decimal places for a numeric value.
          Change this to 0, unless you want to use decimals.
Label     Stores the description used by SPSS to identify the variable in output.
Values    Sets the labels for the coded values of a categorical variable.
          In the Data View sheet you can tell SPSS to display these value labels by
          selecting View > Value Labels in the menu. (See also Section 2.5.1)
Missing   Specifies whether the data set contains missing values and, if so,
          how they are coded.
Columns   Sets the width of the column of the variable on the screen.
Align     Sets the alignment of the column (only on screen).
Measure   Specifies the level of measurement (see Section 1.2.3). Metric variables are
          denoted as Scale by SPSS.
Role      Specifies the role of the variable for advanced models. This originates from
          the software programme Clementine (PASW MODELLER).
Value labels can only be used with coded variables. The other columns of the Variable
View are used only if necessary.
Note If you are defining categorical variables, you should define them as
numerically coded variables and then establish the meaning of
those codes in the Values column. You should not define such
variables as string variables!
4. Fill in the Variable View. Refer to the data code book of Figure 1.2. The first
part of the window looks like this:
In order to enter the value labels, click in the cell which displays None. At that
moment a button with three dots appears. Click this button.
Use your mouse (or the Tab key) to proceed to the next input box. Finally, click
the OK button to leave the dialog; otherwise all your work will be lost.
6. Process all variables of the data code book in the Variable View.
Tip: Adjust the column width in order to fit all columns on the screen.
7. Do not forget to enter the correct level of measurement in the last column.
Please remember the three levels Nominal, Ordinal and Scale (which combines
the levels Interval and Ratio).
8. Switch to the Data View. Here you see the names of the variables appear as the
column names.
Note An SPSS data file must always have the extension .sav.
Click with the right mouse button on the row heading and choose the option Insert
Variables.
Click with the left mouse button on the row heading and press the Delete button
of your keyboard.
Or:
Right click the row heading of the variable and select the option Clear.
Among the other data types you will need to use there are Date and String. For date or
time related variables it is wise to use the Date data type, because you can use the date
and time functions of SPSS to make calculations. Examples of these kinds of
calculations are the computation of the age of a customer on the basis of the date of
birth or the number of days since the last visit of the customer on the basis of last
visiting date.
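The two calculations mentioned above can be illustrated with Python's standard datetime module. This is only a sketch of the arithmetic, not of the SPSS date and time functions; the function names and the dates are assumptions.

```python
from datetime import date


def age_in_years(birth, today):
    """Completed years between the date of birth and today."""
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    return today.year - birth.year - (0 if had_birthday else 1)


def days_since(last_visit, today):
    """Number of days between the last visit and today."""
    return (today - last_visit).days


# Hypothetical reference date and customer data.
today = date(2011, 3, 16)
print(age_in_years(date(1990, 6, 1), today))
print(days_since(date(2011, 3, 1), today))
```

Such calculations are only possible when the values are stored as dates, which is why a date variable is preferable to a text variable.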
If you want to input text you will need the data type String.
On the tab Variable View in the Data Editor you can choose another data type via the
corresponding button.
Work carefully in order to prevent making errors with the data entry. After entering
the respondent number, write it down on the questionnaire also, in order to be able to
locate this form if needed at a later stage. If you have activated the Value labels you
can use option lists to facilitate the input process. But if you prefer the codes over the
labels you can deactivate this option by clicking the same button again.
And now the good news is that your lecturer has entered the data for you already. The
data is available for you in the file Suxes Catering Services.xls and the only thing left
to do is a copy and paste action.
2. Switch to SPSS and paste the data via the menu Edit > Paste. Please note that
the Data View tab must be active. Check whether the cursor is in the top left
cell of the sheet.
3. After pasting the data you have to save the file (again).
From the menus, choose: File > Save or use the corresponding icon of the
toolbar.
1. While you have the Data View of the Data Editor on screen, use the Value labels
button of the toolbar. Click one more time to see how this button toggles the
display. In this view check the column of your sheet.
1. From the menus, choose File > Display Data File Information > Working File.
If this is your first time working with SPSS output it is worthwhile to spend a few
minutes to become familiar with the structure of the SPSS output window. The results
from running a statistical procedure are displayed in the Viewer. The output produced
can be statistical tables, charts, graphs, or text, depending on the choices you make
when you run the procedure.
The Viewer window is divided into two parts. The outline pane (on the left side)
contains an outline of all information stored in the Viewer. The contents pane (on the
right) contains statistical tables, charts, and text output.
Use the scroll bars to navigate through the window's content, both vertically and
horizontally. For easier navigation, click an item in the outline pane to display it in the
contents pane. If you think that there is not enough room in the Viewer to see an entire
table or that the outline view is too narrow, you can easily resize the window.
The results from most statistical procedures are displayed in pivot tables. In the next
chapter we will discuss how to edit a pivot table.
The output of the FILE INFORMATION procedure has a number of components: Title
(which contains the title of the block), Notes (containing the creation date, name of the
data file, etcetera), Active Dataset (a text output block with the full path and name of
the data file), Statistics (a table with the number of valid and missing observations for
each variable), and Frequency Table (which contains the frequency tables). The
content of each block is shown on the right side in the contents pane.
2. Check the file information by comparing your SPSS output with the codebook
from Figure 1.2.
3. Scroll downwards and check whether all variables have the correct value labels.
5. Select all variables except the respondent number and put them into the right
pane. You can select the variables most conveniently by clicking on the variable
Visits and, while holding the Shift key, clicking on the variable Gender.
At the second tab of the dialog, Output, you can select which variable and file
information you want SPSS to display. The tab Statistics lets you choose how
the data are summarized on the basis of the measurement level of the variable.
You do not need to change these settings right now.
6. By clicking the button OK, an overview of all selected variables will be produced.
In the output file you can see a new branch added to the tree structure. This new
branch contains the name of the command as top label and each variable has its own
entry. Every leaf on the left (outline pane) corresponds to a table on the right side
(contents pane) of the window. Please note that variables with a ratio level of
measurement have a different table (see Figure 2.4) than the nominal and ordinal
variables (see Figure 2.5). For scale variables (this is ratio!) the table contains the
mean and standard deviation, while tables with nominal and ordinal variables contain
all observations and their frequencies and percentages.
Inspect the tables in the output file. You will discover that two frequency tables
contain an error.
7. Repair the errors in the data editor. One error is clear, because the 11 must
of course be a 1; the other should be a male. Use the search button to find these
values in the data editor.
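The kind of check performed here by eye, comparing the observed values with the codes of the data code book, can also be sketched in Python. The column below and its valid codes 1 to 4 are hypothetical; the function name is an assumption.

```python
def find_invalid(values, valid_codes):
    """Return (row number, value) pairs for values that are not valid codes.

    Row numbers start at 1, as in the Data View; None marks a missing value.
    """
    return [
        (row, value)
        for row, value in enumerate(values, start=1)
        if value is not None and value not in valid_codes
    ]


# Hypothetical column with one typing error (11 instead of 1).
column = [2, 1, 11, 4, None, 3]
errors = find_invalid(column, valid_codes={1, 2, 3, 4})
print(errors)
```

The row numbers point to the cases that have to be looked up in the questionnaires and corrected.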
9. Remove the CODEBOOK tables (by selecting and deleting the whole output block).
Make new tables and check whether they are correct now.
10. Save the output file in the folder SPSS Basic Course also. Name it Suxes Survey
Output 1 and notice that SPSS adds the extension .spv.
Note An SPSS output file always has the extension .spv. In version 15 and
older, SPSS output files have the extension .spo.
Unfortunately CODEBOOK tables are not meant to be published in this raw format. They
can only be published after some editing. In the next chapter we
will discuss how to create tables for a publication.
11. Exit SPSS. Close the output file and the data file. SPSS displays an alert to warn
you.
3.1 Introduction
In this chapter we will discuss analyses which deal with only one variable at a time.
This means that we will analyse the responses of one question of the questionnaire
without taking into account the answers to other questions. Examples of research
objectives with respect to one variable in the Suxes Survey are (see also Section 1.1):
(1) How many days per week does one use the restaurant?
(4) How does one assess the customer service offered by the staff?
To answer these questions you have to make a choice from the available statistical
analyses. Firstly the level of measurement of the variable determines this choice. The
research objectives (1) and (2) are related to ratio scaled variables. You can come to an
answer by calculating the mean value. Research question (4) is related to an ordinal
scaled variable and (7) to a nominal scaled variable. Calculating a mean value to
answer these questions makes no sense of course.
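How the level of measurement steers the choice of a summary measure can be sketched in Python. The mean for scale variables follows the text; using the median for ordinal variables and the mode for nominal variables is a common convention and an assumption here, as are the function name and the sample data.

```python
from statistics import mean, median, mode


def summarise(values, measure):
    """Pick a central-tendency measure that suits the level of measurement."""
    if measure == "Scale":      # interval or ratio: the mean is meaningful
        return ("mean", mean(values))
    if measure == "Ordinal":    # order only: use the middle value (assumed convention)
        return ("median", median(values))
    if measure == "Nominal":    # labels only: use the most frequent code (assumed convention)
        return ("mode", mode(values))
    raise ValueError(f"unknown level of measurement: {measure}")


# Hypothetical data: days per week, a service rating code, a gender code.
print(summarise([2, 4, 3, 5, 1], "Scale"))
print(summarise([1, 2, 2, 3, 4], "Ordinal"))
print(summarise([1, 2, 2, 1, 2], "Nominal"))
```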
Table 3.1 summarizes how research questions with respect to one variable can be
analysed. There are many numerical descriptive measures available for scale
variables (SPSS uses Scale to denote the interval and ratio levels of measurement),
including:
Level of measurement
In this chapter we will discuss how to conduct these statistical analyses with SPSS.
Furthermore we will show you how to edit tables and graphs so that you can use them
in your publications, like a research report in WORD or a POWERPOINT presentation.
Section 3.5 will discuss how to document your SPSS output file in order to retrieve
tables, graphs and other output elements easily.
The last section concludes by discussing all research questions with respect to one
variable. For each research question we will show the analysis in SPSS and we will give
an appropriate conclusion.
1. Start SPSS and open the data file Suxes Survey.sav which you have created and
saved in the previous chapter.
2. From the menus, choose: File > Open > Output and open the output file Suxes
Survey Output 1.spv. This file was created in the previous chapter as well and
it contains the code book tables of all variables of the survey. In order to save this
file under a new name, from the menus choose File > Save As. Name your new file
Suxes Survey Chapter 3.spv.
Note Save all your output of this chapter in the output file Suxes Survey
Chapter 3.spv. This file is a large container in which all tables and
graphs are stored. At the end of this chapter you will learn how to
structure this file so that you can retrieve your work without effort.
We are going to create a frequency table for the variables Variety of products in the
basic assortment, Score for the catering service of Suxes and Number of years
registered as a student at Pandion University. These variables have the names
Variety_basic, Mark and YearStud.
3. If you prefer to use the names of the variables instead of their labels, please change
this setting of SPSS as is displayed in Section 9.1.
5. Select the three variables from the list and put them into the right pane. Hint: If
you hold down the Ctrl key, you can select several variables at once. The tables will
be produced after clicking the OK button.
As you can see in Figure 3.1 SPSS added a new branch to the existing tree structure. In
the viewer pane (right side) the frequency tables are displayed. SPSS starts by showing
a summary with the number of valid outcomes and the number of missing
observations for each variable. Please note that for ten respondents the variable
YearStud has no value.
6. Locate the frequency table of the variable YearStud in the viewer pane.
A rough table is displayed in Figure 3.2.
                    Frequency   Percent   Valid Percent   Cumulative Percent
Valid     1             10        20,0        25,0               25,0
          2             11        22,0        27,5               52,5
          3             10        20,0        25,0               77,5
          4              6        12,0        15,0               92,5
          5              3         6,0         7,5              100,0
          Total         40        80,0       100,0
Missing   System        10        20,0
Total                   50       100,0
Figure 3.2
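To see how the columns of this table relate to each other, the figures can be recomputed in Python. The sketch below is not how SPSS works internally: the function name is an assumption, and the data are simply reconstructed from the frequencies shown in the table (40 valid answers and 10 missing values).

```python
from collections import Counter


def frequency_table(values):
    """Return rows of (value, frequency, percent, valid percent, cumulative percent)."""
    n_total = len(values)                        # all respondents, including missing
    valid = [v for v in values if v is not None]
    n_valid = len(valid)                         # respondents who gave an answer
    counts = Counter(valid)
    rows = []
    cumulative = 0.0
    for value in sorted(counts):
        freq = counts[value]
        percent = 100 * freq / n_total           # share of all respondents
        valid_percent = 100 * freq / n_valid     # share of valid answers only
        cumulative += valid_percent              # running total of the valid percent
        rows.append((value, freq, round(percent, 1),
                     round(valid_percent, 1), round(cumulative, 1)))
    return rows


# YearStud reconstructed from the frequencies in the table; None = missing.
year_stud = [1] * 10 + [2] * 11 + [3] * 10 + [4] * 6 + [5] * 3 + [None] * 10
table = frequency_table(year_stud)
for row in table:
    print(row)
```

Percent is based on all 50 respondents, whereas Valid Percent and Cumulative Percent are based on the 40 respondents who answered the question.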
The format of your tables is a critical part of providing clear, concise and meaningful
results. If your table is difficult to read, the information contained within that table
may not be easily understood. It is clear that there is a need for some adjustments,
like:
7. Double-click on the frequency table to start the editor. As you can see there are
some changes in the interface: a notched edge around the table and some menu
entries have changed. Moreover the Formatting toolbar has appeared.
Figure 3.3 The output window with an active editor and the formatting toolbar
Note If the formatting toolbar does not show up, use the menu View →
Toolbar.
We are going to hide two columns from the table. There are two ways to hide a
column: either you drag the right border of the column or you select the whole column
and choose Hide Category from the popup context menu.
8. With the left mouse button drag the right border of the column Percent to the left,
as far as possible. While dragging, the actual width of the column is displayed until
SPSS shows the message Hide. At that very moment you release the left mouse
button and the column will disappear.
Note If things go wrong and you happen to destroy the table, please do
not panic. It is very easy to recreate the table by running the
FREQUENCIES procedure again. However, there is also an Undo entry
in the Edit menu. Moreover, the first button of the toolbar gives
you that function as well. Apologies to heavy users: there is no
Ctrl+Z keystroke available.
9. Ctrl-Alt-click on the Cumulative Percent column to select all of the cells in that
column. Right-click the highlighted column and choose Hide Category from the
pop-up context menu. This column is now hidden also.
Now we change the display format of the percentages in the pivot table.
11. Select the second tab Format Values. Select ##,#% from the Format list.
Type 0 in the Decimals field to hide all decimals in this column.
Click OK to apply your changes.
13. Switch to the tab Borders and point with your mouse at the line of interest. In the
Border list, the option Horizontal category border (rows) is selected
automatically. Select the appropriate line style: the dashed line.
17. Now we have finished. Click outside the notched edge to close the pivot table
editor. The final result is shown in Figure 3.4.
18. Customize the frequency tables of the variables Variety Basic Assortment and
Mark in the same way.
3.3.1. FREQUENCIES
We will discuss some features of the FREQUENCIES procedure by using it for the
variable Mark.
2. Click Statistics. Select Quartiles, Mean, Median, Mode, Std. deviation, Minimum,
and Maximum.
4. Deselect Display frequency tables in the main dialog box. (Often frequency tables
are not useful for scale variables since there may be almost as many distinct values
as there are cases in the data file).
3.3.2. DESCRIPTIVES
The procedure DESCRIPTIVES also calculates statistics. If you deal with several variables
at the same time, its output is more convenient than that of FREQUENCIES. The other
differences from the procedure FREQUENCIES are minor. DESCRIPTIVES provides the
calculation of z-scores, whereas FREQUENCIES provides the option to produce a bar
chart, pie chart or histogram.
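A z-score expresses a value as its distance from the mean, measured in standard deviations. As a minimal sketch in Python (not SPSS syntax), the snippet below standardises the YearStud values of Figure 3.2. It assumes the sample standard deviation (divisor n - 1); the helper name z_scores is our own.

```python
from statistics import mean, stdev

# YearStud values rebuilt from the counts in Figure 3.2 (40 valid cases).
counts = {1: 10, 2: 11, 3: 10, 4: 6, 5: 3}
values = [value for value, freq in counts.items() for _ in range(freq)]


def z_scores(data):
    """Standardise data: subtract the mean, divide by the sample standard deviation."""
    m = mean(data)
    s = stdev(data)  # sample standard deviation, divisor n - 1 (assumption)
    return [(x - m) / s for x in data]


z = z_scores(values)
print(round(mean(values), 3))            # mean of the raw values
print(round(z[0], 2), round(z[-1], 2))   # z-scores of the lowest and highest value
```

After standardising, the new variable has mean 0 and standard deviation 1, so values of different variables can be compared on the same scale.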
3. Save the output file again. You always have to save after producing new output.
This is so obvious that we will not repeat this anymore.
We will use the CHART BUILDER wizard to create graphs, because this interface
facilitates your building process and shows you a preview of the graph. It is important
to know that SPSS can produce a good graph only if the measurement levels of your
variables have been set correctly. To emphasise this, SPSS will pop up a warning when
you start the chart builder wizard and invite you to check the measurement levels.
Since we have set all measurement levels for each variable and all our categorical
variables have value labels we can proceed by clicking OK.
[Figure callouts: The preview pane of the chart. The icon of a pie chart.]
The large dialog of the CHART BUILDER wizard will be shown. Please note that if you
select a categorical variable in the list, e.g. Staff, SPSS displays the value labels. The
preview displays these labels also, but not the real data. You can build your graph by
dragging the elements into the preview pane.
2. In the lower part of the dialog CHART BUILDER (Figure 3.8) at the tab Gallery,
select the option Pie/Polar. Drag the Pie chart icon into the preview pane.
The dialog Element Properties will show up, but we will discuss this dialog later. Right
now, we will concentrate on the preview pane.
3. From the list Variables drag the variable Staff and drop it in the box Slice by?
which is right below the pie.
In the preview you will see that SPSS replaces the vertical box with Count to let you
know that the chart will be based on the counts of each category of the variable Staff.
Two elements are added to the preview of the pie chart and these two elements have
entered the option list Edit Properties in the dialog Element Properties also. In this
dialog you can set the properties of the elements of your chart.
5. If this dialog is not visible, use the button Element Properties on the dialog of
the CHART BUILDER wizard to display it.
6. Enter today's date and your name and class right after the copyright symbol
(press Alt+0169) in the content box of Footnote 1. Click Apply to confirm.
7. Select Title 1 from the list and enter Rating customer service.
Again, click Apply to confirm.
Note It is important to place the title (and footnote) in the graph itself. If
the graph is copied to WORD (or exported to another software
package) only the graphical data is included. So the graphical file
must contain all the information.
8. Click OK to finish. SPSS will create the pie chart for you.
1. Select the graph in the Output Viewer and double click the pie chart to open it in
the Chart Editor.
The Chart Editor will open the chart in a new window. As long as the chart is in
progress in the Chart Editor you see that it is shaded in the Output Viewer. The graph
is object oriented, which means that it consists of elements with their properties. You
can set the properties of each element of the graph after selecting it. The collection of
elements contains the graph as a whole, the interior part of the graph, the collection of
slices, each slice itself, the titles, the labels and so on.
3. Right-click on the pie and choose the option Show Data Labels.
The dialog Properties appears. In this dialog you can change the properties of the
elements of the graph.
You can also open the dialog Properties with the button on the
toolbar.
[Figure callouts: Hide labels (= move downwards). Display labels (= move upwards).
Position of labels outside the pie.]
4. Select the Number Format tab. You do not want the labels to display decimal
places, so type 0 in the Decimal Places text box.
6. Select the Data Value Labels tab. In the Displayed list you see Percent and in
the Not Displayed list the value labels of the variable Staff. Those labels are
hidden in the chart. Since we want to display these labels in the chart, add them
to the box Displayed.
7. Choose the option Custom at Label Position in order to place the label outside
the slice (left button). At Display Options, check the option Display connecting
lines to label. Click Apply to update the label properties.
Now the labels are outside the pie and can be positioned individually. However the
font size needs to be adjusted.
8. Select the Text Style tab and adjust the font size. Our preferred size is 9 which
makes the label size easily readable.
9. Select Footnote and change the font style to Italic.
10. Select Title and change the font family and enlarge to preferred size.
11. Since all information in the legend is in the chart itself, there is no need to have
a legend anymore. Hide the legend.
12. Change the colour of one slice and the border of another one. Click on the pie
and click again on a slice to select it. Use the Fill & Border tab to make the
changes.
13. When you are done, close the Chart Editor. The updated pie chart is shown in
the Viewer.
We will create a bar chart of the variable Variety of products in the basic assortment.
2. On the tab Gallery, select the option Bar and drag the icon Simple Bar into the
preview pane.
5. In the dialog Element Properties, enter the title Variety of the basic assortment
and today's date, your name and class as footnote. Do not forget to confirm by
clicking Apply.
6. Still in the dialog Element Properties, select the element Bar1 and edit the property
Statistic into Percentage and click Apply.
7. Click OK and SPSS will create the bar chart for you.
2. Select the title of the horizontal axis. After a (right-)click the Properties dialog
containing the properties of this object will appear.
3. From the dialog Properties select the Text Style tab. Choose Georgia from the Font
Family list, Style Italic, Size 12 and Colour Dark Blue. Confirm your choices by
clicking Apply.
[Figure callout: There are three types of justification available for an axis title.]
4. Select the Text Layout tab and choose a justification to the right (Justify).
5. The vertical axis also needs some adjustments. Click the button Y on the toolbar to
select the vertical axis.
7. On the Scale tab you can adjust the subdivisions on the vertical axis. So type 50 in
the Maximum text box and 5 in the Major Increment text box. Click Apply to see
the results.
[Figure callout: Distance between the ticks.]
9. You see that gridlines are added to the graph. Select the Lines tab and choose a
dashed Lines Style with a grey colour.
11. The bars: click on one of the bars to select them. From the Chart Editor menus,
choose Elements → Show Data Labels. (This can also be achieved with the
corresponding button on the toolbar.)
12. Now, look at the Properties dialog on the Data Value Labels tab. In the Label
Position panel, select Custom, Below Centre. Click Apply to confirm.
13. On the Number Format tab, type 0 in the Decimal Places text box and on the Text
Style tab, choose a Font Size of 9.
15. Adjust the title. Choose the font family Bookman Old Style, change it to bold and
size 18. (Another font family is fine as well.)
16. Finally, change the Footnote into Italic, 8 points and locate it at the bottom left in
the chart.
The graph has been very much improved and now it is ready to be published.
17. Close the Chart Editor and save your output file.
The boxplot provides a graphical representation of the data based on the five-number
summary, which consists of the smallest value, the first quartile (Q1), the median, the
third quartile (Q3) and the largest value.
The vertical line in the middle of the box represents the median. The vertical line at the
left side of the box represents the location of Q1 and the vertical line at the right side of
the box represents Q3. Thus the box contains the middle 50% of the observations in the
distribution. The lines outside the box contain the lower 25% and the upper 25% of the
observations, up to the outliers and extremes. Outliers and extremes are represented by
a circle or a star symbol.
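To make the five-number summary concrete, the Python sketch below (an illustration, not SPSS output) computes it for the weekly expenditures, using the counts from the frequency table of Expenditure shown later in this chapter. Two assumptions are ours: the quartiles are found by linear interpolation, which may differ slightly from the rule SPSS applies, and a value is flagged when it lies more than 1.5 box lengths outside the box.

```python
from statistics import quantiles

# Weekly expenditure: value -> frequency, copied from the frequency table of
# Expenditure shown later in this chapter (46 valid cases).
counts = {2: 4, 3: 3, 4: 3, 5: 10, 6: 2, 7: 1, 8: 3,
          10: 11, 12: 3, 15: 4, 18: 1, 20: 1}
data = sorted(v for v, f in counts.items() for _ in range(f))

# Quartiles by linear interpolation (assumption: SPSS may use another rule).
q1, median, q3 = quantiles(data, n=4)
iqr = q3 - q1  # the length of the box

# Assumed convention: flag values more than 1.5 box lengths outside the box.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
flagged = [x for x in data if x < lower_fence or x > upper_fence]

print(min(data), q1, median, q3, max(data))  # the five-number summary
print(flagged)                               # values drawn as separate symbols
```

With these data the sketch flags the values 18 and 20, which agrees with the two outliers drawn in the boxplot of Expenditure.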
In this section we will introduce the simple boxplot to you, a boxplot of the
expenditures per week in the restaurant. In Section 4.3 we will produce a boxplot to
compare the expenditures of students and lecturers.
3. Select the variable Expenditure from the list and drag it into the box X-axis? at the
left side of the graph.
At the tab Titles/Footnotes, select the options Title 1 and Footnote 1 again to add
your texts to the graph.
4. In the Element Properties dialog enter the title Expenditures in the restaurant on
a weekly basis and let the footer display the current date with your name and class
after the copyright symbol (as always). Do not forget to confirm by clicking the
Apply button.
The boxplot displays the distribution of the variable Expenditure. The two circles
represent two outliers and the numbers are the corresponding respondent numbers.
3. Now, we want to adjust the horizontal axis. Use the X button on the toolbar to
select it.
4. Select the Scale tab and change the Major Increment into 2. Click Apply.
Note: If your PC shows the units in inches, you can change this setting via
the menu Edit → Options. The tab General in this dialog has a
pane with an option list, which has centimeters as its third entry.
Please see Section 9.1 for further details about the settings of SPSS.
6. Since the chart title contains the same text as the axis label, we are going to change
the latter into Amount in € (the €-sign is inserted by hitting Alt+0128). Enlarge
the font size of the text and justify it to the right side of the axis. (You can also
change the font family etc., if you like.)
Now the chart has improved a lot and it looks much better.
For interval/ratio variables statistics like median, mean and standard deviation can be
computed as well. Use the CHART BUILDER wizard if you want to make the histogram
only. This graph will have a legend with the mean value, the standard deviation and
the number of observations. If you want to have a separate summary of the statistics
(as well), then you can use the histogram option of the FREQUENCIES command.
1. From the menus, choose Analyze → Descriptive Statistics → Frequencies and move
the variable Expenditure into the Variable(s) text box. (If this text box already
contains a variable, click Reset to cancel all previous choices.)
2. Click Statistics and select the options displayed in the next figure.
Do the same in the Charts dialog.
4. After that, click OK in the FREQUENCIES dialog to start the analysis. The first block of
the output contains the statistics. The histogram looks rather erratic. If you are
dealing with a rather large number of classes and relatively low frequencies, it is
better to adjust the classification.
4. Resize the graph so that it fills the whole width of the graph area behind it. (The
next figure shows which square to drag.)
[Figure callout: Dragging this square to the right enlarges the graph to fill the whole
canvas.]
5. Move the legend above the graph (upper right corner) and change the background
and the border colours.
Figure 3.12 The histogram transformed into a layout suitable for a publication
The File menu in the Chart Editor does not provide an option to save or export the
chart. It is however possible to save the layout as a template. This is very useful if you
have to produce a number of charts with the same layout. If you save the template SPSS
raises a dialog to select the layout elements to be saved.
[Figure callout: With this option you can save the layout, not the chart itself. This is
called a template.]
In the output viewer you can export graphs to a graphical file, such as a jpeg file. You
can find this in the Viewer menus under File → Export. There are, of course, many
other ways to export your work to different software applications. Browse this entry if
you are looking for a special format.
1. Decide how many categories are needed and what the boundaries must
be.
2. From the menus, choose: Transform → Recode into Different Variables. This
procedure creates a new variable with the classification.
3. The last step is entering appropriate value labels and the correct level of
measurement.
We will elaborate these three steps now.
                    Frequency   Percent   Valid Percent   Cumulative Percent
Valid     2              4         8,0         8,7               8,7
          3              3         6,0         6,5              15,2
          4              3         6,0         6,5              21,7
          5             10        20,0        21,7              43,5
          6              2         4,0         4,3              47,8
          7              1         2,0         2,2              50,0
          8              3         6,0         6,5              56,5
          10            11        22,0        23,9              80,4
          12             3         6,0         6,5              87,0
          15             4         8,0         8,7              95,7
          18             1         2,0         2,2              97,8
          20             1         2,0         2,2             100,0
          Total         46        92,0       100,0
Missing   System         4         8,0
Total                   50       100,0
[Figure callouts: 33% = upper bound of the first class = lower bound of the second class.
67% = upper bound of the second class = lower bound of the third class.]
With three classes or categories, each category should contain about 33% of the cases.
It is even more important that the boundaries are round numbers. Focus your attention
on the Cumulative Percent column and look up the 33% value: the upper boundary of the
first class is in the neighbourhood of 5. The upper boundary of the second class (67%)
will be around 10. So we have:
3. Select the variable Expenditure from the list and move it into the transformation
list. Create a new output variable Expenditure_categories and label Expenditure
in restaurant. Click the Change button to create this new variable and click Old
and New Values to define the classification.
4. Enter the classification and use the three Range options as explained. (Note:
through in SPSS means up to and including.)
Classification        Class     Old Value                              New Value
Lowest through 5      Class 1   Range, LOWEST through value: 5         1 and click Add
5 - 10                Class 2   Range: 5 through 10                    2 and click Add
10 through highest    Class 3   Range, value through HIGHEST: 10       3 and click Add
5. Continue and click OK in the main dialog. (If the OK button is disabled you probably
did not click the Change button to apply the new variable name and label.)
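The recoding defined in step 4 can be pictured as a short list of rules that is tried from top to bottom. The Python sketch below is only an illustration of that idea, not SPSS syntax; the names RULES and recode are our own. It assumes that when two ranges share a boundary the first matching rule wins, so an expenditure of exactly 5 ends up in class 1 and exactly 10 in class 2.

```python
from collections import Counter

# Weekly expenditure: value -> frequency, from the frequency table of Expenditure.
counts = {2: 4, 3: 3, 4: 3, 5: 10, 6: 2, 7: 1, 8: 3,
          10: 11, 12: 3, 15: 4, 18: 1, 20: 1}

# (lower bound, upper bound, new value); None stands for "lowest" or "highest".
RULES = [(None, 5, 1), (5, 10, 2), (10, None, 3)]


def recode(value, rules=RULES):
    """Return the new value of the first rule whose inclusive range contains value."""
    for low, high, new in rules:
        if (low is None or value >= low) and (high is None or value <= high):
            return new
    return None  # no rule matched


class_sizes = Counter()
for value, freq in counts.items():
    class_sizes[recode(value)] += freq

print(dict(class_sizes))  # number of respondents per class
```

For these data the three classes hold 20, 17 and 9 respondents, so the round boundaries 5 and 10 give classes that are only roughly equal in size.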
The new variable arrives in the Data Editor (in the Data View) at the right side. In the
Variable View, the new variable is at the last line. Our last step is to create Value
labels.
[Figure callout: Click here to enter the Value labels.]
7. Change (in the Variable View) the columns Decimals (into 0) and Measure (into
ordinal) of the new variable.
8. Save your data file again! Many changes have been made. (SPSS puts an asterisk in
the title bar before the file name to remind you that the file has not been saved.)
9. Finally, make a frequency table and a bar chart of the new variable
Expenditure_categories.
Customize the layout as shown in Figure 3.13 and Figure 3.14
Please note that you do not use the original variable! In the FREQUENCIES dialog
you will find the new variable at the last entry of the variable list.
1. From the menus, choose: Transform → Visual Binning. Drag and drop Expenditure
from the Variables list to the Variables to Bin list, and then click Continue.
2. In the main Visual Binning dialog, select Expenditure in the Scanned Variable List.
3. Enter Expenditure_Cat2 for the name of the new banded variable and Expenditure
(in €) for the variable label.
After that, SPSS comes up with the cutpoints 5 and 10. You can ask SPSS to make labels
(by clicking the button), but we prefer to do this ourselves.
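The cutpoints 5 and 10 can be checked against the cumulative percentages of Expenditure. The Python sketch below is a simplified illustration and not the algorithm SPSS uses internally: it takes as cutpoint the smallest value at which the cumulative share of valid cases reaches one third and two thirds. The function name cutpoints is our own.

```python
# Weekly expenditure: value -> frequency, from the frequency table of Expenditure.
counts = {2: 4, 3: 3, 4: 3, 5: 10, 6: 2, 7: 1, 8: 3,
          10: 11, 12: 3, 15: 4, 18: 1, 20: 1}


def cutpoints(counts, n_groups):
    """Smallest values at which the cumulative share reaches 1/n, 2/n, ... of the cases."""
    n_valid = sum(counts.values())
    targets = [k / n_groups for k in range(1, n_groups)]
    points = []
    running = 0
    for value in sorted(counts):
        running += counts[value]
        # One value can satisfy several targets when a category is very large.
        while targets and running / n_valid >= targets[0]:
            points.append(value)
            targets.pop(0)
    return points


print(cutpoints(counts, 3))
```

For three groups this simplified rule arrives at the same cutpoints, 5 and 10, that were read from the Cumulative Percent column in the previous section.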
8. Create (as a final check) a frequency table and a bar chart of the newly created
variable Expenditure_Cat2. Are there any differences compared to the previous
section?
It is also possible to print the SPSS output from the viewer. Again you need customized
tables and graphs because no one wants to receive garbage. In this chapter we have
discussed how to clean up a frequency table and get a graph ready for publication.
[Figure callouts: Rename every block to section xx or task yy. In this way it is easy to
retrieve your results.]
If you document your output file as shown here it will guarantee that you can retrieve
your work without any effort. That is why it is recommended to document your output
in this way.
An output file can be saved by choosing File → Save. The contents will be saved in a
file with the extension .spv. Because we want to use meaningful names, we prefer to
use File → Save As and give the output file a name ourselves.
2. Save your output file again with the name Pandion Output Chapter 3. SPSS will
provide it with the extension .spv automatically. Please note that the file is stored in
the folder SPSS Basic Course.
3. Do not forget to save your data. Please save this file in the same folder.
Note If you use the button Open on the toolbar or the keystroke Ctrl+O,
SPSS will try to open a file of the same kind as you are working with.
So, from the data editor you can open another (new) data file, but
you will not be able to open an output file.
You have created three tables which are on top in the SPSS output viewer (see Section
3.2, instruction point 18). Note: only include one of the tables in your selection and
not other parts of the output.
1. Select one of the tables and, from the menus, choose: Edit → Copy or right-click
and use the context menu.
3. In WORD: from the menus, choose Edit → Paste Special and select the option
Picture (Enhanced metafile) for the best result. The SPSS object is pasted as a figure
into your WORD document.
In WORD 2007: Use the arrow below the Paste button on the ribbon to get the
option Paste Special. You can also use the shortcut Alt+Ctrl+V to get the Paste
Special dialog.
If you select the three tables in one selection, you will get a large
picture which contains all three tables. This is not advisable,
because you cannot place the table images individually in your
document.
1. Select the graph in the SPSS output viewer. In SPSS we have to use the option Copy
instead of Copy objects.
2. Switch to WORD.
3. In WORD: from the menus, choose Edit → Paste Special and select the option
Bitmap in the dialog.
Explanation In WORD there are two ways to style a picture. Either in line with
the text or floating (the other option). If you want to move the
picture with your mouse to another place on the page or want to
have text beside the picture then you choose one of the floating
styles. Click Advanced to open a dialog to input the coordinates of
the position of the picture on the page. A major drawback of this
floating style is that the picture floats by itself to a place where you
do not want it to be. The option In line with text does not have this
drawback. The picture is fixed in a paragraph like a (very) large
letter and cannot float anymore. That is why the in line with text
style is our favorite.
5. Save the WORD document by clicking the Save button (with the disk) on the
toolbar. Name it catering1.doc.
7. Finally enter a title at the top of your WORD document reading Report of the Suxes
Survey at Pandion and a footer with your name and class and a page number.
Save this document.
The data file (.sav) is displayed in the Data Editor. The information in the Data Editor
consists of variables and cases.
All your results from running a statistical procedure are displayed in the Output
Viewer. The output produced can be statistical tables, charts, graphs, or text,
depending on the choices you make when you run the procedure. This file always has
the extension .spv and is like a container with a tree structure. The structure is
displayed in the outline pane (on the left side) and the contents pane, containing the
actual output, is on the right-hand side. This has been discussed in Section 3.6.1.
SPSS syntax provides a method for you to control the product without navigating
through dialog boxes, viewers, or data editors. Instead, you control the application
through syntax-based commands. Nearly every action you can achieve through the
user interface can be achieved through syntax. Using syntax allows you to save the
exact specification used during a session. The easiest way to create syntax is to use the
Paste button located on most dialog boxes. This facilitates repetitive analyses on
several data files in an easy way. You save your syntax file with extension .sps.
Another way to automate tasks within SPSS is the scripting facility. In previous versions
of SPSS, this scripting language was called Visual Basic for Applications, which is used in
Microsoft Office applications as well. Since SPSS wants to provide software for different
operating systems, the Python and R scripting languages have been introduced into
the software. You can use the SPSS software not only on Windows systems (Microsoft
Windows XP (Professional, 32-bit), Vista (32-bit or 64-bit) or Windows 7), but also
on Apple Mac 10.5x (Leopard) and 10.6x (Snow Leopard), and on Linux. In this
course we will not discuss the syntax and scripting facilities.
(3) How does one assess the range of choice in the basic assortment and in the
luxury assortment?
(4) How does one assess the customer service offered by the staff?
(6) Is there a need for products which are not in the assortment at this moment?
(7) Who is the customer (gender, student or lecturer, number of study years)?
(8) What is the overall level of satisfaction of the catering services expressed as a
score on a scale from 1 up to 10?
Research Question 1
In the frequency table we see that 28% of the respondents visit the restaurant 5 days
per week. Moreover 28% visit the restaurant 2 days per week and only 4% 4 days a
week. It is important to realize that those who never visit the restaurant are excluded
from the survey.
The chart will make clear that the answer 4 times a week is quite exceptional.
Research Question 2
The expenditures in the restaurant are between 4 and 8 Euro for the largest group of
respondents. The mean value is 8 Euro, but the spread is rather high (the standard
deviation is 4,47 Euro). The histogram clearly shows that the distribution is skewed to
the right.
Since the distribution is skewed to the right, the mean value will be greater than you
might expect. The boxplot shows that this is caused by two outliers in our data.
(See Sections 3.4.6 and 3.4.7 for instructions on how to create and edit a boxplot.)
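The figures quoted for Research Question 2 can be recomputed from the frequency table of Expenditure in Chapter 3. The Python sketch below is a check by hand rather than SPSS output; it assumes the sample standard deviation (divisor n - 1).

```python
from statistics import mean, median, stdev

# Weekly expenditure in euro: value -> frequency (46 valid cases).
counts = {2: 4, 3: 3, 4: 3, 5: 10, 6: 2, 7: 1, 8: 3,
          10: 11, 12: 3, 15: 4, 18: 1, 20: 1}
data = [value for value, freq in counts.items() for _ in range(freq)]

avg = mean(data)
mid = median(data)
sd = stdev(data)  # sample standard deviation, divisor n - 1 (assumption)

print(round(avg, 2), mid, round(sd, 2))
print(avg > mid)  # a mean above the median is typical for a right-skewed distribution
```

The mean of about 8 euro lies above the median of 7.5 euro, which is what you expect when a few high expenditures pull the mean to the right.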
Research Question 3
We can report the rating of the assortment by publishing two frequency tables.
It is clear that 28% of the customers are not satisfied with the variety of the basic
assortment and 22% are not satisfied with the variety of the luxury assortment.
In this table we can see that almost 50% of the respondents are very satisfied with the
customer service level of the staff. Only 8% think customer service is bad.
Research Question 6
Research Question 7
This research question deals with the variables Gender, Customer_type and YearStud.
We will create a frequency table and a chart for each variable.
Research Question 8
The marking of the catering services is positive. Only 18% of our response group gives
a negative score and the mean score is almost a 7.
To Conclude
In this section we have given an overview of tables and graphs needed to answer the
research questions 1 to 4 and 6 to 8. In your report it is important to be able to discuss
the graphs and tables. Tell a story and mention the remarkable outcomes of the graphs
or tables and explain to the reader what makes this outcome worthwhile. We did this
by formulating a conclusion after each graph or table. You should always consider
whether to include a graph or table in your main text or in an appendix. Remember, all
charts and tables in the appendices need to be referred to in the main text.
4.1 Introduction
In a research project there are usually research questions where the differences
between groups are of interest. In those situations we have a variable (factor) which
defines the groups or levels. A factor like registered years may have several numerical
levels (e.g. 1, 2, 3, 4, and 5) or a factor such as customer type may have several
categorical levels (e.g. Type 1, Type 2, Type 3). These groups or levels are compared to
each other by taking the other variable into account. The variable which defines the
groups or levels is called a factor, grouping variable or independent variable. The
variable to be compared is called the dependent variable, because its value may be
dependent on the group.
Examples of such research questions in the Suxes Customer Survey at Pandion are (see
also Section 1.1):
(11) Is there a difference between students and lecturers in the amount spent?
From the examples it becomes clear that the dependent variable may have any level of
measurement. That level determines the statistical method to be used. Table 4.1 gives
an overview.
Level of measurement
of the dependent
variable Statistical methods to be used
The other kind of research questions in which two variables are involved deals with the
relationship between those variables. There is said to be a positive relationship if
higher values of the one variable lead to higher values on the second variable and of
course, lower values of the first variable lead to lower values of the second variable. If
we are dealing with a negative relationship, high values of the first variable lead to
lower values of the second variable. An example of such a research question from the
Suxes Survey is:
(13) Is there a relationship between the score given by customers and the amount
spent?
In Table 4.2 we again see that the level of measurement determines the method of
analysis. (This table only displays the most common situations).
In this chapter, research questions with respect to two variables are analyzed on a
descriptive level by means of statistics, a table or a chart. Later we will perform
statistical tests to see whether the results are significant, meaning valid for the
population as a whole, or just a (lucky or unlucky) coincidence. The theory can be found
in Berenson's Basic Business Statistics.
We will start by analyzing research question (11): Is there a difference between students
and lecturers in the amount spent? The first step is creating a bar chart displaying
the mean expenditures. This analysis is not valid if the grouping variable (here
Customer_type) is dependent on the ratio level variable.
In SPSS, creating a bar chart displaying mean values is a special option in the dialog
Define Simple Bar.
2. From the menus, choose: Graphs → Chart Builder. In the Gallery from the
category Bar choose the variant Simple Bar.
3. Move the variable Customer_type into the box representing the horizontal axis
and the variable Expenditure to the vertical axis.
5. Click OK to create the chart. Edit the chart into the layout of Figure 4.1.
Note: Scales on the axes, title of the axes, gridlines, layout of the chart title,
footnote and so on.
Figure 4.1 A bar chart displaying the mean expenditure of students and lecturers
From the chart it is clear that there is a difference in mean expenditure between
students and lecturers. For students the mean expenditure is 7 euro per week and for
lecturers the mean value is almost 12 euro per week. It seems that there is a
relationship between customer_type and expenditure. (An explanation and discussion
of the implications can be given in the Section conclusions and recommendations at
the end of your report).
1. From the menus, choose: Graphs → Chart Builder. In this dialog, we will use the
Gallery category Boxplot to select the Simple Boxplot.
If you do not use the Reset button, you will notice that SPSS leaves the variables of
the previous operation in the screen of the Chart Builder.
2. At the tab page Basic Elements, there is the button Transpose which will rotate
your boxplot a quarter of a turn.
4. Again the graph needs some major editing. Make your graph like Figure 4.2.
(See Section 3.4.7 how to edit a boxplot.)
After your analysis, do not forget to undo the split by running SPLIT FILE with the
option Analyze all cases, do not create groups.
2. Select the option Compare groups. With this option the output of the separate
groups is organized in a table. If you want to have separate output blocks, select
the option Organize output by groups.
4. Click OK.
Of course, there is no output because no analysis has been done. The only thing that
has been done is a change of a setting that makes SPSS run each command
as often as there are subgroups.
The output contains a table with the statistics for each group.
(2) After activating SPLIT FILE the data file is sorted to make the
groups. For restoring the original sorting we have created the
variable Respnum. That is why you must always have such a
variable.
(3) The SPLIT FILE status is not stored in the data file. It only
remains in effect for the rest of the session unless you turn it off. If
you start a new session you have to activate SPLIT FILE again.
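The effect of SPLIT FILE can be imitated in a few lines of Python. This is a sketch of the idea only, not of how SPSS works internally. The records below are invented for illustration and are not the Pandion data; the function name split_by is our own. The cases are sorted by the grouping variable, the same statistic is computed once per group, and the respondent number is used afterwards to restore the original order.

```python
from statistics import mean

# Hypothetical records (respondent number, customer type, weekly expenditure);
# the values are invented for illustration and are not the Pandion data.
records = [
    (1, "Lecturer", 12.0),
    (2, "Student", 5.0),
    (3, "Student", 8.0),
    (4, "Lecturer", 10.0),
    (5, "Student", 6.5),
]


def split_by(records, key_index):
    """Sort the records by the grouping variable and collect them per group."""
    groups = {}
    for record in sorted(records, key=lambda r: r[key_index]):
        groups.setdefault(record[key_index], []).append(record)
    return groups


groups = split_by(records, key_index=1)

# Run the same "command" (here: the mean expenditure) once for every group.
group_means = {name: mean(r[2] for r in rows) for name, rows in groups.items()}
print(group_means)

# Restoring the original order requires a case identifier such as Respnum.
restored = sorted((r for rows in groups.values() for r in rows), key=lambda r: r[0])
print(restored == records)
```

Without the respondent number in the first position there would be no way to put the cases back in their original order, which is why such a variable is indispensable.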
After your analysis you must not forget to undo the split.
7. From the menus, choose: Data → Split File and select the option Analyze all cases,
do not create groups. This resets the split of the data file.
Note If you want to delete the unselected cases, choose the
corresponding option in the panel Output. However, watch out:
after saving your data file, those deleted respondents will disappear
forever. So, be careful with this option.
In the Data Editor (Data View) unselected cases are marked with a diagonal line
through the row number. Moreover in the status bar the message Filter On is
displayed. The selection procedure generates a new variable named filter_$ with a
value 1 for selected cases and a value 0 for unselected cases. The actual selection is
based on the values of a newly created variable. SPSS uses this variable to work with the
selection. Only after deactivating the filter are you free to delete this variable.
Note You can also make selections based on conditions involving two or
more variables. The &-sign can be used for the AND-operator, the |-
sign as OR-operator and the ~-sign as NOT. Here are two examples.
(1) All male students:
Customer_type = 1 & Gender = 2
(because Customer_type has 1 = Student and Gender has 2 =
Male).
(2) All respondents in the restaurant buying a cheese or ham
sandwich, an other sandwich, a donut, croissant or baguette, or
dairy products (see question 6 ).
product1 = 1 | product2 = 1 | product3 = 1 | product4 = 1
(because product1 up to product4 have 1 = Yes and 0 = No).
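The logic of these two conditions, and of the filter variable with the values 1 and 0, can be sketched in Python. This is an illustration only: the respondents are invented, the helper names are hypothetical, and the codes Customer_type 2 = Lecturer and Gender 1 = Female are assumptions that are not stated in the note above.

```python
def make_case(respnum, customer_type, gender, products):
    """Build one hypothetical respondent with product1..product4 coded 1/0."""
    case = {"Respnum": respnum, "Customer_type": customer_type, "Gender": gender}
    for i, value in enumerate(products, start=1):
        case[f"product{i}"] = value
    return case


# Invented data. Codes from the text: Customer_type 1 = Student, Gender 2 = Male,
# products 1 = Yes, 0 = No. Assumed: Customer_type 2 = Lecturer, Gender 1 = Female.
respondents = [
    make_case(1, 1, 2, [0, 0, 1, 0]),  # male student, bought product3
    make_case(2, 1, 1, [0, 0, 0, 0]),  # female student, bought nothing
    make_case(3, 2, 2, [1, 0, 0, 0]),  # male lecturer, bought product1
    make_case(4, 1, 2, [0, 0, 0, 0]),  # male student, bought nothing
]


def is_male_student(case):
    # Customer_type = 1 & Gender = 2
    return case["Customer_type"] == 1 and case["Gender"] == 2


def bought_any_product(case):
    # product1 = 1 | product2 = 1 | product3 = 1 | product4 = 1
    return any(case[f"product{i}"] == 1 for i in range(1, 5))


# 1 for selected cases, 0 for unselected cases, like the filter variable.
male_student_filter = [1 if is_male_student(c) else 0 for c in respondents]
any_product_filter = [1 if bought_any_product(c) else 0 for c in respondents]
# ~(Customer_type = 1): everybody who is not a student.
not_student_filter = [0 if c["Customer_type"] == 1 else 1 for c in respondents]

print(male_student_filter, any_product_filter, not_student_filter)
```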
1. To create a histogram, in the chart builder, select the option Simple Histogram.
2. Move the variable Expenditure into the X-Axis box and enter an appropriate title
and footnote. That is important because the graph contains no information about
the selection on which it is based. So, enter Student Expenditures as a title and a
footnote containing the current date with your name and class directly after the
copyright sign (©).
4. Click OK and SPSS will create the next chart for you.
6. From the menus, choose: Data -> Select Cases and select All Cases.
After clicking OK you see that all cases are available again.
2. Move the variable to be analyzed, Expenditure, to the Dependent List and the
variable which defines the groups, Customer_type, to the Independent List.
The result is a table shown below. (Note: if you only get the row Student, you
probably have forgotten to deactivate the selection of the previous section, e.g. see
action point number 6 on page 79).
4.5 Crosstabs
Another way to compare groups is to do a cross tabulation analysis (called CROSSTABS
in SPSS) with percentages. The percentages can add up across the rows or down the
columns. Usually the percentages are calculated for the levels or subgroups defined by
the independent variable. The important difference from the previously discussed analysis
is that both variables must be classified (categorical) and that a nominal level of
measurement is sufficient for both. Moreover, it is not important (for the CROSSTABS
procedure) which variable is independent and which one is dependent. However, you
must of course know this for your conclusion.
Marketing Research with SPSS 18 vs 54.docx 16/03/2011
J. Smits, Saxion Hogeschool Enschede, June 2010 page 80
The cross tabulation is the basic technique for examining the relationship between two
categorical (nominal or ordinal) variables. The Crosstabs procedure offers tests of
independence and measures of association and agreement for nominal and ordinal
data. The purpose of a cross tabulation is to show the relationship (or lack thereof)
between two variables.
We are going to examine the sample to see whether there are differences between male
and female respondents (variable Gender) with respect to the rating of the variety of
products in the basic assortment (Variety_basic). This is a part of research question
(9). Gender is the variable defining the groups or independent (demographical)
variable and the rating might be dependent (Variety_basic).
V = 0      no association
V ≈ 0,10   a weak association
V ≈ 0,25   a rather strong association
V ≈ 0,50   a strong association
V ≈ 0,75   a very strong association
V = 1      maximal association
By percenting in the correct direction (either within columns or within rows) you can
formulate a conclusion about the direction of association. Generally if you are
analyzing a crosstab with a behaviour variable and a demographic variable it is
preferred to calculate percentages within each category of this demographic variable.
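For readers who want to see what lies behind the number, Cramér's V can be computed from the observed counts of a crosstab with the standard formula V = sqrt(chi-square / (n * (k - 1))), where n is the number of cases and k is the smaller of the number of rows and the number of columns. The Python sketch below is an illustration, not the SPSS implementation; the function name cramers_v and the example counts are hypothetical.

```python
from math import sqrt


def cramers_v(table):
    """Cramér's V for a table of observed counts (list of rows).

    Assumes every row total and column total is greater than zero,
    otherwise an expected count of zero would cause a division by zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi_square += (observed - expected) ** 2 / expected

    k = min(len(table), len(table[0]))
    return sqrt(chi_square / (n * (k - 1)))


# Two extreme, invented tables: no association and maximal association.
no_association = cramers_v([[10, 10], [10, 10]])
maximal_association = cramers_v([[20, 0], [0, 20]])

# An invented 3 x 2 table (rating in the rows, gender in the columns).
example = cramers_v([[5, 9], [12, 10], [8, 6]])
print(no_association, maximal_association, example)
```

The two extreme tables reproduce the end points of the scale above: V = 0 when the columns have identical distributions and V = 1 when each row occurs in only one column.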
2. Compute Cramér's V and explain its value (no association, a weak association,
strong association, ...)
3. Start the CROSSTABS procedure by clicking OK and switch to the Output Viewer.
The first output block is a summary with the number of processed cases. This can be
used to check how many observations are used in the table, so that gives you a check
but it is not meant to be inserted into reports.
The cells of the table show the count or number of cases for each joint combination of
values. For example, 5 female respondents rate the variety of products in the basic
assortment as insufficient. It is often difficult to analyze a cross tabulation simply by
looking at the counts in each cell. The next sections will discuss how to insert
percentages in the cells and how to calculate statistics. Afterwards we will make the
crosstab again with these adjustments.
Counts Observed frequency: the count or number of cases (this option must
always be selected).
Expected frequency: the expected count, or theoretical frequency if
there is no relationship between the two variables (statistically
independent).
When you include percentages in a crosstab, you choose either Row or Column, but
not all three options (Row, Column and Total). Besides the observed counts, which must
always be selected, only one other entry can be included, because with more than two
entries in a cell the table becomes too large and too hard to interpret.
Note You must make a choice about the cells contents. This choice is
limited to two entries at the utmost, because otherwise the table
becomes too large and too hard to interpret. Usually you choose
between the following options:
- observed counts and row or column percentages;
- observed and expected counts.
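The two kinds of cell contents mentioned in this note can be calculated by hand from the observed counts. The Python sketch below is an illustration with invented counts and hypothetical function names; it shows column percentages (each column adds up to 100%) and expected counts (row total times column total, divided by the number of cases).

```python
def column_percentages(table):
    """Percentage of each cell within its own column."""
    col_totals = [sum(col) for col in zip(*table)]
    return [
        [100 * cell / col_totals[j] for j, cell in enumerate(row)]
        for row in table
    ]


def expected_counts(table):
    """Theoretical counts if rows and columns were statistically independent."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [
        [row_totals[i] * col_totals[j] / n for j in range(len(col_totals))]
        for i in range(len(row_totals))
    ]


# Invented observed counts: rows are two ratings, columns are two groups.
observed = [
    [10, 30],
    [30, 30],
]

col_pct = column_percentages(observed)
expected = expected_counts(observed)
print(col_pct)
print(expected)
```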
Cramér's V is known
from section 4.5.1.
We will use Cramér's V to analyze the strength of the relationship in our response
group. If the relationship is significant, use this statistic to indicate the strength of the
association in the sample.
In Section 6.4 we will discuss the Chi-square cross tabulation test and we will explain
how to formulate the corresponding hypotheses, interpret the level of significance and
come to a correct conclusion.
Note Strictly speaking you are free to place a variable in the rows or
columns of a crosstab. Our advice is to place the independent
variable in the columns and to use column percentages.
2. Ask for percentages based on the subgroups of the demographic variable Gender
(in the columns). Use the button Cells and select the option Column. For details see
Section 4.5.3.
3. Click the button Statistics to calculate Cramér's V. For details see Section 4.5.4.
The result is:
This output block is never included in the report. You just write in the text that
Cramér's V was computed and is equal to 0,279, and you explain the meaning of that value.
Of course, you save this output block in the SPSS output file and it can be put in an
appendix of your report, if needed.
1. Select the contingency table and double-click to enter the edit mode. The edit mode
is characterized by the notched edge around the table.
2. If the toolbar is not visible, from the menus choose: View -> Toolbar.
The button to invoke the
Pivoting Trays
3. Click the third button of the toolbar to open the Pivoting Trays window.
Pivoting trays provide a way to move data between columns, rows and layers. Click one
of the pivot icons to see what it represents. The shaded area in the table indicates what
will be moved when you move the pivot icon. A pop-up label also indicates what the
icon represents in the table.
4. Drag the Statistics pivot icon from the Rows dimension to the bottom of the
Column dimension
Figure 4.7 The cross tabulation with counts and percentages on the same row
1. Use Graphs -> Chart Builder and select the option Clustered Bar.
2. Select the variable which defines the categories and the variable which defines the
clusters (or subgroups).
3. We want to display percentages within the categories of the legend variable Gender.
In the dialog Element Properties you can change the Statistic into Percentage()
and via the button Set Parameters you can set the Denominator for Computing
Percentage. As said before, we want the Legend Variable to be used.
7. Remember how to edit a chart: First click the element of the graph, make the
changes on the tabs in the dialog Properties and click Apply to confirm and see the
results.
You need to change the following elements of the graph:
- height and width of the graph;
- footnote moved to the bottom left, in a smaller font;
- percentages inside the bars and a change of colour and pattern;
- customizing and justifying the labels of the axes;
- adding gridlines and changing their line style into dashed and grey.
1. From the menus, choose: Graphs -> Chart Builder and select the option Stacked
Bar.
4. In the dialog Element Properties you can set the Statistic to Percentage. Please
note that the denominator for computing percentages now is the Total for Each X-
Axis Category.
5. Continue and click Apply and create the graph with a simple OK.
The result is a graph which is not ready for publication yet, but we will work on that.
7. To arrange the categories in the order Enough downwards to Poor, select the bars
and use the tab Categories on the Properties dialog, and change the Direction into
Descending. Confirm your choice with Apply.
8. In order to hide the text Gender, being the axis title, select the X-axis and uncheck
the option Display axis title on the Labels & Ticks tab of the Properties dialog.
Remember, since we have rotated (transposed) the chart, the X-axis now is in the
vertical direction.
9. You can use the button Hide Legend to hide the legend.
11. Select the horizontal axis (the Y-axis, since we transposed our graph) and (if
necessary) adjust the scaling to 100% as the maximum and the Major Increment to
10.
13. After closing the editor you can decrease the height of the picture in the viewer to
make it a little more sophisticated.
Note Please note that Figure 4.9 has the same percentages as the
crosstabulation of Figure 4.7.
1. From the menus, choose: Graphs -> Chart Builder and in the category Scatter/Dot
select the option Simple Scatter.
2. Move the variables Expenditure into the Y-Axis box, Visits into the X-Axis and the
respondent number Respnum to the Point Id Label. This latter box becomes
available by checking on the tab Groups/Point ID the checkbox Point ID label.
(Right now we will not use the facility to mark subgroups differently.)
3. Add the text Relationship between Visits and Expenditure as a title to the plot and
do not forget to include your footnote.
4. Double click the graph to activate the editor and take care of the following things.
From the menus, choose Elements -> Hide data labels or use the button on the
toolbar to hide the respondent numbers.
From the menus, choose Options -> Show Grid Lines, without selecting the
horizontal and the vertical axes to get a grid in both directions.
Adjust the line style of the grid to dashed.
Add a euro sign (Alt+0128) to the numbers on the vertical axis (after selecting the
Y-axis, on the tab Number Format, the box Leading Characters).
The simple linear regression equation used to estimate the linear model reads:
$Y = \beta_0 + \beta_1 X$
where $\beta_0$ is the constant and $\beta_1$ is the slope.
In this equation Y is the dependent variable, in our example Expenditures, and X the
independent variable, thus Visits. Just note that we made this choice in the previous
section already by placing these variables on the Y-axis and the X-axis respectively.
2. Move the dependent variable (Expenditure) to the Dependent text box and the
independent variable (Visits) to the Independent(s) text box.
This output block with the coefficients is important for the equation. The first line
contains the constant, the second line the slope. In our example the regression
equation reads:
Y = 1.005 + 2.236 X .
According to this equation, we can calculate (predict) the expenditure of a person who
pays two visits a week to the restaurant:
$\widehat{\text{Expenditure}} = 1.005 + 2.236 \times 2 = 5.48$ euro
Because this outcome is not the real value but our estimation, we use a hat above the
name of the variable in the equation. As you can observe in Figure 4.11 the vertical line
at two visits a week contains observations above and below the amount of 5.48. The
prediction must be understood as an average spending by one who pays two visits a
week to the restaurant. The accuracy of this estimation will be discussed by means of
the coefficient of determination, our subject for the next section.
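The prediction above is a plain substitution into the regression equation. As a small sketch in Python (the constant 1.005 and the slope 2.236 are taken from the output discussed above; the function name predict_expenditure is hypothetical):

```python
# Coefficients as reported in the text.
CONSTANT = 1.005  # constant (intercept)
SLOPE = 2.236     # slope: extra expenditure per extra visit


def predict_expenditure(visits):
    """Estimated weekly expenditure in euro for a given number of visits."""
    return CONSTANT + SLOPE * visits


prediction = predict_expenditure(2)
print(round(prediction, 2))
```

The result is an estimated average for everybody who pays two visits a week, not the expenditure of one particular respondent.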
The closer the coefficient of correlation is to 1 or -1, the better the quality of the linear
relationship is, so the more accurate our prediction will be. In our example we have a
coefficient of correlation equal to 0.725 which indicates a positive association.
To conclude, the closer the points are to the regression line, the better the regression
model can be used for predictions. The measure for this quality concept is the
coefficient of correlation, which measures the strength of the relationship. The square
of the coefficient of correlation is the coefficient of determination, also known as the
percentage of variation explained by the model.
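To make the link between the regression line, the coefficient of correlation and the coefficient of determination concrete, the sketch below fits a line by ordinary least squares in plain Python. It is an illustration under stated assumptions: the five data points are invented and the function name fit_line is hypothetical. Only the value 0.725 is taken from the text.

```python
from math import sqrt


def fit_line(xs, ys):
    """Least-squares constant and slope, plus the coefficient of correlation r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    constant = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return constant, slope, r


# Invented observations: visits per week and weekly expenditure in euro.
visits = [1, 2, 3, 4, 5]
expenditure = [3.0, 5.5, 7.0, 10.5, 12.0]

constant, slope, r = fit_line(visits, expenditure)
r_squared = r ** 2

# The same quantity computed as the share of variation explained by the model.
mean_y = sum(expenditure) / len(expenditure)
predicted = [constant + slope * x for x in visits]
sse = sum((y - p) ** 2 for y, p in zip(expenditure, predicted))
sst = sum((y - mean_y) ** 2 for y in expenditure)
explained_share = 1 - sse / sst

# Squaring the coefficient of correlation reported in the text.
r_squared_from_text = 0.725 ** 2

print(constant, slope, r, r_squared, explained_share, r_squared_from_text)
```

For a simple linear regression with a constant, the square of r and the explained share of the variation are the same number, which is why both names are used for the coefficient of determination.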
But it is possible to do the regression analysis excluding the constant in the equation.
This can be found as an option in the Linear Regression dialog. The equation reads:
Y = bX
Y = 2.5 X
This is easier to understand: each extra visit to the restaurant increases the
expenditure by 2.50 euro. Or, the average expenditure in the restaurant is 2.50 euro
per day.
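Without a constant, the least-squares slope has a simple closed form: b is the sum of the products X times Y divided by the sum of the squares of X. The Python sketch below illustrates this; the data are invented and deliberately chosen so that the slope comes out at 2.5, and the function name is hypothetical.

```python
def slope_through_origin(xs, ys):
    """Least-squares slope b for the model Y = bX (no constant)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Invented observations in which every visit costs exactly 2.50 euro.
visits = [1, 2, 4]
expenditure = [2.5, 5.0, 10.0]

b = slope_through_origin(visits, expenditure)
print(b)
```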
2. In the chart editor, from the menus, choose Elements -> Fit Line at Total. This
option is available in the context menu (right mouse button) also.
3. You can adjust the line style and enlarge the font size of the determination
coefficient.
(11) Is there a difference between students and lecturers in the amount spent?
Research Question 9
The customer satisfaction of the restaurant is measured by questions 3, 4 and 5 of the
questionnaire, the variables Variety_basic, Variety_luxe and Staff. Section 4.5
discussed how to make a cross tabulation. We are going to make three cross tabulations
with Gender and will also calculate Cramér's V.
Cramér's V
Note: this table was made in WORD on the basis of the three statistic outputs (symmetric measures) of SPSS.
The values of Cramér's V in the three cross tabulations show that there is a rather
strong association between customer satisfaction and gender in the sample. That
means that in our sample there are differences between male and female respondents.
On the whole we see that women are relatively more satisfied about the variety, but
that the male respondents are relatively more satisfied with the customer service level
of the staff. This can be illustrated with a clustered bar chart (see Section 4.6) or a
band diagram (see Section 4.7).
Cramér's V
The low values of Cramér's V indicate that the differences between students and
lecturers are rather small. The (small) differences in the sample can be seen in the
graphs. It is remarkable that 10% of the students rate the service level of the staff as
poor. However, you must be careful in formulating a conclusion. The sample is rather
small (only 40 students and 10 lecturers) to make conclusions which are statistically
valid. In Section 6.4 we shall discuss how to apply the chi-square cross tabulation test
and see that the differences are not significant.
From the graph and the table it becomes clear that the students' expenditures are on
the average much lower. Students spend (on the average) 7 euro, whereas lecturers
spend almost 12 euro. It is striking that the spread (standard deviation) in the group
lecturers is much higher than that of the students.
Research Question 12
There seems to hardly be any difference between the mean values of both groups,
although there are some students who are very negative and relatively more lecturers
giving a score of 9. That explains that the mean mark of the lecturers (7.2) is a little bit
higher than that of the students (6.6).
In the scatter plot there is a relationship between Expenditure and Mark. Moreover,
the scatter seems to be around a straight line. So, we can try and see what a linear
model brings.
Coefficients(a)

                                   Unstandardized           Standardized
                                   Coefficients             Coefficients
Model                              B         Std. Error     Beta            t        Sig.
1  (Constant)                      -3,868    2,547                          -1,518   ,136
   Mark for the catering
   service of Suxes                1,755     ,369           ,582            4,750    ,000

a. Dependent Variable: Expenditure in the restaurant on a weekly basis
Model Summary
Of course, this output must not be included in your research memorandum itself, but
in an appendix. We conclude that the regression equation reads:
Y = -3.868 + 1.755 X .
The relationship between the variables is not that strong. The coefficient of
determination is R² = 0.339 meaning that 34% of the variation in expenditures can be
explained by the variation in the marks. (Note: please do not forget to use the word
Note In this example the value of R² is rather low. That implies that the
association between mark and expenditure is weak and therefore
the equation not that useful. In this reader we will not discuss
statistical tests for the linear model.
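The numbers in this example can be checked with a few lines of Python. The sketch uses the constant (-3,868), the slope (1,755) and the standardized coefficient Beta (,582) from the Coefficients table above; the function name and the mark of 7 are hypothetical choices for illustration. In a regression with one independent variable, Beta equals the coefficient of correlation, so its square gives the coefficient of determination.

```python
# Values read from the Coefficients table (decimal commas written as points).
CONSTANT = -3.868
SLOPE = 1.755
BETA = 0.582  # standardized coefficient; equals r with one independent variable


def predict_expenditure_from_mark(mark):
    """Estimated weekly expenditure in euro for a given mark."""
    return CONSTANT + SLOPE * mark


# Hypothetical respondent who gives the catering service a 7.
prediction_for_7 = predict_expenditure_from_mark(7)

# Coefficient of determination obtained by squaring Beta.
r_squared = BETA ** 2

print(round(prediction_for_7, 3), round(r_squared, 3))
```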
In the research questions we examined the differences between men and women and
the differences between students and lecturers. It is obvious that we want to see these
differences in the scatter plot. This can be done with markers for the groups.
We applied the use of a marker in the next figure by activating the option
Grouping/stacking variable (see Figure 4.10). In the scatter plot we will get different
markers for students and lecturers. We conclude that there are no large differences
between the two groups, but, the group of lecturers is actually too small to be able to
make statements.
In this chapter we will discuss how to analyse multiple response variables by means
of tables and graphs. The questionnaire of the Suxes Survey contained one question
where more answers are allowed:
1. If your output file of the previous chapter is still open, close this file by choosing
from the menus: File -> Close.
We want to save the output of this chapter in a new output file.
As described in the code book (see Section 1.3), we have introduced the codes 1= Yes
and 0= No. Because we want to know how many respondents bought a certain
product, we must count the values 1 (Counted Value).
SPSS reports that a multiple response set was made. The output viewer shows you a
diagram.
This table is just for your information and there is no need to publish it. This table is
created for documenting the output file to remember which variables are included in
the set. So keep it in your output file, but change the entry into 5.1 Multiple Response
Set and collapse this item.
But the most important difference with the other procedures is the editing afterwards
to make the table suitable for publication. With CUSTOM TABLES you design the table
before actually creating it. So you will get tables, which are (almost) ready for
publication. That is very welcome to research companies doing surveys on a daily or
weekly basis, because they have not got time for all that editing work. They can even
paste the syntax of the CUSTOM TABLES procedure to make batch jobs to run an analysis
almost automatically.
The first time you use the CUSTOM TABLES dialog, SPSS advises you to define value labels
for all categorical variables and to set the measurement levels correctly. This is
because CUSTOM TABLES uses this information to build a preview of the table. We
already discussed the importance of Value labels and measurement levels in Chapter 1.
2. Click OK to continue.
In the Variables list, you can find the super variable $Product as the last entry. If you
select this set, the Categories textbox displays the variables contained in the multiple
response set.
3. Drag and drop the super variable $Product into the Rows area.
Later, we will discuss a number of extra facilities, but at this very moment we just want
to see the basic table.
This table shows us how many respondents checked a certain product, indicating that
they bought this recently. To make the table useful for a report, it needs to contain
extra information such as the total number of respondents buying products and not
only counts but also percentages.
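What such a table contains can be sketched in Python: for every variable in the set we count the value 1, and we express the count as a percentage of the respondents. This is an illustration only. The data are invented, the function name is hypothetical, and the choice to use the respondents who bought at least one product as the percentage base is an assumption of this sketch, not a statement about what SPSS does by default.

```python
def multiple_response_table(rows, variables, counted_value=1):
    """Counts and percentages per variable of a multiple response set."""
    # Assumed percentage base: respondents with at least one counted value.
    n_cases = sum(
        1 for row in rows if any(row[v] == counted_value for v in variables)
    )
    counts = {
        v: sum(1 for row in rows if row[v] == counted_value) for v in variables
    }
    percentages = {v: 100 * counts[v] / n_cases for v in variables}
    return counts, percentages, n_cases


# Invented answers, coded 1 = Yes and 0 = No as in the code book.
respondents = [
    {"product1": 1, "product2": 0, "product3": 1},
    {"product1": 1, "product2": 1, "product3": 0},
    {"product1": 0, "product2": 0, "product3": 0},  # bought nothing
    {"product1": 0, "product2": 1, "product3": 0},
]

counts, percentages, n_cases = multiple_response_table(
    respondents, ["product1", "product2", "product3"]
)
print(counts, percentages, n_cases)
```

Because one respondent can tick several products, the percentages of a multiple response set can add up to more than 100%.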
6. If, in the CUSTOM TABLES dialog, the first column of the preview table is not selected,
select it with a mouse click. If it is selected, it gets a bright yellow background
and the button Summary Statistics is enabled.
Click this button and move from the Statistics panel the entry Column N %
into the Display list. Adjust the labels and change the decimals to 0.
Finish this dialog with a click on Apply to Selection to return to the main dialog.
We are not done yet. The next step is entering the total.
7. Click in the CUSTOM TABLES dialog on the other button: Categories and Totals. In
the lower part of this dialog you can click the option Total. You can adjust the
label if necessary. Click Apply to return to the main dialog again.
The last step is adding a title and a footer (Caption) to the table.
8. Select the second tab of the CUSTOM TABLES dialog called Titles.
3. Right-click on the Variable label of Gender and uncheck the option Show Variable
Label to remove it from the design.
Just a few clicks, and we have a perfect result (see Figure 5.3)!
Figure 5.3 An overview of the bought products, split up to men and women.
1. From the menus, choose: Graphs -> Chart Builder and on the tab Gallery under the
category Bar select Simple Bar.
2. Drag the variable $Product (the last one from the list Variables) into the X-Axis
box.
3. In order to get the correct percentages, you need to specify the percentage base. In
the dialog Element Properties, from the Statistics drop down menu, select the
option Response Percentage. Confirm this choice with Apply.
4. Please enter Summary of bought products as a title and do not forget to include
the footnote in this dialog. Click OK to produce the graph.
Rotate the graph (remember, the names X and Y are not updated).
Change the order of the labels of the vertical axis (X) into descending.
Hide the label of the X-axis.
Customize title and footnote, font family, size and justification.
Customize the horizontal axis (Y): scale, add a % sign as trailing character, add
gridlines and change them to dashed, colour grey.
Change the label of the horizontal axis (Y) (if you think this is necessary).
Change the chart size (canvas) into height 13 cm and width 20 cm (or height 300
and width 480, in the points measurement system).
Note If we display the counts in the graph, the shape of the graph is the
same. Only, for comparing purposes, it is better to work with
percentages.
1. Select the Categories tab and select Sort by Statistic in an Ascending direction.
Figure 5.5 A bar chart sorted by the number of products bought by the visitors
1. From the menus, choose: Graphs -> Chart Builder and select Clustered Bar.
2. Move the variable Gender into the box Cluster on X: set color.
3. Select the variables Product1 up to Product9 and change the level of measurement
into Scale with the context menu (right mouse button).
4. Drag the nine product variables into the Y-axis box. SPSS will ask you to confirm
this operation by showing a popup which contains the message that the variables
will be summarized and that the name of each variable will be used as a category in
the chart. Click OK to confirm.
Note Since the variables Product1 up to Product9 are coded with 1= Yes
and 0= No, the function Percentage greater than 0.5 gives us
exactly the percentage of people buying this product. If you have
coded the variables with 1= Yes and 2= No, you should use the
function Percentage less than 1.5, because then 1 (= Yes) is the
value below 1.5.
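Why a threshold of 0.5 works for variables coded 1 = Yes and 0 = No can be seen in a short Python sketch. The answers are invented and the function name is hypothetical; the sketch only covers the 0/1 coding.

```python
def percentage_greater_than(values, threshold):
    """Percentage of the values that lie above the threshold."""
    return 100 * sum(1 for v in values if v > threshold) / len(values)


# Invented answers for one product variable: 1 = Yes, 0 = No.
product1 = [1, 0, 0, 1, 1, 0, 0, 0]

pct_buyers = percentage_greater_than(product1, 0.5)

# With a 0/1 coding, the same figure is 100 times the mean of the variable.
pct_from_mean = 100 * sum(product1) / len(product1)

print(pct_buyers, pct_from_mean)
```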
6. On the Basic Elements tab you can find the Transpose button to rotate your chart
by a quarter of a turn.
The last thing we need to adjust is the way SPSS will cope with missing values. If a
respondent has not filled out one of the product questions, but the others are correct,
we do not want to exclude this person from our graph. By default, SPSS does this the
severe way: the listwise deletion. However, we want to maximize the use of our data,
so we have to change this setting.
8. Use the Options button to get the Options dialog and in the pane Summary
Statistics and Case Values, check the option Exclude variable-by-variable to
maximize the use of data. Click OK and create the graph.
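The difference between the two ways of handling missing values can be illustrated with a small Python sketch. The answers are invented (None marks a question that was not filled out) and the function names are hypothetical.

```python
def pct_yes_listwise(rows, variables):
    """Use only respondents who answered every variable."""
    complete = [r for r in rows if all(r[v] is not None for v in variables)]
    return {v: 100 * sum(r[v] for r in complete) / len(complete) for v in variables}


def pct_yes_by_variable(rows, variables):
    """For each variable, use every respondent who answered that variable."""
    result = {}
    for v in variables:
        valid = [r[v] for r in rows if r[v] is not None]
        result[v] = 100 * sum(valid) / len(valid)
    return result


# Invented answers: 1 = Yes, 0 = No, None = not filled out.
rows = [
    {"product1": 1, "product2": 0},
    {"product1": None, "product2": 1},
    {"product1": 0, "product2": 1},
    {"product1": 1, "product2": None},
]

listwise = pct_yes_listwise(rows, ["product1", "product2"])
by_variable = pct_yes_by_variable(rows, ["product1", "product2"])
print(listwise)
print(by_variable)
```

In this invented example the listwise approach uses only two of the four respondents, whereas the variable-by-variable approach uses three respondents for each product.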
Note The vertical axis starts in the origin which is at the bottom left of the
diagram. That is why (in our perspective) the products are
displayed in reverse order. On the Categories tab of the Properties
dialog you can change the sort. Unfortunately we cannot adjust the
sort within the legend, that is why we place the two beside each
other, at the bottom right of the chart.
In our analysis (see Section 5.5 how to create the chart) we compared men and
women. There were hardly any differences between the groups (except for the soup).
Of course we can compare students and lecturers. That leads to the next table.
By comparing the percentages, you can analyse the differences between students and
lecturers. But with a graph it is easier.
Since most of these psychological properties exist on a continuum ranging from one
extreme to another in the mind of the respondent, it is common practice to design
scaled-response questions in an assumed interval-scale format. Sometimes numbers
are used to indicate a single unit of distance between each position on the scale.
Usually, but not always, the scale ranges from an extreme negative through a neutral
to an extreme positive designation. The neutral point is not considered zero or an
origin; instead, it is considered a point along a continuum.
Extremely Negative          Neutral          Extremely Positive
The Likert-type response format, borrowed from a formal scale development approach
developed by Rensis Likert, has been extensively modified and adapted by marketing
researchers, so much, in fact, that its definition varies from researcher to researcher.
Lifestyle inventories are valuable to marketers in a number of ways, not the least of
which is as a market segmentation basis and tool. To perform market segmentation, a
researcher must use a very large number of lifestyle statements, and a great many
respondents must be involved in the survey. Herein lies a dilemma, for potential
respondents, even panel members who are compensated for their participation in
surveys, dislike long questionnaires. See Burns and Bush Marketing Insight 10.3 that
describes a way to greatly reduce the size of the questionnaire but still achieve the goal
of a lifestyle market segmentation survey.
High prices _____ _____ _____ ____ _____ _____ _____ Low prices
Inconvenient location _____ _____ _____ ____ _____ _____ _____ Convenient location
For me _____ _____ _____ ____ _____ _____ _____ Not for me
Warm atmosphere _____ _____ _____ ____ _____ _____ _____ Cold atmosphere
Limited menu _____ _____ _____ ____ _____ _____ _____ Wide menu
Fast service _____ _____ _____ ____ _____ _____ _____ Slow service
Low quality food _____ _____ _____ ____ _____ _____ _____ High-quality food
A special place _____ _____ _____ ____ _____ _____ _____ An Everyday place
Figure 6.3 The semantic differential scale is useful when measuring store, company or
brand images
As you look at the phrases, you should note that they have been randomly flipped to
avoid having all the good ones on one side. This flipping procedure is used to avoid
the halo effect. We will explain this effect with an example. Suppose you have a very
positive image of Suxes at Pandion. If all of the positive items were on the right-hand
side, you might be tempted to just check all of the answers on the right-hand side.
However it is entirely possible that some specific aspect of the Suxes restaurant might
not be as good as the others. Perhaps the restaurant is not located in a very convenient
place, or the menu is not as broad as you would like. Randomly flipping favourable
and negative ends of the descriptors in a semantic differential scale minimizes the halo
effect.
One of the most appealing aspects of the semantic differential is the ability of the
researcher to compute averages and then to plot a profile of the brand or company
image. Each check line is assigned a number for coding. Usually, the numbers 1, 2, 3,
and so on, beginning from the left side, are customary. Then an average is computed
for each bipolar pair. The averages are plotted as you can see them, and the marketing
researcher has a very nice graphical communication vehicle with which to report the
findings to his or her client. We will discuss this in Section 6.7 (see Figure 6.10).
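The coding and averaging described here can be sketched in Python. The check marks are coded 1 to 7 from the left side, and an average is computed for each bipolar pair. The three pairs are taken from Figure 6.3; the scores and the variable names are invented for illustration.

```python
from statistics import mean

# Invented answers of three respondents, coded 1 (left side) to 7 (right side).
ratings = {
    "High prices / Low prices": [3, 4, 5],
    "Fast service / Slow service": [2, 2, 5],
    "Limited menu / Wide menu": [6, 5, 7],
}

# One average per bipolar pair; these are the points of the profile plot.
profile = {pair: mean(scores) for pair, scores in ratings.items()}

for pair, average in profile.items():
    print(f"{pair}: {average:.1f}")
```

Because the favourable ends have been flipped at random, a high average is not automatically a good score; each average has to be read against the labels of its own pair.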
You have given us notice to leave. With this questionnaire we would like to get an insight in your
motivation, reasons for moving and get your evaluation of the house and its environment. Moreover we are
highly interested in your opinion about our services.
A. Personal reasons:
Considerations of health
Divorce or end of a relationship
Change in work or location of job
Change of the family
A marriage or get together
Buying a house
Moving to a retirement centre, service flat or sheltered accommodation
_______________________________________
5. Leasing History
The number of years I have leased this house or apartment
0 - 5 years
5 - 10 years
exceeding 10 years
Our co-ordinator will collect this form after his inspection visit.
Would you be so kind to fill it out before?
We appreciate your cooperation. Thank you!
4. Drag the variables Environment1 to Environment6 into the box Rows of the table
grid.
6. Moreover, you can suppress the display of the statistics labels in the column
headings by checking the option Hide.
The result is a concise table with an almost perfect lay-out, as is displayed in Table 6.5.
Table 6.5 Table with the frequencies for each aspect of the environment
3. Click (in the pane Define bottom left) the button Summary Statistics.
4. Add Row N % to the Display grid, adjust the labels (In %) and change the
Decimals setting into 0. Finally, click Apply to Selection to finish this dialog.
5. On the pane Summary Statistics, from the Position list, we select the option
Rows. Please note that the percentages will be placed directly below the counts
which improves the readability.
6. Click (in the pane Define) the button Categories and Totals and ask for
showing totals in the table.
Figure 6.6 Table with frequencies and percentages for each aspect
Please note that this table is ready for publication. The only adjustment we made is the
line style, the mark-up of the caption and the total column.
1. Copy the table from Section 6.3.1 (Table 6.5) into EXCEL.
2. Select the range A2:E8 and, from the menus, choose Insert -> (Graphs:) Bar.
From the category 2D-bar take the third option: 100% stacked bar.
Again it turns out that a picture is clearer than a table. The first three aspects, Green
place, Parking place, Road and traffic safety, are rated as Good or Sufficient by more
than 60% of the respondents. The other three aspects, Shops, Public transport, Social
security, have this positive rating from around 80% of the respondents.
2. On the tab Gallery, select Bar and double click the icon Simple Bar.
In the SPSS CHART BUILDER, it is impossible to display mean values of variables having
an ordinal level of measurement. So we have to temporarily change the measurement
level into Scale.
4. Drag the variables Environment1 to Environment6 into the Y-axis box. SPSS will
announce that it will use the values to summarize the data and that it will use the
names of each variable as categories in the chart.
6. Use the fourth tab Titles/Footnotes to enter the title Rating of the environment.
Enter a footnote with the current date, your name and class. Do not forget to
apply your changes every time.
7. Finally click OK and SPSS will create the chart for you.
Rotate the chart a quarter of a turn (note that the references X and Y will still refer to
the original axes).
Reverse the labels of the vertical axis (X) on the tab Categories.
Adjust the scaling of the horizontal axis (Y) on the tab Scale: enter 1 as the
minimum, 4 as the maximum and a major increment of 1. Also add gridlines.
So, we are going to recode the variables as shown in the next table. To be on the
safe side, we will create new variables in order to keep the originals.
TRANSFORMATION TABLE
Old value                 New value
1 (= good)                4
2 (= sufficient)          3
3 (= insufficient)        2
4 (= bad)                 1
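The same recoding can be sketched outside SPSS. The following Python fragment is only an illustration: the answers in `environment1` are invented, and the names `RECODE` and `reverse_code` are our own. It applies the transformation table to a list of answers and stores the result in a new list, so the original values are kept.

```python
# Reverse-code a 4-point rating so that a higher number means a better rating.
# Old value -> new value, exactly as in the transformation table.
RECODE = {1: 4, 2: 3, 3: 2, 4: 1}


def reverse_code(values):
    """Return a new list with recoded values; the input list is not changed."""
    return [RECODE[v] for v in values]


environment1 = [1, 2, 2, 4, 3]            # hypothetical answers, not survey data
environment1pos = reverse_code(environment1)

print(environment1)     # [1, 2, 2, 4, 3]  (originals are kept)
print(environment1pos)  # [4, 3, 3, 1, 2]
```

Note that for this 4-point scale the table is equivalent to computing 5 minus the old value.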
2. Enter the new names and labels and confirm by clicking Change. Click the button
Old and New Values and enter the transformation table (see next dialog).
4. Enter in the SPSS Data Editor the value labels for one of the new variables and
copy those to the other five.
7. Create a bar chart of the mean values of the six new variables in the same way as
in the previous section.
8. Edit the chart in the same way (see the note at the end of this section). In addition,
add four text boxes with the text Good, Sufficient, Insufficient and Bad.
Nowadays this line diagram (remembered as the thunderbolt diagram) is also used for
the modified Likert scale and, somewhat inaccurately, is itself referred to as a semantic
differential. In this section we will produce a semantic differential for the rating
of the environment (question 3 of the questionnaire) to compare people who move
after 0 to 5, 5 to 10 or more than 10 years. To get a diagram in the same style as in the
previous section we will use the variables Environment1pos to Environment6pos of
Section 6.6 again.
1. From the menus, choose Graphs -> Chart Builder and on the tab Gallery select the option
Line. Select the icon Multiple Line.
4. On the tab Basic Elements you can use the button Transpose to rotate your chart
by a quarter of a turn.
5. Activate on the tab Titles/Footnotes the options Title and Footnote1. Enter Rating
of the environment as a title and restate your name, class and current date in
Footnote1. Confirm (again and again) your actions with Apply.
Of course we need to style this chart a little to make it ready for publication.
A conclusion from the semantic differential could be that people living for 5 to 10 years
in their house are most satisfied with the environment of their houses. People moving
within 5 years are not that satisfied with the shops in the neighbourhood of their
houses. People who have rented their place over 10 years are less satisfied with the
social security.
6.8 Assignment
The questionnaire from the rental housing organisation MoveOn contains another
scaled-response question, i.e. question 4. Use the techniques we discussed in this
chapter to report the results of this part of our survey in tables and graphs. Please
finish Figure 6.11 to Figure 6.15 and give a conclusion after each chart.
Figure 6.14 A bar chart displaying the mean values of the transformed variables
If we take a look at the rating of the services of the rental housing organisation
MoveOn it is clear that all aspects are rated positively. A comparison between people
moving within 5 years, between 5 and 10 years and after 10 years does not show us
large differences between these groups.
Case
Aquariade is a swimming pool in a medium sized city in the Netherlands. The
management team of Aquariade has decided that they need to pay more attention to
the opinions and wishes of Aquariade visitors. To show these opinions and wishes,
the management has decided to carry out a survey with the visitors of the swimming
pool.
Problem definition
In which way can Aquariade improve their market position by giving more attention
to the opinions and wishes of the customers?
On the basis of this problem definition they have developed research objectives for
the quantitative research. The first four objectives are descriptive and the others are
explorative. By means of a statistical test one has to prove whether the statement is
valid for the population.
Research objectives
(1) What is the opinion about the entrance fee, the overall hygiene, the visiting
hours, the kindness of the staff and the temperature of the pool water (in the
sample)?
(2) Is there a difference between the three age groups in the opinions about these
four aspects (in the sample)?
(3) Is there a difference between the customers who visit the sauna and the
customers who do not visit the sauna in the opinions about these four aspects
(in the sample)?
(5) Does the sample give a good view of the total customer population?
(6) Are there significant differences between the three age groups in the usage of
the sauna?
(7) Are there significant differences between the three age groups in their opinion
about the overall hygiene (a), the visiting hours (b), the kindness of the staff (c)
and the temperature of the pool water (d)?
(8) Is there a significant difference between men and women in the number of
visits to Aquariade?
(9) Is there a significant difference between the three age groups in the number of
visits to Aquariade?
(10) Does the total opinion about Aquariade relate significantly to the gender or the
age of the visitor?
(11) Does the opinion towards the entrance fee have a significant influence on the
total number of visits that the customers paid in the last two months to
Aquariade?
(12) Is there a significant difference between customers who visit the sauna and
customers who do not visit the sauna in their rating of Aquariade?
According to the available information 35% of the customers are younger than 25,
20% are between 25 and 50 and 45% are 50 years old or older. It is also known that
60% of the customers are female and 40% are male.
We will start by making a frequency table for the variables age and gender.
We will use the chi-square goodness-of-fit test to compare our sample with the known
population distribution. Berenson (Basic Business Statistics) discusses the theoretical
aspects of the test, like hypotheses, calculation of the test statistics and assumptions.
The formal hypotheses are
We will start to analyse the variable gender. After that, we will discuss age.
4. On the second tab, Fields, remove all variables, and leave Gender as the only one
to be tested.
6. The Chi-Square test is the second one presented by SPSS as a customized test.
Click the Options button to enter the expected probabilities as relative frequencies.
For gender we expect a 40% - 60% distribution, so that makes the odds 4 to 6.
Note Although you might expect to enter the expected values, SPSS asks
you to enter them as percentages, in a decimal format. SPSS will
calculate the expected values. However, you must enter the figures
corresponding to the codes in the codebook. Because we have
defined 1= Male and 2= Female we start to enter the percentage
of males in the population and after that the percentage of females.
The only problem here is that we cannot enter 40% and 60%, so we
make it 0.4 and 0.6.
7. Close the Chi Square Test Options by clicking OK and Run the test.
SPSS will produce the following output.
On the basis of the p-value (here 16.5%) you can perform the test at once.
If you double click on this item, SPSS will open the Model Viewer. You will see a bar
chart displaying the observed and expected values.
Figure 7.2 SPSS output of the chi-square goodness-of-fit test (variable Gender)
There are differences between the observed frequencies (Observed N) and the
(theoretical) expected frequencies (Expected N). To perform the test we use the
value of Asymp. Sig., which stands for asymptotic significance. This value (here 0.165)
is the right-tail probability (p-value) in the chi-square distribution. Because this value
exceeds α = 5% we cannot reject our null hypothesis (H0). Consequently,
we have no reason to doubt the representativeness of our sample with respect to
Gender.
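To see where the p-value comes from, the calculation can be sketched in a few lines of Python. This is only an illustration, not part of the SPSS procedure. It assumes the sample counts of 79 men and 96 women that are reported in the t-test output of Section 8.2, and it uses the fact that for one degree of freedom the right-tail probability of the chi-square distribution equals erfc(sqrt(x/2)). The variable names are our own.

```python
import math

# Observed counts in the sample (79 men and 96 women, as reported in Section 8.2)
observed = {"Male": 79, "Female": 96}
# Known population shares: 40% male, 60% female
expected_share = {"Male": 0.4, "Female": 0.6}

n = sum(observed.values())
expected = {group: n * share for group, share in expected_share.items()}

# Chi-square statistic: sum of (observed - expected)^2 / expected
chi_square = sum(
    (observed[g] - expected[g]) ** 2 / expected[g] for g in observed
)

# Two categories give 1 degree of freedom; for df = 1 the right-tail
# probability of the chi-square distribution is erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2))

print(round(chi_square, 3), round(p_value, 3))
```

Under these assumptions the p-value should come out at about 0.165, in line with the Asymp. Sig. discussed above.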
8. Repeat this analysis for the variable Age. Use the theoretical percentages
mentioned in the beginning of this section.
Figure 7.3 SPSS output of the chi-square goodness-of-fit test (variable Age)
With the p-value of 0.316 (Asymp. Sig.) we can perform the test at once: it exceeds the
significance level α = 5%, so we have no reason to reject the null hypothesis. We must
(6) Are there significant differences between the three age groups in the usage of
the sauna?
To answer our research objective we can make a cross tabulation. To test whether the
relationship is significant we use the chi-square crosstab test. This option can be found
in the dialog Statistics (see also Section 4.5.4).
2. Move the variables Sauna and Age into the Row(s) and Column(s) textboxes
respectively.
Cramér's V is known from Section 4.5.1
4. Use the button Cells to display the expected frequencies in the cells (also).
Observed counts
(to be displayed always)
The output shows us a cross tabulation with the observed counts and the expected
counts displayed in the cells.
After this table SPSS gives the results of the chi-square analysis and of the calculation of
Cramér's V.
Cramér's V equals 0.20
H0: In the population, there exists no relationship between age and the usage of
the sauna.
H1: In the population there is a relationship between these two variables (or,
the variables are dependent).
2. Calculate the value of the chi-square statistic and check whether the value
mentioned as Asymp. Sig. (p-value) is greater or less than the significance level
you are using.
5. With the value of Cramér's V you are able to characterise the magnitude of the
relationship (we refer to Section 4.5.1). Finally, by adding percentages to the crosstab you
can compare the older people with the younger ones. You can, for example, show
whether the older people use the sauna more (or less) often than the
younger ones. Strictly speaking, this last conclusion applies only to the sample.
Note If you decide not to reject the null hypothesis, there is no need
to interpret the value of Cramér's V, because there is no evidence
for any relationship. In this case it is important to state that the
differences between the column or row percentages are not
significant.
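For reference, the quantities used in these steps can be written out as follows. These are the standard textbook definitions. The notation is chosen here and does not come from the SPSS output: O_ij and E_ij are the observed and expected counts in row i and column j, R_i and C_j are the row and column totals, n is the number of valid cases, and r and c are the numbers of rows and columns.

```latex
E_{ij} = \frac{R_i \, C_j}{n},
\qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
df = (r - 1)(c - 1),
\qquad
V = \sqrt{\frac{\chi^2}{n \, \bigl(\min(r, c) - 1\bigr)}}
```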
In our example we have a certain degree of dependence between Age and the Usage of
the sauna. We are eager to know which age group favours the sauna most. By
comparing the observed with the expected frequencies in the cross tabulation you can discover where
the discrepancies are. E.g. in the group ≥ 50 years we find an observed count of
34 where the expected count equals 25.7. Within the two other age groups it is the
other way around. We must conclude that in the sample the older people (≥ 50 years)
use the sauna more frequently than the younger ones.
5. Recreate the cross tabulation of the variables Sauna and Age in the Row(s) and
Column(s) textboxes respectively. Ask for percentages to compare the age groups.
(The expected values are left out now, of course). Make the layout of your table the
same as in Figure 7.4.
Marketing Research with SPSS 18 vs 54.docx 16/03/2011
J. Smits, Saxion Hogeschool Enschede, June 2010 page 151
Figure 7.4 Crosstab of Age and Sauna usage, with column percentages
The table of Figure 7.4 indeed shows a clear difference between the age groups: within
the group ≥ 50 years 45% use the sauna; in the two other (younger) groups this
percentage is equal to 26%.
6. From the menus, choose Graphs -> Chart Builder and on the tab Gallery select the
option Bar. Double click the icon Simple Bar.
8. Change (temporarily, within the Chart Builder) the measurement level of the
variable Sauna with the right mouse button into Scale and move it into the Y-axis
box.
Since the codes of the variable Sauna have been chosen as 1= Yes and 2= No and we
want to display the percentage Yes answers, we ask SPSS to calculate the percentage
of cases less than 1.5.
10. The button Set Parameters is used to enter the value 1.5 at the place of the
question mark.
12. Finally, create the chart and customise it into the lay-out of Figure 7.5.
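The trick with the value 1.5 can be checked with a small sketch. The Python fragment below is an illustration only: the answers are invented and the function name `percentage_less_than` is our own. Because the codes are 1 = Yes and 2 = No, the percentage of cases below 1.5 is the percentage of Yes answers.

```python
def percentage_less_than(values, cutoff):
    """Percentage of cases whose value is below the cutoff."""
    return 100 * sum(v < cutoff for v in values) / len(values)


# Hypothetical answers to the sauna question: 1 = Yes, 2 = No
sauna = [1, 2, 2, 1, 2, 1, 2, 2]

# Only the code 1 (Yes) lies below 1.5, so this is the percentage of Yes answers
pct_yes = percentage_less_than(sauna, 1.5)
print(pct_yes)  # 37.5
```

Any cutoff between 1 and 2 would give the same result; 1.5 is simply the midpoint of the two codes.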
Note It is most important that you present your research findings quickly
and clearly. For every analysis you have made, you must ask yourself
what the practical relevance is and what you want to say about it.
In the research report you include the crosstabulation, of course. In
the main text you only include the conclusion in plain language, without
all statistical details. The chi-square output can be included in an
appendix, if you wish. Of course you save all tables and graphs
in your SPSS output file to have them available quickly in case there
are questions about the results.
Rule of Cochran
(1) All expected frequencies must exceed 1
(2) In at most 20% of the cells an expected frequency less than 5 is allowed.
In this section we will discuss an example which violates Cochran's rule. However, by
collapsing two rows in the cross tabulation we can satisfy Cochran's rule. In the next
section we will discuss when these elaborations are worthwhile in practice.
Research objective (7) deals with the opinion of customers about the visiting hours.
The management of Aquariade wants to know whether the customers are satisfied,
and whether there are differences between the age groups. We start by making a
frequency table of the opinion about the visiting hours. After that, we will make a cross
tabulation to see whether the differences between age groups are significant. In this
analysis the conditions of the chi-square test are not met. By collapsing classes we get
a smaller cross tabulation with the expected values large enough to do a valid analysis.
In this section we will show you how to collapse classes of a variable. The next section
will discuss when this is worthwhile in practice and when this method will not
help.
1. Construct a frequency table of Aspect2, the opinion about the visiting hours.
Customise this table to make it suitable for publication.
From this table it is clear that 13% of the visitors are not satisfied with the visiting
hours. To know whether there is a relationship with the age of the visitors we construct
a cross tabulation and perform a chi-square crosstab test.
In this table we discover differences between the age groups. For example, within the
age group 25 -< 50 a relatively large group (13%) thinks the visiting hours are bad.
That is substantially more than within the other age groups. But there are more
differences between the age groups. By means of a chi-square crosstab test we are able
to detect whether these differences are significant.
With the numbers displayed in the footer of the chi-square output we can easily check
Cochran's rule.
In our case, the minimum expected value equals 1.82. So the first requirement is met.
Because in 5 out of 15 cells (that is 33%) the expected count is less than 5, the second
requirement is not met. So we need to adjust the table dimensions in order to perform
a valid chi-square analysis.
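The check of Cochran's rule can be sketched as follows. This Python fragment is an illustration under stated assumptions: the observed counts are invented (they are not the Aquariade data) and were chosen so that, like the table in the text, 5 of the 15 cells have an expected count below 5. The function names are our own.

```python
def expected_counts(observed):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def check_cochran(expected):
    """Return (minimum expected count, share of cells below 5, rule satisfied)."""
    cells = [e for row in expected for e in row]
    minimum = min(cells)
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    # (1) all expected counts must exceed 1
    # (2) at most 20% of the cells may have an expected count below 5
    satisfied = minimum > 1 and share_below_5 <= 0.20
    return minimum, share_below_5, satisfied


# Hypothetical observed counts: 5 opinion categories (rows) by 3 age groups (columns)
observed = [
    [20, 12, 25],
    [25, 15, 30],
    [10, 6, 12],
    [4, 3, 5],
    [3, 4, 1],
]

minimum, share, satisfied = check_cochran(expected_counts(observed))
print(round(minimum, 2), round(share, 2), satisfied)
```

In this invented table the first requirement is met but the second is not, which is the same situation as in our example.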
3. Construct the cross tabulation again, but display expected frequencies instead of
percentages.
The table in Figure 7.8 makes clear that the problems can be found in the last two
rows, the categories not so good and bad. In these rows we have expected values
which are less than 5. A solution is to collapse these two categories into one category,
with the label negative.
This new variable Aspect2Adjusted needs to get Value Labels and the right
measurement level.
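Collapsing the two categories amounts to a simple recode. The sketch below is only an illustration in Python: the answers are invented, the category labels other than not so good and bad are placeholders, and the names `COLLAPSE` and `collapse` are our own.

```python
# Map the two sparse categories onto one new category; all others stay as they are.
COLLAPSE = {"not so good": "negative", "bad": "negative"}


def collapse(answers):
    """Return a new list in which 'not so good' and 'bad' are merged."""
    return [COLLAPSE.get(answer, answer) for answer in answers]


# Hypothetical answers; 'good' and 'fair' are placeholder labels
aspect2 = ["good", "bad", "not so good", "good", "fair"]
aspect2_adjusted = collapse(aspect2)

print(aspect2_adjusted)  # ['good', 'negative', 'negative', 'good', 'fair']
```

As in SPSS, the result is stored in a new variable so that the original answers remain available.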
8. Construct via the menus Analyze -> Descriptive Statistics -> Crosstabs a new cross
tabulation, but now with the variable Aspect2Adjusted instead of Aspect2, of
course. Calculate the chi-square p-value as well.
The footnote of the output block Chi-Square Tests shows us that no cells (0%) have an
expected count less than 5. The smallest value equals 5.01. Now we meet Cochran's
rule (easily).
The p-value (0.4%) is less than our α value, so we must conclude that in
the population there is a relationship between age and the opinion about visiting
hours. More practically stated: there is a significant difference between the age groups with
respect to their opinion about the visiting hours.
(7) Are there significant differences between the three age groups in their opinion
about the overall hygiene (a), the visiting hours (b), the kindness of the staff (c)
and the temperature of the pool water (d)?
In the previous section we discussed the aspect of the visiting hours. In order to
perform a valid chi-square test we had to recode the variable Aspect2 and combine two
classes.
1. Create a cross tabulation with the variable Age for each of the other three
variables, Aspect1, Aspect3 and Aspect4. Print the chi-square statistics also.
2. Check Cochran's rule. You will see that the cross tabulations with Aspect1 (overall
hygiene) and Aspect3 (Kindness of staff) do not satisfy Cochran's rule.
In both tables the p-value is (very much) greater than α = 5% and you cannot reject the
null hypothesis. Or, stated differently, in both cases there is no statistical evidence for
a relationship between age and that aspect in the population. So, in the population,
there is no (significant) difference between the age groups in their rating of overall
hygiene and kindness of staff.
However, Cochran's rule is not met! Before collapsing classes, it is wise to review a
practical rule formulated by Bert Nijdam:
(2) In at most 20% of the cells an expected frequency less than 5 is allowed.
If those two requirements are not met, you must take action, like collapsing classes,
or excluding classes from the analysis.
In a cross tabulation with a p-value greater than α, you do not reject the null
hypothesis. There is no statistical evidence for a relationship between the variables, and
even after collapsing classes there will be no statistical evidence. In this situation
Cochran's rule is irrelevant.
Both chi-square analyses in Figure 7.10 have a p-value greater than α. Although
Cochran's rule is not met, we do not have to collapse classes, because we are not able
to reject the null hypothesis. Our conclusion (there is no relationship in the
population) remains valid.
We will now focus on research objective (9): Is there a significant difference between
the three age groups in the number of visits to Aquariade? Another formulation might
be: Is there a relationship between age and the number of visits to the swimming pool
Aquariade?
1. Construct a cross tabulation with the variables Number of visits and Age.
You will see that this cross tabulation is far too large and that too many expected
frequencies are less than 5 (45 to be precise) and some are even less than 1 (because
the minimum expected count equals 0.25).
A way to make a better cross tabulation is to band the Number of visits into a new
variable with only a few classes. The creation of a categorical variable from a scale
variable is discussed in Section 3.5.
2. Create a suitable classification for the variable Number of visits in four classes.
Take care that every class contains at least 15% of the observations. See Section
3.5.1 to find the border values.
                Count    In %
Sometimes         48      27%
Regular           42      24%
Often             42      24%
Very Often        43      25%
Total            175     100%
Figure 7.12
Of course, your classification can be different. If you have at least 15% in the
smallest class, it is fine.
Note Most likely, you do not want to use these vague terms to
characterize the classes. But we want you to find the borders
yourself and come up with your own classification!
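One possible way to find border values is to use the quartiles of the variable. The sketch below illustrates this in Python under stated assumptions: the visit counts are invented, the class labels are the vague terms of Figure 7.12, and the function name `band_into_quartiles` is our own. With many tied values the classes can become unequal, so the share of the smallest class is checked at the end.

```python
import statistics
from collections import Counter


def band_into_quartiles(values, labels=("Sometimes", "Regular", "Often", "Very Often")):
    """Assign each value to one of four classes, using the quartiles as borders."""
    q1, q2, q3 = statistics.quantiles(values, n=4)

    def band(value):
        if value <= q1:
            return labels[0]
        if value <= q2:
            return labels[1]
        if value <= q3:
            return labels[2]
        return labels[3]

    return [band(v) for v in values]


# Hypothetical numbers of visits, not the Aquariade data
visits = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18, 20, 22, 25, 30]

bands = band_into_quartiles(visits)
shares = {label: count / len(visits) for label, count in Counter(bands).items()}

# Every class should contain at least 15% of the observations
smallest_share = min(shares.values())
print(shares, smallest_share >= 0.15)
```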
Chi-Square Tests
                                Value      df    Asymp. Sig. (2-sided)
Pearson Chi-Square              19.648 a    6    .003
Likelihood Ratio                19.691      6    .003
Linear-by-Linear Association      .062      1    .804
N of Valid Cases                   175

                                Value    Approx. Sig.
Nominal by Nominal  Phi          .335    .003
                    Cramer's V   .237    .003
N of Valid Cases                  175
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
H0: In the population there is no relationship between age and the number of
visits to the swimming pool.
3. We have to reject the null hypothesis and conclude that there is a relationship
between Age and the Number of visits to Aquariade.
4. This means that there is statistical evidence indicating that the age groups are
different with respect to the number of visits to the swimming pool.
5. The value of Cramér's V in this crosstab equals 0.237. That means that in our
sample the relationship is rather strong. The differences between the age groups
are substantial in our survey. When we compare the percentages it becomes
clear that younger people visit the swimming pool relatively often, but that
people in the age group 25-50 years do not visit the swimming pool on a
frequent basis. Formally speaking, this last conclusion is only valid for our
survey. A graphical representation of the percentages is shown in the band
diagram of Figure 7.16.
Figure 7.16 The band diagram displaying the differences between the age groups clearly
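The values of Phi and Cramér's V in the output can be reproduced from the Pearson chi-square value. The following Python sketch uses only the figures from the output above (chi-square 19.648, 175 valid cases, a table of four visit classes by three age groups). The helper `chi_square_sf_even_df` is our own and relies on the closed-form right-tail probability of the chi-square distribution that holds for an even number of degrees of freedom.

```python
import math

chi_square = 19.648    # Pearson Chi-Square from the output
n = 175                # N of Valid Cases
rows, columns = 4, 3   # four visit classes by three age groups
df = (rows - 1) * (columns - 1)

phi = math.sqrt(chi_square / n)
cramers_v = math.sqrt(chi_square / (n * (min(rows, columns) - 1)))


def chi_square_sf_even_df(x, degrees_of_freedom):
    """Right-tail probability of the chi-square distribution, even df only."""
    half = x / 2
    terms = sum(half ** i / math.factorial(i) for i in range(degrees_of_freedom // 2))
    return math.exp(-half) * terms


p_value = chi_square_sf_even_df(chi_square, df)

print(df, round(phi, 3), round(cramers_v, 3), round(p_value, 3))
# 6 0.335 0.237 0.003
```

Because the smaller dimension of the table is 3, Cramér's V divides by n times 2, which is why it is smaller than Phi.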
8.1 Introduction
The case of Chapter 7, Aquariade, a swimming
pool in a medium sized city in the Netherlands, also had a couple of research questions
about the differences between groups of visitors:
(8) Is there a significant difference between men and women in the number of visits
to Aquariade?
(9) Is there a significant difference between the three age groups in the number of
visits to Aquariade?
(10) Does the total opinion about Aquariade relate significantly to the gender or the
age of the visitor?
(11) Does the opinion towards the entrance fee have a significant influence on the
total number of visits that the customers paid in the last two months to
Aquariade?
(12) Is there a significant difference between customers who visit the sauna and
customers who do not visit the sauna in their rating of Aquariade?
All these questions relate to two variables. We already discussed in Chapter 4 how to
analyze this kind of question. It is important to recognize which variable is the
independent one, which defines the groups, and which one is the test variable or
dependent variable. This chapter extends Chapter 4, which only described
potential differences in tables or charts. This chapter answers the question whether
the differences are significant in the population by running statistical tests. That
means that for every difference we find in the sample, we test the null hypothesis
(There are no differences in the population). If we have to reject the null hypothesis,
we can report that we have statistical evidence supporting the statement that in the
population there is a difference, for example between males and females with respect
to the number of visits to the swimming pool, if we relate it to research
question (8).
The test variable (dependent variable) can have any level of measurement as you might
have noticed from the research questions. This measurement level determines the
statistical test to be used in combination with the number of groups to be compared. It
is important to distinguish whether two groups or more than two groups are involved.
The next table summarizes this and indicates which test is to be used.
            Two groups                   More than two groups
            Mann-Whitney test (§ 8.3)    Kruskal-Wallis test (§ 8.5)
Ordinal     Mann-Whitney test (§ 8.3)    Kruskal-Wallis test (§ 8.5)
This chapter will discuss how to run the tests in SPSS. However, before testing whether
the differences with respect to a variable are significant, we strongly advise you to start
by creating a chart to get some insight into the potential differences. Each section will
begin by discussing a graphical display first.
We will start to analyze research question (8): "Is there a significant difference between
men and women in the number of visits to Aquariade?" It is clear that this involves
two groups. The test variable (dependent variable) in the research question is the
variable Visits, representing the number of visits to the swimming pool in the last two
months. The independent variable (which defines the groups) is Gender.
In Section 4.2 we discussed how to create a bar chart displaying the mean values of
each group and in Section 4.3 how to compare groups by means of a boxplot.
2. Create a bar chart that displays the mean number of visits of males and females.
Customize your chart to the lay-out of Figure 8.1 (see Section 4.2 if you need
help).
3. From the menus, choose Graphs -> Chart Builder. On the tab Gallery select the
category Bar and double click the icon Simple Error Bar.
4. Drag the variable Gender into the X-axis box, and the variable Visits into the Y-
axis box.
5. Use the tab Titles/Footnotes to enter the title 95% Confidence Interval for the
mean number of visits. Do not forget to include a footnote with your name and
class at this very moment.
If we take a look at these confidence intervals we see that they overlap to a great extent. So
we expect that the difference between men and women with respect to the number of
visits will not be significant. Of course, we will run the test to support this argument.
The statistical test involves a statement about the parameters of the two populations,
namely the mean number of visits of men ($\mu_{\text{men}}$) and the mean number of visits of women
($\mu_{\text{women}}$). Let us start formulating the hypotheses.
The first block displays statistics for both groups. Our sample contains 79
men and 96 women. For the male group, the mean number of visits equals 11.95, for
the females it equals 11.33. The standard deviations in the two groups are almost
equal. The statistic Std. Error of the Mean is calculated by dividing the standard
deviation by √n. If you multiply the standard error of the mean by the z-score 1.96
and subtract the result from, or add it to, the mean value, you get a 95% confidence
interval for the mean value of each group. It should be clear that these intervals
overlap considerably.
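The calculation of such an interval can be sketched as follows. The means and group sizes are taken from the text above. The standard deviation of 6.0 is a placeholder that we made up for this illustration, because the actual value is only shown in the SPSS output, so the resulting limits are not the real ones. The function name `confidence_interval_95` is our own.

```python
import math


def confidence_interval_95(mean, sd, n):
    """95% confidence interval for a mean: mean +/- 1.96 * sd / sqrt(n)."""
    standard_error = sd / math.sqrt(n)
    margin = 1.96 * standard_error
    return mean - margin, mean + margin


ASSUMED_SD = 6.0  # placeholder value, NOT taken from the SPSS output

men = confidence_interval_95(11.95, ASSUMED_SD, 79)
women = confidence_interval_95(11.33, ASSUMED_SD, 96)

# Two intervals overlap when each lower limit lies below the other upper limit
overlap = men[0] <= women[1] and women[0] <= men[1]
print(men, women, overlap)
```

To reproduce the intervals of the error bar chart, replace the placeholder by the standard deviations from the Group Statistics block.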
The second block displays the result of the t-test. Please note that the two lines refer to
two different situations. If the variances (or standard deviations) can be assumed to be
equal, the first row applies. If they are different, you must use the second row.
Conclusion: We must use the first line of the output (equal variances assumed)
Conclusion: In the population, there is no difference between men and women with
respect to the number of visits to the swimming pool.
A way to say this is: The difference between men and women with respect to the mean
number of visits is not statistically significant.
However, if we decide that the probability distributions in the two groups are
significantly different from the normal distribution, we have to use the method we will
discuss in the next section. This method will use rank numbers instead of the raw
scores.
In Section 8.5.2 we will discuss how to check whether the normal distribution fits the
sample distribution.
Since the Mann-Whitney rank test only uses the ordinal character of the test
variable, we are actually comparing the medians of the two groups instead of the mean
values. There are examples of test variables (like income) that have large outliers, so
the median is a better measure of central tendency than the mean in those situations.
These large outliers will raise the suspicion that the normal distribution will not fit.
This is another reason to apply a non-parametric test, like the Mann-Whitney rank test.
It will be clear that if the ordinal test variable only has a limited number of categories
(like a 5-point scale ranging from very bad to very good) the ranking process is hard
because of the large number of ties. In that case we advise applying the Chi-square
crosstab test and calculating the percentages to compare the groups.
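To make the ranking process and the effect of ties concrete, here is a small Python sketch. It is an illustration only: the two groups of visit counts are invented, the function names are our own, and it computes only the rank sums and the U statistics, not the p-value that SPSS reports. Tied values receive the mean of the ranks they share.

```python
def midranks(values):
    """Rank values from 1 to n; tied values get the mean of the ranks they share."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the last position holding the same value as position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of the ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return the U statistics of both groups, based on the pooled midranks."""
    ranks = midranks(group_a + group_b)
    rank_sum_a = sum(ranks[:len(group_a)])
    u_a = rank_sum_a - len(group_a) * (len(group_a) + 1) / 2
    u_b = len(group_a) * len(group_b) - u_a
    return u_a, u_b


men = [4, 8, 8, 15]          # hypothetical numbers of visits
women = [2, 8, 10, 12, 20]   # hypothetical numbers of visits

print(midranks(men + women))       # the three 8s share rank 4.0
print(mann_whitney_u(men, women))  # (8.0, 12.0)
```

With a 5-point scale almost all values would be tied, so nearly every case would receive a shared rank, which is why the crosstab approach is preferred there.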
We will discuss the Mann-Whitney rank test by answering the research question of the
previous section: (8) "Is there a significant difference between men and women in the
number of visits to Aquariade?" The application of the t-test as described in the
previous section is to be preferred because all conditions are met. We can apply the
Mann-Whitney rank test as well, although the power of this test is less than the power
of the t-test.
1. Create (and customize) a boxplot to compare the two groups. In Section 4.3 you
can read the instructions.
The application of the Mann-Whitney rank test in SPSS is described in the next steps.
2. In the dialog Nonparametric tests, on the first tab Objective, select the option
Compare medians across groups.
3. On the second tab Fields, select the option Use custom field assignments, move
the variable Visits into the Test Fields box and Gender into the Groups box.
4. On the third tab, Settings, select the option Mann-Whitney and unselect the Median
test, since the latter applies to more than two groups.
5. Click Run and SPSS will produce the results in a model. You will see the summary
in your output viewer.
The output might be a little overwhelming. The chart tries to display two histograms to
compare the two groups. In our opinion the boxplot we made in the previous section is
a better way to compare the two groups. The bottom line of the model viewer allows
you to navigate to the results of other tests and other variables. Since we only specified
one test for these two variables the lists do not contain other entries.
1. From the menus, choose Graphs -> Chart Builder. On the tab Gallery, choose the
option Bar and double click the icon Simple Error Bar. Use the instructions in
Section 8.2.1 to construct a 95% confidence interval for the mean number of Visits
for the three age groups.
This graph shows us a difference between the age groups. The interval of the middle group
lies entirely below the other two. People aged 25 to 50 do not come to the swimming pool
on such a frequent basis. Again we must ask the question whether this holds for the
population as a whole. That is to be answered with the application of a statistical test.
H0: The mean values of the three groups are equal, i.e. $\mu_1 = \mu_2 = \mu_3$
From this table we will use the significance. This value equals 0.6% and is less than
our alpha value of 5%. We will reject our null hypothesis and we can state that we have
statistical evidence that there is a difference between the groups. The three mean
values of the number of visits to the swimming pool are not equal.
Equal variances
or not?
In this multiple comparison the differences between the middle group and the other
two are significant, because both p-values (0.5% and 3.1%) are less than the alpha
value of 5%.
Conclusion: When we compare the age group 25 to 50 years with the other two
groups we can conclude that this group does not visit the swimming pool as frequently as
the other two groups. Since these differences are significant, our statement holds for
the population as a whole.
Another point of attention is that the grouping variable is the independent variable
and the test variable the dependent one. To illustrate this: you cannot compare the
mean age of frequent swimmers with the mean age of occasional swimmers and jump to the
conclusion that the former group is younger than the latter group. It is clear that age
can have its influence on the number of visits, but not the other way around: more
visits will not decrease the age. Although, it might be a good slogan for Aquariade:
"Swimming keeps the age away."
2. Normality
The second assumption states that the sample values in each group are from a
normally distributed population. Just as in the case of the t test, the one-way ANOVA F
test is fairly robust against departures from the normal distribution. As long as the
distributions are not extremely different from a normal distribution, the level of
significance of the ANOVA F test is usually not greatly affected, particularly for large
samples. In Section 8.5.2 we will discuss how you can assess the normality of each
subgroup.
3. Homogeneity of Variance
The third assumption states that the population variances of the groups are equal (i.e.
$\sigma_1^2 = \sigma_2^2 = \sigma_3^2$). If you have equal sample sizes in each group, inferences based on the
F distribution are not seriously affected by unequal variances. However, if you have
unequal sample sizes, then unequal variances can have a serious effect on inferences
developed from the ANOVA procedure. Thus, when possible, you should have equal
sample sizes in all groups.
One method to test whether the population variances are all equal is Levene's
test. We will test the null hypothesis:
$H_0: \sigma_1^2 = \sigma_2^2 = \sigma_3^2$
against the alternative that not all variances are equal.
We will now run Levene's test in SPSS (see also Section 8.2.2).
7. Click the Options button and check the option Homogeneity of variance test.
If you want to have the statistics for the subgroups you can check the option
Descriptives. A plot of the means is also available.
Since the significance of Levene's test (42.9%) is greater than our alpha value (5%), we
cannot reject the null hypothesis of equal variances. Our choice in Section 8.4.3 to take the Equal
Variances Assumed side for the post hoc test turns out to be justified.
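To make the logic of Levene's test more tangible, the sketch below computes the mean-centred version of the statistic by hand for three groups. It is only an illustration under stated assumptions: the function name levene_three_groups and the visit counts are hypothetical (they are not the Aquariade data), and the closed-form p-value used here is valid only because three groups give exactly 2 numerator degrees of freedom.

```python
from statistics import mean


def levene_three_groups(g1, g2, g3):
    """Return Levene's W (mean-centred) and its p-value for three groups."""
    groups = [g1, g2, g3]
    # Absolute deviation of every value from its own group mean.
    z = [[abs(x - mean(g)) for x in g] for g in groups]
    k = len(z)
    n_total = sum(len(g) for g in z)
    grand_mean = mean(v for g in z for v in g)
    # One-way ANOVA sums of squares, computed on the absolute deviations.
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in z)
    within = sum((v - mean(g)) ** 2 for g in z for v in g)
    df2 = n_total - k
    w = (df2 / (k - 1)) * between / within
    # Three groups give 2 numerator degrees of freedom; the upper tail of
    # F(2, df2) then has the closed form (1 + 2W/df2) ** (-df2 / 2).
    p_value = (1 + 2 * w / df2) ** (-df2 / 2)
    return w, p_value


# Hypothetical visit counts per age group (NOT the Aquariade data).
young = [12, 15, 9, 14, 11]
middle = [5, 7, 4, 6, 8]
older = [13, 10, 16, 12, 9]

w, p_value = levene_three_groups(young, middle, older)
alpha = 0.05
print(f"W = {w:.3f}, p = {p_value:.3f}")
if p_value < alpha:
    print("Reject H0: the variances are not all equal.")
else:
    print("Do not reject H0: no evidence of unequal variances.")
```

The decision rule at the end mirrors the comparison of the significance with alpha described above. For any other number of groups the p-value needs the general F distribution, which a statistics package provides.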
8.5 Comparing More than Two Groups on an Ordinal Variable: Kruskal-Wallis Rank Test
If you want to compare more than two groups and think (or fear) that there is a
departure from the assumptions of ANOVA, or the level of measurement of the test
variable is ordinal, SPSS offers you nonparametric test procedures. Since these
procedures use rank numbers (just as the Mann-Whitney test does) there is no need for
assumptions about the probability distributions involved.
Again, we must warn you against using these methods for a test variable that has an
ordinal level of measurement with only a few classes. A rather large sample size
will then lead to many ties, and in that case we prefer to use the chi-square test for a
crosstabulation (see Section 7.2).
In this section we will discuss the Kruskal-Wallis test to answer the same research
question as we dealt with in the previous section: Is there a significant difference
between the three age groups in the number of visits to Aquariade?
1. Create (and customize) a boxplot to compare the three age groups. In Section 4.3
you can read the instructions.
In this figure, we see that the median of the middle group is far lower than the medians
of the other two groups. So we might conclude that the number of visits of the 25 to 50
years group lags behind. But does this statement hold for the entire population?
The null hypothesis is that the three medians are equal, and the alternative hypothesis is
that not all three medians are equal.
The Hypothesis Test Summary gives you the conclusion: since the p-value (0.9%) is
less than our alpha value (5%) we have to reject the null hypothesis, so we have
statistical evidence that the number of visits to the swimming pool is not equal for
the three age groups.
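For readers who want to see what happens behind the Hypothesis Test Summary, the following sketch computes the Kruskal-Wallis H statistic from rank numbers. It is a minimal illustration, not the SPSS implementation: the function names and the data are hypothetical, and the p-value formula exp(-H/2) applies only to three groups, where the chi-square approximation has 2 degrees of freedom.

```python
from math import exp


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of rank positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def kruskal_wallis_three_groups(g1, g2, g3):
    """Return the tie-corrected H statistic and its approximate p-value."""
    groups = [g1, g2, g3]
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: divide H by 1 - sum(t^3 - t) / (n^3 - n).
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical; H is undefined")
    h /= correction
    # Chi-square with 2 degrees of freedom: upper tail equals exp(-H/2).
    return h, exp(-h / 2)


# Hypothetical visit counts per age group (NOT the Aquariade data).
h, p_value = kruskal_wallis_three_groups([12, 15, 9, 14], [5, 7, 4, 6], [13, 10, 16, 11])
print(f"H = {h:.3f}, p = {p_value:.3f}")
```

As in the text, the null hypothesis is rejected when the p-value is below the chosen alpha. The many ties that a test variable with few classes produces show up here as a small correction factor, which is one way to see why the chi-square test for a crosstabulation is preferred in that situation.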
The post hoc comparison can be found in the model viewer as well.
The triangle displays each significant difference as a yellow line. The table also displays
significant differences with a yellow background. This supports our conclusion that the
age group 25 -< 50 years makes significantly fewer visits to the swimming pool than
the other two age groups.
In none of the three groups does the normal distribution fit the sample distribution,
so we have a departure from that important assumption of ANOVA, meaning that the
analysis of Section 8.4.2 is not valid. So research question (9) should be answered by the
Kruskal-Wallis test (see Section 8.5.1).
5. Finally, do not forget to switch off the SPLIT FILE by selecting the option Analyze
all cases, do not create groups.
Note that in this situation the variances cannot be assumed to
be equal, since the significance of Levene's test equals 0.4%. So we have to use the
bottom row of the SPSS output (Equal variances not assumed).
Conclusion: If we compare the mean mark between men and women the difference
is not significant at an alpha level of 5% (p-value = 7.5%).
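The bottom row of the t test output corresponds to what is commonly called Welch's t test, which does not pool the two variances. The sketch below shows the calculation of the test statistic and the approximate degrees of freedom. The function name welch_t and the marks are hypothetical illustrations, not the data behind the output discussed here, and the p-value is deliberately left out because it requires the t distribution from a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return the Welch t statistic and its approximate degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each group (variances not pooled).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical marks for two groups (NOT the data used in this chapter).
men = [6.5, 7.0, 5.5, 8.0, 6.0, 7.5]
women = [7.0, 7.2, 6.8, 7.1, 6.9, 7.3]

t, df = welch_t(men, women)
print(f"t = {t:.3f}, df = {df:.1f}")
```

Because the degrees of freedom are estimated from the data, they are usually not a whole number, which is why the bottom row of the output shows a fractional df.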
Conclusion: If we compare men and women with respect to the mark, we cannot see
any significant difference (p = 10.7%).
Conclusion: The Kruskal-Wallis test also gives a significant result, so our conclusion
that the rating of the swimming pool is different among the age groups is valid.
(1) In the variable lists of the dialogs we prefer to have the names of the variables,
so change this into Display names.
(3) Change the measurement system into centimetres if you do not want to use
inches.
2. Click on the link to SPSS 18 and use your Saxion credentials to log in.
3. If your login was successful, the download will start. It is a 249 MB zip file, so your
download might take a minute (or two).
This zip file contains a virtual application. After unzipping you will see one or more
files; these should stay together. The application can be started by double-clicking
the executable file PASW-Statistics-18 (the filename ends with .exe).
A subdirectory with the name Thinstall will also be created. Do not remove this
directory because application-specific information is stored here (it is a virtual
registry and may contain virtual system files). If you empty this directory, all
configuration changes you made will be lost.
You are allowed to use this application when you belong to Saxion (as a student or staff
member). If you have any questions: mail to notebook@saxion.nl
4. Unzip the distribution to any folder on your computer. It will take 450 MB of disk
space.
5. Find the file PASW-Statistics-18, double click to launch and (be patient) after a
couple of minutes SPSS will start.
6. You might encounter a warning from the firewall. This will happen only the first
time, provided that you allow SPSS to run on your computer.
3. Select the range A2:E8 and, from the menus, choose Insert -> (Graphs:) Bar. Take
from the category 2D-bar the third option: 100% stacked bar.
[Chart: 100% stacked bar chart. Categories: Social security, Public transport, Shops,
Road and traffic safety, Parking place, Green place. Legend: Good, Sufficient,
Insufficient, Bad.]
We want to update the layout to get the result of Figure 6.7. We will use a couple of
Excel menus which are available only while the chart is selected, so make sure that
it is.
4. From the menus, choose Layout > (Labels:) > Chart Title.
5. Choose the option Above chart to get a title at the top of the chart.
The next thing to fix is to arrange the categories on the vertical axis in their natural order,
with the first at the top of the axis.
6. From the menus, choose Layout > (Axes:) > Axes > Primary Vertical Axis > More
Primary Vertical Axis Options. Choose the settings as indicated in the next dialog.
The last aspect we will adjust is the gap between the bars. We want to reduce it.
7. Click with the right mouse button on one of the bars and select the option
Format Data Series from the context menu.
[Chart: 100% stacked bar chart with the categories on the vertical axis and a horizontal
axis running from 0% to 100%.]
Books
Berenson, M.L. & D.M. Levine & T.C. Krehbiel (2008),
Basic Business Statistics, concepts and applications (11th edition)
New Jersey: Pearson Prentice Hall, ISBN 0135009367
Groebner, D.F. & P.W. Shannon & P.C. Fry & K.D. Smith (2005),
Business Statistics (6th edition)
New Jersey: Pearson Prentice Hall.
Smits, J. & R.G. Edens (2009),
Onderzoek met SPSS en Excel (2nd edition)
Amsterdam: Pearson Education, ISBN 9043017272 (in Dutch).
Internet References
www.prenhall.com/burnsbush
http://wps.pearsoned.co.uk/ema_uk_he_saunders_resmethbus_4
www.spss.com
www.surfspot.nl
http://notebook.saxion.nl/index.php//software/saxion-software/saxion