Marketing Research With SPSS

Marketing Research
with SPSS
A practical approach
Johan Smits
An Edition of Koala Press Limited
E-mail:
j.smits@koalapress.com
Table of Contents
1. Starting with the Research Process 1
Case...................................................................................................................................................1
1.1 The Marketing Research Process .......................................................................................1
1.2 Measurement in Marketing Research ..................................................................................3
1.2.1. Question-Response Formats ..................................................................................................... 4
1.2.2. Scale Characteristics ................................................................................................................. 6
1.2.3. Levels of Measurement .............................................................................................................. 7
1.3 Coding Data and the Data Code Book ..............................................................................10
2. The SPSS Data Editor 13

2.1 Defining the Variables ........................................................................................................13
2.2 Deleting or Inserting Variables ...........................................................................................16
2.3 Data Type of Variables ......................................................................................................17
2.4 The Data Entry ...................................................................................................................18
2.5 Using the SPSS Codebook to Validate the Data .................................................................19
2.5.1. Checking the Value Labels ...................................................................................................... 19
2.5.2. Displaying Data File Information .............................................................................................. 19
2.5.3. Displaying a User Defined Codebook ...................................................................................... 21
3. Research Questions with Respect to One Variable 25

3.1 Introduction ........................................................................................................................25
3.2 Creating and Cleaning up Frequency Tables ....................................................................26
3.3 Calculating Statistics ..........................................................................................................33
3.3.1. FREQUENCIES ............................................................................................................................. 33
3.3.2. DESCRIPTIVES ............................................................................................................................ 34
3.4 Graphs: Pie, Bar, Boxplot and Histogram ..........................................................................35
3.4.1. Creating a Pie Chart ................................................................................................................ 36
3.4.2. Editing a Pie Chart ................................................................................................................... 38
3.4.3. Editing a Pie Chart (Extra) ....................................................................................................... 40
3.4.4. Creating a Bar Chart ................................................................................................................ 41
3.4.5. Editing a Bar Chart ................................................................................................................... 43
3.4.6. Creating a Boxplot ................................................................................................................... 47
3.4.7. Editing a Boxplot ...................................................................................................................... 48
3.4.8. Creating a Histogram ............................................................................................................... 50
3.4.9. Editing a Histogram .................................................................................................................. 51
3.4.10. Exporting Graphs ..................................................................................................................... 53
3.5 Creating a Categorical Variable from a Scale Variable .....................................................54
3.5.1. The Manual Classification Process .......................................................................................... 54
3.5.2. A Classification by SPSS (VISUAL BINNER) ................................................................................... 57
3.6 Documenting and Publishing SPSS-output .........................................................................60
3.6.1. Documenting the Output .......................................................................................................... 60
3.6.2. Opening an Existing Output File .............................................................................................. 61
3.6.3. Transferring SPSS Tables to WORD ............................................................................................ 61
3.6.4. Transferring SPSS Graphs to WORD ........................................................................................... 62
3.6.5. Different SPSS File Types ......................................................................................................... 63
3.7 Feedback on the Research Questions ..............................................................................63
4. Research Questions with Respect to Two Variables 69

4.1 Introduction ........................................................................................................................69
4.2 Comparing Groups with a Simple Bar Chart......................................................................70
4.3 Comparing Groups with a Boxplot .....................................................................................71
4.4 Creating Subgroups and Making a Comparison ................................................................72
4.4.1. Analysing Subgroups with SPLIT FILE ........................................................................................ 72
4.4.2. Analysing a Subgroup with SELECT CASES................................................................................. 74
4.4.3. Analysing Subgroups with MEANS ............................................................................................. 79
4.5 Crosstabs ...........................................................................................................................80
4.5.1. Strength and Direction of Association ...................................................................................... 81
4.5.2. The CROSSTABS Procedure ....................................................................................................... 81
4.5.3. Cell Display .............................................................................................................................. 82
Marketing Research with SPSS 18 vs 54.docx 16/03/2011

J. Smits et. al., Saxion Hogeschool Enschede, March 2011 page v
4.5.4. Calculating Statistics ................................................................................................................ 83
4.5.5. SPSS Output from CROSSTABS .................................................................................................... 84
4.5.6. Formulating Conclusions.......................................................................................................... 85
4.5.7. Pivoting Rows and Columns .................................................................................................... 86
4.6 Creating and Editing a Clustered Bar Chart .........................................................................87
4.7 Creating and Editing a Band Diagram ...............................................................................89
4.8 Regression and Correlation ...............................................................................................92
4.8.1. Making a Scatter Plot ............................................................................................................... 92
4.8.2. Calculating the Regression Line .............................................................................................. 94
4.8.3. Calculating the Coefficients of Correlation and Determination ................................................. 95
4.8.4. A Regression Model without an Intercept................................................................................. 96
4.8.5. Drawing the Regression Line in the Scatter Plot ...................................................................... 97
4.9 Feedback on the Research Questions ..............................................................................97
5. Dealing with Multiple Response 105

5.1 Defining the Multiple Response Set ................................................................................ 105
5.2 Creating a Simple Table with CUSTOM TABLES ................................................................ 107
5.3 Creating a Cross Tabulation with CUSTOM TABLES .......................................................... 111
5.4 Creating a Simple Bar Chart ........................................................................................... 113
5.5 Sorting the Categories .................................................................................................... 115
5.6 Creating a Clustered Bar Chart....................................................................................... 115
5.7 Feedback on the Research Questions ........................................................................... 119
6. Scaled Response Questions 121

6.1 Introduction to Scaled Response Questions................................................................... 121
6.2 Case Move On............................................................................................................... 123
6.3 Using CUSTOM TABLES...................................................................................................... 125
6.3.1. The Basic Lay-out .................................................................................................................. 125
6.3.2. Adding Percentages ............................................................................................................... 127
6.4 Creating a Band Diagram in EXCEL................................................................................. 129
6.5 Constructing a Bar Chart ................................................................................................ 130
6.6 Recoding a Scaled-Response for a Better Chart............................................................ 133
6.7 Semantic Differential Scale ............................................................................................. 136
6.8 Assignment ..................................................................................................................... 138
7. Chi-square tests 143

Case.............................................................................................................................................. 143
7.1 Chi-Square Goodness-of-Fit Test ................................................................................... 144
7.2 Chi-Square Crosstab Test .............................................................................................. 149
7.3 Conditions for Chi-Square Crosstab Test ....................................................................... 153
7.4 How to Use Cochran's Rule in Practice .......................................................................... 158
7.5 Banding a Variable to Do a Chi-Square test ................................................................... 159
8. Testing for Differences between Groups 163

8.1 Introduction ..................................................................................................................... 163
8.2 Comparing Two Groups on a Scale Variable: t-Test ...................................................... 164
8.2.1. Graphical Display of the Data ................................................................................................ 165
8.2.2. Running the Statistical Test ................................................................................................... 166
8.2.3. Checking the Conditions ........................................................................................................ 167
8.3 Comparing Two Groups on an Ordinal Variable: Mann-Whitney Rank Test .................. 168
8.3.2. Applying the Statistical Test ................................................................................................... 169
8.4 Comparing More than two Groups on a Scale Variable: Analysis of Variance .............. 172
8.4.3. Post Hoc Comparison of the Groups ..................................................................................... 173
8.4.4. ANOVA Assumptions ................................................................................................................ 174
8.5 Comparing More than Two Groups on an Ordinal Variable: Kruskal-Wallis Rank Test . 176
8.6 Assessing the Normality ................................................................................................. 179
8.7 Lab (with Answers) ......................................................................................................... 181
8.7.1. Elaboration of the t-Test ......................................................................................................... 181
8.7.2. Elaboration of the Mann-Whitney Test ................................................................................... 182
8.7.3. Elaboration of ANOVA .............................................................................................................. 182
8.7.4. Elaboration of the Kruskal-Wallis Test ................................................................................... 183
9. Appendix 185
9.1 Adjustment of SPSS Settings ........................................................................................... 185
9.2 SPSS Distribution of Saxion ............................................................................................. 190
9.3 How to Create and Customise a Chart in EXCEL ............................................................. 190
10. Bibliography 194

11. Glossary 195

1. Starting with the Research Process
Case
Suxes is the market leader in the Dutch catering market. The company is successful
in several market segments. For university campus restaurants it has developed a
concept that is appreciated by many students and lecturers. The concept centres on the
wishes and needs of the university and those of the users (students and lecturers).
The formula used by Suxes can be characterised as 'value for money' in an attractive
campus restaurant.
The Pandion University is situated in the eastern part of the Netherlands. For two
years it has been housed in a new building. All courses are given in this new building
with a large and brand-new campus restaurant for both students and lecturers.
From the start, all catering was contracted out to Suxes. At Pandion University, Suxes
offers, in addition to the basic assortment (sandwiches, donuts, soup, coffee, tea,
milk), a luxury assortment. This luxury assortment consists of nutritious soups,
products from the salad bar, extra luxury sandwiches, etcetera.
Now, two years after the start of Suxes' catering activities at Pandion University, the
governing board of Pandion wants to conduct a customer satisfaction survey among the
students and employees.
1.1 The Marketing Research Process

The research process is the map that identifies the researcher's path to a
qualitative understanding of the market. This section introduces you to the steps
involved in the marketing research process. Nine steps can be identified:
1. Establish the need for marketing research
2. Define the problem
3. Establish the research objectives
4. Determine the research design
5. Identify information types and sources: desk research
6. Conduct the field research: qualitative and quantitative
7. Collect and analyze the data
8. Interpret the data, leading to conclusions and recommendations
9. Prepare and present the final research report
Market research is needed when decision makers must make a decision but do
not have the information they need to make it. When this is clear, the first
step is taken. The next step (step 2), defining the problem, is the most important step
in the research process, because a firm may spend literally hundreds of dollars doing
market research but, if it has not correctly identified the problem, those dollars will be
wasted. Often a form of exploratory research is needed to clearly identify the problem
so that proper research may be conducted. After that, research objectives, which are
related to and determined by the problem definition, are set so that, when achieved,
they provide the information necessary to solve the problem. Some of the research
objectives can be met by means of desk research or qualitative field research such as
observation studies, depth interviews or focus group discussions. Other research
questions must be answered by means of quantitative field research. This is done by
constructing a questionnaire, collecting the data and analysing it. Tables,
graphs and statistical tests then provide the answers to the research questions.
J. Smits, Saxion Hogeschool Enschede, June 2010 page 1
For the customer satisfaction survey in this case, we have to use a questionnaire to
gather information from the employees and students of Pandion University. This
questionnaire is based on the following problem definition and research objectives.
Defining the problem

In what way can Suxes raise customer satisfaction by better anticipating the
wishes and needs of the students and lecturers of Pandion University?
Research Objectives
(1) How many days per week does one use the restaurant?
(2) What is the average spending in the restaurant on a weekly basis?
(3) How does one assess the range of choice in the basic assortment and in the
luxury assortment?
(4) How does one assess the customer service offered by the staff?
(5) Which products does one buy in the restaurant?
(6) Is there a need for products which are not in the assortment at this moment?
(7) Who is the customer (gender, student or lecturer, number of study years)?
(8) What is the overall level of satisfaction with the catering services, expressed as a
score on a scale from 1 to 10?
(9) Is there a difference between men and women in appreciation?
(10) Is there a difference between students and lecturers in appreciation?
(11) Is there a difference between students and lecturers in the amount spent?
(12) Is there a difference between students and lecturers in overall satisfaction?
(13) Is there a relationship between the score given by customers and the amount
spent?
(14)
For the field research a questionnaire is constructed on the basis of the research
questions. Generally speaking, manual processing of the survey results is not an
option: the quantity of data will usually be such that we need statistical software such as SPSS.
Nowadays questionnaires are increasingly distributed via the Internet. There are
software products with which you can carry out web surveys. You can construct the
questionnaire with this software, put it online and invite people via email. During the
research, you can monitor the results daily, or even continuously. At the end you
can export all data to SPSS.
As an example, we will outline this process by means of the following questionnaire, which
the researchers have drawn up for the Suxes Customer Satisfaction Survey at
Pandion University.

1. How often (days per week) do you visit the restaurant (on average)?
days a week (fill in 1, 2, 3, 4 or 5)
2. What is your average expenditure in the restaurant on a weekly basis?

euro per week
3. The variety of products in the basic assortment is

O Enough
O Sufficient
O Insufficient
O Poor
4. The variety of products in the luxury assortment
(salad bar, extra luxury sandwiches) is
O Enough
O Sufficient
O Insufficient
O Poor
5. Your satisfaction with the customer service of the staff of Suxes is ...
O Excellent
O Good
O Bad
O Very bad
6. Which products do you buy in the restaurant?

(more than one answer is allowed)
Cheese or ham sandwich
Other kinds of sandwiches
Donut, croissant or baguette
Dairy products
Coffee or tea
Fruits
Soup
Salads
Dinner
7. How do you rate the university caterer Suxes on a scale from 1 to 10?
8. I would like to have the assortment extended with the following products:
................................................................................................................
There are only a few more questions for clarification purposes.
9. Your relationship with Pandion University is:

O Student
O Lecturer (please continue with question 11)
10. How many years have you been registered as a student at Pandion University?
year(s)
11. Please indicate your gender:

O Female
O Male
Figure 1.1 The questionnaire for the Suxes Customer Satisfaction Survey for the students
and lecturers of Pandion University
1.2 Measurement in Marketing Research

Questionnaires are designed to collect information, but how is this information
collected? It is gathered via measurement, which is defined as determining the amount
or intensity of some characteristic of interest to the researcher. For instance, a
marketing manager may wish to know how a person feels about a certain product, or
how much of the product he or she uses in a certain time period. This information,
once compiled, can help to answer specific questions, such as questions about brand usage.
But what are we really measuring? We are measuring properties (sometimes called
attributes or qualities) of objects. Objects include consumers, brands, stores,
advertisements, or whatever subject is of interest to the researcher working with a
particular manager. Properties are the specific features or characteristics of an object
that can be used to distinguish it from another object. For example, assume the object
we want to research is a consumer. The properties of interest to a manager who is
trying to define who buys a specific product are a combination of demographics, such
as age, income level and gender, and buyer behaviour, which includes such things as the
buyer's impressions or perceptions of various brands. Note that each property has the
potential to further differentiate consumers. Once the object's designation on a
property has been determined, we say that the object has been measured on that
property. Measurement underlies marketing research to a very great extent because
researchers are keenly interested in describing market phenomena. Furthermore,
researchers are often given the task of finding relevant differences in the profiles of
various customer types.
1.2.1. Question-Response Formats

When the data collected by a questionnaire is to be processed by statistical software
such as SPSS, it is necessary to transform each question into a variable. In order to be able to
make that transformation you should be aware of the four basic question-response
formats, which are:
Closed-Ended
Open-Ended with numerical response
Open-Ended with text response
Multiple response questions
We will discuss these different formats in the next sections.
Closed-Ended Response Format Questions

The closed-ended question provides response options on the questionnaire that can be
answered quickly and easily. A dichotomous closed-ended question has only two
response options, such as 'yes' or 'no'. If there are more than two options for the
response, then we are dealing with a multiple-category closed-ended question. Both
the dichotomous and the multiple-category closed-ended question formats are very
common on questionnaires because they facilitate the questioning process as well as
the data entry. Each question of this type corresponds to one variable in SPSS and the
response options are coded with numbers. These codes are numerical because
numbers are quick and easy to input, and computers work with numbers more
efficiently than they do with alphanumeric codes.
Examples from the Suxes Survey at Pandion are:
5. Your satisfaction with the customer service of the staff of Suxes is ...
O Excellent
O Good
O Bad
O Very bad
11. Please indicate your gender:

O Female
O Male
The response options to question 5 can be coded in the following way:

1 = Excellent
2 = Good
3 = Bad
4 = Very bad
By using these codes the data entry is limited to the entry of codes (1, 2, 3, 4) instead of
the entry of the literal answers (Excellent, Good, Bad, Very bad). We prefer to use
numbers because numbers are easier and faster to keystroke into a computer file.
We can code question 11 as:

1 = Female
2 = Male
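The coding schemes above can be written down as small code books. The following is a minimal sketch in Python rather than SPSS; the names `SERVICE_LABELS`, `GENDER_LABELS` and `decode` are assumptions made for this illustration, and the entered data are invented. It shows that only the codes need to be typed in, while the labels can be looked up afterwards.

```python
# Hypothetical code books for questions 5 and 11; the codes follow the text above.
SERVICE_LABELS = {1: "Excellent", 2: "Good", 3: "Bad", 4: "Very bad"}
GENDER_LABELS = {1: "Female", 2: "Male"}


def decode(codes, labels):
    """Translate entered codes back into their value labels.

    Codes that are not in the code book are returned as None, so that
    data-entry errors stand out.
    """
    return [labels.get(code) for code in codes]


# Invented example data: five answers to question 5, one of them mistyped.
entered = [2, 1, 4, 2, 7]
print(decode(entered, SERVICE_LABELS))
```

The mistyped code 7 has no label, which is one reason why a code book is also useful for validating the data.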
Open-Ended Questions with Numerical Response

An open-ended question presents no response options to the respondent. The nature
of the question implies a numerical answer which is written down by the respondent
or interviewer. For a question such as 'What is your age?' it is neither practical nor
convenient to provide a list of all ages.
We can transform this type of question into one variable in SPSS. There is no need to code the
answers, because the response is already numerical and meaningful in itself.
Examples from the questionnaire:
2. What is your average expenditure in the restaurant on a weekly basis?

euro per week
10. How many years have you been registered as a student at Pandion University?
year(s)
Open-Ended Questions with Text Response

Again no response options have been printed on the questionnaire. Instead there is a
box or a couple of lines where the respondent can write his/her answer. This answer
will be a text: a word, a couple of words, a complete sentence, or perhaps a (short or
long) story.
An example from the questionnaire:
8. I would like to have the assortment extended with the following products:
................................................................................................................
There are two ways to transform the question and the responses into a variable in SPSS:
1. We can take over the answer literally and type in the text. For this we have to
define a text variable in SPSS. We can use an extra module of SPSS, called SPSS
Text Analysis for Surveys, to process the data and count keywords. This is
not dealt with in this book.
2. We can define a numerical variable in SPSS and code the responses afterwards. If
the number of answers is limited, this method is to be recommended. During
the data input you maintain a list of answers and their codes. This is the way we
have dealt with question 8 of the questionnaire.
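The second approach, coding the text answers afterwards while maintaining a list of answers and their codes, can be sketched as follows. This is an illustration in Python, not an SPSS procedure; the function name `code_text_answers`, the rule that answers are compared after trimming spaces and lower-casing, and the example answers are all assumptions.

```python
def code_text_answers(answers):
    """Assign a numerical code to each distinct text answer.

    Returns the list of codes (one per answer) and the code list that was
    built up along the way. Answers are compared after trimming spaces and
    lower-casing, which is an assumption of this sketch.
    """
    code_list = {}  # normalised answer -> code
    codes = []
    for answer in answers:
        key = answer.strip().lower()
        if key not in code_list:
            code_list[key] = len(code_list) + 1  # next free code
        codes.append(code_list[key])
    return codes, code_list


# Invented answers to question 8.
answers = ["Pizza", "fresh juice", "pizza ", "Sushi"]
codes, code_list = code_text_answers(answers)
print(codes)
print(code_list)
```

In practice the researcher would still review the code list by hand, for example to merge answers that mean the same thing but are worded differently.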
Multiple Response Questions

This type of question allows the user to give more than one response.
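The example from the Suxes Survey questionnaire is:
6. Which products do you buy in the restaurant?
(more than one answer is allowed)
Cheese or ham sandwich
Other kinds of sandwiches
Donut, croissant or baguette
Dairy products
Coffee or tea
Fruits
Soup
Salads
Dinner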
In fact, in this example the respondent is asked nine questions:

Do you buy a cheese or ham sandwich?
Do you buy another kind of sandwich?
Etcetera
For each response category the respondent can only answer with yes or no. So each
response category transforms into a variable in SPSS. Of course, you are free to choose
your codes, but the standard approach is to have each response category option coded
with a 0 or a 1. The designation 0 will be used if the category is not checked, whereas
a 1 is used if it is checked by a respondent. Thus question 6 of the questionnaire is
processed in SPSS by defining nine variables each having a dichotomous response
structure: 1=yes, 0=no.
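As an illustration of this dichotomous coding, the sketch below turns the boxes ticked by one respondent into nine 0/1 values, one per response category of question 6. It is written in Python, not SPSS; the names `CATEGORIES` and `to_dichotomies` and the example respondent are assumptions made for this example.

```python
# The nine response categories of question 6, in questionnaire order.
CATEGORIES = [
    "Cheese or ham sandwich",
    "Other kinds of sandwiches",
    "Donut, croissant or baguette",
    "Dairy products",
    "Coffee or tea",
    "Fruits",
    "Soup",
    "Salads",
    "Dinner",
]


def to_dichotomies(checked):
    """Return nine 0/1 values: 1 if the category was checked, else 0."""
    checked = set(checked)
    unknown = checked - set(CATEGORIES)
    if unknown:
        raise ValueError(f"Unknown categories: {sorted(unknown)}")
    return [1 if category in checked else 0 for category in CATEGORIES]


# Invented respondent who ticked two boxes.
row = to_dichotomies(["Coffee or tea", "Soup"])
print(row)
```

Each position in the resulting row corresponds to one of the nine variables that would be defined in SPSS for this question.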
1.2.2. Scale Characteristics

On the surface, measurement may appear to be a very simple process. It is simple as
long as we are measuring objective properties, which are physically verifiable
characteristics such as age, income, number of bottles purchased, store last visited,
and so on. However, market researchers often desire to measure subjective
properties, which cannot be directly observed because they are mental constructs
such as a person's attitude or intentions. In this case, the market researcher must ask
a respondent to translate his or her mental constructs onto a continuum of
intensity, which is no easy task. To do this, the market researcher must develop question
formats that are very clear and that are used identically by the respondents. This
process is known as scale development.
Scale development is designing questions to measure the subjective properties of an
object. There are various types of scale, each of which possesses different
characteristics. The characteristics of a scale determine the scale's level of measurement. The
level of measurement, as you shall see, is very important. There are four characteristics
of scales: description, order, distance and origin.
Description
Description refers to the use of a unique descriptor, or label, to represent each
designation on the scale. For instance, 'yes' and 'no', 'agree' and 'disagree' and the
number of years of a respondent's age are descriptors on a simple scale. All scales
include description in the form of characteristic labels that identify what is being
measured.
Order
Order refers to the relative sizes of the descriptors. Here, the key word is 'relative' and
includes such descriptors as 'greater than', 'less than', and 'equal to'. A respondent's
least-preferred brand is 'less than' his or her most-preferred brand and respondents
who check the same income category are the same ('equal to'). Not all scales possess
order characteristics. For instance, is a 'buyer' greater than or less than a 'nonbuyer'?
We have no way of making a relative size distinction.
Distance
A scale has the characteristic of distance when absolute differences between the
descriptors are known and may be expressed in units. The respondent who purchases
three bottles of diet cola buys two more than the one who purchases only one bottle; a
three-car family owns one more automobile than a two-car family. Note that when the
characteristic of distance exists, we are also given order. We know not only that the
three-car family has more cars than the two-car family, but we also
know the distance between the two (one car).
Origin
A scale is said to have the characteristic of origin if there is a unique beginning or
true zero point for the scale. Thus, 0 is the origin for an age scale just as it is for
the number of miles travelled to the store or for the number of bottles of soda
consumed. Not all scales have a true zero point for the property they are measuring.
In fact, many scales used by market researchers have arbitrary neutral points, but
they do not possess origins. For instance, when a respondent says 'No opinion' to the
question 'Do you agree or disagree with the statement "The Lexus is the best car on
the road today"?', we cannot say that the person has a true zero level of agreement.
Perhaps you noticed that each scaling characteristic builds on the previous one. That
is, description is the most basic and is present on every scale. If a scale has order, it
also possesses description. If a scale has distance, it also possesses order and
description, and if a scale has origin, it also has distance, order, and description. In
other words, if a scale has a higher-level characteristic, it also has all lower-level
characteristics. But the opposite is not true, as is explained in the next section.
1.2.3. Levels of Measurement

You may ask, "Why is it important to know the characteristics of scales?" The answer is
that the characteristics possessed by a scale determine that scale's level of
measurement. Throughout this course, we will try to convince you that it is very
important for a market researcher to understand the level of measurement of the scale
he or she chooses to use. Let us now examine the four levels of measurement. They are
nominal, ordinal, interval, and ratio. Table 1.1 shows how each scale type differs with
respect to the scaling characteristics we have just discussed.
Table 1.1 also introduces two new concepts: categorical versus metric scales. A
categorical scale is one that is typically composed of a small number of distinct values
or categories such as male versus female, or married versus single versus
widowed. As you can see in the table, there are two categorical scale types: nominal
and ordinal. These will be described in detail in this section. The other concept is a
metric scale, which is composed of numbers or labels that have an underlying
measurement continuum. There are two metric scales that are also described in this
section, and they are interval and ratio scales.
Scale characteristics possessed
Level of measurement    Description    Order    Distance    Origin
Categorical scales
Nominal scale                +
Ordinal scale                +           +
Metric scales
Interval scale               +           +         +
Ratio scale                  +           +         +          +
Table 1.1
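The cumulative structure of Table 1.1 can be written down as a small decision rule. The sketch below is only an illustration in Python (not SPSS); the function name and its arguments are our own, and description is assumed to be present on every scale.

```python
def level_of_measurement(order=False, distance=False, origin=False):
    """Return the level of measurement implied by the scale characteristics.

    Description is taken for granted because every scale possesses it.
    """
    # A higher-level characteristic implies all lower-level ones.
    if (origin and not distance) or (distance and not order):
        raise ValueError("a higher-level characteristic requires all lower-level ones")
    if origin:
        return "ratio"
    if distance:
        return "interval"
    if order:
        return "ordinal"
    return "nominal"


print(level_of_measurement())                          # buyer / nonbuyer
print(level_of_measurement(order=True))                # brand ranking
print(level_of_measurement(order=True, distance=True,
                           origin=True))               # number of cars owned
```

The check at the top mirrors the rule that the opposite direction does not hold: a scale cannot claim origin without distance, or distance without order.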
When you are interpreting the data by means of statistical analyses you have to come
up with answers to the research questions. The statistical method you can use depends
on the type of question in the questionnaire. To be more precise: the level of
measurement determines the statistical method to be used.
A small example makes this clear. An elementary method of summarizing data is
calculating average values, e.g. computing mean values for variables. It should not
surprise you that this is meaningful for some variables and nonsense for others. It is
no use talking about the mean Gender, but for Expenditure the mean is a meaningful
value. This is clearly related to the level of measurement of the variable involved. As
said, we distinguish four levels of measurement.

Nominal Scales
Nominal scales are defined as those that only use labels; that is, they possess only the
characteristic of description. Examples include designations such as race, religion,
type of dwelling, gender, brand last purchased, buyer/nonbuyer; answers that involve
yes-no, agree-disagree; or any other instance in which the descriptors cannot be
differentiated except qualitatively. If you describe respondents in a survey according to
their occupation (banker, doctor, computer programmer), you have used a nominal
scale. Note that these examples of a nominal scale only label the consumers. They do
not provide other information such as "greater than," "twice as large," and so forth.
Some other examples of nominal-scaled questions are:
Your course at Pandion (Marketing, Int. Business and languages, Int. Business
administration, Management and law, Health studies, Security);
Smoker or non smoker (yes, no);
Choice of a supermarket (A&P, Wal-Mart, Sears, Aldi, other).
For variables with a nominal scale there are hardly any calculations available. You
cannot compute an average (mean) value. Calculating the median makes no sense
either. The only statistical activity is counting the frequencies. You might wonder how
SPSS is able to calculate a mean value for the variable gender. That is done on the basis
of the numbers (the codes) used for the values male and female. But the calculated
value of the mean is meaningless. Interpretation of the calculations done by SPSS is a
human activity.
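To see why the mean of a nominal variable is meaningless, consider the following small sketch in Python (not SPSS). The eight gender codes are invented for the illustration; only the coding 1 = Female, 2 = Male is taken from the code book of this course.

```python
from collections import Counter
from statistics import mean

gender_labels = {1: "Female", 2: "Male"}      # coding from the code book
gender_codes = [1, 2, 2, 1, 1, 2, 1, 1]       # hypothetical answers

# Software can always average the codes, but the result describes nothing.
code_mean = mean(gender_codes)

# Counting frequencies is the only meaningful summary for a nominal variable.
frequencies = Counter(gender_labels[code] for code in gender_codes)

print(code_mean)
print(frequencies)
```

If the codes had been 0 and 1, or 5 and 9, the computed mean would change while the respondents stay the same. The frequencies do not depend on the choice of codes.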
Ordinal Scales
Ordinal scales permit the researcher to rank-order the respondents or their responses.
For instance, if the respondent was asked to indicate his or her first, second, third, and
fourth choices of brands, the results would be ordinally scaled.
Similarly, if one respondent checked the category "Buy every week or more often" on a
purchase-frequency scale and another checked the category "Buy once per month or
less," the result would be an ordinal measurement. Ordinal scales indicate only relative
size differences among objects. They possess description and order, but we do not
know how far apart the descriptors are on the scale because ordinal scales do not
possess distance or origin. Examples of ordinal-scaled questions are:
Please rank each brand in terms of your preference. Place a 1 by your first choice,
a 2 by your second choice, and so on.
__ Sony
__ Zenith
__ Philips
__ BasF
__ Grundig
In your opinion, would you say the prices at Wal-Mart are

O Higher than Sears
O About the same as Sears
O Lower than Sears
What is your age?

O 15<25
O 25<40
O 40<60
O 60<90
Interval Scales
Interval scales are those in which the distance between each descriptor is known. The
distance is normally defined as one scale unit. For example, a coffee brand rated 3 in
taste is one unit away from one rated 4. Sometimes the researcher must impose a
belief that equal intervals exist between the descriptors. That is, if you were asked to
evaluate a store's salespeople by selecting a single designation from a list of extremely
friendly, very friendly, somewhat friendly, somewhat unfriendly, very unfriendly,
or extremely unfriendly, the researcher would probably assume that each designation
was one unit away from the preceding one. In these cases, we say that the scale is
assumed interval. As shown in the examples below, these descriptors are evenly
spaced on a questionnaire; as such, the labels connote a continuum and the check lines
are equal distances apart. By wording or spacing the response options on a scale so
they appear to have equal intervals between them, the researcher achieves a higher
level of measurement than ordinal or nominal. With higher-order scales, the
researcher is permitted to apply more powerful statistical techniques such as
correlation analysis.
Please rate each brand in terms of its overall performance

Rating (circle one)
Brand Very Poor Very Good
Mont Blanc 1 2 3 4 5 6 7 8 9 10
Parker 1 2 3 4 5 6 7 8 9 10
Cross 1 2 3 4 5 6 7 8 9 10
Indicate your degree of agreement with the following statements by circling the
appropriate number.
Strongly Strongly
Statement disagree agree
a. I always look for bargains 1 2 3 4 5
b. I enjoy being outdoors 1 2 3 4 5
c. I love to cook 1 2 3 4 5
Please rate the Pontiac Firebird by checking the line that best corresponds to your
evaluation of each item listed.
Slow pickup ___ ___ ___ ___ Fast pickup
Good design ___ ___ ___ ___ Bad design
Low price ___ ___ ___ ___ High price
Ratio Scales
Ratio scales are ones in which a true zero origin exists, such as an actual number of
purchases in a certain time period, dollars spent, miles travelled, number of children,
or years of college education. This characteristic allows us to construct ratios when
comparing results of the measurement. One person may spend twice as much as
another or travel one-third as far. Such ratios are inappropriate for interval scales, so
we are not allowed to say that one store was one-half as friendly as another. Examples
of ratio-scaled questions are:
Please indicate your age.

___ Years
Approximately how many times in the last month have you purchased anything
over $5 in value at a convenience store?
0 1 2 3 4 5 More (specify: ___)
How much do you think a typical purchaser of a $100,000 term life insurance
policy pays per year for that policy?
$ ____
What is the probability that you will use a lawyer's service when you are ready to
make a will?
___ percent

Note There are different ways to measure a variable, leading to
different levels of measurement. We showed you an example of Age
measured on an ordinal level (with classes) and an example on a
ratio level. But suppose the question was formed this way:
To which category do you belong? Younger / Older
or
Please indicate your age: Younger than 35 / 35 years or older
Measurement in this way is only on a nominal level.
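The note can be illustrated with a short Python sketch that turns an age in years (ratio level) into the age classes used earlier (ordinal level) and into the two-category question. The function names are ours, and the class boundaries are simply those of the example questions; information is lost at every step and cannot be recovered afterwards.

```python
from bisect import bisect_right

BOUNDS = [15, 25, 40, 60, 90]                  # class boundaries from the example
LABELS = ["15<25", "25<40", "40<60", "60<90"]


def age_class(age):
    """Ratio -> ordinal: return the age class, or None outside 15 to 89."""
    if not BOUNDS[0] <= age < BOUNDS[-1]:
        return None
    return LABELS[bisect_right(BOUNDS, age) - 1]


def younger_or_older(age):
    """Ratio -> two categories, as in the second question of the note."""
    return "Younger than 35" if age < 35 else "35 years or older"


for age in (19, 25, 34, 61):
    print(age, age_class(age), younger_or_older(age))
```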
Summary:
The level of measurement of a variable is determined by the way in which it is
measured. You have to take into account the possible responses. We distinguish
categorical scales (nominal and ordinal) and metric scales (interval and ratio). The
metric scale is denoted as Scale by SPSS.
1.3 Coding Data and the Data Code Book

For processing the collected data in SPSS you have to translate each question into a
variable. The data entry requires an operation called data coding, defined as the
identification of codes that pertain to the possible responses for each question on the
questionnaire. Typically, these codes are numerical because numbers are quick and
easy to input, and computers work with numbers more efficiently than they do with
alphanumeric codes. In large-scale projects, and especially in cases in which the data
entry is performed by a subcontractor, researchers utilize a data code book, which
identifies all variable names and code numbers associated with each possible response
to each question that makes up the dataset. With a code book that describes the data
file, any researcher can work on the data set, regardless of whether or not that
researcher was involved in the research project during its earlier stages.
So the data code book is a list of transformations of questions into variables, their
(variable) labels, the codes of the answers with their (value) labels, and the level of
measurement. Recall that we discussed in Section 1.2 that each question corresponds
with one variable except multiple response questions where we need as many variables
as there are response options.
Some remarks on the choice of names for variables and codes:
Choose a short name for a variable in SPSS. The first character of the name must
be a letter; the other characters may be letters or numbers. The symbols @, #, $
and _ are also allowed. Spaces are not allowed, and we strongly advise against
using a period or comma.
The name of the variable entered in SPSS is often a concise reproduction of the
characteristic measured in the questionnaire. This name of the variable is
extended with a (variable) label to provide SPSS with a full description. The
variable label is used as a title in tables and graphs. So this label must be very
clear and meaningful.
The codes you use for entering the data in SPSS must also be provided with labels.
SPSS will use these value labels in tables and graphs as well. So it is very important
to use clear and concise descriptions for value labels.
The level of measurement of a variable is especially important in the phase of data
analysis. As we have explained, the statistical analysis to be used is restricted to
variables with the required level of measurement. In displaying variable lists
within dialogs of statistical procedures SPSS takes the level of measurement into
account.
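The naming remarks above can be checked mechanically before you start typing names into SPSS. The following Python sketch encodes only the rules mentioned in this section; it is deliberately a little stricter than SPSS itself (it rejects periods) and it ignores other SPSS restrictions such as reserved words and the maximum length. The function name is our own.

```python
import re

# First character a letter; then letters, digits, or the symbols @ # $ _
_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9@#$_]*")


def is_acceptable_name(name):
    """Return True if the name follows the naming advice of this section."""
    return bool(_NAME_PATTERN.fullmatch(name))


for candidate in ("Variety_basic", "YearStud", "1stVisit", "Year Stud", "Mark.1"):
    print(candidate, is_acceptable_name(candidate))
```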
In Figure 1.2 we have constructed the data code book for the Pandion survey.

Number  Name of the    Variable label                                 Value labels                  Measure
        variable                                                      (data codes)
        RespNum        Respondent number                              Not applicable                Nominal
1       Visits         Number of visits to the restaurant (days per   Not applicable                Ratio
                       week)
2       Expenditure    Expenditure in the restaurant on a weekly      Not applicable                Ratio
                       basis
3       Variety_basic  Variety of products in the basic assortment    1 = Enough                    Ordinal
                                                                      2 = Sufficient
                                                                      3 = Insufficient
                                                                      4 = Poor
4       Variety_luxe   Variety of products in the luxury assortment   1 = Enough                    Ordinal
                                                                      2 = Sufficient
                                                                      3 = Insufficient
                                                                      4 = Poor
5       Staff          Satisfaction with the customer service of the  1 = Excellent                 Ordinal
                       staff                                          2 = Good
                                                                      3 = Bad
                                                                      4 = Very bad
6       Product1       Cheese or ham sandwich                         0 = No, 1 = Yes               Nominal
        Product2       Other sandwich                                 0 = No, 1 = Yes               Nominal
        Product3       Donut, croissant or baguette                   0 = No, 1 = Yes               Nominal
        Product4       Dairy products                                 0 = No, 1 = Yes               Nominal
        Product5       Coffee or tea                                  0 = No, 1 = Yes               Nominal
        Product6       Fruits                                         0 = No, 1 = Yes               Nominal
        Product7       Soup                                           0 = No, 1 = Yes               Nominal
        Product8       Salads                                         0 = No, 1 = Yes               Nominal
        Product9       Dinner                                         0 = No, 1 = Yes               Nominal
7       Mark           Score for the catering service of Suxes        Not applicable                Ratio
8       Assortment     Suggestions for extension of the assortment    1 = More biological products  Nominal
                                                                      2 = More snacks
                                                                      3 = More kinds of soup
                                                                      4 = More choice in fruits
                                                                      5 = Hot Chocolate
                                                                      6 = Pea soup (in the winter)
9       Customer_type  Student or lecturer                            1 = Student                   Nominal
                                                                      2 = Lecturer
10      YearStud       Number of years registered as a student at     Not applicable                Ratio
                       Pandion University
11      Gender                                                        1 = Female                    Nominal
                                                                      2 = Male
Figure 1.2 Data code book of the questionnaire in the Suxes Customer Satisfaction Survey
Attention Please remember that the ratio level of measurement is denoted as
Scale in SPSS.
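A data code book is, in essence, a lookup structure: per variable a label, a level of measurement and, for coded variables, a mapping from codes to value labels. The Python sketch below shows three entries of the Suxes code book in that form. The dictionary layout and the helper decode are our own illustration and not an SPSS file format.

```python
CODE_BOOK = {
    "Visits": {
        "label": "Number of visits to the restaurant (days per week)",
        "values": None,                       # "Not applicable": not coded
        "measure": "Ratio",
    },
    "Staff": {
        "label": "Satisfaction with the customer service of the staff",
        "values": {1: "Excellent", 2: "Good", 3: "Bad", 4: "Very bad"},
        "measure": "Ordinal",
    },
    "Customer_type": {
        "label": "Student or lecturer",
        "values": {1: "Student", 2: "Lecturer"},
        "measure": "Nominal",
    },
}


def decode(variable, code):
    """Translate a stored code into its value label, if the variable is coded."""
    values = CODE_BOOK[variable]["values"]
    if values is None:
        return code                           # the number itself is the answer
    return values.get(code, f"unlabelled code {code}")


print(decode("Staff", 2))
print(decode("Visits", 3))
print(decode("Customer_type", 7))
```

With such a structure, any researcher can interpret the data file without having seen the original questionnaire, which is exactly the purpose of the code book.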

2. The SPSS Data Editor
2.1 Defining the Variables

The data code book contains a list of variable names corresponding to the questions in
the questionnaire. We need to enter those variable names, their labels, value labels
and level of measurement.
1. Start SPSS.
If your version of SPSS starts with the dialog "What would you like to do?", select the
option Type in data (halfway down the dialog) and also check the option "Don't
show this in the future" (at the bottom of the dialog) to prevent this dialog from
reappearing.
After clicking OK SPSS will show you the Data Editor window.
2. Maximize this window.

This window has a certain resemblance to EXCEL, a worksheet with columns (variables)
and rows (the data of each respondent). This sheet is called Data View. SPSS data files
are organized by cases (rows) and variables (columns). In our data files, cases
represent individual respondents to a survey. Variables represent questions asked in
the survey.
We will use the second sheet, Variable View, to enter the data code book. On this
sheet we can define the variables.
3. Click on the second tab (Variable View).
We will give a short description of the columns. The bold entries refer to the columns
of the data code book, which we discussed in Chapter 1 (see Figure 1.2).
Name Stores the name of the variable (must start with a letter)
Type Sets the data type for a variable. Most often you will use Numeric.
Other data types will be discussed in Section 2.3.
Width Specifies the maximum number of characters for a variable value.
Leave this at 8.
Decimals Sets the number of decimal places for a numeric value.
Change this to 0, unless you want to use decimals.
Label Stores the description used by SPSS to identify the variable in output.
Values Sets the labels for the coded values of a categorical variable.
In the Data View sheet you can tell SPSS to display these value labels by
selecting View > Value Labels in the menu. (See also Section 2.5.1)
Missing Specifies whether the data set contains missing values and, if so, how
they are coded.
Columns Sets the width of the column of the variable on the screen.
Align Sets the alignment of the column (only on screen).
Measure Specifies the level of measurement (see Section 1.2.3). Metric variables are
denoted as Scale by SPSS.
Role Specifies the role of the variable for advanced models. This originates from
the software programme Clementine (PASW MODELLER).

If the name of the variable is sufficiently clear you are allowed to leave the variable
label empty (see the variable Gender for instance).
Value labels can only be used with coded variables. The other columns of the Variable
View are used only if necessary.
Note If you are defining categorical variables, you should define them as
numerically coded variables and then establish the meaning of
those codes in the Values column. You should not define such
variables as string variables!
4. Fill in the Variable View. Refer to the data code book of Figure 1.2. The first
part of the window looks like this:
In order to enter the value labels, click in the cell which displays None. A button
with three dots then appears. Click this button.
5. After clicking, the dialog Value Labels appears.
Use your mouse (or the Tab key) to proceed to the next input box. When you are done,
click the OK button to leave the dialog; otherwise all your work will be lost.
6. Process all variables of the data code book in the Variable View.
Tip: Adjust the column width in order to fit all columns on the screen.
7. Do not forget to enter the correct level of measurement in the Measure column.
Please remember the three levels Nominal, Ordinal and Scale (which combines
the levels Interval and Ratio).
Have you entered all variables?
8. Switch to the Data View. Here you see the names of the variables appear as the
column names.
9. Now save your data file, in case something terrible happens.

Create, on your memory stick or on your own hard disc, a folder SPSS Basic
Course and save your data file with the name Suxes Survey. As you will see,
SPSS automatically adds the extension .sav to it.
Note An SPSS data file must always have the extension .sav.
2.2 Deleting or Inserting Variables

This section gives you some extra information about deleting and inserting variables.
Just read it because you may need this information later. Sometimes there is a need to
add an extra variable, maybe because you skipped one by accident. Fortunately this is very
easy in SPSS using the Variable View.
Click with the right mouse button on the row heading and choose the option Insert
Variables.
With the right mouse button you can easily delete (clear) or insert a variable.
Deleting a variable goes even faster than inserting:
Click with the left mouse button on the row heading and press the Delete key
on your keyboard.
Or:
Right click the row heading of the variable and select the option Clear.

2.3 Data Type of Variables
Besides the numerical data type which is generally used, there are some other data
types available which need to be used in special cases.
Among the other data types you will need to use there are Date and String. For date or
time related variables it is wise to use the Date data type, because you can use the date
and time functions of SPSS to make calculations. Examples of these kinds of
calculations are the computation of the age of a customer on the basis of the date of
birth or the number of days since the last visit of the customer on the basis of last
visiting date.
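The two calculations mentioned above can be sketched outside SPSS as follows. This is a Python illustration using the standard datetime module, not the SPSS date and time functions; the function names and the example dates are invented.

```python
from datetime import date


def age_in_years(date_of_birth, today):
    """Completed years between the date of birth and today."""
    years = today.year - date_of_birth.year
    # Subtract one year if this year's birthday has not been reached yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


def days_since(last_visit, today):
    """Number of days between the last visit and today."""
    return (today - last_visit).days


reference_day = date(2024, 6, 14)              # hypothetical "today"
print(age_in_years(date(1990, 6, 15), reference_day))
print(days_since(date(2024, 1, 1), date(2024, 3, 1)))
```

Calculations like these are only possible when the dates are stored as real dates. If they were entered as text, the software could not subtract them.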
If you want to input text you will need the data type String.
On the tab Variable View in the Data Editor you can choose another data type via the
corresponding button.
The data type Date has a number of different formats.

For the data type String you have to specify the maximum number of characters you
are about to use. There is no limitation to this maximum but it is wise not to make this
too high in order to keep your file size low. If you specify the number of characters to
be used as 8 or lower, you are still able to make a table of frequencies of this variable.
For analyzing text variables we advise you to use the procedure AUTORECODE, which
creates a new numerical variable and uses the texts of the old text variable as value
labels for the new data codes. The discussion of procedures like Text Analytics for
Surveys is beyond the scope of this book.
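The idea behind automatic recoding can be imitated in a few lines of Python. This is a simplified sketch and not the SPSS procedure itself: we assume that the distinct texts are numbered in ascending alphabetical order starting at 1, and we ignore details such as blank answers and differences in capitalisation. The function name and the example answers are our own.

```python
def autorecode(texts):
    """Return numeric codes for the texts plus the matching value labels."""
    distinct = sorted(set(texts))                        # alphabetical order
    code_of = {text: code for code, text in enumerate(distinct, start=1)}
    codes = [code_of[text] for text in texts]
    value_labels = {code: text for text, code in code_of.items()}
    return codes, value_labels


answers = ["soup", "fruit", "soup", "bread"]             # hypothetical text answers
codes, value_labels = autorecode(answers)
print(codes)
print(value_labels)
```

The new variable is numeric, so it can be used in frequency tables, while the value labels keep the original texts readable in the output.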
2.4 The Data Entry

In the Data View every row contains the data of one respondent. In order to input the
answers on the questionnaire of a respondent you have to work row-wise. Click on the
row heading (the number) and the whole row is selected. Enter the (code) numbers of
each question and continue by pressing ENTER to move to the next cell.
Work carefully in order to prevent making errors with the data entry. After entering
the respondent number, write it down on the questionnaire also, in order to be able to
locate this form if needed at a later stage. If you have activated the Value labels you
can use option lists to facilitate the input process. But if you prefer the codes over the
labels you can deactivate this option by clicking the same button again.

The button Value labels is activated.
And now the good news is that your lecturer has entered the data for you already. The
data is available for you in the file Suxes Catering Services.xls and the only thing left
to do is a copy and paste action.
1. Start EXCEL and open the file Suxes Catering Services.xls
(from mim.saxion.nl/docent/sms/SPSS).
Select the data cells and click Copy.
Please note: Exclude the top line from your selection and select only the 50 data
rows. It is handy to start your selection in the bottom-right cell and go upwards
to the left, excluding the top line.
The variable names must be excluded from the selection.
2. Switch to SPSS and paste the data via the menu Edit > Paste. Please note that
the Data View tab must be active. Check whether the cursor is in the top left
cell of the sheet.
3. After pasting the data you have to save the file (again).
From the menus, choose File > Save or use the corresponding icon on the
toolbar.
2.5 Using the SPSS Codebook to Validate the Data

2.5.1. Checking the Value Labels
With the Value labels button of the toolbar you can activate or deactivate the value
labels. If the Value labels are activated you can browse the data editor and check the
data. If there are codes without a label, there is an error. Either the code is not correct
or the value label is missing.
1. While you have the Data View of the Data Editor on screen, use the Value labels
button of the toolbar. Click one more time to see how this button toggles the
display. In this view, check each column of your sheet for codes without a label.
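Browsing the data editor works well for 50 respondents, but the same check can be described as a simple rule: every code in a coded column must occur in the list of value labels. The Python sketch below applies that rule; the function name and the five data values are invented for the illustration.

```python
def unlabelled_codes(column, value_labels):
    """Return (row number, code) for every code that has no value label."""
    return [
        (row, code)
        for row, code in enumerate(column, start=1)
        if code not in value_labels
    ]


gender_labels = {1: "Female", 2: "Male"}
gender_column = [1, 2, 11, 2, 1]               # hypothetical data with a typing error

print(unlabelled_codes(gender_column, gender_labels))
```

A hit does not tell you which of the two possible errors occurred. You still have to decide whether the code was mistyped or whether a value label is missing from the code book.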
2.5.2. Displaying Data File Information

The codebook as shown in Figure 1.2 can be produced by SPSS by using the option
Display Data File Information. SPSS will create a table displaying all characteristics of
all variables and add another table which contains all value labels.
1. From the menus, choose File > Display Data File Information > Working File.
The result is displayed in Figure 2.1.
Figure 2.1 The output of the command FILE INFORMATION (outline pane on the left, contents pane on the right)
If this is your first time working with SPSS output it is worthwhile to spend a few
minutes to become familiar with the structure of the SPSS output window. The results
from running a statistical procedure are displayed in the Viewer. The output produced
can be statistical tables, charts, graphs, or text, depending on the choices you make
when you run the procedure.
The Viewer window is divided into two parts. The outline pane (on the left side)
contains an outline of all information stored in the Viewer. The contents pane (on the
right) contains statistical tables, charts, and text output.
Use the scroll bars to navigate through the window's content, both vertically and
horizontally. For easier navigation, click an item in the outline pane to display it in the
contents pane. If you think that there is not enough room in the Viewer to see an entire
table or that the outline view is too narrow, you can easily resize the window.
The results from most statistical procedures are displayed in pivot tables. In the next
chapter we will discuss how to edit a pivot table.
The output of the FILE INFORMATION procedure has a number of components: Title
(which contains the title of the block), Notes (containing the creation date, name of the
data file, etcetera), Active Dataset (a text output block with the full path and name of
the data file), Statistics (a table with the number of valid and missing observations for
each variable), and Frequency Table (which contains the frequency tables). The
content of each block is shown on the right side in the contents pane.
2. Check the file information by comparing your SPSS output with the codebook
from Figure 1.2.
3. Scroll downwards and check whether all variables have the correct value labels.

Figure 2.2 The value labels defined for the variables.
2.5.3. Displaying a User Defined Codebook

In order to control the output of SPSS, SPSS has an option that allows you to specify
which elements from the codebook you want to display. This option, which has the
appropriate name CODEBOOK, will be used right now.
4. From the menus, choose Analyze > Reports > Codebook.
The dialog has three tabs to specify which variables, which output and which
statistics you want to display.
5. Select all variables except the respondent number and put them into the right
pane. You can select the variables most conveniently by clicking on the variable
Visits and, while holding the Shift key, clicking on the variable Gender.
On the second tab of the dialog, Output, you can select which variable and file
information SPSS should display. The tab Statistics lets you choose how the data
are summarized on the basis of the measurement level of the variable.
You do not need to change these settings right now.
6. By clicking the button OK, an overview of all selected variables will be produced.

Figure 2.3 The output of the command CODEBOOK
In the output file you can see a new branch added to the tree structure. This new
branch contains the name of the command as top label and each variable has its own
entry. Every leaf on the left (outline pane) corresponds to a table on the right side
(contents pane) of the window. Please note that variables with a ratio level of
measurement have a different table (see Figure 2.4) than the nominal and ordinal
variables (see Figure 2.5). For scale variables (this is ratio!) the table contains the
mean and standard deviation, while tables with nominal and ordinal variables contain
all observations and their frequencies and percentages.
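The difference between the two kinds of tables follows directly from the level of measurement. The Python sketch below mimics that choice in a simplified way; it is not the CODEBOOK procedure, the function name and data are invented, and we assume the sample standard deviation (dividing by n - 1).

```python
from collections import Counter
from statistics import mean, stdev


def summarize(values, measure):
    """Summarize a variable according to its level of measurement."""
    if measure == "Scale":
        # Metric variables: mean and (sample) standard deviation.
        return {"mean": mean(values), "std_dev": stdev(values)}
    # Nominal and ordinal variables: frequency and percentage per value.
    counts = Counter(values)
    n = len(values)
    return {value: (count, 100 * count / n)
            for value, count in sorted(counts.items())}


print(summarize([2, 4, 4, 6], "Scale"))        # hypothetical Visits data
print(summarize([1, 1, 2, 3], "Ordinal"))      # hypothetical Variety_basic codes
```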
Figure 2.4 The codebook display of the ratio variable Visits

Figure 2.5 The codebook display of the ordinal variable Variety_basic
Inspect the tables in the output file. You will discover that two frequency tables
contain an error.
Figure 2.6 In two frequency tables something is certainly wrong
7. Repair the errors in the data editor. The first error is clear, because the 11 must
of course be a 1; in the other case the respondent is male. Use the search button to
find these values in the data editor.
8. Save your data file again.
9. Remove the CODEBOOK tables (by selecting and deleting the whole output block).
Make new tables and check whether they are correct now.
10. Save the output file in the folder SPSS Basic Course also. Name it Suxes Survey
Output 1 and notice that SPSS adds the extension .spv.
Note An SPSS output file always has the extension .spv. In version 15 and
older the SPSS output files have the extension .spo.
Unfortunately CODEBOOK tables are not meant to be published in this raw format. They
can only be published after some editing. In the next chapter we
will discuss how to create tables for a publication.
11. Exit SPSS. Close the output file and the data file. SPSS displays an alert to warn
you.

3. Research Questions with Respect to One Variable
3.1 Introduction
In this chapter we will discuss analyses which deal with only one variable at a time.
This means that we will analyse the responses of one question of the questionnaire
without taking into account the answers to other questions. Examples of research
objectives with respect to one variable in the Suxes Survey are (see also Section 1.1):
(7) What is the customer's gender?
To answer these questions you have to make a choice from the available statistical
analyses. Firstly the level of measurement of the variable determines this choice. The
research objectives (1) and (2) are related to ratio scaled variables. You can come to an
answer by calculating the mean value. Research question (4) is related to an ordinal
scaled variable and (7) to a nominal scaled variable. Calculating a mean value to
answer these questions makes no sense of course.
Secondly, it is up to the researcher how detailed the answer to the research question
will be. Calculating one statistic (mean, median or mode) only gives one indication
of the central (or most common) value of that variable. The researcher may also wish
to report a statistic about the spread of the distribution of the variable. And, besides
calculating statistics, you can summarize the data by means of a table or graph. Again
the choice of the graph is dictated by the level of measurement.
Table 3.1 summarizes how research questions with respect to one variable can be
analysed. There are many numerical descriptive measures available for scale
variables (interval or ratio level), including:
Measures of central tendency. The most common measures of central
tendency are the mean (arithmetic average) and median (value at which half the
cases fall above and below).
Measures of dispersion. Statistics that measure the variation or spread in the
data include the standard deviation, range and interquartile range (= difference
between third and first quartile).
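The measures listed above can be computed by hand for a small data set, which helps when interpreting the SPSS output later. The sketch below uses Python's standard statistics module; the eight values are invented, and the quartiles follow that module's default method, which may differ slightly from the method a particular SPSS procedure applies.

```python
from statistics import mean, median, quantiles, stdev


def describe(values):
    """Return common measures of central tendency and dispersion."""
    q1, _, q3 = quantiles(values, n=4)          # first, second and third quartile
    return {
        "mean": mean(values),
        "median": median(values),
        "std_dev": stdev(values),               # sample standard deviation
        "range": max(values) - min(values),
        "iqr": q3 - q1,                         # interquartile range
    }


visits = [1, 2, 2, 3, 3, 3, 4, 5]               # hypothetical visits per week
for name, value in describe(visits).items():
    print(name, round(value, 3))
```

All of these measures rely on distances between values, which is why they are reserved for variables with an interval or ratio level of measurement.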
For variables measured on an ordinal or nominal level, things are different because we
are dealing with categorical variables. Table 3.1 lists the available statistics and graphs.
Note SPSS uses Scale to denote the interval and ratio levels of measurement.
Level of measurement
Analysis                  Nominal                Ordinal                Interval/ratio
Locate the centre value   Mode,                  Median,                Mean,
                          Modal class            Mode,                  Median,
                          (3.3.1)                Modal class            Modal class
                                                 (3.3.1)                (3.3.1, 3.3.2)
Calculate the spread                             Range,                 Range,
                                                 Interquartile range    Interquartile range,
                                                 (3.3.1)                Standard deviation
                                                                        (3.3.1, 3.3.2)
Summarize with a table    Table of frequencies   Table of frequencies   A frequency table is not
                          (2.5.2 and 3.2)        (2.5.2 and 3.2)        useful if there are many
                                                                        distinct values.
Summarize with a graph    Pie chart              Pie chart              Boxplot
                          (3.4.1 to 3.4.3)       (3.4.1 to 3.4.3)       (3.4.6 and 3.4.7)
                          Bar chart              Bar chart              Histogram
                          (3.4.4 and 3.4.5)      (3.4.4 and 3.4.5)      (3.4.8 and 3.4.9)
Table 3.1
In this chapter we will discuss how to conduct these statistical analyses with SPSS.
Furthermore we will show you how to edit tables and graphs so that you can use them
in your publications, like a research report in WORD or a POWERPOINT presentation.
Section 3.5 will discuss how to document your SPSS output file in order to retrieve
tables, graphs and other output elements easily.
The last section concludes by discussing all research questions with respect to one
variable. For each research question we will show the analysis in SPSS and we will give
an appropriate conclusion.
3.2 Creating and Cleaning up Frequency Tables

A frequency table displays the answers of a question and how often these answers have
been given. A frequency table displays the frequencies and the percentages. The
researcher often encounters problems, because not all respondents have answered all
questions. These respondents have blanks as values for these variables: the answer
is missing. When you clean up a frequency table to make it ready for publication, you,
being the researcher, have to decide how to display these missing values in your table.
In this section we will discuss how to create and clean up frequency tables and how to
display missing values.
1. Start SPSS and open the data file Suxes Survey.sav which you have created and
saved in the previous chapter.
2. From the menus, choose File > Open > Output and open the output file Suxes
Survey Output 1.spv. This file was also created in the previous chapter, and
it contains the code book tables of all variables of the survey. To save this
file under a new name, choose File > Save As from the menus. Name your new file
Suxes Survey Chapter 3.spv.
Note Save all your output of this chapter in the output file Suxes Survey
Chapter 3.spv. This file is a large container in which all tables and
graphs are stored. At the end of this chapter you will learn how to
structure this file so that you can retrieve your work without effort.
We are going to create frequency tables for the variables Variety of products in the
basic assortment, Score for the catering service of Suxes, and Number of years
registered as a student at Pandion University. These variables have the names
Variety_basic, Mark and YearStud.
3. If you prefer to use the names of the variables instead of their labels, please change
this SPSS setting as described in Section 9.1.
4. From the menus, choose Analyze > Descriptive Statistics > Frequencies.
5. Select the three variables from the list and move them into the right pane. Hint: If
you hold down the Ctrl key, you can select several variables at once. The tables will
be produced after you click the OK button.
Figure 3.1 The output of the FREQUENCIES command
As you can see in Figure 3.1 SPSS added a new branch to the existing tree structure. In
the viewer pane (right side) the frequency tables are displayed. SPSS starts by showing
a summary with the number of valid outcomes and the number of missing
observations for each variable. Please note that for ten respondents the variable
YearStud has no value.
6. Locate the frequency table of the variable YearStud in the viewer pane.
A rough table is displayed in Figure 3.2.
Number of years registered as a student at Pandion University
                                         Valid     Cumulative
                   Frequency   Percent   Percent   Percent
Valid     1            10        20,0      25,0       25,0
          2            11        22,0      27,5       52,5
          3            10        20,0      25,0       77,5
          4             6        12,0      15,0       92,5
          5             3         6,0       7,5      100,0
          Total        40        80,0     100,0
Missing   System       10        20,0
Total                  50       100,0
Figure 3.2
The format of your tables is a critical part of providing clear, concise and meaningful
results. If your table is difficult to read, the information contained within that table
may not be easily understood. It is clear that there is a need for some adjustments,
like:
Hiding the columns Percent and Cumulative Percent.

Changing the display format of the percentages: without decimal places and with
a percent sign.
Applying a table look.
Note It is up to the researcher to decide which percentages to report. In
this example it would be misleading to report that 20% of the
respondents are freshmen. The 10 lecturers skipped this question,
which explains the 10 cases reported as system missing. Among the
valid responses (the students), a quarter (25%) are freshmen. That is
why we display the column Valid Percent. For the same reason you can
choose to remove the last two rows from the table.
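The difference between Percent and Valid Percent can be checked by hand. The following is a minimal Python sketch, not SPSS syntax; it uses the counts from Figure 3.2, and the function name frequency_rows is our own choice for illustration.

```python
# Counts taken from Figure 3.2 (variable YearStud).
counts = {1: 10, 2: 11, 3: 10, 4: 6, 5: 3}
missing = 10  # system-missing cases

def frequency_rows(counts, missing):
    """Return (value, frequency, percent, valid percent, cumulative percent) rows."""
    valid_total = sum(counts.values())      # respondents who answered
    grand_total = valid_total + missing     # all respondents
    rows = []
    cumulative = 0
    for value in sorted(counts):
        freq = counts[value]
        cumulative += freq
        rows.append((
            value,
            freq,
            round(100 * freq / grand_total, 1),        # Percent: base is all cases
            round(100 * freq / valid_total, 1),        # Valid Percent: base is valid cases
            round(100 * cumulative / valid_total, 1),  # Cumulative Percent
        ))
    return rows

rows = frequency_rows(counts, missing)
for row in rows:
    print(row)
```

Percent divides by all 50 cases, whereas Valid Percent and Cumulative Percent divide by the 40 valid cases only. That is why the freshmen account for 20% in one column and 25% in the other.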
7. Double-click on the frequency table to start the editor. As you can see there are
some changes in the interface: a notched edge around the table and some menu
entries have changed. Moreover the Formatting toolbar has appeared.
Figure 3.3 The output window with an active editor and the formatting toolbar
Note If the formatting toolbar does not show up, use the menu View >
Toolbar.

On the screen with the active editor you can alter the table and its make-up. Please
note that if you click outside the editor area (that is outside the notched edge), the
editor will close and you will return to the output viewer.
We are going to hide two columns from the table. There are two ways to hide a
column: either you drag the right border of the column or you select the whole column
and choose Hide Category from the popup context menu.
8. With the left mouse button drag the right border of the column Percent to the left,
as far as possible. While dragging, the actual width of the column is displayed until
SPSS shows the message Hide. At that very moment you release the left mouse
button and the column will disappear.
Note If things go wrong and you happen to destroy the table, please do
not panic. It is very easy to recreate the table by running the
FREQUENCIES procedure again. However, there is also an Undo entry
in the Edit menu. Moreover, the first button of the toolbar gives
you that function as well. Apologies to heavy users: there is no
keystroke Ctrl-z available.
9. Ctrl-Alt-click on the Cumulative Percent column to select all of the cells in that
column. Right-click the highlighted column and choose Hide Category from the
pop-up context menu. This column is now hidden also.
Now we change the display format of the percentages in the pivot table.
10. Click on the Valid Percent column label to select it.
From the menus, choose: Edit > Select > Data Cells.
From the menus, choose: Format > Cell Properties.
11. Select the second tab, Format Values. Select ##,#% from the Format list.
Type 0 in the Decimals field to hide all decimals in this column.
Click OK to apply your changes.
12. From the menus, choose: Format > Table Properties.
13. Switch to the tab Borders and point with your mouse at the line of interest. In the
Border list, the option Horizontal category border (rows) is selected
automatically. Select the appropriate line style: the dashed line.
14. Click OK to apply your changes.

The last action is shading the Total rows.
15. Ctrl-Alt-click on the label Total to select the whole row.
From the menus, choose: Format > Cell Properties.
Switch to the tab Font and Background and select a Bold style. To change the
background, first click the square next to Background and choose your colour;
we prefer a gray (228) background.
Click OK to apply your changes.
16. Perform these actions for both Total rows. Change the label of the bottom row to
Grand Total.
17. Now we have finished. Click outside the notched edge to close the pivot table
editor. The final result is shown in Figure 3.4.
Figure 3.4 The customized frequency table
18. Customize the frequency tables of the variables Variety Basic Assortment and
Mark in the same way.

Figure 3.5 The customized frequency tables of two other variables
Now there are three frequency tables ready for publication.
19. Again, save your output file.

In Section 3.6.1 we will explain how to document your output file in order to retrieve
these tables in an easy way.
3.3 Calculating Statistics

There are many procedures in SPSS to produce statistics. We will introduce the
procedures FREQUENCIES and DESCRIPTIVES. Always take the level of measurement into
account, because a statistic that does not fit the level of measurement (for example
the mean of a nominal variable) has no meaning.
3.3.1. FREQUENCIES
We will discuss some features of the FREQUENCIES procedure by using it for the
variable Mark.
1. From the menus, choose: Analyze > Descriptive Statistics > Frequencies, select
the variable Mark and move it into the Variable(s) list.
2. Click Statistics. Select Quartiles, Mean, Median, Mode, Std. deviation, Minimum,
and Maximum.
The meaning of the statistics:
Mean: arithmetic average.
Median: value above and below which half the cases fall.
Mode: value with the highest frequency.
Std. deviation: spread of the scores around the mean value.
Quartiles: values below which 25%, 50% and 75% of the scores fall, respectively.

3. Click Continue.
4. Deselect Display frequency tables in the main dialog box. (Often frequency tables
are not useful for scale variables since there may be almost as many distinct values
as there are cases in the data file).
5. Click OK to run the procedure.

The FREQUENCIES Statistics table is displayed in the Viewer window.
Figure 3.6 FREQUENCIES Statistics table
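To make the meaning of these statistics concrete, here is a small Python sketch using only the standard library. The list marks is hypothetical illustration data, not the Suxes survey, and the quartile interpolation rule of Python may differ slightly from the one SPSS applies.

```python
from statistics import mean, median, mode, quantiles, stdev

# Hypothetical marks, for illustration only (not the Suxes survey data).
marks = [5, 6, 6, 7, 7, 7, 8, 8, 9]

def describe(values):
    """Return the statistics selected in the Frequencies: Statistics dialog."""
    q1, q2, q3 = quantiles(values, n=4)  # 25%, 50% and 75% points
    return {
        "mean": mean(values),
        "median": median(values),
        "mode": mode(values),
        "std_deviation": stdev(values),  # sample standard deviation (n - 1)
        "minimum": min(values),
        "maximum": max(values),
        "quartiles": [q1, q2, q3],
    }

summary = describe(marks)
for name, value in summary.items():
    print(name, value)
```

Note that the second quartile equals the median, which is a quick consistency check on any statistics table.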
3.3.2. DESCRIPTIVES
The procedure DESCRIPTIVES also calculates statistics. If you deal with several variables
at the same time, its output is more convenient than that of FREQUENCIES. The other
differences between the two procedures are minor: DESCRIPTIVES can calculate z-scores,
whereas FREQUENCIES offers the option to produce a bar chart, pie chart or histogram.
1. From the menus, choose: Analyze > Descriptive Statistics > Descriptives.
Select the variables Visits, Expenditure and Mark and move them into the
Variable(s) list.
2. Click Options to select statistics.

Figure 3.7 shows the table with the statistics involved. It is possible to display the
variables in a different order, as you can see in the dialog Descriptives: Options.
Figure 3.7 The output of the DESCRIPTIVES command
3. Save the output file again. You always have to save after producing new output.
This is so obvious that we will not repeat this anymore.
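The z-scores mentioned for DESCRIPTIVES express each score as its distance from the mean in units of the standard deviation. The sketch below shows the calculation in Python; the list visits is hypothetical and the function name z_scores is ours. We assume the sample standard deviation (dividing by n - 1) is used.

```python
from statistics import mean, stdev

# Hypothetical numbers of visits, for illustration only.
visits = [2, 4, 4, 6]

def z_scores(values):
    """Standardize values: subtract the mean, divide by the sample standard deviation."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]

z_visits = z_scores(visits)
print(z_visits)
```

Standardized scores always have a mean of 0 and a standard deviation of 1, which makes variables measured in different units comparable.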
3.4 Graphs: Pie, Bar, Boxplot and Histogram

You can create and edit a wide variety of chart types in SPSS. We have already seen that
graphs can be made within the procedure FREQUENCIES. To demonstrate the basics of
creating and editing charts, we will create a pie chart, bar chart, boxplot and histogram
in this section and show you the graph editor to edit charts.
We will use the CHART BUILDER wizard to create graphs, because this interface
facilitates the building process and shows you a preview of the graph. It is important
to know that SPSS can produce a good graph only if the measurement levels of your
variables have been set correctly. To emphasise this, SPSS pops up a warning when
you start the Chart Builder wizard and invites you to check the measurement levels.
1. From the menus, choose the option Graphs > Chart Builder.
(Tick the check box in the warning dialog if you do not want to see this dialog in the
future.)
Since we have set the measurement level for each variable and all our categorical
variables have value labels, we can proceed by clicking OK.
[Figure callouts: the preview pane of the chart; the icon of a pie chart.]
Figure 3.8 The dialog CHART BUILDER
The large dialog of the CHART BUILDER wizard will be shown. Please note that if you
select a categorical variable in the list, e.g. Staff, SPSS displays the value labels. The
preview displays these labels also, but not the real data. You can build your graph by
dragging the elements into the preview pane.
3.4.1. Creating a Pie Chart

We start by creating a simple pie chart that shows us how many respondents are
satisfied with the customer service of the staff of Suxes.
2. In the lower part of the dialog CHART BUILDER (Figure 3.8) at the tab Gallery,
select the option Pie/Polar. Drag the Pie chart icon into the preview pane.
The dialog Element Properties will show up, but we will discuss this dialog later. Right
now, we will concentrate on the preview pane.
3. From the list Variables drag the variable Staff and drop it in the box Slice by?
which is right below the pie.
In the preview you will see that SPSS replaces the vertical box with Count to let you
know that the chart will be based on the counts of each category of the variable Staff.
4. On the tab Titles/Footnotes, select the options Title 1 and Footnote 1.
Two elements are added to the preview of the pie chart and these two elements have
entered the option list Edit Properties in the dialog Element Properties also. In this
dialog you can set the properties of the elements of your chart.
Now we are ready to discuss the dialog Element Properties.
5. If this dialog is not visible, use the button Element Properties on the dialog of
the CHART BUILDER wizard to display it.
6. Enter today's date and your name and class right after the copyright symbol ©
(press Alt+0169) in the content box of Footnote 1. Click Apply to confirm.
7. Select Title 1 from the list and enter Rating customer service.
Again, click Apply to confirm.
Note It is important to place the title (and footnote) in the graph itself. If
the graph is copied to WORD (or exported to another software
package) only the graphical data is included. So the graphical file
must contain all the information.
8. Click OK to finish. SPSS will create the pie chart for you.

3.4.2. Editing a Pie Chart
The graph is rather empty because only the pie with the slices and a legend are
displayed. It is not suitable for publication. We will edit the graph including the
following operations:
Display the percentages of the slices

Edit the title and the footnote
Edit the colours and shading of the slices
1. Select the graph in the Output Viewer and double click the pie chart to open it in
the Chart Editor.
The Chart Editor opens the chart in a new window. As long as the chart is open in
the Chart Editor, it is shaded in the Output Viewer. The graph is object oriented,
which means that it consists of elements, each with its own properties. You can set
the properties of each element of the graph after selecting it. The collection of
elements contains the graph as a whole, the interior part of the graph, the collection of
slices, each slice itself, the titles, the labels and so on.
2. Maximise the Chart Editor.
3. Right-click on the pie and choose the option Show Data Labels.
The dialog Properties appears. In this dialog you can change the properties of the
elements of the graph.

Note Move the dialog Properties to the right of the screen in order to see
both windows simultaneously. If you select an element of the graph
(left), the corresponding tabs in the Properties window are
displayed (right). After selecting another element of the graph,
different tabs will appear.
The result of editing can be seen after clicking the Apply button.
You can also open the dialog Properties with the button on the
toolbar.
[Figure callouts: Hide labels (= move downwards). Display labels (= move upwards).
Position of labels outside the pie.]
4. Select the Number Format tab. You do not want the labels to display decimal
places, so type 0 in the Decimal Places text box.
5. Click Apply to see the result in the pie chart.
6. Select the Data Value Labels tab. In the Displayed list you see Percent and in
the Not Displayed list the value labels of the variable Staff. Those labels are
hidden in the chart. Since we want to display these labels in the chart, add them
to the box Displayed.
7. Choose the option Custom at Label Position in order to place the label outside
the slice (left button). At Display Options, check the option Display connecting
lines to label. Click Apply to update the label properties.
Now the labels are outside the pie and can be positioned individually. However, the
font size needs to be adjusted.
8. Select the Text Style tab and adjust the font size. Our preferred size is 9, which
keeps the labels easily readable.
9. Select Footnote and change the font style to Italic.
10. Select Title and change the font family and enlarge to preferred size.
11. Since all information in the legend is in the chart itself, there is no need to have
a legend anymore. Hide the legend.
12. Change the colour of one slice and the border of another one. Click on the pie
and click again on a slice to select it. Use the Fill & Border tab to make the
changes.
13. When you are done, close the Chart Editor. The updated pie chart is shown in
the Viewer.
The result of our editing is shown in Figure 3.9.
Figure 3.9 The customized pie chart.
3.4.3. Editing a Pie Chart (Extra)

The dialog Properties has two more tabs of interest. The Categories tab can be used to
edit and sort the categories, and on the Depth & Angle tab you can decorate the chart
with a shadow or a 3-D perspective. Please experiment with these options yourself.

3.4.4. Creating a Bar Chart
Bar charts can be made for variables with a nominal or ordinal (classified) level of
measurement. For variables with a ratio level of measurement (scale) we prefer to use
a histogram (see Section 3.4.8).
We will create a bar chart of the variable Variety of products in the basic assortment.
1. From the menus, choose: Graphs > Chart Builder.
Use the button Reset to clear the settings of the previous task.
2. On the tab Gallery, select the option Bar and drag the icon Simple Bar into the
preview pane.
3. Drag the variable Variety_basic from the list into the box X-axis below the graph.
4. On the tab Titles/Footnotes, check the options Title 1 and Footnote 1.
5. In the dialog Element Properties, enter the title Variety of the basic assortment
and today's date, your name and class as footnote. Do not forget to confirm by
clicking Apply.
6. Still in the dialog Element Properties, select the element Bar1 and edit the property
Statistic into Percentage and click Apply.
7. Click OK and SPSS will create the bar chart for you.

3.4.5. Editing a Bar Chart
In order to make the chart ready for publication we need to edit a couple of things, such
as the layout of the title, the scale of the vertical axis and the position of the text on
the horizontal axis. If desired, the graph can be transposed as well.
1. Double-click on the chart to open the Chart Editor.
2. Select the title of the horizontal axis. After a (right-)click, the Properties dialog
containing the properties of this object will appear.
3. From the dialog Properties select the Text Style tab. Choose Georgia from the Font
Family list, Style Italic, Size 12 and Colour Dark Blue. Confirm your choices by
clicking Apply.
4. Select the Text Layout tab and choose a justification to the right (Justify). Three
types of justification are available for an axis title.
5. The vertical axis also needs some adjustments. Click the button Y on the toolbar to
select the vertical axis.
6. Use the Number Format tab to suppress the decimal places.
7. On the Scale tab you can adjust the subdivisions of the vertical axis. Type 50 in
the Maximum text box (the maximum of the vertical axis) and 5 in the Major Increment
text box (the distance between the ticks). Click Apply to see the results.
8. Make sure that the vertical axis is still selected.
From the Chart Editor menus, choose: Options > Show Grid Lines or use the
button on the toolbar. Horizontal grid lines are drawn and the Grid Lines tab
appears in the Properties dialog. Select Both major and minor ticks and click
Apply to confirm.
9. You see that grid lines have been added to the graph. Select the Lines tab and choose a
dashed line style with a grey colour.

10. Select the label of the vertical axis and on the Text Layout tab choose for left
justification in order to place this label at the start of the axis.
11. The bars: click on one of the bars to select them all. From the Chart Editor menus,
choose Elements > Show Data Labels. (This can also be achieved with the
corresponding button on the toolbar.)
12. Now, look at the Data Value Labels tab of the Properties dialog. In the Label
Position panel, select Custom, Below Centre. Click Apply to confirm.
13. On the Number Format tab, type 0 in the Decimal Places text box and on the Text
Style tab, choose a Font Size of 9.
14. Select the bars again and open the Fill & Border tab. Choose a nice colour to fill the
bars and a pattern if you wish.
15. Adjust the title. Choose the font family Bookman Old Style, change it to bold and
size 18. (Another font family is fine as well.)
16. Finally, change the Footnote into Italic, 8 points and locate it at the bottom left in
the chart.
The graph has been very much improved and now it is ready to be published.
Figure 3.10 The bar chart after the editing process
17. Close the Chart Editor and save your output file.
3.4.6. Creating a Boxplot

For variables with a ratio level of measurement (scale) we can display the frequency
distribution by means of a boxplot. Although the original scores are not visible
anymore, the median, quartiles and outliers will give you good information about the
distribution. Skewness of the distribution can be recognized easily in the boxplot.
The boxplot provides a graphical representation of the data based on the five-number
summary, which consists of
X_smallest, Q1, Median, Q3, X_largest
The vertical line in the middle of the box represents the median. The vertical line at the
left side of the box represents the location of Q1 and the vertical line at the right side of
the box represents Q3. Thus the box contains the middle 50% of the observations in the
distribution. The lines outside the box cover the lower 25% and the upper 25% of the
observations, except for the outliers and extremes. Outliers and extremes are
represented with a circle and a star symbol, respectively.
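The five-number summary and the outlier rule can be sketched in Python as follows. The counts are those of the variable Expenditure as listed in the frequency table of Section 3.5.1. We assume the common boxplot convention that outliers lie more than 1.5 box lengths from the box and extremes more than 3 box lengths; the names classify and box_length are our own, and the quartile rule of Python may differ slightly from the one SPSS uses.

```python
from statistics import quantiles

# Frequencies of Expenditure (value: count), copied from the table in Section 3.5.1.
expenditure_counts = {2: 4, 3: 3, 4: 3, 5: 10, 6: 2, 7: 1,
                      8: 3, 10: 11, 12: 3, 15: 4, 18: 1, 20: 1}
values = sorted(v for v, n in expenditure_counts.items() for _ in range(n))

q1, med, q3 = quantiles(values, n=4)
box_length = q3 - q1  # interquartile range
five_number_summary = (min(values), q1, med, q3, max(values))

def classify(x):
    """Label a value by its distance from the box, measured in box lengths."""
    distance = max(q1 - x, x - q3, 0)
    if distance > 3 * box_length:
        return "extreme"
    if distance > 1.5 * box_length:
        return "outlier"
    return "regular"

outliers = sorted({v for v in values if classify(v) == "outlier"})
extremes = sorted({v for v in values if classify(v) == "extreme"})
print(five_number_summary, outliers, extremes)
```

Under these assumptions the values 18 and 20 are flagged as outliers and no value is flagged as an extreme.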
In this section we will introduce the simple boxplot to you, a boxplot of the
expenditures per week in the restaurant. In Section 4.3 we will produce a boxplot to
compare the expenditures of students and lecturers.
1. From the menus, choose: Graphs > Chart Builder.
Use the button Reset to clear the settings of the previous task.
2. On the tab Gallery, from the category Boxplot, drag the 1-D Boxplot icon into the
preview pane.
3. Select the variable Expenditure from the list and drag it into the box X-axis? at the
left side of the graph.
On the tab Titles/Footnotes, select the options Title 1 and Footnote 1 again to add
your texts to the graph.
4. In the Element Properties dialog enter the title Expenditures in the restaurant on
a weekly basis and let the footnote display the current date with your name and class
after the copyright symbol (as always). Do not forget to confirm by clicking the
Apply button.
5. Click OK to create the boxplot.
The boxplot displays the distribution of the variable Expenditure. The two circles
represent two outliers and the numbers are the corresponding respondent numbers.
3.4.7. Editing a Boxplot

This graph is not yet suitable for publication. It takes up too much space and it is better
to draw the boxplot horizontally. Moreover, the scale of the axis is too coarse and the
label should display the €-sign somewhere. That is why we are going to edit this
graph.
1. Double-click the chart to open the Chart Editor.

2. First, rotate the graph by clicking the button Transpose chart coordinate system on
the toolbar.
3. Now, we want to adjust the horizontal axis. Use the X button on the toolbar to
select it.
4. Select the Scale tab and change the Major Increment to 2; this number determines
the scale of the axis. Click Apply.
5. Now we need to change the size of the chart.
In the Properties dialog, select the tab Chart Size. To make the chart half as
high, halve the Height. However, you do not want the width to change, so
uncheck the option Maintain aspect ratio before changing the height.
(The aspect ratio is the ratio of width to height, and that ratio is what we want to change.)
Click Apply.
Note: If your PC shows the units in inches, you can change this setting via
the menu Edit > Options. The tab General in this dialog has a
pane with an option list, which has centimeters as its third entry.
Please see Section 9.1 for further details about the settings of SPSS.
6. Since the chart title contains the same text as the axis label, we are going to change
the latter to Amount in € (the €-sign is inserted by pressing Alt+0128). Enlarge
the font size of the text and justify it to the right side of the axis. (You can also
change the font family etc., if you like.)
7. The last step is to adjust the other texts of the graph.
Now the chart has improved a lot and it looks much better.

Figure 3.11 The boxplot with the adjusted layout.
3.4.8. Creating a Histogram

For scale variables (level of measurement ratio or interval) we can construct a
histogram. It is important that the original figures are available: you need an
open-ended question in the questionnaire, not a closed one with classes. The
histogram has the (possible) values of the variable on its horizontal axis. The vertical
axis represents the frequencies. SPSS constructs a set of classes of equal width.
You can adjust these classes if you wish.
For interval/ratio variables, statistics like the median, mean and standard deviation can be
computed as well. Use the CHART BUILDER wizard if you only want to make the histogram.
This graph will have a legend with the mean value, the standard deviation and
the number of observations. If you (also) want a separate summary of the statistics,
you can use the histogram option of the FREQUENCIES command.
1. From the menus, choose Analyze > Descriptive Statistics > Frequencies and move
the variable Expenditure into the Variable(s) text box. (If this text box already
contains a variable, click Reset to cancel all previous choices.)
2. Click Statistics and select the options displayed in the next figure.
Do the same in the Charts dialog.
3. Because there is no point in displaying the frequency table (too many values), you
need to deselect the Display frequency tables option.
4. After that, click OK in the FREQUENCIES dialog to start the analysis. The first block of
the output contains the statistics. The histogram looks rather erratic. If you are
dealing with a rather large number of classes and relatively low frequencies, it is
better to adjust the classification.
3.4.9. Editing a Histogram

1. Open the Chart Editor by double-clicking the graph. In the Chart Editor, maximize
the window, open the Properties dialog and place it at the right side of the
chart (see next figure).
2. The Properties dialog has a tab Binning on which you can adjust either the number
of bars (intervals) or the width of each bar (interval width) on the X axis. Here
we customize the X axis by typing 4 in the Interval width text box. Click Apply to
inspect the result.
3. Give the horizontal and the vertical axis a Major Increment of 2.
4. Resize the graph to fill the whole width of the graph area behind it. (See the next
figure for which square to take: dragging this square to the right enlarges the
graph to fill the whole canvas.)
5. Move the legend above the graph (upper right corner) and change the background
and the border colours.

6. Change the layout of the titles and gridlines so that your histogram matches Figure
3.12.
Figure 3.12 The histogram transformed into a layout suitable for a publication
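What the Binning tab does with an interval width of 4 can be imitated with a few lines of Python. This is only a sketch: we assume that the first interval starts at 0 and that each interval includes its lower bound but not its upper bound, which need not match the anchor SPSS chooses. The counts are those of Expenditure from the frequency table in Section 3.5.1; the function name bin_counts is ours.

```python
from collections import Counter

# Frequencies of Expenditure (value: count), copied from the table in Section 3.5.1.
expenditure_counts = {2: 4, 3: 3, 4: 3, 5: 10, 6: 2, 7: 1,
                      8: 3, 10: 11, 12: 3, 15: 4, 18: 1, 20: 1}
values = [v for v, n in expenditure_counts.items() for _ in range(n)]

def bin_counts(values, width, start=0):
    """Count values per interval [lower, lower + width); start is an assumed anchor."""
    per_bin = Counter((v - start) // width for v in values)
    return {(start + k * width, start + (k + 1) * width): per_bin[k]
            for k in sorted(per_bin)}

histogram = bin_counts(values, width=4)
for (lower, upper), count in histogram.items():
    print(f"{lower:>2} to under {upper:>2}: {'#' * count}")
```

With wider intervals the twelve distinct values collapse into six bars, which is why the edited histogram looks smoother than the original one.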
3.4.10. Exporting Graphs

It is worthwhile to realise that all the graphs you have produced (four up to now) are
located in the SPSS output file. This file is like a container which holds all output (in
a tree structure) and is saved as a whole.
The File menu in the Chart Editor does not provide an option to save or export the
chart itself. It is, however, possible to save the layout as a template. This is very useful
if you have to produce a number of charts with the same layout. When you save the
template, SPSS shows a dialog in which you select the layout elements to be saved.
A second option lets you apply the layout of a chart saved before, which is very
useful if you want all your charts to have the same look (house style).
In the Output Viewer you can export graphs to a graphics file, such as a jpeg file. You
can find this in the Viewer menus under File > Export. There are, of course, many
other ways to export your work to different software applications. Browse this entry if
you are looking for a special format.

3.5 Creating a Categorical Variable from a Scale Variable
Variables with a ratio scale usually have many outcomes, which makes a frequency
table or cross tabulation rather unusable. For those analyses it is more convenient to
create a categorical variable from that scale variable. For the variable Expenditure we
will show the transformation process into three classes. We will show you two
methods, the first is a manual method (the classical one) and in the second one SPSS
will do the job for you.
3.5.1. The Manual Classification Process

This first method has three steps, which we will follow in our instructions.
1. Decide how many categories are needed and what the boundaries should
be.
2. From the menus, choose: Transform > Recode into Different Variables. This
procedure creates a new variable with the classification.
3. The last step is entering appropriate value labels and the correct level of
measurement.
We will now elaborate on these three steps.
1. Create a frequency table of the variable Expenditure.

This table is too large and hard to interpret. But we need the last column to find
out what the boundaries for a classification into three categories must be.
Expenditure in the restaurant on a weekly basis
                                         Valid     Cumulative
                   Frequency   Percent   Percent   Percent
Valid     2             4         8,0       8,7        8,7
          3             3         6,0       6,5       15,2
          4             3         6,0       6,5       21,7
          5            10        20,0      21,7       43,5
          6             2         4,0       4,3       47,8
          7             1         2,0       2,2       50,0
          8             3         6,0       6,5       56,5
          10           11        22,0      23,9       80,4
          12            3         6,0       6,5       87,0
          15            4         8,0       8,7       95,7
          18            1         2,0       2,2       97,8
          20            1         2,0       2,2      100,0
          Total        46        92,0     100,0
Missing   System        4         8,0
Total                  50       100,0
[Figure callouts: 33% = upper bound of the first class = lower bound of the second class.
67% = upper bound of the second class = lower bound of the third class.]
With three classes, each category should contain about 33% of the cases. More
important, however, is that the boundaries are round numbers. Focus your
attention on the Cumulative Percent column and look up the 33% value: it is passed
at the value 5, so the upper boundary of the first class is in the neighbourhood of 5.
The upper boundary of the second class (67%) will be around 10. So we have:
First category: all up to 5 euro.
Second category: between 5 euro and 10 euro.
Third category: exceeding 10 euro.
2. From the menus, choose: Transform > Recode into Different Variables.
3. Select the variable Expenditure from the list and move it into the transformation
list. Create a new output variable Expenditure_categories with the label Expenditure
in restaurant. Click the Change button to create this new variable and click Old
and New Values to define the classification.
The dialog offers three Range options:
Range ... through ...: enter the lower and the upper boundary.
Range, Lowest through value: handy for the first class, only the upper boundary is needed.
Range, value through Highest: handy for the last class, only the lower boundary is needed.
4. Enter the classification and use the three Range options as explained. (Note:
through in SPSS means up to and including.)
Classification                 Old Value                           New Value
Lowest through 5 (Class 1)     Range, Lowest through value: 5      1 and click Add
5 - 10 (Class 2)               Range: 5 through 10                 2 and click Add
10 through highest (Class 3)   Range, value through Highest: 10    3 and click Add
Note The order of the transformation rules is important, because a value that
fits two rules is recoded by the first one. If you enter the rules in another
order, the border values 5 and 10 can end up in the other class.
5. Continue and click OK in the main dialog. (If the OK button is disabled you probably
did not click the Change button to apply the new variable name and label.)
The new variable appears in the Data Editor at the far right of the Data View. In the
Variable View, the new variable is on the last line. Our last step is to create value
labels.
6. Enter value labels for the new variable (click the Values cell of the new
variable in the Variable View):
1 = 5 or less
2 = 5 - 10
3 = more than 10.
7. Change (in the Variable View) the columns Decimals (to 0) and Measure (to
ordinal) of the new variable.
8. Save your data file again! Many changes have been made. (SPSS puts an asterisk in
the title bar before the file name to remind you that the file has not been saved.)
9. Finally, make a frequency table and a bar chart of the new variable
Expenditure_categories.
Customize the layout as shown in Figure 3.13 and Figure 3.14
Please note that you do not use the original variable! In the FREQUENCIES dialog
you will find the new variable at the last entry of the variable list.
After editing the table should look like Figure 3.13.
Figure 3.13 The customized frequency table of the new categorical variable Expenditure
Figure 3.14 Edited bar chart of the new categorical variable
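The recode rules of step 4, and the reason why their order matters, can be sketched in Python. The sketch assumes that a value is recoded by the first rule it fits and that both boundaries of a range are included (through means up to and including). The names RULES and recode are ours, and the counts are those of Expenditure from the frequency table above.

```python
import math
from collections import Counter

# Rules in the order entered in step 4: (lower bound, upper bound, new value).
RULES = [(-math.inf, 5, 1), (5, 10, 2), (10, math.inf, 3)]

def recode(value, rules=RULES):
    """Return the new value of the first rule whose range contains the old value."""
    if value is None:              # a missing answer stays missing
        return None
    for lower, upper, new_value in rules:
        if lower <= value <= upper:
            return new_value
    return None

# Frequencies of Expenditure (value: count), copied from the frequency table.
expenditure_counts = {2: 4, 3: 3, 4: 3, 5: 10, 6: 2, 7: 1,
                      8: 3, 10: 11, 12: 3, 15: 4, 18: 1, 20: 1}

class_counts = Counter()
for old_value, n in expenditure_counts.items():
    class_counts[recode(old_value)] += n

print(dict(class_counts))
print(recode(5), recode(10))                                          # rules as entered
print(recode(5, list(reversed(RULES))), recode(10, list(reversed(RULES))))  # reversed order
```

With the rules as entered, the border values 5 and 10 fall into classes 1 and 2. With the rules reversed they fall into classes 2 and 3, which is exactly the effect the note above warns about.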
3.5.2. A Classification by SPSS (VISUAL BINNER)

SPSS has a complete and elegant procedure for classification, called VISUAL BINNING.
Binning means putting into a bin, that is a box or frame, so the procedure groups your
continuous data into categories. Since the VISUAL BINNING procedure relies on
actual values in the data file to help you make good banding choices, it needs to read
the data file first. Since this can take some time if your data file contains a large
number of cases, the initial dialog box also allows you to limit the number of cases to
read (scan). This is not necessary for our data files.
1. From the menus, choose: Transform > Visual Binning. Drag and drop Expenditure
from the Variables list to the Variables to Bin list, and then click Continue.

The new variable
to be created
2. In the main Visual Binning dialog, select Expenditure in the Scanned Variable List.
A histogram displays the distribution of the selected variable.
3. Enter Expenditure_Cat2 for the name of the new banded variable and Expenditure
(in €) for the variable label.
4. Click Make Cutpoint to define the border values of the classification.

For three classes, two border
values (cutpoints) are
needed.
It is rather simple to create a classification. Generally speaking, the number of cutpoints
equals the number of classes minus 1. So, in order to make three classes, two
cutpoints are needed, and SPSS works out that each category should then contain about
33% of the observations (exactly so only if the values are spread perfectly evenly).
5. Enter 2 in the Number of Cutpoints text box and click Apply.
After that, SPSS comes up with the cutpoints 5 and 10. You can ask SPSS to make labels
(by clicking the button), but we prefer to do this ourselves.
6. Enter the labels for the classes. (See figure above).
7. Click OK to create the new, banded variable (and click OK again).
8. Create (as a final check) a frequency table and a bar chart of the newly created
variable Expenditure_Cat2. Are there any differences compared to the previous
section?
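The idea behind the cutpoints can also be sketched outside SPSS. The Python lines below are only an illustration under stated assumptions: the expenditures are made-up numbers (not the Suxes data), a value equal to a cutpoint is assumed to go to the lower class, and the percentile method of Python need not be the one SPSS uses, so the cutpoints themselves may differ slightly.

```python
import statistics
from bisect import bisect_left

# Hypothetical weekly expenditures (in euro); not the Suxes survey data.
expenditure = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 16]

n_classes = 3
# n classes need n - 1 cutpoints; quantiles() returns exactly that many.
cutpoints = statistics.quantiles(expenditure, n=n_classes)


def bin_value(value, cuts):
    """Class number (1-based); a value equal to a cutpoint goes to the lower class."""
    return bisect_left(cuts, value) + 1


binned = [bin_value(v, cutpoints) for v in expenditure]
counts = {k: binned.count(k) for k in range(1, n_classes + 1)}
print(cutpoints)
print(counts)
```

Because the twelve values are all different, each of the three classes receives four of them, that is one third of the observations.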

3.6 Documenting and Publishing SPSS Output
All your results from running a statistical procedure are displayed in the Output
Viewer. The output produced can be statistical tables, charts, graphs, or text,
depending on the choices you make when you run the procedure. This section will
discuss how to transfer (export) output to WORD. It is important to be able to insert
SPSS output into a WORD document or PowerPoint presentation because your research
report will be a WORD document and you want to tell the world about your results.
It is also possible to print the SPSS output from the viewer. Again you need customized
tables and graphs because no one wants to receive garbage. In this chapter we have
discussed how to clean up a frequency table and get a graph ready for publication.
3.6.1. Documenting the Output

The content of the output viewer is saved as one file. In order to be able to find your
results later, it is recommended that you rename the branches (blocks) in the tree. The
bare procedure name is not very informative.
Rename every block to section xx or task yy. In this way it is easy to retrieve your results.
Click the box with the minus sign (-) of the procedure whose results you want to hide.
If you document your output file as shown here, you can easily retrieve your work
later. That is why it is recommended to document your output in this way.
1. Document your output file as explained before.
An output file can be saved by choosing File > Save. The contents will be saved in a
file with the extension .spv. Because we want to use meaningful names, we prefer to
use File > Save As and give the output file a name ourselves.
2. Save your output file again with the name Pandion Output Chapter 3. SPSS will
provide it with the extension .spv automatically. Please note that the file is stored in
the folder SPSS Basic Course.
3. Do not forget to save your data. Please save this file in the same folder.

4. Finally close all files and terminate your SPSS session.
3.6.2. Opening an Existing Output File

It is convenient to keep all output of a research task in the same output file. To open an
existing output file, from the menus choose: File > Open > Output.
Note If you use the button Open on the toolbar or the keystroke Ctrl+O,
SPSS will try to open a file of the same kind as you are working with.
So, from the data editor you can open another (new) data file, but
you will not be able to open an output file.
3.6.3. Transferring SPSS Tables to WORD

The frequency tables (see Section 3.2) must be transferred to our report in WORD. This
is just a simple copy and paste action. Please save your files before you switch to
another Windows application.
You have created three tables which are on top in the SPSS output viewer (see Section
3.2, instruction point 18). Note: only include one of the tables in your selection and
not other parts of the output.
1. Select one of the tables and, from the menus, choose: Edit > Copy, or right-click
and use the context menu.
2. Switch (via the Windows taskbar) to WORD, or start WORD.
3. In WORD: from the menus, choose Edit > Paste Special and select the option
Picture (Enhanced Metafile) for the best result. The SPSS object is pasted as a figure
into your WORD document.
This dialog will show up in your favourite language.
Select enhanced meta file to get the best quality picture.
In WORD 2007: Use the arrow below the Paste button on the ribbon to get the
option Paste Special. You can also use the shortcut Alt+Ctrl+V to get the Paste
Special dialog.
4. Repeat this procedure for the other tables.

Note If you use the ordinary paste action in WORD (by pressing Ctrl+V)
your tables will be inserted as WORD tables. This gives you the
opportunity to do the markup yourself in WORD.
If you select the three tables in one selection, you will get one large
picture which contains all three tables. This is not advisable,
because you cannot place the table images individually in your
document.
3.6.4. Transferring SPSS Graphs to WORD

The export of SPSS graphs is a little bit different.
1. Select the graph in the SPSS output viewer. In SPSS we have to use the option Copy
instead of Copy objects.
2. Switch to WORD.
3. In WORD: from the menus, choose Edit > Paste Special and select the option
Bitmap in the dialog.
The dialog Format Picture can be opened in several ways, e.g. by double-clicking on the
picture, from the context menu by right-clicking on the picture, or via the
ribbon Format, which is available after selecting the picture.
The button Text Wrapping has a couple of dog buttons to change the wrap style of the picture.
The dialog Format Picture can also be opened via the small button at the
bottom right of the Size panel of the ribbon.
Explanation In WORD there are two ways to style a picture. Either in line with
the text or floating (the other option). If you want to move the
picture with your mouse to another place on the page or want to
have text beside the picture then you choose one of the floating
styles. Click Advanced to open a dialog to input the coordinates of
the position of the picture on the page. A major drawback of this
floating style is that the picture floats by itself to a place where you
do not want it to be. The option In line with text does not have this
drawback. The picture is fixed in a paragraph like a (very) large
letter and cannot float anymore. That is why the in line with text
style is our favorite.
5. Save the WORD document by clicking the Save button (with the disk) on the
toolbar. Name it catering1.doc.

6. Export the output of DESCRIPTIVES (see Section 3.3.2) and the four charts (see
Section 3.4) to the WORD-document also.
7. Finally enter a title at the top of your WORD document reading Report of the Suxes
Survey at Pandion and a footer with your name and class and a page number.
Save this document.
3.6.5. Different SPSS File Types

In this section we will discuss the different file types of SPSS. You have used two file
types already, the data file (.sav) and the output viewer file (.spv).
The data file (.sav) is displayed in the Data Editor. The information in the Data Editor
consists of variables and cases.
In Data View, columns represent variables and rows represent cases (observations).
In Variable View, each row is a variable, and each column is an attribute
associated with that variable.
In Data View, if you put the mouse cursor on a variable name (the column headings),
a more descriptive variable label is displayed if you have defined one for that variable.
By default, the actual data values are displayed. To display labels, from the menus
choose View > Value Labels. Descriptive value labels are now displayed. This makes it
easier to interpret the responses. The switch from values to labels (and vice versa) can
be made by the button Value labels on the toolbar as well.
All your results from running a statistical procedure are displayed in the Output
Viewer. The output produced can be statistical tables, charts, graphs, or text,
depending on the choices you make when you run the procedure. This file always has
the extension .spv and is like a container with a tree structure. The structure is
displayed in the outline pane (on the left side) and the contents pane, containing the
actual output, is on the right-hand side. This has been discussed in Section 3.6.1.
SPSS syntax provides a method for you to control the product without navigating
through dialog boxes, viewers, or data editors. Instead, you control the application
through syntax-based commands. Nearly every action you can achieve through the
user interface can be achieved through syntax. Using syntax allows you to save the
exact specification used during a session. The easiest way to create syntax is to use the
Paste button located on most dialog boxes. This facilitates repetitive analyses on
several data files in an easy way. You save your syntax file with extension .sps.
Another way to automate tasks within SPSS is the scripting facility. In previous versions
of SPSS, this scripting language was called Visual Basic for Applications, which is used in
Microsoft Office applications as well. Since SPSS wants to provide software for different
operating systems, it has introduced the Python and R scripting languages into
its software. You can use the SPSS software not only on Windows systems (Microsoft
Windows XP (Professional, 32-bit), Vista (32-bit or 64-bit) or Windows 7), but also
on Apple Mac OS X 10.5.x (Leopard) and 10.6.x (Snow Leopard), and on Linux. In this
course we will not discuss the syntax and scripting facilities.
3.7 Feedback on the Research Questions

In this chapter we have discussed how to analyze research questions with respect to
one variable. In our research of Suxes at the Pandion University we are able to answer
research objectives 1, 2, 3, 4, 6, 7 and 8 right now. Create your own report in WORD and
mail this to your instructor together with your SPSS data file and SPSS output file. Do
not forget to document your output file as is explained in Section 3.6.1.
Research Objectives (Field research)

(3) How does one assess the range of choice in the basic assortment and in the
luxury assortment?
(6) Is there a need for products which are not in the assortment at this moment?
(7) Who is the customer (gender, student or lecturer, number of study years)?
(8) What is the overall level of satisfaction of the catering services expressed as a
score on a scale from 1 up to 10?
Research Question 1
In the frequency table we see that 28% of the respondents visit the restaurant 5 days
per week. Moreover 28% visit the restaurant 2 days per week and only 4% 4 days a
week. It is important to realize that those who never visit the restaurant are excluded
from the survey.
(See Section 3.2 for cleaning up a frequency table.)
The chart will make clear that the answer 4 times a week is quite exceptional.
Research Question 2
The expenditures in the restaurant are between 4 and 8 Euro for the largest group of
respondents. The mean value is 8 Euro, but the spread is rather high (the standard
deviation is 4,47 Euro). The histogram clearly shows that the distribution is skewed to
the right.

(See Sections 3.4.8 and 3.4.9 for instructions on how to create and edit a histogram.)
Since the distribution is skewed to the right, the mean value will be greater than you
might expect. The boxplot shows that this is caused by two outliers in our data.
(See Sections 3.4.6 and 3.4.7 for instructions on how to create and edit a boxplot.)
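Why two outliers pull the mean upwards can be shown with a small numerical example outside SPSS. The Python sketch below uses made-up expenditures (not the Suxes data), in which the last two values play the role of the outliers.

```python
import statistics

# Hypothetical expenditures (in euro); the last two values act as outliers.
expenditure = [4, 5, 5, 6, 6, 7, 7, 8, 20, 22]

mean_all = statistics.mean(expenditure)           # mean of all ten values
median_all = statistics.median(expenditure)       # median of all ten values
mean_without = statistics.mean(expenditure[:-2])  # mean without the two outliers

print(mean_all, median_all, mean_without)
```

For these numbers the mean of all values is 9, well above the median of 6.5, while the mean without the two outliers is only 6. The median hardly reacts to the outliers, which is why it is worth reporting next to the mean for a skewed distribution.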
Research Question 3
We can report the rating of the assortment by publishing two frequency tables.
It is clear that 28% of the customers are not satisfied with the variety of the basic
assortment and 22% are not satisfied with the variety of the luxury assortment.

Research Question 4
In this table we can see that almost 50% of the respondents are very satisfied with the
customer service level of the staff. Only 8% think customer service is bad.
Research Question 6
Only 10 of our respondents mention specific products to extend the assortment.

Striking is that suggestions for more choice in soup and snacks are relatively frequent.
Research Question 7
This research question deals with the variables Gender, Customer_type and YearStud.
We will create a frequency table and a chart for each variable.

Summarizing our response group, we can conclude that on the whole there are
slightly more men than women participating (27 males and 23 females). The response
group consists of 40 students and 10 lecturers. The student group represents all year
groups well, except for the fourth-year students, who are a little underrepresented with
only 15% in our response group.
Research Question 8
The marking of the catering services is positive. Only 18% of our response group gives
a negative score and the mean score is almost a 7.
To Conclude
In this section we have given an overview of tables and graphs needed to answer the
research questions 1 to 4 and 6 to 8. In your report it is important to be able to discuss
the graphs and tables. Tell a story and mention the remarkable outcomes of the graphs
or tables and explain to the reader what makes this outcome worthwhile. We did this
by formulating a conclusion after each graph or table. You should always consider
whether to include a graph or table in your main text or in an appendix. Remember, all
charts and tables in the appendices need to be referred to in the main text.

4. Research Questions with Respect to Two Variables
4.1 Introduction
In a research project there are usually research questions where the differences
between groups are of interest. In those situations we have a variable (factor) which
defines the groups or levels. A factor like registered years may have several numerical
levels (e.g. 1, 2, 3, 4, and 5) or a factor such as customer type may have several
categorical levels (e.g. Type 1, Type 2, Type 3). These groups or levels are compared to
each other by taking the other variable into account. The variable which defines the
groups or levels is called a factor, grouping variable or independent variable. The
variable to be compared is called the dependent variable, because its value may be
dependent on the group.
Examples of such research questions in the Suxes Customer Survey at Pandion are (see
also Section 1.1):
For example, in research question (11) Customer_type is the independent variable

which defines the groups. The (mean) expenditure may depend on the type of
customer. Expenditure is the dependent variable and is measured at a ratio level. In a
bar chart you can define two bars representing the mean expenditure of students and
lecturers. The difference in height illustrates the differences in mean expenditure.
Research question 9 will be analyzed in another way. We can construct a cross

tabulation and a clustered bar chart. The band diagram provides a graphical way to
compare the groups mutually.
From the examples it becomes clear that the dependent variable may have any level of
measurement. That level determines the statistical method to be used. Table 4.1 gives
an overview.
Level of measurement of
the dependent variable     Statistical methods to be used
Interval or Ratio          Bar chart representing mean values (4.2)
                           Boxplot for groups or levels (4.3)
                           Comparing MEANS (4.4.3)
Ordinal or Nominal         Contingency table or cross tabulation (4.5)
                           Clustered bar chart (4.6)
                           Band diagram (4.7)
Table 4.1
The other kind of research questions in which two variables are involved deals with the
relationship between those variables. There is said to be a positive relationship if
higher values of the one variable lead to higher values on the second variable and of
course, lower values of the first variable lead to lower values of the second variable. If
we are dealing with a negative relationship, high values of the first variable lead to
lower values of the second variable. An example of such a research question from the
Suxes Survey is:
spent?
In Table 4.2 we again see that the level of measurement determines the method of
analysis. (This table only displays the most common situations).
Level of measurement       Statistical method
Nominal - Nominal          Cross tabulation (4.5) + Cramér's V (4.5.1)
Ordinal - Ordinal          Clustered bar chart (4.6) or
                           Band diagram (4.7)
Ratio - Ratio              Regression and correlation (4.8)
                           Scatter plot (4.8.1)
                           Coefficient of correlation (4.8.3)
Table 4.2
In this chapter, research questions with respect to two variables are analyzed on a
descriptive level by means of statistics, a table or a chart. Later we will perform
statistical tests to see whether the results are significant, meaning valid for the
population as a whole or just (lucky or unlucky) coincidence. The theory can be found
in Berenson's Basic Business Statistics.
4.2 Comparing Groups with a Simple Bar Chart

1. Start SPSS and open the data file Suxes Survey.sav.
Save all output from this chapter in a new output file, Suxes Chapter 4.spv.
We will start to analyze research question (11): Is there a difference between students
and lecturers in the amount spent? The first step is creating a bar chart displaying
the mean expenditures. This analysis is not valid if the grouping variable (here
Customer_type) is dependent on the ratio level variable.
In SPSS, creating a bar chart displaying mean values is a special option in the dialog
Define Simple Bar.
2. From the menus, choose: Graphs > Chart Builder. In the Gallery from the
category Bar choose the variant Simple Bar.
3. Move the variable Customer_type into the box representing the horizontal axis
and the variable Expenditure to the vertical axis.

4. On the tab Titles/Footnotes, select the options Title1 and Footnote 1. Directly after
ticking the checkbox, in the Element Properties dialog, add Expenditures in Suxes
Restaurant at Pandion as a title and a footnote with the current date and your
name and class directly after the copyright sign (©).
5. Click OK to create the chart. Edit the chart into the layout of Figure 4.1.
Note: Scales on the axes, title of the axes, gridlines, layout of the chart title,
footnote and so on.
Figure 4.1 A bar chart displaying the mean expenditure of students and lecturers
From the chart it is clear that there is a difference in mean expenditure between
students and lecturers. For students the mean expenditure is 7 euro per week and for
lecturers the mean value is almost 12 euro per week. It seems that there is a
relationship between customer_type and expenditure. (An explanation and discussion
of the implications can be given in the Section conclusions and recommendations at
the end of your report).
4.3 Comparing Groups with a Boxplot

In the previous section we only took the mean values of both groups into account. That
is rather limited, because maybe there are only two lecturers who dine very
extensively and the other lecturers only buy a cup of soup and a donut. So, in order to
make a better comparison you have to take the spread (the mutual differences within
each group) into account as well. One way to do that is making a boxplot for both
groups in one graph. As indicated in the diagram of Section 4.1 this is the second way
to compare groups if the level of measurement of the dependent variable is ratio or
interval.
1. From the menus, choose: Graphs > Chart Builder. In this dialog, we will use the
Gallery category Boxplot to select the Simple Boxplot.
If you do not use the Reset button, you will notice that SPSS leaves the variables of
the previous operation in the screen of the Chart Builder.
2. At the tab page Basic Elements, there is the button Transpose which will rotate
your boxplot a quarter of a turn.

3. Add a title and a footer and create the boxplot.
4. Again the graph needs some major editing. Make your graph like Figure 4.2.
(See Section 3.4.7 how to edit a boxplot.)
Figure 4.2 The boxplots, prepared for publication
4.4 Creating Subgroups and Making a Comparison

After analyzing the two graphs in the previous sections, we feel the need to compare
the statistics of both groups. We have already introduced two SPSS commands,
FREQUENCIES and DESCRIPTIVES by which statistics can be computed. By using SPLIT
FILE we can do this separately for each group. With SELECT CASES the analysis is done
for the selected group only. A third way is using COMPARE MEANS, because this
command is specialized in comparing groups with respect to a number of variables.
Which command you will actually use in your own research is a matter of personal
preference.
4.4.1. Analysing Subgroups with SPLIT FILE

The command SPLIT FILE creates groups in the data file based on the values of a certain
variable, in our example Customer_type. After running the SPLIT FILE command, all
analyses are done for each group separately. We will use the FREQUENCIES command
without displaying the frequency table.
After your analysis, do not forget to undo the split by running SPLIT FILE with the
option Analyze all cases, do not create groups.
1. From the menus, choose: Data > Split File.

The first option disables SPLIT FILE. Now choose the second or third option.
2. Select the option Compare groups. With this option the output of the separate
groups is organized in one table. If you want to have separate output blocks, select
the option Organize output by groups.
3. Move the variable Customer_type to the Groups Based on text box.
4. Click OK.
Of course, there is no output because no analysis has been done. The only thing that
has been done is a change of a setting that makes SPSS run each command
as often as there are subgroups.
5. From the menus, choose: Analyze > Descriptive Statistics > Frequencies.

Select the variable Expenditure and ask for the statistics Mean, Median, Quartiles
and Std. deviation.
6. Do not display frequency tables, so uncheck that option.
Unchecking this option prevents the

display of the frequency tables.
The output contains a table with the statistics for each group.

Figure 4.3 A table with the statistics for each group in a separate row.
Note (1) If split-file processing is in effect, the message Split by
will appear on the status bar at the bottom of the application
window (the SPLIT FILE indicator).
(2) After activating SPLIT FILE the data file is sorted to make the
groups. For restoring the original sorting we have created the
variable Respnum. That is why you must always have such a
variable.
(3) The SPLIT FILE status is not stored in the data file. It only
remains in effect for the rest of the session unless you turn it off. If
you start a new session you have to activate SPLIT FILE again.
After your analysis you must not forget to undo the split.
7. From the menus, choose: Data > Split File and select the option Analyze all cases,
do not create groups. This resets the split of the data file.
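What SPLIT FILE does conceptually, running the same analysis once per group, can be sketched outside SPSS. The Python lines below are only an illustration with made-up cases (not the Suxes data); the codes 1 = Student and 2 = Lecturer follow the coding of Customer_type, and the standard deviation is the sample standard deviation.

```python
import statistics
from collections import defaultdict

# Hypothetical cases: (Respnum, Customer_type, Expenditure); 1 = Student, 2 = Lecturer.
cases = [
    (1, 1, 6.0), (2, 2, 12.5), (3, 1, 7.5), (4, 1, 5.0),
    (5, 2, 9.0), (6, 1, 8.5), (7, 2, 14.0), (8, 1, 7.0),
]

# "Split" the file: collect the expenditures per value of the grouping variable.
groups = defaultdict(list)
for respnum, customer_type, expenditure in cases:
    groups[customer_type].append(expenditure)

# Run the same analysis once for every group.
for customer_type in sorted(groups):
    values = groups[customer_type]
    print(
        customer_type,
        len(values),
        round(statistics.mean(values), 2),
        statistics.median(values),
        round(statistics.stdev(values), 2),
    )
```

The original list of cases is left untouched here, so there is nothing to undo afterwards; in SPSS the split remains active until you switch it off as described in step 7.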
4.4.2. Analysing a Subgroup with SELECT CASES

You can restrict your analysis to a specific subgroup based on criteria that include
variables and complex expressions. The criteria used to define a subgroup can include:
Variable values and ranges, date and time ranges, case numbers, arithmetic and logical
expressions and functions. The important difference with SPLIT FILE is that the analysis
is only executed once, for the selected group of cases. This is useful when you only
want to analyse the respondents older than 40 years or want to focus on the students
in our data file.

In this section we will create a histogram for the student respondents. So we have to
make the selection first, and after that, we can create the histogram. Before you can
proceed with another analysis you must deactivate the selection by turning the
filtering off.
1. From the menus, choose: Data > Select Cases.
Click this button to enter the selection condition.
With this option unselected cases remain in the data file.
Note If you want to delete the unselected cases you choose the
corresponding option in the panel Output. However, watch out,
after saving your data file, those deleted respondents will disappear
for ever. So, be careful with this option.
2. Select the option If condition is satisfied and click the button If .

3. From the list, select the variable to be used in the selection process and click the
arrow button. This brings the variable (Customer_type) in the text box. Now, it is
important to realize that SPSS knows that there are two types of customers, students
and lecturers, but SPSS uses the codes to identify them. So you have to remember
that 1 = Student and 2 = Lecturer. You can either type or use the calculator pad to
complete the selection line (=1).
4. Click Continue and OK to activate the selection.
In the Data Editor (Data View) unselected cases are marked with a diagonal line
through the row number. Moreover in the status bar the message Filter On is
displayed. The selection procedure generates a new variable named filter_$ with a
value 1 for selected cases and a value 0 for unselected cases. The actual selection is
based on the values of a newly created variable. SPSS uses this variable to work with the
selection. Only after deactivating the filter are you free to delete this variable.

Unselected cases are not
included in the analysis.
Please note the indicator

in the status bar.
Note You can also make selections based on conditions involving two or
more variables. The &-sign can be used for the AND-operator, the |-
sign as OR-operator and the ~-sign as NOT. Here are two examples.
(1) All male students:
Customer_type = 1 & Gender = 2
(because Customer_type has 1 = Student and Gender has 2 =
Male).
(2) All respondents in the restaurant buying a cheese or ham
sandwich, another sandwich, a donut, croissant or baguette, or
dairy products (see question 6).
product1 = 1 | product2 = 1 | product3 = 1 | product4 = 1
(because product1 up to product4 have 1 = Yes and 0 = No).
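The logic of the two example conditions, and of the filter variable with the values 1 and 0, can be sketched outside SPSS as follows. The four cases are made up, the Gender code 1 = Female is an assumption (the text only gives 2 = Male), and the names male_student and buys_any_product are hypothetical.

```python
# Hypothetical cases. Codes: Customer_type 1 = Student, 2 = Lecturer;
# Gender 2 = Male (1 = Female is assumed); product1..product4: 1 = Yes, 0 = No.
cases = [
    {"Respnum": 1, "Customer_type": 1, "Gender": 2,
     "product1": 0, "product2": 1, "product3": 0, "product4": 0},
    {"Respnum": 2, "Customer_type": 1, "Gender": 1,
     "product1": 0, "product2": 0, "product3": 0, "product4": 0},
    {"Respnum": 3, "Customer_type": 2, "Gender": 2,
     "product1": 1, "product2": 0, "product3": 0, "product4": 0},
    {"Respnum": 4, "Customer_type": 1, "Gender": 2,
     "product1": 0, "product2": 0, "product3": 0, "product4": 0},
]


def male_student(case):
    # Customer_type = 1 & Gender = 2
    return case["Customer_type"] == 1 and case["Gender"] == 2


def buys_any_product(case):
    # product1 = 1 | product2 = 1 | product3 = 1 | product4 = 1
    return any(case[f"product{i}"] == 1 for i in range(1, 5))


# Like the filter variable: 1 for selected cases, 0 for unselected ones.
# No case is deleted; unselected cases are only skipped in the analysis.
for case in cases:
    case["filter"] = 1 if male_student(case) else 0

selected = [c["Respnum"] for c in cases if c["filter"] == 1]
buyers = [c["Respnum"] for c in cases if buys_any_product(c)]
print(selected, buyers)  # [1, 4] [1, 3]
```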
With our active selection we can start the analysis.
1. To create a histogram, in the chart builder, select the option Simple Histogram.
2. Move the variable Expenditure into the X-Axis box and enter an appropriate title
and footnote. That is important because the graph contains no information about
the selection on which it is based. So, enter Student Expenditures as a title and a
footnote containing the current date with your name and class directly after the
copyright sign (©).

3. In order to get percentages in our histogram, select Bar1 on the dialog Element
Properties and change the statistic into Histogram Percent.
Click Apply to confirm your action.
4. Click OK and SPSS will create the next chart for you.
As said before, the chart needs to be customized before it can be published.
5. Customize your histogram to the same layout as Figure 4.4.

See Section 3.4.9 how to edit a histogram.

Figure 4.4 The student expenditures shown in a histogram
For the next analysis you need to deactivate the filter.
6. From the menus, choose: Data > Select Cases and select All Cases.
After clicking OK you see that all cases are available again.
4.4.3. Analysing Subgroups with MEANS

The statistical procedure MEANS is meant to compare groups with respect to their
means (what is in a word?). The level of measurement of the dependent variable needs
to be ratio or interval (Scale). Since you enter the variable which defines the groups in
the dialog of this procedure, there is no need to split the data file (see Section 4.4.1) or
to make a selection (see Section 4.4.2). You can even do the analysis of the subgroups
for more variables simultaneously.
1. From the menus, choose: Analyze > Compare Means > Means.
2. Move the variable to be analyzed, Expenditure, to the Dependent List and the
variable which defines the groups, Customer_type, to the Independent List.
3. Click Options to select the statistics you want to be computed.

4. Add Median to the Cell Statistics list and continue.
5. Run this command.
The result is the table shown below. (Note: if you only get the row Student, you
have probably forgotten to deactivate the selection of the previous section; see
step 6 at the end of Section 4.4.2.)
6. Customize the table into the layout shown in Figure 4.5.
The result of some editing is:
Figure 4.5 Table with statistics
4.5 Crosstabs
One other way to compare groups is to do a cross tabulation analysis (called CROSSTABS
in SPSS) with percentages. The percentages can add up across the rows or down the
columns. Usually the percentages are calculated for the levels or subgroups defined by
the independent variable. The important difference from the previously discussed analyses
is that the variables must be classified (categorical); a nominal level of measurement
is sufficient for both. Moreover it is not important (for the CROSSTABS
procedure) which variable is independent and which one is dependent. However, you
must know this for your conclusion, of course.
The cross tabulation is the basic technique for examining the relationship between two
categorical (nominal or ordinal) variables. The Crosstabs procedure offers tests of
independence and measures of association and agreement for nominal and ordinal
data. The purpose of a cross tabulation is to show the relationship (or lack thereof)
between two variables.
We are going to examine the sample to see whether there are differences between male
and female respondents (variable Gender) with respect to the rating of the variety of
products in the basic assortment (Variety_basic). This is a part of research question
(9). Gender is the variable defining the groups or independent (demographical)
variable and the rating might be dependent (Variety_basic).
4.5.1. Strength and Direction of Association

The strength of a relationship, or association, in a sample can be expressed by the
statistic Cramér's V. This measure of association is based on chi-square and computed
by SPSS (see Section 4.5.4). The outcome is a number between 0 and 1 and the table
shows how to come to a conclusion.
The strength of a relationship between two variables in a crosstab (in a sample)
can be expressed by the measure Cramér's V.
V=0 no association
V 0,10 a weak association
V 0,25 a rather strong association
V 0,50 strong association
V 0,75 very strong association
V=1 maximal association
By calculating the percentages in the correct direction (either within columns or within
rows) you can formulate a conclusion about the direction of association. Generally, if you are
analyzing a crosstab with a behaviour variable and a demographic variable, it is
preferred to calculate percentages within each category of the demographic variable.
Summary Association of Two Variables

The analysis always takes three steps:
1. Construct a cross tabulation with the correct percentages. For a comparison in
a horizontal direction you need column percentages and vice versa.
Note: In a crosstab only one type of percentage is allowed.
2. Compute Cramér's V and explain its value (no association, a weak association,
strong association, ...).
3. Give a conclusion based on Cramér's V and the differences between the
percentages. If there is a very weak association, then there are hardly any
differences between the percentages, and, on the other hand, a strong
association implies major differences between the percentages, indicating
differences between the groups. It is important to describe those differences.
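For readers who want to see how the measure is built up, Cramér's V can be computed by hand from the observed counts as V = sqrt(chi-square / (n * (k - 1))), where n is the number of cases and k is the smaller of the number of rows and the number of columns. The Python sketch below illustrates this outside SPSS; the counts are made up (not the Suxes results) and the function name cramers_v is hypothetical.

```python
import math

# Hypothetical counts: rows = rating categories, columns = Gender (not survey results).
observed = [
    [5, 9],
    [12, 10],
    [6, 8],
]


def cramers_v(table):
    """Cramér's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count if there is no association
            chi2 += (count - expected) ** 2 / expected
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))


print(round(cramers_v(observed), 3))
print(cramers_v([[5, 5], [5, 5]]))    # identical percentages in both groups: 0.0
print(cramers_v([[10, 0], [0, 10]]))  # groups do not overlap at all: 1.0
```

The last two tables show the extremes of the scale: identical percentages in both groups give V = 0 and completely separated groups give V = 1.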
4.5.2. The CROSSTABS Procedure

In SPSS we have the CROSSTABS procedure to construct cross tabulations or
contingency tables.
1. From the menus, choose: Analyze > Descriptive Statistics > Crosstabs.
2. Move the two variables to the Row(s) and Column(s) text boxes, respectively.
3. Start the CROSSTABS procedure by clicking OK and switch to the Output Viewer.
The first output block is a summary with the number of processed cases. Use it to
check how many observations are included in the table; it is not meant to be
inserted into reports.
The default crosstab is shown below:
The cells of the table show the count or number of cases for each joint combination of
values. For example, 5 female respondents rate the variety of products in the basic assortment as
insufficient. It is often difficult to analyze a cross tabulation by looking only at the
counts in each cell. The next sections discuss how to insert percentages in
the cells and how to calculate statistics. After that, we make the crosstab again with these
adjustments.
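What the default crosstab contains can be sketched outside SPSS as well. The following Python fragment counts the cases for each joint combination of two variables; the six respondents and the names used are made up for the illustration and are not taken from the Suxes data file.

```python
from collections import Counter

# Hypothetical raw data: one (gender, rating) pair per respondent.
responses = [
    ("Female", "Good"), ("Male", "Poor"), ("Female", "Sufficient"),
    ("Male", "Good"), ("Female", "Good"), ("Male", "Sufficient"),
]


def crosstab_counts(pairs):
    """Count the cases for each joint combination of the two values."""
    return Counter(pairs)


counts = crosstab_counts(responses)
genders = sorted({gender for gender, _ in responses})
ratings = sorted({rating for _, rating in responses})

# Print the ratings in the rows and the genders in the columns.
print("Rating".ljust(12) + "".join(g.rjust(8) for g in genders))
for rating in ratings:
    cells = "".join(str(counts[(g, rating)]).rjust(8) for g in genders)
    print(rating.ljust(12) + cells)
```

A combination that does not occur in the data simply gets a count of 0, just like an empty cell in the SPSS table.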
4.5.3. Cell Display

In the dialog CROSSTABS the button Cells opens a dialog in which you can select the cell
contents. (In the figure below we left out the Noninteger Weights panel. This
panel will not be used in this course.)
The dialog shows you the available options: Counts, Percentages and Residuals. The
last option is not important at this moment.
Counts
- Observed frequency: the count or number of cases (this option must
  always be selected).
- Expected frequency: the expected count, or theoretical frequency, if
  there is no relationship between the two variables (statistically
  independent).
Percentages
- Row: the percentages add up across the rows (horizontally).
- Column: the percentages add up down the columns (vertically).
- Total: percentages based on the total number of observations.
When you include percentages in a crosstab, choose only one of the three options, usually
Row or Column. Besides the observed counts, which must always be selected, only
one other entry can be included, because with more than two entries in a cell the table
becomes too large and too hard to interpret.
Note You must make a choice about the cell contents. This choice is
limited to two entries at most, because otherwise the table
becomes too large and too hard to interpret. Usually you choose
between the following options:
- observed counts and row or column percentages;
- observed and expected counts.
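The two usual cell contents can be made concrete with a small Python sketch. It computes column percentages (each column adds up to 100) and the expected counts under independence (row total × column total / n). The table of observed counts and the function names are hypothetical.

```python
def column_percentages(table):
    """Percentages within each column; every column adds up to 100."""
    col_totals = [sum(col) for col in zip(*table)]
    return [
        [100 * cell / col_totals[j] for j, cell in enumerate(row)]
        for row in table
    ]


def expected_counts(table):
    """Expected counts if the row and column variables were independent."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical observed counts: rows = rating, columns = gender.
observed = [[5, 15], [15, 5]]

for row in column_percentages(observed):
    print([round(p) for p in row])
for row in expected_counts(observed):
    print(row)
```

Comparing the observed counts with the expected counts shows how far the table is from independence; comparing the column percentages shows in which direction the groups differ.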
4.5.4. Calculating Statistics

A number of statistics are available to determine the relationship between two cross
tabulated variables. In the dialog CROSSTABS, the button Statistics opens a dialog with
the available statistics.

The well-known Chi-square test.
Cramér's V is known from Section 4.5.1.
We will use Cramér's V to analyze the strength of the relationship in our response
group. If the relationship is significant, use this statistic to indicate the strength of the
association in the sample.
In Section 6.4 we will discuss the Chi-square cross tabulation test and we will explain
how to formulate the corresponding hypotheses, interpret the level of significance and
come to a correct conclusion.
4.5.5. SPSS Output from CROSSTABS

Now we are going to make a cross tabulation that also includes percentages and Cramér's V.
1. From the menus, choose: Analyze > Descriptive Statistics > Crosstabs.
This button gives you the available statistics, such as Cramér's V.
This button gives a dialog to define the cell contents.
Note Strictly speaking you are free to place a variable in the rows or
columns of a crosstab. Our advice is to place the independent
variable in the columns and to use column percentages.
2. Ask for percentages based on the subgroups of the demographic variable Gender
(in the columns). Use the button Cells and select the option Column. For details see
Section 4.5.3.
3. Click the button Statistics to calculate Cramér's V. For details see Section 4.5.4.
The result is:
4. Customize the table into the layout shown in Figure 4.6.
Right-click and choose Show Dimension Label to make the corner text visible.
Figure 4.6 The crosstab with a customized layout
The output of the statistics:
This output block is never included in the report. You just write in the text that
Cramér's V was computed, that it equals 0,279, and what that value means.
Of course, you save this output block in the SPSS output file and it can be put in an
appendix of your report, if needed.
4.5.6. Formulating Conclusions

1. We constructed a contingency table with column percentages in order to compare
men and women. This table can be found in the previous section.
2. The value of Cramér's V is 0,279, which means that we have a rather strong
relationship. So there are some differences between men and women.
3. Inspecting the differences between men and women, we see that women are more
satisfied with the variety of the products in the basic assortment. As much as 15%
of the male respondents think the variety is poor.
4.5.7. Pivoting Rows and Columns
The results from most statistical procedures are displayed in pivot tables. The
default tables produced may not display information as neatly or as clearly as you
would like. With pivot tables, you can transpose rows and columns (flip the table),
adjust the order of data in a table, and modify the table in many other ways. For
example, you can change a short, wide table into a long, thin one by transposing rows
and columns. Changing the layout of a table does not affect the results. Instead, it is
a way to display your information in a different or more desirable manner.
1. Select the contingency table and double-click to enter the edit mode. The edit mode
is characterized by the notched edge around the table.
2. If the toolbar is not visible, from the menus choose: View > Toolbar.
The button to invoke the Pivoting Trays.
3. Click the third button of the toolbar to open the Pivoting Trays window.
Pivoting trays provide a way to move data between columns, rows and layers. Click one
of the pivot icons to see what it represents. The shaded area in the table indicates what
will be moved when you move the pivot icon. A pop-up label also indicates what the
icon represents in the table.
The Column tray.
4. Drag the Statistics pivot icon from the Rows dimension to the bottom of the
Column dimension.
The table is immediately reconfigured to reflect your changes.
Figure 4.7 The cross tabulation with counts and percentages on the same row
5. Edit the table into the layout of Figure 4.7.

4.6 Creating and Editing a Clustered Bar Chart
A clear picture is obtained by a clustered bar chart if it displays the same
percentages as the cross tabulation.
1. Use Graphs > Chart Builder and select the option Clustered Bar.
2. Select the variable which defines the categories and the variable which defines the
clusters (or subgroups).
Here comes the variable which defines the subgroups.
3. We want to display percentages within the categories of the legend variable Gender.
In the dialog Element Properties you can change the Statistic into Percentage()
and via the button Set Parameters you can set the Denominator for Computing
Percentage. As said before, we want the Legend Variable to be used.
4. Use the following text as a Title and Footer:

Title 1: Variety of basic assortment
Title 2: (Rated by females and males)
Footnote 1: current date, your name and class
5. Click Continue and OK in the main dialog.

6. Double-click on the chart to activate the Chart Editor.
7. Remember how to edit a chart: First click the element of the graph, make the
changes on the tabs in the dialog Properties and click Apply to confirm and see the
results.
You need to change the following elements of the graph:
- height and width of the graph;
- footnote moved to the lower left, in a smaller font;
- percentages inside the bars and a change of colour and pattern;
- customizing and justifying the labels of the axes;
- adding gridlines and changing their line style into dashed and grey.
Figure 4.8 shows the result of the customizing.
Figure 4.8 A clustered bar chart to compare men and women.

Note It is important to check whether the clustered bar chart displays
the same percentages as the corresponding crosstabulation. In our
example we see that the bar chart in Figure 4.8 displays the same
percentages as the table in Figure 4.6 and Figure 4.7.
4.7 Creating and Editing a Band Diagram

Another graph which displays the percentages is a band diagram. This is actually a
stacked bar chart in which every bar adds up to 100%. By means of the CHART BUILDER
we can construct this chart.
1. From the menus, choose: Graphs > Chart Builder and select the option Stacked
Bar.
2. Move the variables, Variety_basic and Gender, to the corresponding boxes.
On the tab Basic Elements, you can use the Transpose option to rotate your chart a quarter
of a turn.
3. Do not forget to enter the titles:

Title 1: Variety of basic assortment
Title 2: (Rated by females and males)
Footnote 1: current date, your name and class.
4. In the dialog Element Properties you can set the Statistic to Percentage. Please
note that the denominator for computing percentages now is the Total for Each X-
Axis Category.
5. Click Continue, then Apply, and create the graph by clicking OK.
The result is a graph which is not ready for publication yet, but we will work on that.
6. Double-click the graph to activate the Chart Editor and transform your layout into
that of Figure 4.9.
We will help you with some remarks to edit your chart.
7. To arrange the categories in the order Enough downwards to Poor, select the bars,
use the tab Categories on the Properties dialog, and change the Direction into
Descending. Confirm your choice with Apply.
Select the option Descending to get the categories in the reverse order.
Uncheck this option to hide the axis title.
8. In order to hide the text Gender, being the axis title, select the X-axis and uncheck
the option Display axis title on the Labels & Ticks tab of the Properties dialog.
Remember, since we have rotated (transposed) the chart, the X-axis now is in the
vertical direction.
9. You can use the button Hide Legend to hide the legend.

10. Select the bars and show the labels by clicking the button Show data labels. First,
adjust the decimals to zero, and after that, add the value labels. Note that after
adding text to the labels the Number Format has disappeared.
To display the value labels of a variable you need to put its icon in
the box Displayed. The order in this box corresponds to the order in the
chart. Do not forget to confirm by clicking Apply.
11. Select the horizontal axis (the Y-axis, since we transposed our graph) and (if
necessary) adjust the scaling to 100% as the maximum and the Major Increment to
10.
12. Add gridlines and use a dashed line in grey.
13. After closing the editor you can decrease the height of the picture in the viewer to
make it a little more sophisticated.
Figure 4.9 shows the final result of our editing.
Figure 4.9 The band diagram, ready for publication
Note Please note that Figure 4.9 has the same percentages as the
crosstabulation of Figure 4.7.

4.8 Regression and Correlation
If both variables are measured at a ratio level (scale), then we can use a scatter plot to
analyse the relationship. A scatter diagram is used to graphically display bivariate
numerical data. The strength of a relationship, or the association, between two
variables is typically measured by the coefficient of correlation, whose values range
from -1 for a perfect negative correlation up to +1 for a perfect positive correlation (see
Berenson, Chapters 3 and 13). The coefficient of correlation measures the degree of
linear association between two variables. The straight line that best describes this
association is called the regression line.
4.8.1. Making a Scatter Plot

In our Suxes Survey we expect a relationship between the number of visits to the
restaurant and the expenditure (both on a weekly basis). The first step of the analysis
is making a scatter plot to see whether this is true. In the graph one variable
(Expenditure) is put on the vertical axis, the other (Visits) on the horizontal axis, and the
observations are displayed as the points of the scatter. The shape and direction of this
scatter will give us an idea about a possible relationship. Moreover, it is possible to
include a third variable (like Gender) in the scatter and mark the points differently, for
example males with a red square and females with a green triangle.
1. From the menus, choose: Graphs > Chart Builder and in the category Scatter/Dot
select the option Simple Scatter.
2. Move the variable Expenditure into the Y-Axis box, Visits into the X-Axis box and the
respondent number Respnum into the Point ID Label box. This last box becomes
available by checking the checkbox Point ID label on the tab Groups/Point ID.
(Right now we will not use the facility to mark subgroups differently.)
Use this option to mark subgroups differently.
Figure 4.10 Preview of the scatterplot, with options to include
3. Add the text Relationship between Visits and Expenditure as a title to the plot and
do not forget to include your footnote.

In the plot you will see that some points coincide and that those markers look a little bolder,
with several respondent numbers printed close to the same marker. Without these
numbers you would see only 24 markers, although we have 50 cases in our data file.
However, we will hide the respondent numbers to get a clear and clean plot. The result
is displayed in Figure 4.11, after some edits.
4. Double-click the graph to activate the editor and take care of the following things.
- From the menus, choose Elements > Hide Data Labels or use the button on the
  toolbar to hide the respondent numbers.
- From the menus, choose Options > Show Grid Lines, without selecting the
  horizontal or the vertical axis, to get a grid in both directions.
- Adjust the line style of the grid to dashed.
- Add a euro sign (Alt+0128) to the numbers on the vertical axis (after selecting the
  Y-axis, on the tab Number Format, the box Leading Characters).
Figure 4.11 Scatter plot of Expenditure and Visits (customized)

An examination of the graph leads to the conclusion that people who pay more visits to
the restaurant have a higher level of expenditure. So it seems reasonable to analyse
this relationship.
4.8.2. Calculating the Regression Line

If you take Figure 4.11 into account and calculate the mean value of expenditure at 1, 2,
3 up to 5 visits a week, then you notice that those values are close to a rising line. Let
us call this line the central line for a moment. Due to several causes we cannot expect
all our observations to be perfectly on that line. However, on the basis of our data we
can calculate an estimation of the weekly expenditures in the restaurant on the basis of
the number of visits per week if we have a formula for that central line. The accuracy
of our estimation will depend on the spread of our observations with respect to that
central line. So, we are looking for a line which minimizes the distance between the
data points and that line. This optimal line can be used for calculating predictions. To
distinguish between the real data (Y) and our own predictions, we introduce the
symbol $\hat{Y}$ (Y hat) for the latter.
If there is a reason to assume that the relationship can be described by means of a
straight line, you can find its equation by means of a regression analysis. Whether or not
we have theoretical evidence, we will show you how the calculations are done
by SPSS.
The simple linear regression equation used to estimate the linear model reads:
$\hat{Y} = b_0 + b_1 X$
where
$b_0$ = the constant or intercept (sample Y intercept)
$b_1$ = the slope of the line (regression coefficient of X)
In this equation Y is the dependent variable, in our example Expenditure, and X the
independent variable, thus Visits. Note that we already made this choice in the previous
section by placing these variables on the Y-axis and the X-axis respectively.
1. From the menus, choose: Analyze > Regression > Linear.
2. Move the dependent variable (Expenditure) to the Dependent text box and the
independent variable (Visits) to the Independent(s) text box.

The constant or intercept
The slope or regression coefficient
This output block with the coefficients is important for the equation. The first line
contains the constant, the second line the slope. In our example the regression
equation reads:
$\hat{Y} = 1.005 + 2.236 X$.
If we use the names of the variables, it reads:
$\widehat{\text{Expenditure}} = 1.005 + 2.236 \times \text{Visits}$
According to this equation, we can calculate (predict) the expenditure of a person who
pays two visits a week to the restaurant:
$\widehat{\text{Expenditure}} = 1.005 + 2.236 \times 2 = 5.48$ euro
Because this outcome is not the real value but our estimation, we use a hat above the
name of the variable in the equation. As you can observe in Figure 4.11 the vertical line
at two visits a week contains observations above and below the amount of 5.48. The
prediction must be understood as the average spending of someone who pays two visits a
week to the restaurant. The accuracy of this estimation will be discussed by means of
the coefficient of determination, our subject for the next section.
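How the constant and the slope follow from the data can be sketched in Python. The fragment assumes the standard least-squares formulas (slope = Sxy / Sxx, intercept = mean of Y minus slope × mean of X); the function names and the small data set are hypothetical and only serve to show the mechanics. The last lines reuse the coefficients from the SPSS output above to reproduce the prediction for two visits.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Predicted value (Y hat) for a given value of X."""
    return intercept + slope * x


# Hypothetical data (visits per week, expenditure in euro), not the Suxes data.
visits = [1, 2, 3, 4, 5]
expenditure = [3.0, 5.5, 7.0, 10.5, 12.0]
b0, b1 = fit_line(visits, expenditure)
print(round(b0, 3), round(b1, 3))

# Prediction with the coefficients reported in the text: two visits a week.
print(round(predict(1.005, 2.236, 2), 2))
```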
4.8.3. Calculating the Coefficients of Correlation and Determination

The output block Model Summary shows us the coefficient of correlation R and the
coefficient of determination $R^2$ (R square). These coefficients can tell us something
about the quality of the regression line, in other words: the fit of the line to the scatter.
The coefficient of determination can be interpreted as a percentage.
The value of the coefficient of correlation R is always between -1 and 1. If the
coefficient of correlation equals 0, then there is no linear relationship: either there is
no association, or the relationship is not linear but has another functional form (for
example quadratic). Although a discussion of non-linear regression models is beyond
the scope of this book, you must be aware of this possibility, always inspect
the scatter closely, and decide whether a linear regression is reasonable.
If the coefficient of correlation is negative, there is a negative association (descending
line) and the regression coefficient (slope) will be negative. If the coefficient of correlation
is positive, the regression line will rise. In our example: more visits a week will lead to a
higher level of expenditure. So, remember that the regression coefficient and the
coefficient of correlation always have the same sign: either both are positive, or both
are negative, or both are zero.
The closer the coefficient of correlation is to -1 or 1, the better the quality of the linear
relationship is, and the more accurate our prediction will be. In our example we have a
coefficient of correlation equal to 0.725, which indicates a positive association.

We are using the regression line to calculate predictions. In our example we want to
estimate the expenditure on the basis of the number of visits to the restaurant. Of
course, expenditures will depend on other factors as well, for example: are we dealing
with a student or a lecturer. We wonder how much of the variation in the expenditures
is related to the number of visits, and how much is left unexplained. The proportion
of variation of Y that is explained by the independent variable X in the regression
model, is known as the coefficient of determination. This coefficient is just the square
of the coefficient of correlation.
To conclude, the closer the points are to the regression line, the better the regression
model can be used for predictions. The measure for this quality concept is the
coefficient of correlation, which measures the strength of the relationship. The square
of the coefficient of correlation is the coefficient of determination, also known as the
percentage of variation explained by the model.
$R^2$ is the percentage of variation explained
$1 - R^2$ is the percentage of variation unexplained
The coefficient of determination $R^2$ is always between 0 and 1. In our example R equals
0.725, so $R^2$ equals 0.526. Therefore, 52.6% of the variation in expenditure can be
explained by the variability in the number of visits per week.
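The step from R to R square, and the claim that R has the same sign as the slope, can be checked with a few lines of Python. The sketch assumes the usual Pearson formula r = Sxy / sqrt(Sxx × Syy); the function name and the tiny data sets are hypothetical.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson coefficient of correlation between two lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# A perfectly rising and a perfectly descending line (hypothetical points).
print(correlation([1, 2, 3], [3, 5, 7]))
print(correlation([1, 2, 3], [7, 5, 3]))

# The value reported in the text: R = 0.725, so R square is its square.
r = 0.725
print(round(r ** 2, 3), round(1 - r ** 2, 3))
```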
4.8.4. A Regression Model without an Intercept

In our example it is hard to give a meaningful interpretation to the intercept (the
constant in the regression model). If somebody pays 0 visits a week to the restaurant we
would expect the expenditure to be 0 euro. Later we will discuss how to test statistically
whether or not this constant makes a significant contribution to the model.
It is, however, possible to do the regression analysis without the constant in the equation.
This can be found as an option in the Linear Regression dialog. The equation reads:
$\hat{Y} = bX$
In our example we would find:
$\hat{Y} = 2.5 X$
This is easier to understand: each extra visit to the restaurant increases the
expenditure by 2.50 euro. Or, the average expenditure in the restaurant is 2.50 euro
per visit.

Although we might be glad that the coefficient of determination $R^2$ (R square)
increased to 0.886, we must be careful, because you cannot compare this outcome with
that of a model which includes the intercept (the previous section).
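Why the two values of R square cannot be compared can be seen in a small sketch. It assumes the usual least-squares result for a line through the origin, b = Σxy / Σx², and it assumes that R square for such a model is measured as variation about zero instead of variation about the mean of Y. The data and the function names are hypothetical.

```python
def slope_through_origin(xs, ys):
    """Least-squares slope b of the line Y hat = b * X (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def r_square_about_origin(xs, ys):
    """Share of the variation of Y about zero that the line explains."""
    b = slope_through_origin(xs, ys)
    unexplained = sum((y - b * x) ** 2 for x, y in zip(xs, ys))
    total_about_zero = sum(y * y for y in ys)
    return 1 - unexplained / total_about_zero


# Hypothetical data (visits per week, expenditure in euro), not the Suxes data.
visits = [1, 2, 3, 4, 5]
expenditure = [3.0, 5.5, 7.0, 10.5, 12.0]
print(round(slope_through_origin(visits, expenditure), 3))
print(round(r_square_about_origin(visits, expenditure), 3))
```

Because the total variation is taken about zero, it is larger than the variation about the mean, so the explained share tends to look higher even when the line does not fit the points any better.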
4.8.5. Drawing the Regression Line in the Scatter Plot

Take the scatter plot of Section 4.8.1, copy and paste this after your last entry in the
output viewer. We want to add the regression line in a copy of the plot.
1. Activate the chart by double clicking.
2. In the chart editor, from the menus, choose Elements > Fit Line at Total. This
option is also available in the context menu (right mouse button).
Figure 4.12 The regression line added to the scatter plot
3. You can adjust the line style and enlarge the font size of the determination
coefficient.

This chapter has answered research questions with respect to two variables. It is
important to distinguish between research questions about comparing groups and
questions about a relationship between variables. We will give an answer to the
following research questions.
Research Questions (Field Research)


spent?
Research Question 9
The customer satisfaction of the restaurant is measured by questions 3, 4 and 5 of the
questionnaire, the variables Variety_basic, Variety_luxe and Staff. Section 4.5
discussed how to make a cross tabulation. We are going to make three cross tabulations
with Gender and will also calculate Cramér's V.
Cross tabulation                                     Cramér's V
Variety basic assortment × gender                    0,279
Variety luxury assortment × gender                   0,252
Satisfaction customer service level × gender         0,205
Note: this table was made in Word on the basis of the three statistics outputs (Symmetric Measures) of SPSS.
The values of Cramér's V in the three cross tabulations show that there is a rather
strong association between customer satisfaction and gender in the sample. That
means that in our sample there are differences between male and female respondents.
On the whole we see that women are relatively more satisfied with the variety, but
that the male respondents are relatively more satisfied with the customer service level
of the staff. This can be illustrated with a clustered bar chart (see Section 4.6) or a
band diagram (see Section 4.7).

Research Question 10
This research question can be answered in the same way as research question 9. Again
we construct cross tabulations and compute Cramér's V for each table. We will not
display the cross tabulations here, but represent them by means of a clustered bar
chart (see Section 4.6) or a band diagram (see Section 4.7).
Cross tabulation                                          Cramér's V
Variety basic assortment × customer_type                  0,195
Variety luxury assortment × customer_type                 0,230
Satisfaction customer service level × customer_type       0,167
The low values of Cramér's V indicate that the differences between students and
lecturers are rather small. The (small) differences in the sample can be seen in the
graphs. It is remarkable that 10% of the students rate the service level of the staff as
poor. However, you must be careful in formulating a conclusion. The sample is rather
small (only 40 students and 10 lecturers) for drawing conclusions which are statistically
valid. In Section 6.4 we shall discuss how to apply the chi-square cross tabulation test
and see that the differences are not significant.

When you compare students and lecturers with respect to their expenditures you must
realize that the dependent variable has a ratio level of measurement. The diagram of
Section 4.1 indicates that you can use a boxplot to compare the two groups, or you can
use the SPSS procedure MEANS. This leads to the following results:

(See Section 4.3 how to create this boxplot.)
From the graph and the table it becomes clear that the students' expenditures are on
average much lower. Students spend (on average) 7 euro, whereas lecturers
spend almost 12 euro. It is striking that the spread (standard deviation) in the group of
lecturers is much higher than that of the students.
There hardly seems to be any difference between the mean marks of both groups,
although there are some students who are very negative and relatively more lecturers
giving a mark of 9. That explains why the mean mark of the lecturers (7.2) is a little
higher than that of the students (6.6).

Both variables in this research question have a ratio level of measurement. So we can
make a scatter diagram to see whether a linear regression model makes sense.
In the scatter plot there is a relationship between Expenditure and Mark. Moreover,
the scatter seems to be around a straight line. So, we can try and see what a linear
model brings.
Coefficients(a)
                                            Unstandardized         Standardized
                                            Coefficients           Coefficients
Model                                       B        Std. Error    Beta           t        Sig.
1  (Constant)                               -3,868   2,547                        -1,518   ,136
   Mark for the catering service of Suxes   1,755    ,369          ,582           4,750    ,000
a. Dependent Variable: Expenditure in the restaurant on a weekly basis

Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       ,582(a)   ,339       ,324                3,673
a. Predictors: (Constant), Mark for the catering service of Suxes
Of course, this output must not be included in your research memorandum itself, but
in an appendix. We conclude that the regression equation reads:
$\hat{Y} = -3.868 + 1.755 X$.
With the names of the variables instead:
$\widehat{\text{Expenditure}} = -3.868 + 1.755 \times \text{Mark}$.
The relationship between the variables is not that strong. The coefficient of
determination is $R^2 = 0.339$, meaning that 34% of the variation in expenditure can be
explained by the variation in the marks. (Note: please do not forget to use the word
variation in your explanation.) It is important that you mention that this equation is
valid only for marks between 3 and 9, otherwise you get your money back.
Note In this example the value of $R^2$ is rather low. That implies that the
association between mark and expenditure is weak and therefore
the equation is not that useful. In this reader we will not discuss
statistical tests for the linear model.
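The remark about the valid range of marks can be illustrated with a short Python sketch. It uses the constant and slope from the Coefficients output above; the function name and the choice to refuse marks outside the range 3 to 9 are assumptions made for this illustration, not something SPSS does for you.

```python
INTERCEPT = -3.868  # constant from the Coefficients table
SLOPE = 1.755       # regression coefficient of Mark
MARK_MIN, MARK_MAX = 3, 9  # range of marks for which the equation is valid


def predict_expenditure(mark):
    """Predicted weekly expenditure (euro) for a mark within the valid range."""
    if not MARK_MIN <= mark <= MARK_MAX:
        raise ValueError("the equation is only valid for marks from 3 to 9")
    return INTERCEPT + SLOPE * mark


print(round(predict_expenditure(7), 2))

# Outside the range the equation gives nonsense: a negative expenditure.
print(round(INTERCEPT + SLOPE * 2, 3))
```

For a mark of 2 the raw equation yields a negative amount, which is the "money back" the text jokes about and the reason to state the valid range in a report.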
In the research questions we examined the differences between men and women and
the differences between students and lecturers. It is obvious that we want to see these
differences in the scatter plot. This can be done with markers for the groups.
We applied the use of a marker in the next figure by activating the option
Grouping/stacking variable (see Figure 4.10). In the scatter plot we will get different
markers for students and lecturers. We conclude that there are no large differences
between the two groups, but the group of lecturers is actually too small to
make firm statements.

5. Dealing with Multiple Response
In this chapter we will discuss how to analyse multiple response variables by means
of tables and graphs. The questionnaire of the Suxes Survey contained one question
for which more than one answer is allowed:

(more answers are allowed)
Other kind of sandwiches
Dairy products
Coffee or tea
Fruits
Soup
Salads
Dinner
In Section 1.2.1 we introduced 9 dichotomous variables to process the answers to this
question. Now we want to summarize those separate variables into one table or graph.
We will use the procedure CUSTOM TABLES to tabulate, because the user interface with
its preview of the table is very useful. The graphs will be made by means of the CHART
BUILDER.
5.1 Defining the Multiple Response Set

We start by combining the nine separate variables into one set, called a Multiple
Response Set. This results in a kind of super variable containing the counts of the
separate ones. If you save your data file, this set will also be saved within the data file.
So in future sessions the set is still available, which is very convenient.
1. If your output file of the previous chapter is still open, close this file by choosing
from the menus: File > Close.
We want to save the output of this chapter in a new output file.
2. From the menus, choose: Analyze > Tables > Multiple Response Sets.

Counts the value 1 for all
variables in the set.
Figure 5.1 Define Multiple Response Set dialog box
As described in the code book (see Section 1.3), we have introduced the codes 1 = Yes
and 0 = No. Because we want to know how many respondents bought a certain
product, we must count the values 1 (Counted Value).
3. Fill in the dialog Multiple Response Sets as shown in Figure 5.1.
SPSS reports that a multiple response set was made. The output viewer shows you a
table.
This table is just for your information and there is no need to publish it. It
documents the output file, so that you remember which variables are included in
the set. So keep it in your output file, but change the entry into 5.1 Multiple Response
Set and collapse this item.

4. Save your output file as Suxes Survey Output Chap 5 and save your data file as
well. You must realize that this file has changed, because a new multiple response
set has been created and added to the data file.
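What a multiple response set counts can be sketched in Python. The fragment counts the value 1 for each dichotomous variable and uses the number of respondents with at least one answer as the base for the percentages; that choice of base, the three products and the four respondents are assumptions of this sketch and not the Suxes data.

```python
# Hypothetical answers: one row per respondent, 1 = Yes (bought), 0 = No.
products = ["Coffee or tea", "Soup", "Salads"]
answers = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 0],  # this respondent bought none of the products
    [0, 1, 1],
]


def summarize_multiple_response(names, rows, counted_value=1):
    """Return the count per variable, the base, and the percentages of the base."""
    counts = [
        sum(1 for row in rows if row[j] == counted_value)
        for j in range(len(names))
    ]
    # Base: respondents who gave at least one counted answer.
    base = sum(1 for row in rows if counted_value in row)
    percentages = [100 * count / base for count in counts]
    return counts, base, percentages


counts, base, percentages = summarize_multiple_response(products, answers)
for name, count, pct in zip(products, counts, percentages):
    print(name.ljust(16), count, round(pct))
print("Base".ljust(16), base)
```

Because a respondent can tick several products, the percentages add up to more than 100.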
5.2 Creating a Simple Table with CUSTOM TABLES

There are several ways to create tables. In this chapter we choose to use the procedure
CUSTOM TABLES. The procedures FREQUENCIES and CROSSTABS from the menus
Analyze > Multiple Response are not recommended, because their output is not that
useful. Moreover, with respect to the user interface and the layout of the table, the
procedure CUSTOM TABLES excels all other procedures, although it might seem to be a
little more complicated than the CROSSTABS procedure we used in the previous chapter.
But the most important difference with the other procedures concerns the editing afterwards
to make the table suitable for publication: with CUSTOM TABLES you design the table
before actually creating it. So you will get tables which are (almost) ready for
publication. That is very welcome to research companies doing surveys on a daily or
weekly basis, because they do not have time for all that editing work. They can even
paste the syntax of the CUSTOM TABLES procedure to make batch jobs that run an analysis
almost automatically.
1. From the menus, choose: Analyze > Tables > Custom Tables.
The first time you use the CUSTOM TABLES dialog, SPSS advises you to define value labels
for all categorical variables and to set the measurement levels correctly. This is
because CUSTOM TABLES uses this information to build a preview of the table. We
already discussed the importance of value labels and measurement levels in Chapter 1.
If you do not want to see this dialog again, click this option.
2. Click OK to continue.

The Multiple Response Set $Product, containing the variables Product1 up to Product9.
In the Variables list, you can find the super variable $Product as the last entry. If you
select this set, the Categories textbox displays the variables contained in the multiple
response set.
3. Drag and drop the super variable $Product into the Rows area.
Later, we will discuss a number of extra facilities, but at this very moment we just want
to see the basic table.
4. Click OK and the next table is created by SPSS.
This table shows how many respondents checked each product, indicating that
they bought it recently. To make the table useful for a report, it needs
extra information: the total number of respondents buying products, and
percentages in addition to counts.
5. Reopen the CUSTOM TABLES dialog.

Note Instead of searching the menus for the entries to
open a recently used dialog, it is more convenient to use the Dialog
Recall button. This button is on the toolbar: in the Data Editor it is
the fourth one from the left, in the Output Viewer it is the sixth
one from the left. If you click the Dialog Recall button, a list of
the recently used procedures pops up. The top entry of this list is
the most recently used procedure.
Now we are going to add percentages and a total row.
6. If, in the CUSTOM TABLES dialog, the first column of the preview table is not selected,
select it with a mouse click. If it is selected, it gets a bright yellow background
and the button Summary Statistics is enabled.
Click this button and move the entry Column N % from the Statistics panel
into the Display list. Adjust the labels and change the decimals to 0.
Finish this dialog with a click on Apply to Selection to return to the main dialog.
We are not done yet. The next step is entering the total.
7. Click in the CUSTOM TABLES dialog on the other button: Categories and Totals. In
the lower part of this dialog you can click the option Total. You can adjust the
label if necessary. Click Apply to return to the main dialog again.

Click this option to insert a total.
Adjust the label if necessary
The last step is adding a title and a footer (Caption) to the table.
8. Select the second tab of the CUSTOM TABLES dialog called Titles.
Select the Titles tab.
A smart way to enter the current date.
9. Finally, run the procedure CUSTOM TABLES.

The total row at the bottom of the table shows that 47 respondents bought one or
more products, and that number is the base of the percentage calculations: 47 equals
100%. Of course, the percentages in the column do not add up to 100%, because more
than one answer is allowed. Our data file contains 50 respondents, so three of
them did not buy any product. It is important to realise that those three are excluded
from this table. However, it is possible to include those three respondents in the table
by defining an extra variable which registers No product bought.
10. Customize the table to the layout shown in Figure 5.2.
Use the option Show Dimension Label in the context menu (right mouse button).
Figure 5.2 Summary of products bought by visitors of the restaurant
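
The logic behind this table can also be sketched outside SPSS. The Python fragment below is only an illustration under our own assumptions: the four respondents and three product variables are made up, and the function name multiple_response_table is hypothetical. It shows why the percentage base is the number of respondents with at least one tick and why the percentages add up to more than 100%.

```python
# Hypothetical data: one dict per respondent, 1 = bought, 0 = not bought.
# Three made-up product variables stand in for Product1..Product9.
respondents = [
    {"Product1": 1, "Product2": 0, "Product3": 1},
    {"Product1": 0, "Product2": 0, "Product3": 0},  # bought nothing
    {"Product1": 1, "Product2": 1, "Product3": 0},
    {"Product1": 0, "Product2": 1, "Product3": 0},
]
products = ["Product1", "Product2", "Product3"]


def multiple_response_table(rows, variables):
    """Count the 'yes' answers per variable.

    The percentage base is the number of respondents who ticked at
    least one variable; respondents without any tick are left out.
    """
    valid = [r for r in rows if any(r[v] == 1 for v in variables)]
    base = len(valid)
    counts = {v: sum(r[v] == 1 for r in valid) for v in variables}
    percentages = {
        v: round(100 * counts[v] / base) if base else 0 for v in variables
    }
    return counts, base, percentages


counts, base, percentages = multiple_response_table(respondents, products)
for v in products:
    print(f"{v:<10} {counts[v]:>3} {percentages[v]:>4}%")
print(f"{'Total':<10} {base:>3}  100%")
```

In this made-up example the second respondent is dropped from the base, just as the three non-buyers are dropped from the SPSS table.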
5.3 Creating a Cross Tabulation with CUSTOM TABLES

Many research questions are about differences between subgroups with respect to the
products they buy. In our Suxes Survey, one of the research questions is whether there
are differences between men and women with respect to the products they buy. If we
extend our table and insert the variable Gender in the columns, it becomes easy to
make the comparison.
1. Recall the CUSTOM TABLES dialog.

2. Drag and drop the variable Gender into the columns area.
3. Right-click on the Variable label of Gender and uncheck the option Show Variable
Label to remove it from the design.
4. Insert a column with the row totals.
5. The other settings remain in effect, so it is done already.
Just a few clicks, and we have a perfect result (see Figure 5.3)!
Figure 5.3 An overview of the products bought, split by men and women.
Note With this table, it is easy to come to a conclusion. It is clear that
there are almost no differences between men and women. The only
product with a difference is soup, which is more popular among men (37%)
than among women (10%).
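
For readers who want to check such a cross tabulation by hand, here is a minimal Python sketch. It rests on assumptions of our own: the five respondents are invented, Coffee is a hypothetical product name, and the function column_percentages is not part of SPSS. It computes column percentages per gender, with the respondents who bought at least one product as the base of each column.

```python
from collections import defaultdict

# Hypothetical respondents; 1 = bought, 0 = not bought.
respondents = [
    {"Gender": "Male", "Soup": 1, "Coffee": 1},
    {"Gender": "Male", "Soup": 0, "Coffee": 1},
    {"Gender": "Female", "Soup": 0, "Coffee": 1},
    {"Gender": "Female", "Soup": 1, "Coffee": 0},
    {"Gender": "Female", "Soup": 0, "Coffee": 0},  # no product: excluded
]
products = ["Soup", "Coffee"]


def column_percentages(rows, variables, group_var):
    """Return {group: {variable: percentage, 'Total N': base}}."""
    groups = defaultdict(list)
    for r in rows:
        # Only respondents with at least one tick enter the table.
        if any(r[v] == 1 for v in variables):
            groups[r[group_var]].append(r)
    table = {}
    for group, members in groups.items():
        base = len(members)
        column = {
            v: round(100 * sum(m[v] == 1 for m in members) / base)
            for v in variables
        }
        column["Total N"] = base
        table[group] = column
    return table


crosstab = column_percentages(respondents, products, "Gender")
for group, column in crosstab.items():
    print(group, column)
```

Comparing the columns of crosstab side by side is the same comparison we make in the SPSS table.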

5.4 Creating a Simple Bar Chart
We start by making a simple bar chart of the variables Product1 up to Product9. To
create the chart we need the Multiple Response Set we constructed in the first
section. The charts will be based upon this super variable.
1. From the menus, choose: Graphs -> Chart Builder and, on the tab Gallery under the
category Bar, select Simple Bar.
2. Drag the variable $Product (the last one from the list Variables) into the X-Axis
box.
3. In order to get the correct percentages, you need to specify the percentage base. In
the dialog Element Properties, from the Statistics drop down menu, select the
option Response Percentage. Confirm this choice with Apply.
4. Please enter Summary of bought products as a title and do not forget to include
the footnote in this dialog. Click OK to produce the graph.
5. Edit the graph (see Figure 5.4).

Figure 5.4 A bar chart displaying the number of products bought by the visitors
A short list of the operations involved:
Rotate the graph (remember, the names X and Y are not updated).
Change the order of the labels of the vertical axis (X) into descending.
Hide the label of the X-axis.
Customize title and footnote, font family, size and justification.
Customize the horizontal axis (Y): scale, add a % sign as trailing character, add
gridlines and change them to dashed, colour grey.
Change the label of the horizontal axis (Y) (if you think this is necessary).
Change the chart size (canvas) into height 13 cm and width 20 cm (or height 300
and width 480, in the points measurement system).
Note If we display the counts in the graph, the shape of the graph is the
same. For comparison purposes, however, it is better to work with
percentages.

5.5 Sorting the Categories
SPSS provides an easy way to sort the categories in the graph. You can sort the bars by
their label, their value, the statistic, or in a fully customized order. A useful display is
the Pareto order.
1. Select the Categories tab and select Sort by Statistic in an Ascending direction.
The vertical axis starts on the horizontal axis and runs upwards.
Figure 5.5 A bar chart sorted by the number of products bought by the visitors
5.6 Creating a Clustered Bar Chart

In order to compare men and women, we create a clustered bar chart which displays
the percentages for males and females. Of course, this chart must display the same
percentages as Figure 5.3. The CHART BUILDER does not let you use the
multiple response set as the percentage base, so SPSS will come up with different
percentages. A work-around is to use the separate (original) variables after changing
their level of measurement within the CHART BUILDER. By (temporarily) changing the
level of measurement into Scale, we can do some calculations with these
variables. So constructing a clustered bar chart will be quite different.
1. From the menus, choose: Graphs -> Chart Builder and select Clustered Bar.
2. Move the variable Gender into the box Cluster on X: set color.
3. Select the variables Product1 up to Product9 and change the level of measurement
into Scale with the context menu (right mouse button).

After changing the measurement
level into Scale, SPSS is able to
calculate statistics.
4. Drag the nine product variables into the Y-axis box. SPSS will ask you to confirm
this operation by showing a popup which contains the message that the variables
will be summarized and that the name of each variable will be used as a category in
the chart. Click OK to confirm.

5. In the dialog Element Properties we have to change the statistic function into
Percentage Greater Than (?); via the button Set Parameters you can enter 0.5
as the value for the question mark. Apply this procedure to the other variables as well
(sorry) and finally confirm the whole operation by clicking the Apply button.
Yes, all of them, one by one, sorry.
Don't forget to Apply this at the end.
Note Since the variables Product1 up to Product9 are coded with 1 = Yes
and 0 = No, the function Percentage Greater Than 0.5 gives us
exactly the percentage of people buying this product. If you have
coded the variables with 1 = Yes and 2 = No, the Yes answers are the
values below 1.5, so you should use the function Percentage Less
Than 1.5 instead.
6. On the Basic Elements tab you can find the Transpose button to rotate your chart
by a quarter of a turn.

7. Please use the next text as title and footnote:
Title 1: Summary of bought products
Title 2: (Comparison of men and women)
Footnote 1: current date, your name and class
The last thing we need to adjust is the way SPSS copes with missing values. If a
respondent has not filled out one of the product questions, but the others are correct,
we do not want to exclude this person from our graph. By default, SPSS takes the
strict approach: listwise deletion. However, we want to maximize the use of our data,
so we have to change this setting.
8. Use the Options button to get the Options dialog and in the pane Summary
Statistics and Case Values, check the option Exclude variable-by-variable to
maximize the use of data. Click OK and create the graph.
9. Edit the chart into the layout of Figure 5.6.

Figure 5.6 A clustered bar chart displaying the products bought by men and women.
Note The vertical axis starts in the origin, which is at the bottom left of the
diagram. That is why (from our perspective) the products are
displayed in reverse order. On the Categories tab of the Properties
dialog you can change the sort order. Unfortunately we cannot adjust the
sort order within the legend; that is why we place the two entries beside each
other, at the bottom right of the chart.
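
The two ideas used for this chart, the function Percentage Greater Than 0.5 and the choice between listwise and variable-by-variable exclusion, can be illustrated with a small Python sketch. The data and the helper pct_greater_than are our own assumptions, not SPSS output; None stands for a missing answer.

```python
def pct_greater_than(values, threshold):
    """Percentage of non-missing values above threshold (None = missing)."""
    valid = [v for v in values if v is not None]
    if not valid:
        return None
    return 100 * sum(v > threshold for v in valid) / len(valid)


# Hypothetical answers, coded 1 = Yes and 0 = No.
rows = [
    {"Product1": 1, "Product2": 0},
    {"Product1": 1, "Product2": None},  # Product2 left blank
    {"Product1": 0, "Product2": 1},
    {"Product1": 0, "Product2": 1},
]
variables = ["Product1", "Product2"]

# Variable-by-variable: each variable uses all of its own valid answers.
by_variable = {
    v: pct_greater_than([r[v] for r in rows], 0.5) for v in variables
}

# Listwise: a respondent with any missing answer is dropped completely.
complete = [r for r in rows if all(r[v] is not None for v in variables)]
listwise = {
    v: pct_greater_than([r[v] for r in complete], 0.5) for v in variables
}

print("variable-by-variable:", by_variable)
print("listwise:            ", listwise)
```

With listwise deletion the second respondent disappears from Product1 as well, although that answer was perfectly valid; this is the loss of data we avoid with Exclude variable-by-variable.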

This chapter deals with multiple response questions. We discussed how to create
tables and graphs to answer research question (5).
(See Section 5.2 for how to create this table.)
In our analysis (see Section 5.5 for how to create the chart) we compared men and
women. There were hardly any differences between the groups (except for the soup).
Of course we can also compare students and lecturers. That leads to the next table.

Figure 5.7 A summary of bought products, with respect to customer type
By comparing the percentages, you can analyse the differences between students and
lecturers. But with a graph it is easier.
Figure 5.8 A summary of bought products, with respect to customer type

6. Scaled Response Questions
6.1 Introduction to Scaled Response Questions

In most cases, we must actually measure concepts that we are studying as we conduct
marketing research. How we measure sales potential, demand, attitudes,
intentions, and so on is very important when it comes time to interpret our study.
The way the researcher decides how to measure a concept greatly impacts what he or
she can or cannot say about these concepts. For instance, brand loyalty can be defined
as the last brand acquired, or it can be defined as the person's most preferred brand. If
a competitor is giving away free samples, the first definition will give a false reading on
brand loyalty, whereas the second one will yield a truer measurement. A good
understanding of measurement is basic knowledge among marketing researchers.
(from: Burns and Bush, Marketing Research).
Marketing researchers often wish to measure subjective properties of consumers, such
as attitudes, opinions, evaluations, perceptions, feelings and intentions. All these
constructs share the measurement difficulty that they are unobservable. So the
marketing researcher must develop some means of allowing respondents to express
the direction and the intensity of their impressions in both a convenient and
understandable manner. To do this, the marketing researcher uses scaled-response
questions, which are designed to measure unobservable constructs.
Since most of these psychological properties exist on a continuum ranging from one
extreme to another in the mind of the respondent, it is common practice to design
scaled-response questions in an assumed interval-scale format. Sometimes numbers
are used to indicate a single unit of distance between each position on the scale.
Usually, but not always, the scale ranges from an extreme negative through a neutral
to an extreme positive designation. The neutral point is not considered zero or an
origin; instead, it is considered a point along a continuum.
Extremely                              Neutral                              Extremely
Negative                                                                    Positive

Strongly      Somewhat      Neither Agree      Somewhat      Strongly
Disagree      Disagree      nor Disagree       Agree         Agree
1             2             3                  4             5

Extremely     Very          Somewhat      No Opinion  Somewhat    Very        Extremely
Dissatisfied  Dissatisfied  Dissatisfied              Satisfied   Satisfied   Satisfied
1             2             3             4           5           6           7

Extremely     Very          Somewhat      No Opinion  Somewhat    Very        Extremely
Unfavourable  Unfavourable  Unfavourable              Favourable  Favourable  Favourable
1             2             3             4           5           6           7
Figure 6.1 The intensity continuum underlying scaled-response question forms
Marketing researchers often fall back on standard types of scaled-response question
forms used by the industry. These scales include the modified Likert scale, the
lifestyle inventory, and the semantic differential.
The Modified Likert Scale

The modified Likert scale is a scaled-response form in which respondents are asked to
indicate their degree of agreement or disagreement on a symmetric agree-disagree
scale for each of a series of statements. The value of this scale is apparent because
respondents are asked how much they agree or disagree with the statement. That is,
the scale captures the intensity of their feelings.
The Likert-type response format, borrowed from a formal scale development approach
developed by Rensis Likert, has been extensively modified and adapted by marketing
researchers, so much, in fact, that its definition varies from researcher to researcher.
Some assume that any intensity scale using descriptors such as "strongly",
"somewhat", "slightly", or the like is a Likert variation. Others use the term only for
questions with agree-disagree response options. We tend to agree with the second
opinion and prefer to refer to any scaled measurement other than an agree-disagree
dimension as a "sensitivity" or "intensity" scale.
The Lifestyle Inventory

Lifestyle questions measure a person's activities, interests, and opinions (AIOs) with a
Likert scale. These questions can be used to distinguish among types of purchasers
such as heavy versus light users of a product, store patrons versus nonpatrons, or
media vehicle users versus nonusers. They can assess the degree to which a person is
price-conscious, fashion-conscious, an opinion giver, a sport enthusiast, child
oriented, home centred, or financially optimistic. These attributes are measured by a
series of AIO statements, usually in the form presented in Figure 6.2.
Scale: 1 = Strongly Disagree, 2 = Disagree, 3 = Neither Agree Nor Disagree,
       4 = Agree, 5 = Strongly Agree

Statement                                                             Response
I shop a lot for specials.                                            1  2  3  4  5
I usually have one or more outfits that are of the very latest style. 1  2  3  4  5
My children are the most important thing in my life.                  1  2  3  4  5
I usually keep my house very neat and clean.                          1  2  3  4  5
I would rather spend a quiet evening at home than go out and party.   1  2  3  4  5
I think I have more self-confidence than most people.                 1  2  3  4  5
Figure 6.2 Examples of Lifestyle Statements on a questionnaire
The technique was originated by advertising strategists who wanted to obtain
descriptions of groups of consumers as a means of establishing more effective
advertising. The underlying belief is that knowledge of consumers' lifestyles, as
opposed to just demographics, offers direction for marketing decisions. Many
companies use psychographics as a market targeting tool.
Lifestyle inventories are valuable to marketers in a number of ways, not the least of
which is as a market segmentation basis and tool. To perform market segmentation, a
researcher must use a very large number of lifestyle statements, and a great many
respondents must be involved in the survey. Herein lies a dilemma, for potential
respondents, even panel members who are compensated for their participation in
surveys, dislike long questionnaires. See Burns and Bush, Marketing Insight 10.3, which
describes a way to greatly reduce the size of the questionnaire while still achieving the goal
of a lifestyle market segmentation survey.
The Semantic Differential Scale

The semantic differential scale contains a series of bipolar adjectives for the various
properties of the object under study, and respondents indicate their impressions of
each property by indicating locations along its continuum. The focus of the semantic
differential is on the measurement of the meaning of an object, concept, or person.
Because many marketing stimuli have meaning, mental associations, or connotations,
this type of scale works very well when a marketing researcher is attempting to
determine brand, store or other images.
The construction of a semantic differential scale begins with the determination of a
concept or object to be rated. The researcher then selects bipolar pairs of words or
phrases to be used to describe the object's salient properties. Depending on the object,
some examples might be "friendly-unfriendly", "hot-cold", "convenient-inconvenient",
and "high quality-low quality". The opposites are positioned at the
endpoints of a continuum of intensity, and it is customary, although not mandatory, to
use seven separators between the pairs. The respondent then indicates his or her
evaluation of the performance of the object, say a brand, by checking the appropriate
line. The closer the respondent checks to an endpoint on a line, the more intense is his
or her evaluation of the object being measured.

Indicate your impression of the Suxes restaurant at Pandion by checking
the line corresponding to your opinion for each descriptor.
High prices            _____ _____ _____ _____ _____ _____ _____  Low prices
Inconvenient location  _____ _____ _____ _____ _____ _____ _____  Convenient location
For me                 _____ _____ _____ _____ _____ _____ _____  Not for me
Warm atmosphere        _____ _____ _____ _____ _____ _____ _____  Cold atmosphere
Limited menu           _____ _____ _____ _____ _____ _____ _____  Wide menu
Fast service           _____ _____ _____ _____ _____ _____ _____  Slow service
Low-quality food       _____ _____ _____ _____ _____ _____ _____  High-quality food
A special place        _____ _____ _____ _____ _____ _____ _____  An everyday place
Figure 6.3 The semantic differential scale is useful when measuring store, company or
brand images
As you look at the phrases, you should note that they have been randomly flipped to
avoid having all the good ones on one side. This flipping procedure is used to avoid
the halo effect. We will explain this effect with an example. Suppose you have a very
positive image of Suxes at Pandion. If all of the positive items were on the right-hand
side, you might be tempted to just check all of the answers on the right-hand side.
However it is entirely possible that some specific aspect of the Suxes restaurant might
not be as good as the others. Perhaps the restaurant is not located in a very convenient
place, or the menu is not as broad as you would like. Randomly flipping favourable
and negative ends of the descriptors in a semantic differential scale minimizes the halo
effect.
One of the most appealing aspects of the semantic differential is the ability of the
researcher to compute averages and then to plot a profile of the brand or company
image. Each check line is assigned a number for coding. Customarily, the numbers 1, 2, 3,
and so on are used, beginning from the left side. Then an average is computed
for each bipolar pair. The averages are plotted, and the marketing
researcher has a very nice graphical communication vehicle with which to report the
findings to his or her client. We will discuss this in Section 6.7 (see Figure 6.10).
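
As a small illustration of this coding and averaging, consider the Python sketch below. It is only a sketch under our own assumptions: the answers of the four respondents are invented, and the text plot is a crude stand-in for the profile diagram. Positions are coded 1 to 7, counted from the left.

```python
from statistics import mean

# Hypothetical answers of four respondents; 1 = leftmost line, 7 = rightmost.
answers = {
    "High prices - Low prices": [5, 6, 4, 6],
    "Fast service - Slow service": [2, 3, 2, 2],
}

# One average per bipolar pair.
profile = {pair: mean(scores) for pair, scores in answers.items()}

for pair, average in profile.items():
    # Mark the rounded average on a line of seven positions.
    position = round(average)
    line = "".join("x" if i == position else "-" for i in range(1, 8))
    print(f"{pair:<30} {line} {average:.2f}")
```

Connecting the x marks from top to bottom gives the profile line of the image.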
6.2 Case Move On

The rental housing organisation Move On wants to know how their customers
evaluate their house or apartment. When a customer is about to move, he or she will
be asked to fill out a short questionnaire.
Dear customer,
You have given us notice to leave. With this questionnaire we would like to gain insight into your
motivation and reasons for moving, and to get your evaluation of the house and its environment. Moreover, we are
highly interested in your opinion about our services.
1. The reasons for moving

What are your most urgent reasons for moving?
(You can tick more than one answer.)
A. Personal reasons:
Considerations of health
Divorce or end of a relationship
Change in work or location of job
Change of the family
A marriage or get together
Buying a house
Moving to a retirement centre, service flat or sheltered accommodation
_______________________________________
B. Reasons related to the house:

The house is too large
The house is too small
The house is in bad shape
Too high a rent
I would like to have another type of house
_______________________________________
C. Reasons with respect to the environment:

Uncomfortable neighbourhood
Too few facilities
I want to live in another neighbourhood
Unsafe neighbourhood
The neighbourhood is too polluted
We have difficulties with our neighbours
Noise pollution
_______________________________________
2. Rating of the house

Does your house have any shortcomings or failings?
(More answers are allowed)
Sound insulation
Thermal insulation
Moisture problems
Problems with the roof, ceiling or floors
Problems with bathroom, shower or lavatory
No central heating
No separate kitchen
Storage room is too small or inconvenient
_______________________________________
3. Evaluation of the environment

Please rate the next subjects with respect to your former neighbourhood.
Circle your choice.
1=Good 2=Sufficient 3=Insufficient 4=Bad

Green place 1 2 3 4
Parking place 1 2 3 4
Road and traffic safety 1 2 3 4
Shops 1 2 3 4
Public transport 1 2 3 4
Social security 1 2 3 4
4. Evaluation of our services

In order to improve our service, please rate the next aspects.
1=Good 2=Sufficient 3=Insufficient 4=Bad

Openings hours 1 2 3 4
Contacts with Residence Office 1 2 3 4
Technical Services 1 2 3 4
Rate of fixing technical problems 1 2 3 4
Establishment of leasing agreement 1 2 3 4
Support of doorkeeper 1 2 3 4
Contacts with our administration desk 1 2 3 4
Contactability by phone 1 2 3 4
General opinion about our services 1 2 3 4
5. Leasing History
The number of years I have leased this house or apartment
0 - 5 years
5 - 10 years
exceeding 10 years
Our co-ordinator will collect this form after his inspection visit.
Would you be so kind as to fill it out beforehand?
We appreciate your cooperation. Thank you!

In this chapter we will focus on the questions 3 and 4 as they are the scaled response
questions of this questionnaire. Each item is a variable so our code book lists a number
of variables.
Question  Variable      Variable label                          Value labels      Measurement
number    name
3         environment1  Green place                             1 = Good          ordinal
          environment2  Parking place                           2 = Sufficient    ordinal
          environment3  Road and traffic safety                 3 = Insufficient  ordinal
          environment4  Shops                                   4 = Bad           ordinal
          environment5  Public transport                                          ordinal
          environment6  Social security                                           ordinal
4         service1      Openings hours                          1 = Good          ordinal
          service2      Contacts with Residence Office          2 = Sufficient    ordinal
          service3      Technical Services                      3 = Insufficient  ordinal
          service4      Rate of fixing technical problems       4 = Bad           ordinal
          service5      Establishment of leasing agreement                        ordinal
          service6      Support of doorkeeper                                     ordinal
          service7      Contacts with our administration desk                     ordinal
          service8      Contactability by phone                                   ordinal
          service9      General opinion about our services                        ordinal
Table 6.4 The codebook for the questions 3 and 4
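
A codebook like this is also a handy basis for checking the data. The following Python sketch is an illustration of that idea, not a feature of SPSS: the record shown is made up, the function invalid_codes is hypothetical, and None stands for a missing answer. It flags every value that is not defined in the codebook.

```python
# The four value labels shared by all items of questions 3 and 4.
VALUE_LABELS = {1: "Good", 2: "Sufficient", 3: "Insufficient", 4: "Bad"}

# Codebook: variable name -> allowed codes with their labels.
CODEBOOK = {f"environment{i}": VALUE_LABELS for i in range(1, 7)}
CODEBOOK.update({f"service{i}": VALUE_LABELS for i in range(1, 10)})


def invalid_codes(record, codebook):
    """Return {variable: value} for values the codebook does not define.

    Missing answers (None) are not reported as errors.
    """
    return {
        var: val
        for var, val in record.items()
        if var in codebook and val is not None and val not in codebook[var]
    }


# Hypothetical record with one typing error (5 is not a valid code).
record = {"environment1": 2, "environment2": 5, "service1": None, "service9": 4}
errors = invalid_codes(record, CODEBOOK)
print(errors)
```

Running such a check before the analysis catches typing errors made during data entry.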
6.3 Using CUSTOM TABLES

In SPSS we can construct one table of frequencies for separate variables if those
variables have the same codes, i.e. the same set of value labels. For scaled
response questions this is obviously the case. We will use the module CUSTOM TABLES as we did in
the previous chapter. Again, you will enjoy the table preview, which has turned out to
be very helpful. It is clear that constructing separate tables for each item will not lead
to a concise report.
6.3.1. The Basic Lay-out

We start by downloading the data file from the site mim.saxion.nl/docent/sms/spss.
1. Open your browser and download the data file MoveOn.sav.
2. Start SPSS and open the data file MoveOn.sav.
3. From the menus, choose Analyze -> Tables -> Custom Tables.

If you did not (re)start SPSS when you started this chapter, you can use the Reset
button in this dialog to remove the old settings.
4. Drag the variables Environment1 to Environment6 into the box Rows of the table
grid.

5. In our table, we want to have the value labels Good, ..., Bad in the column
headings. To do this, select in the list Category Position the option Row labels in
Columns.
6. Moreover, you can suppress the display of the statistics labels in the column
headings by checking the option Hide.

7. On the second tab of this dialog, enter a title and caption for the table.
Click OK to create the table.
The result is a concise table with an almost perfect lay-out, as is displayed in Table 6.5.
Table 6.5 Table with the frequencies for each aspect of the environment
6.3.2. Adding Percentages

To improve the table we will add a column with the row based percentages.
1. From the menus, choose Analyze -> Tables -> Custom Tables.
2. Select the six variables in the box Rows. Hint: Use the Ctrl key to extend your
selection.
3. Click (in the pane Define bottom left) the button Summary Statistics.
4. Add Row N % to the Display grid, adjust the labels (In %) and change the
Decimals setting into 0. Finally, click Apply to Selection to finish this dialog.
5. On the pane Summary Statistics, from the Position list, we select the option
Rows. Please note that the percentages will be placed directly below the counts
which improves the readability.
6. Click (in the pane Define) the button Categories and Totals and request that
totals be shown in the table.
7. Apply this setting and click OK to create the table.
Figure 6.6 Table with frequencies and percentages for each aspect
Please note that this table is ready for publication. The only adjustments we made concern the
line style, the mark-up of the caption and the total column.
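
To make clear what the counts and row percentages in such a table mean, here is a minimal Python sketch. The ratings are invented and the function frequency_row is our own hypothetical helper; None stands for a missing answer and is left out of the row total.

```python
from collections import Counter

LABELS = {1: "Good", 2: "Sufficient", 3: "Insufficient", 4: "Bad"}

# Hypothetical ratings per aspect; None = question not answered.
ratings = {
    "Green place": [1, 2, 2, 3, 4, 2, 1, 2, 2, 1],
    "Parking place": [2, 3, 3, 4, 1, 3, 2, None, 2, 3],
}


def frequency_row(values):
    """Return {label: (count, row percentage)} plus a Total entry."""
    valid = [v for v in values if v is not None]
    counts = Counter(valid)
    total = len(valid)
    row = {
        label: (counts[code], round(100 * counts[code] / total) if total else 0)
        for code, label in LABELS.items()
    }
    row["Total"] = (total, 100 if total else 0)
    return row


for aspect, values in ratings.items():
    print(aspect, frequency_row(values))
```

Because all items share the same value labels, every row has the same columns, which is exactly what allows one combined table.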
6.4 Creating a Band Diagram in EXCEL

The data of the table can be displayed quite efficiently as a band diagram in EXCEL.
1. Copy the table from Section 6.3.1 (Table 6.5) into EXCEL.
2. Select the range A2:E8 and, from the menus, choose Insert -> (Graphs:) Bar.
From the category 2D-bar take the third option: 100% stacked bar.

3. Follow the steps of Appendix 3 to create the next chart.
Figure 6.7 Evaluation of the environment
Again it turns out that a picture is clearer than a table. The first three aspects, Green
place, Parking place, and Road and traffic safety, are rated as Good or Sufficient by more
than 60% of the respondents. The other three aspects, Shops, Public transport, and Social
security, have this positive rating from roughly 80% of the respondents.
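
The band diagram is nothing more than the percentage distribution of each row. The Python sketch below illustrates this with made-up counts (they are not the counts of the Move On data file) and a hypothetical helper percentage_distribution; each letter in the text bar represents 5% of the respondents.

```python
def percentage_distribution(counts):
    """Convert the counts of one table row into percentages of the row total."""
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}


# Hypothetical counts for two aspects.
table = {
    "Green place": {"Good": 15, "Sufficient": 20, "Insufficient": 10, "Bad": 5},
    "Shops": {"Good": 20, "Sufficient": 25, "Insufficient": 4, "Bad": 1},
}

positive_share = {}
for aspect, counts in table.items():
    shares = percentage_distribution(counts)
    # Share of respondents giving a positive rating.
    positive_share[aspect] = shares["Good"] + shares["Sufficient"]
    # Text version of a 100% stacked bar: one letter per 5%.
    bar = "".join(label[0] * round(share / 5) for label, share in shares.items())
    print(f"{aspect:<12} {bar} positive: {positive_share[aspect]:.0f}%")
```

Reading off the combined length of the G and S segments corresponds to the way we interpret the chart above.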
6.5 Constructing a Bar Chart

When we construct a bar chart which displays the mean values of the aspects, one can
easily inspect the differences between the aspects. Although this is disputable from a
theoretical point of view (since the measurement level of these variables is ordinal), it is
done quite often in practice. A drawback of this diagram is its counterintuitive lay-out:
short bars display the better ratings, because the original coding was 1 = Good
up to 4 = Bad. In such bar charts we prefer to have this the
other way around, i.e. long bars representing the better ratings. Fortunately, with
SPSS it is very easy to recode the variables in reverse order and create a bar chart on
the basis of the new variables. We will discuss this in Section 6.6.
1. From the menus, choose Graphs -> Chart Builder.
2. On the tab Gallery, select Bar and double click the icon Simple Bar.
In the SPSS CHART BUILDER, it is impossible to display mean values of variables having
an ordinal level of measurement. So we have to temporarily change the measurement
level into Scale.

3. Select the variables Environment1 up to Environment6 and use the context menu
to change the measurement level into Scale.
Select the option Scale to change the measurement level.
4. Drag the variables Environment1 to Environment6 into the Y-axis box. SPSS will
announce that it will use the values to summarize the data and that it will use the
names of each variable as categories in the chart.
5. Since this is exactly what we are up to, click OK.
6. Use the fourth tab Titles/Footnotes to enter the title Rating of the environment.
Enter a footnote with the current date, your name and class. Do not forget to
apply your changes every time.
7. Finally click OK and SPSS will create the chart for you.

8. Improve the lay-out of the chart to get Figure 6.8.
Figure 6.8 Rating of the environment
A short list of the actions to take:
Rotate the chart a quarter of a turn (note that the references X and Y will be kept to
the original axes).
Reverse the labels of the vertical axis (X) on the tab Categories.
Change font and position of the title and footnote.
Adjust the scaling of the horizontal axis (Y) on the tab Scale: enter 1 as the
minimum, 4 as the maximum and a major increment of 1. Also add gridlines.
Adjust the caption of the horizontal axis (Y).
Finally, choose a nice colour and shading for the bars.
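
What the chart displays is simply the mean of the valid codes of each item. The Python sketch below shows the calculation for two items; the ratings are invented, item_mean is a hypothetical helper, and None stands for a missing answer. Remember that with the original coding (1 = Good up to 4 = Bad) the lowest mean is the best rating.

```python
from statistics import mean

# Hypothetical ratings, coded 1 = Good ... 4 = Bad; None = missing.
ratings = {
    "environment1": [1, 2, 2, 3],
    "environment2": [2, 3, 4, None],
}


def item_mean(values):
    """Mean of the valid (non-missing) codes of one item."""
    valid = [v for v in values if v is not None]
    return mean(valid) if valid else None


means = {name: item_mean(values) for name, values in ratings.items()}
best = min(means, key=means.get)  # lowest mean = best rating in this coding

for name, value in means.items():
    print(f"{name:<14} {'#' * round(value * 5):<20} {value:.2f}")
print("Best rated item:", best)
```

The short bar of the best rated item in this output shows the counterintuitive lay-out mentioned at the start of this section.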

6.6 Recoding a Scaled-Response for a Better Chart
A huge drawback of the chart we created in the previous section is that short bars
actually correspond to a better rating than tall ones. If you present this graph to
your manager, he or she will definitely expect it the other way around, and you will need to
explain that your graph does not follow the intuitive rule that higher bars correspond
to better grades.
So we are going to recode the variables as shown in the next table. To be on the
safe side, we will create new variables in order to keep the originals.
TRANSFORMATION TABLE
Variable  Environment1 to Environment6   Environment1pos to Environment6pos
Value     1 (= good)                     4
Value     2 (= sufficient)               3
Value     3 (= insufficient)             2
Value     4 (= bad)                      1
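
The transformation table above amounts to a simple lookup. The Python sketch below illustrates it under our own assumptions: the list environment1 holds invented answers, the helper names are hypothetical, and None stands for a missing answer. It also shows that the mean of the recoded variable equals 5 minus the mean of the original.

```python
from statistics import mean

# Old value -> new value, exactly as in the transformation table.
RECODE = {1: 4, 2: 3, 3: 2, 4: 1}
# Value labels for the new variables.
NEW_LABELS = {4: "Good", 3: "Sufficient", 2: "Insufficient", 1: "Bad"}


def recode(values, mapping=RECODE):
    """Return a new list; missing (None) and undefined codes become None."""
    return [mapping.get(v) for v in values]


def valid_mean(values):
    """Mean of the non-missing values."""
    return mean([v for v in values if v is not None])


environment1 = [1, 2, 2, 4, None]      # hypothetical original answers
environment1pos = recode(environment1)  # the original list is kept

print(environment1pos)
print(valid_mean(environment1), valid_mean(environment1pos))
```

Keeping the original list untouched mirrors our choice of Recode into Different Variables in the steps that follow.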
1. From the menus, choose Transform -> Recode into Different Variables.
2. Enter the new names and labels and confirm by clicking Change. Click the button
Old and New Values and enter the transformation table (see next dialog).

3. Continue and click OK to create the new variables.
The new variables need to have value labels.
4. Enter in the SPSS Data Editor the value labels for one of the new variables and
copy those to the other five.
5. Finally, change the measurement level of these variables into ordinal.

Note Although we will use these new variables in a diagram in which we
will temporarily set the measurement level to Scale, it is important
to have the correct measurement level in the data file. Since these
variables have an ordinal level, that is the level we will define in our
data file.
6. Save your data file now.
7. Create a bar chart of the mean values of the six new variables in the same way as
the previous section.
8. Edit the chart in the same way (see the note at the end of this section). Moreover,
add four text boxes with the text Good, Sufficient, Insufficient and Bad.
Figure 6.9 Rating the environment (adjusted version)
This chart is ready to be published in a report or presentation. The reader will
intuitively understand that the first three aspects, although not negative, are rated a
little lower than the last three aspects. Since readers naturally follow their intuition,
you should always present charts that meet this requirement. Transforming
a scaled-response scale is therefore not a superfluous extra but a basic requirement in your data
processing activities.

Note Since this chart must have the same lay-out as the one you created
in Section 6.5, you might wonder whether you should do all this
lay-out work again. Perhaps you just want to apply the lay-out of
the old (6.5) chart to this new one? Well, SPSS facilitates that in a
simple way: chart templates.
1. Open the old chart with your lay-out with a double click.
2. From the menus, choose File -> Save Chart Template.
3. Select All settings and Continue.
4. Save your template file in your SPSS folder on your USB drive
(and remember where you saved it!).
5. Close the chart editor.
6. Open the new chart and choose File -> Apply Chart Template
to get your lay-out.
Please note that you can specify exactly which elements of the
chart you want to store in your template file. This is a very
convenient way to apply your standard style to the charts you will
create. You can even specify the template file in the CHART BUILDER
options to instruct SPSS to apply it automatically when it creates the
chart.
6.7 Semantic Differential Scale

The semantic differential scale is useful when measuring store, company, or brand
images. A semantic differential scale is used to translate a person's qualitative
judgements into quantitative estimates and contains a series of bipolar adjectives for
the various properties of the object under study. The averages for each bipolar scale
are plotted and displayed in a line diagram, which is a nice communication vehicle to
report the findings.
Nowadays this line diagram (remembered as the thunderbolt diagram) is also used for
the modified Likert scale and is then, somewhat inaccurately, referred to as a semantic
differential itself. In this section we will produce a semantic differential for the rating
of the environment (question 3 of the questionnaire) to compare people who move
after 0 to 5, 5 to 10 or more than 10 years. To get a diagram in the same style as the
previous section we will use the variables Environment1pos to Environment6pos of
Section 6.6 again.
1. From the menus, choose Graphs > Chart Builder and on the tab Gallery the option
Line. Activate the icon Multiple Line.
2. Select the variables Environment1pos to Environment6pos and (temporarily)
change, just as we did in the previous two sections, the measurement level to
Scale. (Use the context menu within the CHART BUILDER).
3. Move the variables Environment1pos to Environment6pos into the box Y-axis.
SPSS shows you the dialog Create Summary Group which you can confirm by
clicking OK. Place the variable History in the box Set Color.
4. On the tab Basic Elements you can use the button Transpose to rotate your chart
by a quarter of a turn.
5. Activate on the tab Titles/Footnotes the options Title and Footnote1. Enter Rating
of the environment as a title and restate your name, class and current date in
Footnote1. Confirm each of your actions with Apply.

[Screenshot callouts: The variables Environment1pos to Environment6pos with a
(temporarily changed) Scale measurement level. Use the Transpose button on the tab
Basic Elements to rotate the chart a quarter of a turn. The option Multiple Line.]
6. Finally, click OK to create the chart.
Of course we need to style this chart a little to make it ready for publication.
7. Double click the chart to open the SPSS chart editor.

The following things need to be done:
-- Rotate the graph a quarter of a turn (only if you have not done that in the CHART
BUILDER).
-- Sort the categories on the vertical axis in a reverse (descending) order.
-- Show the gridlines and choose a nice grey dash line style.
-- Adjust the font size of the title, footnote and the captions of the axes.
-- Change the line style in order to create three lines which can be distinguished most
clearly.
-- Change the font size of the aspects on the vertical axis.
-- Insert four text boxes with the labels Good, Sufficient, Insufficient and Bad
along the horizontal axis.
Figure 6.10 Semantic differential for the environment
A conclusion from the semantic differential could be that people living for 5 to 10 years
in their house are most satisfied with the environment of their houses. People moving
within 5 years are not that satisfied with the shops in the neighbourhood of their
houses. People who have rented their place over 10 years are less satisfied with the
social security.
6.8 Assignment
The questionnaire from the rental housing organisation MoveOn contains another
scaled-response question, i.e. question 4. Use the techniques we discussed in this
chapter to report the results of this part of our survey in tables and graphs. Please
finish Figure 6.11 to Figure 6.15 and give a conclusion after each chart.

Figure 6.11 The rating of the services of MoveOn
Figure 6.12 Graphical display of the rating of the services of MoveOn

Figure 6.13 A bar chart displaying the mean values of the original variables
Figure 6.14 A bar chart displaying the mean values of the transformed variables

Figure 6.15 Semantic differential for the services of MoveOn
If we take a look at the rating of the services of the rental housing organisation
MoveOn it is clear that all aspects are rated positively. A comparison between people
moving within 5 years, between 5 and 10 years and after 10 years does not show
large differences between these groups.

7. Chi-square tests
Case
Aquariade is a swimming pool in a medium sized city in the Netherlands. The
management team of Aquariade has decided that they need to pay more attention to
the opinions and wishes of Aquariade visitors. To show these opinions and wishes,
the management has decided to carry out a survey with the visitors of the swimming
pool.
Problem definition
In which way can Aquariade improve their market position by giving more attention
to the opinions and wishes of the customers?
On the basis of this problem definition they have developed research objectives for
the quantitative research. The first four objectives are descriptive and the others are
explorative. By means of a statistical test one has to prove whether the statement is
valid for the population.
Research objectives
(1) What is the opinion about the entrance fee, the overall hygiene, the visiting
hours, the kindness of the staff and the temperature of the pool water (in the
sample)?
(2) Is there a difference between the three age groups in the opinions about these
four aspects (in the sample)?
(3) Is there a difference between the customers who visit the sauna and the
customers who do not visit the sauna in the opinions about these four aspects
(in the sample)?
(4) Are there any facilities that Aquariade should add?
(5) Does the sample give a good view of the total customer population?
(6) Are there significant differences between the three age groups in the usage of
the sauna?
(7) Are there significant differences between the three age groups in their opinion
about the overall hygiene (a), the visiting hours (b), the kindness of the staff (c)
and the temperature of the pool water (d)?
(8) Is there a significant difference between men and women in the number of
visits to Aquariade?
(9) Is there a significant difference between the three age groups in the number of
visits to Aquariade?
(10) Does the total opinion about Aquariade relate significantly to the gender or the
age of the visitor?
(11) Does the opinion towards the entrance fee have a significant influence on the
total number of visits that the customers paid in the last two months to
Aquariade?
(12) Is there a significant difference between customers who visit the sauna and
customers who do not visit the sauna in their rating of Aquariade?
On the basis of the research questions a questionnaire is constructed.

1. How many times did you visit Aquariade in the last two months? ..... times
2. What is your opinion of the entrance fee? (1) O Fair
(2) O High
(3) O Too high
3. Do you use the sauna in Aquariade? (1) O Yes
(2) O No
4. To which age group do you belong? (1) O < 25 years
(2) O 25 -< 50 years
(3) O ≥ 50 years
5. Please indicate your gender. (1) O Male
(2) O Female
6. My mark (1 up to 10) for Aquariade is: ..... (fill in grade)
7. I would like to have the facilities extended with the following
(0 = do not add; 1 = do add):
-- Steam bath
-- High toboggan
-- Flow acceleration
8. What is your opinion about the next aspects:
8.1 Overall hygiene (1) O Very good
(2) O Good
(3) O Neutral
(4) O Not so good
(5) O Bad
8.2 Visiting hours (1) O Very good
(2) O Good
(3) O Neutral
(4) O Not so good
(5) O Bad
8.3 Kindness of the staff (1) O Very good
(2) O Good
(3) O Neutral
(4) O Not so good
(5) O Bad
8.4 Temperature of the pool water (1) O Very good
(2) O Good
(3) O Neutral
(4) O Not so good
(5) O Bad
7.1 Chi-Square Goodness-of-Fit Test

The researchers will ask themselves whether their sample is a good representation of
the customer population of the swimming pool Aquariade. This is formulated in
research question 5. To be able to check this, the researchers asked the respondents
some personal questions (e.g. age, gender, number of visits).
According to the available information 35% of the customers are younger than 25,
20% are between 25 and 50 and 45% are 50 years old or older. It is also known that
60% of the customers are female and 40% are male.
With a chi-square goodness-of-fit test we can check whether the distribution in our
database matches what we would theoretically expect on the grounds of the
available information about Age and Gender mentioned earlier. We are going to
compare the distribution of our survey with the expected (theoretical) distribution.
We will start by making a frequency table for the variables age and gender.
1. Open the data file Aquariade.sav in SPSS (from mim.saxion.nl/docent/sms/SPSS).
2. Construct a frequency table of the variables age and gender.
Figure 7.1 Tables of frequencies of the variables gender and age

Inspection of the frequency tables makes clear that the percentages found in our
survey are not (exactly) equal to the theoretical percentages. The question is whether
the discrepancies can be explained by the sampling process. In that case we have no
reason to question the representativeness of the sample. If, on the other hand, the
deviations are so large that we cannot attribute them to chance, we unfortunately have to
conclude that our survey is not representative.
We will use the chi-square goodness-of-fit test to compare our sample with the known
population distribution. Berenson (Basic Business Statistics) discusses the theoretical
aspects of the test, like hypotheses, calculation of the test statistic and assumptions.
The formal hypotheses are
H0: The sample distribution (observed frequencies) corresponds to the theoretical
distribution (population).
H1: The two distributions do not correspond.
We will start to analyse the variable gender. After that, we will discuss age.
3. From the menus, choose: Analyze > Nonparametric Tests > One Sample.
Our objective is to do a customized analysis.
4. On the second tab, Fields, remove all variables, and leave Gender as the only one
to be tested.
5. Select the third tab Settings and choose the option Customize tests.
6. The Chi-Square test is the second one presented by SPSS as a customized test.
Click the Options button to enter the expected probabilities as relative frequencies.
For gender we expect a 40% - 60% distribution, so that makes the odds 4 to 6.

[Screenshot callout: The expected values must be entered as percentages in a
decimal representation.]
Note Although you might expect to enter the expected counts, SPSS asks
you to enter the expected percentages, in a decimal format. SPSS will
then calculate the expected counts itself. You must enter the figures
in the order of the codes in the codebook. Because we have
defined 1 = Male and 2 = Female we first enter the percentage
of males in the population and after that the percentage of females.
Since we cannot enter 40% and 60% directly, we
make it 0.4 and 0.6.
7. Close the Chi Square Test Options by clicking OK and Run the test.
SPSS will produce the following output.
[Screenshot callout: On the basis of the p-value (here 16.5%) you can perform
the test at once.]
If you double click on this item, SPSS will open the Model Viewer. You will see a bar
chart displaying the observed and expected values.
[Screenshot callout: With this line you can check Cochran's rule.]
Figure 7.2 SPSS output of the chi-square goodness-of-fit test (variable Gender)
There are differences between the observed frequencies Observed N and the
(theoretical) expected frequencies Expected N. To perform the test we will use the
value of the Asymp. Sig. which means asymptotic significance. This value (here 0.165)
is the right tail probability (p-value) in the chi-square distribution. Because this value
exceeds α = 5% we cannot reject our null hypothesis (H0). The consequence will be that
we have no reason to doubt the representativeness of our sample with respect to
Gender.
Decision rule for testing statistical hypotheses
If the p-value is less than the significance level α, you reject the null hypothesis.
p-value (Asymp. Sig.) < α → reject H0
p-value (Asymp. Sig.) ≥ α → do not reject H0
8. Repeat this analysis for the variable Age. Use the theoretical percentages
mentioned in the beginning of this section.
Figure 7.3 SPSS output of the chi-square goodness-of-fit test (variable Age)
With the p-value of 0.316 (Asymp. Sig.) we can perform the test at once: it exceeds the
significance level α = 5%, so we have no reason to reject the null hypothesis. Just as
for Gender, we conclude that the age distribution in our sample does not deviate
significantly from the known population distribution.
Note The check of the conditions (Cochran's rule) is straightforward.
Inspecting the SPSS output (the last line in Figure 7.2) makes clear
that all expected values are greater than 5.
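The goodness-of-fit calculation for Age can also be checked by hand. The following is a minimal Python sketch, not part of SPSS. It assumes that the observed age counts are the column totals of the cross tabulation in Figure 7.13 (57, 43 and 75 respondents), and it uses the fact that for 2 degrees of freedom the right tail probability of the chi-square distribution equals exp(-x/2). The variable names are chosen for illustration only.

```python
import math

# Observed age counts: assumed to be the column totals of Figure 7.13.
observed = {"< 25": 57, "25 -< 50": 43, ">= 50": 75}
# Known population proportions, in the order of the codes 1, 2, 3.
proportions = {"< 25": 0.35, "25 -< 50": 0.20, ">= 50": 0.45}

n = sum(observed.values())
# Expected counts are the proportions multiplied by the sample size.
expected = {group: n * p for group, p in proportions.items()}

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = sum(
    (observed[g] - expected[g]) ** 2 / expected[g] for g in observed
)
df = len(observed) - 1

# Closed form of the right tail probability, valid for df = 2 only.
assert df == 2
p_value = math.exp(-chi_square / 2)

# Condition check: every expected count should be at least 5.
all_expected_ok = all(e >= 5 for e in expected.values())

print(f"chi-square = {chi_square:.3f}, df = {df}, p = {p_value:.3f}")
print("all expected counts at least 5:", all_expected_ok)
```

Under these assumptions the p-value rounds to 0.316, the value reported by SPSS for the variable Age.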
7.2 Chi-Square Crosstab Test

We discussed in Chapter 4 how to analyse research questions with respect to two
variables. In Section 4.5 cross tabulations are dealt with. To start, we will focus on
research objective 6:
(6) Are there significant differences between the three age groups in the usage of
the sauna?
To answer our research objective we can make a cross tabulation. To test whether the
relationship is significant we use the chi-square crosstab test. This option can be found
in the dialog Statistics (see also Section 4.5.4).
1. From the menus, choose Analyze > Descriptive Statistics > Crosstabs.
[Screenshot callouts: Use this button to invoke the chi-square crosstab test.
This button leads to the dialog to display the cell contents.]
2. Move the variables Sauna and Age into the Row(s) and Column(s) textboxes
respectively.
3. Use the button Statistics to perform the chi-square crosstab test.
[Screenshot callouts: The Chi-square crosstab test. Cramér's V is known
from Section 4.5.1.]
4. Use the button Cells to display the expected frequencies in the cells as well.
[Screenshot callouts: Observed counts (to be displayed always). Expected values if the
variables are independent.]
The output shows us a cross tabulation with the observed counts and the expected
counts displayed in the cells.
After this table SPSS gives the results of the chi-square analysis and of the calculation of
Cramér's V.
[Screenshot callouts: The Chi-square crosstab statistic (value = 7.116). The p-value
equals 2.8%. Cramér's V equals 0.20.]
Steps of the Chi-square Crosstab Test
1. Start to formulate the hypotheses:
H0: In the population, there exists no relationship between age and the usage of
the sauna.
H1: In the population there is a relationship between these two variables (or,
the variables are dependent).
2. Calculate the value of the chi-square statistic and check whether the value
mentioned as Asymp. Sig. (p-value) is greater or less than the significance level
you are using.
3. Formulate the conclusion whether the null hypothesis is to be rejected or not. In
our example the p-value equals 0.028, that is 2.8%, which is less than α = 5%. In
our example we have to reject the null hypothesis.
4. Transfer this conclusion to the original research objective. In our example we
have statistical evidence that the age groups differ with respect to the usage of
the sauna.
5. With the value of Cramér's V you are able to characterise the magnitude of the
relationship (we refer to Section 4.5.1). Finally, by calculating percentages in the
crosstab you can compare the older people with the younger. You can for example show
whether the older people are using the sauna more (or less) often than the
younger. This last conclusion applies only to the sample, strictly speaking.
Note If you conclude not to reject the null hypothesis, there is no need
to interpret the value of Cramér's V, because there is no evidence
for any relationship. In this case it is important to say that the
differences between the column or row percentages are not
significant.
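The figures reported for the sauna example can be verified with a few lines of Python. This is only a sketch: it takes the chi-square value 7.116 from the SPSS output and assumes a sample of 175 valid cases (the number SPSS reports in the other tables of this chapter). For 2 degrees of freedom the right tail probability is exp(-x/2), and Cramér's V is the square root of chi-square divided by n times (the smaller table dimension minus 1).

```python
import math

chi_square = 7.116    # Pearson chi-square reported by SPSS
rows, columns = 2, 3  # Sauna (yes/no) by three age groups
df = (rows - 1) * (columns - 1)
n = 175               # assumption: number of valid cases
alpha = 0.05

# Closed form of the chi-square right tail probability, valid for df = 2.
assert df == 2
p_value = math.exp(-chi_square / 2)

# Cramér's V = sqrt(chi-square / (n * (min(rows, columns) - 1))).
cramers_v = math.sqrt(chi_square / (n * (min(rows, columns) - 1)))

reject_h0 = p_value < alpha
print(f"p = {p_value:.3f}, reject H0: {reject_h0}")
print(f"Cramér's V = {cramers_v:.2f}")
```

Under the stated assumption the sketch gives p = 0.028 and a Cramér's V of about 0.20, in line with the SPSS output.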
In our example we have a certain degree of dependence between Age and the Usage of
the sauna. We are eager to know which age group uses the sauna most. By
comparing the observed with the expected frequencies in the cross tabulation you can
discover where the discrepancies are. E.g. in the group ≥ 50 years we find an observed
count of 34 where the expected count equals 25.7. Within the two other age groups it is
the other way around. We must conclude that in the sample the older people (≥ 50 years)
are more likely to use the sauna than the younger.
5. Recreate the cross tabulation of the variables Sauna and Age in the Row(s) and
Column(s) textboxes respectively. Ask for percentages to compare the age groups.
(The expected values are left out now, of course). Make the layout of your table the
same as in Figure 7.4.
Figure 7.4 Crosstab of Age and Sauna usage, with column percentages
The table of Figure 7.4 indeed shows a clear difference between the age groups: Within
the group ≥ 50 years 45% use the sauna, in the two other (younger) groups this
percentage is equal to 26%.
The next step is to display these results in a chart.
6. From the menus, choose Graphs > Chart Builder and on the tab Gallery choose the
option Bar. Double click the icon Simple Bar.
7. Move the variable Age into the X-axis box.
8. Change (temporarily, within the Chart Builder) the measurement level of the
variable Sauna with the right mouse button into Scale and move it into the Y-axis
box.
9. Choose Percentage less than (?) as statistic function.
Since the codes of the variable Sauna have been chosen as 1 = Yes and 2 = No and we
want to display the percentage of Yes answers, we ask SPSS to calculate the percentage
of cases less than 1.5.
10. The button Set Parameters is used to enter the value 1.5 in place of the
question mark.
11. Enter the Titles and Footnote:

Title1: Usage of the sauna
Title2: (comparison of age groups)
Footer1: Today's date, your name and class.
12. Finally, create the chart and customise it into the layout of Figure 7.5.

Figure 7.5 Usage of the sauna, three age groups compared
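The trick with the statistic Percentage less than 1.5 may look odd at first sight. The small Python sketch below shows why it works: with the codes 1 = Yes and 2 = No, the only value below 1.5 is the code for Yes. The list of answers is hypothetical and only serves as an illustration; the function name is our own and not an SPSS function.

```python
# Hypothetical answers to question 3, coded 1 = Yes and 2 = No.
sauna_codes = [1, 2, 2, 1, 2, 2, 2, 1, 2, 2]


def percentage_less_than(values, threshold):
    """Return the share of values below the threshold, as a percentage."""
    below = sum(1 for v in values if v < threshold)
    return 100 * below / len(values)


# Only the code 1 (Yes) lies below 1.5, so this is the percentage of Yes answers.
pct_yes = percentage_less_than(sauna_codes, 1.5)
print(f"{pct_yes:.0f}% of the respondents answered Yes")
```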
Formulate Your Conclusion

There is a difference between the three age groups with respect to the use of the sauna.
Within the eldest group (50 years and older) 45% are using the sauna. Within the two
other age groups, this percentage is much lower, it is 26%. We performed a statistical
test and must conclude that this difference is significant (chi-square= 7.116; df= 2;
p= 0.028).
Note It is most important that you present your research findings quickly
and clearly. For every analysis you have made, you must ask yourself
what the practical relevance is and what you want to say about it.
In the research report you include the cross tabulation, of course. In
the main text you only include the conclusion in plain text, without
all statistical details. The chi-square output can be included in an
appendix, if you want to. Of course you save all tables and graphs
in your SPSS output file to have them available quickly in case there
are questions about the results.
7.3 Conditions for Chi-Square Crosstab Test

When performing a chi-square crosstab test it is highly important to check the
conditions (Cochran's rule).
Rule of Cochran
(1) All expected frequencies must exceed 1.
(2) In at most 20% of the cells an expected frequency less than 5 is allowed.
In this section we will discuss an example which violates Cochran's rule. However, by
collapsing two rows in the cross tabulation we can satisfy Cochran's rule. In the next
section we will discuss when these elaborations are worthwhile in practice.
Research objective (7) deals with the opinion of customers about the visiting hours.
The management of Aquariade wants to know whether the customers are satisfied,
and whether there are differences between the age groups. We start by making a
frequency table of the opinion about the visiting hours. After that, we will make a cross
tabulation to see whether the differences between age groups are significant. In this
analysis the conditions of the chi-square test are not met. By collapsing classes we get
a smaller cross tabulation with the expected values large enough to do a valid analysis.
In this section we will show you how to collapse classes of a variable. The next section
will discuss when this is worthwhile in practice and when this method will have no
result.
1. Construct a frequency table of Aspect2, the opinion about the visiting hours.
Customise this table to make it suitable for publication.
Figure 7.6 Frequency table of visiting hours
From this table it is clear that 13% of the visitors are not satisfied with the visiting
hours. To know whether there is a relationship with the age of the visitors we construct
a cross tabulation and perform a chi-square crosstab test.
2. From the menus, choose Analyze > Descriptive Statistics > Crosstabs.
-- Move the variable Aspect2 into the Row(s) textbox and the variable Age into the
Column(s) textbox.
-- Display percentages to compare the age groups and calculate the chi-square
statistic as well.
-- Customise the table into the following layout. (See Section 4.5.7).
Figure 7.7 Cross tabulation of Age × Opinion about visiting hours
In this table we discover differences between the age groups. For example, within the
age group 25 -< 50 a relatively large group (13%) thinks the visiting hours are bad.
That is substantially more than within the other age groups. But there are more
differences between the age groups. By means of a chi-square crosstab test we are able
to detect whether these differences are significant.

With the numbers displayed in the footer of the chi-square output we can check
Cochran's rule at a glance.
In our case, the minimum expected value equals 1.82. So the first requirement is met.
Because in 5 out of 15 cells (that is 33%) the expected count is less than 5, the second
requirement is not met. So we need to adjust the table dimensions in order to perform
a valid chi-square analysis.
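The check of Cochran's rule from the footer numbers can be written down as a small function. The Python sketch below is an illustration only; the function name and its arguments are our own and not part of SPSS. It takes the minimum expected count, the number of cells with an expected count below 5 and the total number of cells.

```python
def cochran_rule_met(min_expected, cells_below_5, total_cells):
    """Check both conditions of Cochran's rule from the SPSS footer numbers."""
    # Condition (1): all expected frequencies must exceed 1.
    all_above_1 = min_expected > 1
    # Condition (2): at most 20% of the cells may have an expected count below 5.
    share_below_5 = cells_below_5 / total_cells
    return all_above_1 and share_below_5 <= 0.20


# Footer of the table Age by visiting hours: minimum 1.82, 5 of 15 cells below 5.
print(cochran_rule_met(1.82, 5, 15))  # condition (2) fails: 33% is more than 20%
```

A table in which no cell has an expected count below 5 passes the check, provided the minimum expected count is above 1.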
3. Construct the cross tabulation again, but display expected frequencies instead of
percentages.
Figure 7.8 Cross tabulation with observed and expected counts.
The table in Figure 7.8 makes clear that the problems can be found in the last two
rows, the categories not so good and bad. In these rows we have expected values
which are less than 5. A solution is to collapse these two categories into one category,
with the label negative.
4. From the menus, choose Transform > Recode into Different Variables.
5. The dialog Old and New Values facilitates the definition of the transformation we
want to make.
-- The value 5 must be transformed into 4
-- and all other values remain the same (can be copied).
[Screenshot callout: Selecting these options gives the transformation ELSE → Copy.]
6. Continue and click OK to do the transformation.

Perhaps you are surprised that there is no output right now. That is because you did
not ask to produce output, but you asked to make a new variable. Please note that this
new variable is added in the Data Editor as the last variable.
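For readers who like to see the transformation spelled out, the following Python sketch mimics the recoding rule: the old value 5 becomes 4 and every other value is copied. The list of answers is hypothetical; the real values are in the data file. The names used are for illustration only.

```python
# Hypothetical answers for Aspect2 (1 = Very good ... 5 = Bad).
aspect2 = [1, 2, 5, 4, 3, 5, 2]


def recode(value):
    """Old value 5 becomes 4; every other value is copied (ELSE -> Copy)."""
    return 4 if value == 5 else value


# The result is stored in a new variable; the original stays untouched.
aspect2_adjusted = [recode(v) for v in aspect2]

# Value labels of the new variable: 4 now stands for Negative, 5 no longer occurs.
value_labels = {1: "Very good", 2: "Good", 3: "Neutral", 4: "Negative"}

print(aspect2_adjusted)
print([value_labels[v] for v in aspect2_adjusted])
```

Just as in SPSS, the recoding produces a new variable and leaves the original variable unchanged.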
This new variable Aspect2Adjusted needs to get Value Labels and the right
measurement level.
[Screenshot callouts: You can copy the labels from Aspect2 and adjust them.
Change the measurement level into Ordinal.]
7. Define value labels before you continue. (See Section 2.1).

Hint: Copy the value labels from Aspect2 to the new Aspect2Adjusted variable and
adjust the labels for the values 4 and 5:
4= Negative and 5 must be deleted.
!!! And do not forget to set the measurement level!
8. From the menus, choose Analyze > Descriptive Statistics > Crosstabs and construct a
new cross tabulation, now with the variable Aspect2Adjusted instead of Aspect2.
Calculate the chi-square p-value as well.
The footnote of the output block Chi-Square Tests shows us that no cells (0%) have an
expected count less than 5. The smallest value equals 5.01. Now we meet Cochran's
rule (easily).
The p-value (0.4%) is less than our α value, so we must come to the conclusion that in
the population there is a relationship between age and the opinion about visiting
hours. More practically stated: a significant difference exists between the age groups
with respect to their opinion about the visiting hours.

9. Create a stacked bar chart (band diagram) to show these differences.
Make all bars add up to 100% and take care that your diagram shows the same
percentages as the cross tabulation.
In the Chart Builder, on the tab Gallery, choose the category Bar and drag the icon
Stacked Bar into the previewer. The X-axis box should contain the variable which
defines the groups to be compared (Age), and the variable which is the subject of
comparison (Aspect2Adjusted) is put into the box Stack: Set Color.
Please read Section 4.7 for the other instructions.
Figure 7.9 Band diagram
Formulate Your Conclusion

Although, in general, customers are satisfied with the visiting hours (almost 60% of
the ratings are positive), major differences exist between the age groups. Of the
younger customers (up to 25 years old) 75% are very satisfied; for the other two groups
this percentage equals roughly 45%. Within the age group 25 to 50 we have 20%
negative ratings of the visiting hours. These differences turned out to be significant.
7.4 How to Use Cochran's Rule in Practice

Research objective (7) in full was:
(7) Are there significant differences between the three age groups in their opinion
about the overall hygiene (a), the visiting hours (b), the kindness of the staff (c)
and the temperature of the pool water (d)?
In the previous section we discussed the aspect of the visiting hours. In order to
perform a valid chi-square test we had to recode the variable Aspect2 and combine two
classes.
1. Create for the other three variables Aspect1, Aspect3 and Aspect4 a cross tabulation
with the variable Age. Print the chi-square statistics also.
2. Check Cochran's rule. You will see that the cross tabulations with Aspect1 (overall
hygiene) and Aspect3 (Kindness of staff) do not satisfy Cochran's rule.
Chi-Square Tests (Aspect1 by Age)
                               Value     df   Asymp. Sig. (2-sided)
Pearson Chi-Square             8,607 a    8   ,377
Likelihood Ratio               7,991      8   ,434
Linear-by-Linear Association    ,412      1   ,521
N of Valid Cases                 169
a. 7 cells (46,7%) have expected count less than 5. The minimum expected count is 1,21.

Chi-Square Tests (Aspect3 by Age)
                               Value     df   Asymp. Sig. (2-sided)
Pearson Chi-Square             1,195 a    8   ,997
Likelihood Ratio               1,188      8   ,997
Linear-by-Linear Association    ,067      1   ,796
N of Valid Cases                 175
a. 6 cells (40,0%) have expected count less than 5. The minimum expected count is 1,47.

Figure 7.10 The crosstab test of Aspect1 (first table) and of Aspect3 (second table), with age.
In both tables the p-value is (very much) greater than α = 5% and you cannot reject the
null hypothesis. Or, stated differently, in both cases there is no statistical evidence for
a relationship between age and that aspect in the population. So, in the population,
there is no (significant) difference between the age groups and their rating of overall
hygiene and kindness of staff.
However, Cochran's rule is not met! Before collapsing classes, it is wise to review a
practical rule formulated by Bert Nijdam:
Practical application of Cochran's rule
Only if in a cross tabulation the p-value is less than the value of α, you need to check
Cochran's rule before you can reject the null hypothesis.
(1) All expected values must exceed 1.
(2) In at most 20% of the cells an expected frequency less than 5 is allowed.
If those two requirements are not met, you must take action, like collapsing classes,
or excluding classes from the analysis.
In a cross tabulation with a p-value greater than α, you do not reject the null
hypothesis. There is no statistical evidence for a relationship between the variables and
even after collapsing classes, there will be no statistical evidence. In this situation
Cochran's rule is irrelevant.
Both chi-square analyses in Figure 7.10 have a p-value greater than α. Although
Cochran's rule is not met, we do not have to collapse classes, because we are not able
to reject the null hypothesis. Our conclusion (there is no relationship in the
population) remains valid.
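The practical rule can be summarised as a small decision procedure. The Python sketch below is an illustration; the function name, its arguments and the wording of the three outcomes are our own. The numbers in the example calls are the footer values of Figure 7.10.

```python
def crosstab_decision(p_value, min_expected, pct_cells_below_5, alpha=0.05):
    """Apply the practical rule: Cochran's rule only matters when p < alpha."""
    if p_value >= alpha:
        # H0 is not rejected, so there is no need to check Cochran's rule.
        return "do not reject H0 (Cochran's rule is irrelevant)"
    if min_expected > 1 and pct_cells_below_5 <= 20:
        # Both requirements are met, so the rejection is valid.
        return "reject H0"
    return "collapse or exclude classes before testing"


# Aspect1 (overall hygiene) and Aspect3 (kindness of staff) against Age.
print(crosstab_decision(0.377, 1.21, 46.7))
print(crosstab_decision(0.997, 1.47, 40.0))
```

For both tables the function returns the first outcome: the null hypothesis is not rejected and no classes have to be collapsed.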
7.5 Banding a Variable to Do a Chi-Square test

The previous section makes clear that the dimensions of a cross tabulation are limited
when performing a valid chi-square analysis. If you are dealing with a scale level
variable like the number of visits or the mark for Aquariade, a cross tabulation of these
variables and Age is not suitable for the chi-square crosstab test. The cross tabulation
itself is also not suitable to publish, because it is far too large. The next chapter will
introduce a couple of techniques to be used for comparing groups with respect to a
scale test variable. These techniques however are restricted by rather severe
requirements. In situations where those requirements are not met you can use the
chi-square crosstab test after banding the test variable.
We will now focus on research objective (9): Is there a significant difference between
the three age groups in the number of visits to Aquariade? Another formulation might
be: Is there a relationship between age and the number of visits to the swimming pool
Aquariade?
1. Construct a cross tabulation with the variables Number of visits and Age.
You will see that this cross tabulation is far too large and that too many expected
frequencies are less than 5 (45 to be precise) and some are even less than 1 (because
the minimum expected count equals 0.25).

Chi-Square Tests
                               Value      df   Asymp. Sig. (2-sided)
Pearson Chi-Square             56,790 a   36   ,015
Likelihood Ratio               65,331     36   ,002
Linear-by-Linear Association     ,381      1   ,537
N of Valid Cases                  175
a. 45 cells (78,9%) have expected count less than 5. The minimum expected count is ,25.
[Screenshot callout: The crosstabulation is completely useless for publication and
analysis.]
Figure 7.11 Output of a chi-square crosstab test which is completely useless.
A way to make a better cross tabulation is to band the Number of visits into a new
variable with only a few classes. The creation of a categorical variable from a scale
variable is discussed in Section 3.5.
2. Create a suitable classification for the variable Number of visits in four classes.
Take care that every class contains at least 15% of the observations. See Section
3.5.1 to find the border values.
Number of visits to Aquariade
              Count   In %
Sometimes        48    27%
Regular          42    24%
Often            42    24%
Very Often       43    25%
Total           175   100%
[Screenshot callout: Of course, your classification can be different. If you have at
least 15% in the smallest class, it is fine.]
Figure 7.12
Note Most likely, you do not want to use these vague terms to
characterize the classes. But we want you to find the borders
yourself and come up with your own classification!
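One possible way to find border values is to use the quartiles of the variable. The Python sketch below illustrates this idea; it is an assumption on our part and not necessarily the method of Section 3.5.1. The list of visits is hypothetical and the class numbers 0 to 3 are chosen for illustration.

```python
from bisect import bisect_left
from collections import Counter
from statistics import quantiles

# Hypothetical numbers of visits; the real values are in the data file.
visits = [1, 2, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20, 24, 30]

# Three border values that split the observations into four classes.
borders = quantiles(visits, n=4)

# bisect_left puts a value that equals a border into the lower class.
bands = [bisect_left(borders, v) for v in visits]
counts = Counter(bands)
shares = {band: counts[band] / len(visits) for band in range(4)}

# Requirement from the text: every class holds at least 15% of the observations.
smallest_share = min(shares.values())
print("borders:", borders)
print("class shares:", shares)
print("every class holds at least 15%:", smallest_share >= 0.15)
```

With many tied values the classes can become uneven, so it remains necessary to inspect the frequency table of the banded variable.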
3. Now construct the cross tabulation.
Number of visits to Aquariade * Age group Crosstabulation
Number of visits   < 25 year      25 -< 50 year   >= 50 year     Total
to Aquariade       Count  In %    Count  In %     Count  In %    Count  In %
Sometimes             13   23%       21   49%        14   19%       48   27%
Regular                9   16%       10   23%        23   31%       42   24%
Often                 19   33%        4    9%        19   25%       42   24%
Very Often            16   28%        8   19%        19   25%       43   25%
Total                 57  100%       43  100%        75  100%      175  100%
Figure 7.13 Cross tabulation with column percentages
Chi-Square Tests
                               Value      df   Asymp. Sig. (2-sided)
Pearson Chi-Square             19,648 a    6   ,003
Likelihood Ratio               19,691      6   ,003
Linear-by-Linear Association     ,062      1   ,804
a. 0 cells (,0%) have expected count less than 5. The minimum expected count is 10,32.
Figure 7.14 Output of the chi-square crosstab test

Symmetric Measures
                                  Value   Approx. Sig.
Nominal by Nominal   Phi           ,335   ,003
                     Cramer's V    ,237   ,003
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
Figure 7.15 The value of Cramér's V
Performance of the Chi-Square Cross Tabulation Test

1. Formulate the hypotheses.
H0: In the population there is no relationship between age and the number of
visits to the swimming pool.
H1: In the population, there is a relationship between these variables.
2. The chi-square statistic equals 19,648 with 6 degrees of freedom. The corresponding
p-value (Asymp. Sig.) equals 0,3%, which is less than our value of alpha (5%).
3. We have to reject the null hypothesis and conclude that there is a relationship
between Age and the Number of visits to Aquariade.
4. This means that there is statistical evidence indicating that the age groups are
different with respect to the number of visits to the swimming pool.
5. The value of Cramér's V in this crosstab equals 0,237. That means that in our
sample the relationship is rather strong. The differences between the age groups
are substantial in our survey. When we compare the percentages it becomes
clear that younger people visit the swimming pool relatively often, but that
people in the age group 25-50 years are not visiting the swimming pool on a
frequent basis. This last conclusion is only valid for our survey, formally
speaking. A graphical representation of the percentages is shown in the band
diagram of Figure 7.16.
Figure 7.16 The band diagram displaying the differences between the age groups clearly

8. Testing for Differences between Groups
In many studies, researchers are interested in knowing how subgroups within a
sample differ on certain important issues. For example, a researcher who reports
from a survey for the beverage industry may want to know if the preference for a
new beverage concept differs depending on whether the respondent is or is not a
current consumer of the company's brand. Another survey for the retail industry was
about differences between customers who shop online or shop in person at a retail
outlet. In both these cases, the researcher wanted to know if significant differences
exist between two groups of shoppers. Sometimes there will be more than two
groups. A researcher in the political arena may want to know if opinions about a
new tax plan for car drivers differ depending on whether the respondent is a
Democratic, Labour or Liberal supporter. In this chapter, you will learn how to use
SPSS to test for significant statistical differences between two groups or among more
than two groups.
8.1 Introduction
The Aquariade case, about a swimming pool in a medium-sized city in the Netherlands,
also included a couple of research questions about the differences between groups of
visitors:
(8) Is there a significant difference between men and women in the number of visits
to Aquariade?
(9) Is there a significant difference between the three age groups in the number of
visits to Aquariade?
(10) Does the total opinion about Aquariade relate significantly to the gender or the
age of the visitor?
(11) Does the opinion towards the entrance fee have a significant influence on the
total number of visits that the customers paid in the last two months to
Aquariade?
(12) Is there a significant difference between customers who visit the sauna and
customers who do not visit the sauna in their rating of Aquariade?
All these questions relate to two variables. We already discussed in Chapter 4 how to
analyze this kind of question. It is important to recognize which variable is the
independent one, which defines the groups, and which one is the test variable or
dependent variable. This chapter will extend Chapter 4, which only described
potential differences in tables or charts. This chapter will answer the question whether
the differences are significant in the population by running statistical tests. That
means that for every difference we have in the sample, we will test the null hypothesis
(There are no differences in the population). If we have to reject the null hypothesis,
we can report that we have statistical evidence supporting the statement that in the
population there is a difference between males and females with respect to the number
of visits to the swimming pool on a monthly basis, if we relate it to the first research
question.
The test variable (dependent variable) can have any level of measurement as you might
have noticed from the research questions. This measurement level determines the
statistical test to be used in combination with the number of groups to be compared. It
is important to distinguish whether two groups or more than two groups are involved.
The next table summarizes this and indicates which test is to be used.

Measurement level      Number of groups
of the test variable   Two                               More than two
Scale                  t-test for two independent        One-way ANOVA
(Interval or Ratio)    groups (§ 8.2)                    (Analysis of Variance) (§ 8.4)
                       If the conditions of the t-test   If the conditions of ANOVA
                       are not fulfilled:                are not fulfilled:
                       Mann-Whitney test (§ 8.3)         Kruskal-Wallis test (§ 8.5)
Ordinal                Mann-Whitney test (§ 8.3)         Kruskal-Wallis test (§ 8.5)
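The decision table above can also be written down as a small lookup function. This is only an illustrative sketch in Python: the function name choose_test and the argument conditions_met are our own, and the returned strings simply repeat the test names from the table.

```python
def choose_test(level, n_groups, conditions_met=True):
    """Pick the test from the table, given the measurement level of the
    test variable and the number of groups to be compared."""
    level = level.lower()
    if n_groups < 2:
        raise ValueError("at least two groups are needed for a comparison")
    two_groups = n_groups == 2
    is_scale = level in ("scale", "interval", "ratio")
    if is_scale and conditions_met:
        return "t-test for two independent groups" if two_groups else "One-way ANOVA"
    if is_scale or level == "ordinal":
        # Rank-based alternatives: ordinal data, or scale data for which
        # the conditions of the t-test or ANOVA are not fulfilled.
        return "Mann-Whitney test" if two_groups else "Kruskal-Wallis test"
    raise ValueError(f"no test in the table for measurement level {level!r}")


print(choose_test("scale", 2))                        # research question (8)
print(choose_test("scale", 3, conditions_met=False))  # fallback for question (9)
```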
This chapter will discuss how to run the tests in SPSS. However, before testing whether
the differences with respect to a variable are significant, we strongly advise you to start
by creating a chart to get some insight into the potential differences. Each section will
begin by discussing a graphical display first.
8.2 Comparing Two Groups on a Scale Variable: t-Test
1. Open the data file Aquariade.sav in SPSS.
Please read the Aquariade case before you proceed.
We will start by analyzing research question (8): "Is there a significant difference
between men and women in the number of visits to Aquariade?" It is clear that this involves
two groups. The test variable (dependent variable) in the research question is the
variable Visits, representing the number of visits to the swimming pool in the last two
months. The independent variable (which defines the groups) is Gender.
In Section 4.2 we discussed how to create a bar chart displaying the mean values of
each group and in Section 4.3 how to compare groups by means of a boxplot.
2. Create a bar chart that displays the mean number of visits of males and females.
Customize your chart to the lay-out of Figure 8.1 (see Section 4.2 if you need
help).
Figure 8.1 Mean number of visits to the swimming pool

Figure 8.1 shows that in the past two months men made on average 11.9 visits and
women 11.3 visits. This is a small difference in our sample, but is there a difference
in the population?
8.2.1. Graphical Display of the Data

In order to answer this kind of question for the whole population we need to have a
chart which also displays the statistical uncertainty, that is, a chart which displays
confidence intervals for the mean value of both groups. This chart is known as an
Error Bar in SPSS. The calculation of confidence intervals can be found in any
statistics textbook (e.g. Berenson: Basic Business Statistics).
3. From the menus, choose Graphs > Chart Builder. On the tab Gallery select the
category Bar and double click the icon Simple Error Bar.
4. Drag the variable Gender into the X-axis box, and the variable Visits into the Y-
axis box.
5. Use the tab Titles/Footnotes to enter the title 95% Confidence Interval for the
mean number of visits. Do not forget to include a footnote with your name and
class at this very moment.
The result is displayed in the next figure.
6. Customize the chart into the display of Figure 8.2.
Figure 8.2 Confidence intervals displayed as Error Bars
If we take a look at these confidence intervals we see that there is a great overlap. So
we expect that the difference between men and women with respect to the number of
visits will not be significant. Of course, we will run the test to support this argument.

8.2.2. Running the Statistical Test
In the introduction we have discussed that the t-test can compare two groups if the
dependent variable has an interval or ratio level of measurement. First we will run the
test, after that, we will check the conditions.
The statistical test involves a statement about the parameters of the two populations,
i.e. the mean number of visits of men ($\mu_{\text{men}}$) and the mean number of visits of
women ($\mu_{\text{women}}$). Let us start by formulating the hypotheses.
H0: $\mu_{\text{men}} = \mu_{\text{women}}$
H1: $\mu_{\text{men}} \neq \mu_{\text{women}}$
7. From the menus, choose Analyze > Compare Means > Independent-Samples T
Test. The test variable is Visits and the grouping variable is Gender.
8. Define the two groups (Group 1: 1 = Men and Group 2: 2 = Women).
The output is displayed in Figure 8.3.

If the standard deviations in
both groups are equal, use
the first line.
Figure 8.3 SPSS output of the t-test
The first block displays statistics for both groups. Our research has a sample with 79
men and 96 women. For the male group, the mean number of visits equals 11.95, for
the females it equals 11.33. The standard deviations in the two groups are almost
equal. The statistic Std. Error of the Mean is calculated by dividing the standard
deviation by $\sqrt{n}$. If you multiply the standard error of the mean by the z-score 1.96
and subtract the result from, or add it to, the mean value, you will get a 95% confidence
interval for the mean value of each group. It must be clear to you that these intervals
have a major overlap.
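The calculation of the standard error and the confidence interval can be sketched in a few lines of Python. The group means and sample sizes are taken from the text above; the standard deviation of 5.4 is a hypothetical value chosen for illustration only, because the text does not report the standard deviations from the SPSS output.

```python
import math


def confidence_interval_95(mean, sd, n):
    """95% confidence interval for a mean, using the z-score 1.96."""
    standard_error = sd / math.sqrt(n)
    margin = 1.96 * standard_error
    return mean - margin, mean + margin


ASSUMED_SD = 5.4  # hypothetical: replace with the values from your own output

men = confidence_interval_95(11.95, ASSUMED_SD, 79)
women = confidence_interval_95(11.33, ASSUMED_SD, 96)

# Two intervals overlap when each one starts before the other one ends.
overlap = men[0] <= women[1] and women[0] <= men[1]

print(f"men:   {men[0]:.2f} to {men[1]:.2f}")
print(f"women: {women[0]:.2f} to {women[1]:.2f}")
print("intervals overlap:", overlap)
```

With these inputs the two intervals share most of their range, which is the same picture the error bars of Figure 8.2 give.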
The second block displays the result of the t-test. Please note that the two lines refer to
two different situations. If the variances (or standard deviations) can be assumed to be
equal, the first row applies. If they are different, you must use the second row.
Applying Levene's Test

By applying Levene's Test for Equality of Variances we can decide
which line to use. The output gives us a significance of 0.982, which
exceeds whatever value of alpha you might want to use. So we have no
reason to reject the hypothesis that the variances are equal. In other
words: we will assume that the variances (standard deviations) in both
groups are equal.
Conclusion: We must use the first line of the output (equal variances assumed)
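The idea behind Levene's test can be made concrete with a short sketch. It replaces every observation by its absolute deviation from its own group mean and then compares the group averages of these deviations with an F-type ratio. The Python code below is a minimal illustration under that assumption (the classic mean-based form of the test); the data are made up, and the significance, which requires the F distribution, is left to SPSS.

```python
def levene_statistic(*groups):
    """Levene's W: a one-way ANOVA F ratio on absolute deviations from group means."""
    deviations = []
    for group in groups:
        group_mean = sum(group) / len(group)
        deviations.append([abs(x - group_mean) for x in group])

    k = len(deviations)
    n_total = sum(len(z) for z in deviations)
    grand_mean = sum(sum(z) for z in deviations) / n_total
    z_means = [sum(z) / len(z) for z in deviations]

    between = sum(len(z) * (m - grand_mean) ** 2 for z, m in zip(deviations, z_means))
    within = sum((v - m) ** 2 for z, m in zip(deviations, z_means) for v in z)
    return (between / (k - 1)) / (within / (n_total - k))


# Hypothetical toy data: the second group is spread out twice as widely.
w = levene_statistic([1, 2, 3], [2, 4, 6])
print(f"Levene's W = {w:.3f}")
```

A value of W close to zero means the groups have a similar spread; large values point to unequal variances.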
Applying the t-Test

The first line displays the value of the test statistic t = 0,750, the
degrees of freedom df = 173 and the (2-tailed) significance level
(p-value) 45.5%. Since this p-value exceeds our alpha value (5%)
we fail to reject the null hypothesis.
Conclusion: We have no statistical evidence that, in the population, men and women
differ with respect to the number of visits to the swimming pool.
A way to say this is: The difference between men and women with respect to the mean
number of visits is not statistically significant.
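The two rows of the SPSS output correspond to two versions of the t statistic. The sketch below computes both from summary statistics in Python: the first function pools the variances (first row), the second is the unequal-variances modification usually attributed to Welch (second row). Means and sample sizes come from the text; the standard deviation of 5.4 in both groups is a hypothetical value, so the outcome only approximates the reported t = 0,750.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """t statistic and df when equal variances are assumed (first row)."""
    df = n1 + n2 - 2
    pooled_variance = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """t statistic and approximate df when equal variances are not assumed."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


ASSUMED_SD = 5.4  # hypothetical: not reported in the text

t_pooled, df_pooled = pooled_t(11.95, ASSUMED_SD, 79, 11.33, ASSUMED_SD, 96)
t_welch, df_welch = welch_t(11.95, ASSUMED_SD, 79, 11.33, ASSUMED_SD, 96)
print(f"equal variances assumed:     t = {t_pooled:.3f}, df = {df_pooled}")
print(f"equal variances not assumed: t = {t_welch:.3f}, df = {df_welch:.1f}")
```

When the two standard deviations are (almost) equal, both rows give nearly the same t value and only the degrees of freedom differ slightly.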
8.2.3. Checking the Conditions

When we apply the t-test as we described above, we assume that in both population
groups the distribution of the number of visits is a normal (bell shaped) distribution
with the same variance (or standard deviation). If the distributions in both groups
have the same variance, the t-test will be robust, that means that the test will be valid
even if the distribution deviates a little from the normal distribution, provided that the
sample size is large enough.

In the situation that the normal distributions in the two groups have a different
variance, we have to apply a modification of the t-test. The results of the modification
will be displayed on the second row of the SPSS output. So, SPSS will print both cases,
and it is up to the researcher to determine which row to use.
However, if we decide that the probability distributions in the two groups are
significantly different from the normal distribution, we have to use the method we will
discuss in the next section. This method will use rank numbers instead of the raw
scores.
In Section 8.6 we will discuss how to check whether the normal distribution fits the
sample distribution.
8.3 Comparing Two Groups on an Ordinal Variable: Mann-Whitney Rank Test
When we compare two groups and the probability distributions are clearly different
from the normal (bell shaped) distribution, we cannot apply the t-test of the previous
section. Also when the measurement level is only ordinal, or when we deal with rather
small samples, we have to advise against the application of the t-test. In these
situations we prefer the application of statistical tests based upon rank numbers of the
observations. These tests only use the ordinal character of the test variable and
provide a well suited alternative.
Since the Mann-Whitney rank test only uses the ordinal character of the test
variable, we are actually comparing the medians of the two groups instead of the mean
values. There are examples of test variables (like income) that have large outliers, so
the median is a better measure of central tendency than the mean in those situations.
These large outliers will raise the suspicion that the normal distribution will not fit,
which is another reason to apply a nonparametric test like the Mann-Whitney rank test.
If the ordinal test variable has only a limited number of categories (like a 5-point
scale ranging from very bad to very good), the ranking process is hard because of the
large number of ties. In that case we advise applying the chi-square crosstab test
and calculating the percentages to compare the groups.
We will discuss the Mann-Whitney rank test by answering the research question of the
previous section: "(8) Is there a significant difference between men and women in the
number of visits to Aquariade?" The application of the t-test as described in the
previous section is to be preferred because all conditions are met. We can apply the
Mann-Whitney rank test as well, although the power of this test is less than the power
of the t-test.
8.3.1. Graphical Display of the Data

We will create a boxplot to compare both groups.
1. Create (and customize) a boxplot to compare the two groups. In Section 4.3 you
can read the instructions.
Figure 8.4 A boxplot to compare the two groups.

In this figure we see that the box of the males is a little to the right, so in our sample,
men visit the swimming pool a little more often than women. But does this statement
hold for the entire population?
8.3.2. Applying the Statistical Test

The Mann-Whitney rank test compares the distributions of the number of visits of the
two groups. The procedure is based upon the rank numbers assigned to the
observations (arranging this from low to high) and compares the rank numbers
between the two groups. So, the Mann-Whitney rank test will be used for testing
whether there is a difference between two medians. That can be summarized with the
following null and alternative hypotheses:
H0: Median men = Median women
H1: Median men ≠ Median women
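How the rank numbers enter the test can be illustrated with a small sketch. All observations of both groups are ranked together (ties receive the average of the ranks they occupy), and the statistic U is derived from the rank sum of one group. This Python code is a minimal illustration with made-up data; the function names are our own, and the p-value is left to SPSS.

```python
from collections import Counter


def average_ranks(values):
    """Map each distinct value to its rank; tied values share the average rank."""
    first_position = {}
    for position, value in enumerate(sorted(values), start=1):
        first_position.setdefault(value, position)
    counts = Counter(values)
    return {v: first_position[v] + (counts[v] - 1) / 2 for v in counts}


def mann_whitney_u(group_a, group_b):
    """Return (U for group a, U for group b) based on the pooled rank numbers."""
    rank_of = average_ranks(list(group_a) + list(group_b))
    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(rank_of[x] for x in group_a)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b


# Hypothetical toy data with a tie on the value 2.
u_a, u_b = mann_whitney_u([1, 2, 2], [2, 3, 4])
print(f"U (group a) = {u_a}, U (group b) = {u_b}")
```

If the two groups have the same distribution, both U values will be close to half the product of the group sizes; a very small U for one group indicates that its observations mostly rank below those of the other group.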
The application of the Mann-Whitney rank test in SPSS is described in the next steps.
1. From the menus, choose Analyze > Nonparametric Tests > Independent
Samples.
Select this option to compare the medians
2. In the dialog Nonparametric Tests, on the first tab Objective, select the option
Compare medians across groups.
3. On the second tab Fields, select the option Use custom field assignments, move
the variable Visits into the Test Fields box and Gender into the Groups box.

Select this option to apply the Mann-Whitney test
Unselect this option. We will discuss this option in Section 8.5.
4. On the third tab Settings, select the option Mann-Whitney and unselect the Median
test, since the latter applies to more than two groups.
5. Click Run and SPSS will produce the results in a model. You will see the summary
in your output viewer.

The SPSS output is displayed in Figure 8.5. Note that SPSS tries to explain things most
clearly to you. It states the Null Hypothesis, it mentions the test and the p-value
(significance). Since the p-value exceeds the significance level (5%) the decision will
be: Retain the null hypothesis. So we have no statistical evidence that there is a
difference between the two groups, males and females, with respect to the number of
visits to the swimming pool.
Figure 8.5 SPSS output of the Mann-Whitney U test
6. Double click in this block of output to open the model viewer.
Figure 8.6 The model viewer of the Mann-Whitney U test
The output might be a little overwhelming. The chart tries to display two histograms to
compare the two groups. In our opinion the boxplot we made in the previous section is
a better way to compare the two groups. The bottom line of the model viewer allows
you to navigate to the results of other tests and other variables. Since we only specified
one test for these two variables the lists do not contain other entries.

Conclusion: Applying the Mann-Whitney U test (p = 0,332) leads to the same result:
we have no statistical evidence that men and women differ in how frequently they
visit the swimming pool.
8.4 Comparing More than Two Groups on a Scale Variable: Analysis of Variance
If you compare more than two groups on a scale (ratio) variable you will use a
procedure called analysis of variance, often abbreviated to ANOVA. However,
despite its name, it is about comparing means, not variances. The theoretical
background will explain the name analysis of variance.
We will discuss the procedure by answering research question (9): "Is there a significant
difference between the three age groups in the number of visits to Aquariade?" Our
sample has three age groups, young (up to the age of 25), middle (ages ranging from
25 to 50) and senior (ages exceeding 50). We will compare these three groups with
respect to the number of visits to the swimming pool during the last two months. Like
the two previous sections, we will start with a graphical display of the data and, after
that, we will apply the test.
8.4.1. Graphical Display of the Data

We start by creating an Error Bar chart in which a 95% confidence interval for the mean
of each group will be constructed.
1. From the menus, choose Graphs > Chart Builder. On the tab Gallery, choose the
option Bar and double click the icon Simple Error Bar. Use the instructions in
Section 8.2.1 to construct a 95% confidence interval for the mean number of Visits
for the three age groups.
2. Customize the graph to the lay-out of Figure 8.7.
Figure 8.7 Error Bar
This graph shows us a difference between the age groups. The interval of the middle
group lies entirely below the other two. People aged 25 to 50 do not come to the
swimming pool on such a frequent basis. Again we must ask the question whether this
holds for the
population as a whole. That is to be answered with the application of a statistical test.
8.4.2. Running the Statistical Test

The statistical test to compare the means of more than two groups is called analysis of
variance, also known as ANOVA. The hypotheses are:
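H0: The mean values of the three groups are equal, i.e. $\mu_1 = \mu_2 = \mu_3$
H1: Not all mean values are equal.
3. From the menus, choose Analyze > Compare Means > One-Way ANOVA. Move the
variable Visits into the Dependent List and the variable Age into the Factor box.
SPSS will produce an ANOVA table as displayed.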
From this table we will use the significance. This value equals 0.6% and is less than
our alpha value of 5%. We will reject our null hypothesis and we can state that we have
statistical evidence that there is a difference between the groups. The three mean
values of the number of visits to the swimming pool are not equal.
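The ANOVA table is built from two sums of squares: the variation between the group means and the variation within the groups. Their ratio, after dividing each by its degrees of freedom, is the F statistic, which explains why a procedure that compares means is called analysis of variance. The Python sketch below shows the arithmetic on made-up data; the function name is our own, and the significance of F is left to SPSS.

```python
def one_way_anova_f(*groups):
    """Return (F, df between, df within) for a one-way analysis of variance."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical toy data for three groups.
f_value, df_between, df_within = one_way_anova_f([1, 2, 3], [2, 3, 4], [5, 6, 7])
print(f"F({df_between}, {df_within}) = {f_value:.2f}")
```

A large F means that the group means differ more than the spread within the groups would lead you to expect if the null hypothesis were true.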
8.4.3. Post Hoc Comparison of the Groups

If we have statistical evidence supporting the claim that there are differences between
the groups, it makes sense to see which groups are different. We will compare the
mean values of the groups pairwise, only after finding a significant result in the ANOVA
table. Since this is done afterwards, it is called a post hoc test.
4. Recall the dialog One-Way ANOVA.
5. Click on the button Post Hoc.
Equal variances or not? (See the third condition of ANOVA)

The dialog One-Way ANOVA: Post Hoc Multiple Comparisons offers a lot of tests. An
important thing to know is, whether the variances within the subgroups are equal or
not. At this very moment, let us assume that they all are equal. That implies we can use
the tests in the upper pane and we will choose the test developed by John Tukey. The
next section will discuss how to check the assumption that the variances are equal.
In this multiple comparison the differences between the middle group and the other
two are significant, because both p-values (0.5% and 3.1%) are less than the alpha
value of 5%.
Conclusion: When we compare the age group 25 to 50 years with the other two
groups, we can conclude that this group does not visit the swimming pool as frequently
as the other two groups. Since these differences are significant, our statement holds for
the population as a whole.
8.4.4. ANOVA Assumptions

When you apply an analysis of variance, it is important to check the conditions. If the
conditions are not met, you run the risk of coming to a conclusion that turns out to be
invalid. We will discuss the three conditions in this section. One of them, the fit of a
normal distribution to all subgroups, can be a problem in practice. The nonparametric
alternative is known as the Kruskal-Wallis test. Just as the Mann-Whitney rank test
(see Section 8.3) for two groups, this test is also based on rank numbers.
1. Randomness and Independence

This first assumption is critically important. The validity of any experiment depends
on random sampling and /or the randomisation process. To avoid biases in the
outcomes, you need to select random samples from the subpopulations, e.g. work with
a stratified sampling process. This ensures that observations in a group are
independent of any other value in the sample. Related to the age groups in our case,
you must prevent (grand)parents and their (grand)children from both participating,
because they always swim together. In such a situation the results of the
ANOVA will not be valid. Another departure from this assumption is the case where males
and females are compared and the sample contains a lot of married couples. It is
obvious that you do not have independent observations.
Another point of attention is that the grouping variable is the independent variable
and the test variable the dependent one. To illustrate this: you cannot compare the
mean age of frequent swimmers with the mean age of infrequent swimmers and jump to the
conclusion that the former group is younger than the latter group. It is clear that age
can have its influence on the number of visits, but not the other way around: more
visits will not decrease the age. It might, however, be a good slogan for Aquariade:
"Swimming keeps the age away."
2. Normality
The second assumption states that the sample values in each group are from a
normally distributed population. Just as in the case of the t test, the one-way ANOVA F
test is fairly robust against departures from the normal distribution. As long as the
distributions are not extremely different from a normal distribution, the level of
significance of the ANOVA F test is usually not greatly affected, particularly for large
samples. In Section 8.6 we will discuss how you can assess the normality of each
subgroup.
3. Homogeneity of Variance
The third assumption states that the population variances of the groups are equal (i.e.
$\sigma_1^2 = \sigma_2^2 = \sigma_3^2$). If you have equal sample sizes in each group, inferences based on the
F distribution are not seriously affected by unequal variances. However, if you have
unequal sample sizes, then unequal variances can have a serious effect on inferences
developed from the ANOVA procedure. Thus, when possible, you should have equal
sample sizes in all groups.
A method to test whether all the variances of the populations are equal is Levene's
test. We will test the null hypothesis:
H0: $\sigma_1^2 = \sigma_2^2 = \sigma_3^2$
against the alternative that not all variances are equal.
We will discuss Levene's test in SPSS right now (see also Section 8.2.2).
6. From the menus, choose Analyze > Compare Means > One-Way ANOVA to
recall the dialog One-Way ANOVA.
7. Click the Options button and check the option Homogeneity of variance test.
Check the assumption of equal variances with Levene's test.
If you want to have the statistics for the subgroups you can check the option
Descriptives. A plot of the means is available also.
The output of the Homogeneity of variance test is rather straightforward.
If we compare the significance, 42.9%, with our alpha value (5%), it is clear that we
have no reason to reject the hypothesis that the variances are equal. Our choice in
Section 8.4.3 to take the Equal Variances Assumed side for the post hoc test turns out
to be justified.
8.5 Comparing More than Two Groups on an Ordinal
Variable: Kruskal-Wallis Rank Test
If you want to compare more than two groups and think (or fear) that there is a
departure from the assumptions of ANOVA, or the level of measurement of the test
variable is ordinal, SPSS offers you nonparametric test procedures. Since these
procedures use rank numbers (just as the Mann-Whitney test) there is no need for
assumptions about the probability distributions involved.
Again, we must warn you when you use these methods for a test variable having an
ordinal level of measurement with only a few classes. A rather large sample size
will lead to many ties and we prefer to use the chi-square test for a crosstabulation (see
Section 7.2).
In this section we will discuss the Kruskal-Wallis test to answer the same research
question as we dealt with in the previous section: Is there a significant difference
between the three age groups in the number of visits to Aquariade?
8.5.1. Graphical Display of the Data

We will create a boxplot to compare the three groups.
1. Create (and customize) a boxplot to compare the three age groups. In Section 4.3
you can read the instructions.
Figure 8.8 A boxplot to compare the three groups.
In this figure, we see that the median of the middle group is far less than the medians
of the other two groups. So we might conclude that the number of visits of the 25 to 50
years group drops back. But does this statement hold for the entire population?
8.5.2. Applying the Statistical Test

The Kruskal-Wallis test will compare the distributions of the number of visits between
the three age groups. The procedure is based on rank numbers assigned to the
observations in an arrangement from low to high. Since the mean rank numbers of the
three groups are compared, the procedure actually tests whether the medians of
the three populations are equal. In our case where we compare the three age groups,
the hypotheses we test are:
H0: Median group 1 = Median group 2 = Median group 3
And the alternative hypothesis is that not all three medians are equal.
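The Kruskal-Wallis statistic H can be computed by hand from the rank sums of the groups. The Python sketch below ranks all observations together, applies the usual correction for ties, and returns H with its degrees of freedom. It is a minimal illustration with made-up data and our own function names; for three groups (2 degrees of freedom) the chi-square approximation of the p-value has the simple closed form exp(-H/2), which is used in the example.

```python
import math
from collections import Counter


def kruskal_wallis_h(*groups):
    """Return (H corrected for ties, degrees of freedom)."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)

    # Rank all observations together; tied values share the average rank.
    first_position = {}
    for position, value in enumerate(sorted(pooled), start=1):
        first_position.setdefault(value, position)
    counts = Counter(pooled)
    rank_of = {v: first_position[v] + (counts[v] - 1) / 2 for v in counts}

    rank_term = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)

    tie_correction = 1 - sum(t**3 - t for t in counts.values()) / (n**3 - n)
    return h / tie_correction, len(groups) - 1


# Hypothetical toy data: three groups that do not overlap at all.
h_value, df = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
p_value = math.exp(-h_value / 2)  # chi-square upper tail, valid for df = 2 only
print(f"H = {h_value:.2f}, df = {df}, p = {p_value:.4f}")
```

With very small groups, as in this toy example, the chi-square approximation is rough; it is shown only to make the link between H and the reported significance visible.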
2. From the menus, choose Analyze > Nonparametric Tests > Independent
Samples.
3. In the dialog Nonparametric Tests: Two or More Independent Samples check
the three tabs.
On the tab Objective: Compare medians across groups.
On the tab Fields: Test Field: Visits and Groups: Age.
On the tab Settings: Kruskal-Wallis 1-way ANOVA (k samples)

4. Run the test.
The SPSS output viewer shows you the results.
Figure 8.9 SPSS output of the Kruskal-Wallis test

Figure 8.10 The model viewer of the Kruskal-Wallis test
The Hypothesis Test Summary gives you the conclusion: since the p-value (0.9%) is
less than our alpha value (5%) we have to reject the null hypothesis, so we have
statistical evidence that the number of visits to the swimming pool is not equal for
the three age groups.
The post hoc comparison can be found in the model viewer as well.
Select Pairwise comparisons to see the differences between the groups.
Figure 8.11 Changing the view in the model viewer
5. In the model viewer, change the view to Pairwise Comparisons.
Figure 8.12 The model viewer showing Pairwise comparisons
The triangle displays significant differences with a yellow line. The table also displays
significant differences with a yellow background. This supports our conclusion that the
age group 25 -< 50 years makes significantly fewer visits to the swimming pool than
the other two age groups.
8.6 Assessing the Normality

The t-test and the ANOVA are both based on the assumption that a normal (bell shaped)
distribution fits to the distribution of the test variable. In this section we will discuss
how to assess this normality assumption. We start by splitting the data file in order to
get separate analyses for the groups. With that setting we will run the
Kolmogorov-Smirnov test.
1. From the menus, choose Data > Split File.
Select the option Compare groups and move the variable Age into the box
labelled Groups Based On. Apply with OK.

2. From the menus, choose Analyze > Nonparametric Tests > One Sample.
The dialog One-Sample Nonparametric Tests contains three tabs, Objective, Fields
and Settings.
3. On the tab Objective select: Customize Analysis.

On the tab Fields move the variable Visits into the Test Fields list.
On the tab Settings select: Customize tests and check the third option Test
observed distribution against hypothesized (Kolmogorov-Smirnov test)
4. Run the tests.

Since we invoked SPLIT FILE, SPSS will produce separate calculations, for each age
group.
Age group = < 25 years
Age group = 25 < 50 years
Age group = >= 50 years

Figure 8.13 SPSS output of the Kolmogorov-Smirnov test
In none of the three groups does the normal distribution fit the sample distribution,
so we have a departure from that important assumption of ANOVA, meaning that the
analysis of Section 8.4.2 is invalid. So research question (9) should be answered by the
Kruskal-Wallis test (see Section 8.5).
5. Finally, do not forget to switch off the SPLIT FILE by selecting the option Analyze
all cases, do not create groups.
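The Kolmogorov-Smirnov test measures the largest vertical distance D between the cumulative distribution of the sample and that of a normal distribution. The Python sketch below computes only this distance, fitting the normal distribution with the sample mean and standard deviation (an assumption on our part about how the hypothesized distribution is specified). The data are made up, and the significance reported by SPSS is not reproduced here.

```python
from statistics import NormalDist, mean, stdev


def ks_distance_to_normal(sample):
    """Largest gap between the sample's cumulative distribution and the
    normal distribution with the same mean and standard deviation."""
    xs = sorted(sample)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    distance = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = fitted.cdf(x)
        # The empirical distribution jumps from (i - 1)/n to i/n at x.
        distance = max(distance, i / n - cdf, cdf - (i - 1) / n)
    return distance


# Hypothetical data: one sample that follows the normal curve closely,
# and one that is strongly skewed to the right.
near_normal = [NormalDist().inv_cdf((i - 0.5) / 50) for i in range(1, 51)]
skewed = [2**i for i in range(10)]

d_normal = ks_distance_to_normal(near_normal)
d_skewed = ks_distance_to_normal(skewed)
print(f"D for the near-normal sample: {d_normal:.3f}")
print(f"D for the skewed sample:      {d_skewed:.3f}")
```

The larger D is, the worse the normal distribution fits; a sample with a long right tail, as is common for counts such as the number of visits, produces a clearly larger distance.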
8.7 Lab (with Answers)

Research question (10) is about the total opinion about Aquariade expressed in a
mark. Answer the question whether this mark relates significantly to the gender or the
age of the visitor. You can use the methods we discussed in this chapter.
8.7.1. Elaboration of the t-Test
Note that in this situation we face the problem that the variances cannot be assumed
to be equal since the significance of Levene's Test equals 0.4%. So we have to use the
bottom row of the SPSS output.
Conclusion: If we compare the mean mark between men and women the difference
is not significant at an alpha level of 5% (p-value = 7.5%).

8.7.2. Elaboration of the Mann-Whitney Test
Conclusion: If we compare men and women with respect to the mark, we cannot see
any significant difference (p = 10.7%).
8.7.3. Elaboration of ANOVA

Conclusion: The error bar makes clear that the young group gives higher marks for
the swimming pool than the other two groups. Running the ANOVA test gives a
significant result (p = 0.2%). However, there is a serious departure from the
assumptions. Checking the homogeneity of the variances with Levene's test gives a
significant result (p = 0.5%). We also assessed (to be complete) the normality of the
distribution of the mark within the groups with the Kolmogorov-Smirnov test. From
the SPSS output it will be clear that this assumption does not hold either. So we will run
the Kruskal-Wallis test for a valid analysis.
Age group = < 25 years
Age group = 25 < 50 years
Age group = >= 50 years
Figure 8.14 SPSS output of the Kolmogorov-Smirnov test
8.7.4. Elaboration of the Kruskal-Wallis Test
Conclusion: The Kruskal-Wallis test also gives a significant result, so our conclusion
that the rating of the swimming pool is different among the age groups is valid.

9. Appendix
9.1 Adjustment of SPSS Settings

SPSS offers a number of options which can be altered to your wishes. In order to do so,
choose from the menus Edit > Options.
Change the measurement system into centimetres.
Setting of the variable lists in the dialogs.
We only mention a few settings explicitly.
On the tab General:
(1) In the variable lists of the dialogs we prefer to have the names of the variables,
so change this into Display names.
(2) The order of the variables, leave this to File.
(3) Change the measurement system into centimetres if you do not want to use
inches.

When you use SPSS with the standard settings, each command you process will be
mentioned in the output viewer as a log item. On the tab Viewer, you can suppress this
by unchecking the option Display commands in the log.
Unchecking this option saves you a lot of space in the output viewer

When you create new variables in SPSS, the display format is set to two decimal places.
On the tab Data you can change this default setting. We prefer to use the default of
zero decimal places for our new variables.
Change this into 0 decimal places

After creating your own table look, you can save this look from within the output
viewer. To apply your table look to every new pivot table change the setting
TableLook on the Pivot Tables tab:

On the tab File Locations you can specify the folder where you want to store your data
files and output files. This will save you a lot of time searching for your files. You can
also choose to use the last folder used. Moreover, you can see that SPSS stores all the
commands executed in a session journal, a text file on your drive. If you want to
re-run all commands of a session, this knowledge can save you a lot of time and effort.

9.2 SPSS Distribution of Saxion
From the Saxion site you can download a Saxion distribution of SPSS.
1. Open your browser and go to
http://notebook.saxion.nl/index.php//software/saxion-software/saxion
or visit the MIM site at Links for students.
2. Click on the link to SPSS 18 and use your Saxion credentials to log in.
3. If your login was successful, the download will start. It is a 249 MB zip file, so your
download might take a minute (or two).
This zip file contains a virtual application. After unzipping you will see one or more
files; these should stay together. The application can be started by double-clicking
the executable file PASW-Statistics-18 (the filename ends with .exe).
A subdirectory named Thinstall is also created. Do not remove this
directory, because application-specific information is stored here (it is a virtual
registry and may contain virtual system files). If you empty this directory, all the
configuration changes you made will be lost.
You are allowed to use this application if you belong to Saxion (as a student or staff
member). If you have any questions, mail notebook@saxion.nl.
4. Unzip the distribution to any folder on your computer. It will take 450 MB of disk
space.
5. Find the file PASW-Statistics-18 and double-click it to launch SPSS. Be patient: it can
take a couple of minutes before SPSS starts.
6. You might encounter a warning from the firewall. If you allow SPSS to run on your
computer, this will happen only the first time.
Note (1) Create a shortcut to the file PASW-Statistics-18 on your desktop
to start SPSS in a convenient way.
(2) You cannot open SPSS files by double-clicking them. You must
start SPSS first; after that, you can open files by using the SPSS
menu File > Open > Data, or File > Open > Output.
9.3 How to Create and Customise a Chart in EXCEL

1. Start Excel and check whether you have got an empty workbook ready to use.
2. Create a table or copy one from SPSS into EXCEL.
3. Select the range A2:E8 and, from the menus, choose Insert > (Graphs:) Bar. From
the category 2D-bar, take the third option: 100% stacked bar.

This will lead to the chart displayed here below.
[Chart: 100% stacked bar chart. Vertical axis, from top to bottom: Social security, Public transport, Shops, Road and traffic safety, Parking place, Green place. Series: Good, Sufficient, Insufficient, Bad. Horizontal axis: 0% to 100%.]
We want to update the layout to get the result of Figure 6.7. We will use a couple of
Excel menus which are available only if you have the graph selected, so make sure
it is selected.
We will start by adding a title to the graph.
4. From the menus, choose Layout > (Labels:) > Chart Title.
5. Choose the option Above chart to get a title at the top of the chart.
The next thing to fix is to arrange the categories on the vertical axis in their natural order,
with the first at the top of the axis.
6. From the menus, choose Layout > (Axes:) > Axes > Primary Vertical Axis > More
Primary Vertical Axis Options. In the dialog that opens, choose these settings:
(1) Select the option that puts our first category at the top.
(2) You can suppress the tick marks by selecting none.
(3) Select the option that places the horizontal axis at the bottom of the chart.
The last aspect we will adjust is the gap between the bars. We want to have this reduced.
7. Click with the right mouse button on one of the bars and select the option
Format Data Series from the context menu.
8. Reduce the Gap Width to 40%.

These steps will lead you to a customized layout as is displayed in Figure 9.1.
[Chart: 100% stacked bar chart titled "Evaluation of the environment". Vertical axis, from top to bottom: Green place, Parking place, Road and traffic safety, Shops, Public transport, Social security. Series: Good, Sufficient, Insufficient, Bad. Horizontal axis: 0% to 100% in steps of 10%.]
Figure 9.1 The customized EXCEL chart
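
A 100% stacked bar rescales every row of the table so that its segments add up to 100%. The sketch below shows that calculation in Python, as an illustration of what Excel does behind the chart. The counts are hypothetical (they are not the survey results) and the function name to_percent_shares is our own.

```python
def to_percent_shares(counts_by_category):
    """Convert each row of counts to percentages that add up to 100."""
    shares = {}
    for category, counts in counts_by_category.items():
        total = sum(counts)
        if total == 0:
            raise ValueError(f"no responses for {category!r}")
        shares[category] = [100.0 * c / total for c in counts]
    return shares


ratings = ["Good", "Sufficient", "Insufficient", "Bad"]

# Hypothetical counts per rating; NOT the data behind Figure 9.1.
counts = {
    "Green place": [30, 40, 20, 10],
    "Parking place": [10, 25, 40, 25],
    "Shops": [45, 35, 15, 5],
}

for category, row in to_percent_shares(counts).items():
    cells = ", ".join(f"{name} {value:.0f}%" for name, value in zip(ratings, row))
    print(f"{category}: {cells}")
```

Because every bar has the same length, the chart compares the distribution of the ratings across the categories, not the number of respondents.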

10. Bibliography
Books
Berenson, M.L. & D.M. Levine & T.C. Krehbiel (2008),
Basic Business Statistics, concepts and applications (11th edition)
New Jersey: Pearson Prentice Hall, ISBN 0135009367.
Burns, A.C. & R.F. Bush (2009),
Marketing Research (6th edition).
Groebner, D.F. & P.W. Shannon & P.C. Fry & K.D. Smith (2005),
Business Statistics (6th edition)
New Jersey: Pearson Prentice Hall.
Kotler, P. & G. Armstrong,
Principles of Marketing.
McClave, J.T. & P.G. Benson & T. Sincich (2005),
Statistics for business and economics (9th edition)
New Jersey: Pearson Prentice Hall.
Saunders, M. & P. Lewis & A. Thornhill (2004),
Research Methods for Business Students (4th edition)
New Jersey: Pearson Prentice Hall, ISBN-13: 978-0-27370-148-4.
Smits, J. & R.G. Edens (2009),
Onderzoek met SPSS en Excel (2nd edition)
Amsterdam: Pearson Education, ISBN 9043017272 (in Dutch).
Internet References
www.prenhall.com/burnsbush
http://wps.pearsoned.co.uk/ema_uk_he_saunders_resmethbus_4
www.spss.com
www.surfspot.nl
http://notebook.saxion.nl/index.php//software/saxion-software/saxion

11. Glossary
Analysis of variance (ANOVA) A method of analysis used when dealing with a
continuous or interval dependent variable and one or more categorical or nominal
independent variables
Arithmetic mean The sum of all observations divided by the number of observations.
Also known as the mean.
Branching See skip pattern
Categorical variable A variable for which numbers are simply identifiers and do not
have mathematical properties, such as order. For example, the sales territory in
which a company's customer lives (Central, North, South) is a categorical variable.
Also called a nominal variable.
Census An accounting of an entire population, as opposed to a survey of a sample of
that population
Chi-square A statistic often used in crosstabulations to test the hypothesis that the row
and column variables are independent; that is, whether the observed distribution is
likely due to chance
Closed-ended question A question for which response categories are provided
Coding scheme A method for assigning a code (usually in the form of a number) to
responses to a question. For example, if you are researching customers' opinions
of a certain product feature, you might devise a coding scheme to identify a
positive opinion with a 1, a negative one with a 2, and a neutral one with a 3.
Coding schemes are also used to turn open-ended text responses into data that
can be analyzed.
Commitment card A card that asks respondents to commit to participating in a survey
Confidence band or confidence interval A specified range around a survey result for
which there is a high statistical probability that it includes the value that would be
calculated from the whole population (if that were possible). Such confidence
intervals are commonly calculated for confidence levels of 0.95 or 0.99.
Continuous variable A variable whose response options have an implied order and
distance and for which one unit represents the same quantity throughout the
scale. For example, age in years or weight in pounds or kilograms. Also called an
interval variable.
Crosstabulation A table that shows the relationship between two or more variables by
presenting all combinations of categories of variables
Error bar chart A chart that plots the confidence intervals, standard errors, or standard
deviations of individual variables
Exact tests Tests that calculate the probabilities exactly, rather than by using
estimates, to determine if there is a relationship between variables. Exact tests are
necessary when you have small datasets, small subgroups, or unbalanced
distributions.
Factor analysis An analytic technique that groups quantitative variables according to
their degree of correlation
Focus group A moderated group discussion about a particular topic. The discussion
typically lasts about two hours and is led by a moderator who follows a topic guide
but does not use a fixed questionnaire.
Frequencies A table showing what number or percentage of respondents gave each
answer to a question
Histogram A bar chart in which continuous variables are shown in groups
Imputation A methodical process for making an assumption about the value of missing
data. For example, if certain demographic information is missing from a
respondent's questionnaire, a model can be built comparing information that is
available to data provided by other respondents. The model would then assign a
likely value to the missing data.
Interval variable See continuous variable
Level of measurement The way in which a question may be answered. There are four
levels of measurement: nominal, ordinal, interval, and ratio. (See separate entries
for descriptions.)
Mean See arithmetic mean
Median A measure of central tendency for continuous or ordinal data, defined for
ungrouped data as the middle value when data are arranged in order of magnitude
Missing data Incomplete or invalid data. Data can be missing for a number of reasons:
for example, questions left unanswered, marked incorrectly, or marked "Don't
know". Missing data are usually excluded when calculating percentages. However,
sometimes missing values can be assigned using imputation.
Mode The value of a variable that occurs more frequently than any other value
Nominal variable See categorical variable
Nonparametric tests Statistical tests that require either no assumptions or very few
assumptions about a population's distribution
Non-response rate The proportion of a sample population that did not respond to a
survey
Open-ended question A question for which no response list is provided. Respondents
are expected to supply a response in their own words.
Ordinal variable A variable whose response options have an implied order but no
implied distance. For example, a scale that ranges from "strongly agree" to
"strongly disagree".
Pilot study The administration of a questionnaire under field conditions to a small
sample in order to time it and/or uncover problems. Also called a pretest.
Population The totality of things or people that you wish to study
Pre-notification card A card alerting prospective respondents that a survey will arrive
Pretest See pilot study
Purposive sampling A sampling procedure in which each element of the population is
purposely selected for some characteristic or characteristics of interest
Questionnaire A set of questions designed to generate data necessary to accomplish
the objectives of the research project
Random digit dialing The technique of dialing random numbers in working telephone
exchanges so that people with unlisted phone numbers are not excluded from a
sample population
Random sampling A sampling procedure that selects population elements based on
chance. This makes it likely, though not certain, that the sample represents the
population.
Ratio variables Variables that have order among points, equal distances between
adjacent points, and an absolute zero
Regression An estimation of the linear relationship between a dependent variable and
one or more independent variables
Response rate The proportion of a sample population that responded to a survey
Sample A subset of a population from which information is collected in order to obtain
information and draw conclusions about the total population
Scatterplot A graph of data points based on two continuous variables. One variable
defines the horizontal axis and the other variable defines the vertical axis.
Simple random sampling (SRS) A sampling procedure by which population members
are selected directly from the sampling frame. This results in there being an equal
probability of selection for all population members that appear in the frame.
Skewed A distribution whose frequency curve is not symmetrical about its mean, having
one tail longer than the other
Skip pattern A method of questionnaire design that enables respondents to skip
questions, based on their response to a previous question. Also called branching.
Strata (Plural of stratum) In sampling, groups defined by certain characteristics (See
stratified sampling.)
Stratified sampling A sampling procedure in which respondents are separated into
subgroups (strata) according to characteristics of interest, and samples are drawn from
each subgroup. Income level, race, and business title are examples of
characteristics that might be used to create a stratified sample.
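Systematic sampling A random sampling method in which every k-th element of the
sampling frame is selected after a random start. In practice it is often treated as
equivalent to a simple random sample.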
Survey The process of collecting information about a topic or issue by means of
sampling and interviewing selected individuals
t test A hypothesis test that uses the t statistic to determine whether or not two means
are equal in the population
Weighting Assigning a numerical coefficient to an item to express its relative
importance in a frequency distribution
White space On a printed page, an area that contains no text or graphics
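
The sampling entries in this glossary (simple random sampling, systematic sampling and stratified sampling) can be contrasted with a short Python sketch. It is only an illustration: the sampling frame of 40 respondents is invented, the function names are our own, and the stratified version draws the same number of elements from every stratum, which is only one of several possible allocation rules.

```python
import random


def simple_random_sample(frame, n, rng):
    """Draw n elements so that every element of the frame is equally likely."""
    return rng.sample(frame, n)


def systematic_sample(frame, n, rng):
    """Pick a random start, then take every step-th element (requires n <= len(frame))."""
    step = len(frame) // n
    start = rng.randrange(step)
    return [frame[start + i * step] for i in range(n)]


def stratified_sample(frame, stratum_of, n_per_stratum, rng):
    """Split the frame into strata and draw a simple random sample from each one."""
    strata = {}
    for element in frame:
        strata.setdefault(stratum_of(element), []).append(element)
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}


# Invented sampling frame: 40 respondents, every fourth one a lecturer.
respondents = [(i, "lecturer" if i % 4 == 0 else "student") for i in range(1, 41)]
rng = random.Random(2011)  # fixed seed so that the example is repeatable

print(simple_random_sample(respondents, 5, rng))
print(systematic_sample(respondents, 5, rng))
print(stratified_sample(respondents, lambda r: r[1], 3, rng))
```

Note that the stratified sample guarantees three lecturers, whereas a simple random sample of the same size could contain none.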

