Questionnaire

A questionnaire is a method for the elicitation, recording, and collecting of information. The four
key words in this definition (method, elicitation, recording, collecting) summarize the essence of what
questionnaires are about. I could give a 50-minute lecture explaining this definition with examples
and anecdotes, but the notes below summarize the gist of it.

• Method: This means that a questionnaire is a tool to be used rather than an end in
itself or a work of modern art. Before you even start thinking of using a questionnaire, a
useful question to ask yourself is: 'what do I need to know and how best can I find this
out?' Some kinds of information are not very reliably gathered using questionnaires (e.g.
how often people do things, or self-reports about aspects of life where status is involved).
It is also very useful at the start to ask yourself 'how will I summarize the
information I am seeking to give me a true picture of what I want to know?'
• Elicitation: A questionnaire may bring out information from the respondent, or it
may start the respondent thinking or even doing some work on their own in order to
supply the requested information. In any case, a questionnaire is a device that starts off a
process of discovery in the respondent's mind.
• Recording: The answers the respondent gives are somehow recorded onto a
permanent medium which can be replayed and brought into analysis: usually by writing,
but also possibly by recording voice or video.
• Collecting: People who use questionnaires are collectors. Given the amount of
effort involved in creating a questionnaire, if you only ever needed to use it for one
respondent, chances are you'd find some more efficient method of getting the
information. However, unless you intend to leave piles of questionnaires moldering in
your filing cabinet, you must also consider what you are going to do with the information
you have amassed. Questionnaires are made up of items to which the user supplies
answers or reactions.

Answering a questionnaire focuses the respondent's mind on a particular topic and, almost by
definition, on a certain way of approaching the topic. We try hard to avoid bias when we
construct questionnaires; when a respondent has to react to very tightly focused questions (so-
called closed-ended questionnaires), bias is a real problem. When a respondent has to react to a
looser set of questions (so-called open-ended), bias is still there, but it's most probably more
deeply hidden.

Different kinds of questions

There are three basic types of questions:

Factual-type questions
Such questions ask about public, observable information that it would be tedious or inconvenient
to get any other way. For instance, the number of years a respondent has been working with
computers, or what kind of education the respondent received. Or how many times the
computer broke down in a two-hour session, or how quickly a user completed a certain task. If
you are going to include such questions you should spend time and effort to ensure that the
information you are collecting is accurate, or at least to determine the amount of bias in the
answers you are getting.
Opinion-type questions
These ask the respondent what they think about something or someone. There's no right or
wrong answer; all we have to do is give the strength of our feeling: do we like it or not, or which
do we prefer? Will we vote for Mr. A or Mr. B? An opinion survey does not concern itself with
subtleties of thought in the respondent; it is concerned with finding out how popular someone or
something is. Opinion questions direct the thought of the respondent outwards, towards people or
artifacts in the world out there. Responses to opinion questions can be checked against the actual
behavior of people, usually in retrospect ('Wow! It turned out that those soft, flexible keyboards
were a lot less popular than we imagined they would be!')

Attitude questions
Attitude questions focus the respondent's attention inwards, on their internal response
to events and situations in their lives. There are a lot of questionnaires consisting of attitude
questions about experiences with Information Technology, the Internet, multimedia, and so
on. These tend to be of interest to the student of social science. Of more use to the HCI
practitioner are questionnaires that ask the respondent what their attitudes are to working with a
particular product the respondents have had some experience of. These are generally called
satisfaction questionnaires.

In our research, we have found that users' attitudes to working with a particular computer system
can be divided up into attitudes concerning:

• The user's feeling of being efficient
• The degree to which the user likes the system
• How helpful the user feels the system is
• The extent to which the user feels in control of the interactions
• The extent to which the user feels they can learn more about the system by using it

We can't directly cross-check attitude results against behaviors in the way we can with factual
and opinion type questions. However, we can check whether attitude results are internally
consistent, and this is an important consideration when developing attitude questionnaires.

The advantages of using questionnaires in usability research

• The biggest single advantage is that a usability questionnaire gives you feedback
from the point of view of the user. If the questionnaire is reliable, and you have used it
according to the instructions, then this feedback is a trustworthy sample of what you
(will) get from your whole user population.
• Another big advantage is that measures gained from a questionnaire are, to a large
extent, independent of the system, users, or tasks to which the questionnaire was applied.
You could therefore compare
o the perceived usability of a word processor with that of an electronic mailing
system,
o the ease of use of a database as seen by a novice and by an expert user,
o the ease of doing graphs with the ease of doing statistical computations on a
spreadsheet.
• Additional advantages are that questionnaires are usually quick, and therefore cost-
effective, to administer and to score, and that you can gather a lot of data by using
questionnaires as surveys. And of course, questionnaire data can be used as a reliable
basis for comparison or for demonstrating that quantitative targets in usability have been
met.

Disadvantages

• The biggest single disadvantage is that a questionnaire tells you only the user's
reaction as the user perceives the situation. Thus some kinds of questions, for instance those to
do with time measurement or frequency of event occurrence, are not usually reliably
answered in questionnaires. On the whole it is useful to distinguish between subjective
measures (which are what questionnaires are good for) and performance measures (which
are publicly observable facts and are more reliably gathered using direct event and time
recording techniques).
• There is an additional, smaller disadvantage. A questionnaire is usually designed
to fit a number of different situations (because of the costs involved). Thus a
questionnaire cannot tell you in detail what is going right or wrong with the application
you are testing. But a well-designed questionnaire can get you near to the issues, and an
open-ended questionnaire can be designed to deliver specific information if properly
worded.
• Those who have worked with questionnaires for a long time in industry will also
be aware of the seductive power of the printed number. Getting hard, quantitative data
about user attitudes or opinions is good, but this is not the whole story. If the aim of the
investigation is to analyze the overall usability of a piece of software, then the subjective
data must be enhanced with performance, mental effort, and effectiveness data. In
addition, one should also ask: why? This means talking to the users and observing them.

How do questionnaires fit in with other HCI evaluation methods?

The ISO 9241 standard, part 11, defines usability in terms of effectiveness, efficiency, and
satisfaction. If you are going to do a usability laboratory type of study, then you will most
probably be recording user behavior on video, or at least timing and counting events such as
errors. This is known as performance or efficiency analysis.

You will also most probably be assessing the quality of the outputs that the end user
generates with the aid of the system you are evaluating. Though harder to do, and more
subjective, this is known as effectiveness analysis.

But these two together don't add up to a complete picture of usability. You want to know
what the user feels about the way they interacted with the software. In many situations,
this may be the single most important item arising from an evaluation! Enter the user
satisfaction questionnaire.

It is important to remember that these three items (effectiveness, efficiency, and
satisfaction) don't always give the same answers: a system may be effective and efficient
to use, but users may hate it. Or the other way round.

Questionnaires of a factual variety are also used very frequently in evaluation work to
keep track of data about users such as their age, experience, and their expectations
about the system that will be evaluated.

Reliability

The reliability of a questionnaire is the ability of the questionnaire to give the same results when
filled out by like-minded people in similar circumstances. Reliability is usually expressed on a
numerical scale from zero (very unreliable) to one (extremely reliable).

Validity

The validity of a questionnaire is the degree to which the questionnaire is actually measuring or
collecting data about what you think it should be measuring or collecting data about. Note that
not only do opinion surveys have validity issues; factual questionnaires may have very serious
validity issues if, for instance, respondents interpret the questions in different ways.

What's wrong with putting a quick-and-dirty questionnaire together?

The problem with a quick-and-dirty questionnaire is that you usually have no notion of how
reliable or valid the questionnaire is. You may be lucky and have developed a very good
questionnaire; you may be unlucky. However, until you put your questionnaire through the
intensive statistical and methodological procedure involved in creating a questionnaire, you just
won't know.

A poor questionnaire will be insensitive to differences between versions of software,
releases, etc., and will not show significant differences. You are then left in a quandary:
does the questionnaire fail to show differences because they do not actually exist, or is it
simply because your questionnaire is insensitive and unreliable? If your questionnaire
does show differences, is this because it is biased, or is it because one version is actually
better?

The crux of the matter is: you can't tell unless the questionnaire has been through the
standard development and test process.

Factual-type questionnaires are easy to do, though, aren't they?

A factual, or 'survey', questionnaire is one that asks for relatively straightforward information and
does not need personal interpretation to answer. Answers to factual questions can be proven right
or wrong. An opinion-based questionnaire is one that asks the respondent what they think of
something. An answer to an opinion question cannot be proven right or wrong: it is simply the
opinion of the respondent and is inaccessible to independent verification.

Although it is important to check that the respondents understand the questions of both
kinds of questionnaires clearly, the burden of checking is much greater with opinion style
questionnaires because we cannot sanity check the answers against reality.

What's the difference between a questionnaire which gives you numbers and one that
gives you free text comments?

A closed-ended questionnaire is one that leaves no room for individual comments from the
respondent. The respondent replies to a set of questions in terms of pre-set responses for each
question. These responses can then be coded as numbers. An open-ended questionnaire requests
the respondent to reply to the questions in their own words, maybe even to suggest topics to
which replies may be given. The ultimate open-ended questionnaire is a 'critical incident' type of
questionnaire in which respondents explain several good or bad experiences, the
circumstances which led up to them, and what happened afterwards, all in their own words.
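
As a minimal sketch of the coding step (in Python; the anchor labels and the 1-to-5 coding are
invented for illustration, not taken from any particular questionnaire):

    # Map pre-set responses to numeric codes. The anchor wording and the
    # 1-5 coding are hypothetical, purely for illustration.
    CODES = {
        "strongly disagree": 1,
        "disagree": 2,
        "undecided": 3,
        "agree": 4,
        "strongly agree": 5,
    }

    def code_response(answer: str) -> int:
        """Turn one pre-set response into its numeric code."""
        return CODES[answer.strip().lower()]

    # One respondent's answers to a three-item closed-ended questionnaire.
    answers = ["Agree", "Strongly agree", "Undecided"]
    print([code_response(a) for a in answers])  # -> [4, 5, 3]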

• Closed-ended questionnaires are good if you are going to be processing massive
quantities of data, or if your questionnaire is appropriately scaled to yield meaningful
numeric data. If you are using a closed-ended questionnaire, however, encourage the
respondents to leave their comments either in a special space provided on the page, or in
the margins. You'll be surprised what this gives you.
• Open-ended questionnaires are good if you are in an exploratory phase of your
research, or you are looking for some very specific comments or answers that can't be
summarized in a numeric code.

Can you mix factual and opinion questions, closed- and open-ended questions?

It doesn't do to be too purist about this. It's a good idea to mix some open-ended questions into a
closed-ended opinion questionnaire, and it's also not a bad thing to have some factual questions at
the start of an opinion questionnaire to find out who the respondents are, what they do, and so on.
Some of your factual questions may need to be open-ended, for instance if you are asking
respondents for the name of the hardware they are using.

This also means you can construct your own questionnaire booklets by putting together a
reliable opinion questionnaire, for instance, and then add some factual questions at the
front and maybe some open ended opinion questions at the end.

How do you analyze open-ended questionnaires?

The standard method is called 'content analysis' and is a subject all of its own. Content analysis
usually lets you boil down responses into categories, and then you can count the frequency of
occurrence of different categories of response.
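
As a minimal sketch of the counting step (in Python), assuming the real work of assigning each
free-text response to a category has already been done by a human coder, and with invented
category names:

    from collections import Counter

    # Each free-text response has already been coded into a category.
    coded_responses = [
        "praise", "navigation problem", "praise", "speed complaint",
        "navigation problem", "navigation problem", "feature request",
    ]

    # Count the frequency of occurrence of each category of response.
    for category, count in Counter(coded_responses).most_common():
        print(category, count)
    # -> navigation problem 3, praise 2, speed complaint 1, feature request 1
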
What is a Likert-style questionnaire?

A Likert-style questionnaire is one in which you have been able to prove that each
item of the questionnaire has a similar psychological 'weight' in the respondent's mind, and that
each item is making a statement about the same construct. Likert scaling is quite tricky to get
right, but when you do have it right, you are able to sum the scores on the individual items to
yield a questionnaire score that you can interpret as differentiating between shades of opinion
from 'completely against' to 'completely for' the construct you are measuring.

It is possible to find questionnaires which seem to display Likert-style properties in
which many of the items are simply re-wordings of other items. Such questionnaires may
show some fantastic reliability data, but basically they're a cheat, because you're just
adding in extra items that bulk up the statistics without telling you anything really new.

And of course there are plenty of questionnaires around which are masquerading as
Likert-style questionnaires but which have never had their items tested for any of the
required Likert properties. Summing item scores of such questionnaires is just nonsense.
Treat such questionnaires as checklists (see below) until you are able to do some
psychometric validation on them.

How can I tell if a question belongs to a Likert scale or not?

The essence of a Likert scale is that the scale items, like a shoal of tropical fish, are all of
approximately the same size, and are going in the same direction.

People who design Likert scales are concerned with developing a batch of items that all
have approximately the same level of importance (size) to the respondent, and are all
more or less talking about the same concept (direction), the concept that the scale is trying
to measure. Designers use various statistical criteria to quantify these two ideas.

To start with, we have to get a bunch of people to fill out the first draft of the
questionnaire we are trying to design. We should ideally have about 100 respondents with
varied views on the topic we are trying to measure, and certainly more respondents than
questions. We then compute various statistical summaries of this data.

Do the items all have the same level of importance to the respondent? To measure this we
look at the reliability coefficient of the questionnaire. If the reliability coefficient is low
(near to zero) this means that some of the items may be more important to the
respondents than others. If the reliability coefficient is high (near to one) then the items
are most probably all of the same psychological 'size.'
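
The text does not name a particular reliability coefficient; Cronbach's alpha is the one most
often used for this job, so here is a minimal sketch of it in Python, on invented toy data (rows
are respondents, columns are items):

    def cronbach_alpha(scores):
        """Cronbach's alpha: rows are respondents, columns are items."""
        k = len(scores[0])  # number of items
        def variance(xs):
            m = sum(xs) / len(xs)
            return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
        item_vars = [variance([row[i] for row in scores]) for i in range(k)]
        total_var = variance([sum(row) for row in scores])
        return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

    # Toy sample: 5 respondents x 4 items (real work needs ~100 respondents).
    sample = [
        [4, 5, 4, 4],
        [2, 2, 3, 2],
        [5, 4, 5, 5],
        [3, 3, 3, 4],
        [1, 2, 1, 2],
    ]
    print(round(cronbach_alpha(sample), 2))  # -> 0.96 on this toy data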

Are the items all more or less talking about the same concept? To measure this we look at
the statistical correlation between each item and the sum of the rest of the items. This is
sometimes called the item-whole correlation. Items which don't correlate well are clearly
not part of the scale (going in a different 'direction') and should be thrown out or
amended.
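
A sketch of the item-whole (more precisely, item-rest) correlation in Python, on the same kind of
invented toy data; an item whose correlation is low or negative is a candidate for removal or
rewording:

    def pearson(xs, ys):
        """Pearson correlation between two equal-length lists."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = sum((x - mx) ** 2 for x in xs) ** 0.5
        sy = sum((y - my) ** 2 for y in ys) ** 0.5
        return cov / (sx * sy)

    def item_whole(scores, item):
        """Correlate one item with the sum of the remaining items."""
        item_scores = [row[item] for row in scores]
        rest_sums = [sum(row) - row[item] for row in scores]
        return pearson(item_scores, rest_sums)

    sample = [  # toy data: 5 respondents x 4 items
        [4, 5, 4, 4],
        [2, 2, 3, 2],
        [5, 4, 5, 5],
        [3, 3, 3, 4],
        [1, 2, 1, 2],
    ]
    for i in range(len(sample[0])):
        print(i, round(item_whole(sample, i), 2))
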
It's fascinating to use an interactive statistical package and to watch how reliabilities and
item-whole correlations change as you take items in and out of the questionnaire.

A very real risk a developer runs when constructing a scale is that they start to 'model the
data.' That is, they take items in and out and they compute their statistics, but their
conclusions are only applicable to the sample that evaluated the questionnaire. What the
developer must do next is to try the new questionnaire on a fresh sample, and re-compute
all the above statistics again. If the statistics hold on the fresh sample, then good. If not,
then it's back to the drawing board.

Warning: one sometimes sees some very good-looking statistics reported on the basis of
analysis of the original sample, without any check on a fresh sample. Take these with a
large pinch of salt. The statistics will most probably be a lot less impressive when re-
sampled.

In general, in answer to the question 'is this a real Likert scale or not?', the onus is on the
person who created the scale to tell you to what extent the above criteria have been met.
If you are not getting this level of reassurance from the scale designer, then it really is a
fishy business. A scale item which may work very nicely in one questionnaire may be
totally out of place in another.

How many response options should there be in a numeric questionnaire?

There are two sets of issues here. One is whether to have an odd or even number of response
options. The general answer is that, if there is a possibility of having a 'neutral'
response to a set of questions, then you should have an odd number of response options, with the
central point being the neutral place. On the other hand, if it is a question of whether something is
good/bad, male/female (bi-polar), then basically you are looking at two response options. You
may wish to assess the strength of the polarity; you are then actually asking two questions in one:
firstly, is it good or bad, and secondly, is it really very good or very bad. This leads you to an
even number of response options.

Some people use even numbers of response options to 'force' the respondents to go one
way or another. What happens in practice is that respondents end up giving random
responses between the two middle items.

The other set of issues is how wide the range of response options should be: a scale of 1 to 3, 1 to
5, or even 1 to 12? The usual answer is that it depends on how accurately the
majority of respondents can distinguish between flavours of meaning in the questions. If you
suspect that the majority of respondents are going to be fairly uninformed about the topic,
then stick with a small number of response options. If you are going to be dealing with
experts, then you can use a much larger set of response options.

A sure way of telling if you are using too many response options is to listen to the
respondents talking after they have done the questionnaire. When people have to
differentiate between fine shades of meaning that may be beyond their ability, they will
complain that the questionnaire was 'long' and 'hard.'

How many anchors should a questionnaire have?

The little verbal comments above the numbers ('strongly agree', etc.) are what we call anchors. In
survey work, where the questions are factual, it is considered a good idea to have anchors above
all the response options, and this will give you accurate results. In opinion or attitude work, you
are asking a respondent to express their position on a scale of feeling from strong agreement to
strong disagreement, for instance. Although it would be helpful to indicate the central (neutral)
point if it is meaningful to do so, having numerous anchors may not be so important. Indeed,
some questionnaires on attitudes have been proposed with a continuous line and two end anchors
for each statement. The respondent has to place a mark on the line indicating the amount of
agreement or disagreement they wish to express. Such methods are still relatively new.

A related question is whether to include a 'no answer' option for each item. This depends on
what kind of questionnaire you are developing. A factual-style questionnaire should most
probably not have a 'no answer' option unless issues of privacy are involved. If, in an
opinion questionnaire, many of your respondents complain about items 'not being
applicable' to the situation, you should consider carefully whether these items should be
changed or re-worded.

In general, I tend to distrust 'not applicable' boxes in questionnaires. If the item is really
not applicable, it shouldn't be there in the first place. If it is applicable, then you are
simply cutting down on the amount of data you are going to get. But this is a personal
opinion.

My respondents are continually complaining about my questionnaire items. What can I do?

People always complain. It's a fact of life. And everybody thinks of themselves as a
'questionnaire expert.' If you get the odd grumble from your respondents, this usually means that
the person doing the grumbling has something extra they want to tell you, beyond the questionnaire.
So listen to them.

If you get a lot of grumbles, this may mean that you have badly miscalculated and it's
time to go back to the drawing board. When you listen to people complaining about a
questionnaire, listen carefully: are they unhappy about what the questionnaire is
attempting to measure, or are they unhappy about the wordings of some of your items?

What other kinds of questionnaires are there?

You mean, what other kinds of techniques can you employ to construct a questionnaire? There are two
other main varieties:
1. Semantic differential type questionnaires, in which the user is asked to say where
their opinion lies between two anchor points which have been shown to represent some
kind of polar opposition in the respondent's mind.
2. Guttman scaling type questionnaires, which are a collection of statements that
gradually get more extreme; you calculate at which statement the respondent begins to
answer negatively rather than positively (sketched below).

Of the two, semantic differential scales are more frequently encountered in practice, although
they are not used as much as Likert scales, and professionals seem to have relegated Thurstone
and Guttman scaling techniques to the research area.
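
As a minimal sketch of the Guttman scoring idea in Python (the statements and answers are
invented; a real scale needs its item ordering established empirically):

    # Guttman-style scoring: statements are ordered from mildest to most
    # extreme; the respondent's scale position is the point at which their
    # answers switch from 'yes' to 'no'.
    statements = [  # hypothetical items, mild to extreme
        "The system is usable for simple tasks.",
        "The system is pleasant for everyday work.",
        "I would recommend the system to colleagues.",
        "I would refuse to work with any other system.",
    ]

    def guttman_position(answers):
        """Index of the first 'no'; len(answers) if the respondent accepts all."""
        for i, yes in enumerate(answers):
            if not yes:
                return i
        return len(answers)

    # This respondent endorses the first two statements only.
    print(guttman_position([True, True, False, False]))  # -> 2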

Should favorable responses always be checked on the left (or right) hand side of
the scale?

Usually no. The reason for not constructing a questionnaire in this manner is that response bias can
come into play. A respondent can simply check off all the 'agrees' without having to consider each
statement carefully, so you have no guarantee that they've actually responded to your statements -- they
could be working on 'auto-pilot'. Of course, such questionnaires will also produce fairly impressive
statistical reliabilities, but again, that could be a cheat.
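
One common defence, sketched here in Python under the assumption of a 1-to-5 scale and an
invented keying list, is to word some items negatively and reverse their scores before analysis;
an 'auto-pilot' respondent then produces a visibly contradictory pattern:

    SCALE_MAX = 5
    # Which items are positively worded; items 2 and 4 are negatively
    # worded here. The keying is hypothetical.
    positively_keyed = [True, False, True, False]

    def rescore(raw):
        """Reverse-score negative items so a high number is always favorable."""
        return [r if pos else SCALE_MAX + 1 - r
                for r, pos in zip(raw, positively_keyed)]

    # A respondent who ticks 'agree' (4) everywhere without reading:
    print(rescore([4, 4, 4, 4]))  # -> [4, 2, 4, 2]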

Is a long questionnaire better than a short one? How short can a questionnaire be?

You have to ensure that you have enough statements to cover the most common shades of opinion
about the construct being rated. But this has to be balanced against the need for conciseness: you can
produce a long questionnaire that has fantastic reliabilities and validities when tested under controlled
conditions with well-motivated respondents, but ordinary respondents may just switch off and respond at
random after a while. In general, because of statistical artifacts, long questionnaires will tend to produce
good reliabilities with well-motivated respondents, while shorter questionnaires will produce less
impressive reliabilities but may be a better test of overall opinion in practice.

A questionnaire should not be judged by its statistical reliability alone. Because of the nature of
statistics, especially the so-called law of large numbers, we will find that what was only a trend
with a small sample becomes statistically significant with a large sample. Statistical 'significance'
is a technical term with a precise mathematical meaning. Significance in the everyday sense of
the word is a much broader concept.