Copyright 1995, PAREonline.net.

Permission is granted to distribute this article for nonprofit, educational purposes if it is copied in
its entirety and the journal is credited. Please notify the editor if an article is to be used in a
newsletter.
Kehoe, Jerard (1995). Writing multiple-choice test items. Practical Assessment, Research &
Evaluation, 4(9). Retrieved January 4, 2010 from http://PAREonline.net/getvn.asp?v=4&n=9 .

Writing Multiple-Choice Test Items


Jerard Kehoe
Virginia Polytechnic Institute and State University

A notable concern of many teachers is that they frequently have the task of constructing tests but have relatively little training or information to rely on in this task. The objective of this article is to set out some conventional wisdom for the construction of multiple-choice tests, which are one of the most common forms of teacher-constructed tests. The comments which follow are applicable mainly to multiple-choice tests covering fairly broad topic areas.

Before proceeding, it will be useful to establish our terms for discussing multiple-choice items. The stem is the introductory question or incomplete statement at the beginning of each item, and it is followed by the options. The options consist of the answer -- the correct option -- and the distractors -- the incorrect but (we hope) tempting options.

General Objectives

As a rule, one is concerned with writing stems that are clear and
parsimonious, answers that are unequivocal and chosen by the
students who do best on the test, and distractors that are plausible
competitors of the answer as evidenced by the frequency with which
they are chosen. Lastly, and probably most important, we should
adopt the attitude that items need to be developed over time in the
light of evidence that can be obtained from the statistical output
typically provided by a measurement services office (where tests are
machine-scored) and from "expert" editorial review.

Planning

The primary objective in planning a test is to outline the actual course content that the test will cover. A convenient way of accomplishing this is to take 10 minutes following each class to list on an index card the important concepts covered in class and in assigned reading for that day. These cards can then be used later as a source of test items. An even more conscientious approach, of course, would be to construct the test items themselves after each class. The advantage of either of these approaches is that the resulting test is likely to be a better representation of course activity than if the test were constructed before the course or after the course, when we usually have only a fond memory or optimistic syllabus to draw from. When we are satisfied that we have an accurate description of the content areas, then all that remains is to construct items that represent specific content areas. In developing good multiple-choice items, three tasks need to be considered: writing stems, writing options, and ongoing item development. The first two are discussed in this article.
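To make the index-card bookkeeping concrete, here is a minimal sketch, with an invented course outline and an invented allocate_items helper, of how the concepts recorded after each class might be tallied and used to spread test items across content areas in proportion to coverage:

```python
# A minimal sketch of the "index card" content outline: after each class,
# record the concepts covered; later, allocate test items in proportion to
# how often each content area appeared. All course data below is invented.
from collections import Counter

content_outline = {
    "Week 1, Tue": ["measurement scales", "reliability"],
    "Week 1, Thu": ["reliability", "validity"],
    "Week 2, Tue": ["item difficulty", "item discrimination"],
    "Week 2, Thu": ["item discrimination", "distractor analysis"],
}

def allocate_items(outline, total_items):
    """Distribute test items across concepts in proportion to coverage.

    Rounded allocations may not sum exactly to total_items; adjust by hand.
    """
    counts = Counter(concept for concepts in outline.values() for concept in concepts)
    total_mentions = sum(counts.values())
    return {concept: round(total_items * n / total_mentions)
            for concept, n in counts.items()}

print(allocate_items(content_outline, total_items=20))
```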

Writing Stems

We will first describe some basic rules for the construction of multiple-choice stems, because they are typically, though not necessarily, written before the options.

1. Before writing the stem, identify the one point to be tested by that item. In general, the stem should not pose more than one problem, although the solution to that problem may require more than one step.

2. Construct the stem to be either an incomplete statement or a direct question, avoiding stereotyped phraseology, as rote responses are usually based on verbal stereotypes. For example, the following stems (with answers in parentheses) illustrate undesirable phraseology:

What is the biological theory of recapitulation? (Ontogeny repeats phylogeny)

Who was the chief spokesman for the "American System"? (Henry Clay)

Correctly answering these questions likely depends less on understanding than on recognizing familiar phraseology.

3. Avoid including nonfunctional words that do not contribute to the basis for choosing among the options. Often an introductory statement is included to enhance the appropriateness or significance of an item but does not affect the meaning of the problem in the item. Generally, such superfluous phrases should be excluded. For example, consider:

The American flag has three colors. One of them is (1) red (2) green
(3) black
versus
One of the colors of the American flag is (1) red (2) green (3) black

In particular, irrelevant material should not be used to make the answer less obvious. This tends to place too much importance on reading comprehension as a determiner of the correct option.

4. Include as much information in the stem and as little in the options as possible. For example, if the point of an item is to associate a term with its definition, the preferred format would be to present the definition in the stem and several terms as options rather than to present the term in the stem and several definitions as options.

5. Restrict the use of negatives in the stem. Negatives in the stem usually require that the answer be a false statement. Because students are likely in the habit of searching for true statements, this may introduce an unwanted bias.

6. Avoid irrelevant clues to the correct option. Grammatical construction, for example, may lead students to reject options which are grammatically incorrect as the stem is stated. Perhaps more common and subtle, though, is the problem of common elements in the stem and in the answer. Consider the following item:

What led to the formation of the States' Rights Party?
a. The level of federal taxation
b. The demand of states for the right to make their own laws
c. The industrialization of the South
d. The corruption of federal legislators on the issue of state taxation

One does not need to know U.S. history in order to be attracted to the answer, b.

Other rules that we might list are generally commonsensical, including recommendations for independent and important items and prohibitions against complex, imprecise wording.

Writing Options

Following the construction of the item stem, the likely more difficult
task of generating options presents itself. The rules we list below are
not likely to simplify this task as much as they are intended to guide
our creative efforts.

1. Be satisfied with three or four well-constructed options. Generally, the minimal improvement to the item due to that hard-to-come-by fifth option is not worth the effort to construct it. Indeed, all else the same, a test of 10 items each with four options is likely a better test than a test of nine items with five options each.

2. Construct distractors that are comparable in length, complexity and grammatical form to the answer, avoiding the use of such words as "always," "never," and "all." Adherence to this rule avoids some of the more common sources of biased cueing. For example, we sometimes find ourselves increasing the length and specificity of the answer (relative to the distractors) in order to ensure its truthfulness. This, however, becomes an easy-to-spot clue for the testwise student. Related to this issue is the question of whether or not test writers should take advantage of these types of cues to construct more tempting distractors. Surely not! The number of students choosing a distractor should depend only on deficits in the content area which the item targets and should not depend on cue biases or reading comprehension differences in "favor" of the distractor.

3. Options which read "none of the above," "both a. and e. above," "all of the above," etc., should be avoided when the students have been instructed to choose "the best answer," which implies that the options vary in degree of correctness. On the other hand, "none of the above" is acceptable if the question is factual and is probably desirable if computation yields the answer. "All of the above" is never desirable, as one recognized distractor eliminates it and two recognized answers identify it.

4. After the options are written, vary the location of the answer on as
random a basis as possible. A convenient method is to flip two (or
three) coins at a time where each possible Head-Tail combination is
associated with a particular location for the answer. Furthermore, if
the test writer is conscientious enough to randomize the answer
locations, students should be informed that the locations are
randomized. (Testwise students know that for some instructors the
first option is rarely the answer.)
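For instructors who would rather not flip coins, a short script can do the same job. The sketch below, which assumes Python's standard random module and an invented shuffle_options helper, randomizes the position of the answer among the options; the item content is drawn from the flag example above, with one extra invented distractor.

```python
# A minimal sketch of randomizing the position of the correct answer.
# The helper name and the fourth distractor are invented for illustration.
import random

def shuffle_options(answer, distractors, rng=random):
    """Return the options in random order plus the letter of the answer."""
    options = [answer] + list(distractors)
    rng.shuffle(options)
    letter = "abcd"[options.index(answer)]
    return options, letter

options, key = shuffle_options(
    answer="red",
    distractors=["green", "black", "purple"],
)
for label, text in zip("abcd", options):
    print(f"{label}. {text}")
print("Answer:", key)
```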

5. If possible, have a colleague with expertise in the content area of the exam review the items for possible ambiguities, redundancies or other structural difficulties. Having completed the items, we are typically so relieved that we may be tempted to regard the task as completed and each item in its final and permanent form. Yet another source of item and test improvement is available to us, namely, statistical analyses of student responses.
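As a preview of the kind of statistical output a measurement services office typically returns, here is a minimal sketch that computes two common indices from a small 0/1 response matrix: item difficulty (proportion correct) and a simple upper-minus-lower discrimination index. The response matrix and the item_statistics helper are invented for illustration.

```python
# A minimal sketch of common item statistics: difficulty (proportion correct)
# and a simple discrimination index (upper-group minus lower-group difficulty).
# The 0/1 response matrix below (rows = students, columns = items) is invented.
scores = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]

def item_statistics(matrix):
    """Return per-item difficulty and upper-lower discrimination."""
    totals = [sum(row) for row in matrix]
    order = sorted(range(len(matrix)), key=lambda i: totals[i])
    half = len(matrix) // 2
    lower, upper = order[:half], order[-half:]
    stats = []
    for j in range(len(matrix[0])):
        difficulty = sum(row[j] for row in matrix) / len(matrix)
        p_upper = sum(matrix[i][j] for i in upper) / len(upper)
        p_lower = sum(matrix[i][j] for i in lower) / len(lower)
        stats.append({"item": j + 1,
                      "difficulty": round(difficulty, 2),
                      "discrimination": round(p_upper - p_lower, 2)})
    return stats

for row in item_statistics(scores):
    print(row)
```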

This article was adapted with permission from Testing Memo 4: Constructing Multiple-Choice Tests -- Part I, Office of Measurement and Research Services, Virginia Polytechnic Institute and State University, Blacksburg, VA 24060.

Further Reading

Airasian, P. (1994). Classroom Assessment (2nd ed.). NY: McGraw-Hill.

Cangelosi, J. (1990). Designing Tests for Evaluating Student Achievement. NY: Addison Wesley.

Gronlund, N. (1993). How to make achievement tests and assessments (5th ed.). NY: Allyn and Bacon.

Haladyna, T. M., & Downing, S. M. (1989). Validity of a taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 51-78.

The Language Teacher

Assessment and the Language Teacher: Trends and Transitions

Geoff Brindley
National Centre for English Language Teaching and Research, Macquarie University, Australia


In recent years, assessment policies and practices in language teaching programs worldwide have been changing in a number of ways, both at system and classroom level. In line with political and economic pressure to demonstrate 'efficiency and effectiveness', educational institutions are increasingly being called on to provide much more explicit information on program activities and outcomes. In some countries, this has resulted in a proliferation of standardized tests of various kinds. In others, educational authorities have introduced outcomes-based approaches which use teacher-conducted assessments as a basis for reporting learners' progress and achievement against system-wide or national standards (Brindley, 1997). Teachers are thus finding themselves in the position of having to develop tools and procedures for monitoring, recording, and assessing learners' progress and achievement in the classroom on a more systematic and formal basis.

At this stage, however, relatively little is known about how teachers are
dealing with these new demands. What I want to do in this paper, therefore, is
to consider some of the effects of the changing assessment landscape on
language teachers' role.

Assessment skills required by teachers

In order to carry out formal assessments which can meet minimum standards of validity and reliability, teachers need a wide range of skills. These include:

Observing, interpreting, and documenting learners' use of language
Designing classroom tests and assessment tasks
Analysing test results
Providing diagnostic feedback to learners
Evaluating the quality of tests and assessment tasks
Evaluating the quality of learners' language performances according to rating scales
Writing evaluative reports for program administrators

However, is it realistic to expect that teachers should possess these skills, given that assessment could by no means be considered a 'core' component of language teacher training courses? In this regard, Doherty, Mangubhai, and Shearer (1996), discussing the introduction of a new national assessment and reporting framework for adult literacy in Australia, comment:

Respondents who had a moderate level of assessment training were not convinced that their training equipped them for their assessment duties as delineated in explicit curricula.

Time demands of increased assessment

One of the most frequent findings which has emerged from accounts of the development of outcomes-based approaches is that any kind of assessment and reporting system which is reliant on teacher-conducted assessment and observation is extremely time-consuming. A commonly voiced concern is that constant assessment can erode teaching time. Cumming (1997), describing Grade 9 teachers' use of language standards in Ontario, identifies these sorts of pressures, typified by the following comment from a teacher:

There are so many tests coming from on high that I have no idea what they are. We are doing so many tests and all we find is that the patients are very sick. But what do we do, how do we get the patient well? To do that, we have to have time to teach.

Echoing this theme, Barrs (1992) reports that one of the main problems in
implementing the detailed observational recording system used with the
Primary Language Record in the United Kingdom was the sheer amount of
time necessary to document many student performances on an ongoing basis:
". . . it does seem to be the case that it takes a full school year to 'learn the
forms', to internalise the ways of observing that they encapsulate and to see
the full value of this kind of recording" (p. 56).

What can be done?

Since teachers are the people who are responsible for implementing assessment "on the ground", it is important to ensure that they have the opportunity to acquire the skills they need to conduct high quality assessments through appropriately targeted professional development. Some of their needs can be addressed by enrollment in formal degree courses or through attendance at in-service workshops. However, this theoretical knowledge needs to be supplemented by on-the-job experience in developing and using assessment tools.

There are also a variety of other ways in which teachers can enhance their
assessment skills. These include moderation sessions which involve teachers
coming together on a regular basis to discuss performance standards or
criteria, using samples of students' work. Such sessions provide an opportunity
for teachers to become familiar with typical or "benchmark" performances
representing different levels of ability, thus helping them to improve the
consistency of their judgements. At the same time, the close focus on features
of language learning and use which accompanies the discussion of learner
performances serves a valuable professional development function.
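One simple way to see whether moderation sessions are in fact improving the consistency of judgements is to compare two teachers' ratings of the same work samples. The sketch below, with invented ratings and an invented exact_agreement helper, computes the exact-agreement rate; it is offered only as an illustration, not as a prescribed moderation procedure.

```python
# A minimal sketch of checking rating consistency after a moderation session:
# exact-agreement rate between two teachers rating the same work samples.
# The rating values below are invented for illustration.
teacher_a = [3, 4, 2, 5, 3, 4, 2, 3]
teacher_b = [3, 4, 3, 5, 3, 4, 2, 2]

def exact_agreement(ratings_a, ratings_b):
    """Proportion of samples given the same level by both raters."""
    matches = sum(1 for a, b in zip(ratings_a, ratings_b) if a == b)
    return matches / len(ratings_a)

print(f"Exact agreement: {exact_agreement(teacher_a, teacher_b):.0%}")
```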

Another way of developing assessment expertise and at the same time improving the quality of standardized tests is through collaborative test development projects in which practitioners and professional testers work together (Shohamy, 1992). The involvement of teachers in developing specifications, item writing and trialling can help to ensure that test content is consonant with current teaching practices, thus increasing the likelihood that the test will have beneficial washback on teaching.

Providing institutional support

If teachers are to assume greater responsibility for assessment, they require sufficient time and resources to do the job properly. In some cases, however, policy makers and program administrators may need to be convinced of this, since they may not be aware of how time-consuming assessment can be, particularly when it involves the construction of formal tests. One way to demonstrate the impact of increased assessment duties on teachers' day-to-day work is to pilot the new tests or assessments over a reasonable length of time, documenting the kinds of assessment-related tasks performed by teachers and how long they take. Gunn (1995), for example, did this and found that carrying out systematic criterion-referenced assessment in an adult ESL class took up over 20% of class time! Once it becomes clear what demands a new form of assessment makes on teachers' time, an argument can be made, if necessary, for a corresponding reduction in teaching loads or the injection of extra resources.
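A pilot of this kind can be documented with very little machinery. The sketch below, with an invented task log and invented figures, simply totals the assessment-related minutes and expresses them as a share of class time, in the spirit of the Gunn (1995) finding; the categories and durations are assumptions, not data from that study.

```python
# A minimal sketch of documenting assessment-related time during a pilot.
# All task categories, durations, and class totals below are invented.
assessment_log_minutes = {
    "designing tasks": 90,
    "administering assessments": 120,
    "marking and recording": 150,
    "writing reports": 60,
}
total_class_minutes = 20 * 90  # e.g. 20 lessons of 90 minutes over the pilot

assessment_total = sum(assessment_log_minutes.values())
share = assessment_total / total_class_minutes
print(f"Assessment-related time: {assessment_total} min "
      f"({share:.0%} of class time)")
```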

When considering the issue of institutional support, it is important to bear in mind that the introduction of a new assessment policy or practice is no different from introducing a new curriculum or textbook: it is an exercise in educational change which requires careful planning and ongoing management (Fullan and Stiegelbauer, 1991). If the proposed change is clearly going to have major effects on people's work practices, it may be necessary to designate a person with specific responsibilities for assessment who can help people at all levels of the system work through the implications of the change. These responsibilities might include, among other things, communication with management and teachers, identification of teacher training needs, conduct of moderation sessions, co-ordination of test development and the collection of assessment information for reporting purposes.

Conclusion
In this paper I have tried to sketch out some of the consequences that
an expanded assessment role would have for language teachers.
Experience indicates that they are prepared to undertake such a role if
they perceive clear benefits for learners (Shohamy, 1992). However,
given the level of assessment training with which many teachers enter
the profession, it would be unrealistic to expect that this could happen
without a considerable investment of resources both in terms of
professional development and institutional support. As more and more
educational systems move in the direction of increased accountability
and place greater assessment demands on teachers, it will become
clearer to what extent these resources will be forthcoming.

References
Barrs, M. (1992). The Primary Language Record: What we are learning in the UK. In C. Bouffler (Ed.), Literacy evaluation: Issues and practicalities. Sydney: Primary English Teaching Association.

Brindley, G. (1997). Assessment and reporting in second language programs: Purposes, problems and pitfalls. In E. S. L. Li & G. James (Eds.), Testing and evaluation in second language education. Hong Kong: Language Centre, Hong Kong University of Science and Technology.

Cumming, A. (1997, March). Grade 9 teachers' use of standards. Paper presented at the Colloquium on Implementation of Language Standards, American Association of Applied Linguistics Conference, Orlando, Florida.

Doherty, C., Mangubhai, F., & Shearer, J. (1996). Assessment as an ongoing feature of the learning environment. In J. J. Cumming & C. E. van Kraaynoord (Eds.), Adult literacy and numeracy: Assessing change. Melbourne: Language Australia.

Fullan, M., & Stiegelbauer, S. (1991). The new meaning of educational change. London: Cassell.

Gunn, M. (1995). Criterion-based assessment: A classroom teacher's perspective. In G. Brindley (Ed.), Language assessment in action. Sydney: National Centre for English Language Teaching and Research, Macquarie University.

Shohamy, E. (1992). New modes of assessment: The connection between testing and learning. In E. Shohamy & R. Walton (Eds.), Language assessment for feedback: Testing and other strategies. Dubuque, Iowa: Kendall Hunt Publishing Company.

Geoff Brindley's workshop is sponsored by The Australian International Education Foundation (AIEF).

All articles at this site are copyright © 1997 by their respective authors.
Document URL: http://www.jalt-publications.org/tlt/files/97/sep/brindley.html
