
CHAPTER IV

TESTING LANGUAGE SKILLS AND COMPONENTS

This chapter discusses several ways of testing language skills and components. The discussion is
preceded by an explanation of various test techniques.

A. Test Techniques and Testing Overall Ability


Test techniques are means of eliciting behavior from test-takers through which their language abilities
are revealed. Several test techniques are discussed here, followed by the cloze test and dictation,
which are used to test overall ability.

Multiple Choice Test


Multiple choice items take many forms, but their basic structure is as follows.
There is a stem:
Andy has been here ….. half an hour.
and a number of options, one of which is correct, the others being distractors:
A. during
B. for
C. while
D. since

It is the test-takers’ task to identify the correct or most appropriate option (in this case B).
The multiple choice technique has several advantages. The most obvious is that scoring can be
perfectly reliable; it is also rapid and economical. A further considerable advantage is that
more items can be included than in other forms of test, since the test-takers have only to make a
mark on the paper.
Despite these advantages, the multiple choice technique also has some limitations. It
tests only recognition knowledge and cannot give an accurate picture of test-takers’ performance. A
multiple choice grammar test score, for example, may be a poor indicator of someone’s ability to use
grammatical structures: the person who can identify the correct response in the item above may not be
able to produce the correct form when speaking or writing. The construct validity of such a
technique is therefore questionable.
In addition, the multiple choice technique gives the test-takers a chance of guessing the correct
answer, and it will never be known what part of any particular individual’s score has come about
through guessing.
Writing successful items for multiple choice tests is also extremely difficult. Hughes (1989:61)
lists some of the commonest problems in multiple choice tests: there may be more than
one correct answer, there may be no correct answer, the options may contain clues as to which is
correct, and the distractors may be ineffective.
Practicing the language through multiple choice items is unlikely to be the best way for students to improve
their command of it, since attention is usually paid to improving one’s guessing rather
than to the content of the items. Hughes (1989:61) therefore considers multiple choice tests to have harmful
backwash.
Finally, the multiple choice technique is said to facilitate cheating, because the
responses (a, b, c, d) are so simple that students can easily communicate them to others nonverbally.
All in all, the multiple choice technique is best suited to relatively infrequent testing of large numbers of
test-takers. To write effective multiple choice items, Djiwandono (2008:47)
advises the test developer to take care in formulating the stem, the correct answer, and the
distractors. The stem should be a complete sentence whenever possible. To discourage
guessing, the options should be parallel in form and content and similar in length; such
options force the students to think critically.

Matching Test
A matching test requires the students to match two parts of a test. The two parts are usually
interrelated in meaning or content, and are usually presented as two lists: the first
consists of statements or questions, while the second consists of responses. To make a
matching test effective, the number of responses should exceed the number of statements; this keeps
the students thinking critically up to the last question.

True False Test


Similar to a matching test, a true–false test also has two parts. The first part consists of a list of
statements; the second is a true (T) or false (F) choice listed beside each statement. The students
choose true (T) when the statement is considered correct and false (F) when it is not.

Cloze, C-Test, and Dictation: measuring overall ability


The cloze, the C-Test, and dictation are recommended as means of measuring overall ability
because they are considered economical. The original form of the cloze test involves deleting a number of
words in a passage, leaving blanks, and requiring the test-taker to restore the original words. After
a short unmutilated ‘lead-in’, roughly every seventh word is deleted. The cloze procedure
seems very attractive: it is easy to construct, administer, and score.
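The fixed-ratio deletion just described is mechanical enough to be scripted. The following Python sketch is only an illustration, not a procedure prescribed by any of the sources cited here; the function name, the deletion ratio, and the lead-in length are arbitrary parameters, and punctuation attached to words is deliberately ignored.

    def make_cloze(text, nth=7, lead_in=20):
        """Build a fixed-ratio cloze: keep the first `lead_in` words
        intact, then blank out every `nth` word after that. Returns
        the mutilated passage and the deleted words (the scoring key)."""
        words = text.split()
        answers = []
        for i in range(lead_in + nth - 1, len(words), nth):
            answers.append(words[i])
            words[i] = "_______"
        return " ".join(words), answers

Exact-word scoring then amounts to comparing each response against the returned key; acceptable-answer scoring would additionally need a list of admissible alternatives for each blank.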
The C-Test is really a variety of cloze and is considered superior to the general cloze
procedure. Instead of whole words, the second half of every second word is deleted. An
example follows.
There are usually five men in the crew of fire engine. One o__ them dri__ the eng__. The lea___
sits bes___ the dri___. The ot___ firemen s__ inside t__ cab o__ the f___ engine. T__ leader
h__ usually be___ in t___ Fire Ser___ for ma___ years. H___ will kn___ how t___ diff___ sorts
o___ fires. S__, when t___ firemen arr___ at a fire, it is always the leader who decides how to
fight a fire.

The supposed advantages of the C-Test over the more traditional cloze are that only exact scoring is
necessary and that shorter passages are possible. A possible disadvantage of the C-Test is that it is
harder to read than a cloze passage.
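The half-word deletion rule is equally mechanical. Here is a minimal sketch under the same caveats as before (illustrative names and parameters, naive whitespace splitting); real C-Tests normally skip one-letter words entirely, which the length guard below only approximates.

    def make_c_test(text, lead_in=10):
        """Rough C-Test builder: leave a short lead-in intact, then
        delete the second half of every second word thereafter."""
        words = text.split()
        out, answers = [], []
        mutilate = False          # alternate: keep one word, mutilate the next
        for i, w in enumerate(words):
            if i < lead_in or not mutilate or len(w) < 2:
                out.append(w)
            else:
                keep = (len(w) + 1) // 2      # keep the first half, rounding up
                out.append(w[:keep] + "_" * (len(w) - keep))
                answers.append(w)
            if i >= lead_in:
                mutilate = not mutilate
        return " ".join(out), answers

Because most half-deleted words admit only one plausible restoration, exact scoring against the returned answer list is usually sufficient, which is precisely the advantage claimed above.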
Dictation is a testing technique in which a passage is read aloud to students, with pauses during
which they write down what they have heard as accurately as possible (Richards et al., 1992). Dictation
gives results similar to those obtained from a cloze test, and in predicting overall ability it has the
advantage of involving listening ability. It is also easy to create and relatively easy to administer, but it
is certainly not easy to score. Because of this scoring problem, partial dictation may be considered as
an alternative: part of what is dictated is already printed on the answer sheet, the test-takers simply
fill in the gaps, and the scoring is likely to be more reliable.

B. Testing Language Skills and Language Components


As indicated previously, the targets of language testing are the language skills (listening,
speaking, reading, and writing) and the language components (grammar, vocabulary, and pronunciation).
The discussion of each skill and component below is summarized from Hughes (1989).

Testing Listening
It may seem rather odd to test listening separately from speaking, since the two skills are typically
exercised together in oral interaction. However, there are occasions, such as listening to the radio,
to lectures, or to announcements, when no speaking is required.
The testing of listening involves both macro-skills and micro-skills. The macro-skills of
listening include listening for specific information, obtaining the gist of what is being said, following
directions, and following instructions. The micro-skills of listening include interpreting intonation
patterns and recognizing the function of structures. At the lowest level are abilities like distinguishing
between phonemes (for example, between /w/ and /v/).
Several types of text can be used for a listening test, such as a monologue, a dialogue,
or a multi-participant interaction. These can take the form of announcements, talks or lectures,
instructions, directions, etc.
The source of listening test materials can be recordings of radio broadcasts, television broadcasts,
teaching materials, or even recordings of native speakers that we make ourselves. The most important
consideration is that recordings must be clear and natural.
Several techniques can be used in testing listening: multiple choice, short answer, information
transfer, note taking, partial dictation, and recordings or live presentations.
 Multiple choice The technique has the advantages and disadvantages discussed
previously. In a listening test the problem is greater, because the test-takers must listen to the
passage while reading the alternatives; the options must therefore be short and simple.
 Information transfer This technique is useful in testing listening since it makes minimal demands
on productive skills. It can involve such activities as the labeling of diagrams or pictures,
completing forms, or showing routes on a map.
 Note taking Where the ability to take notes while listening to a lecture is in question, this activity
can be quite realistically replicated in the testing situation. Test-takers take notes during the talk,
and only after the talk is finished do they see the items to which they have to respond.
 Partial dictation Although partial dictation may not be an authentic listening activity, it may be
possible to administer it when no other test of listening is practical.
 Recording and live presentation The great advantage of using recordings when administering a
listening test is uniformity in what is presented to the test-takers. This is fine if the
recording is to be listened to in a well-maintained language laboratory or in a room with good
acoustics and suitable equipment. If these conditions cannot be met, then a live
presentation is preferable.

Testing Speaking
The objective of teaching speaking is to develop the ability to interact successfully in
the language, and this involves comprehension as well as production. Consequently,
a speaking test should elicit behavior which truly represents the students’ ability and
which can be scored validly and reliably.
The materials for a speaking test include dialogue and multi-participant interactions involving
language functions such as:
- Expressing: thanks, requirements, opinions, comment, attitude, confirmation, apology,
want/need, information, complaints, reasons/justifications.
- Narrating: sequence of events
- Eliciting: information, directions, service, clarification, help, permission.
- Directing: ordering, instructing, persuading, advising, warning
- Reporting: description, comment, decisions.
There are several formats that can be used to test speaking ability. They are interview, interaction
with peers, and response to tape-recordings. Each format has some techniques.
 Interview This is the most obvious format for testing speaking.
 Questions and requests for information In questions and requests, yes/no questions should be
avoided.
 Pictures can also be used to elicit descriptions. Series of pictures (or video sequences) form a
natural basis for narration.
 Interaction with peers Two or more test-takers may be asked to discuss a topic, make plans, and
so on.
 Role play Students can be asked to assume a role in a particular situation and the tester can act
as an observer.
 Response to tape-recordings Uniformity of elicitation procedures can be achieved through
presenting the students only with the same audio- (video-) tape recordings.
 Imitation The test-takers hear a series of sentences, each of which they have to repeat in turn.

Scoring will be valid and reliable only if clearly recognizable and appropriate descriptions of criterial
levels are written and scorers are trained to use them. Descriptions of speaking proficiency usually deal
with accent, grammar, vocabulary, fluency, and comprehension, as in the following examples taken from
Hughes (1989).

Proficiency Description
Accent
1. Pronunciation frequently unintelligible.
2. Frequent gross errors and a very heavy accent make understanding difficult, require frequent
repetition.
3. Foreign accent requires concentrated listening, and mispronunciations lead to occasional
misunderstanding and apparent errors in grammar or vocabulary.
4. Marked foreign accent and occasional mispronunciations which do not interfere with
understanding.
5. No conspicuous mispronunciation, but would not be taken for a native speaker.
6. Native pronunciation, with no trace of foreign accent.

Grammar
1. Grammar almost entirely inaccurate except in stock phrases.
2. Constant errors showing control of very few major patterns and frequently preventing
communication.
3. Frequent errors showing some major patterns uncontrolled and causing occasional irritation and
misunderstanding.
4. Occasional errors showing imperfect control of some patterns but no weakness that causes
misunderstanding.
5. Few errors, with no patterns of failure.
6. No more than two errors during the interview.

Vocabulary
1. Vocabulary inadequate for even the simplest conversation.
2. Vocabulary limited to basic personal and survival areas (time, food, transportation, family, etc.)
3. Choice of words sometimes inaccurate, limitations of vocabulary prevent discussion of some
common professional and social topics
4. Professional vocabulary adequate to discuss special interests; general vocabulary permits
discussion of any non-technical subject with some circumlocutions.
5. Professional vocabulary broad and precise; general vocabulary adequate to cope with complex
practical problems and varied situations.
6. Vocabulary apparently as accurate and extensive as that of an educated native speaker.

Fluency
1. Speech is so halting and fragmentary that conversation is virtually impossible.
2. Speech is very slow and uneven except for short or routine sentences.
3. Speech is frequently hesitant and jerky; sentences may be left uncompleted.
4. Speech is occasionally hesitant, with some unevenness caused by rephrasing and groping for
words.
5. Speech is effortless and smooth, but perceptibly non-native in speed and evenness.
6. Speech on all professional and general topics as effortless and smooth as a native speaker’s.

Comprehension
1. Understands too little for the simplest type of conversation.
2. Understands only slow, very simple speech on common social and touristic topics; requires
constant repetition and rephrasing.
3. Understands careful, somewhat simplified speech when engaged in a dialogue, but may require
considerable repetition and rephrasing.
4. Understands quite well normal educated speech when engaged in a dialogue, but requires
occasional repetition or rephrasing.
5. Understands everything in normal educated conversation except for very colloquial or low-
frequency items, or exceptionally rapid or slurred speech.
6. Understands everything in both formal and colloquial speech to be expected of an educated
native speaker.
Besides using clear descriptions of criterial levels, the use of more than one scorer decreases
subjectivity, as described earlier. If two testers are involved in an interview, they can
assess each test-taker independently. If they disagree, even after discussion, the case may be
referred to a third scorer.
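Hughes does not prescribe any particular arithmetic for resolving such disagreements. Purely as an assumption-laden sketch, a two-scorer-plus-referee routine might look like the following; the function name and the max_gap tolerance are invented for illustration.

    def combine_ratings(scorer_a, scorer_b, adjudicator=None, max_gap=1):
        """Average two independent ratings. If they differ by more than
        `max_gap` band(s), refer the case to a third scorer and average
        the two ratings that lie closest together."""
        if abs(scorer_a - scorer_b) <= max_gap:
            return (scorer_a + scorer_b) / 2
        if adjudicator is None:
            raise ValueError("ratings diverge; a third scorer is needed")
        lo, mid, hi = sorted([scorer_a, scorer_b, adjudicator])
        return (lo + mid) / 2 if mid - lo <= hi - mid else (mid + hi) / 2

Many programs instead require the two scorers to discuss to consensus; the essential point is that the resolution rule is fixed before scoring begins.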

Testing Reading
Similar to listening skill, reading skill is a receptive skill. The task of language tester is, then, to
set reading tasks which will result in behavior that will demonstrate their successful completion.
The reading macro-skills (directly related to course objectives) are scanning text to locate specific
information, skimming text to obtain general idea, identifying stages of argument, and identifying
examples presented in support of an argument. The micro-skill underlying reading skills are identifying
referents of pronouns, using context to guess meaning of unfamiliar words, and understanding relations
between parts of text.
The reading texts can be taken from textbooks, novels, newspapers, magazines, academic journals,
letters, timetables, etc., and can take the form of newspaper reports, advertisements, editorials, and so on.
The techniques that might be used to test reading skills are multiple choice, true/false,
completion, short answer, guided short answer, summary cloze, information transfer, identifying the order of
events, identifying referents, and guessing the meaning of unfamiliar words from context.
 Multiple Choice The test-takers provide evidence of successful reading by making a mark
against one out of a number of alternatives. Its strengths and weaknesses have been presented
earlier.
 True/false The test-takers should respond to a statement by choosing one of the two choices,
true or false.
 Completion The students are required to complete a sentence with a single word, for example:
……………was the man responsible for the first steam railway.
 Short answer It is in the form of questions and requires the students to answer briefly, for
example:
According to the author, what does the increase in divorce rates show about people’s expectations of
marriage?

 Guided short answer This is an alternative to the short answer in which students are guided toward
the intended answer: they complete sentences presented to them, for example:
Complete the following based on the fourth paragraph!
‘Many universities in Europe used to insist that their students speak and write only ………………… Now
many of them accept ………………….. as an alternative, but not a ………………. of the two.
 Summary cloze A reading passage is summarized by the tester, and gaps are then left in the
summary for completion by the test-takers. This is really an extension of the guided short
answer.
 Information transfer One way to minimize demands on writing by test-takers is to require them to
show successful completion of a reading task by supplying simple information in a table, following
a route on a map, labeling a picture, and so on.
 Identifying order of events, topics, or arguments The test-takers can be required to number the
events etc.
 Identifying referents One of the ‘microskills’ listed previously was the ability to identify referents.
An example of an item to test this is:
What does the word ‘it’ (line 25) refer to? ……………………
 Guessing the meaning of unfamiliar words from context This is another microskill mentioned
earlier. Items may take the form:
Find a single word in the passage (between lines 1 and 25) which has the same meaning as ‘making of
laws’.

The above techniques are among the many available for testing reading. In scoring a reading test,
Hughes (1989) suggests that errors of grammar, spelling, or punctuation should not be penalized, as
long as it is clear that the test-taker has successfully performed the reading task which the item set. The
function of a reading test is to test reading ability; testing productive skills at the same time simply makes
the measurement of reading ability less accurate.

Testing Writing
The best way to test people’s writing ability is to get them to write directly; indirect
tests of writing ability cannot be constructed with comparable accuracy even by professional
testing institutions. There are three things we should consider in developing a good writing test:

1. We have to set writing tasks that are properly representative.


It is impossible to have the students perform a wide variety of tasks in a short test with few items, so it is the test
developer’s task to create a representative sample of tasks. One example is provided by Hughes (1989)
in developing a test of English for academic purposes. The purpose of the test was to discover whether a
student’s written English is adequate for study through the medium of English at a particular overseas
university. An analysis of needs had revealed that the most important uses of written English were
taking notes in lectures and writing examination answers of up to two paragraphs in length.
From this description we can derive the relevant tasks, such as asking the students to describe,
to explain, to compare and contrast, and to argue for something.

2. The tasks should elicit samples of writing which truly represent the students’ ability.
There are at least two things we can do to obtain samples that properly represent each
student’s ability. The first is to set as many tasks as is feasible, because students’ performance on the
same task is not consistent and they are sometimes better at some tasks than at others. Giving many
different tasks therefore enables the test developer to see the students’ performance as objectively as
possible.
The second is to test only writing ability. Another ability which at times interferes with the accurate
measurement of writing is reading. It is acceptable to expect students to read simple instructions, but
requiring them to read very difficult and long instructions in a writing test should be avoided, since this
prevents the students from performing adequately on the writing task. One way to reduce dependence
on reading ability is to make use of illustrations in the form of diagrams, series of pictures, or graphs.

3. The samples of writing can and will be scored reliably.


To facilitate reliable scoring, the test developer should set as many tasks as possible: the more
scores the scorers provide for each student, the more reliable the total score should be. The test-takers
should not be given too many choices of writing task; as discussed previously, having all test-takers
perform the same tasks makes the scoring easier. Finally, the samples of writing elicited
should be long enough for judgments to be made reliably.
To obtain reliable scoring of writing, the scoring can be done either holistically or
analytically. Holistic scoring involves the assignment of a single score to a piece of writing on the basis
of an overall impression of it; this kind of scoring has the advantage of being very rapid. The following
is an example of holistic scoring provided by Cohen (1994:327-328).

Holistic Scoring:
5 The main idea is stated very clearly, and there is a clear statement of change of opinion. The essay
is well organized and coherent. The choice of vocabulary is excellent. There are no major or minor
grammatical errors. Spelling and punctuation are fine.

4 The main idea is fairly clear, and change of opinion is evident. The essay is moderately well organized
and is relatively coherent. The vocabulary is good, and there are only minor grammatical errors.
There are few spelling and punctuation errors.
3 The main idea and a change of opinion are indicated but not so clearly. The essay is not well organized
and is somewhat lacking in coherence. The vocabulary is fair, and there are some major and minor
grammatical errors. There are a fair number of spelling and punctuation errors.

2 The main idea and change of opinion are hard to identify in the essay. The essay is poorly organized
and relatively incoherent. The use of vocabulary is weak, and grammatical errors appear frequently.
Spelling and punctuation errors are frequent.

1 The main idea and change of opinion are absent in the essay. The essay is poorly organized and
generally incoherent. The use of vocabulary is very weak, and grammatical errors appear very
frequently. Spelling and punctuation errors are very frequent.

A method of scoring which requires a separate score for each of a number of aspects of a writing task is said
to be analytic. The following is an example of analytic scoring provided by Cohen (1994:328-329).

Analytic Scoring:
Content
5 – Excellent : main ideas stated clearly and accurately, change of opinion very clear
4 – Good : main ideas stated fairly clearly and accurately, change of opinion relatively clear
3 – Average : main ideas somewhat unclear and inaccurate, change of opinion somewhat weak
2 – Poor : main ideas not clear or accurate, change of opinion weak
1 – Very Poor : main ideas not at all clear or accurate, change of opinion very weak

Organization
5 – Excellent : well organized and perfectly coherent
4 – Good : fairly well organized and generally coherent
3 – Average : loosely organized but main ideas clear, logical but incomplete sequencing
2 – Poor : ideas disconnected, lacks logical sequencing
1 – Very poor : no organization, incoherent

Vocabulary
5 – Excellent : very effective choice of words and use of idioms and word forms
4 – Good : effective choice of words and use of idioms and word forms
3 – Average : adequate choice of words but some misuse of vocabulary, idioms and word forms
2 – Poor : limited range, confused use of words, idioms, and word forms
1 – Very Poor : very limited range, very poor knowledge of words, idioms, and word forms

Grammar
5 – Excellent : no errors, full control of complex structure
4 – Good : almost no errors, good control of structure
3 – Average : some errors, fair control of structure
2 – Poor : many errors, poor control of structure
1 – Very Poor : dominated by errors, no control of structure

Mechanics
5 – Excellent : mastery of spelling and punctuation
4 – Good : few errors in spelling and punctuation
3 – Average : fair number of spelling and punctuation errors
2 – Poor : Frequent errors in spelling and punctuation
1 – Very poor : no control over spelling and punctuation

The choice between holistic and analytic scoring depends on the purpose of the testing (Hughes,
1989:97). If diagnostic information is required, then analytic scoring is essential. Holistic scoring may
be appropriate when the scoring is carried out by a small, well-coordinated group; analytic scoring is
preferable when scoring is conducted by heterogeneous, less well-trained people or in a number of
different places. Whichever is used, multiple scoring involving two or more scorers is recommended.
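Purely to illustrate the bookkeeping that analytic, multiple-scorer marking involves, the sketch below combines subscores across two raters using the five Cohen-style categories above. The equal weighting, the function name, and the sample numbers are assumptions for illustration, not anything prescribed by Cohen or Hughes.

    CATEGORIES = ["content", "organization", "vocabulary", "grammar", "mechanics"]

    def analytic_total(ratings):
        """`ratings` maps each category to a list of per-rater bands (1-5).
        Average the raters within each category, then sum the categories,
        giving a script total on a 5-25 scale."""
        return sum(sum(bands) / len(bands) for bands in
                   (ratings[cat] for cat in CATEGORIES))

    script = {"content": [4, 3], "organization": [4, 4],
              "vocabulary": [3, 3], "grammar": [3, 4], "mechanics": [5, 4]}
    print(analytic_total(script))   # 18.5 out of 25

Weighting the categories differently (for example, doubling content) is a design decision that, like the descriptors themselves, should be fixed and trialled before operational scoring.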

Testing Grammar
The place of grammar in language teaching is sometimes debatable. Some may think that
control of grammatical structure was seen as the core of language ability and it would have been
unthinkable not to test it. For that reason, most proficiency tests include a grammar section besides the
reason of its ease with which large numbers of items can be administered and scored within a short
period of time.
On contrast, others see that one cannot accurately predict mastery of grammar by measuring
control of what we believe to be the abilities that underlie it. Besides, the backwash effect of grammar
test may encourage the learning of grammatical structures in isolation, with no apparent need to use
them. Therefore, consideration of this kind has resulted in the absence of grammar components in
some well-known proficiency tests.
However, whether or not grammar has an important place in an institution’s teaching, it has to
be accepted that grammatical ability has an important influence on someone’s performance.
Successful academic writing, for example, must depend to some extent on command of
elementary grammatical structures. There is therefore still room for a grammar component
in a language test.
The specifications of a grammar test should be in line with the teaching syllabus if the syllabus lists
the grammatical structures to be taught. When there is no such list, the structures must be inferred from
textbooks or other teaching materials.
Several techniques can be used to test grammar. Multiple choice is one alternative, but it is
not recommended because of the difficulty of finding appropriate distractors. The other proposed
techniques are paraphrase, completion, and modified cloze.
 Paraphrase This technique requires the students to write a sentence equivalent in meaning to
one that is given. It is helpful to give part of the paraphrase in order to restrict the students to the
grammatical structure being tested. An example testing the past continuous passive would
be:
When we arrived, a policeman was questioning the bank clerk.
When we arrived, the bank clerk ……………………………..

 Completion This technique can be used to test a variety of structures. The following is an example
of testing interrogative forms:
In the following conversation, some sentences have been left incomplete. Complete them
suitably. Read the whole conversation before you begin to answer the questions.

(Mr. Cole wants a job in Mr. Gilbert’s export business. He has come for an interview.)

Mr. Gilbert: Good morning, Mr. Cole. Please come in and sit down. Now let me see. (1)
Which school ……………………………………………………….?
Mr. Cole: Whitestone College
Mr Gilbert: (2) And when …………………………………………………………...?
Mr. Cole: In 1999, at the end of the summer term.
Mr. Gilbert: (3) And since then, what ……………………………………………….?
Mr. Cole: I worked in a bank for a year. Then I took my present job, selling cars. But I
would like a change now.
Mr. Gilbert: (4) Well, what sort of a job ……………………………………………?
Mr. Cole: I’d really like to work in your Export Department.

 Modified cloze This technique can take the form of the following example, which tests articles:

Write the, a, or NA (No Article) in the blanks.


In England, children go to ..… school from Monday to Friday. ..… school that Mary goes to is very
small. She walks there each morning with ….. friend. One morning, they saw ….. man throwing
….. stones and ….. pieces of wood at ….. dog. ….. dog was afraid of ….. man.

In the scoring process, the scorer should score only what the item is testing, not something else.
For instance, when the focus is on testing pronouns, an error such as a missing third person -s should not be
penalized. Finally, for valid and reliable scoring of grammar items, careful preparation of the scoring key
is necessary.

Testing Vocabulary
The debate on testing vocabulary is equal to the testing of vocabulary. Clearly, knowledge of
vocabulary is essential to the development and demonstration of linguistic skills. But according to some
people, that does not mean that it should be tested separately.
On the other hand, some argue that time should be devoted to the regular, conscious teaching
of vocabulary, and it is then important to test vocabulary in an achievement test after teaching.
The specifications for a vocabulary achievement test should be based on all the items presented to the
students in vocabulary teaching. When a placement test is used, the vocabulary tested should
refer to one of the common published word lists.
Testing vocabulary productively is very difficult, and information on receptive ability is usually regarded as
sufficient. The following techniques are suggested for possible use in achievement tests.
 Pictures The use of pictures can limit the students to the lexical items we have in mind. Some
pictures are provided, and the students are required to write down the names of the objects. This
method of testing vocabulary is obviously restricted to concrete nouns which can be drawn.
 Definitions This may work for a range of lexical items. The following is an example of such a test.
A …… is a person who looks after our teeth.
……… is frozen water.
……… is the second month of the year.

But not all items can be tested using a definition, nor can all words be defined entirely in words
more common or simpler than themselves.
 Gap filling This can take the form of one or more sentences with a single word missing.
Because of the snow, the football match was ….. until the following week.
I ….. to have to tell you this, Mrs. Jones, but your husband has had an accident.

To avoid a variety of acceptable answers, the first letter of the word, or even an indication of the number of
letters, can be given.

Testing Pronunciation
Heaton (1990) includes pronunciation in the testing of speaking skill. There are at least three
techniques for testing pronunciation: pronouncing words in isolation, pronouncing words in
sentences, and reading aloud.
 Pronouncing words in isolation The importance of listening in almost all tests of speaking,
especially tests of pronunciation, should never be underestimated. It is impossible for students
to pronounce words correctly unless they first hear and recognize the precise sounds of those words.
In the early stages of learning English, it is useful to base our pronunciation tests on minimal
pairs, that is, pairs of words which differ in only one sound, for example:

bud – bird        ferry – fairy
nip – nib         boss – bus
pill – pail       knit – lit
ball – bowl       fry – fly
sheet – seat      sport – support
Pictures can also be used to test the students’ pronunciation: the students are shown pictures
and asked to identify the object in each. Each picture is based on a possible source of confusion;
for example, a picture of a ship can be used to test whether the students distinguish between sheep and ship.
 Pronouncing words in sentences Students can also be asked to read aloud sentences containing
the problematic sounds which we want to test. Sentences are, of course, preferable because
they provide a context for the sounds (as in real life). For example:
There were several people standing in the hole. (hole/hall)
Are you going to sail your boat today? (sail/sell)
Do you like this sport? (sport/spot)

 Reading aloud Reading aloud can offer a useful way of testing pronunciation, provided that we
give the student a few minutes to look at the text first. When choosing suitable texts to
read aloud, it is useful to imagine actual situations in which someone might read something aloud;
for example, people read the news aloud on TV, or read letters or instructions aloud.

Discussion
1. Complete the cloze passage in this chapter and say what you think each item is testing.
2. Discuss when and how multiple choice tests can be used appropriately in an English classroom.
3. What advantages do we gain by testing language proficiency with dictation?
4. Design a test that requires the test-takers to draw (or complete) a simple picture after listening to
instructions.
5. Can reading aloud be included as a technique for testing reading ability?
6. Do you think grammar should be tested separately?
7. What do you think is the best way of testing writing ability?

CHAPTER V
DESIGNING CLASSROOM TESTS AND STANDARDIZED TESTS

This chapter provides teachers with step-by-step procedures for designing classroom tests
and standardized tests. Most of the explanation is summarized from Brown (2004).

A. Designing Classroom Tests


We, as teachers, need to invest a lot of effort and time in designing and refining a good test
through trial and error. Here are some practical steps in constructing a classroom test, adapted from Brown
(2004).

Assessing Clear, Unambiguous Objectives


When we want to develop a good classroom test, we need to know as specifically as possible
what we want to test. We can do this by looking carefully at what we think the students should “know” or
“be able to do,” based on the material that the students are responsible for. In other words, we need to
examine the objectives for the unit we are testing.
Ideally, every curriculum has appropriately framed assessable objectives, that is, objectives that
are stated in terms of explicit performance by students. “Students will produce yes/no questions with final
rising intonation” is a good example of an objective, because the acceptable level of student performance is
specified and the objective can be tested. Unfortunately, not all objectives are stated so clearly.
Then we have to go back through the unit and formulate them ourselves, remembering to
state both the performance to be elicited and the target linguistic domain. See Table 5.1.

Table 5.1 Example of Selected Objectives for a unit in a low-intermediate integrated-skills course
(Brown, 2004:50)

Form-focused objectives (listening and speaking)


Students will
1. recognize and produce tag questions, with correct grammatical form and final intonation
pattern, in simple social conversations.
2. recognize and produce wh-information questions with correct final intonation pattern.

Communication skills (speaking)


Students will
3. state completed actions and events in a social conversation.
4. ask for confirmation in a social conversation.
5. give opinions about an event in a social conversation.
6. produce language with contextually appropriate intonation, stress, and rhythm.

Reading (simple essay or story)


Students will
7. recognize irregular past tense of selected verbs in a story or essay.

Writing skills (simple essay or story)


Students will
8. write a one-paragraph story about a simple event in the past.
9. use the conjunctions ‘so’ and ‘because’ in a statement of opinion.

In reviewing the objectives of a unit, we cannot possibly test each one; we will need to
choose a subset of the objectives to test.
Drawing Up Test Specifications
Test specifications for classroom use can be a simple, practical outline of the test. They
will comprise (a) a broad outline of the test, (b) the skills to be tested, and (c) the test
items. The following specifications, based on the objectives in Table 5.1, give a clearer
picture of how to draw them up.

Table 5.2 Test Specifications


Speaking (5 minutes per person)
- Format : oral interview, teacher and students
- Tasks : teacher asks questions to students (objectives 3 and 5, emphasis on 6)

Listening (10 minutes)


- Format : teacher makes an audiotape in advance, with one other voice on it
- Tasks : a. minimal pair items, multiple choice (objective 1)
b. 5 interpretation items, multiple choice (objective 2)

Reading (10 minutes)


- Format : cloze test items (10 total) in a story line
- Tasks : fill-in-the-blanks (objective 7)

Writing (10 minutes)


- Format : prompt for a topic: why I liked/disliked a recent TV sitcom
- Tasks : writing a short opinion paragraph (objective 9)

These informal, classroom-oriented specifications give us an indication of the topics (objectives),
the format of the test, the number of items in each section, and the time allocated for each. Notice that not
all objectives are tested (objectives 4 and 8 are omitted); this is because of time limitations.

Devising Test Tasks


For the oral interview, we have to draft questions that conform to the accepted pattern of oral
interviews: they begin and end with nonscored items (warm-up and wind-down) designed to put students
at ease, and sandwiched between these are items intended to test the objective (level check) and a little
beyond it (probe).

Table 5.3 Oral Interview Form


A. Warm-up: questions and comments
B. Level-check questions (objectives 3, 5, and 6)
1. Tell me about what you did last weekend.
2. Tell me about an interesting trip you took last year.
3. How did you like the TV show we saw this week?
C. Probe (objectives 5 and 6)
1. What is your opinion about _____? (news event)
2. How do you feel about _____? (another news event)
D. Wind-down: comments and reassurance

Now we are ready to draft the other test items. To lend the items authenticity and interest, we
base them on the context of a recent TV sitcom that we have used in class; the sitcom
depicted a loud, noisy party with lots of small talk. The following are sample test items
for each section.

Table 5.4 Test items sample (first draft)


Listening, part a (sample item)
Directions: Listen to the sentence on the tape. Choose the sentence on your test page that
is closest in meaning to the sentence you heard.
- Voice : They sure made a mess, didn’t they?
- Students read : a. They didn’t make a mess, did they?
b. They did make a mess, didn’t they?

Listening, part b. (sample item)


Directions: Listen to the question on the tape. Choose the sentence on your test page that
is the best answer to the question.
- Voice : Where did George go after the party last night?
- Students read : a. Yes, he did.
b. Because he was tired.
c. To Elaine’s place for another party
d. He went home around eleven o’clock
Reading (sample items)
Directions: Fill in the correct tense of the verb (in parentheses) that should go in each blank.
Then, in the middle of this loud party, they (hear) _____ the loudest thunder you have ever
heard! And then right away lightning (strike) _____ right outside their house!

Writing
Directions: Write a paragraph about what you liked or didn’t like about one of the characters
at the party in the TV sitcom we saw.

However, the above test items are quite traditional. It should be admitted that the format of some
of the items is unnatural, which lowers their authenticity; the items therefore need to
be revised.
In revising our draft, we need to ask some important questions:
1. Are the directions to each section absolutely clear?
2. Is there an example item for each section?
3. Does each item measure a specified objective?
4. Is each item stated in clear, simple language?
5. Does each multiple choice item have appropriate distractors?
6. Is the difficulty of each item appropriate for your students?
7. Is the language of each item sufficiently authentic?
8. Do the sum of the items and the test as a whole adequately reflect the learning objectives?

In the current example, the revision process is likely to result in at least
four changes or additions:
1. In both the interview and writing sections, we recognize that a scoring rubric will be essential. For
the interview we decide to create a holistic scale, and for the writing section we devise a
simple analytic scale that captures only the objectives we have focused on (see the previous
chapter).
2. In the interview questions, we realize that follow-up questions may be needed for students who
give one-word or very short answers.
3. In the listening section, part b, we intend choice “c” as the correct answer, but we realize that
choice “d” is also acceptable. We need an answer that is unambiguously incorrect, so we shorten
it to “d. Around eleven o’clock.” We also note that providing the prompts for this section on an audio
recording will be logistically difficult, so we opt to read these items to the students.
4. In the writing prompt, we can see that some students might not use the words so or because,
which were in our objectives, so we re-word the prompt: “Name one of the characters at the
party in the TV sitcom we saw. Then use the word so at least once and the word because
at least once to tell why you liked or didn’t like that person.”

Ideally, we would try out all our tests on students other than those in our class before actually administering
them. But in daily classroom teaching, the tryout phase is almost impossible; alternatively, we
can enlist the aid of a colleague to look over the test.
In the final revision of the test, imagine that you are a student taking it. Go through each
set of directions and all the items slowly and deliberately, and time yourself. If the test should be
shortened or lengthened, make the necessary adjustments, and be sure that everything is
in order.

B. Standardized Tests
A standardized test presupposes certain standard objectives, or criteria, that are held constant
from one form of the test to another. The criteria in large-scale standardized tests are designed to apply
to a broad band of competencies that are usually not exclusive to one particular curriculum. A good
standardized test is the product of a thorough process of empirical research and development. It dictates
standard procedures for administration and scoring. And finally, it is typically norm-referenced: the
goal is to place test-takers on a continuum across a range of scores and to differentiate them
by their relative ranking.
Many people are under the incorrect impression that all standardized tests consist of items
presented in multiple-choice format. While it is true that many standardized tests conform to a multiple-
choice format for the sake of an objective standard, multiple choice is not a prerequisite characteristic of
standardized tests. Human-scored instruments are also found among standardized tests, as in the Test of
Spoken English (TSE) and the Test of Written English (TWE) produced by Educational Testing Service (ETS).
Standardized tests have both advantages and disadvantages. The advantages include a
ready-made, previously validated product that frees the teacher from having to spend hours creating a
test; administration to large groups can be accomplished within reasonable time limits; and, in the case
of multiple-choice formats, the scoring procedures are easy.
The disadvantages of standardized tests center largely on their inappropriate use,
for example using an overall proficiency test as an achievement test simply because of the convenience
of standardization. Teachers should therefore be careful in using standardized tests.

Developing a Standardized Test


While it is not likely that a classroom teacher would be in a position to develop a brand-new
standardized test of large-scale proportions, it is a virtual certainty that some day we will be in a position to
revise an existing test, to adapt or expand one, and/or to create a smaller-scale standardized
test for a program we are teaching in. Here are the steps in developing a standardized test, using the
TOEFL as an example.
1. Determine the purpose and objectives of the test. Most standardized tests are expected to provide
high practicality in administration and scoring without unduly compromising validity. The initial
outlay of time and money for such a test is significant, so it is important for its purpose and
objectives to be stated specifically. The purpose of the TOEFL is to evaluate the
English proficiency of people whose native language is not English. More specifically, the TOEFL is
designed to help institutions of higher learning make valid decisions concerning English language
proficiency in terms of their own requirements. As we can see, the objectives of the TOEFL are specific,
and the content of each test must be designed to accomplish those particular ends.
2. Design the test specifications. Decisions need to be made on how to structure the specifications
of the test. A comprehensive program of research must be carried out to identify the set of constructs
underlying the test itself. This stage of laying the foundation stones can occupy weeks, months, or
even years of effort; standardized tests that don’t work are often the product of short-sighted
construct validation. In the case of the TOEFL, construct validation is carried out by the TOEFL staff
at Educational Testing Service (ETS) under the guidance of a Policy Council that works with a
Committee of Examiners composed of appointed external university faculty, linguists, and
assessment specialists. Because the TOEFL is a proficiency test, the first step in the developmental
process is to define the construct of language proficiency. The term ‘proficiency’ itself must first be
clarified: according to Lowe, as quoted by Brown (2004:71), proficiency is a holistic, unitary
trait view of language ability, and how we view language makes a difference in how we assess
language proficiency. After breaking language competence down into the subsets of listening, speaking,
reading, and writing, each performance mode can be examined on a continuum of linguistic units:
phonology (pronunciation) and orthography (spelling), words (lexicon), sentences (grammar), and
discourse and pragmatic (sociolinguistic, contextual, functional, cultural) features of language.
Finally, to make a very long story short, the TOEFL for many years included three types of
performance in its organizational specifications: listening, structure, and reading, all of which tested
comprehension through standard multiple choice tasks. In 1996, a major step was taken to include
written production in the computer-based TOEFL.
3. Design, select, and arrange test tasks/items. Once the specifications for the standardized test have
been stipulated, the never-ending task of designing, selecting, and arranging items begins. The
process involves determining the number and types of items to be created. For the TOEFL, the
first step is coding the content. Items are then designed by a team who select and adapt items
drawn from a bank of items deposited by free-lance writers and ETS staff. The
content of the reading section, for example, usually consists of excerpts from authentic general or academic
reading, edited for linguistic difficulty, cultural bias, or other topical biases. Items are designed
to test overall comprehension, specific information, and reference.
4. Make appropriate evaluations of the different kinds of items. Indices of item facility (IF), item
discrimination (ID), and distractor analysis are a must for a standardized multiple choice test. For
production-response formats, essay writing in this case, different forms of evaluation become
important, and the principles of practicality and reliability are prominent.
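Both indices have conventional textbook definitions: IF is the proportion of test-takers who answer the item correctly, and ID contrasts that proportion between high and low scorers on the test as a whole. The sketch below uses upper and lower thirds as the contrast groups; that split, like the function names, is one common convention rather than an ETS-specified procedure.

    def item_facility(responses):
        """IF: proportion of test-takers answering the item correctly.
        `responses` is a list of booleans, one per test-taker."""
        return sum(responses) / len(responses)

    def item_discrimination(correct_by_total):
        """ID: IF in the top-scoring third minus IF in the bottom third.
        `correct_by_total` pairs each test-taker's item result (bool)
        with that test-taker's total score, e.g. [(True, 41), (False, 22)]."""
        ranked = sorted(correct_by_total, key=lambda pair: pair[1])
        k = len(ranked) // 3              # size of each contrast group
        low, high = ranked[:k], ranked[-k:]
        return (sum(r for r, _ in high) - sum(r for r, _ in low)) / k

Items with IF near 0 or 1 discriminate very little, and items with low or negative ID are the natural candidates for revision or for closer distractor analysis.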
5. Specify scoring procedures and reporting formats. The development process should yield a test that
can be scored accurately and reported back to test-takers and institutions
efficiently. The TOEFL is known for having straightforward scoring procedures. Scores are
calculated and reported for the three sections of the TOEFL (the essay ratings are combined with the
Structure and Written Expression score). The total score (ranging from 40 to 300 on the computer-based
TOEFL and from 310 to 677 on the paper-and-pencil TOEFL) is also calculated and reported, and a
separate score for the essay (ranging from 0 to 6) is provided on the examinee’s score record.
6. Perform ongoing construct validation studies. The last step in developing a standardized test is to
perform systematic, periodic validation of its effectiveness. ETS, as the producer of the TOEFL, has
sponsored many impressive programs of research, including studies examining the content
characteristics of the TOEFL from a communicative perspective.

Discussion
1. Following the steps for developing classroom tests, make your own English test for the
first grade of junior high school. Do this in a group.
2. In pairs or in small groups, compile a brief list of the pros and cons of standardized testing.
3. Tell the class about the worst test experience you’ve ever had. Briefly analyze what made the
experience so unbearable.
