This chapter discusses several ways of testing language skills and components. The discussion is preceded by an explanation of various test techniques.
It is the test-takers’ task to identify the correct or most appropriate option (in this case B).
The multiple-choice technique has several advantages. The most obvious is that scoring can be
perfectly reliable; scoring should also be rapid and economical. A further considerable advantage is that
more items can be included than in other forms of tests, since the test-takers have only to make a
mark on the paper.
Despite these advantages, the multiple-choice technique also has some limitations. It
tests only recognition knowledge and so cannot give an accurate picture of test-takers’ performance. A
multiple-choice grammar score, for example, may be a poor indicator of someone’s ability to use
grammatical structures: a person who can identify the correct response in the item above may not be
able to produce the correct form when speaking or writing. The construct validity of the
technique is therefore questionable.
In addition, the multiple-choice technique gives the test-takers a chance of guessing the correct
answer, and it cannot be known what part of any particular individual’s score has come about through
guessing.
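The scale of the guessing problem can be made concrete with a small arithmetic sketch. This is an illustration, not part of the source: on a test of n items with k options each, blind guessing alone earns an expected n/k correct answers.

```python
import random

def expected_guess_score(num_items: int, num_options: int) -> float:
    """Expected number of correct answers from blind guessing."""
    return num_items / num_options

def simulate_guessing(num_items: int, num_options: int, trials: int = 10_000) -> float:
    """Monte Carlo estimate of the mean score from random guessing."""
    rng = random.Random(0)
    total = 0
    for _ in range(trials):
        # Each item is guessed correctly with probability 1/num_options.
        total += sum(rng.randrange(num_options) == 0 for _ in range(num_items))
    return total / trials

# On a 40-item, four-option test, blind guessing averages 10 correct answers.
print(expected_guess_score(40, 4))  # 10.0
```

In other words, a quarter of the maximum score on a typical four-option test is attainable with no knowledge at all, which is why guessing clouds the interpretation of individual scores.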
Writing successful multiple-choice items is also extremely difficult. Hughes (1989:61)
lists some of the commonest problems: more than one correct answer, no correct answer,
clues in the options as to which is correct, and ineffective distractors.
Practising a language with multiple-choice items is unlikely to be the best way for students to improve
their command of it, since attention usually goes to improving one’s guessing rather
than to the content of the items. Hughes (1989:61) accordingly considers multiple-choice tests to have harmful
backwash.
Finally, the multiple-choice technique is said to facilitate cheating, because the
responses (a, b, c, d) are so simple that test-takers can easily communicate them to others nonverbally.
All in all, the multiple-choice technique is best suited to relatively infrequent testing of large numbers of
test-takers. To write effective multiple-choice items, Djiwandono (2008:47)
suggests that the test developer take care in formulating the stem, the correct answer, and the
distractors. The stem should be a complete sentence whenever possible. To discourage
guessing, the options should be comparable in form, content, and length; comparable options force the
students to think critically rather than eliminate answers on superficial grounds.
Matching Test
A matching test requires the students to match two parts of a test, usually
interrelated in meaning or content and usually presented as two lists. The first list
consists of statements or questions, while the second consists of responses. To make
a matching test effective, there should be more responses than statements; otherwise the final
matches can be found by elimination, so the surplus keeps the students thinking critically up to the last question.
The supposed advantages of the C-Test over the more traditional cloze are that only exact scoring is
necessary and that shorter passages can be used. A possible disadvantage is that a C-Test passage is harder to
read than a cloze passage.
Dictation is a testing technique in which a passage is read aloud to students, with pauses during
which they have to write down what they heard as accurately as possible (Richards et al., 1992). A dictation
test gives results similar to those obtained from a cloze test, and in predicting overall ability it has the advantage
of involving listening ability. It is also easy to create and relatively easy to administer, but it is certainly not easy
to score. Because of the scoring problem, partial dictation may be considered as an alternative: part
of what is dictated is already printed on the answer sheet, the test-takers simply fill in the gaps,
and the scoring is likely to be more reliable.
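One way to make dictation scoring more tractable is to count how much of the dictated passage a test-taker reproduces in order. The sketch below is purely illustrative; the source prescribes no particular scoring procedure, and word-level longest-common-subsequence matching is an assumption chosen here for simplicity.

```python
# Illustrative dictation scorer: the proportion of the dictated words
# recovered in order, via a longest-common-subsequence match on
# lowercase word tokens.
from difflib import SequenceMatcher

def dictation_score(target: str, transcript: str) -> float:
    """Proportion of dictated words the test-taker reproduced in order (0.0-1.0)."""
    t_words = target.lower().split()
    h_words = transcript.lower().split()
    matched = sum(block.size for block in
                  SequenceMatcher(None, t_words, h_words).get_matching_blocks())
    return matched / len(t_words)

print(dictation_score("the cat sat on the mat", "the cat sat on a mat"))  # ≈0.83
```

A rule like this ignores spelling variants and synonyms, so a human scorer with a key would still be needed for borderline answers; the point is only that partial, mechanical credit assignment is possible.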
Testing Listening
It may seem rather odd to test listening separately from speaking, since the two skills are typically
exercised together in oral interaction. However, there are occasions, such as listening to the radio,
to lectures, or to announcements, when no speaking is required.
The testing of listening involves both macro-skills and micro-skills. The macro-skills of
listening include listening for specific information, obtaining the gist of what is being said, following directions,
and following instructions. The micro-skills include interpreting intonation patterns and
recognizing the functions of structures. At the lowest level are abilities such as distinguishing between
phonemes (for example between /w/ and /v/).
Several types of text can be used in a listening test: monologue, dialogue,
or multi-participant interaction. These can take the form of announcements, talks or lectures, instructions,
directions, etc.
The source of listening test materials can be recordings of radio or television broadcasts,
teaching materials, or even recordings of native speakers that we make ourselves. The most important
consideration is that the recordings be clear and natural.
Several techniques can be used in testing listening: multiple choice, short
answer, information transfer, note-taking, partial dictation, and recordings or live presentations.
Multiple choice The technique has the advantages and disadvantages discussed
previously. In a listening test the problem is greater, because the test-takers must listen to a
passage while reading the options; the options must therefore be short and simple.
Information transfer This technique is useful in testing listening since it makes minimal demands
on productive skills. It can involve such activities as the labeling of diagrams or pictures,
completing forms, or showing routes on a map.
Note taking Where the ability to take notes while listening to a lecture is in question, this activity
can be quite realistically replicated in the testing situation. Test-takers take notes during the talk,
and only after the talk is finished do they see the items to which they have to respond.
Partial dictation Although partial dictation may not be an authentic listening activity, it may be
possible to administer it when no other test of listening is practical.
Recording and live presentation The great advantage of using recordings when administering a
listening test is uniformity in what is presented to the test-takers. This is fine if the
recording is to be listened to in a well-maintained language laboratory or in a room with good
acoustics and suitable equipment. If these conditions cannot be met, then a live
presentation is preferable.
Testing Speaking
The objective of teaching speaking is to develop the ability to interact successfully in
the language, so speaking involves comprehension as well as production. Consequently,
a speaking test should elicit behavior that truly represents the students’ ability and
that can be scored validly and reliably.
The materials for a speaking test include dialogues and multi-participant interactions involving
language functions such as:
- Expressing: thanks, requirements, opinions, comment, attitude, confirmation, apology,
want/need, information, complaints, reasons/justifications.
- Narrating: sequence of events
- Eliciting: information, directions, service, clarification, help, permission.
- Directing: ordering, instructing, persuading, advising, warning
- Reporting: description, comment, decisions.
There are several formats that can be used to test speaking ability. They are interview, interaction
with peers, and response to tape-recordings. Each format has some techniques.
Interview It is the most obvious format for testing speaking.
Questions and requests for information. When posing questions and requests, yes/no questions should be
avoided.
Pictures can also be used to elicit descriptions. Series of pictures (or video sequences) form a
natural basis for narration.
Interaction with peers Two or more test-takers may be asked to discuss a topic, make plans, and
so on.
Role play Students can be asked to assume a role in a particular situation and the tester can act
as an observer.
Response to tape-recordings Uniformity of elicitation procedures can be achieved through
presenting the students only with the same audio- (video-) tape recordings.
Imitation The test-takers hear a series of sentences, each of which they have to repeat in turn.
Scoring will be valid and reliable only if clearly recognizable and appropriate descriptions of criterial
levels are written and scorers are trained to use them. Descriptions of speaking proficiency usually deal
with accent, grammar, vocabulary, fluency, and comprehension, as in the following examples taken from
Hughes (1989).
Proficiency Description
Accent
1. Pronunciation frequently unintelligible
2. Frequent gross errors and a very heavy accent make understanding difficult and require frequent
repetition.
3. Foreign accent requires concentrated listening, and mispronunciations lead to occasional
misunderstanding and apparent errors in grammar or vocabulary.
4. Marked foreign accent and occasional mispronunciations which do not interfere with
understanding.
5. No conspicuous mispronunciation, but would not be taken for a native speaker.
6. Native pronunciation, with no trace of foreign accent.
Grammar
1. Grammar almost entirely inaccurate except in stock phrases.
2. Constant errors showing control of very few major patterns and frequently preventing
communication.
3. Frequent errors showing some major patterns uncontrolled and causing occasional irritation and
misunderstanding.
4. Occasional errors showing imperfect control of some patterns but no weakness that causes
misunderstanding.
5. Few errors, with no patterns of failure.
6. No more than two errors during the interview.
Vocabulary
1. Vocabulary inadequate for even the simplest conversation.
2. Vocabulary limited to basic personal and survival areas (time, food, transportation, family, etc.)
3. Choice of words sometimes inaccurate; limitations of vocabulary prevent discussion of some
common professional and social topics.
4. Professional vocabulary adequate to discuss special interests; general vocabulary permits
discussion of any non-technical subject with some circumlocutions.
5. Professional vocabulary broad and precise; general vocabulary adequate to cope with complex
practical problems and varied situations.
6. Vocabulary apparently as accurate and extensive as that of an educated native speaker.
Fluency
1. Speech is so halting and fragmentary that conversation is virtually impossible.
2. Speech is very slow and uneven except for short or routine sentences.
3. Speech is frequently hesitant and jerky; sentences may be left uncompleted.
4. Speech is occasionally hesitant, with some unevenness caused by rephrasing and groping for
words.
5. Speech is effortless and smooth, but perceptibly non-native in speed and evenness.
6. Speech on all professional and general topics as effortless and smooth as a native speaker’s.
Comprehension
1. Understands too little for the simplest type of conversation.
2. Understands only slow, very simple speech on common social and touristic topics; requires
constant repetition and rephrasing.
3. Understands careful, somewhat simplified speech when engaged in a dialogue, but may require
considerable repetition and rephrasing.
4. Understands quite well normal educated speech when engaged in a dialogue, but requires
occasional repetition or rephrasing.
5. Understands everything in normal educated conversation except for very colloquial or low-
frequency items, or exceptionally rapid or slurred speech.
6. Understands everything in both formal and colloquial speech to be expected of an educated
native speaker.
Besides using clear descriptions of criterial levels, employing more than one scorer will decrease
subjectivity, as described earlier. If two testers are involved in an interview, they can
assess each test-taker independently. If they disagree even after discussion, a third scorer may be
referred to.
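The two-scorer procedure can be sketched in code. The agreement threshold of one band and the use of the median are assumptions made here for illustration; the source says only that a third scorer is consulted when the first two disagree.

```python
def resolve_score(score_a: int, score_b: int, third_scorer=None) -> float:
    """Combine two independent ratings; refer disagreements to a third scorer.

    Resolution rule (an assumption, not specified in the source): ratings
    differing by at most one band are averaged; larger gaps are referred to
    a third scorer, and the median of the three ratings is reported.
    """
    if abs(score_a - score_b) <= 1:
        return (score_a + score_b) / 2
    if third_scorer is None:
        raise ValueError("ratings diverge; a third scorer is needed")
    score_c = third_scorer()  # e.g. a callback that collects the third rating
    return float(sorted([score_a, score_b, score_c])[1])  # the median

print(resolve_score(4, 5))                           # 4.5
print(resolve_score(2, 5, third_scorer=lambda: 4))   # 4.0
```

Taking the median rather than the mean keeps one outlying rater from dragging the reported score, which is one common motivation for involving a third scorer at all.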
Testing Reading
Like listening, reading is a receptive skill. The task of the language tester is therefore to
set reading tasks which will result in behavior that demonstrates their successful completion.
The reading macro-skills (directly related to course objectives) are scanning a text to locate specific
information, skimming a text to obtain the general idea, identifying the stages of an argument, and identifying
examples presented in support of an argument. The micro-skills underlying reading are identifying the
referents of pronouns, using context to guess the meaning of unfamiliar words, and understanding the relations
between parts of a text.
The reading texts can be taken from textbooks, novels, newspapers, magazines, academic journals,
letters, timetables, etc. They can take the form of newspaper reports, advertisements, editorials, etc.
The techniques that might be used to test reading are multiple choice, true/false,
completion, short answer, guided short answer, summary cloze, information transfer, identifying the order of
events, identifying referents, and guessing the meaning of unfamiliar words from context.
Multiple Choice The test-takers provide evidence of successful reading by putting a mark
against one out of a number of alternatives. Its strengths and weaknesses have been presented
earlier.
True/false The test-takers should respond to a statement by choosing one of the two choices,
true or false.
Completion The students are required to complete a sentence with a single word, for example:
……………was the man responsible for the first steam railway.
Short answer It is in the form of questions and requires the students to answer briefly, for
example:
According to the author, what does the increase in divorce rates show about people’s expectations of
marriage?
Guided short answer This is an alternative to the short answer in which the students are guided toward
the intended answer. They have to complete sentences presented to them, for example:
Complete the following based on the fourth paragraph!
‘Many universities in Europe used to insist that their students speak and write only ………………… Now
many of them accept ………………….. as an alternative, but not a ………………. of the two.’
Summary cloze A reading passage is summarized by the tester, and gaps are then left in the
summary for completion by the test-takers. This is really an extension of the guided short
answer.
Information transfer One way to minimize demands on writing by test-takers is to require them to
show successful completion of a reading task by supplying simple information in a table, following
a route on a map, labeling a picture, and so on.
Identifying order of events, topics, or arguments The test-takers can be required to number the
events etc.
Identifying referents One of the ‘microskills’ listed previously was the ability to identify referents.
An example of an item to test this is:
What does the word ‘it’ (line 25) refer to? ……………………
Guessing the meaning of unfamiliar words from context This is another microskill mentioned
earlier. Items may take the form:
Find a single word in the passage (between lines 1 and 25) which has the same meaning as ‘making of
laws’.
The above techniques are among the many techniques of testing reading. In scoring the reading test,
Hughes (1989) suggested that errors of grammar, spelling or punctuation should not be penalized, as
long as it is clear that the test-taker has successfully performed the reading task which the item set. The
function of a reading test is to test reading ability. To test productive skills at the same time simply makes
the measurement of reading ability less accurate.
Testing Writing
The best way to test people’s writing ability is to get them to write directly; even
professional testing institutions have not been able to construct indirect tests of writing that measure
the ability accurately. There are three things we should consider in developing a good writing test:
2. The tasks should elicit samples of writing which truly represent the students’ ability.
There are at least two things we can do to obtain samples that properly represent each
student’s ability. The first is to set as many tasks as is feasible, because students’ performance
on the same task is not consistent and they are sometimes better at some tasks than at others.
Setting many different tasks therefore enables the test developer to see the students’
performance as objectively as possible.
The second way to elicit students’ writing ability is to test only writing ability. Another ability
which at times interferes with the accurate measurement of writing is reading. It is acceptable to expect
students to be able to read simple instructions, but very difficult and long instructions should be avoided
in a writing test, as they will prevent the students from performing adequately on the writing
task. One way to reduce dependence on reading ability is to make use of illustrations in the
form of diagrams, series of pictures, or graphs.
A method of scoring which assigns a single score on the basis of an overall impression of a piece of
writing is said to be holistic, as in the following example.
Holistic Scoring:
5 The main idea is stated very clearly, and there is a clear statement of change of opinion. The essay
is well organized and coherent. The choice of vocabulary is excellent. There are no major or minor
grammatical errors. Spelling and punctuation are fine.
4 The main idea is fairly clear, and change of opinion is evident. The essay is moderately well organized
and is relatively coherent. The vocabulary is good, and there are only minor grammatical errors.
There are few spelling and punctuation errors.
3 The main idea and a change of opinion are indicated but not so clearly. The essay is not well organized
and is somewhat lacking in coherence. The vocabulary is fair, and there are some major and minor
grammatical errors. There are a fair number of spelling and punctuation errors.
2 The main idea and change of opinion are hard to identify in the essay. The essay is poorly organized
and relatively incoherent. The use of vocabulary is weak, and grammatical errors appear frequently.
Spelling and punctuation errors are frequent.
1 The main idea and change of opinion are absent in the essay. The essay is poorly organized and
generally incoherent. The use of vocabulary is very weak, and grammatical errors appear very
frequently. Spelling and punctuation errors are very frequent.
A method of scoring which requires a separate score for each of a number of aspects of a writing task is said
to be analytic. The following is an example of analytic scoring provided by Cohen (1994:328-329).
Analytic Scoring:
Content
5 – Excellent : main ideas stated clearly and accurately, change of opinion very clear
4 – Good : main ideas stated fairly clearly and accurately, change of opinion relatively clear
3 – Average : main ideas somewhat unclear and inaccurate, change of opinion somewhat weak
2 – Poor : main ideas not clear or accurate, change of opinion weak
1 – Very Poor : main ideas not at all clear or accurate, change of opinion very weak
Organization
5 – Excellent : well organized and perfectly coherent
4 – Good : fairly well organized and generally coherent
3 – Average : loosely organized but main ideas clear, logical but incomplete sequencing
2 – Poor : ideas disconnected, lacks logical sequencing
1 – Very poor : no organization, incoherent
Vocabulary
5 – Excellent : very effective choice of words and use of idioms and word forms
4 – Good : effective choice of words and use of idioms and word forms
3 – Average : adequate choice of words but some misuse of vocabulary, idioms and word forms
2 – Poor : limited range, confused use of words, idioms, and word forms
1 – Very Poor : very limited range, very poor knowledge of words, idioms, and word forms
Grammar
5 – Excellent : no errors, full control of complex structure
4 – Good : almost no errors, good control of structure
3 – Average : some errors, fair control of structure
2 – Poor : many errors, poor control of structure
1 – Very Poor : dominated by errors, no control of structure
Mechanics
5 – Excellent : mastery of spelling and punctuation
4 – Good : few errors in spelling and punctuation
3 – Average : fair number of spelling and punctuation errors
2 – Poor : Frequent errors in spelling and punctuation
1 – Very poor : no control over spelling and punctuation
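Since analytic scoring yields a separate band for each aspect, the bands must eventually be combined into one reported score. The sketch below is illustrative only: the source prescribes no combination rule, and the equal default weights are an assumption.

```python
# Illustrative only: the aspect names follow the Cohen-style scale above,
# but the combination rule and any weighting are assumptions, not taken
# from the source.
ANALYTIC_ASPECTS = ("content", "organization", "vocabulary", "grammar", "mechanics")

def analytic_total(scores, weights=None):
    """Combine per-aspect bands (1-5) into a single writing score."""
    for aspect in ANALYTIC_ASPECTS:
        if not 1 <= scores[aspect] <= 5:
            raise ValueError(aspect + " band must be between 1 and 5")
    if weights is None:
        weights = {aspect: 1.0 for aspect in ANALYTIC_ASPECTS}  # equal weights
    return sum(scores[a] * weights[a] for a in ANALYTIC_ASPECTS)

essay = {"content": 4, "organization": 3, "vocabulary": 4, "grammar": 3, "mechanics": 5}
print(analytic_total(essay))  # 19.0 (out of a possible 25)
```

If an institution values, say, content over mechanics, the weights can encode that priority while the diagnostic per-aspect bands remain available, which is precisely the appeal of analytic over holistic scoring.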
The choice between holistic and analytic scoring depends on the purpose of testing (Hughes,
1989:97). If diagnostic information is required, analytic scoring is essential. If the scoring is carried
out by a small, well-coordinated group, holistic scoring may be appropriate; analytic scoring is preferable when
scoring is conducted by heterogeneous, less well-trained scorers or at a number of different places.
Whichever is used, multiple scoring involving two or more scorers is recommended.
Testing Grammar
The place of grammar in language teaching is sometimes debated. Some hold that
control of grammatical structure is the core of language ability and that it would be
unthinkable not to test it. For that reason, most proficiency tests include a grammar section, also because
large numbers of grammar items can be administered and scored within a short
period of time.
By contrast, others argue that mastery of the language cannot be accurately predicted by measuring
control of the grammatical structures believed to underlie it. Moreover, the backwash effect of a grammar
test may encourage the learning of grammatical structures in isolation, with no apparent need to use
them. Considerations of this kind have led to the absence of a grammar component from
some well-known proficiency tests.
However, whether or not grammar has an important place in an institution’s teaching, it has to
be accepted that grammatical ability has an important influence on performance. Successful
academic writing, for example, depends to some extent on command of elementary
grammatical structures. It can therefore be said that there is still room for a grammar component
in a language test.
The specification of a grammar test should be in line with the teaching syllabus if the syllabus lists
the grammatical structures to be taught. When there is no such list, the structures must be inferred from
textbooks or other teaching materials.
There are several techniques that can be used to test grammar. Multiple choice is one alternative,
although it is not recommended because of the difficulty of finding appropriate distractors. The other
proposed techniques are paraphrase, completion, and modified cloze.
Paraphrase This technique requires the students to write a sentence equivalent in meaning to
one that is given. It is helpful to give part of the paraphrase in order to restrict the students to the
grammatical structures being tested. An example of testing passive past continuous form would
be:
When we arrived, a policeman was questioning the bank clerk.
When we arrived, the bank clerk ……………………………..
Completion This technique can be used to test variety of structures. The following is an example
of testing interrogative forms:
In the following conversation, some sentences have been left incomplete. Complete them
suitably. Read the whole conversation before you begin to answer the question.
(Mr. Cole wants a job in Mr. Gilbert’s export business. He has come from an interview.)
Mr. Gilbert: Good morning, Mr. Cole. Please come in and sit down. Now let me see. (1)
Which school ……………………………………………………….?
Mr. Cole: Whitestone College
Mr Gilbert: (2) And when …………………………………………………………...?
Mr. Cole: In 1999, at the end of the summer term.
Mr. Gilbert: (3) And since then, what ……………………………………………….?
Mr. Cole: I worked in a bank for a year. Then I took my present job, selling cars. But I
would like a change now.
Mr. Gilbert: (4) Well, what sort of a job ……………………………………………?
Mr. Cole: I’d really like to work in your Export Department.
Modified cloze This technique presents a passage from which instances of the target structure, such as articles, have been deleted for the students to supply.
In the scoring process, the scorer should score only what the item is testing, not something else.
For instance, when the focus is on testing pronouns, an error such as a missing third-person -s should not be
penalized. Finally, for valid and reliable scoring of grammar items, careful preparation of the scoring key
is necessary.
Testing Vocabulary
The debate on testing vocabulary parallels the debate on testing grammar. Clearly, knowledge of
vocabulary is essential to the development and demonstration of linguistic skills, but according to some,
that does not mean it should be tested separately.
On the other hand, some argue that time should be devoted to the regular, conscious teaching
of vocabulary, in which case it is important to test vocabulary in an achievement test after teaching.
The specification for a vocabulary achievement test should be based on the items presented to the
students in vocabulary teaching. For a placement test, the vocabulary tested should be drawn
from one of the commonly published word lists.
Testing vocabulary productively is so difficult that information on receptive ability is usually regarded as
sufficient. The following techniques are suggested for possible use in achievement tests.
Pictures The use of pictures can limit the students to lexical items that we have in mind. Some
pictures are provided and the students are required to write down the names of the objects. This
method of testing vocabulary is obviously restricted to concrete nouns which can be drawn.
Definitions This may work for a range of lexical items. The following is an example of such test.
A …… is a person who looks after our teeth.
……… is frozen water.
……… is the second month of the year.
But not all items can be identified using a definition. Nor can all words be defined entirely in words
more common or simpler than themselves.
Gap filling This can take the form of one or more sentences with a single word missing.
Because of the snow, the football match was ….. until the following week.
I ….. to have to tell you this, Mrs. Jones, but your husband has had an accident.
To avoid multiple acceptable answers, the first letter of the word, or even the number of
letters, can be given.
Testing Pronunciation
Heaton (1990) includes pronunciation in the testing of speaking skill. There are at least three
techniques for testing pronunciation: pronouncing words in isolation, pronouncing words in
sentences, and reading aloud.
Pronouncing words in isolation The importance of listening in almost all tests of speaking,
especially those of pronunciation, should never be underestimated. It is impossible for students
to pronounce a word correctly unless they first hear and recognize its precise sound.
In the early stages of learning English, it is useful to base our pronunciation tests on minimal
pairs: that is, pairs of words which differ in only one sound, for example ship/sheep or pen/pan.
Reading aloud Reading aloud can offer a useful way of testing pronunciation, provided that we
give a student a few minutes to look at the text first. When choosing suitable texts to
read aloud, it is useful to imagine actual situations in which someone may read something aloud;
for example, people read news aloud on TV, as well as letters or instructions.
Discussion
1. Complete the cloze test passage in this chapter and say what you think each item is testing.
2. Discuss when and how multiple-choice tests can be used appropriately in an English classroom.
3. What advantages do we gain by using dictation to test language proficiency?
4. Design a test that requires the test-takers to draw (or complete) a simple picture after listening to an
instruction.
5. Can reading aloud be counted as a technique for testing reading ability?
6. Do you think grammar should be tested separately?
7. What do you think is the best way of testing writing ability?
CHAPTER V
DESIGNING CLASSROOM TESTS AND STANDARDIZED TESTS
This chapter provides the teachers with step-by-step procedures in designing classroom tests
and standardized tests. Most of the explanation is summarized from Brown (2004).
Table 5.1 Example of Selected Objectives for a unit in a low-intermediate integrated-skills course
(Brown, 2004:50)
In reviewing the objectives of a unit, we cannot possibly test every one. We therefore need to
choose a feasible subset of the objectives to test.
Drawing Up Test Specifications
Test specifications for classroom use can be a simple and practical outline of the test. They
will simply comprise (a) a broad outline of the test, (b) the skills being tested, and (c) the test
items. The following test specification, based on the objectives in Table 5.1, provides a clearer
description of how to write one.
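The three-part specification (a)-(c) can be pictured as a simple data structure. The field contents below are invented placeholders for illustration, not the book's actual specification.

```python
# A minimal model of the three-part test specification described above.
# The example values are hypothetical, not taken from the source.
from dataclasses import dataclass, field

@dataclass
class TestSpecification:
    outline: str                                  # (a) broad outline of the test
    skills: list = field(default_factory=list)    # (b) skills being tested
    items: list = field(default_factory=list)     # (c) test item formats

spec = TestSpecification(
    outline="45-minute unit test covering listening, speaking, and writing",
    skills=["listening for detail", "oral interview responses", "paragraph writing"],
    items=["multiple choice (listening)", "interview questions", "writing prompt"],
)
print(len(spec.skills))  # 3
```

Keeping the specification this explicit, even on paper rather than in code, makes it easy to check later that every drafted item traces back to a stated skill and objective.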
Now we are ready to draft the test items. To provide a sense of authenticity and interest, we
base the items on a recent TV sitcom that we have used in class, which depicted a loud, noisy party
with lots of small talk. Finally, we have the following samples of test items
for each section.
Writing
Directions: Write a paragraph about what you liked or didn’t like about one of the characters
at the party in the TV sitcom we saw.
However, the above test items are quite traditional. It should be admitted that the format of some
of the items is unnatural, which lowers their authenticity. The test items therefore need to
be revised.
In revising our draft, we need to ask some important questions:
1. Are the directions to each section absolutely clear?
2. Is there an example item for each section?
3. Does each item measure a specified objective?
4. Is each item stated in clear, simple language?
5. Does each multiple choice item have appropriate distractors?
6. Is the difficulty of each item appropriate for your students?
7. Is the language of each item sufficiently authentic?
8. Do the sum of the items and the test as a whole adequately reflect the learning objectives?
In the current example that we have been analyzing, our revising process is likely to result in at least
four changes or additions:
1. In both the interview and writing sections, we recognize that a scoring rubric will be essential. For
the interview we decide to create a holistic scale, and for the writing section we devise a
simple analytic scale that captures only the objectives we have focused on (see the previous
chapter).
2. In the interview questions, we realize that follow-up questions may be needed for students who
give one-word or very short answers.
3. In the listening section, we intended choice “c” as the correct answer, but we realize that
choice “d” is also acceptable. We need an answer that is unambiguously incorrect, so we shorten
it to “d. Around eleven o’clock.” We also note that providing the prompts for this section on an audio
recording will be logistically difficult, so we opt to read these items to the students.
4. In the writing prompt, we can see how some students would not use the words so or because,
which were in our objectives, so we re-word the prompt: “Name one of the characters at the
party in the TV sitcom we saw. Then use the word so at least once and the word because
at least once to tell why you liked or didn’t like that person.”
Ideally, we should try out our tests on students not in our class before actually administering
them. But in daily classroom teaching, a tryout phase is almost impossible; alternatively, we
could enlist the aid of a colleague to look over our test.
In the final revision of the test, we should imagine that we are students taking it: go through each
set of directions and all the items slowly and deliberately, preferably timing ourselves. If the test should be
shortened or lengthened, we make the necessary adjustments, ensuring that everything is
in order.
B. Standardized Tests
A standardized test presupposes certain standard objectives, or criteria, that are held constant
from one form of the test to another. The criteria in large-scale standardized tests are designed to apply
to a broad band of competencies that are usually not exclusive to one particular curriculum. A good
standardized test is the product of a thorough process of empirical research and development. It dictates
standard procedures for administration and scoring. And finally, it is typically norm-referenced: its
goal is to place test-takers on a continuum across a range of scores and to differentiate them
by their relative ranking.
Many people are under the incorrect impression that all standardized tests consist of items
presented in multiple-choice format. While it is true that many standardized tests adopt a multiple-
choice format for the sake of objective scoring, multiple choice is not a prerequisite characteristic of a
standardized test. Human-scored standardized tests also exist, such as the Test of Spoken English
(TSE) and the Test of Written English (TWE) produced by Educational Testing Service (ETS).
Standardized tests have both advantages and disadvantages. The advantages include a
ready-made, previously validated product that frees the teacher from having to spend
hours creating a test; administration to large groups can be accomplished within reasonable time limits;
and with multiple-choice formats the scoring procedures are simple.
The disadvantages of standardized tests center largely on their inappropriate use,
for example using an overall proficiency test as an achievement test simply for the convenience
of standardization. Teachers should therefore be careful in how they use standardized tests.
Discussion
1. Following the steps for developing classroom tests, make your own English test for the
first grade of junior high school. Work in groups.
2. In pairs or in small groups, compile a brief list of pros and cons of standardized testing.
3. Tell the class about the worst test experience you have ever had, and briefly analyze what made the
experience so unbearable.