SUMMATIVE TEST
(Teacher-made test for the 1st semester of the 1st class at SMPN 4 Jambi, 2004/2005
school year)
I. Speaking Test
It was held on Monday, December 27th, 2004. The teacher divided the students into
groups, then asked each group to prepare a conversation on “Introduction”
and perform it in front of the class. The speaking tests for Classes 1A and 1B were
held jointly.
A. Listen to the song and find words about job! Put the job words in their right places!
PENNY LANE
The Beatles
In Penny Lane there is a (1)_______________ showing photographs,
of every head he’s had the pleasure to know.
And all the people that come and go,
stop and say hello.
Penny Lane is in my ears and in my eyes.
There, beneath the blue, suburban skies,
I sit, and meanwhile back.
Penny Lane is in my ears and in my eyes.
There, beneath the blue, suburban skies,
Penny Lane.
____________________________________
____________________________________
____________________________________
____________________________________
____________________________________
____________________________________
___________________________________ .
DAVID BECKHAM
Teachers room R. 3A R. 3B R. 2B
X : Excuse me, where is the Language Lab please?
Y : It is __________________________
a. Beside the Grade 1B room. b. In front of music room
c. Beside the library d. Between the hall and library
6. Andrew never _____________ about his schedule.
a. think b. thinks
c. thought d. is thought
7. How __________ Anna and July ___________ to school?
a. do, go b. does, go
c. are, go d. is, go
8. A living room is __________________
a. a place to wash b. a place to grow flower
c. a place to cook d. a place to relax to talk
9. A dining room is __________________
a. a place to eat b. a place to cook
c. a place to keep a car d. a place to sleep
10. We need ______________ sugar to make ________________ of coffee.
a. a loaf, a glass b. a spoon, a plate
c. a loaf, a plate d. a spoon, a cup
Answer the questions based on the following text for questions number 11 – 15!
Tania’s Schedule
Good Luck
Appendix
RELIABILITY AND VALIDITY TEST
(BACHMAN AND PALMER (1996) FRAMEWORK OF EVALUATION)
The Bachman and Palmer (1996) framework for evaluating test usefulness will be
used to evaluate the teacher-made English test for the 1st semester of the 1st class
at SMPN 4 Jambi, 2004/2005 school year. The questions for the logical evaluation of
usefulness posed by Bachman and Palmer will be identified in italics.
RELIABILITY
1) To what extent do characteristics of the test setting vary from one
administration of the test to another?
All students take the tests in comfortable classrooms without air-conditioning
and with minimal background noise. All students took the speaking test on
December 27th, 2004. The written test was held on December 28th, 2004.
The input does not vary, but the responses do: they are conversation,
finding missing words by listening to a song, writing a descriptive
paragraph, and multiple choice.
VALIDITY
6) Is the language ability construct for this test clearly and unambiguously
defined?
Yes, it is. There are reading, grammar, vocabulary, and conversational
components, but there are only a few written questions, which is not a valid
basis for interpreting English ability at all.
7) Is the language ability construct for the test relevant to the purpose of the
test?
The purpose of the semester test for the 1st class at SMPN 4 Jambi in the 2004/2005
school year is to rank students and to set a pass level for proceeding to the 2nd class.
Students are ranked and compared among classes. The test has speaking, listening,
and writing components. A multiple choice test assesses their
reading, grammar, and vocabulary ability. It is an active and passive test
construct.
8) To what extent does the test task reflect the construct definition?
The speaking test reflects conversational ability. The written test reflects
listening ability, writing ability, reading ability, and grammatical ability.
10) Will the scores obtained from the test help us to make the desired
interpretations about the test takers' language ability?
It is an active and passive test. Although there are just a few questions,
listening, speaking, and writing ability are tested. The results can help in
interpreting learners' English ability.
11) What characteristics of the test setting are likely to cause different test takers
to perform differently?
All students write their answers on folio paper. All conditions are the same.
This aspect is satisfactory.
12) What characteristics of the test rubric are likely to cause different test takers to
perform differently?
The instructions for individual questions are in English and are basic in
structure. No instruction is likely to cause test takers to perform
differently.
13) What characteristics of the test input are likely to cause different test takers to
perform differently?
Incorrect grammar, words, spelling, and punctuation are input that can cause
problems for test takers. This largely does not occur in these tests: incorrect
grammar is found only in question 10 of Part C. Incorrect test questions
invalidate the test.
14) What characteristics of the expected response are likely to cause different test
takers to perform differently?
Once again, having several possible correct answers in a question, whilst
marking only one as correct, can cause major problems for test takers.
Each student's responses to the speaking and writing tests differ according
to their competence. These answers are scored using a scoring rubric. The
listening and multiple choice questions have only one possible correct
answer.
15) What characteristics of the relationship between input and response are likely
to cause different test takers to perform differently?
The test construct includes Indonesian boys' and girls' names, which do not
cause problems when trying to identify gender or subject. This is quite good.
There is no question that has cultural bias. The questions have input
commonly known to Indonesians.
AUTHENTICITY
16) To what extent does the description of tasks in the TLU [target language use]
domain include information about the setting, input, expected response and
relationship between input and response?
The setting is included in some reading questions, but absent on grammatical
and vocabulary questions. Question 7 epitomizes the confusion over expected
response and input. The expected response by the test constructor is "a)
which" [photographs]. However, "b) what" and even "c) whose" are
grammatically acceptable. The expected response must match the input given
in an authentic valid test. Kehoe (1995, para. 3) states, "As a rule one is
concerned with writing stems that are clear and parsimonious, answers that are
unequivocal and chosen by the students who do best on the test." Question 28
lacks input for the response. There is no specific information on which to base
the answer of "b". In essence, the TLU needs more contextual support and only
one correct response. Question 42 relies heavily on knowledge of an
Indonesian folk tale. Without knowledge of the folk tale, construction of the
paragraph could differ from the expected response. There is a very small
minority of students in Indonesia who do not know this folk tale, for example,
foreign nationals sitting the tests.
17. To what extent do the characteristics of the test task correspond to those of
the TLU tasks?
Conversational items tested as multiple choice items are far from authentic!
Thus, question 53 in reality could be a, b, c, or d depending on context. There
are also many grammatical mistakes, making the test non-authentic.
INTERACTIVENESS
18. To what extent does the task presuppose the appropriate area or level of
topical knowledge, and to what extent can we expect the test takers to have
this area or level of topical knowledge?
As previously mentioned, question 42 presupposes topical knowledge of an
Indonesian folk tale. Overall, the topical knowledge is appropriate for 14- to
15-year-old students, for example, the areas of media, sickness, and sport. There
is generally no great need to use specific topical knowledge to answer
questions.
19. To what extent are the personal characteristics of the test takers included in
the design statement?
The design is for final year students of junior high schools in Indonesia in the
Jakarta district. It is assumed all test takers are Indonesian, aged 14 to 15, and
all speak Indonesian. All have done the pre-UAN test and it is assumed all
have completed nine years of formal schooling. It is to be noted that there is a
tiny minority of foreign nationals who also sit the test, but the government
assumes and expects that they get schooled in international schools.
20. To what extent are the characteristics of the test tasks suitable for test
takers with the specified personal characteristics?
The test tasks are very much appropriate for the average and lower ability
students. In this regard the test is suitable, but it fails to take into account the
higher ability students, for whom the tasks are at a functional level far below
their ability. That is, some students who achieve excellent results in native
speaking English tests, for the same educational level in English, are tested on
tasks that neither challenge nor address their level of ability. Year nine students
at the school where I teach have performed well above average for the 2001
year 9/10 Australian English schools' competition test items, with one student
scoring 100%. However, many students from Indonesia fail the UAN test.
Thus there is a big range of ability, whereas the test tasks do not cover the
whole range.
21. Does the processing required in a test task involve a very narrow range of
areas of language knowledge?
As discussed previously, the tasks engage a very limited range of language
knowledge.
22. What language functions, other than the simple demonstration of language
ability, are involved in processing the input and formulating a response?
None.
25. Is this test likely to evoke an affective response that would make it
relatively easy or difficult for the test takers to perform at their best?
No. The topics are culturally sensitive and non-emotive.
IMPACT
26. To what extent might the experience of taking the test or the feedback
received affect characteristics of test takers that relate to language use?
This test is passive: it tests only understanding of the language, neglecting
higher skills such as processing, comparing, debating and even production of
language. Hughes (2003, p. 1) claims, "If the
skill of writing, for example, is tested only by multiple choice items then there
is great pressure to practise such items rather than practise the skill of writing
itself. This is clearly undesirable." The UAN test aims to test grammar, but
students are not required to construct any sentences. Students are expected to learn
conversational conventions, but they are not tested orally. Research by Hadiatmaja,
cited by Somantri (2003, para. 6), observes that Indonesian school students
learning English "are passive and receptive only [translation]." Thus the
backwash effect of the UAN tests can be seen in students' passive and
receptive skill focus with problems in construction of discourse in speaking
and writing.
27. What provisions are there for involving test takers directly, or for
collecting and utilizing feedback from test takers, in the design and
development of the tests?
There are no known provisions. Students do not have the opportunity to
provide any feedback or have any input into the development of the test.
28. How relevant, complete, and meaningful is the feedback that is provided
to test takers?
Correct answers and students' responses are given, showing their mistakes. A
final score and school ranking are also given. These are just statistics, and
students are not given any explanation of why the answers are correct. No
information is given on their language ability or mastery of subject matter. It is
difficult for the individual teacher to provide good feedback due to the number
of alternative correct answers.
29. Are decision procedures and criteria applied uniformly to all groups of test
takers?
Yes. All schools follow the same criteria of the UAN score and scores are
objective, independent of participation, attendance, attitude or other factors.
30. How relevant and appropriate are the test scores to the decisions to be
made?
The test score is the single factor in determining the grade and whether
the student can proceed to senior high school.
31. Are test takers fully informed about the procedures and criteria that will be
used in making decisions?
32. Are these procedures and criteria actually followed in making the
decisions?
Yes. There are no exceptions, though those who fail may sit for the test again.
33. How consistent are the areas of language ability to be measured with those
that are included in teaching materials?
Teachers' teaching materials usually match the language ability to be
measured, as is the case in the majority of schools. However, schools such as
my school do not follow the national curriculum per se and go well beyond it,
including active skills and listening, speaking, and writing skills, in
addition to the reading and grammar of the national curriculum. These
schools, and their teachers and students, feel uncomfortable with the test as it
neither matches their learning content nor tests most of their ability.
34. How consistent are the characteristics of the test and test tasks with the
characteristics of teaching and learning activities?
This is dependent on the individual teacher. Due to the passive nature of the
tests, a lot of students learn English in a passive manner and, as a result,
Artsiyanti (2002, para. 6) claims, "Students do not know when structures
[grammar] have to be used and how to apply them in everyday life
[translation]." The test tasks contribute to a negative backwash effect in the
classrooms.
35. How consistent is the purpose of the test with the values and goals of
teachers and of the instructional program?
The test is far from achieving the goals of English at IPEKA. Due to its
limitation to passive, receptive skills, it is also not consistent with the goals of
other schools' English courses, even though it is consistent with the national
curriculum.
36. Are the interpretations we make of the test scores consistent with the
values and goals of society and the education system?
If wages are a reflection of worth, society in Indonesia does not value teachers
as highly as society in Western countries does. The average wage of a
teacher is Rp 700,000 to Rp 800,000 (just over $100 AUD) a month (Sistem
pendidikan harus dirombak secara radikal, 2004). Schools are often
dilapidated and some students cannot afford their tuition. Many language
teachers do not have adequate mastery of English to teach effectively and
efficiently in schools in Indonesia. Somehow test scores are regarded as highly
valid and respected by most as the major measure of performance in English
and as a means to determine the academic progression of students to the next
level.
There are more pressing concerns here, such as terrorism, hunger, and work. The
acceptance by society and the educational system of the test scores should not
equate with the usefulness of the test. The UAN needs reform!
37. To what extent do the values and goals of the test developer coincide or
conflict with those of society and the education system?
There is agreement with the education system and most of society.
38. What are the potential consequences, both positive and negative, for
society and the education system, of using a test in this particular way?
The backwash effect contributes to passive learners and English speakers who are
not confident in producing English, which is the case in Indonesia today.
39. What is the most desirable positive consequence, or the best thing that
could happen as a result of using the test in this particular way, and how likely
is this to happen?
The test could act as a motivating factor for some in mastering passive
English. This is still not likely.
40. What is the least desirable negative consequence, or the worst thing that
could happen as a result of using the test in this particular way, and how likely
is this to happen?
As mentioned previously, many students will learn an understanding of
reading and grammar in a passive and receptive manner without learning
active skills and to the exclusion of speaking, listening, and writing. This is
highly likely as it is already evidenced throughout the country.
PRACTICALITY
41. What type and relative amounts of resources are required for: (a.) the
design stage, (b.) the operationalization stage and, (c.) the administration
stage?
There is not much money available for the UAN tests, nor time, nor expertise.
The design is done by a few local English teachers with no resources provided
by the government, apart from the syllabus and test construction design. The
operation is done by a central team using Scanton computer marking. The
administration of the tests is by individual schools.
42. What resources will be available for carrying out a. b. and c. above?
Teachers, computers, printers and paper are available. Resources are very
limited in Indonesia due to its massive student population and limited budget.
CONCLUSION
The UAN is not very useful. It is not valid, authentic, or interactive, and it has
negative impacts on learning. It is, however, reasonably reliable and practical.
All in all, the UAN fails to be useful because of its test construction, which is
riddled with mistakes and contains many alternative multiple choice answers
that are correct. Hughes (2003, p.2) claims, "Students' true abilities are not
always reflected in the test scores that they obtain." This is the case with the
UAN test.