
Lesson Planning and Assessment 684 Lecturer: Paul Mercieca Assessment 3 - Research Task 2 Review Task in Assessment and

Evaluation of an Achievement Test Used in Mac Dinh Chi High School


Introduction

In the era of integration, English is becoming the global language. It offers opportunities for better learning, work and personal development. More and more countries are making English a compulsory subject at schools and colleges, and Mac Dinh Chi High School in Vietnam is no exception: English has been a required subject since the school was set up.

Learning has to go with testing. Prodromou (1995, p. 14) says that the test is the final stage of a learning process. Testing is necessary to examine how effective learning is and what needs to be adjusted in the course. A test helps diagnose the learning process and so triggers remedial action. A test is also a useful means of assessment and evaluation. Assessment means measurement. According to Bachman (1990, pp. 125-126), measurement is the process of quantifying the characteristics of persons according to explicit procedures and rules. Evaluation is defined by Weiss (1972, qtd. by Bachman, 1990, p. 127) as a process of systematically gathering information to make decisions. A test is designed to elicit specific information from which the tester can make inferences about certain features of an individual (Carroll, 1968, p. 46, qtd. by Bachman, 1990, p. 126). A test may have negative or positive effects on learning, called backwash.

A student at Mac Dinh Chi High School takes about 8 to 13 tests a school year on average. They are all timed achievement tests lasting 15 minutes, one period or 60 minutes. In this paper, the researcher examines a test taken at the end of semester two: its strengths and weaknesses, its effects on learning, and recommendations for improvement.

Statement of the problems

The test in question is an achievement test for grade 11 (intermediate level) Vietnamese students at Mac Dinh Chi High School. Students take it at the end of the second semester. Its purpose is to see how well students have achieved the objectives of the course, to provide feedback for both the students and the teacher and thereby suggest necessary adjustments, and to assign each student a final grade. The test is very important to students and therefore rather stressful.

The test has five parts with 36 items. 61% of the test (22/36 items) tests language elements and only 39% (14/36 items) tests language skills, which are restricted to reading and writing. It is a good thing that the test combines features of the communicative, integrative and structural approaches. Part III (Reading) has features of the communicative approach, as it is an integrated task which tests language as it is used in real life. The topics (alternative energy and deforestation) are practical and up to date. The two reading passages test students' ability to understand a written text and to locate specific details in it. At the same time, they test language usage, that is, grammar and vocabulary used in context. Communicative competence requires considerable mastery of the grammar of a language. Part III.B (the cloze test) has features of the integrative approach, as it assesses students' ability to use two or more skills or elements simultaneously: it tests reading, guessing, translating and deducing skills together with structure and vocabulary. The other parts have features of the structural approach, as language elements are measured separately. For example, Part I tests phonology, Part II (Error identification) and Part V (Sentence transformation) test grammar, and Part IV tests grammar and vocabulary separately.

The test is highly reliable, as 89% of it uses the multiple-choice technique. The other 11% is also very objective, as each item has only one correct answer or a limited set of correct answers. In general, the test meets the requirement of reliability, and it can be marked quickly and easily. However, it is monotonous because it uses the same technique throughout, which may reduce the variety and validity of the test.

The test lacks validity.
Part V lacks face validity, as it appears to test writing ability but actually checks memorization of structures. The test lacks content validity, as it does not test two very important skills, listening and speaking. Micro-skills in reading are not tested either. In grammar, it omits some important points from semester 2, such as tag questions and cleft sentences. Therefore, the test does not test what it should test, which makes assessment and evaluation inaccurate.

The test also lacks authenticity, as it does not reflect students' needs. It does not include listening and speaking skills, so students learn the language but cannot communicate in it. Fragmented language is tested, as it is taken out of context. Furthermore, the multiple-choice technique is reliable but does not resemble natural language use in real life.

Such qualities of the test lead to backwash on the learning process. The test paper is developed by an individual teacher of the school and checked by another teacher. This has some positive backwash, as the teacher best understands her students and what should be tested, so she is well placed to design a valid and reliable test. However, it may cause negative backwash because the teacher may ignore or pay less attention to items which are not tested. Preparation for the exam is emphasized over learning itself, and she may teach in a way that merely helps students pass the exam. For example, speaking and listening will receive little attention, and writing seems to be restricted to mechanical writing only. Most parts of the test check structure, vocabulary and reading skill, which means these receive more focus in class. Such a way of learning and teaching hinders students' ability to use the language communicatively, and students fail to achieve the objectives of the course.

The test is stressful and frightening, as 32 of its 36 items check memorization. This pressure creates strong extrinsic motivation. However, it decreases interest in learning and intrinsic motivation, which is more meaningful, long-lasting and effective. Learning becomes a boring, mechanical process which requires great attention and effort.

In summary, the test is reliable but fails to meet the standards of validity and authenticity, which entails negative backwash. Teachers should be aware of such backwash in order to design a valid and reliable test and to teach all that should be taught. Tests ought to create positive backwash, such as encouraging students and showing them their strengths and weaknesses for self-improvement.
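The item counts quoted in this section can be checked against the test paper in the Appendix. The short Python sketch below tallies the items per part and reproduces the 61% / 39% split and the 89% share of multiple-choice items. The dictionary keys and category labels are illustrative names only, and counting Part V as a writing skill is an assumption inferred from the statement that the tested skills are restricted to reading and writing.

```python
# Item counts per part, read off the test paper in the Appendix
# (items 1-8, 9-12, 13-22, 23-32 and 33-36).
# Category labels are illustrative; Part V is counted as "skills"
# (writing) on the assumption described in the text above.
PARTS = {
    "I. Pronunciation": {"items": 8, "category": "elements", "multiple_choice": True},
    "II. Error identification": {"items": 4, "category": "elements", "multiple_choice": True},
    "III. Reading": {"items": 10, "category": "skills", "multiple_choice": True},
    "IV. Vocabulary and structure": {"items": 10, "category": "elements", "multiple_choice": True},
    "V. Sentence transformation": {"items": 4, "category": "skills", "multiple_choice": False},
}


def share(predicate):
    """Return (item count, rounded percentage of all items) for matching parts."""
    total = sum(part["items"] for part in PARTS.values())
    count = sum(part["items"] for part in PARTS.values() if predicate(part))
    return count, round(100 * count / total)


elements = share(lambda part: part["category"] == "elements")
skills = share(lambda part: part["category"] == "skills")
multiple_choice = share(lambda part: part["multiple_choice"])

print("language elements:", elements)
print("language skills:", skills)
print("multiple choice:", multiple_choice)
```

Changing the category of a part in `PARTS` shows how sensitive the balance between elements and skills is to how Part V is classified.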

A review of related literature

Assessment is a necessary part of the teaching process, and all learning is accompanied by testing. According to Bachman and Palmer (1996, p. 18), usefulness is the most important quality of a test; it comprises reliability, construct validity, authenticity, interactiveness, impact and practicality. They point out three principles that testers should bear in mind when developing a test:

Principle 1: It is the overall usefulness of the test that is to be maximized, rather than the individual qualities that affect usefulness.

Principle 2: The individual test qualities cannot be evaluated independently, but must be evaluated in terms of their combined effect on the overall usefulness of the test.

Principle 3: Test usefulness and the appropriate balance among the different qualities cannot be prescribed in general, but must be determined for each specific testing situation.

Most researchers share the view that validity, reliability and authenticity are the most important qualities of a good test. Validity comes first in the list. It answers the question: does the test test what it is supposed to test? One scholar defines validity as the appropriateness, correctness, or meaningfulness of the specific inferences, descriptions, decisions, or consequences that are triggered by a test score. Henning (1987, p. 89, qtd. by Anderson, 1995, p. 170) asserts that validity refers to the appropriateness of a given test as a measure of what it is purported to measure. His emphasis is on the particular purpose of the test: a test can be valid for one purpose but invalid for others. Anderson shares this view. He holds that a test cannot simply be called valid; rather, we have to say that it is valid for its purpose.

There are a number of ways to classify validity. Bachman limits validity to construct validity, which is the meaningfulness and appropriateness of the interpretations that we make on the basis of test scores. Thorndike and Hagen (1986) hold that there are three main types of validity: rational, empirical and construct validity. Anderson (1995, pp. 171-186) gives a detailed classification of validity into internal and external validity. Internal validity, which includes face validity, content validity and response validity, refers to the perceived content and perceived effect of the test. External validity, which includes concurrent validity and predictive validity, refers to the correlation between students' test results and measures of their ability obtained from outside the test.
According to Wikipedia, the types of validity are construct validity, content validity (including representation validity and face validity) and criterion validity (which resembles external validity in that it includes concurrent and predictive validity). The About.com website gives a similar classification.

The next important concept is reliability. Reliability is easier to define and less controversial. Validity concerns accuracy; reliability concerns consistency. Bachman (p. 19) defines reliability as: a function of the consistency of scores from one set of tests and test tasks to another. If we think of test tasks as sets of task characteristics (...) then reliability can be considered to be a function of consistencies across different sets of task characteristics. Reliability is a necessary quality of a test. A test cannot be valid if it is not reliable. If the test results are not consistent, they cannot provide useful and correct information about the matters tested. Most researchers agree on the components of reliability: inter-rater, test-retest, parallel-forms, and internal consistency reliability.

Many linguists have examined the relationship between reliability and validity. Bachman (p. 25) asserts that while reliability is a quality of test scores themselves, validity is a quality of test interpretation and use. However, their roles are not always equal in a test. The website Answer.com says that the two features do not necessarily go hand in hand. Most researchers agree that they are the two most important qualities of a test, but that maximizing one will necessarily decrease the value of the other. Some advocate sacrificing reliability for validity, since they think that validity is more decisive. However, a test cannot be valid unless it is relatively reliable. The website Research Methods Knowledge Base uses the metaphor of a target to describe this relationship.

[Figure: Reliability & Validity]

The bull's-eye is the point being tested. The blue dots are the measurements of each individual. As described in the figure above, reliability accounts for consistency whereas validity measures accuracy. In target 2, the test is valid but unreliable. It provides a valid group estimate, but it is inconsistent and therefore cannot measure individuals correctly. Obviously, reliability can directly affect the variability of measurement. Bachman & Palmer (p. 23) confirm that the two qualities above are essential to the usefulness of any language test. Reliability is a necessary but not a sufficient condition for validity. In short, we can infer that a test is not necessarily highly valid and highly reliable at the same time. Whether the tester emphasizes validity or reliability depends on the purpose and the particular situation of the test.

Along with those two qualities, authenticity is also a required feature of a useful test. It is defined as the degree of correspondence of the characteristics of a given language test task to the features of a TLU [target language use] task (Bachman & Palmer, p. 23). The idea of authenticity is that the test should test authentic language, that is, the language used in real life, not fragments of language uprooted from their context. Lewkowicz (p. 60) conducted research at the University of Hong Kong and found that the idea of authenticity differs across cultures and groups of test takers. Therefore, to increase authenticity, test developers must examine their subjects carefully, that is, the people who take the test.

Researchers and educators have come up with some alternative methods of assessment: presentation, assignment, portfolio, and rubric.

Presentation: Students work on a topic and present it in front of the testers. Students can choose from a limited set of topics which are interesting and included in the course. This kind of assessment can test many aspects at the same time: speaking, listening, reading, pronunciation, vocabulary, grammar, persuasive skills and body language. The task is authentic, as it is natural. It is also interactive. The assessment is difficult, so it requires experienced testers, and the evaluation is subjective. To increase reliability, there should be two or more testers and a detailed checklist for the assessment. Students should also be made aware of this checklist. However, a presentation takes time to prepare and is rather challenging, so it is suitable for students at a higher level. It truly tests communicative ability.

Assignment: Students are required to write an assignment. The topics are carefully selected to arouse students' interest, and they are within students' ability or included in the course. The assignment can check writing, reading, vocabulary and grammar. With this task, students need clear instructions. They are shown models of a good and a poor assignment. The marking takes time, is subjective and requires experienced markers. As with presentations, more markers and a detailed checklist are needed to make marking more reliable. However, an assignment cannot test interactive skills, so it is suitable for students who need English for documentation and research.

Portfolio: Students are required to work on a topic and present what they have done. For example, students have to find pictures, songs and videos about alternative energy. They have to design a website about the topic, then present it in front of the class and think of questions for class discussion. This kind of task can assess integrated skills: reading, writing, listening, speaking, critical thinking, computer skills and problem solving. Students are encouraged to draw on creativity, self-confidence and independence, as well as cooperative ability (if students work in groups). However, it is demanding and time-consuming.

Rubric: The Wikipedia website has a careful description of the rubric. It is a way of establishing written guidelines or standards of assessments for formal, professionally-administered essay tests (...) Rubrics are designed to reflect the processes and outputs of "real-life" problem solving. It is usually in the form of a matrix with a mutually agreed upon negotiated contract or criteria for success. The rubric focuses on stated objectives, which should be tied to the educational standards as established by the community, and should use a range or scale to rate the performance. With rubrics, assessment is more objective, as the list of criteria is shared with students. Students are aware of the expected standard, so they can assess themselves and their classmates and check the teacher's assessment. A disadvantage of this technique is that it is time-consuming and challenging. It requires help from technology and therefore cannot be applied in rural areas.
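The matrix form of a rubric, combined with the earlier suggestion of using two or more testers and a shared checklist, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the criterion names, the 1-4 scale, the rater names and the scores are all hypothetical and are not taken from any actual test or from the sources cited above.

```python
# Hypothetical rubric: criterion names, the 1-4 scale and the scores below
# are illustrative assumptions, not taken from any actual test.
CRITERIA = ["content", "organisation", "fluency", "pronunciation"]
SCALE = (1, 4)  # lowest and highest band a rater may award


def rubric_score(ratings):
    """Average each criterion across raters, then sum the averages."""
    low, high = SCALE
    for rater, scores in ratings.items():
        for criterion in CRITERIA:
            if not low <= scores[criterion] <= high:
                raise ValueError(f"{rater}: {criterion} is outside the scale")
    per_criterion = {
        criterion: sum(scores[criterion] for scores in ratings.values()) / len(ratings)
        for criterion in CRITERIA
    }
    return per_criterion, sum(per_criterion.values())


# Two raters score the same performance against the shared criteria.
ratings = {
    "rater_1": {"content": 3, "organisation": 4, "fluency": 2, "pronunciation": 3},
    "rater_2": {"content": 4, "organisation": 4, "fluency": 3, "pronunciation": 3},
}
per_criterion, total = rubric_score(ratings)
print(per_criterion, total)
```

Averaging across raters is one simple design choice among several; a school might instead ask raters to discuss and reconcile scores that differ by more than one band.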

Conclusion

The achievement test in question is highly reliable but not very valid. Some adjustments and additions are needed to make it more suitable.

Speaking must be tested at a separately arranged time, and the result should count towards the total score. At this level, the tasks should be an interview and a role play. Students are given a communicative task with clear instructions and steps. To increase reliability, there should be more than one tester, and a checklist of requirements should be clearly stated. Many aspects of speaking should be included: ideas, fluency, confidence, gestures, word choice, pronunciation, stress and intonation. However, the testers must bear in mind that in this task ideas are more important than structures, and communicative competence is more important than linguistic competence.

Writing should be truly tested. During the course students learn to write a postcard, a complaint letter, a description, a biography and a report. The test can include one of these to test writing skill. Because this writing task is rather free, much effort is needed to improve reliability. As with speaking, the assessment of writing should cover language use, mechanical skills, treatment of content, stylistic skills and judgment skills. Language in context should be tested, not pieces of language. The balance of skills and elements, and the balance between the component aspects of each skill or element, need to be considered.

To reduce negative backwash such as excessive test preparation in class, the teacher should not know the content of the test in advance. Each teacher would design a test, and the administrator would choose one of those tests as the official one.

The testing techniques of presentation and rubric are also applicable and effective, as long as there is enough time and a modern view of teaching and testing is accepted in the school.

References

About.com. What is Validity? Retrieved from http://psychology.about.com/od/researchmethods/f/validity.htm

Anderson, J. C., Clapham, C. & Wall, D. (1995). Language test construction and evaluation. Cambridge: Cambridge University Press.

Answer.com. What is the relationship between reliability and validity? Retrieved from http://wiki.answers.com/Q/What_is_the_relationship_between_reliability_and_validity

Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford, UK: Oxford University Press.

Bachman, L. F. & Palmer, A. S. (1996). Language testing in practice: Designing and developing useful language tests. Oxford, UK: Oxford University Press.

Lewkowicz, J. A. Authenticity in language testing: some outstanding questions. Retrieved from http://202.197.121.116/Downloads/LangTst/tst_003.pdf

Prodromou, L. (1995). The backwash effect: from testing to teaching. ELT Journal 49(1), 13-25. Oxford, UK: Oxford University Press.

Research Methods Knowledge Base. Reliability and validity. Retrieved from http://www.socialresearchmethods.net/kb/relandval.php

Thorndike, R. L. & Hagen, E. P. (1986). Measurement and evaluation in psychology and education. New York, USA: Macmillan.

Wikipedia. Rubric (academic). Retrieved from http://en.wikipedia.org/wiki/Rubric_%28academic%29

Wikipedia. Validity. Retrieved from http://en.wikipedia.org/wiki/Validity_%28statistics%29

Appendix

FINAL TEST SEMESTER 2
Time: 60 minutes

I. PRONUNCIATION (2mks)

A. Choose the word whose underlined part is pronounced differently from that of the other three.
1. A. butter B. secure C. country D. puncture
2. A. damage B. luggage C. manage D. teenage
3. A. prohibit B. co-exist C. protect D. proportion
4. A. athlete B. advance C. aquatic D. appreciate

B. Choose the word whose main stressed syllable is different from that of the other three.
5. A. vacation B. enjoyable C. activity D. television
6. A. scenic B. event C. fashion D. purpose
7. A. industry B. endanger C. consequence D. maintenance
8. A. energy B. plentiful C. develop D. dangerous

II. ERROR IDENTIFICATION (1mk)

Choose the words or phrases that are not correct in Standard English.
9. The success of the trip (A) depends on (B) who you share it with and (C) on the target (D) which you aim at.
10. Buying clothes (A) are often (B) a very time-consuming practice (C) because those clothes that a person likes (D) are rarely the ones that fit him or her.
11. Nuclear energy can provide (A) enough electricity for the (B) world's needs (C) in hundreds of years, but it (D) can be very dangerous.
12. If those students (A) would have checked their (B) answer sheets (C) more carefully, they would have corrected these errors (D) themselves.

III. READING (2.5 mks)

A. Read the following passage and choose the best answer.

The energy security problem has been complicated by the problem of global climate change. Science has made it increasingly clear that climate change in now a major political issue at both global and national levels. The most recent confirmation of this is the Nobel Peace Prize given to Al Gore and the IPCC. Rising sea levels are threatening the existence of low-lying islands like the Maldives. In addition, the deadly Hurricane Katrina that left a trail of devastation is a case in point. All these pose a new type of threat that must be taken seriously. The way forward is to encourage alternative sources of energy. In Britain, Gorden Brown has proposed building five new eco-towns with 100,000 new environmentally friendly homes. This already exists in BedZeD, on the outskirts of London. BedZED is powered by a small-scale combined heat and power plant (CHP). In Germany, the use of photovoltaic cells and solar panels in individual houses is becoming the norm to generate electricity in the home. This is also popular in the UK. What is strange is that, considering the amount of sunlight in the UK or Germany and that of Malta, it's quite clear that this system is 100 percent efficient.


13. What has made the energy security problem complex?
A. Global climate change B. Politics C. Wars D. Draughts and hurricanes
14. What are threatening the existence of low-lying islands like the Maldives?
A. Draughts B. Rising sea levels C. Hurricanes D. Typhoons
15. How serious was the Hurricane Katrina?
A. Deadly serious B. Not very serious C. A little serious D. Not serious at all
16. What has Gordon Brown suggested building in the UK?
A. The outskirts of London
B. A friendly atmosphere
C. Five new eco-towns with 100,000 new environmentally friendly homes.
D. A small-scale combined heat and power plant (CHP)
17. What kind of energy is popular both in Germany and the UK?
A. Nuclear energy B. Sea energy C. Wind energy D. Solar energy

B. Choose the word that best suit the blank space in the following passage

People are rapidly destroying the world's rain forests. In 1950, rain forests (18) _____ about 8,700,000 square miles of the earth or about three-fourths of Africa. Today, less than half the original extent of the earth's rain forests (19) _____. Few rain forest species can adjust to disturbance of their habitat. Most die when people clear large areas of forest. Scientists estimate (20) _____ tropical deforestation wipes out about 7,500 species per year. Commercial logging and the expansion of agriculture have (21) _____ or wiped out wide areas of rain forest. Huge mining projects, the construction of hydroelectric dams (22) _____ narrowed forest areas.

18. A. included B. covered C. consisted D. lengthened
19. A. remains B. leaves C. stays D. develops
20. A. why B. which C. what D. that
21. A. fought B. helped C. stopped D. damaged
22. A. also have B. also are C. have also D. are also


IV. VOCABULARY AND STRUCTURE (2.5 mks)

23. Our new post office is equipped with _____ technology.
A. express B. advanced C. updated D. progressive
24. We're going to lose the game _____ our team doesn't start playing better soon.
A. if B. unless C. whereas D. although
25. Today, many animals are facing _____.
A. extinct B. extinctive C. extinctions D. extinction
26. The human race is only one small _____ in the living world.
A. species B. racing C. animal D. thing
27. Everyone, except the twins, _____ to go trick-or-treating on Halloween's Day.
A. agrees B. agree C. are agreeing D. is agreeing
28. It seems that the Earth is the only planet _____ can support life.
A. who B. where C. what D. which
29. What happened to the pictures _____ were on the wall
A. Who B. whom C. which D. whose
30. There is a telephone box _____ the bank
A. on B. over C. with D. opposite
31. The goalie tried to catch _____ ball, but he failed.
A. a B. an C. the D.
32. He _____ to HCMC last year and I _____ him since then.
A. moved/ didn't see B. moves/ haven't seen C. moved/ haven't seen D. moved/ hadn't seen

V. SENTENCE TRANSFORMATION (2mks)

Rewrite the following sentences with the given beginning so that the meaning is unchanged.
33. It didn't rain hard last night, so the match wasn't cancelled.
If _______________________________________.
34. The first man who flew non-stop across the Atlantic was John Aleork.
The first man _________________________________________ .
35. They will take that old woman to hospital tomorrow.
That old woman ____________________________.
36. "You should take up swimming to lose weight," Tom said to me.
Tom advised me _______________________________.

GOOD LUCK!

