A Study on the Washback Effects of the Test for English Majors (TEM)
Implications for Testing and Teaching Reforms
Qian Xu • Jun Liu
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Acknowledgements
This book is funded by the Fundamental Research Funds for the Central Universities
in China (Research Grant No. KY01X0222017082).
Contents
Appendices ................................................ 187
Appendix I Survey Questionnaires .......................... 187
Questionnaire for Experts (EQ) ............................ 187
Appendix II Classroom Observations ........................ 215
List of Acronyms
A                      Advanced stage
AQ-4                   Questionnaire for TEM4 related administrators
AQ-8                   Questionnaire for TEM8 related administrators
C-E                    Chinese to English
CET                    College English Test
E-C                    English to Chinese
EQ                     Questionnaire for experts
ESOL                   English for speakers of other languages
F                      Foundation stage
Gra                    Grammar
HKCEE                  Hong Kong Certificate of Education Examination in English
HKEA                   Hong Kong Examinations Authority
IELTS                  International English Language Testing System
KMO                    Kaiser-Meyer-Olkin
MCQ                    Multiple choice question
MET                    Matriculation English Test
NACFLT                 National Advisory Committee for Foreign Language Teaching
NMET                   National Matriculation English Test
PhD                    Doctor of philosophy
PL2000                 Progetto Lingue 2000
SPSS 17.0              Statistical Product and Service Solutions 17.0
SQ-4                   Questionnaire for TEM4 related students
SQ-8                   Questionnaire for TEM8 related students
TEM                    Test for English Majors
TEM4                   Test for English Majors Grade 4
TEM4-Oral              Test for English Majors Grade 4 Oral Test
TEM8                   Test for English Majors Grade 8
TEM8-Oral              Test for English Majors Grade 8 Oral Test
The Teaching Syllabus  The National Teaching Syllabus for English Majors
List of Tables
Table 1.1 A comparison between the previous TEM and new TEM............. 5
Table 1.2 TEM studies published over the years........................................... 7
Table 1.3 Research focus of the journal articles and PhD dissertations........ 8
Table 1.4 Test content and format of TEM.................................................... 10
Table 1.5 A comparison between the teaching syllabus and TEM characteristics........ 15
Table 3.1 Overview of the empirical studies on washback............................ 28
Table 4.1 A multi-phase, multi-method research design for TEM washback study........ 40
Table 4.2 The seven questionnaires............................................................... 43
Table 4.3 The number of questionnaires distributed, returned and valid.............. 45
Table 4.4 Reliability test for AQ-4, TQ-4, SQ-4, AQ-8, TQ-8 and SQ-8................ 47
Table 4.5 KMO and Bartlett’s tests................................................................ 48
Table 4.6 Rotated component matrix for the questionnaires......................... 49
Table 4.7 Inter-coder reliability (first round)................................................. 103
Table 4.8 Inter-coder reliability (second round)............................................ 103
Table 4.9 Intra-coder reliability..................................................................... 104
Table 5.1 Demographic information of program administrators................... 111
Table 5.2 Familiarity with TEM.................................................................... 112
Table 5.3 Evaluation of TEM......................................................................... 113
Table 5.4 Information to be provided in the test report................................. 114
Table 5.5 Use and interpretation of TEM scores........................................... 115
Table 5.6 Influence of TEM scores................................................................ 115
Table 5.7 Unintended use of TEM scores...................................................... 116
Table 5.8 Attitudes towards unintended uses of TEM scores........................ 116
Table 5.9 Views on TEM washback............................................................... 117
Table 5.10 Washback exerted by different aspects of TEM............................. 117
[Figure: hierarchy of TEM stakeholders, from top to bottom: Ministry of Education; School Administrators; Teachers; Students]
markedly among schools of different levels. Third, schools of different types may
vary in their English curriculum. For instance, foreign language universities may
focus more on language and literature, while technology institutes may introduce
more science-related English courses.
Last, the bottom level is the classroom level, including teachers and students who
are responsible for the implementation of the school curriculum and the Teaching
Syllabus. Based on the school curriculum, teachers may design their respective
course syllabus, and their previous knowledge and available resources will affect
their teaching practice. Moreover, university students come from different regions and
backgrounds and may perform differently in the same classroom.
All the people involved in this hierarchy are TEM stakeholders, and the higher
the level, the fewer the stakeholders, but the greater the power they have. Besides,
stakeholders from each level may overlap: some members in the NACFLT have
participated in the development of the Teaching Syllabus, some experts in the TEM
testing committee are also school administrators, and some school administrators
are English teachers at the same time. Therefore, in the current study, although the
research focus is the washback of TEM on English teaching and learning, opinions
of stakeholders from each level will be sought, with teachers and students as the
major participants.
In this view, a top-down approach will be adopted when investigating the washback
effects of TEM, which will be carried out in three phases. First, opinions of
stakeholders from the national level will be investigated, through the analysis of
related documents and a semi-open questionnaire survey among the experts, who
are either members of the NACFLT or syllabus designers, or both. Then, a questionnaire
survey will be conducted among school administrators, to find out different
practices in all the TEM participating schools. Last, teachers and students will be
investigated, through large-scale questionnaires and case study classroom observations.
As a supplementary instrument, interviews will be conducted in all three
phases, to obtain follow-up data.
1.2 The History of TEM
The history of TEM can be traced back to the early 1990s, and in its two-decade
development, TEM has undergone several revisions. The following part will intro-
duce the launch and development of TEM, its soaring test population, as well as the
related research on TEM.
Before reviewing the history of TEM, we will first introduce the formulation and
publication of the Teaching Syllabus. After the Cultural Revolution, foreign language
education in China entered a new era. The early 1980s witnessed a rapid
increase in the demand for English majors in every field, because of the government’s
adoption of the policy of reform and opening up to the outside world. As a result,
many universities set up English departments and offered English programs.
However, for various reasons, huge gaps existed among different universities in
curriculum design, teaching material selection and course evaluation, which led to
an urgent demand for a national teaching syllabus. Hence, the first two national
teaching syllabuses for college English majors, foundation stage and advanced
stage, were published in 1989 and 1990 respectively. The two teaching syllabuses
serve as guidelines for the development of curriculum and teaching materials for
English majors in various universities throughout China. They have established spe-
cific teaching requirements and English proficiency standards for undergraduate
English majors, and thus, it is possible and necessary to develop a nation-wide test
to check the implementation of the Teaching Syllabus.
In line with this demand, two groups of specialists started working on TEM,
based in two different universities. Shanghai International Studies University was
responsible for TEM4, a test for English sophomores to measure their proficiency in
the foundation stage; while Guangdong University of Foreign Studies was respon-
sible for TEM8, a test for English majors in their fourth year to measure their profi-
ciency in the advanced stage. The first TEM4 and TEM8 were launched in 1990 and
1991 respectively. Commissioned by NACFLT, Shanghai International Studies
4 1 TEM: A High-Stakes Standardized Test in the Chinese Context
University has assumed responsibility for both tests since October 1992. To standardize
the test, the first TEM Syllabuses were drafted in 1993 and published in
1994, which marked the official beginning of TEM.
Following the introduction of TEM4 and TEM8 written tests, Nanjing University
was given responsibility to conduct studies on the feasibility of TEM4-Oral and
TEM8-Oral in 1994 and 1998 respectively. After several years of trialing, TEM4-Oral
and TEM8-Oral were officially launched in 1999 and 2003 respectively, and the Oral Test
Syllabuses were published in 2005.
It has been more than 20 years since TEM was first implemented, and in the course
of its development, TEM has undergone several revisions.
After their publication in 1994, the TEM Syllabuses (TEM4 and TEM8) underwent
two major revisions, one in 1997 and the other in 2004. A comprehensive validation
study of TEM was conducted from 1993 to 1996, and based on its findings,
a revised edition of the TEM Syllabuses was published in 1997. The second major
revision took place in 2004 as a response to the revision of the Teaching Syllabus in
2000. The revisions of the TEM Syllabuses aimed at improving test quality, standardizing
test development and administration, and achieving fairness in the interpretation of
test results (Jin and Fan 2011; Zou 2010).
In 2005, the new versions of TEM4 and TEM8 were introduced, employing more
integrative tasks and using lengthier and more authentic materials for listening and
reading. The test content and format of the previous TEM and new TEM are listed
in Table 1.1.
From the above table, three major changes can be identified. First and foremost,
the new versions of TEM set a higher requirement on the test takers. The listening
and reading speed has been increased, with faster listening materials and longer
reading passages; short listening materials like statements have been replaced by
longer ones like passages; a lengthier composition is required for the writing part;
and the passages for cloze and proofreading have also been lengthened. Second, speed reading
has been dropped from both TEM4 and TEM8. Last, a new section of general
knowledge has been included in TEM8.
As for the oral tests, a major reform was adopted in 2008, supplementing the
tape-mediated format with computer-based spoken English tests (Jin and Fan 2011).
Further discussions on these revisions will be provided in Sect. 1.4.
Table 1.1 A comparison between the previous TEM and new TEM
Task | Input | Format | % | Time (min)
TEM4 (previous version)
Writing: 1) composition | A topic, chart or graph | A text of about 150 words | 15 | 35
Writing: 2) note-writing | Written prompts | A note of 50–60 words | 5 | 10
Dictation | A text of about 150 words, listen 4 times, 120 wpm | Word-for-word dictation | 15 | 15
Listening | Statements, conversations and news broadcasts, 120 wpm | 25 MCQs | 15 | 20
Cloze | A text of about 250 words | 15 MCQs | 10 | 15
Grammar & vocabulary | 25 sentences | 25 MCQs | 15 | 15
Reading: 1) careful | 4–5 texts of about 1500 words altogether | 15 MCQs | 15 | 25
Reading: 2) speed | 4–5 texts of about 900 words altogether | 10 MCQs | 10 | 5
TEM4 (new version)
Writing: 1) composition | A topic, chart or graph | A text of about 200 words | 15 | 35
Writing: 2) note-writing | Written prompts | A note of 50–60 words | 10 | 10
Dictation | A text of about 150 words, listen 4 times, 120 wpm | Word-for-word dictation | 15 | 15
Listening | Conversations, passages and news broadcasts, 120 wpm | 30 MCQs | 15 | 20
Cloze | A text of about 250 words | 20 MCQs | 10 | 15
Grammar & vocabulary | 30 sentences | 25 MCQs | 15 | 15
Reading | 4–5 texts of about 1800 words altogether | 15 MCQs | 20 | 25
TEM8 (previous version)
Listening | Talks, conversations and news broadcasts, 140–170 wpm | 15 MCQs | 15 | 15–20
 | Mini-lecture, 700 words, 140 wpm | Note-taking and gap-filling (10 blanks) | 10 | 20
Proofreading and error correction | A text of about 200 words | Error identification and correction | 10 | 15
Reading: 1) careful | 4–5 texts of about 2500 words | 15 MCQs | 15 | 30
Reading: 2) speed | 6–7 texts of about 3000 words | 10 MCQs | 10 | 10
Translation: 1) C-E | A text of about 150 Chinese characters, 250 wpm | Translation | 10 | 30
Translation: 2) E-C | A text of about 150 words, 250 wpm | Translation | 10 | 30
(continued)
After more than two decades of development, TEM has grown into one of the pre-
dominant English tests in China, winning extensive recognition from test takers,
relevant institutions and society at large (Jin and Fan 2011). The number of test
takers and participating universities has increased rapidly and significantly.
As for the written tests of TEM4 and TEM8, from 1992 to 2015, the number of
participants increased respectively from around 10,000 to 270,000 and from 5000 to
210,000. The number of participating schools also increased from around 100 to
more than 800 for both TEM4 and TEM8. As for the TEM oral tests, the number of test
takers and participating schools has also increased significantly, but their test-taking
population is no more than one-tenth of that of their written counterparts: around 18,000
and 10,000 for TEM4-Oral and TEM8-Oral respectively in 2015 (data released by the TEM
Test Office).
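The proportion stated above can be checked with a few lines of arithmetic. The sketch below is only an illustration: it uses the rounded 2015 figures quoted in this section, and the dictionary and function names are hypothetical.

```python
# Approximate 2015 test populations quoted in the text (rounded figures).
WRITTEN_TAKERS = {"TEM4": 270_000, "TEM8": 210_000}
ORAL_TAKERS = {"TEM4": 18_000, "TEM8": 10_000}


def oral_share(level: str) -> float:
    """Return oral test takers as a fraction of written test takers."""
    return ORAL_TAKERS[level] / WRITTEN_TAKERS[level]


if __name__ == "__main__":
    for level in WRITTEN_TAKERS:
        # Print each share as a percentage with one decimal place.
        print(f"{level}: {oral_share(level):.1%}")
```

With these rounded figures, both shares come out below 10%, which is consistent with the "no more than one-tenth" statement.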
With its substantial improvement and increasing test population, TEM is playing
an increasingly important role in English teaching and learning at the tertiary
level, exerting profound washback effects on students and teachers, the educational
system and even society as a whole.
Ever since the implementation of TEM in the early 1990s, quite a few studies have
been conducted to investigate its feasibility, validation, reform, implications for
teaching, and so on. The research findings have contributed to the reform of TEM and
English teaching and learning, which can also be considered an aspect of washback,
according to Bailey (1996). Therefore, in this part, research findings on TEM will be
introduced and analyzed, to summarize their impact on English teaching and learning.
Due to limited space and time, as well as the vast number of studies on TEM, only
influential studies are selected here. The studies selected are: (1) articles published in
CSSCI journals in linguistics, since these journals are well established and widely read in
the academic field and may influence future research; (2) PhD dissertations, because they
are usually based on in-depth research in the field and are expected to be of high quality
and to offer original insights into this issue.
A search in the CSSCI journals¹ and the CNKI² Corpus of PhD and Master
Dissertations, using the key word “TEM”, shows that up to the end of 2015, 99
articles and 18 PhD dissertations had been published. Table 1.2 briefly summarizes
the number of studies published over the years.
The first journal article on TEM was published in 1991 on the feasibility of using
dictation in TEM4 (Lian 1991), and the first PhD dissertation was completed in
2004 on the authenticity of listening comprehension questions in TEM (Dang 2004).
From the above table, we can find an apparent increase in the number of studies
conducted on TEM since 2001.
In terms of the research focus, the 99 journal articles and 18 PhD dissertations
cover almost every skill assessed in TEM (Table 1.3).
From Table 1.3, it can be noted that more journal articles focus on TEM in gen-
eral, such as the test report of TEM in a certain year, or comment on the revision of
TEM Syllabus, while PhD dissertations are more focused, usually concentrating on
one skill in TEM. Moreover, writing and translation seem to be studied more than
the other language skills measured in TEM, which is probably because of the
subjective nature of these two parts: test development, marking criteria and
produced texts are all subjects for research.
These studies on TEM may contribute to English teaching and learning in the
following ways:
1. Test reports on TEM analyzed the test items and students’ performance in each
item, which would provide useful feedback information on the strengths and
weaknesses of students;
¹ Eleven linguistics (foreign language) journals included in CSSCI (2013–2014) were searched:
Foreign Language Teaching and Research, Foreign Language World, Journal of Foreign
Languages, Modern Foreign Languages, Chinese Translators Journal, Foreign Languages in
China, Foreign Language Research, Foreign Language Education, Foreign Languages and Their
Teaching, Computer-Assisted Foreign Language Education, and Foreign Language Learning
Theory and Practice.
² CNKI: China National Knowledge Infrastructure, an online resource-sharing platform which has
a collection of published articles.
Table 1.3 Research focus of the journal articles and PhD dissertations
Source | General | Listening | Reading | Writing | Speaking | Translation | Culture | Gra & voc | Cloze | Dictation
CSSCI | 28 | 11 | 5 | 16 | 11 | 22 | 2 | 2 | 1 | 1
PhD | 1 | 3 | 3 | 6 | 1 | 2 | 1 | 1 | 0 | 0
1.3 TEM Test Characteristics
As stated in the Test Syllabus, the purpose of TEM is twofold: (1) to measure the
English proficiency of undergraduate English majors, to check whether they have
met the required levels specified in the Teaching Syllabus; (2) to check the implementation
of the Teaching Syllabus and to bring about reform in language
teaching.
In the Teaching Syllabus, the four-year undergraduate program for English majors
is divided into the foundation stage (freshman and sophomore year) and the
advanced stage (junior and senior year). Accordingly, TEM4 and TEM4-Oral are
administered at the end of the second year to assess candidates’ English proficiency
in the foundation stage and TEM8 and TEM8-Oral are administered at the end of
the fourth year to assess candidates’ English proficiency in the advanced stage.
The test content of TEM4 includes listening, reading and writing, and two more
skills, translation and general knowledge (linguistics, culture and literature), are
assessed in TEM8. Basic language knowledge of grammar and vocabulary is measured
in both tests, but in different formats: multiple-choice questions in TEM4 and
error correction in TEM8. As for the oral tests, TEM4-Oral focuses on speaking, while
TEM8-Oral assesses not only speaking but also interpretation. The test content and
format of the four tests are listed in Table 1.4.
Table 1.4 shows that various item types are employed in TEM, both objective
ones and subjective ones, both discrete-point items and integrative tasks. In TEM4
and TEM8, objective multiple-choice items account for 60% and 40% of the test
respectively, and the rest are all subjective items, such as writing and translation.
Discrete-point items are used, such as multiple-choice questions of grammar and
vocabulary in TEM4 and general knowledge in TEM8, and integrative tasks include
dictation and cloze in TEM4 and mini-lecture and proofreading in TEM8.
All the four tests of TEM are administered once a year: TEM4 and TEM4-Oral in
May for second-year students, TEM8 in March and TEM8-Oral in December for
fourth-year students. The total test time is 135 min for TEM4 and 195 min for
TEM8, and the two oral tests take approximately 25 min each.
For the written tests of TEM4 and TEM8, the test papers are sent directly to the
participating schools before administration, together with detailed guidelines, pro-
viding instructions on how the test should be administered, such as when to distrib-
ute test papers or answer booklets. The test is administered at the same time
throughout China and the listening section is broadcast from a specified radio channel.
After administration, used test papers and answer booklets are sent back to
the TEM Test Office at Shanghai International Studies University.
As for the oral tests, they are administered in the audio laboratories of the participating
schools, and candidates’ responses are recorded either on tapes or on
discs, which are sent to Nanjing University, Guangdong University of Foreign
Studies, or Dalian University of Foreign Languages, as assigned by the TEM Oral
Test Office at Nanjing University.
In TEM4 and TEM8, the objective items of multiple-choice questions are marked
by machines, and there is no penalty for wrong answers. The subjective items are
scored according to carefully prepared marking criteria by well-trained raters.
Detailed marking schemes are developed for TEM4 dictation, TEM8 mini-lecture
and proofreading, to determine the marks deducted for each type of mistake.
Moreover, analytic marking is adopted for writing and translation. Writing performance
is judged against the criteria of ideas and argument (50%, including content,
effectiveness and organization), language use (40%, including correctness, rich-
ness, appropriateness and fluency) and mechanics (10%, including spelling, punc-
tuation, neatness, and so on); while translation performance is evaluated against the
criteria of faithfulness (60%, faithfulness to the original text, both in content and in
style) and fluency (40%, correctness in language and elegance in expression).
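The analytic marking scheme described above amounts to a weighted sum of criterion ratings. The following sketch illustrates only that arithmetic. It assumes that each criterion is rated on a common 0–100 scale, which the text does not specify, and all names in it are hypothetical.

```python
# Criterion weights in percent, as stated in the text.
WRITING_WEIGHTS = {"ideas_and_argument": 50, "language_use": 40, "mechanics": 10}
TRANSLATION_WEIGHTS = {"faithfulness": 60, "fluency": 40}


def composite_score(ratings: dict, weights: dict) -> float:
    """Combine per-criterion ratings into one weighted score.

    Assumption: every criterion is rated on the same 0-100 scale.
    """
    if set(ratings) != set(weights):
        raise ValueError("ratings must cover exactly the weighted criteria")
    # Integer percentage weights keep the sum exact before the final division.
    return sum(weights[name] * ratings[name] for name in weights) / 100


if __name__ == "__main__":
    essay = {"ideas_and_argument": 80, "language_use": 70, "mechanics": 90}
    print(composite_score(essay, WRITING_WEIGHTS))  # 77.0
```

Because ideas and argument carry half of the weight, a ten-point change in that rating moves the composite by five points, while the same change in mechanics moves it by only one.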
TEM4-Oral and TEM8-Oral adopt a mixed marking approach. The three tasks
are marked separately, 20% for each, according to different marking schemes developed
for each task, and the remaining 40% is for the quality of language: 20% for pronunciation
and intonation and 20% for grammar and vocabulary. The performance of each
candidate is marked by two raters independently.
The pass score for all the four tests of TEM is 60, and candidates scoring above 60
will receive a certificate from NACFLT, on which their level of performance will be
reported, including excellent (above 80 for TEM4 and TEM8, above 90 for TEM4-
Oral and TEM8-Oral), good (70–79 for TEM4 and TEM8, 80–90 for TEM4-Oral
and TEM8-Oral), and pass (60–69 for TEM4 and TEM8, 60–79 for TEM4-Oral and
TEM8-Oral). Candidates scoring below 60 will receive no certificate. The test scores
are not reported to the test takers, but they can check their composite test scores of
TEM4 and TEM8 through the Academic Affairs Office in their schools. Scores of
TEM4-Oral and TEM8-Oral are not reported.
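The reporting bands above can be summarized as a simple lookup. The sketch below is illustrative only: the function name is hypothetical, scores are assumed to be whole numbers, and a written score of exactly 80 is assumed to count as "excellent", since the text gives "above 80" for excellent and "70–79" for good and so leaves that boundary open.

```python
from typing import Optional


def tem_band(score: int, oral: bool = False) -> Optional[str]:
    """Map a whole-number TEM score to its reported certificate level.

    Returns None for scores below 60 (no certificate is issued).
    Assumption: a written score of exactly 80 is treated as "excellent".
    An oral score of exactly 90 falls in "good" (80-90), as stated in the text.
    """
    if score < 60:
        return None
    excellent_from = 91 if oral else 80  # oral: above 90; written: 80 and up (assumed)
    good_from = 80 if oral else 70
    if score >= excellent_from:
        return "excellent"
    if score >= good_from:
        return "good"
    return "pass"


if __name__ == "__main__":
    for s in (59, 65, 75, 85):
        print(s, tem_band(s), tem_band(s, oral=True))
```

The same score can therefore fall into different bands on the two test types: 85 is "excellent" on the written tests but only "good" on the oral tests.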
Of the four tests in the battery, only TEM4 is required of all undergraduate
English majors, while TEM8 and the two oral tests are optional. Decisions concern-
ing penalties for those failing TEM4 are left in the hands of the participating univer-
sities (Jin and Fan 2011).
As stated in the Test Syllabus, TEM results can be used to evaluate English
teaching and learning in the light of the Teaching Syllabus, and to provide feedback
to teaching and learning. However, due to the increasing social recognition of TEM
as an important benchmark of English proficiency, more and more organizations in
China are using TEM results to select and recruit employees; some schools do not
confer bachelor’s degrees on those who have failed to obtain TEM certificates; and
teachers’ evaluation is connected with students’ performance in TEM in some
schools (Jin and Fan 2011).
The increasing social stakes of TEM will have a twofold effect. On the one hand, more
importance will be attached to TEM, which may enhance its facilitative role in
English teaching and learning. On the other hand, the extensive social weighting of
TEM may lead to overly test-oriented English teaching and learning among undergraduate
English majors, which might be harmful.
1.4 TEM Test Characteristics vs. the Teaching Syllabus
Since the purpose of TEM is to evaluate English teaching and learning in the light
of the Teaching Syllabus, TEM test characteristics should comply with the Teaching
Syllabus. Therefore, this section will first introduce the Teaching Syllabus, and then
make a comparison between the Teaching Syllabus and TEM test characteristics.
TEM is designed to measure the implementation of the Teaching Syllabus, and the
four tests in the TEM battery assess nine out of the ten subcategories of language
abilities listed in the Teaching Syllabus. The subcategory of “cultural awareness”
has been assessed since 2005, when a new section of “General Knowledge” was
included in TEM8. The subcategory of “use of reference books” is not included
because this skill aims at cultivating students’ ability to learn by themselves and do
research, which can be measured by the writing tasks of term papers and bachelor’s
degree papers. Table 1.5 lists the required language abilities in the Teaching
Syllabus and how they are measured in TEM.
From Table 1.5, it can be concluded that most language requirements in the
Teaching Syllabus are measured in TEM with several items and various test meth-
ods in their corresponding stages. Direct testing methods are employed to measure
writing, speaking, translation and oral interpretation. Listening, grammar and
vocabulary are assessed with various testing methods, both objective multiple-
choice questions and subjective methods of dictation, gap-filling, error-correction,
and so on. Pronunciation, though not tested directly in any item, is adopted as an
essential marking criterion which accounts for 20% of the composite score of
TEM4-Oral and TEM8-Oral. Reading and cultural awareness are the only two abili-
ties employing multiple-choice questions as the sole testing method.
From the above analysis, it can be inferred that TEM characteristics, generally
speaking, overlap with its construct stated in the Teaching Syllabus. However, pos-
sible mismatches lie in the following aspects:
1. Using multiple-choice questions as the sole testing method to measure language
abilities such as reading and cultural awareness will lead to construct-irrelevant
variance, which is caused by the possible guessing factor involved in multiple-
choice questions. Such variance will contaminate the interpretation of the test
score and may encourage test-taking technique training.
2. The small test population of TEM oral tests may lead to construct under-
representation. The number of TEM4-Oral and TEM8-Oral test takers is around
20,000 and 10,000 respectively, which is only 10% of the candidates of TEM4
and TEM8 written tests. However, speaking is a required ability in the Teaching
Syllabus, which should carry the same weight as the other abilities. The lack of
oral tests for most candidates may lead to less importance being attached to
speaking and, consequently, to candidates’ low speaking ability.
3. The misuse of TEM results is also a possible cause of construct-irrelevant variance.
As mentioned in Sect. 1.3.4, TEM results, as stated in the test syllabus, are
meant to measure the English proficiency of undergraduate English majors, to
find out whether they have met the required level specified in the Teaching
Syllabus. However, with its increasing social weighting, TEM scores are used by
some organizations and schools to recruit employees, rank schools and evaluate
teachers. These uses of TEM results have not been justified, and will possibly
lead to negative washback effects.
Table 1.5 A comparison between the teaching syllabus and TEM characteristics
Category | Courses | Stage | TEM item | Test method
Pronunciation | Phonetics | F | TEM4-oral, TEM8-oral | A marking criterion
Grammar | Grammar, fundamental English, advanced English | F | TEM4: Grammar and vocabulary | Multiple-choice
 | | A | TEM8: Proofreading | Error identification and correction
 | | | TEM4-oral, TEM8-oral | A marking criterion
Vocabulary | Fundamental English, advanced English | F | TEM4: Grammar and vocabulary | Multiple-choice
 | | A | TEM8: Proofreading | Error identification and correction
 | | | TEM4-oral, TEM8-oral | A marking criterion
Listening | Listening, fundamental English, advanced English | F | TEM4: Dictation, listening | Dictation, multiple-choice
 | | A | TEM8: Mini-lecture, listening | Note-taking and gap-filling
Speaking | Speaking, fundamental English, advanced English | F | TEM4-oral: Retelling, presentation, conversation | Retelling
 | | A | TEM8-oral: Presentation | Presentation, conversation
Reading | Reading, fundamental English, advanced English | F | TEM4: Reading | Multiple-choice
 | | A | TEM8: Reading |
Writing | Writing, fundamental English, advanced English | F | TEM4: Note writing, composition writing | Writing
 | | A | TEM8: Writing |
Translation | Translation, oral interpretation, fundamental English, advanced English | F | TEM8: Translation | Translation (C-E, E-C)
 | | A | TEM8-oral: Oral interpretation | Oral interpretation (C-E, E-C)
Culture | English culture, fundamental English, advanced English | F | TEM8: General knowledge | Multiple-choice
 | | A | |
Note: F = Foundation stage; A = Advanced stage; E-C = from English to Chinese; C-E = from Chinese to English.
1.5 Summary
This chapter provides an overview of TEM, its context, history and characteristics.
Moreover, a comparison is made between TEM characteristics and the Teaching
Syllabus. Several key points are summarized below.
First, in the past 20 years, TEM has developed into a high-stakes national test
with extensive social recognition. The test results are used not only to examine the
implementation of the Teaching Syllabus, but also to recruit employees, rank
schools and evaluate teachers. Therefore, it is imperative to conduct research on
the washback effects of TEM.
Next, there seems to be an endless list of TEM stakeholders, but in the educational
system, TEM stakeholders can be roughly divided into three levels: policy
makers, Teaching Syllabus designers and TEM constructors at the national level,
administrators at the school level, and teachers and students at the classroom
level. These are the major participants of the current study.
Last, as a criterion-referenced test, TEM is not based on any set of textbooks or
some particular language theory, but adheres to the Teaching Syllabus which out-
lines the construct of TEM. A comparison between TEM characteristics and the
Teaching Syllabus shows that TEM characteristics generally overlap with its construct,
which may lead to positive washback. However, mismatches also exist in the
test content, method and use, which might be the source of negative impact.
Chapter 2
Washback: Definitions and Dimensions
Tests influence teaching and learning. A test, if well designed and appropriately
employed, may not only help candidates develop a positive attitude and strong motivation
for learning, but also provide them with timely information on their strengths
and weaknesses in learning, thus accelerating their learning process. Teachers can
also benefit from the test through the feedback it offers, which makes it possible for
them to adjust their teaching accordingly. However, if poorly designed, a test may
not only provide false information, but also lead teaching and learning astray. Such
influence is termed washback.
In the field of applied linguistics, washback and backwash are both used and are
generally considered interchangeable (Alderson 2004; Hughes 2003). However, a
dictionary search shows that “washback” is rarely listed, whereas “backwash” can
be found, meaning either “a backward flow of water, caused by an oar, wave,
etc.” or “the bad situation that remains after something bad has happened” (Longman
Dictionary of Contemporary English). The negative connotations of these two defi-
nitions seem to suggest that the effects of testing on teaching and learning are more
likely to be negative than positive. According to Spolsky (1996), the word “back-
wash” is often applied to unintended and negative effects, and until the 1980s it
generally carried a negative meaning in the field of applied linguistics.
However, the word “washback” has gained currency and is now more commonly
used in applied linguistics (Cheng et al. 2004; Green 2007; Hawkey 2006).
Attitudes towards it have also become neutral, with many studies aiming to promote
beneficial washback (Alderson and Wall 1993; Gu 2007; Hughes 2003). For this reason,
the term “washback” will be used throughout this dissertation, except when quoting
from other authors.
Washback is such an essential criterion for evaluating a test that some researchers have
suggested it should be considered an aspect of validity, coining the term “washback validity”
(Frederiksen and Collins 1989; Morrow 1986; Weir 1990). Messick (1989, 1996)
further specified it by proposing “six distinguishable aspects of construct validity”,
locating washback within the theoretical concept of consequential validity.
The consequential aspect of construct validity includes evidence and rationales for evaluat-
ing the intended and unintended consequences of score interpretation and use in both the
short- and long-term, especially those associated with bias in scoring and interpretation,
with unfairness in test use, and with positive or negative washback effects on teaching and
learning. (1996: 251)
2.2 Dimensions of Washback
As many studies on washback attest, one of its key features is that it is highly com-
plex (Bailey 1996; Cheng 1997; Watanabe 1996, etc.). Therefore, to better analyze
this phenomenon, it is necessary to investigate washback from different dimensions,
so that we may look into its nature from various aspects. Based on previous
research (Bachman and Palmer 2010; Green 2007; Hawkey 2006; Watanabe 2004),
six main dimensions are to be introduced here, namely, direction, extent, intensity,
intentionality, length and specificity.
2.2.1 Direction
Washback can be either positive or negative. A test, if well designed and appropri-
ately used, may facilitate teaching and learning, providing useful information and
generating strong motivation. This is positive washback. On the other hand, if
poorly designed and inappropriately used, a test may lead to negative washback.
Whether a test will generate positive or negative washback is the focus of most
washback studies, with many researchers proposing various ways to promote posi-
tive washback (Bailey 1996; Gu 2007; Hughes 2003; Kellaghan and Greaney 1992),
which can be summarized into the following three main aspects:
1. To improve the test, such as using direct and authentic testing, sampling widely,
reflecting the full curriculum, employing various testing formats, and so on;
2. To support the stakeholders, ensuring that they have adequate knowledge and
resources to meet test demands and that they adopt a desirable attitude towards the
test;
3. To enhance communication between test designers and stakeholders, such as
providing timely test reports, offering teacher training, and so on.
2.2.2 Extent
A test may influence students, teachers, school administrators, textbook writers, etc.
Therefore, in a washback study, it is important to first determine the research extent,
whether it is the school context or the educational system as a whole. Generally
speaking, the more important a test is, the greater its washback effects will be. A
classroom achievement test may affect only a certain class, while a high-stakes test,
such as the National Matriculation Test, a nation-wide selection exam, may affect
not only the teachers and students involved, but also the related parents, textbook
writers, and even the whole society.
Hence, in a washback study, the extent of the washback effects shall be investi-
gated, so that a better overview of the study can be achieved.
2.2.3 Intensity
Washback can be either strong or weak. If the impact is strong enough, many classroom
teaching activities can be determined by it, such as what to teach and how to
teach, and even stakeholders’ attitudes to teaching and learning. The intensity of washback
is often related to the stakes of a test; that is, the more important the test, the
more intense its washback. Undoubtedly, an important test like the National
Matriculation Test will affect teaching and learning more than a classroom achievement
test. Therefore, a test with more intense washback tends to attract more attention
from researchers, and most washback studies are conducted on
high-stakes tests like TOEFL, IELTS and the National Matriculation Test (Cheng 2005;
Green 2007; Alderson and Hamp-Lyons 1996; Hawkey 2006; Qi 2004; Wall 2005;
Watanabe 1996).
2.2.4 Intentionality
2.2.5 Length
Washback may exist for a short time or last for a long time. Effects such as motivating
students to learn may disappear shortly after the test is over, while the learning
habits developed when preparing for the test may last a lifetime. Thus, longitudinal
studies should be carried out to investigate not only the immediate effects of a test,
but also its influence in the long run.
2.2.6 Specificity
Washback can be either general or specific. General washback refers to the effects
that may be generated by any test, such as motivating students to learn and providing
feedback information, while specific washback relates only to a certain test or
test type, such as the washback effects of multiple-choice phonetics questions.
Therefore, when investigating washback, attention should be paid to exploring the
specific effects of a test, apart from reviewing the general washback.
Among the dimensions listed above, the most frequently studied is direction, that is,
whether washback is positive or negative, while the other dimensions are
seldom examined. However, they can also provide researchers with valuable information
and fresh insight into the role of testing in teaching and learning. Therefore,
viewing washback from different dimensions may be a necessary supplement to
the definition of washback and an important step forward in washback studies, which should
contribute to the improvement and systematization of research in this field.
2.3 Summary
This chapter has reviewed the definitions and dimensions of washback effects,
which may suggest that different dimensions should be investigated in a washback
study. It is essential to look into its direction, whether it is positive or negative, or
whether its positive washback outweighs its negative washback. However, besides
direction, other dimensions can also be explored, such as whether a test has realized
its intended washback, whether the washback effects will last a long time or disap-
pear when the test is over, and so on. Therefore, in a washback study, more efforts
can be made to probe into the other dimensions of washback effects, to obtain a
more comprehensive picture of the phenomenon.
Chapter 3
Theoretical Frameworks for Washback
Studies
The early 1990s witnessed a great step forward in washback studies—a shift from
making assertions to establishing theoretical frameworks (Gu 2007). Alderson and
Wall (1993) were the first to develop a framework—Fifteen Washback Hypotheses.
Following them, many other scholars have proposed different models. Here, some
of the significant ones will be analyzed in detail, namely, Hughes’ (1993) Washback
trichotomy, Bailey’s (1996) basic model of washback and Green’s (2007) model of
washback, incorporating intensity and direction.
In “Does Washback Exist?”, Alderson and Wall (1993) provided a thorough and
comprehensive framework to illustrate the washback effects of language testing.
Based on the most general Washback Hypothesis “a test will influence teaching”,
they further developed fourteen hypotheses:
1. A test will influence teaching.
2. A test will influence learning.
3. A test will influence what teachers teach; and
4. A test will influence how teachers teach; and therefore by extension from 2 above:
5. A test will influence what learners learn; and
6. A test will influence how learners learn.
7. A test will influence the rate and sequence of teaching; and
8. A test will influence the rate and sequence of learning.
9. A test will influence the degree and depth of teaching; and
10. A test will influence the degree and depth of learning.
11. A test will influence attitudes to the content, method, etc. of teaching and learning.
12. Tests that have important consequences will have washback; and conversely
13. Tests that do not have important consequences will have no washback.
14. Tests will have washback on all learners and teachers.
15. Tests will have washback effects for some learners and some teachers, but not for others.
While Alderson and Wall (1993) tried to “lay out the territory” of washback ques-
tions to be addressed in future research, Hughes’ (1993) attempt was to construct a
model of washback mechanism. Hughes (1993: 2) proposed, “In order to clarify our
thinking on backwash, it is helpful, I believe, to distinguish between participants,
process and product in teaching and learning, recognizing that all three may be
affected by the nature of a test”.
According to Hughes (1993:2), participants are “all of whose perceptions and
attitudes towards their work may be affected by a test”, such as teachers, students,
school administrators and material writers. Process refers to “any actions by the
participants which may contribute to the process of learning”, such as syllabus
design, material development, the selection of teaching content and methods, etc.
Last, product is “what is learned (facts, skills, etc.) and the quality of learning (flu-
ency, etc.)”. He further states:
The trichotomy into participants, process and product allows us to construct a basic model
of backwash. The nature of a test may first affect the perceptions and attitudes of the
participants towards their teaching and learning tasks. These perceptions and attitudes in
turn may affect what the participants do in carrying out their work (process), including
practicing the kind of items that are to be found in the test, which will affect the learning
outcomes, the product of that work. (1993:2)
Besides, Hughes added that the following five conditions were necessary for the
washback effects to occur:
1. Success on the test must be important to the learners;
2. Teachers must want their learners to succeed;
3. Participants must be familiar with the test and understand the implications of its
nature and content;
4. Participants must have the expertise which is demanded by the test (including teach-
ing methods, syllabus design and materials writing expertise);
5. The necessary resources for successful test preparation must be available. (1993:
2–3)
focal construct as understood by course providers and learners (the greater the over-
lap), the greater the potential for positive washback” (2007: 14). Moreover, test use
and test stakes, or the perceived consequences of test scores will also affect the
direction of washback.
Meanwhile, test stakes and test difficulty, as perceived by the stakeholders, will
influence the intensity of washback, with important and challenging (but attainable)
tests exerting the most intense washback.
However, the washback effects of the same test may vary from participant to
participant because of their different experiences and beliefs, such as their knowl-
edge and understanding of test demands, resources to meet test demands and accep-
tance of test demands.
In sum, from the above models, it can be concluded that when investigating
washback, three questions are to be answered—what, how and why. First, what will
be affected by a test? According to Alderson and Wall (1993), a test will influence
both teaching and learning, such as the content and method, the rate and sequence,
and the degree and depth. Next, how will washback effects work? According to
Hughes (1993) and Bailey (1996), the washback mechanism consists of three
parts—participants, processes and products. The participants, such as teachers, stu-
dents, school administrators and material writers, will first be affected by the test—
their perceptions of and attitudes to the test will influence their concept of language
teaching and learning. Consequently, the teaching and learning processes will
change, and the learning products will differ. Last, why would a test exert washback
effects? According to Green (2007), the direction of washback is related to the overlap
between test construct and test characteristics, while the intensity of washback
effects is determined by test stakes and test difficulty. Moreover, the washback
effects of the same test may vary from participant to participant because of their
different experiences and beliefs. In a word, washback effects are determined by
both the test, such as its validity, stakes and difficulty, and the stakeholders, such as
their values and characteristics.
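The "why" part of this summary can be illustrated with a toy sketch. Green's (2007) model is qualitative, so everything numeric here is an assumption made purely for illustration: the 0-to-1 rating scales, the 0.5 threshold for direction, the intensity formula, and the names ExamProfile, washback_direction and washback_intensity are all hypothetical. The sketch only encodes two ideas from the text: greater overlap points towards positive washback, and important, challenging but attainable tests exert the most intense washback.

```python
from dataclasses import dataclass


@dataclass
class ExamProfile:
    """Hypothetical 0-to-1 ratings; none of these scales come from the source."""
    overlap: float     # how far test characteristics match the test construct
    stakes: float      # importance of the test as perceived by participants
    difficulty: float  # perceived difficulty (1.0 = seen as unattainable)


def washback_direction(exam: ExamProfile, threshold: float = 0.5) -> str:
    """Greater overlap suggests positive washback; the threshold is an assumption."""
    return "positive" if exam.overlap >= threshold else "negative"


def washback_intensity(exam: ExamProfile) -> float:
    """Assumed formula: stakes scaled by a term that peaks at moderate difficulty."""
    attainable_challenge = 4 * exam.difficulty * (1 - exam.difficulty)
    return exam.stakes * attainable_challenge


# Two invented profiles, used only to exercise the functions above.
national_exam = ExamProfile(overlap=0.8, stakes=0.9, difficulty=0.5)
class_quiz = ExamProfile(overlap=0.3, stakes=0.2, difficulty=0.95)

for name, exam in [("national exam", national_exam), ("class quiz", class_quiz)]:
    print(name, washback_direction(exam), round(washback_intensity(exam), 3))
```

Because the ratings are as perceived by each participant, the same test could be given different profiles for different teachers or students, which is one way to picture why washback varies from participant to participant.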
Although washback has long been a concept under discussion (Cronbach 1963;
Frederiksen 1984; Latham 1877), empirical studies on the washback effects of
language testing have only a short history of no more than 30 years. Since the late
1980s, washback has gained importance in language testing and there has been a
rapid increase in the number of empirical studies conducted. As a result, Language
Testing, a renowned academic journal in the field, published a special issue on
washback effects in 1996, reporting the latest developments in washback theory and
practice at that time, such as Alderson and Hamp-Lyons’ washback study on
TOEFL, Watanabe’s study on the University Entrance Exam in Japan, Wall’s study
on the O-level exam in Sri Lanka and Shohamy’s study on Arabic and English tests in
Israel. Moreover, in the 2000s, Cambridge University Press published four books on
washback studies (Cheng 2005; Green 2007; Hawkey 2006; Wall 2005), which
indicates that research on washback has advanced into a new era, becoming system-
atic and comprehensive.
In mainland China, the first washback study on language testing might be the one
conducted by Li (1990) on the Matriculation English Test (MET). In the following two
decades, more research was done on high-stakes nation-wide tests, such as
MET, CET and TEM, with CET being the most widely investigated, as indicated by
a search in the CSSCI¹ journals (Cai 2009; Jin 2000, 2006; Xiao et al. 2014; Ye
1998; etc.). Moreover, two books (Gu 2007; Qi 2004) have been published on CET
washback and MET washback respectively. By comparison, TEM washback is less
studied, with only three articles (Qin 2012; Xu 2012; Zou and Xu 2014) published
in CSSCI journals by the end of 2015. Therefore, it is necessary to conduct more
empirical research on TEM washback, to better understand the role of TEM in
English teaching and learning, so that positive effects can be achieved.
To better clarify the research content and process of washback studies, some of the
large-scale and influential empirical studies on the washback of language testing are
to be introduced, with the following table summarizing the basic information such
as research participants, methods and findings. The following six studies are selected
because books have been published on them, providing detailed information on the
implementation of these studies.
From Table 3.1, it can be concluded that most of the tests studied are large-scale,
high-stakes tests, which mainly fall into two categories—selection tests and profi-
ciency tests.
1. Selection tests: NMET (National Matriculation English Test), HKCEE (Hong
Kong Certificate of Education Examination in English), O-level English Exam.
These tests play such important roles as determining whether candidates will be
allowed to enter higher education.
2. Proficiency tests: IELTS (International English Language Testing System),
PL2000 (PL2000 is a reform project to encourage external certification of learn-
ers’ language proficiency through the examinations of providers such as
Cambridge ESOL), CET (College English Test). These are large-scale profi-
ciency tests, either national or international, which assess candidates’ language
proficiency so that important decisions can be made.
¹ Eleven journals on linguistics (foreign language) included in CSSCI (2014–2015) were searched,
which are Foreign Language Teaching and Research, Foreign Language World, Journal of Foreign
Languages, Modern Foreign Languages, Chinese Translators Journal, Foreign Languages in
China, Foreign Language Research, Foreign Language Education, Foreign Languages and Their
Teaching, Computer-Assisted Foreign Language Education, and Foreign Language Learning
Theory and Practice.
A change in the test only can hardly fulfill the intended goal of improving teaching and learning.

Study: Wall (2005)
Test: O-level test in Sri Lanka
Participants: Teachers; teacher advisors
Methods: Interviews; questionnaires; classroom observations; document analysis
Findings: Tests have washback on what teachers teach, but not how they teach, and there is both positive and negative washback; many factors contribute to or inhibit washback, implying that the nature of curricular innovation is much more complex.

Study: Hawkey (2006)
Test: IELTS and PL2000 in Italy
Participants: Teachers; students; receiving institution administrators; school heads; parents
Methods: Questionnaires; interviews; classroom observations; textbook analysis
Findings: IELTS is perceived as a reasonable direct communicative test, using authentic and appropriate content; it is hard but fair and reading and writing are more difficult modules; it is a high-stakes test, causing both anxiety and motivation; PL2000 appears to have positive effect on communicative teaching from teachers’ views, but students’ views differ; it has positive washback on motivation and attitudes, assessment, exam candidacies and performances; it helps school management and provides teachers with professional support.

Study: Green
Test: IELTS
Participants: Course providers;
Methods: Questionnaires; interviews;
Findings: Discrepancies exist between IELTS test and participants’ beliefs
Washback studies are often carried out on these large-scale high-stakes tests
because they tend to exert great impact on language teaching and learning.
Qi (2004) investigated the washback effects of NMET, one of the three compul-
sory tests (the other two are Chinese and mathematics) of the college entrance
examinations in China. As a nation-wide high-stakes test, NMET is expected to
induce intended washback effects of reforming teaching in secondary schools, shift-
ing its focus from language form to language use. In her study, Qi used interviews,
questionnaires and classroom observations to collect data from 8 test constructors,
6 English inspectors, 388 teachers and 986 students. The research findings indicated
that NMET failed to achieve its intended washback because of the overwhelming
functions it was expected to perform—the function of selection imposed constraints
on test design and development, and the function of evaluation boosted test stakes
and stimulated intensive test preparation. As a result, Qi came to the conclusion that
“tests whose stakes are too high will fail to exert the intended washback effect
although high-stakes tests have the potential to do so” (2004: 297).
Cheng (2005) examined the washback of the revised HKCEE, which was meant
to cause changes in classroom English teaching in secondary schools. She collected
data both before and after the revised test was introduced (baseline and follow-up
data), so that a comparison could be made. In her study, Cheng developed a three-
level working model, which was carried out in three phases, investigating (1) deci-
sion making agencies from the top, such as HKEA (Hong Kong Examinations
Authority); (2) intervening agencies such as textbook publishers and tertiary institu-
tions; and (3) implementing agencies such as teachers and students in secondary
schools. The results show that the revision of a test will have washback on the teach-
ing content, but teachers’ attitudes and behaviors will not change much; therefore
the change in the test can hardly fulfill the goal of educational innovation and what
actually gets taught is far more complicated than what a test can determine.
Wall (2005) explored the washback effects of a new O-level English exam in Sri
Lanka, which, together with a new textbook series, was introduced to bring about
innovations in English classrooms. Both baseline and follow-up data were collected
through interviews and classroom observations. After careful comparison, Wall
came to the conclusion that what teachers teach will be affected more than how
teachers teach.
In his book, Hawkey (2006) reported two washback studies, one on IELTS and
the other on PL2000. The IELTS impact research is part of the IELTS validation
systems, which will contribute to the monitoring and test development process of
IELTS. Interviews, questionnaires and classroom observations were conducted
among candidates, teachers and receiving institution administrators, which led to
generally favorable conclusions: IELTS is perceived as a reasonable direct commu-
nicative performance test, using authentic and appropriate content; IELTS is fair,
but hard, especially the Reading and Writing modules; IELTS is a high-stakes test, causing
motivation and moderate anxiety. The other washback study was conducted on
PL2000, a language education reform project in Italy, which aimed to provide for-
eign language education in state schools to meet the communication and certifica-
tion needs of students. One of its key policies was to encourage the external
From the above studies, it can be concluded that washback studies are mostly large-
scale, multi-method, multi-phase and longitudinal.
Washback studies are large-scale ones because the tests investigated are often
large-scale and high-stakes tests which affect a wide range of stakeholders.
Therefore, in terms of participants, all the above studies look into washback on
teachers or students, since they are the major stakeholders. Besides, many other
stakeholders are investigated, such as test constructors, school administrators, textbook
writers and parents. For a high-stakes test, the range of stakeholders seems
almost unlimited, so the selection of participants is a vital problem to solve, one that is
mainly determined by the research purpose and practicality. For instance, if the
intentionality of washback is to be explored, it is necessary to investigate not only
teachers and students, but also test constructors, so that both the intended and unintended
washback can be identified.
Various methods have been employed in washback studies, among which ques-
tionnaires, interviews and classroom observations are the most frequently used.
Since washback is a complex phenomenon, it is essential to obtain both quantitative
and qualitative data, to ensure the validity and reliability of the research. Moreover,
document analysis is often conducted, with documents covering related research,
syllabuses, textbooks, and so on. Besides, relevant tests can be designed to compare
the outcome of test-preparation courses and non-preparation courses, as
Green (2007) did in the washback study on IELTS. Meanwhile, it should also be
noted that the data collection instruments, such as language tests and interview
questions, must be validated. When designing the instrument, approaches like
brainstorming, expert opinion, reporting and reviewing, piloting and trialing should
be carefully planned and meticulously conducted. Moreover, validation techniques
can also be employed when processing the data collected, such as factor analysis,
multi-trait multi-method validation, triangulation of data collected from different
sources or by different instruments, and so on.
Washback studies are often multi-phase and longitudinal research because they
involve a wide range of stakeholders and employ various research methods. Besides,
it takes time for some washback effects to emerge. For instance, in Cheng’s (2005)
research, a three-phase study was developed, to investigate policy makers, textbook
writers, teachers and students step by step. Moreover, a baseline study had been
conducted to collect data before the revised test was introduced, and then more
research was done afterwards. Therefore, it is essential to conduct a diachronic
study on washback, to explore its causes, working mechanisms and evolving effects.
As for research findings, different conclusions have been proposed, reflecting the
different research foci of each study. Qi (2004) concluded that high-stakes
tests can hardly realize their intended washback because of the overwhelming
functions they are expected to perform, while Gu (2007) concluded
that the positive washback of CET far outweighs its negative washback and that the
negative washback is mainly caused by misuse of the test. Meanwhile, similar
findings have also been reported. In many studies (Cheng 2005; Qi 2004; Wall 2005), it is
found that a test will have washback effects on the teaching content, but can hardly
affect the teaching method. Another common conclusion is that washback is a com-
plex issue which varies from participant to participant and which is mediated by
many other factors apart from test design and test quality.
To sum up, washback research is comprehensive and systematic. It can be conducted
either in the micro context (the school setting) or in the macro context (the
educational system or society), and it involves various factors, such as the
test factor (test content, test use, etc.), the prestige factor (test stakes) and the
personal factor (characteristics of teachers, students, etc.). Therefore, it is impossible
and impractical to develop an inclusive approach which may cover all aspects of
washback. In this sense, when planning a washback study, it is important to define
the research purpose first, and then relevant participants and methods can be
selected. The research findings, whatever they are, may contribute to establishing a
washback framework.
3.3 Summary
This chapter has reviewed related literature on washback models and empirical
studies, and several key points in washback studies have emerged in the analysis,
which are summarized as follows:
1. To have a clear research purpose. A test may affect not only the classroom set-
ting, but also the educational system, even the society as a whole. Therefore, a
washback study may involve students, teachers, school administrators, textbook
writers, policy makers… It is impractical and impossible to investigate all the
stakeholders, so a clear research purpose is essential. Once a research purpose is
defined, whether to investigate washback in the micro or macro context, the relevant
research participants can be selected, and then the research methods. In
this way, the research will be more focused.
2. To investigate different dimensions of washback. Almost all washback studies
focus on its direction, such as whether it is positive or negative, or whether its
positive washback outweighs its negative washback. However, apart from direc-
tion, other dimensions can also be explored, such as whether a test has realized
its intended washback, and whether the washback effects will last a long time.
Therefore, in a washback study, efforts can be made to probe into various dimen-
sions of washback effects, to obtain a more comprehensive picture of the
phenomenon.
3. To draft a practical washback model. Quite a few washback models have been
proposed, such as Alderson and Wall’s Washback Hypotheses and Bailey’s Basic Model.
However, it is undecided as to which one can best depict the washback phenom-
enon and illustrate its working mechanism. It still remains a challenging issue in
washback studies, since washback is such a complex phenomenon that it seems
impossible to draft a model to cover every aspect of it. Nevertheless, a theoretical
model is essential. Therefore, it is necessary to draft a washback model with
focused aims to guide the empirical research.
4. To employ appropriate research methods. A multi-method, multi-phase and lon-
gitudinal approach will be adopted to investigate the washback effects of
TEM. Large-scale questionnaire surveys can be conducted to collect quantitative data
from a wide range of stakeholders, and interviews and classroom observations
can be conducted as case studies to collect qualitative data. Moreover, data
collected in the previous phase may assist the development and implementation of
research in the coming phases. In this sense, data collected can be triangulated
and data collection instruments can be validated.
In conclusion, a theoretical framework should be developed to guide the present
study, taking the above four points into consideration.
Chapter 4
Research Framework for TEM Washback
Study
Based on the above theoretical models and empirical studies, a TEM washback
model (Fig. 4.1) is drafted, to illustrate its causes, contents and mechanisms.
This model, as shown in Fig. 4.1, incorporates Bailey’s basic model and Green’s
model, which demonstrates basic factors that may contribute to the generation of
washback effects, as well as the working mechanism of TEM washback. The model
will be explained in detail in the following part.
From the TEM washback model, it can be concluded that TEM washback is mediated
by the following three factors:
1. The overlap between test construct and test characteristics. The test construct of
TEM is stated in its test syllabus, and the test characteristics include test content,
method, test result interpretation and use, etc. The overlap between test
construct and test characteristics may contribute to the direction of washback—
the greater the overlap, the more likely a test will exert positive washback.
Conversely, negative washback will be generated, if a test fails to test what is
stated in its construct (the part to the left of the “overlap”, e.g. the oral part is
missing in some English tests which claim to measure all the four language skills
of listening, speaking, reading and writing), or if a test tests something not stated
in its construct (the part to the right of the “overlap”, e.g. an English reading
exam tests candidates’ knowledge in a specific field, such as law or commerce);
2. Test stakes and test difficulty. These may contribute to the intensity of washback,
with important and difficult, but still attainable tests exerting the most intense
washback. In this sense, washback of TEM4 and TEM8 may be different, and
TEM washback may vary from school to school, because of the difference
between tests and the gap between schools.
3. Participants’ perceptions and characteristics. Washback effects of TEM vary
from participant to participant because of their different perceptions of the test
and their unique characteristics. Participants differ in how they perceive test difficulty,
test importance, test purpose, test use, and so on. Moreover, participant
characteristics and values also differ, such as their previous knowledge and
beliefs, available resources, and so on.
[Fig. 4.1 TEM washback model. Labels: positive and negative washback; stakes; difficulty; participants (students, teachers, researchers, experts, material writers, curriculum designers, school administrators, etc.)]
This model (Fig. 4.1) combines Wall’s Washback Hypotheses with Bailey’s basic
model, illustrating both the content and the mechanism of TEM washback.
4.1 Framework for TEM Washback Study 37
TEM will first influence its stakeholders, whose perceptions of English teaching
and learning will change accordingly. The stakeholders include teachers, students,
school administrators, textbook writers, researchers, and so on. Here, a new cate-
gory is added—the experts. Experts in TEM refer to TEM test designers, construc-
tors and the Teaching Syllabus designers. It is important to investigate TEM experts
because they are familiar with TEM, its implementation, related policies and the
intended washback. Therefore, their opinions can provide us with illuminating
insights into TEM and its washback (refer to Sect. 1.1 for more information). Of
course, the stakeholders can be more than those listed in the model, if impact at the
macro level is to be investigated.
Under the influence of TEM, the stakeholders’ learning, teaching, and designing
processes will change, yielding different products. Moreover, every category of par-
ticipants, processes and products may influence each other, as indicated by the
arrows. For instance, teachers’ teaching can be affected by the textbooks, the
research findings and the new syllabuses. Besides, the participants, processes and
products may influence the test, leading to test revision and reform, which is termed
“washforward” by Van Lier (1989).
To better apply this model to TEM washback study, it is necessary to clarify a few
terms, namely, TEM test construct, TEM test characteristics, overlap, and positive
and negative washback.
Test construct is the trait or traits that a test is intended to measure, which, in the
case of a language test, refers to the language ability to be measured (Davies et al.
1999). As for TEM, its construct is stated in its test syllabuses, which include
Syllabus for TEM4, Syllabus for TEM8, Syllabus for TEM4-Oral and Syllabus for
TEM8-Oral, since TEM is composed of these four separate tests. Moreover, lan-
guage abilities stated in the Teaching Syllabus should also be included when defin-
ing TEM construct, since the test purpose of TEM is to examine whether candidates
have met the required levels of English abilities specified in the Teaching Syllabus
(Jin and Fan 2011).
Test characteristics refer to the test content, method, marking, as well as the use and
interpretation of test results. In this sense, TEM test characteristics include its test
contents (listening, reading, writing, speaking, translation, grammar and vocabu-
lary, and so on), test methods (item types used, such as multiple choice, blank
38 4 Research Framework for TEM Washback Study
4.1.3.3 Overlap
Overlap means the extent to which the test characteristics can reflect the test con-
struct; that is, whether TEM test can elicit performance which can reflect the English
ability stated in the Syllabuses. For example, it is stated in both the Teaching
Syllabus and the Test Syllabus that candidates should be able to follow news broad-
cast from native countries, and in both TEM4 and TEM8, news broadcast is a part
of listening comprehension. In this sense, TEM characteristics overlap with TEM
construct. However, if test characteristics and test construct do not match, it would
be termed either “construct under-representation” or “construct-irrelevant variance”
(Messick 1996). Construct under-representation means that some language abilities
in the test construct are not measured. For instance, speaking is a language skill
listed in the Teaching Syllabus, but only a small fraction of the candidates are
eligible to take the oral test, so this skill in TEM construct is under-represented in the
test. By contrast, construct-irrelevant variance arises when the test measures
something that is not part of the test construct. For instance, in a reading test,
construct-irrelevant variance may be caused by background or cultural knowledge: an
item which tests candidates’ cultural knowledge is irrelevant to the reading ability to be
measured, and will lead to misinterpretation of the test scores.
The overlap and mismatch between TEM construct and TEM characteristics
would, to some extent, determine whether the test will exert positive or negative
washback.
Therefore, in the current study, two principles will be followed when judging the
direction of washback effects. First, participants’ evaluation of washback will be
sought separately. Students, teachers, school administrators and experts will all be
investigated about their opinions on the washback direction in their respective ques-
tionnaires, and in classroom observations, teachers’ and students’ behavior will be
analyzed separately, to determine the different washback on them. Moreover, a dis-
tinction will be made between positive washback in the micro context and in the
macro context. Positive washback in the micro context refers to immediate but
short-term benefits, such as improving candidates’ test performance, enhancing
efficiency of teaching and learning, and so on; while positive washback in the macro
context will last a long time, beneficial to improving candidates’ abilities. For
instance, teachers’ moderate use of the mother tongue may help clarify the testing
demands more efficiently, so it can be considered positive in the micro context;
however, the excessive use of mother tongue, more than 10–15% as suggested by
Macaro (2005), may impede the development of learners’ language ability, so it is
negative in the macro context.
As a high-stakes test, TEM will affect not only teachers and students, but also the
educational system, and even the society as a whole. The current study will focus on
the washback effects of TEM on college education for English majors, so only the
stakeholders in the educational system are investigated, while stakeholders like
employers and parents are not included.
As outlined in the TEM washback model (Fig. 4.1), the two major aspects to be
investigated are contributing factors to TEM washback and TEM washback mecha-
nism, and the participants involved are students, teachers, school administrators
(material writers, curriculum designers, etc.), researchers and experts.
In reference to the TEM washback model, a multi-phase and multi-method
research design has been developed to investigate the washback effects of TEM on
experts, school administrators, teachers and students, and to explore their perception
and evaluation of TEM, as well as the impact of TEM on their teaching or learning
practice. Moreover, various research methods will be employed, such as document
analysis, questionnaire surveys, classroom observations and interviews.
As indicated in Table 4.1, the current research is conducted in the following
stages. First, such documents as the Teaching Syllabus, TEM syllabuses, and
research findings on washback and TEM are analyzed. The research findings of
TEM studies and washback studies are summarized in Sects. 1.2.4 and 3.2
respectively, and in Sect. 1.4 a comparison is made between the Teaching Syllabus and
TEM characteristics. Next, in the second stage, large-scale questionnaire surveys
are conducted among experts, program administrators, teachers and students, to
collect their basic information and to investigate their perception and evaluation of
TEM, as well as the washback of TEM on them. Altogether seven questionnaires are
designed: one for experts, and six for the relevant program administrators, teachers
and students, among whom the washback of TEM4 and TEM8 is investigated
separately. Then, based on the above research findings, some sample classes are
selected for observation, to explore the washback of TEM on classroom teaching and
learning. Last, supplementary interviews are conducted among selected participants,
to seek answers to problems that emerged in the questionnaire surveys and classroom
observations.
Table 4.1 A multi-phase, multi-method research design for TEM washback study
Focus                          Instruments
TEM-related researches         Document analysis
The Syllabus and TEM Syllabus  Document analysis
Candidates                     Questionnaire survey among candidates
                               Classroom observation
                               Interview
Teachers                       Questionnaire survey among teachers
                               Classroom observation
                               Interview
Administrators                 Questionnaire survey among administrators
                               Interview
Experts                        Questionnaire survey among experts
                               Interview
TEM lesson observation         Classroom observation
4.2 Participants
Major participants of the current study are experts, program administrators, teachers
and students. Experts are specialists in English education who have either partici-
pated in the Teaching Syllabus design or TEM development, and some of them are
also members of NACFLT, an advisory committee in the Ministry of Education
which has been established to guide foreign language education in China. They
have a deep understanding of the language education policies in China, the Teaching
Syllabus and TEM. Moreover, some of them are also administrators in their respec-
tive universities and professors who teach English courses. Therefore, their opin-
ions may provide us with illuminating insights into TEM and its washback, to help
us understand this issue from the perspective of supervisors, administrators and
teachers.
4.3 Instruments
Major research instruments used in the current study are questionnaires, classroom
observations and interviews, which will be introduced in detail in the following part.
In early 2012, five expert questionnaires were distributed through personal contact,
all returned and valid. The questionnaires for TEM8 administrators, teachers and
candidates were distributed in March, 2012, together with the test booklets, by spe-
cial delivery. The administrators’ questionnaire and teachers’ questionnaire were
mailed to all the 809 colleges and universities that had students take TEM8, one
copy each school for the administrator, and five copies for related teachers. These
questionnaires had been distributed and finished before TEM was administered, and
returned together with the test paper, but in a separate package. The students’
The valid questionnaires were entered into the computer by the present researcher with
the help of six college students and two graduate students. The answers to
multiple-choice and five-point Likert-scale questions were processed by SPSS 17.0 (Statistical
Product and Service Solutions 17.0), while answers to short-answer questions were
collected, sorted, double-checked, and saved in a Microsoft Word file.
Because only five expert questionnaires were collected, only descriptive statistical
analysis was applied to the experts’ questionnaire, reporting such results as frequency
and mean, while inferential statistical analyses such as factor analysis and t-tests were
not conducted. The experts’ perception and evaluation of TEM and its washback will be
reported in Sect. 5.1.
As for the other six questionnaires, both descriptive and inferential analyses were
performed. In this chapter, the results of reliability test and factor analysis are
reported, while a further discussion on the views of administrators, teachers and
candidates will be provided in Chap. 5.
Reliability Test
A reliability test was run on the questionnaire data of all the objective items, includ-
ing both five-point Likert scale items and multiple-choice questions. The question-
naires were analyzed separately, and for each questionnaire, two reliability tests
were performed, one on all the objective items and the other on five-point Likert
scale items only. The results are listed in Table 4.4.
From Table 4.4, it can be seen that Cronbach’s Alpha is above 0.8 for all ques-
tionnaires, and Cronbach’s Alpha for five-point Likert scale items only is above 0.9,
which attests to the satisfactory internal consistency of the questionnaires.
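The reliability coefficient reported here can be illustrated with a minimal Python sketch of Cronbach's Alpha. It is not the SPSS 17.0 procedure used in the study; the function name `cronbach_alpha` and the six hypothetical respondents below are assumptions made purely for illustration.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's Alpha for a respondents-by-items score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # sample variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of the summed scale
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Hypothetical responses (not study data): 6 respondents x 4 five-point Likert items.
demo = np.array([
    [5, 4, 5, 4],
    [4, 4, 4, 3],
    [3, 3, 2, 3],
    [2, 2, 2, 1],
    [4, 5, 4, 4],
    [1, 2, 1, 2],
])
alpha = cronbach_alpha(demo)
print(round(alpha, 3))
```

Items that rise and fall together across respondents push the coefficient towards 1, which is the sense in which values above 0.8 are read as satisfactory internal consistency.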
Factor Analysis
An independent factor analysis was carried out on all the five-point Likert-scale
items of each questionnaire to better explain the research findings. First, KMO and
Bartlett’s Tests were conducted, the results of which proved satisfactory for further
analysis (Table 4.5).
In factor analysis, methods of principal components and varimax rotation were
employed because they maximize variance and facilitate the interpretation of the
constructs deduced (Lu 2006: 477–483). In view of the arbitrary nature of factor
extraction, and practicality and meaningful interpretability, the following three cri-
teria were observed in data reduction: (1) the eigenvalue was greater than 1 and
there were more than 3 items in one factor; (2) factor loadings lower than 0.4 were
deleted and not counted in any factor; (3) when double loadings occurred, decisions
were made on meaningful interpretations.
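The first two criteria can be sketched in Python as follows. This is a simplified illustration under stated assumptions: it extracts unrotated principal components from the item correlation matrix and omits the varimax rotation applied in the study, and the function name `extract_components` and the simulated responses are hypothetical.

```python
import numpy as np

def extract_components(responses, min_eigenvalue=1.0, min_loading=0.4):
    """Unrotated principal components retained by the eigenvalue > 1 rule."""
    responses = np.asarray(responses, dtype=float)
    corr = np.corrcoef(responses, rowvar=False)   # item correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)       # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]             # re-sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > min_eigenvalue               # criterion (1): eigenvalue > 1
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    # Criterion (2): loadings below the threshold are blanked out.
    salient = np.where(np.abs(loadings) >= min_loading, loadings, np.nan)
    # Criterion (1) also asks for more than 3 items per factor; count them here.
    items_per_factor = np.sum(~np.isnan(salient), axis=0)
    return eigvals, salient, items_per_factor

# Hypothetical data: 1000 respondents, 10 items driven by two latent traits.
rng = np.random.default_rng(0)
trait_a = rng.normal(size=(1000, 1))
trait_b = rng.normal(size=(1000, 1))
noise = 0.5 * rng.normal(size=(1000, 10))
responses = np.hstack([np.repeat(trait_a, 6, axis=1),
                       np.repeat(trait_b, 4, axis=1)]) + noise

eigvals, salient, items_per_factor = extract_components(responses)
n_retained = salient.shape[1]
print(n_retained, items_per_factor)
```

Criterion (3), the handling of double loadings, is a judgment about interpretability and is therefore left outside the code.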
Table 4.4 Reliability test for AQ-4, TQ-4, SQ-4, AQ-8, TQ-8 and SQ-8
Questionnaire and item set           Cronbach’s Alpha  N of Items
AQ-4: all objective items            .869              116
AQ-4: five-point Likert scale items  .962              58
TQ-4: all objective items            .847              130
TQ-4: five-point Likert scale items  .931              71
SQ-4: all objective items            .936              200
SQ-4: five-point Likert scale items  .961              118
AQ-8: all objective items            .903              116
AQ-8: five-point Likert scale items  .961              58
TQ-8: all objective items            .845              106
TQ-8: five-point Likert scale items  .934              49
SQ-8: all objective items            .948              203
SQ-8: five-point Likert scale items  .966              121
Note: The number of items in the reliability test for five-point Likert scale items may exceed those
in the questionnaires because some multiple-choice questions with five options arranged in degree
are included
Based on the above criteria, factor analysis was conducted on the five-point
Likert scale items in each questionnaire, and the factors extracted and variance
explained are listed in Table 4.6.
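The link between an eigenvalue and the percentage of variance it explains can be checked with a short Python sketch. It assumes that the first three eigenvalues in the TEM4 portion of Table 4.6 (25.1, 6.8, 6.1) belong to SQ-4 and that the analysis covered the 118 five-point Likert scale items listed for SQ-4 in Table 4.4; the function name `variance_explained` is hypothetical.

```python
def variance_explained(eigenvalues, n_items):
    """Percentage of total variance per factor, plus the running total."""
    pct = [100 * ev / n_items for ev in eigenvalues]
    cumulative = []
    running = 0.0
    for p in pct:
        running += p
        cumulative.append(running)
    return pct, cumulative

eigenvalues = [25.1, 6.8, 6.1]   # first three eigenvalues as printed in Table 4.6
n_items = 118                    # assumed number of items entering the analysis

pct, cumulative = variance_explained(eigenvalues, n_items)
print([round(p, 1) for p in pct])
print([round(c, 1) for c in cumulative])
```

Under these assumptions the per-factor percentages agree with the printed 21.3, 5.8 and 5.2. The running total can differ from the printed accumulated values in the last digit, because the published eigenvalues are themselves rounded.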
Based on the three criteria mentioned above, factors were extracted from each
questionnaire and named according to the items included. The explanation of the
rotated component matrix of administrators’ questionnaires on TEM4 and TEM8
will be grouped together, because they are of similar content and layout, and will be
analyzed together in Chap. 5. The same applies to the teachers’ questionnaires and
the students’ questionnaires.
Program Administrators’ Questionnaires:
AQ-4: Seven factors have been extracted and named:
Factor 1: washback of TEM4 on English language teaching (16 items);
Factor 2: intended use and interpretation of TEM4 scores (11 items);
Factor 3: evaluation of TEM4 (8 items);
Factor 4: unintended use and interpretation of TEM4 scores (6 items);
Factor 5: attitude to unintended uses of TEM4 scores (4 items);
Factor 6: familiarity with TEM4 (5 items);
Factor 7: washback of the 2005 TEM4 revision on English language teaching (4
items).
Table 4.6 (continued)
Time spent on TEM4 preparation after class  .71  .59
Time spent daily on TEM4 preparation after class  .65  .62
Mock test papers done  .64  .59
Time spent on extra-curricular TEM4 training course  .46  .58
Affecting students’ future career  .85  .81
Affecting students’ further education  .83  .79
Familiarity with time allotment  .66  .71
Familiarity with score allotment  .64  .71
Time spent on TEM4 vocabulary before TEM4  .50  .66
Time spent on reading grammar books before TEM4  .42  .67
Causing pressure  .40 .58  .58
Attitude to using TEM4 score to confer degrees  .63  .51
Eigenvalue  25.1 6.8 6.1 4.3 3.6 3.0 2.7 2.4 2.3 2.1 1.9 1.9 1.7 1.6 1.5 1.4 1.3 1.2 1.2 1.1 1.1 1.0 1.0 1.0
Variance explained (%)  21.3 5.8 5.2 3.6 3.1 2.6 2.3 2.0 2.0 1.8 1.6 1.6 1.4 1.4 1.2 1.2 1.1 1.0 1.0 0.9 0.9 0.9 0.9 0.8
Accumulated variance explained (%)  21.3 27.1 32.3 36.0 39.1 41.7 44.1 46.2 48.2 50.0 51.6 53.3 54.7 56.2 57.4 58.7 59.8 60.9 61.9 62.9 63.8 64.7 65.6 66.5
[c] Difference between TEM4 preparation and non-preparation classes
SQ-8: 24-factor rotated component matrix
Item  Factor loading(s)  Communality
Preference for writing  .76  .75
Preference for reading  .76  .76
Preference for speaking  .73  .76
Preference for translation  .71  .73
Preference for listening  .69  .73
Preference for other activities  .68  .64
Preference for textbook exercise explanation  .66  .72
Preference for text explanation  .65  .65
Preference for TEM8 test paper explanation  .64  .65
Preference for doing textbook exercises  .63  .70
Preference for doing TEM8 test papers  .54  .62
Difference in writing activities[d]  .77  .77
Difference in reading activities  .76  .76
Difference in listening activities  .74  .73
Difference in speaking activities  .72  .71
Difference in translation activities  .72  .74
Difference in explaining texts  .69  .69
Difference in explaining textbook exercises  .67  .72
Difference in doing textbook exercises  .66  .69
Difference in other activities  .63  .65
Difference in explaining TEM8 test papers  .59 .50  .68
Difference in doing TEM8 test papers  .53 .53  .69
Washback on learning depth  .78  .75
Washback on learning schedule  .78  .75
Washback on learning attitude  .75  .71
Washback on course selection  .74  .69
Washback on learning method  .73  .69
Washback on learning content  .72  .72
Washback on learning aims  .72  .70
Washback on learning speed  .68  .66
Overall washback on students  .44  .54
Difference in TEM8 writing practice after class[e]  .67  .73
Difference in TEM8 reading practice after class  .66  .70
Difference in TEM8 mini-lecture practice after class  .65  .72
Difference in reading culture books after class  .65  .62
Difference in TEM8 general knowledge after class  .64  .68
Difference in TEM8 translation practice after class  .64 .40  .74
Difference in listening to the radio after class  .63  .67
Difference in TEM8 error-correction after class  .63  .72
Difference in free writing practice after class  .61  .73
Difference in reading practice after class  .59  .62
Difference in TEM8 listening practice after class  .58 .41  .70
Difference in speaking practice after class  .53  .73
TEM8 affecting translation proficiency  .70  .68
TEM8 affecting listening proficiency  .70  .69
TEM8 affecting reading proficiency  .69  .71
TEM8 affecting writing proficiency  .67  .66
TEM8 affecting grammar proficiency  .67  .69
TEM8 affecting note-taking proficiency  .63  .66
TEM8 affecting MCQ test taking techniques  .62  .62
TEM8 affecting speaking proficiency  .59  .62
TEM8 affecting general knowledge proficiency  .58  .63
TEM8 affecting vocabulary proficiency  .56  .71
Better arranging learning activities  .74  .68
Clarifying learning requirements  .74  .67
Improving learning efficiency  .73  .67
Promoting English learning  .70  .70
Improving English proficiency  .66  .64
Spending more time learning  .66  .64
Identifying weaknesses  .65  .58
Providing learning aims  .62  .63
Improving language proficiency by doing test papers  .52  .55
Time spent on TEM8 listening before TEM8  .71  .72
Time spent on TEM8 reading before TEM8  .69  .68
Time spent on TEM8 mini-lecture before TEM8  .68  .72
Time spent on TEM8 error-correction before TEM8  .68  .70
Time spent on TEM8 general knowledge before TEM8  .68  .66
Reading activities in TEM8 preparation classes  .80  .78
Writing activities in TEM8 preparation classes  .80  .80
Speaking activities in TEM8 preparation classes  .79  .78
Listening activities in TEM8 preparation classes  .77  .71
Translation activities in TEM8 preparation classes  .76  .72
Other activities in TEM8 preparation classes  .62  .59
Reflecting translation proficiency  .73  .67
Reflecting reading proficiency  .72  .64
Reflecting listening proficiency  .71  .62
Reflecting writing proficiency  .70  .63
Reflecting overall language proficiency  .70  .68
Assessing learning outcome  .62  .63
Reflecting general knowledge proficiency  .53  .47
Predicting future performance  .52  .63
Time spent on free writing before TEM8  .72  .70
Time spent on reading before TEM8  .72  .65
Time spent on speaking before TEM8  .71  .69
Time spent on listening to the radio before TEM8  .68  .65
Time spent on TEM8 writing before TEM8  .51 .50  .75
Time spent reading culture books before TEM8  .42 .51  .60
Reading proficiency improved in the advanced stage  .72  .71
Speaking proficiency improved in the advanced stage  .71  .71
Listening proficiency improved in the advanced stage  .70  .73
Writing proficiency improved in the advanced stage  .68  .70
Translation ability improved in the advanced stage  .61  .69
General knowledge improved in the advanced stage  .54  .61
Hindering language learning  .79  .69
Taking up time for skill training  .78  .67
Satisfied with guessing the correct answer for MCQ  .71  .59
No washback at all  .52  .61
Affecting student ranking  .52  .58
Familiarity with the Teaching Syllabus  .73  .66
Familiarity with TEM8-Oral Test Syllabus  .70  .63
Familiarity with TEM8 marking criteria  .56  .64
Familiarity with TEM8 Test Syllabus  .53  .64
Doing TEM8 test papers in TEM8 preparation classes  .75  .68
Explaining TEM8 papers in TEM8 preparation classes  .74  .69
Doing textbook exercises in TEM8 preparation classes  .74  .74
Explaining text exercises in TEM8 preparation classes  .70  .75
Explaining texts in TEM8 preparation classes  .65  .65
Familiarity with TEM8 time allotment  .65  .68
Familiarity with TEM8 score allotment  .63  .67
Affecting students’ future career  .82  .76
Affecting students’ further education  .80  .74
Time spent on TEM8 preparation after class  .69  .59
The number of mock test papers done  .62  .63
Time spent on extra-curricular TEM8 training course  .52  .52
Time spent daily on TEM8 preparation  .48  .58
Time spent on TEM8 translation after class  .44 .53  .75
Time spent on TEM8 vocabulary after class  .70  .78
Difference in TEM8 vocabulary practice after class  .49 .51  .72
Causing pressure  .60  .58
Attitude to using TEM8 score to confer degrees  .73  .60
Enlarging vocabulary by reciting TEM8 words  .63  .64
Eigenvalue  27.3 8.2 5.6 4.1 3.5 3.4 2.9 2.5 2.4 2.2 2.1 1.9 1.7 1.7 1.5 1.4 1.3 1.2 1.2 1.2 1.0 1.0 1.0 1.0
Variance explained (%)  22.6 6.8 4.6 3.8 2.9 2.8 2.4 2.0 2.0 1.8 1.7 1.5 1.4 1.4 1.2 1.1 1.0 1.0 1.0 1.0 0.9 0.8 0.8 0.8
Accumulated variance explained (%)  22.6 29.4 34.1 37.9 40.8 43.6 46.0 48.1 50.1 51.9 53.7 55.3 56.7 58.1 59.4 60.6 61.7 62.7 63.8 64.8 65.7 66.6 67.4 68.2
[d] Difference between TEM8 preparation and non-preparation classes
[e] Difference in after class learning efforts between TEM8 preparation and non-preparation stages
In the current study, altogether 52 classes have been observed and each class is
around 45 min long. The following three principles were followed to select proper
English classrooms to be observed.
First, both top universities and less established schools were selected.
Questionnaire survey findings indicate that students of various language proficiency
levels have been affected differently by TEM, and that teachers from schools of different
levels responded differently to TEM (see Chap. 5). Therefore, in classroom
observation, special effort has been made to select English classrooms from both
universities and colleges (schools that offer a three-year program for their English
majors). However, due to the fact that TEM8 is not required for candidates taking a
three-year program, only university classrooms are observed for TEM8 related
courses.
Second, both TEM-preparation courses and non-preparation courses were
selected. As shown in the research findings of the questionnaire surveys (see Chap.
5), some schools offered TEM preparation courses, mainly comprehensive ones
teaching all language skills covered in TEM. Moreover, teachers and candidates
claimed that TEM preparation courses differed from non-preparation ones. In this
sense, both types of courses have been observed to investigate whether they do dif-
fer and whether they differ in the same way as claimed by teachers and students.
Since TEM is a comprehensive test and the TEM preparation courses observed are
often comprehensive ones, equivalent non-preparation courses should also be com-
prehensive ones which deal with all language skills. Therefore, fundamental English
and advanced English courses have been selected. Moreover, it is common practice
for some schools to offer TEM preparation in fundamental English and advanced
English courses in the semester when TEM4 and TEM8 are held or in the previous
semester. Therefore, the TEM preparation classes observed here are of two types:
those in a separate course named TEM Preparation Course, and
TEM preparation classes delivered in fundamental English or advanced English
courses. In this sense, whether a class is considered a TEM preparation class or a
non-preparation class is determined by the teaching content of that class.
Last, teachers who teach both TEM preparation and non-preparation courses
were selected. It is hoped that the TEM preparation and non-preparation courses can
be delivered by the same teacher, so that teacher variance can be limited.
Based on the above principles, six teachers were selected for observation. They are
all English teachers with TEM preparation experience. They have taught both TEM
preparation and non-preparation classes, but their educational background and
teaching experience differ considerably (refer to Sect. 6.1 for detailed
information).
Categories four to seven employ a five-point Likert scale to measure the frequency
of each activity, such as whether the teacher uses English in class extremely
frequently, frequently, sometimes, occasionally or seldom (from 5 to 1).
5. Atmosphere, how often students laugh in class. This is counted by how many
times students laugh.
The observation scheme was piloted in three English classes and revised three
times before it was finalized.
Altogether 52 classes of the six teachers have been observed and it took several
semesters (from 2012 to 2014) to observe all the classes, because in some schools,
TEM preparation and non-preparation courses were not open in the same semester.
To avoid interfering in the classes, the researcher did not observe in the classrooms,
but used a digital voice recorder to record all the classes, which were analyzed
by the researcher afterwards. Of the six teachers, two agreed to have their class
video recorded once, so two individual class hours were recorded for each of them,
and the recording was transcribed to provide a clearer picture of the characteristics
of the English classes (see Appendix II for the transcript). To achieve variety, the
classes video recorded are one TEM8 preparation class and one non-preparation
class of fundamental English.
The analysis was conducted mainly by coding the recorded classes with the help of
the observation scheme. The following procedures were followed when coding
each lesson:
1. Listen to the recording once without doing any coding in the observation scheme.
Just note down on a blank piece of paper the major classroom events—what was
taught, how they were organized, and what kind of teaching materials were used.
In this way, the class was divided into several parts;
2. Listen to the recording again, using a stop watch and the time meter in the com-
puter to record the time spent on each activity. After listening, calculate the over-
all time spent on each item in the content, organization and materials, and fill in
the blanks in the first three categories;
3. Listen to the recording once more, focusing on the communicative features of
the classroom activities, such as the English used, how the teacher and students
responded to each other, and what the teacher focused on when they provided
students with feedback information;
4. Listen to the recording for the last time, check the previous information and
count how many times students laugh.
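The timing step in this procedure (summing the time spent on each item, then relating it to the length of the class) can be sketched in Python as follows. The segment records, category labels and function names are hypothetical illustrations rather than the actual observation scheme; the 45-minute class length is taken from the description of the observed classes.

```python
from collections import defaultdict

# Hypothetical coded segments: (category, item, start_second, end_second).
segments = [
    ("content", "reading", 0, 600),
    ("content", "TEM8 test paper", 600, 2100),
    ("content", "reading", 2100, 2700),
    ("organization", "teacher explanation", 0, 1800),
    ("organization", "pair work", 1800, 2700),
]

def time_per_item(segments):
    """Total seconds spent on each (category, item) pair."""
    totals = defaultdict(int)
    for category, item, start, end in segments:
        totals[(category, item)] += end - start
    return dict(totals)

def share_of_class(totals, class_seconds=45 * 60):
    """Express each total as a percentage of the class length."""
    return {key: 100 * seconds / class_seconds for key, seconds in totals.items()}

totals = time_per_item(segments)
shares = share_of_class(totals)
for key, seconds in totals.items():
    print(key, seconds, round(shares[key], 1))
```

Keeping start and end times for every segment, rather than only the totals, makes it possible to recheck the coding against the recording on a later pass.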
After all the classes had been coded, the data were entered into the computer and
processed by SPSS 17.0. Findings of classroom observation will be discussed in Chap. 6.
4.3.3 Interviews
Interviews were a supplementary instrument used in this research. They were con-
ducted to facilitate the development of the questionnaires and the classroom obser-
vation scheme, to identify possible causes of conflicting research findings, and to
verify statistical results in questionnaire surveys. The main participants are teachers
(including the six teachers observed), students (including those whose English
classes had been observed), PhD candidates (most of whom were also teachers), and
school administrators. The interviews were either conducted individually (mainly
with teachers and school administrators) or in groups (mainly with students and
PhD candidates).
Four major rounds of interviews were carried out before and after the questionnaire
surveys and the classroom observations. Pre-questionnaire and pre-observation
interviews were conducted to facilitate instrument development, so their outcomes
were the questionnaires and the observation scheme themselves. The participants were
asked about their opinions on the questionnaires and the observation scheme. Therefore,
no fixed questions were designed, nor were the interviews recorded.
Post-questionnaire and post-observation interviews were meant to solve puzzles,
so relevant questions were designed before the interviews (see Appendix III for
the interview questions). Moreover, the interviews were recorded using a digital voice
recorder for further analysis. Key findings of the interviews were collected,
sorted, double-checked, and saved in a Microsoft Word file, and will be discussed
in Chaps. 5 and 6.
The following measures have been taken in the development of the research instru-
ments—the questionnaires, classroom observation scheme and interview guiding
questions.
Questionnaires are a useful tool to collect information from a large number of people.
However, it is difficult to design a valid and reliable questionnaire because the
researcher and the participants cannot communicate in person when the questionnaire
surveys are conducted. Therefore, it is of vital importance for the researcher to
anticipate all possibilities and try to avoid ambiguity, misunderstanding and missing
information. This does not mean that the questionnaire should be perfect, but a great
deal of effort should be devoted to making the questionnaires as valid and reliable as
possible. In this regard, the current researcher has taken the following steps in the
designing, trialing and revision of the seven questionnaires used in the research.
First, before drafting the questionnaires, a detailed analysis was made of
the washback theories, such as the definitions, dimensions and models, to explore
the construct of washback studies.
Second, a careful review was carried out of all the available empirical studies on
washback, to gain insight from relevant questionnaire surveys conducted before.
Third, the researcher consulted experts and school administrators, held discussions
with PhD candidates and colleagues, and communicated with teachers and
students, to obtain a clear picture of the possible TEM washback.
Last, two rounds of piloting were performed among school administrators, teachers
and students, while the experts’ questionnaire was piloted only once due to the
limited number of experts available. Based on the results of the pilot study, a
comparison was made between the predicted participant responses and their actual
responses. Meanwhile, every item was checked again to discern its contribution
to the corroboration or refutation of the washback hypotheses established. Based
on the above findings, revisions were made and the operational
version of the questionnaires was finalized for large-scale administration.
In the current research, quite a few interviews were conducted to seek opinions,
advice and suggestions from experts, PhD candidates, school administrators,
teachers and students. Among these, the post-questionnaire and post-observation
interviews were conducted formally, with guiding questions prepared beforehand.
The purpose was to collect qualitative data to confirm research findings from the
questionnaire surveys and classroom observations, to seek explanations for
contradictory findings, and to obtain supplementary information when previous
findings raised further questions. When the interview questions were designed, the
following steps were taken to ensure validity and reliability.
First, a detailed analysis was made of the research findings of the questionnaire
surveys and classroom observations, to locate the problems that needed to be tackled.
Second, the researcher consulted the experts, PhD candidates, school administrators,
teachers and students for their opinions on the interview questions.
Last, the interview questions were piloted twice among teachers and students, and
revised accordingly, before the final version was completed.
Practical measures have been taken when collecting and analyzing data through
questionnaire surveys, classroom observations and interviews, to ensure validity
and reliability of the data.
Altogether seven questionnaire surveys were conducted, among experts, TEM4
program administrators, TEM8 program administrators, TEM4 teachers, TEM8
teachers, TEM4 students and TEM8 students. The expert’s questionnaire was distributed
at a TEM test construction conference, and no time limit was set for the survey. Four
experts returned the questionnaires during the conference, and one took it home
to finish and mailed it to the researcher 2 weeks later. All the experts provided
their names and e-mail addresses and were willing to offer additional information if
needed.
The other six questionnaires were mailed to the TEM participating schools
together with the TEM test papers in 2012, but in a separate package which could
be opened before the administration of the test. One copy of the program
administrator’s questionnaire and five copies of the teacher’s questionnaire were sent
to each TEM participating school. The personnel in charge of TEM in each school were
requested to find the relevant administrators and teachers to complete the
questionnaires. No time limit was set, and they could return the questionnaires any
time before the administration of TEM. The student’s questionnaire was distributed to
250 schools selected from all the TEM participating schools on the basis of region,
school type and school level, to ensure that the research covered a representative
sample. The student survey was conducted in the test room after the TEM test was
completed, and no time limit was set. All six questionnaires were completed
anonymously, so that the participants would feel free to voice their opinions.
Moreover, participants who were unwilling to complete the questionnaires could simply
refuse or leave them unfinished. Blank questionnaires and questionnaires with
more than 20% of the questions unanswered were sorted out and marked as
invalid. This may account for the comparatively low valid response rate of the
student’s questionnaire: 71.7% for TEM4 and 56.5% for TEM8. Some candidates were
unwilling to participate in the survey after taking TEM, so they did not complete the
questionnaires. Since the questionnaire was anonymous, they felt free to hand in a
blank or incomplete one. This practice would, to some extent, uphold the validity of
the questionnaire surveys, because participants who were willing to complete the
questionnaire were more likely to voice their true beliefs.
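The screening rule described above can be illustrated with a minimal sketch. It assumes that each returned questionnaire is stored as a list of answers in which `None` marks an unanswered question; this encoding, the function names and the sample data are hypothetical and are not taken from the study. Only the 20% threshold comes from the text.

```python
from typing import Optional, Sequence

# Threshold stated in the text: more than 20% unanswered means invalid.
MAX_UNANSWERED_SHARE = 0.20


def is_valid(responses: Sequence[Optional[str]]) -> bool:
    """Return True if a questionnaire passes the screening rule.

    None stands for an unanswered question (an assumed encoding).
    """
    if not responses:
        return False
    unanswered = sum(1 for r in responses if r is None)
    if unanswered == len(responses):  # blank questionnaire
        return False
    return unanswered / len(responses) <= MAX_UNANSWERED_SHARE


def valid_rate(questionnaires: Sequence[Sequence[Optional[str]]]) -> float:
    """Percentage of returned questionnaires that pass the screening."""
    valid = sum(1 for q in questionnaires if is_valid(q))
    return 100.0 * valid / len(questionnaires)


# Hypothetical returns for a 10-item questionnaire.
sample = [
    ["a"] * 10,               # complete
    ["a"] * 8 + [None] * 2,   # exactly 20% unanswered: still valid
    ["a"] * 7 + [None] * 3,   # 30% unanswered: invalid
    [None] * 10,              # blank: invalid
]
print(valid_rate(sample))
```

Under this reading of the rule, a questionnaire with exactly 20% of its questions unanswered is kept, since only those with more than 20% unanswered are marked invalid.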
Three criteria were followed while sorting the questionnaires (refer to Sect.
4.3.1.2), and the researcher did all the sorting herself, in the belief that the
sorting criteria would be applied more consistently by a single person. This
may also contribute to the reliability of the questionnaire surveys.
The questionnaire data were entered into the computer by the present researcher with
the help of six college students and two graduate students. Answers to multiple-choice
and five-point Likert scale questions were processed with SPSS 17.0, while answers
to short-answer questions were stored in Microsoft Word files. The SPSS data files
and the Word files were created by the researcher, and the eight research assistants
were asked to enter data in pairs, one reading the answers aloud and the other keying
them in, so that their work was double-checked, which helped ensure the reliability
of the data.
Moreover, a reliability test was performed on the questionnaires to check their
internal consistency, and factor analysis was conducted to examine their validity
(because of its limited sample, the expert’s questionnaire did not undergo these
tests). The test results indicate that the reliability and validity of the
questionnaire surveys were satisfactory (refer to Sect. 4.3.1.3 for the test results).
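The text does not name the statistic used in the reliability test. A common choice for the internal consistency of Likert-scale items is Cronbach's alpha, and the sketch below assumes that statistic purely for illustration. The function name and the scores are hypothetical, and the study itself ran its tests in SPSS 17.0 rather than in code like this.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    Assumed statistic: the source only says "a reliability test".
    """
    k = len(scores[0])  # number of items
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item (column) across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical five-point Likert answers: 4 respondents, 3 items.
likert = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [3, 3, 3],
]
print(round(cronbach_alpha(likert), 3))
```

Values closer to 1 indicate that the items vary together across respondents, which is the sense of internal consistency referred to in the text.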
According to Patton (1990), one major threat to the reliability and validity of
observation data is the presence of a “stranger” in the classroom, that is, the effect
of the observer on what is observed. To reduce this negative effect, the researcher
decided to record the classes for later analysis instead of sitting in the classrooms
to observe. Each teacher observed was provided with a digital voice recorder and was
free to decide which classes to record and how many. For TEM preparation classes, it
was suggested that they record classes on different themes, such as writing,
translation and reading, so that the classes observed would cover a wide range. For
non-TEM preparation classes, they were requested to record several consecutive classes,
because these classes would usually cover one unit and thus a variety of teaching
content, such as vocabulary building, text analysis and exercise explanation. When the
teachers felt ready to have their classes recorded, they could simply switch on the
digital voice recorder on their desk, which would not interfere with their classroom
teaching. In this sense, the validity of the observation data collected can be ensured.
Another threat to reliability is inconsistent coding, which has been tackled in the
following three ways.
First, a four-step coding procedure was developed to standardize the coding
process, so that no important information would be missed (refer to Sect. 4.3.2.5).
Second, a PhD candidate majoring in English pedagogy and testing was invited
to recode five classes (about 10% of the total), which were randomly selected from
the 52 classes observed. A correlation test showed that the reliability was 0.875,
and further analysis of the data indicated that the major differences arose in three
aspects: (1) the distinction between teacher-fronted interaction and teacher-student
interaction; (2) the definition of genuine information; and (3) the length of
questions and responses (Table 4.7).
After extensive discussion and consultation, these three terms were further
defined. Teacher-fronted interaction would include only the pattern in which the
teacher asks and the whole class responds. Genuine information refers to any
information that is unknown to the person who asks the question, so “How do you
understand …?” asked by the teacher would be considered a genuine question, while
“What is the meaning of arbitrary?” would not. As for the length of questions and
responses, it was agreed that on the five-point Likert scale, 1 means a word; 2, a
phrase; 3, a single sentence; 4, 2–3 sentences; and 5, more than 3 sentences.
With the agreed criteria, five more classes were randomly selected for recoding,
and the reliability increased to 0.967, which suggested satisfactory inter-coder
consistency (Table 4.8).
Last, ten classes (about 20% of the total) were selected randomly for recoding by
the researcher after an interval of 1 month, and the intra-coder reliability was
0.987, which indicated satisfactory intra-coder consistency (Table 4.9).
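The inter-coder and intra-coder figures above come from a correlation test whose exact type is not specified in the text. As a rough illustration, the sketch below assumes a Pearson correlation between two sets of codes assigned to the same coded units; the function name and the sample codes are hypothetical and do not reproduce the study's data.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two coders' codes for the same units.

    Assumed statistic: the source only reports "a correlation test".
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("both coders must code the same units (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined when a coder's codes do not vary")
    return cov / (sd_x * sd_y)


# Hypothetical codes for the length of eight responses on the 1-5 scale.
coder_a = [1, 2, 3, 4, 5, 3, 2, 4]
coder_b = [1, 2, 3, 4, 4, 3, 3, 4]
print(round(pearson_r(coder_a, coder_b), 3))
```

The same function could be applied to one coder's first and second coding of the same classes to obtain an intra-coder figure.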
The interviews were conducted either individually or in groups, by telephone or
face to face, depending on the situation. To ensure the validity and reliability of
the interview data, the researcher took the following measures.
First, a list of questions was prepared beforehand, but additional questions might
be asked during the interview based on the participants’ responses. Questions were
repeated and explained if the participants did not understand them.
Second, a digital voice recorder was used to record the post-questionnaire and
post-observation interviews for further analysis.
Last, the participants might be interviewed again if any problems arose during
data collection and analysis.
4.4.3 Triangulation
Various research methods have been employed in the present research, such as
document analysis, questionnaire surveys, classroom observations and interviews.
Therefore, data collected by different means can be compared. The comparisons made
are listed as follows:
1. closed and open-ended questionnaire responses;
2. questionnaire responses and classroom observation findings;
3. questionnaire responses and interview findings;
4. classroom observation and interview findings;
5. current research findings and previous research findings collected through docu-
ment analysis.
4.5 Summary
This chapter began with the washback model employed in the current study, listing
the contributing factors and working mechanism of TEM washback. A multi-phase
and multi-method research framework was developed accordingly, employing various
research methods to investigate the washback effects of TEM on the perceptions
and practice of related experts, program administrators, teachers and students.
Then, a detailed description was provided of the research participants and instru-
ments—how the participants were selected, and how the research instruments, such
as questionnaires and classroom observation scheme, were developed, piloted,
revised and finalized.
Next, data collection and analysis procedures were explained, with the preliminary
analysis results listed in tables, such as the reliability test and factor analysis
results. The research data will be further analyzed in Chaps. 5 and 6.
Last, a detailed account was provided of the measures taken to ensure the validity
and reliability of the research. Washback is such a complicated phenomenon that it
is essential to bear validity and reliability in mind throughout. The research
instruments were validated, the data collection procedures were standardized and the
research findings from various sources were triangulated, so the research results
are expected to be valid and reliable.
Chapter 5
Stakeholders’ Perceptions of TEM and Its
Washback
This chapter presents research findings from the questionnaire surveys. The
responses of experts, program administrators, teachers and students will be
analyzed one by one, to identify their respective evaluations of TEM and its washback.
The discussion proceeds in the order of experts, program administrators,
teachers and students, following the three levels of the educational
hierarchy (Fig. 1.1). The experts are at the top level, responsible for the
development of the Teaching Syllabus and for test construction, and thus more familiar
with the intended washback of TEM; the program administrators are in the middle,
developing their school syllabuses according to both the Teaching Syllabus and the
situation of their schools, so they are familiar with the washback of TEM at the
school level; teachers and students are the practitioners of the Teaching Syllabus
and the TEM Test Syllabuses, so they are familiar with the washback of TEM on
classroom practice. Accordingly, the opinions of these four groups are analyzed in
turn and then compared to identify areas of agreement and divergence.
Five experts participated in the questionnaire survey and gave detailed answers to
the open-ended questions. Because of the limited number of participants, the findings
are presented mainly as verbal accounts rather than statistical illustrations.
The five experts surveyed are all university professors in English with more than
20 years of teaching experience. Two are from comprehensive universities, two
from foreign language universities and the fifth from a university of science and
technology.
Among the five experts, three have participated in the development of both the
Teaching Syllabus and the TEM Test Syllabuses, and four have participated in the
TEM test construction.
As asserted by the experts, the TEM Test Syllabuses are based on the Teaching
Syllabus and the TEM tests are constructed according to the TEM Test Syllabuses,
so TEM can measure whether students have met the required English proficiency
specified in the Teaching Syllabus.
Generally speaking, TEM was positively evaluated by all the experts, who approved
of, or generally approved of the design of TEM, including its content, item types
employed, time allotment, score allotment and marking criteria. Moreover, some
suggestions have been made on the improvement of TEM, which are summarized as
follows.
First, the test content of TEM, especially that of TEM8, should include more
“professional knowledge for English majors”, such as linguistics, literature and
the culture of English-speaking countries. This may help teachers and students develop
a keener awareness of the difference between English majors and non-English
majors.
Second, the item types employed in TEM are mainly traditional, such as multiple
choice, translation and writing, so they could be modified to include
more integrative items. Besides, the test content and test methods of TEM should be
more closely integrated, with appropriate test methods selected for
different test content.
Last, the marking criteria of TEM are well developed and easy to follow, but
more importance should be attached to its guiding role in language teaching. For
instance, the marking criteria of writing should be able to guide the teaching of writ-
ing for English majors.
5.1 Experts’ Perceptions 109
Validity involves the quality of test use and interpretation, or the extent to which the
inferences or decisions made on the basis of test scores are meaningful, appropriate
and useful (AERA, APA & NCME 1985: 9). Therefore, it is essential to investigate
the use and interpretation of TEM scores.
According to the TEM8 Test Syllabus, the intended use and interpretation of the
test scores are:
1. to check and promote the implementation of the Teaching Syllabus;
2. to measure students’ language proficiency and to evaluate the overall language
teaching standard of a university;
3. to provide feedback to language teaching and learning.
However, due to the increasing social recognition of TEM, the test results have
been used to perform other functions, such as conferring bachelor’s degrees, ranking
schools and evaluating teachers.
According to the experts, such unintended uses of TEM scores should not be
encouraged because they will lead to some negative washback. First, linking TEM
certificates to the awarding of bachelor’s degrees may encourage test-oriented
learning, and some test takers may even take the risk of cheating because the
certificate means a lot to them. Next, teaching effects may depend on many factors,
so it is unjustifiable to use TEM results as the sole criterion to evaluate teaching.
Last, ranking schools on the basis of TEM results may lead to undesirable competition
between schools, which is harmful to their healthy development.
All the experts agreed that TEM has exerted positive washback on teaching and learn-
ing. TEM can check and promote the implementation of the Teaching Syllabus; TEM
provides feedback information to teachers and students, who can adjust their teaching
and learning accordingly; and TEM may help improve teaching and promote reform.
If TEM is to better play its positive role in teaching and learning, the test reports
should be improved, as most experts pointed out. They suggested that more infor-
mation should be provided in the test reports, to include not only a rank, such as
Excellent, Good, Pass and Fail, but also the total score and scores for each section.
If possible, verbal description of the language ability for each score band should be
provided, so that teachers and candidates can better interpret the test results.
Moreover, the number of students in each score band can be provided to give stu-
dents a rough idea of how they rank in the test population. In a word, the purpose of
providing a test report is to facilitate teaching and learning, so the teachers’ and
candidates’ needs should be considered.
However, it is also likely that TEM may have some negative effects, such as
interfering with the teaching plan and promoting training in test-taking techniques.
Nevertheless, negative effects can be avoided if the TEM test and its scores are
properly used.
Program administrators are the stakeholders at the school level, who set the
teaching aims and design the curriculum in their respective schools, based on the
Teaching Syllabus and other related factors, such as the language proficiency of the
students, the available teaching staff and resources, and perhaps the TEM
Syllabuses. Therefore, it is important to survey their opinions on the validity of
TEM and on TEM washback, to find out the effects of TEM on the English curriculum
in different schools.
Program administrators are often responsible for designing school curricula and
setting teaching objectives for English majors in their schools, so they should be
more concerned about the Teaching Syllabus and TEM. As a result, they are
expected to make sensible evaluation of TEM.
5.2 Program Administrators’ Perceptions 111
Before investigating their evaluation of TEM, the current research first surveyed the
program administrators on their familiarity with TEM, such as the Test Syllabus,
marking criteria, and so on.
Most program administrators surveyed were familiar with TEM, as indicated in
Table 5.2. Over 90% of the participants were familiar with the Teaching Syllabus
and the TEM Test Syllabus. However, the marking criteria were less known, famil-
iar to around 60% of the participants, and the Oral Test Syllabus was the least
known, familiar to only around 50% of the participants. Comparatively, TEM4
seems to enjoy a higher familiarity rate, especially in terms of the Oral Test Syllabus
and the marking criteria.
Compared with TEM, which reaches an annual test population of over 200,000, the
number of Oral test takers is much smaller, about 10,000–20,000 every year,
which means that many universities do not have students taking the test. It naturally
follows that only half of the program administrators surveyed were familiar
with the Oral Test Syllabus. However, the finding deserves close attention. As
TEM is supposed to assess the language proficiency specified in the Teaching
Syllabus, and speaking is an indispensable part of it, the relatively low familiarity
with the TEM-Oral test is not conducive to promoting positive washback. Therefore, it
is necessary to popularize TEM-Oral and make it better known to teachers and students,
so as to promote a balanced development of students’ language proficiency.
Marking criteria usually embody a test’s purpose and focus, and tests, especially
criterion-referenced tests like TEM, can hardly achieve expected goals if their mark-
ing criteria are unfamiliar to teachers and students. For instance, a scoring scheme
combining features of holistic and analytic marking is employed to grade TEM
writing scripts. The scheme focuses on language use as well as ideas and arguments.
Familiarity with the scoring scheme can help teachers set course objectives and plan
their lessons in line with the writing requirements in the Teaching Syllabus, because
the scoring scheme incorporates these writing requirements. In this respect, regular
communication between test constructors and test users, teachers in particular, is of
much importance. Various ways of communication can be employed, such as pro-
viding detailed marking criteria with samples, offering video training sessions, so as
to foster or intensify positive washback on teaching and learning.
As is shown in Table 5.3, the overall satisfaction rate is fairly high, with five
items above 90%. Among these five items, four are concerned with the test itself
(alignment with the Teaching Syllabus, test content, item type and score allotment)
and one with test administration (strictness of administration). Whether
these five items refer to internal or external factors, such a high satisfaction rate
seems to suggest that TEM, as a criterion-referenced test, has satisfactory validity.
Feedback from the questionnaire not only informs test developers of the
achievements of the test, but also helps them identify possible areas for revision.
As is mentioned above, convenience of test administration for TEM8 received a
comparatively low satisfaction rate. From participants’ responses to the
short-answer question, it becomes clear that the problem lies with the first task in
the TEM8 listening section, which combines note taking with gap filling. The
invigilators’ manual stipulates a series of steps to be followed in test
administration, which some program administrators said were “quite complicated”. As
TEM8 is a large-scale high-stakes test, a minor mistake might cause the listening
section to be improperly delivered, which could then result in unfairness. Thus, a
review of the test invigilation procedures is essential, with a view to improving
practicality and thus maintaining validity.
Moreover, the satisfaction rate for the test report, for both TEM4 and TEM8, is
comparatively low, at around 80%. To make better use of the feedback information from
TEM, a detailed test report is essential. At present, the TEM test report is just a
certificate that classifies test takers as excellent, good or pass (those who fail the
test get no certificate). Test takers can check their overall scores at their
schools, but the scores for each section are not provided. In this sense, the
information provided in the TEM test report is quite limited.
As shown in Table 5.4, most program administrators considered it necessary to
report not only a composite score, but also section scores and ability descriptions.
Moreover, information on students’ ranking in the test population was considered
important by around half of the program administrators, and some of them wanted
other information, such as the ranking of schools and the strengths and weaknesses
reflected in the tests. Therefore, it is desirable to improve the TEM test report
so that it provides more information and TEM can play a better role in teaching and
learning. As one expert pointed out, when drafting a test report, importance
should be attached to “the relation between scores and teaching, the feedback a test
report can provide for teaching and learning”.
As indicated in Table 5.5, around 80% of the program administrators believed that
TEM scores can reflect and promote syllabus implementation at the university level,
and that they can measure students’ language proficiency. Meanwhile, TEM scores can
provide feedback to course design and classroom teaching, thus promoting teaching
reform. As to whether TEM scores can reflect teaching quality and students’ future
performance, fewer program administrators held a positive view. Therefore, it was
doubtful whether TEM results could be used as a criterion for teacher evaluation
and job application.
Table 5.6 provides information on the impact of TEM scores. As can be seen in
the table, TEM scores seem to affect students’ future career and education much
more than rankings of various kinds.
It can be concluded from the above data that TEM has, to some extent, achieved
its intended purpose as specified in the Test Syllabus. In this sense, TEM scores
are used and interpreted in the intended way. Of all the items in Table 5.5,
“predicting students’ future performance” received the lowest percentages (63.5% and
63.6%). This attests, from a different perspective, to the intended use of TEM,
which, as a criterion-referenced test, is designed to evaluate learning outcomes
against predetermined criteria rather than to predict what students can do in the
future. However, as Table 5.6 indicates, TEM scores have a much greater impact on
students’ future career and future education, especially the former, than on
rankings. This has aroused concern among TEM test developers, for TEM is not designed
to predict students’ performance in their future career. Using TEM scores to make
employment decisions may cause problems.
Besides their unintended use in job recruitment, TEM scores were believed to
affect school ranking and teacher evaluation (Table 5.6). However, were these
practices common in the participating universities? According to Table 5.7, the most
common practice was to relate TEM4 scores to the conferring of bachelor’s
degrees, which was selected by one third of the program administrators. For
the other uses, only about 10% of the program administrators selected “yes”,
claiming that TEM scores were used to rank schools, evaluate teachers or confer degrees
in their schools, which indicates that these are not common practices. However,
program administrators were almost equally divided in their attitudes towards the
four practices, as is shown in Table 5.8.
Two points emerge from Tables 5.7 and 5.8: (1) unintended use of TEM scores in
participating universities was not common, which is conducive to maintaining and
promoting positive washback; (2) more than half of the program administrators
surveyed were not against using TEM scores, especially TEM4 scores, in unintended
ways, which poses a potential risk of test score misuse.
Therefore, test developers are expected to “anticipate the common uses of certain
kinds of tests, and the potential consequences of such use, even if these uses are not
explicitly advocated” (Kane 2001: 338). It is advisable that TEM test developers
identify possible uses and interpretations of TEM scores and provide guidelines on
how decisions based on scores are to be made.
It is generally assumed that tests may exert influence on teaching and learning, and
that the higher the stakes, the more intense the washback. Existing research on
washback indicates that washback effects do exist, and that they may take different
forms in different settings or among different stakeholders. Moreover, a test is more
likely to influence teaching content than teaching method (Cheng 2005; Gu 2007;
Wall 1996).
Table 5.9 provides program administrators’ perception of TEM washback.
Generally speaking, their views on washback were positive, though their opinions
varied on some items.
From Table 5.9, it can be inferred that most program administrators held a
positive attitude towards TEM, especially its overall washback on program design and
(classroom) teaching. With regard to other items, TEM was perceived to have
exerted more positive influence on setting teaching aims (85.2% and 83.5%) than
on scheduling courses (75.0% and 71.4%); and teaching method (27.6% and 31.0%)
seemed to be influenced less than teaching content (18.9% and 22.2%), which is
in line with previous research findings (Cheng 2005; Gu 2007; Wall 2005). In
addition, TEM also seemed to have had a positive impact on teaching ideology and
attitude. On the whole, the statistics show that TEM has had positive effects
on teaching, but the intensity of its effects varied across different aspects of the
teaching context.
A further investigation was made to find out which aspect of TEM exerted more
washback effects (Table 5.10).
According to most program administrators, various aspects of TEM exerted posi-
tive washback on course design, and the influence of test content and method sur-
passed that of the other aspects. However, marking criteria, an aspect that should
play a guiding role, were not considered important, with around 30% of the program
administrators selecting “no washback”. A possible reason is that some program
administrators were not familiar with the marking criteria (see Table 5.2).
Table 5.9 also indicates that around 75% of the program administrators held a posi-
tive view on TEM8 washback on course design, which could further be manifested
in two aspects: introduction of TEM related courses and TEM preparation courses,
as the following tables show.
TEM4 consists of six parts—dictation, listening, grammar and vocabulary, cloze,
reading and writing, and TEM8 also consists of six parts—listening, reading, gen-
eral knowledge, proofreading, translation and writing. According to the Teaching
Syllabus, the recommended courses for students in the fundamental and advanced
stage are:
1. Fundamental stage: fundamental English, phonetics, listening, speaking, read-
ing, writing, grammar, and culture of English speaking countries;
2. Advanced stage: advanced English, writing, translation, oral interpretation, aca-
demic writing, linguistics and literature.
As is indicated in Table 5.11, TEM related courses seem to be introduced more
often than non-related courses, especially in the case of TEM8, where non-related
courses such as academic writing and interpretation were offered less often, which
may be viewed as a sign of TEM washback on teaching as well. Moreover, in the
fundamental stage, skill training courses (listening, reading, speaking and writing)
were introduced more often than courses on language knowledge, such as grammar and
phonetics, which may indicate that the focus of college English teaching is language
use rather than language knowledge.
TEM underwent some substantial revisions in 2005. In TEM4, the requirement
on listening, writing and vocabulary was raised, and the item of fast reading was
deleted; while in TEM8, the requirement on listening, reading and writing was
raised and a new component, General Knowledge, was introduced to assess stu-
dents’ knowledge of linguistics, literature and culture. The current research thus
makes an initial investigation into the influence of TEM revisions on course design,
as shown in Table 5.12.
As for the TEM4 revision, it seems that the raised requirements exerted more
influence on course design than the removal of a particular test component. A possible
reason is that fast reading is only one kind of reading skill, so the deletion of fast
reading would not influence the introduction of the reading course. As for the TEM8
revision, the newly introduced component, General Knowledge, was expected to
exert more influence on course design. However, the research findings indicate that
the listening and writing revisions seemed to have affected course design to a greater
extent. This phenomenon was explained by some program administrators in their
responses to the short-answer question. Special courses on listening and writing had
been introduced in some institutions for juniors and seniors because these skills
were generally considered more difficult for students. For example, many institutions
offered advanced listening courses, even though listening is not a compulsory subject
for
the advanced stage in the Teaching Syllabus. Thus, higher test requirements
Table 5.11 Courses introduced by TEM participating universities
(TEM4 columns: Fundamental English, Phonetics, Listening, Speaking, Reading, Writing, Grammar)
5.2 Program Administrators’ Perceptions
As proposed by many researchers, the washback effects of a test may vary from
participant to participant, due to their different characteristics, such as educational
background and work experience (Alderson and Wall 1993; Green 2007). In this
sense, program administrators’ perception of TEM washback is further analyzed to
identify individual differences. Since program administrators are often responsible
for course planning, the washback effects on course design were examined in more detail.
The washback effects perceived by TEM4 program administrators varied significantly:
the overall washback on course design differed among
program administrators with different educational backgrounds and teaching
experience.
TEM4 70.9% 80.8% 58.0% 61.9% 36.7% 42.8% 28.9% 22.4% 20.8% 22.0%
TEM8 64.3% 61.0% 69.0% 40.1% 20.8% 9.6% 4.3% 22.4% 8.3% 2.5%
Stakeholders’ Perceptions of TEM and Its Washback
5.3 Teachers’ Perceptions 123
The above discussion shows that TEM has more positive washback effects on
English language teaching than negative effects, and it seems to have affected those
courses recommended in the Teaching Syllabus that are more related to test content
domain. Moreover, TEM has had some impact on the continued delivery of skill
training courses in some universities, even though the skill/competency is not a
compulsory subject for the advanced stage. However, is TEM a decisive factor com-
pared with other factors involved in language teaching?
From Table 5.15, it can be noted that program administrators consider student
quality, teacher quality, student effort and leader attention the four most important
factors. In contrast, the influence of TEM was limited. Comparatively, TEM4 played
a more important role in college English teaching, acknowledged by 22.0% of the
program administrators, while TEM8 was considered important by only 2.5% of the
program administrators surveyed. In other words, although TEM has exerted some
washback on English language teaching, its effect is limited, as compared with
other factors. This finding is desirable because the relationship between teaching
and testing should be “that of partnership”—a good test should be “supportive of
good teaching” and “exert a corrective influence on bad teaching” (Hughes 2003:
2). In this view, the role TEM plays in English teaching should be facilitative rather
than decisive, so it is natural that TEM was not considered an important factor by
most program administrators. This is also in line with the opinion of the experts,
who believed that the top three factors affecting English teaching and learning
effects should be curriculum design, teacher quality and students' efforts; none of
them selected TEM as an important factor.
Teachers are the practitioners of the Teaching Syllabus and their school curricula.
Meanwhile, their teaching activities are influenced by TEM. Therefore, question-
naire surveys were conducted among TEM4 and TEM8 teachers, to investigate their
evaluation of TEM, perception of TEM washback, classroom teaching practice, and
perception of the factors influencing teaching for English majors.
From Table 5.16, it can be concluded that the teachers surveyed constitute a rep-
resentative sample, covering teachers from various types of universities, with differ-
ent educational background and teaching experience. The academic title of the
teachers ranged from teaching assistant to professor, but with lecturers being the
majority, and the academic degree of the teachers ranged from Bachelor to Doctor,
with Master being the most common. There are new teachers with no more than
5 years of teaching experience, and also experienced teachers with more than
15 years of experience. In terms of academic title and degree, TEM8 teachers
seemed to rank slightly higher than TEM4 teachers, and they also had longer
teaching experience. Female teachers far outnumber male
teachers, both for TEM4 and TEM8.
The wide coverage of the sample ensures the representativeness of the research
findings, and meanwhile, a comparison can be made between teachers with differ-
ent characteristics.
Teachers are the practitioners of the Teaching Syllabus and their school curricula,
and TEM is one of the methods used to measure their teaching effects. Therefore,
they may have developed their unique perceptions of TEM and its washback. In this
sense, teachers were surveyed on their familiarity with TEM, evaluation of TEM,
and their use of TEM results.
Before investigating their evaluation of TEM, the current research first surveyed the
teachers on their familiarity with TEM, such as the Teaching Syllabus, the Test
Syllabus, and so on. Whether teachers are familiar with TEM, to some extent, is
determined by whether they are teaching TEM related courses, whether they are
preparing students for TEM, and whether they are concerned about the test.
The majority of the teachers surveyed were familiar with the Teaching Syllabus,
the TEM Test Syllabus and TEM time allotment, as indicated in Table 5.17. Around
80% of the participants claimed to be familiar with the Teaching Syllabus and the
TEM Test Syllabus. However, the marking criteria were less well known, familiar to
63.5% of the TEM4 teachers and 49.3% of the TEM8 teachers, and the Oral Test
Syllabus was the least known, familiar to fewer than 40% of the participants (39.5%
and 33.6%). Both TEM4 and TEM8 teachers were familiar with the Teaching
Syllabus, but TEM4 seemed to enjoy a higher familiarity rate, especially in terms of
the Test Syllabus and the marking criteria.
The teachers' questionnaire findings coincide with those of the program administrators'
questionnaires, and the possible causes are the same: the small test population
of TEM Oral Tests and the lack of communication between test developers and test
users.
Since most teachers were familiar with TEM, they are likely to make sensible
judgments on the quality of TEM; hence, another area of inquiry in the questionnaire
is teachers’ evaluation of TEM, including alignment with the Teaching Syllabus,
test content, test method and test administration. As is shown in Table 5.18, TEM
was positively evaluated by most teachers, with a satisfaction rate of over 75% for
all the items. Among the 10 items in the questionnaire, test content received the
highest satisfaction rate whereas test report had the lowest one. Moreover, of all the
items, marking criteria received the highest rate of “Not clear”, which is consistent
with the previous finding of its unfamiliarity to teachers. Comparatively, TEM4 had
a higher satisfaction rate than TEM8, except for the item of “test report”.
As shown in Table 5.18, the items of test report and convenient administration for
TEM8 received the lowest satisfaction rate, which complies with the findings of the
program administrators’ questionnaires. Possible reasons are analyzed in Sect.
5.2.2.2.
Moreover, teachers’ opinions were sought as to what should be included in the
test report.
As shown in Table 5.19, what teachers wanted most is the section score, which
can reflect students’ strengths and weaknesses, such as whether they are better at
reading or listening. In this sense, learning activities can be organized to cater to the
needs of the students.
Test characteristics include not only test content and method, but also the use and
interpretation of test results. In this view, the current research attempted to investi-
gate teachers’ perception of how TEM scores should be used, as well as how the
scores were actually used in their respective schools. As shown in Table 5.20, more
teachers agreed on the use of TEM scores to assess students' learning outcomes (82.7%
and 79.0%) and provide feedback to teaching (80.0% and 75.3%) than to predict
students' future performance (57.7% and 58.1%). This suggests that TEM has been
used in the intended way: to measure students' language proficiency and to provide
feedback information to facilitate English teaching. As to whether TEM results
can be used to "predict students' future performance", for example in job applications,
most teachers were doubtful, as were the program administrators.
Table 5.21 provides information on the impact of TEM scores. As can be seen in
the table, TEM scores seem to affect students’ future career and further education
much more than teacher evaluation and student ranking.
Further investigation was made into the actual practice in different universities—
how TEM scores were used and the teachers’ attitudes towards such uses.
From Tables 5.22 and 5.23, it can be noted that TEM scores were seldom used to
evaluate teachers, but one third of the schools linked TEM4 scores to the conferral of the
bachelor's degree. However, teachers' attitudes towards these uses deserve our attention:
more teachers agreed to these uses than not.
Therefore, it is advisable that TEM test developers identify possible uses and
interpretations of TEM scores and provide guidelines on how decisions about scores
are to be made.
From Table 5.25, it can be inferred that various aspects of TEM exerted more
positive than negative effects on teaching. Test content and method seem to have
exerted more influence than score allotment, time allotment and marking criteria,
which is similar to the perception of program administrators. However, it deserves
our attention that the marking criteria failed to achieve the expected effects. Marking
criteria should have played a guiding role in English teaching, but they failed to do so
since some teachers were not familiar with them (see Table 5.17).
According to Alderson and Wall (1993), a test will influence teaching, such as its
content and method, rate and sequence, degree and depth, as well as teachers’ atti-
tude towards teaching. Therefore, in the present study, teachers were surveyed on
the washback of TEM on their classroom teaching practice.
Furthermore, questions were asked to find out how teachers taught in TEM preparation
classes and what major activities were conducted in them.
Activities conducted in TEM preparation classes were varied, because in most
schools TEM preparation was carried out in fundamental English or advanced
English classes, rather than in a separate course. From Table 5.27, it can be inferred
that TEM preparation, such as having students work on the test paper and explaining
the exercises in it, was often conducted in classes, especially for TEM4.
However, textbooks were still used and skill training activities were often carried
out. It seems that TEM8 preparation was much less intense than TEM4 preparation,
with only one third of the teachers claiming that test preparation activities were
frequently conducted in class.
From Table 5.28, it can be inferred that TEM would have some washback on the
teaching content. More time would be devoted to working on the test paper and
explaining it. Also, TEM related skills would be practiced more, such as listening,
reading and writing, while non-related skills would be practiced less, such as translation
for TEM4 and speaking. It seems that most activities had been intensified in
TEM preparation classes, which, according to findings from the interviews, may
be attributable to the tight schedule, for teachers had to finish the textbook and
prepare students for TEM at the same time. As a result, more activities would be
conducted in the classes.
TEM underwent some substantial revisions in 2005. Since many teachers have more
than 10 years of teaching experience, several questions were designed to find out
how TEM revision has affected them in classroom teaching. (See Sect. 5.2.4 for the
content of revision).
Table 5.29 suggests that, for the TEM4 revision, the raised
requirements (listening, writing and vocabulary) exerted more influence on course
design than the deleted item (fast reading), which matches the program administrators'
opinion.
For TEM8, the newly added item, General Knowledge, seems to have
exerted more influence on English teaching. Moreover, the listening and writing revisions
seemed to influence English teaching more than the revision in reading.
It is natural that General Knowledge, as a new item, affected teachers more, because
they had to familiarize themselves with both its content and method so that they could
better prepare students for the test. According to the findings from the interviews
and the open questions in the questionnaires, listening and writing
exerted more influence because these two parts were very difficult, so teachers were
more concerned about their revision.
The current research covers teachers from various universities with different backgrounds,
so an investigation was made to find out whether their perceptions of TEM
washback differ (Table 5.30).
Several Samples Tests were run, and the differences among teachers with different
educational and teaching backgrounds were found to be significant, while there
seemed to be no significant difference among teachers from various universities. As
shown in Table 5.30, those affected less by TEM (selecting “no washback”) were
teachers with higher titles, longer teaching experience, or higher degrees. Moreover,
they seemed to take a more critical attitude towards TEM (selecting “negative”).
Possible reasons are: (1) better-educated and more experienced teachers tend to be
more confident about their teaching, so it is natural that they would be affected less
by TEM; (2) the longer one teaches, the more problems one may identify in the
educational system, including TEM, so such teachers are more likely to hold a negative attitude
towards TEM; and (3) more teachers with bachelor's degrees were
found to be negatively influenced by TEM8, which is probably due to the high difficulty
level of the test.
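The group comparison described above can be sketched in code. The text does not say which statistic lay behind the "Several Samples Tests"; the sketch below assumes the Kruskal-Wallis H test, a common choice for comparing several independent groups on ordinal questionnaire data. The ratings and the names `average_ranks`, `kruskal_wallis_h` and `ratings_by_degree` are invented for illustration and are not data or procedures from the study.

```python
from collections import Counter
from math import exp


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    pooled = [x for group in groups for x in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


# Hypothetical 5-point ratings of perceived washback, grouped by degree.
ratings_by_degree = {
    "Bachelor": [2, 3, 2, 1, 3],
    "Master": [3, 4, 3, 4, 2],
    "Doctor": [4, 5, 4, 3, 5],
}
h = kruskal_wallis_h(list(ratings_by_degree.values()))
# With three groups, H is compared to chi-square with 2 degrees of freedom,
# whose upper tail probability is exactly exp(-H / 2).
p_value = exp(-h / 2)
print(f"H = {h:.2f}, approximate p = {p_value:.3f}")
```

A small p value would indicate that at least one group tends to give different ratings from the others; it would not by itself show which group differs or in which direction.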
Generally speaking, most teachers agreed that TEM had exerted some positive
effects on English teaching. However, is TEM an important factor that may deter-
mine their teaching effects?
From Table 5.31, it can be noted that TEM4 and TEM8 teachers’ responses were
quite similar, considering student quality, student effort and teacher income the
three most important factors. In contrast, the influence of TEM is limited, consid-
ered important by no more than 10% of the teachers (8.1% and 6.3%). In this view,
teachers’ view coincided with that of the program administrators, and it is the desir-
able role of TEM, to be facilitative rather than decisive in college English
teaching.
As a result, TEM was not considered a major source of pressure by most teach-
ers, as indicated in Table 5.32.
As indicated in Table 5.32, teachers’ pressure mainly came from their sense of
responsibility and their work load, while TEM preparation and being evaluated
based on TEM results would not lead to pressure for most teachers.
TEM4 candidates were mainly second-year English majors and TEM8 candidates
were English majors in their fourth year. However, there were also test takers major-
ing in other fields, such as law, economics, and so on, but they often had English as
their minor. Moreover, there were candidates majoring in other foreign languages,
such as Japanese, French, and so on. The candidates were from universities of all
types, such as comprehensive universities, universities of science and technology,
language universities, and so on (Table 5.33).
Based on the figures listed in Table 5.33, the following findings can be
summarized.
First, the number of female test takers far surpasses that of male test takers,
which might be attributed to the fact that more female than male students
major in English.
Second, TEM4 is mainly taken by second-year students, but quite a few third-year
students take the test, because those who fail TEM4 in the
second year can take it again in their third year. Likewise, those who
fail TEM8 in the fourth year can take it again the next year, perhaps
after their graduation. However, the number of TEM8 test takers in their fifth year
is much smaller than that of TEM4 test takers in their third year. This is not because
TEM8 has a much higher pass rate, but because some candidates who fail TEM8 the
first time do not take the test again after their graduation.
5.4 Students’ Perceptions 135
Last, more candidates of other majors take TEM4 than TEM8, which is possibly
due to the fact that TEM8 is not a required test for non-English majors in most
schools.
Test takers’ opinions are seldom sought as to whether a test is valid or not (Cheng
2008, 2011; Cohen 2006; Hamp-Lyons 2000a, b). However, research in language
assessment has demonstrated strong evidence of test validation from multiple stake-
holder perspectives (Cheng 2011; Moss et al. 2006). Therefore, it is essential to
investigate test takers’ perception in the validation of TEM. In this view, test takers’
evaluation of TEM is explored in the questionnaire surveys, and the following part
will report their familiarity with TEM and their evaluation of TEM.
Before investigating test takers’ evaluation of TEM, it is necessary to find out how
familiar they are with TEM, such as TEM Test Syllabus, TEM test administration,
and so on.
As indicated in Table 5.34, TEM candidates were more familiar with the admin-
istration of the test, such as its time allotment and score allotment. Meanwhile, they
were more concerned about the TEM Test Syllabus than the Teaching Syllabus. This
conforms to our expectation because candidates care more about the test content (as
listed in the Test Syllabus), time allotment and score allotment, so that they can bet-
ter perform in the test; while the Teaching Syllabus is none of their concern since it
is meant to guide teaching for English majors.
It should also be noted that the candidates were not so familiar with TEM mark-
ing criteria, especially TEM8 marking criteria. 47.9% of the TEM4 candidates
claimed to be familiar with TEM4 marking criteria and 39.3% of the TEM8 candi-
dates were familiar with TEM8 marking criteria. This might be attributed to the fact
that more subjective items are employed in TEM8, such as translation and writing,
which account for 40% of the total score, while there is no translation in TEM4 and
the writing part only accounts for 25% of the total. As the interview findings confirm,
most candidates surveyed claimed that they were not so concerned about the
marking criteria, especially that of writing and translation, because they either “had
no access to them” or “had difficulty understanding them”. Moreover, candidates
seemed to believe that the marking of writing and translation was “teachers’ busi-
ness”, while they “had no say” in this matter. This may also account for the fact that
translation and writing were less practiced in TEM preparation after class (see
Table 5.45). Candidates may feel that their efforts are in vain because they cannot get prompt
feedback on their performance in translation and writing.
As shown in Table 5.34, the TEM Oral Test Syllabus was the least familiar to
candidates (32.5% for TEM4 candidates and 32.4% for TEM8 candidates). A possible
explanation is that the test-taking population of the TEM oral tests (TEM4-Oral and
TEM8-Oral) is much smaller than that of the TEM written tests (TEM4 and TEM8). No more
than 10% of the written test takers would take the oral tests—this makes speaking
an untested ability in TEM for most English majors. As a result, speaking profi-
ciency might be neglected in teaching and learning.
Candidates’ opinions are sought as to whether TEM can assess their language abili-
ties validly, both general language ability and individual language skills.
Table 5.35 shows that most TEM candidates agreed that the individual sections
in TEM were able to assess their language ability; for example, the listening part can
validly assess their listening ability. However, they seemed to be less satisfied
with the role of TEM in assessing their overall language ability and learning
achievement. There are two possible reasons. First, the overall language profi-
ciency may include more than what is assessed in TEM. For instance, most TEM
candidates would not take TEM oral test, so oral proficiency, an indispensable
part of overall language ability, is not assessed in TEM. Second, learning achieve-
ment in college involves much more than language ability, so it can hardly be
assessed by a single test.
As to whether TEM can predict future performance, opinions of candidates vary.
Around one third of the candidates believed that it could predict their future perfor-
mance, one third were not sure, and the rest believed that it could not serve such a
function. As stated in the test syllabus, the purpose of TEM is to measure whether
candidates have met the required language proficiency specified in the Teaching
Syllabus. Therefore, TEM is not designed to predict candidates’ future performance,
such as whether they are competent for a certain job or whether they are qualified
for a bachelor’s degree. In this question, more TEM8 candidates (38.1%) agreed on
the predictive role of TEM than TEM4 candidates (34.6%), which might be attrib-
uted to the fact that TEM8 is taken at the end of the fourth year when candidates are
about to graduate, so TEM8 results are more likely to be used to predict candidates'
performance in their future jobs.
Whether a test can achieve its intended washback depends, to a large degree, on
the feedback information it provides, and the test report is a basic means of providing
information. Therefore, candidates’ opinions are sought as to what information
should be provided in the TEM test report.
From Table 5.36, it can be inferred that TEM candidates wanted more informa-
tion than what is provided in the present TEM test report. The scores of each section
are considered necessary by most candidates, even more important than the com-
posite score. One possible reason is that candidates would be able to identify their
strengths and weaknesses based on the section scores. Moreover, some candidates
believed it necessary to describe the language proficiency for each score band, so
that they would know what their marks truly mean. Only one fourth of the test takers
believed it necessary to provide their rank in the test population, which may suggest that
most candidates do not welcome the competition entailed in TEM.
Whether test scores are used in the intended way may, to some extent, affect the
validity of a test. However, in test use and interpretation, test takers often play a
passive role, powerless as to how test scores should be interpreted and used.
Nevertheless, it is still important to investigate how they believe the test scores are
used and how they are affected by it, so that a comparison can be made between the
beliefs of various stakeholders (Table 5.37).
According to the candidates, TEM4 scores (59.5%) were more often related to
conferring bachelor's degrees than TEM8 scores (16.0%). However, TEM4
and TEM8 candidates held similar attitudes towards relating TEM scores to conferring
bachelor's degrees, with about half agreeing and half disagreeing.
Besides conferring bachelor’s degrees, TEM results may also affect candidates’
school rank, future employment and further education. The candidates’ responses
suggest that TEM results have affected their future employment more than school
rank, especially for TEM8 candidates who were to graduate soon.
According to Alderson and Wall (1993), a test will affect what learners learn and
how learners learn, such as the learning content, method, rate and sequence, and
degree and depth. Therefore, the candidates were first surveyed on whether they
were positively or negatively affected by TEM.
As indicated in Table 5.38, most TEM candidates believed that TEM has exerted
more positive washback on their English learning, especially the overall washback
of TEM. Moreover, it seems that TEM has helped candidates set their learning aims,
so most of them were quite positive about the washback of TEM on their learning
aims. Besides, TEM seems to affect the learning content more than the learning
method, with more candidates selecting neutral to the question of the washback of
TEM on the learning method. What is least affected by TEM is course selection,
with around 30% of the candidates choosing “Neutral”, which is possibly due to the
fact that most TEM related courses are compulsory courses, such as fundamental
English and advanced English.
A further investigation is made on the candidates’ attitudes towards the possible
positive and negative washback of TEM—whether they agreed that TEM would
generate washback effects, and the findings are listed below.
From Table 5.39, it can be concluded that a lot more candidates agreed on the
positive washback than the negative washback. More than 70% of the candidates
agreed that TEM would provide them with learning aims, so that they would spend
more time learning English. Moreover, preparing for TEM could help them clarify
the learning requirements and identify their weaknesses, so that they could better
arrange their learning tasks, allocating time for each skill training. Besides, most
candidates believed that their vocabulary had been enlarged and their learning effi-
ciency improved. As a result, 78.9% of the TEM4 candidates and 72.7% of the
TEM8 candidates agreed that TEM had promoted their language learning.
However, it is also possible that TEM will exert some negative washback on
candidates. For example, preparing for TEM may interfere with their regular learn-
ing practice; they would focus more on test paper work than language skill training;
and they may lower their requirement, satisfied with identifying the correct answers
to the multiple choice questions, instead of seeking the reasons. As indicated in
Table 5.39, such effects did exist, with around 40% of the candidates claiming to be
affected in a negative way.
According to Hughes (1993), a test will influence the participants, the processes and
the products. TEM may first affect the perceptions and attitudes of the students, and
then, their learning activities will differ, so will their learning products. In this sense,
a further investigation was made into students’ learning processes and learning
effects, to find out how their in-class and after-class learning activities are affected
by TEM and what kind of learning effects have been achieved.
TEM preparation courses are provided by some schools and TEM preparation activ-
ities are conducted in some classes, so students’ learning in classes may differ from
that of the non-preparation stage. In this sense, the students were surveyed on their
in-class learning experience, to find out possible differences.
From Table 5.40, it can be noticed that TEM4 preparation was more often intro-
duced than TEM8 preparation, since around half (45.0%) of the TEM8 students
selected “no preparation”, while only 28.4% of the students selected “no” for TEM4
preparation. When TEM preparation courses were offered, most of them would last
no more than 20 h. This roughly coincides with program administrators’ and teach-
ers’ responses.
Furthermore, questions were asked to find out the major activities conducted in
classrooms (Table 5.41).
It is common practice that students would spend some time preparing for the test
before the exam, so an investigation was made into students’ after-class learning
activities before TEM, to seek possible differences between preparation and non-
preparation stages (Table 5.44).
Table 5.44 describes the general situation of TEM preparation. It can be inferred
that most students spent some time on TEM preparation, but the preparation was not intense. The majority
of them would prepare for no more than half a semester and they would spend
no more than 2 h every day on TEM preparation. They would work on the mock test
papers, but no more than 20 copies, and very few of them would take any extracur-
ricular training courses to prepare for TEM.
Next, the students were surveyed on the specific activities that they were engaged
in while preparing for TEM, and on whether differences exist between the TEM preparation
and non-preparation stages.
Table 5.45 Learning activities conducted for TEM preparation after class
                                                                          TEM4                             TEM8
Activity                                                                  Frequent   Sometimes  Seldom     Frequent   Sometimes  Seldom
TEM listening tasks                                                       62.3%      24.6%      13.1%      54.4%      28.9%      16.7%
Listening to the radio                                                    43.3%      28.9%      27.8%      25.8%      29.1%      45.1%
TEM4 dictation/TEM8 mini-lecture                                          61.9%      24.9%      16.0%      41.1%      30.3%      28.6%
TEM reading tasks                                                         52.1%      28.5%      19.5%      42.2%      29.0%      18.9%
Reading English books                                                     23.3%      26.4%      50.3%      24.9%      31.0%      44.0%
TEM4 grammar and vocabulary/TEM8 general knowledge                        54.6%      25.5%      19.9%      53.3%      26.8%      19.9%
Reading grammar books/Reading books on linguistics, literature, culture   31.4%      30.6%      38.0%      35.4%      30.7%      33.9%
TEM4 cloze/TEM8 translation                                               51.6%      29.3%      19.1%      36.5%      33.7%      29.8%
TEM4 note writing/TEM8 proofreading                                       27.7%      29.5%      42.8%      50.8%      27.9%      21.3%
TEM writing tasks                                                         30.8%      30.8%      38.4%      24.3%      29.9%      45.7%
Writing in English                                                        17.3%      20.6%      62.2%      15.6%      22.4%      62.0%
Oral English practice                                                     25.0%      27.5%      47.4%      20.0%      25.5%      54.4%
Reciting new words                                                        52.2%      26.5%      22.3%      34.5%      29.6%      35.8%
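Since each activity in Table 5.45 is rated on three mutually exclusive frequency levels, the three percentages for each test should add up to roughly 100%. The following sketch checks this for the figures as printed. The names `TABLE_5_45` and `inconsistent_rows` and the 0.25-point rounding tolerance are assumptions made for illustration. With the printed values, three cells groups fall outside the tolerance (TEM4 dictation, TEM8 reading tasks, and TEM4 reciting new words), which may reflect typesetting or transcription slips rather than the underlying survey data.

```python
# Each entry: (activity, TEM4 shares, TEM8 shares), where the shares are
# (frequent, sometimes, seldom) percentages as printed in Table 5.45.
TABLE_5_45 = [
    ("TEM listening tasks", (62.3, 24.6, 13.1), (54.4, 28.9, 16.7)),
    ("Listening to the radio", (43.3, 28.9, 27.8), (25.8, 29.1, 45.1)),
    ("TEM4 dictation/TEM8 mini-lecture", (61.9, 24.9, 16.0), (41.1, 30.3, 28.6)),
    ("TEM reading tasks", (52.1, 28.5, 19.5), (42.2, 29.0, 18.9)),
    ("Reading English books", (23.3, 26.4, 50.3), (24.9, 31.0, 44.0)),
    ("TEM4 grammar and vocabulary/TEM8 general knowledge",
     (54.6, 25.5, 19.9), (53.3, 26.8, 19.9)),
    ("Reading grammar books/Reading books on linguistics, literature, culture",
     (31.4, 30.6, 38.0), (35.4, 30.7, 33.9)),
    ("TEM4 cloze/TEM8 translation", (51.6, 29.3, 19.1), (36.5, 33.7, 29.8)),
    ("TEM4 note writing/TEM8 proofreading", (27.7, 29.5, 42.8), (50.8, 27.9, 21.3)),
    ("TEM writing tasks", (30.8, 30.8, 38.4), (24.3, 29.9, 45.7)),
    ("Writing in English", (17.3, 20.6, 62.2), (15.6, 22.4, 62.0)),
    ("Oral English practice", (25.0, 27.5, 47.4), (20.0, 25.5, 54.4)),
    ("Reciting new words", (52.2, 26.5, 22.3), (34.5, 29.6, 35.8)),
]


def inconsistent_rows(table, tolerance=0.25):
    """List (activity, test, total) where the shares do not sum to about 100."""
    flagged = []
    for activity, tem4, tem8 in table:
        for test, shares in (("TEM4", tem4), ("TEM8", tem8)):
            total = sum(shares)
            # Three values rounded to one decimal can drift by about 0.15,
            # so only larger gaps are treated as inconsistencies.
            if abs(total - 100.0) > tolerance:
                flagged.append((activity, test, round(total, 1)))
    return flagged


for activity, test, total in inconsistent_rows(TABLE_5_45):
    print(f"{test} | {activity}: shares sum to {total}%")
```

The flagged figures should be read with caution until they can be checked against the original questionnaire data.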
From the above findings, we may come to the conclusion that TEM would motivate
students to learn—they would spend more time learning and their learning activities
seemed to be intensified, both in and after class. However, would extra efforts lead
to better learning effects?
As shown in Table 5.47, on the whole, TEM has helped candidates improve
their English proficiency. Students' vocabulary seemed to benefit in particular, with
around 75% of the candidates claiming that TEM has helped enlarge their vocabulary.
Moreover, various English skills of the candidates have been strengthened,
especially those measured in TEM, such as listening, reading and writing.
Comparatively, those skills not covered in TEM seemed to improve less, such as
speaking, with only 40% of the candidates claiming that TEM has helped them
improve their speaking proficiency. One possible reason is that only 5–10% of the
TEM written test candidates would have the opportunity to take TEM oral tests.
Another proficiency that has received a low “strengthen” rate is “techniques in
answering multiple choice questions”—only around 50% of the candidates claimed
that TEM has helped improve their ability to figure out the correct answer to mul-
tiple choice questions. This is desirable because the purpose of TEM preparation is
to help candidates improve their language proficiency, rather than test taking
techniques.
According to Alderson and Wall (1993), tests will have washback effects for some
learners, but not for others. In view of this, a further investigation was made into the
washback effects of TEM on various students, to identify possible factors that may
affect the washback of TEM (Table 5.49).
From the above analysis, it can be inferred that TEM has exerted some washback on
the learners, and the positive washback seemed to outweigh the negative washback.
However, is TEM an important factor that may determine their learning effects?
From Table 5.50, it can be concluded that the role TEM played in determining
candidates’ learning effects was very limited. Both TEM4 and TEM8 candidates
believed that their learning effects were affected by their own characteristics and the
teachers’ quality. This is consistent with the view of program administrators and
teachers, which further suggests that TEM plays a facilitative rather than decisive role
in teaching and learning.
As a result, TEM was not considered a major learning purpose by most candi-
dates, as shown in Table 5.51.
As indicated in Table 5.51, candidates’ major aim for learning is to prepare for
their future career and further education. From this point of view, TEM is important
to some extent, because many candidates believed that TEM results would affect
their future career and education (see Table 5.37), which may account for the fact
that around one third of the candidates considered passing TEM one of their aca-
demic aims.
5.5 Triangulation
Questionnaire surveys were conducted among participants at various levels:
experts at the national level, program administrators at the school level,
and teachers and students at the classroom level. Therefore, a comparison is
made between their responses to the same questions.
Program administrators, teachers and students were all surveyed on their familiarity
with TEM (see Tables 5.2, 5.17, and 5.34), and the following findings can be sum-
marized, based on a comparison between their responses. First, program adminis-
trators were more familiar with TEM than teachers, and teachers had a better
knowledge than students. Second, program administrators and teachers were more
concerned about the Teaching Syllabus and the Test Syllabus, while students were
more familiar with the structure and testing procedures of TEM. Third, compared
with TEM4, TEM8 was less familiar to program administrators and teachers, on
such items as the Test Syllabus, marking criteria and time allotment. However, stu-
dents of TEM4 and TEM8 showed little difference in their familiarity with the test.
Last, the Oral Test Syllabus and the marking criteria were less familiar as compared
with other items, especially to program administrators and teachers; while students’
response showed that many of them were not so familiar with the Teaching Syllabus.
These findings were consistent with our expectation. As school curriculum
designers, program administrators should have a better knowledge of the Teaching
Syllabus and TEM, while students should be more concerned about the practice of
the test than how teaching should be conducted in classrooms.
Moreover, the Oral Test Syllabus and the marking criteria were less familiar not
only to students, but also to teachers and program administrators, which seems to
suggest that information had been inadequately provided and thus more communica-
tion between test constructors and test users was necessary. Furthermore, the TEM-
Oral test should be popularized, so that more importance would be attached to it.
The quality of TEM was surveyed among all the participants, but different
questions were asked based on their respective characteristics. A comparison between
their responses may lead to the following conclusions (see Tables 5.3, 5.18, and
5.35).
First, generally speaking, participants’ opinions towards TEM were fairly posi-
tive, and program administrators seemed to be more satisfied with TEM than teach-
ers, and teachers more satisfied than students.
Second, compared with other items, the test report of TEM received a low satis-
faction rate, both from program administrators and from teachers. Moreover, all
participants agreed that more information should be provided in TEM test reports,
especially the section scores which were considered necessary by most participants
(see Tables 5.4, 5.19, and 5.36). This consensus of opinion seems to suggest that
adequate reform should be adopted to modify the TEM test report, to provide more
information so that TEM can better play its facilitative role in teaching and
learning.
Last, compared with TEM4, test administration for TEM8 was not as satisfac-
tory, as unanimously agreed by program administrators and teachers. Therefore, it
may indicate that the invigilation procedure of TEM8 should be simplified, so that
it can be better followed by both teachers and program administrators.
According to the experts who were designers of the Teaching Syllabus and TEM
Test Syllabus, the intended uses of TEM scores were those listed in the Test Syllabus,
while other uses, such as conferring degrees, evaluating teachers and ranking
schools, can hardly be justified or should not be promoted.
However, such unintended uses, especially those of TEM4 scores, were sup-
ported by around half of the program administrators and teachers and over 40% of
the students (see Tables 5.8, 5.23 and 5.37). This seems to indicate that test users
had not been adequately informed of the purpose of TEM, which would pose poten-
tial hazards in misuse of test scores.
Another point for triangulation is the actual use of TEM scores in different
schools. Program administrators, teachers and students were surveyed on how TEM
scores were used in their schools, and it was noted that their responses to the same
question varied to some extent. As to whether TEM scores were used to confer
bachelor’s degrees, the responses of program administrators and teachers
were quite similar, while students’ responses showed a marked difference, especially
for TEM4 scores. Around 30% of the program administrators and teachers claimed
that TEM4 scores were used to confer bachelor’s degrees in their schools,
while around 60% of the students believed that TEM4 scores were used for this
purpose (see Tables 5.7, 5.22 and 5.37). Several reasons may account for this mis-
match. First, students were not well informed of how TEM4 scores were used, so
they took it for granted that the result of such an important test should be related to
degree conferment. Second, the teachers may deliberately tell students that TEM4
scores would be used in degree conferment, so that students would attach more
importance to the test and work harder. Third, the program administrators’ and
teachers’ questionnaires were distributed to all TEM participating institutions, while
the students’ questionnaire was administered only in the 250 schools selected, so there
might be some variance caused by sampling. Last, the subjective nature of the questionnaire
means that there might be possible misunderstanding or misinterpretation, which
may also lead to possible errors in data collection and analysis. In a word, the mis-
match has indicated that participants’ perception of the same question may vary, so
it is important to triangulate data and to employ various research methods.
An investigation was made to find out whether TEM preparation courses were intro-
duced and how they were conducted. The questions were slightly different among
different target groups: program administrators were expected to describe how TEM
preparation courses were introduced in their school curriculum; teachers were to
describe what kind of courses they had taught to prepare students for TEM; and
students were asked about the TEM preparation courses they had taken (see Tables
5.13, 5.26 and 5.40). Their responses varied in the following aspects.
First, the percentage of program administrators selecting “no preparation” is
much lower than that of the teachers and students. This is quite reasonable, since not
all English teachers would be assigned to teach TEM preparation courses and not all
students would take TEM preparation classes, especially when preparation is delivered
in the form of lectures.
Second, compared with program administrators, fewer teachers and students
selected “several lectures”, which might be attributed to the same reason listed
above.
Third, the percentage of program administrators and students selecting “overall
training” was quite similar, while fewer teachers selected it. This seemed to indicate
that most students would take the TEM preparation course which offered them over-
all training. However, as for teachers, only a few of them would, perhaps, have the
chance to teach this course.
Last, the number of teachers and students selecting “skill training” and “vocabu-
lary and grammar training” or “general knowledge training” outweighs that of the
program administrators. This probably suggests that TEM preparation activities
were conducted on courses that were not intended for TEM preparation. For exam-
ple, in a listening class, the teacher may train the students for TEM, employing
certain TEM tasks, even though it is a normal listening course, instead of a TEM
preparation course. In this sense, TEM may have washback on not only TEM prepa-
ration classes, but also non-preparation classes.
Both teachers and students were surveyed on how TEM preparation classes were
conducted and how they differed from non-preparation classes. Since the teachers
and students were not from the same class, it is natural that their responses may vary
to some extent. However, some common messages can be reinforced by their
responses (see Tables 5.27, 5.28, 5.41 and 5.42).
On the one hand, various activities were conducted in TEM preparation classes,
rather than just focusing on test papers. Activities based on textbooks or aiming at
skill training were still conducted, and it seems that they took up as much time as
TEM preparation activities. This is a desirable finding, since TEM preparation
should, by no means, become the sole purpose of classroom teaching.
On the other hand, TEM preparation classes seemed to be more intensive than
non-preparation classes, with teachers and students spending more or the same
amount of time on most activities. The activities were not only working on test
papers, but also skill training tasks, while the time spent on textbooks was not
changed much. This indicates that TEM preparation activities were integrated with
other classroom tasks.
As for the factors influencing teaching and learning, the responses of experts, pro-
gram administrators, teachers and students varied to some extent, because of their
different status. For experts, the important factors were curriculum design, teacher
quality and student effort; program administrators considered teacher quality, stu-
dent effort and quality and leader attention the most important factors; teachers
rated student quality and effort, teacher income and leader attention the most impor-
tant; and to students, their interest, attitude, effort and teacher quality were the most
important factors (see Tables 5.15, 5.31 and 5.50).
From the above analysis, it could be concluded that the effect of teaching was
mainly determined by teachers and students, and the curriculum would also contrib-
ute to it. However, the role TEM played seemed to be quite limited, as unanimously
agreed by the participants.
To help explain the quantitative data collected in the questionnaire surveys, an open-
ended question was asked in each questionnaire and supplementary interviews were
conducted among relevant participants, to collect qualitative data to corroborate
research findings.
cated”. This is a possible reason for the low satisfaction rate for the item of
“convenient administration” (Tables 5.3 and 5.18).
Third, TEM8 was considered “too difficult” by quite a few program administra-
tors, teachers and students, which may account for the fact that TEM8 seemed to
exert less washback than TEM4 (see Tables 5.5, 5.9, 5.20, 5.24, 5.38 and 5.39). As
Green pointed out in his washback model (2007), the intensity of washback was
determined by the importance and difficulty level of the test, with a difficult but still
attainable test exerting the most intense washback. In this sense, the washback of
TEM8 is less intense than that of TEM4 because TEM8 is unattainable to some test
takers. Meanwhile, this may also account for the research findings that TEM4
results were claimed to be more often related to degree conferment, teacher evalua-
tion and school ranking (see Tables 5.6, 5.7, 5.21, 5.22, and 5.37). Moreover, TEM4
preparation courses seem to be more widely introduced (Tables 5.13, 5.26, and
5.40).
Apart from the above issues, participants also made quite a few suggestions on
the reform of TEM and teaching, which will be summarized in Chap. 7.
5.6 Summary
This chapter has reported research findings from four questionnaire surveys among
experts, program administrators, teachers and students. These large-scale surveys
aimed at eliciting participants’ perceptions of TEM and its washback, which can be
summarized as follows.
First, TEM enjoyed a high familiarity rate with the participants: the program
administrators and teachers were familiar with the current national Teaching
Syllabus and the TEM Test Syllabus, and the students were familiar with TEM test
content and testing procedures. However, they were less familiar with the TEM-Oral
test and the marking criteria.
Second, participants’ evaluation of TEM was more positive than negative, and
the program administrators seemed to hold a more favorable opinion than teachers,
and teachers more positive than students. The test content and method of TEM
received a higher satisfaction rate than its administration, and the test report seemed
to be the least satisfactory item.
Third, the washback of TEM was considered to be more positive than negative,
and it has achieved most of the purposes specified in the Test Syllabus.
Fourth, decisions based on TEM were mostly as intended, but there were also
unintended uses such as ranking schools, evaluating teachers, conferring degrees,
and recruiting employees. Moreover, participants’ attitudes towards these unin-
tended uses varied, with around half holding a supportive view.
Fifth, TEM preparation was conducted in most institutions, but half in the form
of lectures. Moreover, most TEM preparation courses would last for no more than
half a semester, which indicates that TEM preparation was by no means intensive.
Sixth, various activities were carried out in TEM preparation classes, but TEM
related activities seemed to be more frequently conducted. Besides, there seemed to
be a heavy teaching and learning load in TEM preparation classes.
Seventh, students spent more time learning English before TEM and TEM was
believed to help them improve their English proficiency, especially those skills
assessed in TEM.
Last, the important factors affecting teaching and learning were quite similar for
various participants, such as student effort and teacher quality. Moreover, it is unan-
imously agreed that the role TEM played was quite limited.
Apart from the above findings, open questions and interviews were employed to
collect qualitative data to corroborate questionnaire findings.
Though tentative to some extent, these findings have offered a relatively clear
profile of TEM and its washback, and will thus have implications for further research,
as well as the reform of TEM and teaching.
Chapter 6
Classroom Practice and the Washback
of TEM
Six teachers were selected for observation. They were all English teachers with
TEM preparation experience. Their educational background and teaching experi-
ences are listed in Table 6.1.
As shown in Table 6.1, of the six teachers observed, half held MA degrees and the
other half PhD degrees. Four of them were lecturers and the other two were
associate professors. They were all experienced teachers with around 10 years
of teaching experience.
Altogether 52 classes were observed and each class was around 45 min long. It
would be ideal if TEM preparation and non-preparation courses could be delivered
by the same teacher, so that teacher variance could be limited. However, due to
some practical reasons, two out of the six teachers observed had only TEM8 prepa-
ration courses recorded. In the following semester when they were supposed to
teach advanced English, one (Teacher E) was assigned to teach American Literature,
so Teacher E had no advanced English classes to be observed; and the other (Teacher
F) was transferred from the ordinary university where she worked to a key
university, where she was assigned to teach fundamental English. Since fundamental
English was, in terms of teaching content, a comprehensive English course and thus
compatible with TEM8 preparation courses, her fundamental English
classes were also observed. As for the rest of the teachers observed, they all taught
compatible TEM preparation courses and non-preparation courses, that is, both
TEM4 preparation course and fundamental English, or both TEM8 preparation
course and advanced English. Table 6.2 lists the number of classes taught by them.
Moreover, classes from various schools were observed, to investigate not only
top universities, but also less established schools (Table 6.3).
Last, both TEM-preparation courses and non-preparation courses were surveyed.
Since TEM is a comprehensive test and the TEM preparation courses observed are
often comprehensive ones, equivalent non-preparation courses should also be com-
prehensive in that they deal with all language skills. Therefore, fundamental English
and advanced English courses were selected as the non-preparation courses to be
observed. Moreover, it is common practice for some schools to offer TEM prepara-
tion in fundamental English and advanced English courses in the semester when
TEM4 and TEM8 are held or in the previous semester. Therefore, the TEM
preparation classes observed here are of two types: one is a separate course named
TEM Preparation Course, and the other is TEM preparation classes delivered in
fundamental English or advanced English courses. In this sense, whether a class is
considered a TEM preparation class or a non-preparation class is determined by the
teaching content of that class. The number of TEM preparation and non-preparation
classes observed is listed in Table 6.4, together with the time when these courses
were offered.
From Table 6.4, it can be noted that TEM4 preparation was often included in
fundamental English courses to second-year students who were to take TEM4 at the
end of the fourth semester; while TEM8 preparation courses were mostly offered in
a separate course to fourth-year students who were to take TEM8 at the beginning
of the eighth semester. One TEM8 preparation class was offered to the third-year
students in advanced English courses. Does this mean that TEM8 preparation would
run on for a year? A further interview with the teacher (Teacher D) showed that this
was not the case. TEM8 preparation classes were held in the third year simply
because no advanced English courses were to be offered in the fourth year, so
TEM8 preparation had to be brought forward.
According to Alderson and Wall (1993), a test will influence teaching and learning,
such as the content and method, rate and sequence, degree and depth, and the atti-
tude towards teaching and learning. Therefore, in the observation, the classroom
activities were closely analyzed, to discern the content and method involved, as well
as the classroom atmosphere.
The teaching content of both the TEM preparation and non-preparation classes has
been analyzed, and it is noted that these classes covered a wide range, encompass-
ing various language knowledge and language skills, as well as all the test content
of TEM.
Table 6.5 summarizes the teaching content of all the classes observed. The per-
centage of time devoted to each task was calculated and reported. For example, in a
45-min class, if the teacher spent 20 min explaining new words in the textbook, then
45 would be recorded in the column of grammar and vocabulary (20/45 = 0.44,
which is rounded up to 45%, with 5% being the minimum unit for calculation).
Accordingly, the numbers listed in the above table are the minimum and the maxi-
mum percentage of time devoted to each task of all the classes observed, as well as
the percentage of time devoted to each task on average. The number in brackets in
the column of “Maximum” is the number of classes devoted solely to a single task.
For example, reading is the only skill tackled in 6 classes. Moreover, while calculat-
ing the average percent of time spent on each task, TEM preparation and non-
preparation classes were calculated together because it is likely that some TEM
tasks may be dealt with in non-preparation classes, and vice versa.
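The time-percentage coding described above can be sketched in Python. This is only an illustrative sketch, not the authors' actual procedure: the function name `time_share` is hypothetical, and rounding to the nearest 5% (with exact ties rounded up) is an assumption inferred from the worked example, in which 20 min of a 45-min class is recorded as 45.

```python
import math


def time_share(minutes: float, class_minutes: float = 45, unit: int = 5) -> int:
    """Return the share of class time as a percentage rounded to the nearest `unit`."""
    if class_minutes <= 0 or not 0 <= minutes <= class_minutes:
        raise ValueError("minutes must lie between 0 and the class length")
    percent = 100 * minutes / class_minutes
    # Assumption: round to the nearest multiple of `unit`; exact ties go up.
    return unit * math.floor(percent / unit + 0.5)


# 20 min of a 45-min class is 44.4%, recorded as 45.
print(time_share(20))
# A class devoted entirely to one task is recorded as 100.
print(time_share(45))
```

Applying the same function to every task in every class would give the per-class values from which the minimum, maximum and average columns are derived.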
In this sense, the following findings can be summarized from the above table.
Firstly, the classes observed had a wide coverage, which encompassed all the
tasks listed except two: general knowledge and integrated tasks. General knowledge
refers to the special knowledge for English majors, such as linguistics, English
literature and culture of the English speaking countries. It has been covered in TEM
preparation classes in the form of TEM preparation, but not handled in non-
preparation classes. This might be attributed to the fact that the non-preparation
classes observed were fundamental and advanced English courses, which mainly
dealt with intensive reading; while general knowledge would be taught in some
separate courses such as linguistics, English literature (British or American), and
culture of English speaking countries. Since these courses were available in most
schools surveyed (see Table 5.11), the teachers of fundamental and advanced English
would not spend time on them. However, according to the Teaching Syllabus, the
purpose of teaching and learning for English majors is not only to improve their
English proficiency, but also to broaden their horizon and enhance their cultural
awareness. As clearly stated in the course description of advanced English (NACFLT
2000), this is a comprehensive English course which aims at widening students’
knowledge scope, deepening their understanding of life and culture, and cultivating
their ability to appreciate literature. Therefore, it might be helpful if teachers could
spend some time introducing related cultural and literary knowledge in class.
Another type of task not covered is the integrated task. In the current observation scheme,
integrated tasks mainly refer to dictation and cloze tasks, tasks that have their
equivalents in TEM. A possible reason why these tasks were not covered is that these tasks
were quite time consuming so they were seldom done in class time.
Secondly, non-preparation classes covered a wide range of language knowledge
and language skills, but reading is by far the most frequently tackled skill. This
finding is in line with our expectation because the courses observed were fundamental
and advanced English, which mainly dealt with intensive reading skills. Besides
reading, knowledge of grammar and vocabulary also received much attention in
class, accounting for more than 10% of the class time on average. Less emphasis
was laid on the rest of the skills, such as listening, writing, speaking and translation,
which account for no more than 5% of the class time on average. Comparatively,
translation and speaking tasks were more often conducted, because teachers may
ask students to translate certain sentences in the textbook to check their understand-
ing, and speaking activities may involve various tasks, such as reading aloud, stu-
dent presentation, discussion and so on. However, listening tasks are quite difficult
to conduct, especially in classrooms not equipped with audio facilities, so they are
often done in a separate listening course. As for writing, this is a time-consuming
task, so it is seldom conducted within the class.
Thirdly, all TEM tasks were covered in the classes observed, and the time spent
on each task on average seemed to be more even, as compared with the non-
preparation tasks. This is mainly due to the fact that most TEM preparation classes
observed were comprehensive ones, which were supposed to cover all items of
TEM. However, comparatively, the writing task received most attention, with 5
classes tackling this task, and general knowledge seemed to be the other firm topic
for TEM preparation; while TEM listening seemed to be the least emphasized.
Several reasons may account for this finding. First, general knowledge, as mentioned
by Teacher F in the TEM preparation class, is comparatively easy among all TEM8
items, so it was believed that a brief preparation would help students score higher in
this part. Next, writing received more attention because it is an item included in both
TEM4 and TEM8, and it is common for teachers to brief its marking criteria and
comment on students’ writing assignment in class. Finally, TEM listening received
less attention not because it was unimportant, but because it was difficult to practice in
classrooms without audio facilities, and as questionnaire findings indicate, some
schools had offered a separate advanced listening course to cope with this task.
Last, it is noted that activities conducted in some classes were quite monotonous,
covering one task only, and in this respect, TEM preparation classes seemed to be
more focused than non-preparation classes. Among all the 27 TEM preparation
classes, 14 focused on one skill only, while among 25 non-preparation classes, only
6 focused on one task only. This seems to suggest that non-preparation classes offer
a wider range of teaching content than TEM preparation classes.
In addition, a brief pre-observation interview was conducted with the teachers to
find out the teaching materials they would use in the classes observed.
Table 6.6 shows that various teaching materials had been used in the classes
observed, including not only textbooks and TEM test papers, but also TEM mock
tests and other materials such as the authentic materials from newspapers, novels,
and so on. This is a desirable trend since the various materials used may help stu-
dents broaden their horizon and extend their knowledge of English.
The following aspects have been observed while investigating the teaching methods
employed, namely, the activities organized, the use of English by both teachers and
students, and the features of teacher-student interaction.
Classroom observation shows that various activities had been organized, such as
teacher lecture, student presentation, individual seat work, teacher-student interac-
tion, and so on. The percentage of time devoted to each activity is counted and
reported in the following table. For example, in a 45-min class, if a student spent
10 min making a presentation, then 20 will be recorded in the column of student
presentation (10/45 = 0.22, which is rounded down to 20%, with 5% being the minimum
calculation unit). Moreover, in the column of “Maximum”, the number of classes
adopting a single activity is reported, so 4 in “teacher lecture” means 4 classes
employed only one type of activity—teacher lecture.
From Table 6.7, it can be inferred that teacher lecture was the dominant activity
in classrooms, which took up more than three fourths of the class time. Moreover,
there were 4 classes in which teacher lecture was the only type of activity con-
ducted. Among the other activities, teacher-student interaction was the most fre-
quent one, which accounted for around 10% of the class time. The above findings
seem to indicate that teachers still played a dominant role in the classroom and the
time for students to perform was quite limited, so the classrooms were still teacher-
centered, rather than student-centered.
Both teachers and students were observed on how frequently they spoke the target
language, English. A five-point Likert scale was employed to measure the
frequency: 5 was recorded if they spoke English for more than 80% of the time,
4 for 60 to 80%, and so on.
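The frequency scale just described can be sketched as a small Python function. This is a hedged illustration only: the name `english_use_score` is hypothetical, the bands below 60% are extrapolated from the two bands given in the text, and the treatment of exact boundaries (for example, 80% coded as 4) is an assumption.

```python
def english_use_score(percent_english: float) -> int:
    """Map the share of time spent speaking English (0-100) to a 1-5 score."""
    if not 0 <= percent_english <= 100:
        raise ValueError("percent_english must be between 0 and 100")
    if percent_english > 80:   # more than 80% of the time (stated in the text)
        return 5
    if percent_english >= 60:  # 60 to 80% (stated in the text)
        return 4
    if percent_english >= 40:  # assumed band: 40 to 60%
        return 3
    if percent_english >= 20:  # assumed band: 20 to 40%
        return 2
    return 1                   # assumed band: below 20%


print(english_use_score(85))
print(english_use_score(70))
```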
From Table 6.8, it can be noted that teachers used English more frequently than
students, but both teachers and students varied vastly among themselves in the use
of English. On average, teachers used English a little more often than not, while
students were just the opposite. First language can play a facilitative role in second
language classrooms; however, teachers’ excessive use of the first language will
deprive students of the opportunity to get enough English input, and students will
lose the opportunity to practice English if they overuse their first language in the
classrooms. According to Macaro (2005), 10–15% constituted an acceptable thresh-
old for the use of the first language.
Teacher-student interaction took up around 10% of the class time, so further inves-
tigation was made to discern the communicative features of their interaction,
namely, the authenticity of the questions asked, the length of questions and
responses, as well as the way teachers offered feedback (Table 6.9).
The authenticity of questions and responses is mainly determined by the unpre-
dictability of the information provided. An authentic question seeks relatively
unpredictable information, that is, “information not easily anticipated in that a wider
range of information can be given”, and its corresponding response would also be
considered authentic (Watanabe 1997: 303). In this sense, it would be considered an
authentic question if the teacher asks, “How do you understand …?”, while “What
is the meaning of arbitrary?” would be considered pseudo. The frequency of authen-
tic questions asked and responses given would be calculated subsequently. The
teacher’s questions would be coded as “authentic” if authentic questions were asked
frequently; while “neutral” means around half of the questions asked were authentic
and “pseudo” means most questions asked were not authentic. As for the length of
questions and responses, a five-point Likert scale was employed, with 1 referring to
a word; 2, a phrase; 3, a single sentence; 4, 2–3 sentences; and 5, more than 3
sentences. Teachers’ feedback was also measured on a five-point Likert scale, with 1–5
referring to the different frequency bands listed in Table 6.9.
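The length scale for questions and responses can be sketched in the same way. This is a minimal sketch under stated assumptions: the function name `length_score` and the labels "word", "phrase" and "sentence" are hypothetical, and the decision on whether an utterance is a word, a phrase or a run of sentences is assumed to be made by the human observer beforehand.

```python
def length_score(kind: str, n_sentences: int = 1) -> int:
    """Map an observer-coded utterance to the 1-5 length scale."""
    if kind == "word":
        return 1
    if kind == "phrase":
        return 2
    if kind != "sentence":
        raise ValueError("kind must be 'word', 'phrase' or 'sentence'")
    if n_sentences < 1:
        raise ValueError("n_sentences must be at least 1")
    if n_sentences == 1:   # a single sentence
        return 3
    if n_sentences <= 3:   # 2-3 sentences
        return 4
    return 5               # more than 3 sentences


print(length_score("phrase"))
print(length_score("sentence", 3))
```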
The following findings can be summarized based on Table 6.9.
First, in terms of authenticity, classroom communication was not quite authentic,
especially for teachers’ questions. It was noted that most questions asked by teachers
were about the meaning of a certain word or sentence, or the answer to a certain
exercise question, the aim of which was just to check whether students’ understand-
ing was consistent with the key, instead of seeking any unknown information. It
seems that teachers preferred to ask these closed display questions, that is, questions
with one possible answer that the teacher has already known. In this way, teachers
may avoid unpredictability of students’ responses, so that they can have a better
control of the class (Smith and Higgins 2006). However, students’ questions were
more authentic, usually about the puzzles that they had encountered while
learning.
Second, teachers’ questions and responses were often in a sentence or a cluster
of sentences while students’ questions and responses were much shorter, often in
the form of phrases or a single sentence. This might be caused by the students’ lim-
ited language ability, and another possible reason is that the questions asked by
teachers were often closed display ones, which “elicit short, mechanical responses”
(Ho 2005: 298). In this sense, it seems that more effort should be made to encourage
students to say more, so that they would have more opportunities to practice
their English.
Last, as to teachers’ feedback, it seems that teachers gave feedback sometimes,
and their feedback was more positive than negative. Moreover, their feedback
seemed to focus more on form than meaning.
A comparison has been made between TEM preparation classes and non-preparation
classes, and it is noted that differences exist in the following aspects.
It is natural that the teaching content of TEM preparation classes would be different
from that of non-preparation classes, since the former would be more TEM-oriented,
focusing on the test content of TEM. In addition, TEM preparation classes seemed to
be either more focused or more diverse.
From Table 6.11, it can be noted that around half of the TEM4 and TEM8 prepa-
ration classes focused on one skill only, while most non-preparation classes covered
two to three skills. This seems to suggest that TEM preparation classes were more
focused. However, 25% of the TEM8 preparation classes covered five skills, sur-
passing all non-preparation classes, which, on the other hand, seems to suggest that
TEM preparation classes were more diverse. A closer look at the TEM preparation
classes shows that this was because they were mainly of two types: one was
organized by section, so each class focused on one section of TEM; the other was
comprehensive preparation, usually in the form of test paper analysis, in which
the teacher would try to cover all the skills measured in TEM.
The above finding further suggests that TEM affects the teaching content, as
well as the organization of the teaching content. The skills or knowledge taught in a
single TEM preparation class may not facilitate or reinforce each other, unlike what
is taught in a non-preparation class. To illustrate, the listening and reading activities
conducted in non-preparation classes were usually related, focusing on the same
topic; while the listening and reading activities conducted in TEM-preparation
classes were often separate tasks on different topics listed in a test paper. In this
sense, the teaching content of non-preparation classes seems to be better
organized.
A comparison has been made between the teaching methods used in TEM preparation
and non-preparation classes, and it is found that they differed significantly in
the classroom activities organized and the use of English in class, but no significant
difference has been identified in the features of teacher-student interaction or the
classroom atmosphere.
One-Way ANOVA has been employed to compare the means of the four indepen-
dent groups of classes: TEM4 preparation classes, TEM8 preparation classes, fun-
damental English and advanced English, and the research findings indicate that they
differed significantly (significance level: 0.05) in the mean time devoted to teacher
lecture, student presentation and teacher-student interaction.
Table 6.12 shows that teacher lecture took up around 85% of the class time in
TEM preparation classes, while in non-preparation classes, around 70% of the time
was devoted to it. As for student presentation, it was seldom conducted in TEM
preparation classes (1.36% for TEM4 preparation classes and 0 for TEM8 prepara-
tion classes). Moreover, teacher-student interaction took place less frequently in
TEM preparation classes than in non-preparation classes. All the above findings
seem to suggest that TEM preparation classes were more teacher-centered, with
more time devoted to teacher talk and less time to student practice.
Interviews with teachers and students have identified the following reasons for
this variance.
First, the limited time of TEM preparation classes. As suggested by the question-
naire findings (Tables 5.13, 5.26, and 5.40), most schools offered a TEM prepara-
tion course of around 10 class hours, which, according to the teachers interviewed,
was not enough. Therefore, to complete this demanding teaching task, the teachers
lectured a lot in class to cover all the teaching content, thus leaving little time for
students to practice.
In TEM preparation classes, teachers seemed to lecture more, but which language
did they use more often, the first language Chinese or the target language English?
A comparison was made, and it was noted that a significant difference exists between
TEM preparation and non-preparation classes (Table 6.13).
A five-point Likert scale was employed to measure the frequency of English
used in class: 5 means over 80% of the class was delivered in English, 4 means
60–80%, 3 means 40–60%, 2 means 20–40%, and 1 means less than 20%. From
Table 6.13, it can be concluded that English was used much more frequently in
non-preparation classes than in TEM preparation classes. In non-preparation classes,
especially advanced English, English was used most of the time (above 80%); while
in TEM preparation classes, especially TEM4 preparation classes, Chinese was
used more often than not.
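The five-point scale for English use amounts to a simple banding of percentages, which can be sketched as follows. The function name is hypothetical, and the handling of the band edges is an assumption: the text states that 5 means "over 80%" and 1 means "less than 20%", but does not say which code an exact boundary value such as 60% receives, so the lower bound of each band is treated as inclusive here.

```python
def english_use_code(percent_english: float) -> int:
    """Map the share of class time delivered in English (0-100) to a 1-5 code.

    Bands from the text: 5 = over 80%, 4 = 60-80%, 3 = 40-60%,
    2 = 20-40%, 1 = less than 20%.
    ASSUMPTION: a value lying exactly on a lower band edge (20, 40, 60)
    goes to the higher code; exactly 80 stays in band 4 ("over 80%" is 5).
    """
    if not 0 <= percent_english <= 100:
        raise ValueError("percent_english must lie between 0 and 100")
    if percent_english > 80:
        return 5
    if percent_english >= 60:
        return 4
    if percent_english >= 40:
        return 3
    if percent_english >= 20:
        return 2
    return 1
```

Under this banding, a class coded 2 was delivered in English for roughly 20–40% of the time, which corresponds to the description of Chinese being used more often than not.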
The interviews with teachers and students show that both teachers and students
preferred Chinese to English in a TEM preparation class: teachers found it
more efficient to explain in Chinese, and students claimed to understand TEM
better when the class was delivered in Chinese.
As for students’ use of English, a similar pattern has been identified (Table 6.14).
Table 6.13 English used by teachers in TEM preparation and non-preparation classes
Class type            N    Mean   Std. Deviation   Std. Error   Minimum   Maximum
TEM4 preparation      11   2.00   .00              .00          2.00      2.00
TEM8 preparation      16   2.68   1.25             .31          1.00      5.00
Fundamental English   17   4.17   1.18             .28          2.00      5.00
Advanced English      8    5.00   .00              .00          5.00      5.00
(F = 21.051, p = 0.000)
Note: N: the number of classes of each type
Table 6.14 English used by students in TEM preparation and non-preparation classes
Class type            N    Mean   Std. Deviation   Std. Error   Minimum   Maximum
TEM4 preparation      10   1.40   .51              .16          1.00      2.00
TEM8 preparation      13   1.76   1.09             .30          1.00      5.00
Fundamental English   17   3.82   1.23             .29          1.00      5.00
Advanced English      8    4.50   .53              .18          4.00      5.00
(F = 25.160, p = 0.000)
Note: N: the number of classes of each type
It seems that students used English less frequently than teachers, but the same
pattern of use existed between TEM preparation and non-preparation classes, with
students using English more frequently in non-preparation classes.
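For readers who wish to see how the F values in Tables 6.13 and 6.14 relate to the descriptive statistics, the one-way ANOVA F ratio can be recomputed from the group sizes, means and standard deviations alone. The sketch below is an illustration, not the SPSS procedure used in the study; the function and variable names are hypothetical. Because the tabulated means and standard deviations are rounded to two decimals, the recomputed values only approximate the reported ones.

```python
from typing import List, Tuple

# One group is summarized as (n, mean, standard deviation).
Group = Tuple[int, float, float]


def anova_f_from_summary(groups: List[Group]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA.

    Between-group sum of squares: sum of n_i * (mean_i - grand_mean)^2.
    Within-group sum of squares:  sum of (n_i - 1) * sd_i^2.
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Values copied from Table 6.13 (teachers) and Table 6.14 (students).
table_6_13 = [(11, 2.00, 0.00), (16, 2.68, 1.25), (17, 4.17, 1.18), (8, 5.00, 0.00)]
table_6_14 = [(10, 1.40, 0.51), (13, 1.76, 1.09), (17, 3.82, 1.23), (8, 4.50, 0.53)]

f_teachers, df1_teachers, df2_teachers = anova_f_from_summary(table_6_13)
f_students, df1_students, df2_students = anova_f_from_summary(table_6_14)

print(f"Teachers: F({df1_teachers}, {df2_teachers}) = {f_teachers:.2f}")
print(f"Students: F({df1_students}, {df2_students}) = {f_students:.2f}")
```

The recomputed ratios fall within a few tenths of the reported 21.051 and 25.160, which suggests that the tables are internally consistent. A p value would additionally require the F distribution, which is omitted here to keep the sketch free of third-party libraries.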
might be negative in the macro context (see 4.1.3.4). In a word, TEM preparation
courses can hardly help students improve their language proficiency, although they
may familiarize students with TEM so that they will perform better in the test.
According to the questionnaire survey findings, teaching and learning was affected
by many factors, such as teacher quality, student effort, student quality, and so on,
while the role played by TEM was quite limited (Tables 5.15, 5.31, 5.50). In this
sense, a further analysis has been made to discern differences caused by other fac-
tors, namely, differences caused by teacher difference, school difference (key uni-
versity, ordinary university and college), and student difference (whether they were
first-year, second-year, third-year or fourth-year students). The significant differ-
ences are reported in the following part.
Further analysis seems to indicate that TEM was an overwhelming factor affecting
the language used in class, since the significant differences located in other factors
were, to some extent, still attributed to it.
As shown in Table 6.15, a significant difference existed between classes delivered
to students of different grades. However, a close observation of the data reveals that
English was more frequently used in first-year and third-year classes simply because
these mainly consisted of the fundamental English and advanced English classes,
while most TEM preparation classes were delivered in the second and fourth year.
In this sense, this finding further attests to the washback effects of TEM on the lan-
guage used in class.
From Table 6.16, it can be noted that Teacher D used English most frequently
and Teacher E used English least frequently. A further look at the courses they
taught would clarify this finding. Teacher E had only three TEM8 preparation
classes observed, so her use of English in class was minimal, less than 20%; while
Teacher D had three advanced English classes and one TEM8 preparation class
observed, and moreover, her TEM8 preparation class was taught in the advanced
English course, so it retained the style of non-preparation classes. As a result, all the
classes of Teacher D were delivered in English (over 80% of the class time for all
her classes). This finding, like the previous one, may serve as further evidence of the
washback of TEM on the language that teachers use in class.
Another factor that may have contributed to the difference in teachers’ use of
language in class is the school type, that is, whether the class was from a key uni-
versity, ordinary university, or college.
As shown in Table 6.17, the level of school seemed to have some effect on the
English used by teachers in class, with teachers from key universities speaking
English most frequently. This is possibly attributed to differences in teacher and
student quality in various schools. However, since the p value is 0.050, which is
exactly the threshold value for significant difference in the one-way ANOVA test,
the inference should be made with prudence.
No significant difference in the features of teacher-student interaction has been
detected between TEM preparation and non-preparation classes. However, further
analysis shows that some differences exist
between classes delivered by different teachers in different schools.
Table 6.18 shows that significant differences existed in almost all communicative
features of teacher-student interaction across the classes delivered by each teacher.
This suggests that every teacher had their own teaching method, which was not
significantly affected by TEM, since no significant difference has been
located between TEM preparation and non-preparation classes. This finding is, to
some extent, in line with the research findings of previous washback studies—a test
will influence what teachers teach more than how they teach (e.g., Cheng 2005;
Gu 2007; Wall 2005).
Moreover, it has been noted that the level of school would also have some effect
on teacher-student interaction (Table 6.19).
It seems that teachers from different schools adopted different questioning skills,
with teachers from key universities asking the most authentic and lengthy questions.
This may be attributed to the differences in teacher and student quality in different
schools, since teachers with a better command of English are, to some extent, less
reluctant to ask authentic questions that would seek unknown information, ques-
tions that may add to the unpredictability of the class, which would pose some chal-
lenge to the teacher. On the other hand, students with higher language proficiency
would be more capable of answering these open lengthy questions.
Laughter has been used as an indicator of classroom atmosphere, and it is noted that
no significant difference exists between TEM preparation and non-preparation
classes, classes delivered by different teachers, and classes delivered in different
schools. However, classes of different grades are found to be significantly
different.
Table 6.20 shows that first-year and second-year students laughed more frequently
than students in the third and fourth year. This seems to suggest that the
classroom atmosphere declined as students entered the advanced stage. Interview
findings supported this view, as some teachers complained that students’ passion for
learning dwindled with the progression of their college education. Moreover, some
students also complained that classes became less and less “meaningful”. In this
view, it seems that more efforts should be made to improve classes, so that students’
interest in learning can be sustained.
6.6 Triangulation
In the questionnaire surveys, both teachers and students answered questions on their
practice in TEM preparation classrooms and the factors affecting teaching and
learning, which can be employed to corroborate classroom observation data.
Questionnaire findings indicate that teaching and learning was affected by many
factors, such as teacher quality, student quality, student effort, and so on (Tables
5.15, 5.31, and 5.50), while the role played by TEM was quite limited. This has
provided support for the classroom observation finding that classroom practice,
especially the features of teacher-student interaction, was affected by many other
factors, rather than TEM.
Furthermore, the factors identified in the questionnaire surveys actually coincide
with those in the classroom observation. According to the questionnaire findings,
teaching and learning was affected by such factors as leader attention, teacher qual-
ity, student quality, student effort, student interest (the opinions of program admin-
istrators, teachers and students varied slightly because of their different perspectives).
These factors were, in fact, equivalent to those identified in the classroom observa-
tion. For example, factors affecting the features of teacher-student interaction
include teachers’ individual differences and the level of schools, which, actually,
encompass such factors as teacher quality, student quality. In this sense, classroom
observation data could be corroborated by questionnaire findings.
Interviews were conducted among relevant teachers and students to verify class-
room observation findings, and it seems that the interview data collected are quite
supportive, providing answers to most of the puzzles that emerged in the analysis of
the observation data. Corroboration has been offered in the analysis of observation
data in Sects. 6.3, 6.4 and 6.5, and the following is a summary.
First, it has been noted that teachers lectured more in the classes observed, and
interview findings offered several possible reasons, such as the limited time of the
TEM preparation classes, teachers’ and students’ attitudes towards teacher lecture
and student practice (see Sect. 6.4.2.1).
Second, in the analysis of the observation data, teachers and students were found
to use Chinese more often in TEM preparation classes than non-preparation classes,
and interview data suggest that Chinese was preferred by both teachers and students
because most of them believed that it was more effective and efficient to communi-
cate in the mother tongue.
Last, in the classes observed, students in the fundamental stage were found to
laugh more often than those in the advanced stage, which, according to interview
findings, could possibly be attributed to the declining interest and passion that stu-
dents had for language learning.
6.7 Summary
This chapter has presented and discussed classroom observation findings, and the
research results can be summarized as follows.
First, the teaching content of TEM preparation classes varied greatly from that of
the non-preparation classes, with TEM preparation classes focusing more on test
content.
Second, the organization of TEM preparation classes differed from that of the
non-preparation classes in that more teacher lectures were conducted.
Third, the language used for class instruction and communication in TEM prepa-
ration classes was, in some cases, different from that of the non-preparation classes,
since TEM preparation classes were more often delivered in Chinese and the stu-
dents also tended to use Chinese more frequently.
Fourth, no significant difference was identified in the features of teacher-student
interaction between TEM preparation and non-preparation classes, but variance
existed among teachers, as well as different schools.
Fifth, teacher lecture accounted for around 70% of the class time in non-preparation
classes and around 85% of the class time in TEM preparation classes
(Table 6.12), which seemed to suggest that the classroom was still dominated by
teachers, be it in TEM preparation or non-preparation classes.
Sixth, research findings of the features of teacher-student interaction showed that
teachers’ questions were not authentic and students’ responses were quite limited.
In this sense, it seems that more efforts should be made to improve the questioning
techniques of teachers so that students would be encouraged to communicate more.
Last, laughter seemed to occur more frequently among students in the fundamen-
tal stage than those in the advanced stage, and moreover, students did not laugh
much in either type of class, with four times being the maximum. This seems to
suggest that more efforts should be made to enliven the classroom atmosphere, so that
students’ interest and passion for language learning could be aroused.
In addition to the above findings, pre-observation interviews and post-observation
interviews were conducted to collect supplementary information about the classes,
and to seek answers to the problems that emerged in the observation process and the
data analysis process. The data collected from various approaches have corrobo-
rated each other.
Chapter 7
Conclusions and Implications
All five experts surveyed held a positive attitude towards both TEM and its
washback, believing that TEM was reasonably designed and it would facilitate and
promote teaching and learning for English majors. In addition, they were all against
the unintended use and interpretation of TEM results, questioning their
justification. Moreover, they offered some suggestions on the reform of TEM,
concerning its test content, test method, test report, and so on, which are listed in
Sect. 7.2, together with the suggestions from the program administrators, teachers
and students.
The program administrators were, in general, quite positive about TEM and its
washback, but their attitudes differed in some aspects. Their evaluation of TEM test
report and the convenience of TEM8 administration seemed to be less favorable
than that of the other aspects of TEM quality, but TEM4 received a slightly higher
satisfaction rate. Moreover, their attitudes towards the unintended use and
interpretation of TEM results varied vastly, with proponents slightly outnumbering
opponents.
According to the program administrators, TEM has exerted some influence on their
curriculum design, but different practices were adopted in different schools. TEM
preparation was offered in many schools, but mostly in the form of lectures. TEM4
preparation seemed to be more often offered than that of TEM8, but neither was
intense, lasting for no more than one semester. Compared with other factors such as
teacher quality and student efforts, TEM had limited effects on teaching.
Furthermore, it was noted that program administrators’ perceptions of TEM
would be affected by their personal factors, such as their teaching experience and
educational background.
Teachers’ perceptions of TEM and its washback were, generally speaking, quite
positive, although they seemed to be less favorable than those of the program
administrators.
Like the program administrators, they were less satisfied with the test report and the
convenience of TEM8 administration. Moreover, their attitudes towards the unin-
tended use and interpretation of TEM results also varied, with half agreeing and half
disagreeing. According to the teachers, TEM would affect their teaching aims and
content more than the teaching method and ideology, and their TEM preparation
classes varied from the non-preparation classes mainly in the teaching content, as
the practice and explanation of TEM test papers would be more frequently con-
ducted. However, the role TEM played in their teaching was quite limited, as com-
pared with that of the other factors such as student quality and student efforts.
In addition, teachers’ personal factors, such as their teaching experience and
educational background, were found to affect their perception of TEM washback.
Students’ perceptions of TEM and its washback were more positive than negative,
especially their perceptions of the test content of TEM and the overall washback of
TEM. More than half of the students believed that TEM4 scores were used in conferring
bachelor’s degrees, and their attitude towards this practice was more
supportive than opposed.
TEM would influence the learning practice of students, both in class and after
class. In class, the TEM related content was more often learnt, such as test paper
explanation, and after class, students would spend more time doing exercises on the
skills measured in TEM, such as multiple-choice listening and reading exercises.
However, TEM preparation would not last for a long time, with most students
spending no more than half a semester on it and seldom more than 3 h a day.
The positive washback of TEM on learning has been acknowledged
by most students, who claimed that TEM had helped enhance their English
proficiency, especially the language knowledge that can be improved in a short time,
such as their vocabulary, grammar knowledge and general knowledge on English
linguistics, literature and culture. However, compared with other factors such as
their learning efforts and interest, TEM had limited effects on students’ learning.
Besides, students’ individual differences, such as gender and language profi-
ciency, were found to affect their perception of TEM washback.
TEM has exerted some influence on the classroom practice, such as teaching con-
tent and teaching methods. The teaching content of TEM preparation classes
focused more on the measured language knowledge and skills in TEM, and the
classes were more teacher-centered, with teacher lecture being the dominant teach-
ing format. Moreover, in TEM preparation classes, Chinese was more often used,
both by teachers and by students.
However, besides TEM, many other factors were found to affect classroom
teaching and learning, such as teachers’ individual differences and students’ levels of
language proficiency.
The aim of a washback study can be twofold. The fundamental one is to describe
this complicated phenomenon, namely what a test affects in teaching and learning,
how it does so, and what factors contribute to the washback effects; and the ultimate aim
is to promote positive washback and evade negative washback, that is, to revise the
test so that it will produce more positive washback and to improve teaching and
learning so that the positive washback of a test can be fully realized.
In this sense, the following part will discuss the reform of TEM and teaching and
learning for English majors, based on the research findings of the current washback
study.
First and foremost, the test content of TEM could be more comprehensive to better
match the TEM test construct, thus improving the validity of TEM and bringing
more beneficial washback. If possible, the TEM-Oral test needs to be popularized.
While TEM reaches an annual test population of over 200,000, the number
of TEM-Oral test takers remains much smaller, about 10,000–20,000 every year,
which means that in some universities, no students take the oral test. So, it naturally
follows that many participants surveyed were not familiar with TEM Oral Test
Syllabus (Tables 5.2, 5.17, and 5.34), less emphasis was attached to the speaking
skill in class (Tables 5.27, 5.28, 5.41, and 5.42), and less time was spent on speaking
practice after class (Table 5.45). These findings deserve our close attention. As TEM
is supposed to assess test takers’ language proficiency, which includes speaking, the
relatively low familiarity with the TEM-Oral test and the lesser importance attached to
it are not conducive to promoting positive washback. Therefore, it is necessary to
encourage more students to take the TEM-Oral test, to make it better known to
teachers and students so as to promote a balanced development in students’ lan-
guage proficiency. Moreover, the test content of TEM could be more practical and
cater to students’ needs, both in the test materials selected and test items designed.
On the one hand, the test materials could be more practical, including those articles
that students may use in their daily life and future career. For example, the transla-
tion materials could include not only literary works, but also articles of various
kinds such as business letters and news. On the other hand, the test items should
focus on the common problems of the students. For instance, the proofreading part
in TEM8 could employ more common mistakes made by students, selected from
some corpora of students’ essays.
Furthermore, the test methods employed in TEM could be modified to include
more item types, such as blank filling and short answer questions, so that test method
effect could be reduced, thus improving the validity of TEM. For practical
reasons, the multiple-choice question is still the most frequently employed item type in
TEM, especially in TEM4. Moreover, some language abilities, such as reading,
cloze and general knowledge, are assessed solely through multiple-choice questions. In
this sense, it is recommended that more subjective item types be added as a supple-
ment to better assess test takers’ ability in these aspects.
First of all, TEM test administration could be simplified. As is shown in Tables 5.3
and 5.18, convenience of test administration for TEM8 received the lowest satisfac-
tion rate, which, as is indicated by the responses to the short-answer question, might
have resulted from the complicated administration procedure required for a TEM8
listening task which integrates listening, note-taking and gap-filling. During this
listening task, if answer sheets (a gap-filling passage) were distributed before listen-
ing (they should be distributed after listening), test takers would gain an advantage,
as they could just fill in the gaps without taking notes (test takers are required to take
notes while listening to a mini-lecture). If a small mistake occurred in test adminis-
tration (such as distributing answer sheets prematurely), it could result in construct-
irrelevant variance, thus undermining washback. Therefore, a review of the test
invigilation procedures is needed with a view to improving practicality and thus
maintaining validity.
Second, TEM test administration procedures should be clarified to ensure that
they are strictly followed in every participating school. The following suggestions
have been offered by both teachers and program administrators. First, if possible,
training should be offered to TEM invigilators, so that they will have a better idea of
the administration procedures. Second, a more detailed test administration manual
should be provided, so the teachers who are unfamiliar with TEM can better follow
the administration procedures. Third, directions for the start and end of each section
should be more precise, such as “from 10:15 to 10:25”, instead of “about 10 min”.
Last, additional suggestions have been made on the improvement of TEM test
administration. First, it might be better to collect the answer sheets to TEM8 mini-
lecture and TEM4 dictation after the listening part has been completed, since stu-
dents do not want to be interrupted when they are doing listening tasks. Second, it
might be better if TEM8 could be advanced to the end of the third year or the first
semester of the fourth year, so that it would not interfere with students’ internships at the
beginning of the last semester. Moreover, if TEM8 were administered earlier, stu-
dents could get the TEM8 certificate earlier, which may put them at an advantage in
the job market. Third, the administration of TEM4 might be conducted at the end of
the second year, in June, so that it would not interfere with the routine teaching.
Fourth, it is advised that TEM be administered twice a year, and students should be
allowed to take and retake the test as they want, so they would be under less pressure
when they sit for the test.
ranking in terms of composite score and section scores, so that students may have a
better idea of their strengths and weaknesses. In addition, some teachers suggested
reporting the composite score rather than the band of “excellent”, “good” or “pass”,
because it would encourage students to work for a better score, not just to pass the
test. Moreover, those students who narrowly failed the test, getting 59 or 58, would
feel less disappointed.
Several implications concerning teaching and learning for English majors arise
from the participants’ perception of TEM8 washback, their account of the teaching
and learning practice in their schools, as well as the classroom observation
findings.
TEM may provide useful feedback information for course design because it is
advisable that course design at the school level not only follow the Teaching
Syllabus, but also cater to the needs of the students, which can be reflected in their
TEM performance. As shown in Table 5.11, most schools offered courses recom-
mended in the Teaching Syllabus, although TEM related courses were, compara-
tively speaking, more often introduced. Moreover, short-answer question findings
indicate that additional writing and listening courses were introduced in some
schools because the program administrators thought their students might need more
training in order to meet the required language proficiency standards measured in
TEM. The above findings show that TEM8 has provided some useful feedback to
course design—students’ weaknesses were identified so that adequate teaching
activities were arranged. Thus, TEM results could help program administrators in
designing courses that could be better targeted at students’ needs.
Second, TEM preparation courses should, by no means, interfere with regular
teaching. As is shown in Table 5.13, most schools offered TEM preparation, either
as lectures or as a special course, which is, to some extent, necessary, if we want to
keep our students well informed of the test, so that they can better demonstrate their
English proficiency in TEM. However, at the same time we need to be aware that
excessive preparation, especially test-taking technique training, might not only
deprive students of class time for other courses, but also arouse their negative atti-
tude towards the test. Moreover, classroom observation findings show that TEM
preparation courses were more teacher-dominated and Chinese was used more
frequently, so they were not beneficial to the development of language proficiency in the
long run. Thus, it is of importance that we maintain a balance between appropriate
test preparation and excessive test preparation.
The role of TEM in teaching should be facilitative rather than decisive. As is stated
in the TEM Test Syllabus, the purpose of TEM is to measure the language profi-
ciency of undergraduate English majors, to check whether they have met the
required level specified in the Teaching Syllabus. Therefore, TEM may provide
important feedback on the strengths and weaknesses of the students, so that better
targeted teaching can be organized. In this sense, TEM may influence teaching, but
its influence should be limited. It would put the cart before the horse if teaching
were to be organized with passing TEM as the aim. As Qi (2004) states in her
study on the washback of NMET, if the aim of teaching and learning is to
achieve higher scores rather than to develop the desired ability, high-stakes tests
are unlikely to achieve their intended washback effects. Therefore, it is advisable
that TEM exert influence on teaching in a manner that is both discreet and
positive.
Next, the washback of TEM on classroom teaching was mainly reflected in the
following three aspects: (1) the teaching content was more TEM-oriented, covering
various sections in TEM; (2) the classes were more teacher-centered, with teacher
lecture being the major form of instruction; and (3) Chinese was more often used in
class, both for teachers and for students. In this regard, the immediate aim of TEM
preparation classes was to familiarize students with TEM, rather than to improve
their English proficiency. Therefore, the teaching content and method of TEM prep-
aration classes should not be set up as a model for non-preparation classes. However,
as indicated by classroom observation findings, non-preparation classes were also
teacher-centered, with teacher lecture covering about 70% of the class time (Table
6.12); meanwhile, there was no significant difference in the features of teacher-
student interaction between TEM preparation and non-preparation classes, both of
which were inauthentic. In view of this, efforts should be made to improve English
classroom teaching and to strengthen the role of students, so that classes become
more student-centered. Moreover, teachers should be encouraged to
improve their communicative skills and their questioning techniques, so that
students become more involved in classroom activities. As a result, the classroom
atmosphere would be more active and lively.
Finally, efforts should be made to arouse students’ intrinsic interest in English
learning. According to the questionnaire surveys, around one third of the students
considered passing TEM one of their learning aims (Table 5.51). In addition, some
teachers worried that students’ motivation for English learning
would decline once the TEM season was over. Efforts should therefore be
made to enliven English classes and to make them more practical, so as to maintain
students’ interest in English learning. Ideally, program administrators
would design courses that cater to students’ needs, teachers would stress the role of
students in class and involve them in classroom learning, and students would develop an
intrinsic interest in English, so that their enthusiasm for English learning can be
sustained.
Appendices
T: Ok, let’s begin our class. Today, en Unit 7, The Chaser. First we’ll finish the
words and phrases because last time there were three groups who haven’t done this
presentation. So first we will finish the words and phrases. Then we will look at the
text itself. So let me see, the first group, Group One. Ok, I forgot to mention, today,
Ms Xu, is sitting at the behind of our class because she is going to videotape our
class for her research, just for research purpose. Ok, Group One.
(Students explain the meaning of new words one by one.)
S: (in Chinese) The first word in this section…
T: OK, can you make the words larger?
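S: Oh, (in Chinese) the first word appears in the first paragraph: kitten. It means a
young cat, a young animal, and then, a very cute little cat; as an intransitive verb it
means to give birth to kittens.
The second word is creaky. As a transitive verb it means to make something creak; as
a noun, creak, it refers to a creaking sound; as an adjective its first meaning is
likewise making a creaking sound, and its second meaning is decayed, dilapidated,
about to collapse. In the text, creaky stairs means stairs that are very unsteady and
that creak.
The third word is peer. As an intransitive verb, peer means to gaze, to look closely,
to stare. An example sentence is: She peered at the neighbours from behind the
curtain. It can also mean to appear faintly, to be barely visible, for example of the
moon: The moon peered through the clouds. As a noun, peer can mean a person of the
same generation, a colleague, a person of the same age or of equal standing, that is,
someone who holds the same position as you; or a nobleman, or a person who is one’s
match. In archaic usage it was also used for a companion or partner. As a transitive
verb, also an old usage, it means to be equal to something, so the meaning is much the
same; as an intransitive verb it means to be equal, to match, and the usage is peer
with sb.: Your brother can peer with him.
Then the fourth word is dim. As an adjective it can mean faint, dark, blurred, hard to
see, pessimistic, doubtful; that is, it carries rather negative, gloomy meanings. As a
transitive verb it likewise means to make something blurred, dim or hazy. In the text,
on the dim hallway, it also means dark. Its second meaning is to pale by comparison,
to be less noticeable or less bright than something: The light of a candle is dimmed
by electric light. As an intransitive verb it can mean to grow dim or to become
blurred: When the music started, the lights dimmed. The meaning is still the same;
only the usage differs.
The fifth word is obscure. As an adjective it means unclear, indistinct, indicating
that a meaning is not clearly expressed; or, of language and style, it can mean vague,
abstruse, not clear. A third meaning is inconspicuous, unnoticeable. Then the fourth
is unknown, humble. Of course it can also mean remote, hidden. These meanings are all
interrelated; you can get a feel for them, and they are all much the same. Then as a
transitive verb it can mean to cover up, to conceal: The moon was obscured by clouds.
The aims were obscured by means. Both mean being covered, hidden, concealed. Then the
second meaning is to make something ambiguous, roundabout, hard to understand: The
item he intentionally obscured would make several interpretations possible. That also
means to make something hazy or ambiguous. Then the third meaning is to darken, to
overshadow, to make inconspicuous; these are all verb counterparts of its adjective
senses. Its adverb is obscurely, meaning in a puzzling, abstruse or hidden way. The
nouns can be obscurity and obscureness, both very common noun forms.
Then the sixth word is buff. It can mean light yellow, dull yellow, beige or
orange-yellow, and it can also mean soft leather. As a transitive verb it can mean to
rub with soft leather, to cushion, to shine or polish something. In the text it should
simply be a colour word.
The seventh is acquaintance, which everyone has learned before. As a noun it can mean
knowing someone, being acquainted, familiarity, generally used as acquaintance with
someone; or familiarity, understanding, knowledge: acquaintance with sb. It can also
mean people one knows, acquaintances. The most important one is Make acquaintance with
sb. Its verb is acquaint: to make someone understand, know or become familiar with
something. For informing or notifying you can use Acquaint with: You must acquaint
with your new duties. Its noun is acquaintanceship, which means knowing someone, being
acquainted, a social relationship; it is an abstract expression.
The eighth word is deal. As an intransitive verb, deal can mean to treat of, touch on,
discuss or talk about, usually used with with or in, for example Science deals with
facts, meaning that science treats of objective facts. The second meaning is to punish
or take action, also usually used with with or in: Law courts must deal with
lawbreakers. The third is to treat people as equals, also used with with. The fourth
is the one we use most often: to deal with a problem. The fifth is to trade, to run a
business, to do business with someone, which is really another extended sense of
handling matters: to deal with a firm. Then as a transitive verb, a very common
meaning is to distribute, to hand out. Then in card games it means to deal the cards.
The third is the sense of trading: to deal grapes to a supermarket.
The ninth word is laxative. It means a medicine that loosens the bowels, and as an
adjective it means relieving constipation.
Those are all the words, thank you!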
means whether you can buy your sizes of swimming suit or not in this shop? No,
yeah.
Ok, next, perceive, because in our text, imperceptible, you have already men-
tioned it.
Beverage, a hot or cold drink. And I remember someone mentioned beverage,
refers to a drink, which actually means any drink that is not water. I remember you
mentioned that.
Or a phrase that or it’s or that’s just as well. Do you know this phrase? This
phrase is in paragraph 11, the first sentence. “Probably it’s just as well,” said the old
man. Because Alan says, I want nothing of that sort, so the old man answers
“Probably it’s just as well”. Do you know this phrase? This is often used in spoken
English. It is used to say that it is fortunate that something is true or happen, because
otherwise there would be problems. (in Chinese: Just right, luckily, just in time.)
“Probably it is just as well” means (in Chinese: maybe it is better this way; you
don’t want to buy it, and that is better.) For example, two examples. “It just
as well that we have prepared everything beforehand.” It’s very fortunate that we
have prepared everything in advance. Then the next, a dialogue: A says, “We’re too
late to see the film.” B answers, “Just as well, I hear it isn’t very good.” Just as well
(in Chinese: it doesn’t matter, never mind.)
Ok, oblige, this is very formal. You have already mentioned. Well, please pay
attention to the third and fourth one. And here are two expressions we use quite
often, especially on formal occasions. “I’d be obliged if…” It is a polite way to
make a request. “I’d be obliged if…” For example, I’d be obliged if you treat this
letter as strictly confidential. It means…? I will be very grateful if you don’t tell this
letter to anyone else. I’ll be very grateful. Another, I’m much obliged to you. This is
a bit old fashioned. It is used to thank someone very politely.
Ok, that substitute. Last time you’ve already mentioned the word substitute. Here
I would like to emphasize this phrase. Substitute A for B. Pay attention; let’s look at
this example first. The recipe says that you can substitute honey for sugar. Its
Chinese translation (in Chinese: use honey in place of sugar.) So you should pay attention to this honey
and sugar, where to put something to be replaced, and where to put this thing, and
this thing which is used to replace something else. So substitute honey for the sugar
means to use honey to replace the sugar. Do not confuse this…this phrase.
Then scorn, you may have already known, scorn means contempt, scorn for.
Rapture, this word is quite literary. Rapture, it means great excitement, happi-
ness. For example, the boys gazed up at him in rapture. And the phrase be in rap-
tures or go into raptures, especially used in British English, is very formal, means to
express or feel great pleasure and happiness about something. Be in raptures or go
into raptures over or about or at sth. or sb. And its adjective, rapturous, very happy
and excited.
Overwhelm, here in our text, if somebody is overwhelmed by an emotion. They
feel it so strongly that they cannot think clearly. We have “overwhelmed” and “over-
whelming”, two adjectives.
Well, another phrase, by any chance, you didn’t mention. By any chance is in
Paragraph 35, the second line. “And, by the way, since there are always sirens, if by
any chance you should, later on, slip a little, you need not worry.” Used in spoken
Ok, then we’ll look at the text, The Chaser. This time we will deal with it in a
different way. In the past, I ask you to answer my questions, just prepare the text
comprehension part before the class.
Today I will give you 6 questions, just in class. The 6 questions are here. Can you
see them clearly? Altogether 6 questions. We have 6 groups here, and here I have 6
pieces of paper, with a number on each paper. So each group will pick up one piece,
and the number in it means the number of the question you will answer. I want you
to have a discussion, try to answer my question. Then after this discussion. I’m not
sure how much time is enough for you. So let’s just go and see. Then after the dis-
cussion, each group will have to answer these questions. Each group can send a
representative, to come here and get a paper?
OK, first, let’s just go through the question list. Actually these questions cover
the whole story.
So the first question, “What does the title The Chaser refer to?”
Two, “Why do you think the old man told Austen about the life cleaner before
selling the love potion?” Because he talked a lot about his life cleaner. So why does
he mentioned or introduced life cleaner in detail before he sold him the love potion.
The third questions, “Why is the love potion sold so cheap and the life cleaner so
expensive?” How does it relate to the theme of the story? Or put it in another way,
it means do they have any symbolic meanings? Love potion sold so cheap while the
life cleaner so expensive.
The fourth question, “What kind of love can the love potion produce in a per-
son?” Please pay attention, attention please, I hope you can describe the effect of
love potion in your own words. Do not simply read the part in the text. Try to sum-
marize it and describe it in your own words. And then what do you think of this kind
of love? So make your own comments on it.
And the fifth one, “How do the young man and the old man differ in their char-
acters and their outlook of love?” Outlook of love means, how to put it, I mean, what
do they think of love. Maybe translate into Chinese, means (in Chinese: one’s view of love.) Their attitude
towards love. And how does the author makes this difference clear to the reader?
Here how does the author make this difference clear to the reader refers to what
technique that the author uses to make this difference clear to the reader.
The last question, six, “It is said that the story is a satire, do you agree? Why or
why not?” So, I mean, the six questions are all essay questions, which means it can’t
be answered by one or two sentences. You need to think about them over.
So you can form groups and have a discussion. Or whether you need to read
through the text carefully before you start your discussion. I mean think about it.
Do you have any questions? If you are still not clear about how to answer the
question, you can just ask.
(The students discuss in groups and the teacher walks from group to group to talk
to the students. 25 minutes)
T: OK, have you all finished the discussion? Are you ready to answer the ques-
tions? Yes or no?
Ss: No.
T: (wait a while) Ok, so can we answer the questions? Just tell me, are you ready?
Ss: Yes.
T: Ok, and the rest of you? Are you ready?
Actually we can still discuss together when we are trying to answer these ques-
tions. And, I mean, after each group answers these questions, and the other group, if
you have different answers, if you want to argue, you can also talk.
Ok, the first question, what does the title The Chaser refer to? Maybe our first…I
mean, the immediate reaction, means chaser refers to a person, “er” usually refers
to a person. Ok, let’s see. Ok, that group, you may come up here, so we can hear
better.
S: First of all, I think the chaser refers to Alan Austen, who is deeply in love with
the beautiful girl. And the second, I think the chaser means the life cleaner because
the chaser in the vocabulary means a milder drink taken after a strong drink of
liquor. So if the love potion is the strong drink of liquor, then I think Alan Austen
needs the life cleaner to clean the love potion.
T: Do you all agree? So, first the chaser refers to Alan Austen.
S: Because he is in love with a beautiful girl.
T: So actually he is in pursuit of a girl. In that sense he is a chaser. While the chaser,
according to our word list, so please read on page 118. A chaser can mean a milder
drink taken after a strong drink of liquor. So it refers to some kind of drink taken
after something else is taken. So here the chaser refers the life cleaner.
Do you agree? Because the old man believes Alan some day will come back
again for this life cleaner, so the chaser was the life cleaner. Do you agree? Can you
think of any other indication of a chaser? No, OK, thank you. Yeah, most probably,
you get the chaser as the life cleaner.
T: Now, the second question “Why do you think the old man told Austen about the
life cleaner before selling the love potion?” Have you all got the answer?
S: I’ll divide the question into two parts. The first part is why do we think the old
man told Austen about the life cleaner? First, we think the things done firstly will
not get an agreeable result. Maybe after the girl take the love potion mentioned in
the chapter, maybe she will change into another different person.
At that time, for myself opinion, I don’t think he will like her as before. As my
partner’s opinion, it’s human’s nature. If you really get sth., that you are eager for a
long time, you won’t consider it as valuable as before. So maybe the young man
wouldn’t like the girl as before.
The second question is why do the old man told him before selling him the love
potion. As the ordinary transcript? (in Chinese: A fairy-tale script?)
T: Script.
S: As the script of most of the fairy tales or novels. There is the magician gives the
man some medicines that have some very bad results? (in Chinese: Side effects?)
T: Side effects.
S: Yeah, very bad side effects. But they often tell the man after giving them the
medicine. So I think the old man told him before he sold him the love potion,
because he wants him to pay more attention. We think the true purpose is to get the
5000 dollars.
T: What do you think? Actually, the first part of your answer is not very clear. You
said, it’s natural that people will not cherish sth. or someone after they have got it or
the person.
S: Yes, that’s my partner’s opinion, because they think the chaser means the young
man. And after he get the girl’s love, he won’t be a chaser again.
T: How would he?
S: That’s the reason.
T: Then how does it relate to the love potion and the life cleaner? The question is
why the old man told the young man about the life cleaner before selling the love
potion.
S: She means if the girl takes the love potion, the young man got the love and won’t
cherish her again, and want to get rid of the love. And he’ll need the life cleaner to
get rid of the girl.
T: And before means he really wants the ordinary ones to sell the life cleaner.
S: He wants to pay more attention to the life cleaner.
T: Do you all agree, the rest of you? Yes, it’s because his true purpose is to sell the
life cleaner. Actually, we can infer from the story, that Alan is not the first person to
come to the old man. There have been a lot of young people like Alan who are fer-
vently in love and try to get the love potion. So this old man has already seen a lot
of such young people. He knows from his experience that some day Alan will come
back for sure. That’s why he mentioned this life cleaner first, because he’s sure that
some day Alan will come back for it. This is his real purpose.
T: Actually the third question is quite closely related to the second one. “Why is
the love potion sold so cheap? why the life cleaner so expensive?” How does it
relate to the theme of the story or whether they have any symbolic meaning?
S: The first question is from paragraph 39 to 43. It implies that love is far from
being precious and desirable. It is easy for them to fall in love but it is hard for them
to escape it. He should regret some day he would have to pay a much higher price
to put himself out of it. So for the customers, the life cleaner is more important than
the love potion. And the second answer is to the young man, the love means the love
of a lover. And I think, to anyone it is easy to fall in love, but hard to save it. And we
should cherish it and always pay much attention to it.
T: Do you got it? Do you understand her? Yes, Ok, thank you.
The love potion is sold so cheap because according to the old man in our story,
who says, because young man who need it are seldom rich. So if he does not sell it
so cheap, they will not come back later for something more expensive. So this is a
sales technique.
Or if you relate it to the theme. Any of you have something to say?
Liang has already given us a quite good version. Why one is so cheap while the
other so expensive if relate to the theme?
Here the young man buys the love potion, it means at the same time he gets the
love from the girl. So in a sense, he is trying to buy love. If love can be bought, it’s
cheap. Love got in this way is not so dear. It is cheap, because it can be bought. It’s
not priceless. Why life cleaner so expensive? Because you have the strong idea that
in the future you have to pay a heavy price. That’s why life cleaner is so expensive.
It is related to the theme, its symbolic meaning.
Four, “what kind of love can love potion give a person?”
S: For question 4, if the girl takes this love potion, she would regard the man as the
whole thing of her life. She will care of this man every time, every hour, even every
minute. She may be jealousy of other women to have some relation with her boy-
friend. And even if the man have some affairs with other women, she will be hurted,
but she will forgive him. They will not be divorce. That’s the effect brought by this
love potion. We think this kind of love may be not look rightly like love. You know
love may need more space and maybe trust. And they get too close, the girl care too
much of the man. He would finally be bored of this girl. And they would finally get
departed.
T: So what a coincidence that this question is got by the three young men. Because
in our story Alan is also a young man, and I’m curious about their comment on this
kind of love. So you don’t like this kind of love, this kind of girl.
OK, actually they have made quite thorough summary. However, I don’t think
you have included everything. It’s already time, we’ll continue this question to fin-
ish them next time. And the rest of you please will you think about it? Would you
like a love potion?
Please read the part from paragraph 17 to 37, and next time I will ask some of
you “do you have anything to add about the effect of the love potion”.
T: So everybody, please… Well, we just start our lecture right now. And would you
please keep quiet? And I hope that the students sitting in the back row, would you
please move forward a little bit, so that, yeah, so that I can see your faces. And last
time, we said from this month we are going to say a little bit about… we’ll just have
lectures on linguistics and so-called general knowledge. So general knowledge will
also be divided into three parts. First of all, we’ll just address the issue of linguistics.
Afterwards, we need to… we also need to say something about, let me see, the cul-
tural part… I mean the general knowledge of the English-speaking countries.
Afterwards, we also need to learn English literature, but of course, we will not
review all of the concepts in literature, but at least we need to know some of the
important concepts and notions. And today, we should say, I’ll first address linguis-
tics. Of course you can take notes, and if you really don’t want to take any notes, it
doesn’t matter. I’ll just upload my ppt. I have already uploaded the key to error cor-
rection and proofreading, and I wonder if anybody checked the key. Did anybody
check that?
Ss: No.
T: No? I didn’t change my mail box, so we will still use the public mail-box we
used last semester. Do you still remember that account and password?
Ss: (Shaking heads.)
T: Ah, nobody remembers that. The account and the password, I hope that you can
check the notes for last semester. And I have already uploaded the keys to that
proofreading part and please check it. And today we will start with the linguistics
part. And tomorrow we will just shift back to that classroom, but that classroom is
not equipped with mass media, so tomorrow we will just say, go on with our work
sheets. And somebody still didn’t get the work sheet, I hope that you can contact
your monitor. Do you have for some extra copies, XXX (the name of a student)?
Have you got any extra copies? Oh yeah?
S: No.
T: No? So, see if I can get you one.
S: I’ll see to it.
T: And…OK, fine.
And you’d better turn to your work sheet B; in work sheets B you will find three
parts. The first part is listening comprehension. So Listening Comprehension will
be postponed to November.
So this month we will be focused on the part of general knowledge and translation.
So first of all, tomorrow… we will check two translations, and please do practice
test 1 and 2. OK? In your work sheet A. In your work sheet A, you can see there
are eight tests, altogether eight tests. So I hope that you can finish test 1 and test 2.
You will just leave Listening Comprehension part, and please fulfill the second part
general knowledge and also the translation. But make Chinese-English translation,
Practice 1 and Chinese-English translation. “Culture Is More Than Just Language” (《文化不限于语言那么简单》), this
one. And Practice 3, so tomorrow I will just check these two practice tests,
OK. Practice 1 and Practice 3. Are you clear? Practice 1 and Practice 3. I mean
especially the Chinese-English translation part. I need to check it. So I hope that you
can fulfill that two translations today.
Now we will just go on with our lecture.
It’s a pretty long PPT, a pretty long file… Of course, I don’t think I have time to,
you know, elaborate on each slide, because this file actually covers all the contents,
all the contents in linguistics. And these were the reference books and I guess you
are familiar with these two books, right? He Zhaoxiong’s book and Hu Zhuanglin’s book.
Actually… you did linguistics… Ah sorry, sorry, well…(The teacher forgot to turn
on the head projector.)
Well, which one is the textbook for your linguistics course? Which one is your
textbook? A New Concise Course in Linguistics for Students of English (《新编简明英语语言学教程》), (in Chinese: isn’t it?) Ah, so that. That one is your
textbook.
For the general knowledge, actually I scanned some of the test items from 2010
to 2012, and some of the items are really difficult, so I hope that if… Of course, if
time permits, you will still go over all of the basic concepts in the realm of linguis-
tics. And here today I’ll go through the basic concepts in the sub branches, such as
semantics and pragmatics and well and… for TEM 8. Of course we still need to be
very patient. It takes two or three minutes.
Thank God, it works. Pay attention to the reference books, and here, three reference
books, but our focus is still this one, A Concise Course in Linguistics (《简明语言学教程》). And it should be
your textbook right? It is your course textbook. So we will go over some basic
concepts.
And first of all, what is language, because some of the test items also address the
concept of language. So what are the basic concepts of language? Do you know
anything about it? What are basic concepts and what are the features of language?
Ss: (murmuring…)
T: Features of language?
Ss: (Shaking heads.)
T: No idea? You’ve heard about it?
So language can be generally defined as a system of arbitrary vocal symbols used
for human communication. Here some of the key words, so which one is the first key
word? Arbitrary. And please remember all these words. Sometimes they will test
you what are the basic features of language. So arbitrary is the first feature. What is
the meaning of arbitrary?
Ss: (in Chinese) Arbitrary.
T: And then vocal symbols, this is easy to comprehend. So usually language is
used orally, right? For instance, I am now delivering a lecture for the information
conveyed orally. Let us say the vocal system, right? It is the oral system. The vocal
system for human communication. So language is for communication. This is the
reason why now we always say conduct… a lot of educators just promoted the…
communicative approach of language teaching, right? So language is for the pur-
228 Appendices
guages. I mean to use a different word, but there must be some association between
the word and the object. So that is what it refers to. So it should be symbolic.
And it is primarily vocal. So the primary medium is sounds for all languages;
writing system came much later than spoken form. And we all know that in Chinese
history, can be date back to 3300BC. Why? Because we have a long history of writ-
ten records. And you know the first records is called oracles. We carved something
on those turtle shells and bones. So written language can be dated back to 3300BC. So
this is why people, those archeologists, say China has a history of more than 3000
years. Well, of course, there will be more evidence to show that China may have a
longer history of written language. But anyway, vocal system usually comes first. It
is prior to the written system. So language is primarily vocal.
And language is human-specific. So it is different from the human communica-
tion systems than other forms of life possess, e.g. bird songs, bee dance, animal
cries. So it designs and defines human language. So these are the keywords. And I
hope that you can memorize them…Ah I don’t mean that you memorize them by
rote, but at least be familiar with those terms after the lecture… What are the
ARBITRARINESS and PRODUCTIVITY, CREATIVITY. So that we can learn
English language, because you always…say learn something by heart, and always
need to create sentences. Also creativity is also one of the basic features.
And Duality, so what is the meaning of duality? So let’s have a look. I’ll not go
on with this one. SO productivity. So human languages are always productive and
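creative. So here are two examples…A gibbon call system… A gibbon? (in Chinese: Gibbons
often call, don’t they? There is a poem, some poem, with something… something… about
gibbons crying… something… it calls… so what is this call like?) So it is a fixed repertoire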
which is rapidly exhausted. So this one is exhaustible. It is a fixed system. It is a
closed system. But it is not an open system. But human language is an open system.
So that makes a great difference. And bee dance is also limited. It is also limited in
productivity. Only human language is productive and creative. But of course you
cannot be too creative. Especially in the foreign language learners you must obey
grammatical rules and you also need to obey the morphological rules because some
times Chinese students, you would like to coin a new word. But this is really not
advisable because it might result in some deduction of your score in TEM8 writing.
So I hope that you will avoid the coinage of new words.
Next is Duality. So double articulation. So what is duality? Duality refers to the
lower level and higher level of the linguistic system, the language system. So the
lower level, the sounds… when you listen to a foreign language, it might not mean
anything to you. So the sounds, the basic levels of language. And then high level.
High level means larger units of meaning. So high level is meaning and of course
we need to know the meaning of those words, we need to know. You first listen to
those syllables and then you address the different syllables to different word units,
then meaning. Right? And then, afterwards, the larger units of meaning would make
sense to you. Otherwise, language doesn’t make sense to you. So language is also
the system of duality. It is a communication system with duality. It is considered
more flexible. Yeah, far greater number of messages can be sent. A small number of
sounds can be grouped and regrouped in a large number of units of meaning, and the
units of meaning can be arranged and rearranged into an infinite number of sen-
tences. So that is duality. Did you get it?
Last semester we also talked about the important notions and corpus linguists
and I asked you to remember the collocations. And usually those corpus linguists,
they emphasize larger units of words and phrases. Before we always use words or
isolated words. And Chinese students also memorize the isolated words. And after-
wards the corpus linguists, they turn the larger units of words, the word items. They
would like to ask students to not only memorize, but at least you should be familiar
with the collocations and items, and yeah, remember them. That would be really
useful in your writing, especially when I am grading on your writings, I really found
some students, well, they witness their great improvement especially in terms of
collocation. So last semester of learning is still very effective. And some student
also checked, I mean, you also referred to them to the corpus and then find those
useful collocations.
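But grammar is still a big issue. A lot of students still make very simple grammatical
mistakes. E.g. the subject-predicate concord… (in Chinese: That is, subject-verb
agreement. We have already gone over it many times when we did error correction. With
subject-verb agreement, the object and the possessive pronouns and so on all have to
change too! But a lot of the time this “agreement” is still handled very badly… this
grammatical mistake still comes up often. Now, in marking the TEM8 papers, Chinese
teachers still pay a great deal of attention to your grammar. If your grammar is good,
your score will generally be higher. So try your best to avoid these major mistakes.
Some students also make relatively minor mistakes, which certainly means you did not
do a final check. Ah, you must do a check in the last 5 minutes. However tight the
time is at the end, even if you write a shorter Conclusion, the 5-minute check is a
must. Mm, because it is impossible to write it through once without making any
grammatical mistakes, and checking it once makes it relatively better, so your score
will be higher too. We still put a lot of emphasis on grammar.) Ah of course content is also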
a major issue but to Chinese teachers I don’t think that content makes any sense. Of
course they will see if you delivered very creative ideas, but if your composition is
full of mistakes, I don’t think they will give you a good score. So remember that…
ah… grammar still goes first.
And then, displacement. This might be a new word to you. So what is displace-
ment? What does displacement mean? Language can be used to refer to things,
which are not present: real or maybe it also can refer to the imagined matters in
the… Here are some examples.
A gibbon never utters a call about something he ate last year. No, because a gib-
bon’s call system… of course, is rather limited. So to say exhaustible… right? It’s
limited system, and they will not refer to a call to…ah… something they ate… ah
last year. But language is different. We can refer to something we will do in future.
We can also move back to our history, to use the simple past tense to refer to some-
thing we did last year, or maybe… ah… in the previous time. So it is called dis-
placement. Yeah!
There is something special about the bee dance. And bee dance is something
similar to gibbon calls. I will not repeat it. But anyway it shows that human lan-
guage is more creative. We can refer to something in our imagination, like dragon.
Dragon is never a real animal, but we also use dragon as our cultural symbol. You
see… ah, it is also an example of it. It is an example of displacement.
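define a language? So, in terms of pragmatic rules, they are descriptive or prescriptive?
[in Chinese: Things in pragmatics, right, are said to be descriptive. We cannot call
them prescriptive. Ah, this goes a bit deeper. With prescriptive rules, you can tell
wrong from right. We say grammar is what is called prescriptive; it is normative. You
know that if you say "I is", that is definitely wrong, right, definitely wrong, so that is
prescriptive. But when we come to pragmatics, ah, pragmatics, and talk about the
cooperative principle, then we can only say these are descriptive.] You can never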
exhaust, you can suppose that you employ the Grice’s theory that… ah… so called
cooperative principle as frame work or research. You can always find those extra
examples that violate all those principles. So you can never exhaust your examples
in your framework. So the framework can never cover a language. So, it is usually
descriptive. We describe the language phenomena and finally draw some conclu-
sions. And those conclusions are so called principles. But those principle can never
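exhaust concrete examples. [in Chinese: So the biggest difference between descriptive
and prescriptive is that one is normative; it can state what is right and what is wrong.
But things in pragmatics can only be called descriptive. If I violate the cooperative
principle, ah, that is fine too. Ah, Chinese people, for example, tend to speak in a
roundabout way, and speaking in a roundabout way definitely violates the cooperative
principle.] If we conform to cooperative principle, we use, of course, very precise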
and very direct language. But no, in some times, in some situations, we will never
use direct language, for the sake of our cultural conventions. Right? So this shows
these examples violate the principles. So those principles are never prescriptive,
they are descriptive.
So Synchronic vs. diachronic.
So Synchronic study is description of a language at some point of time. But
Diachronic study is the… I didn’t remember any students who did diachronic study.
Usually they will just do synchronic study, because diachronic study will take a long
time. So usually diachronic study will take 2 to 3 years. So there is too much time
consuming. We also do… we will not suggest that. But synchronic study is easier.
Ah, Speech and writing.
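[in Chinese: There is not much to say about this one. Ah, but the next one you must
remember, it is often tested: Langue vs parole. This is from Saussure. Saussure is the
father of linguistics, the father of modern linguistics. So you need to remember this
person's name, ah. You must remember it no matter what. The father of modern
linguistics.]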
And he used to be called the father, the grandfather of modern linguistics and this
father of modern linguistics and never wrote a book. Actually the book was compiled
by his students. The students compiled all of his lecture notes. And these lecture
notes are compiled into book. [in Chinese: So] he didn’t write a book, but at least he is the
most important linguists. One of the most important linguists, ah.
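Langue refers to the abstract linguistic system shared by all members of the
speech community. [in Chinese: So it refers to the form of language, the form of
language, ah,] Langue, the abstract linguistic system, [in Chinese: ah, so we divide it
into 语言 (langue) and 言语 (parole); some people translate the two terms this way. As
for the difference between langue and parole… langue refers to an] abstract system,
abstract linguistic system, but parole is different. Parole is the
actualization, the realization of that linguistics system. [in Chinese: So it is divided
into langue and parole. You must remember this, ah, it will definitely be tested, ah.
Anyway, and also, you must remember this person's name.] And… OK.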
This is also called something between the American sound system and British sound
system. And different accents represent the different intonation. And we call that
American intonation as a staircase intonation, because the Americans usually use
very vivid intonations, but British, they do not use vivid intonations. So that also
makes difference between RP (Received Pronunciation) and general American.
Americans are very lazy to move their lips. So you can hardly read their lips, because
the vowels produce in a somewhat backward way…and they seldom move their
lips. So we can hardly read their… American’s lips. So we usually mock at
Americans because they are lazy, right? But there must be variations of intonation.
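[in Chinese: That is, in American pronunciation, the intonation rises and falls a great
deal, so we call it] staircase intonation. [in Chinese: If you want to understand the
difference between American and British pronunciation, then you can go and look at it
again; their intonation is always different. So that is all for phonetics.]
[in Chinese: Next is Morphology. For morphology, we have some… er… some
specific… well… technical terms, because they have been tested before; morphology has
also been tested. Let's look at some of its technical terms, which are different. This is
something we should know:]
Open class word and closed class word. So what kind of words could be categorized
as open class words, [in Chinese: er,] XX, please? What kind of words could be categorized as
closed class words? Closed class words… open class words.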
S: Open class words is that you can add new words.
T: Such as a… what? Nouns. Right? Nouns, adjectives. They were called the con-
tent words, ah, open class words. But what kind of words are called closed class
words, for example, articles, a, an, the… are these words open class words? Or
closed class words? Are they closed or open?
S: Yes?
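T: [in Chinese: Can you add any more? Say I add one more article? I add a "b" to them,
does that exist? No, right? So this is called] closed class. [in Chinese: Ah, if the test
goes into more detail, this can be tested too. Ah, say,] what are the following choices
are closed class words, [in Chinese: that can be tested too, right? Ah,] which
of the following, except [in Chinese: a blank,] is the so called open class words. Right?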
So Morpheme is the minimal unit of meaning… like one morpheme. [in Chinese: This…
er, you should all have a sense of this, right? One morpheme, right,] a boy is a one
morpheme word. But boys… boys one morpheme or two morphemes? Boys, one morpheme or two
morphemes? Two morphemes, right? Ah, so three morphemes and four morphemes,
like gentlemanliness. So they mean something like a gentle… and man… they mean
something, gentleman. It’s a combined word. And –ly, ly represents what? So ly
usually represents the part of speech, it is the adverb. But here gentlemanly is an
adjective. Because a noun plus ly, it could be an adjective and -ness it is definitely a mark
of noun, right? So Morpheme is the minimal unit of meaning.
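Affix, prefix, [in Chinese: what does that mean?] Prefix, [gives the Chinese term, 前缀].
[in Chinese: And] suffix? [Gives the Chinese term, 后缀.] [in Chinese: Ah, no problem
with this one.]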
Free morpheme & bound morpheme.
Free morpheme is one that may constitute a word in free form… in free form.
But a bound morpheme means those morphemes cannot appear in single words.
Can they appear in single words? Like –ly or -s or -ness can they appear as single
words? Independent words. No, never, ah, they can never appear as single words. So
they usually appear a… what? They usually appear with those free morphemes. So
bound morpheme just refers to one that may appear with at least one other morpheme.
They can not stand by themselves. Ah, like this -s… ah, -ness… ed… ly…
[in Chinese: these are all called] bound morphemes. Because they cannot appear independently.
So they must appear together with other morphemes.
Derivational morphemes. Usually we say… ah, foolish is derived from fool. So
these are called derivational morphemes.
So Inflectional morphemes are lexical… they have lexical. Their morphemes
which are for the most part purely grammatical markers, signifying such concepts
as tense, number, case and so on. So actually inflectional morphemes include what?
Ah, include number. Ah -s… ah plural s… or aspect. Right? Or case. Or tense…
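-ed… so these are called inflectional morphemes. [in Chinese: OK, that is all for
morphemes.]
[in Chinese: Now syntax, syntax; let's still go through it briefly. At the lowest level,
we started from the very core, phonetics and phonology. And then what? What is the next
level up? Ah, what is it called? Er, what is the study of word structure called?]
Morphology. [in Chinese: And then, what comes above morphology? Sentences.]
Study of sentences. The study of sentences is called syntax. So the adjective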
form is syntactic. It is a branch of linguistics to study how words are combined to
form sentences, and rules that govern the formation of sentences. Chomsky says…
well Chomsky of course studies syntax. And he is the first one that initiated XP
analysis, right? So he is more interested in similarities and surface structure, deep
structure. So I guess you are familiar with these terms? Right? And Chomsky is
more interested in similarities. More than their differences. So Chomsky proposes
that grammars of all languages share the same framework. So that is called universal
grammar. And nowadays, with that new notion, because of that new notion of
Context. This context is... So it seems Chomsky’s universal grammar doesn’t have
very large market now. But of course, traditionally in America, lots of linguists still
study the universal grammar, but of course, Chomsky’s own students also betrayed
him, right? They found that a new sub branch, a new branch called Cognitive
Linguistics. But anyway, Chomsky proposes that he is more interested in the univer-
sal features. The similarities between different languages, so he proposes different
grammar, different linguistics share the same grammatical rules of the same gram-
matical framework. And that framework is called universal grammar.
Eh, well, so here are some important notions. Like category. Category refers to a
group of linguistic items which fulfill the same or similar functions in a particular
language such as a sentence, a noun phrase or a verb. The most central categories to
the syntactic study are the word-level categories, for example, part of speech is a
category. The criterion on which the category is determined, meaning, Inflection,
Distribution and the most reliable criterion on which determining a word’s category
is its distribution. Distribution of the place. Of…Yeah… the place of that word.
Coordinative rule. Coordinative sentences. Are you familiar with the coordina-
tive sentences? Coordinative sentences are also called compound sentences. They
are different from the complex sentences. And would you please translate something
into Chinese? What are called complex sentences and what are called compound
or coordinate sentences? E.g. and, but… sentences with and or but are called
coordinative sentences or complex sentences. Are they compound sentences? XX,
are they compound sentences?
S: En..
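T: They are compound sentences, coordinative sentences. [in Chinese: That is,
coordinate, coordinate, this kind of sentence, this form of clause.] Coordinative
sentences. Complex sentences [in Chinese: are different; they are sentences with a main
clause and subordinate clauses.] Ah, e.g. the attributive clauses. Sentences with attributive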
clauses are definitely called complex sentences.
So, OK, we will just skip that one. Let’s just… Also very detailed… very detailed
rules in syntax. I hope that you can review them yourselves and phrase and
elements…
And are also called specifier head…
And Complements…specifiers…
Of course they are of semantic functions
Semantically, specifiers make more precise the meaning of the head; syntacti-
cally, they typically mark a phrase boundary. Specifiers can be determiners as in
NP. NP means nominal phrases, VP, verb phrases.
Complements, so… Complements themselves can be a phrase, they provide
information they provide complementary information about entities and locations
whose existence is implied by the meaning of the head, e.g. a story about a senti-
mental girl; There can be no complement, one complement, or more than one com-
plement in a phrase. So a story about a sentimental girl. Of course story is the head,
about the sentimental girl, the complement.
And modifiers, so modifiers could be pre modifiers and post modifiers.
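Deep structure & surface structure, [in Chinese: you definitely must remember this,
definitely must remember this, ah, deep structure and surface structure, because this
comes up often in syntax, and then you are asked to draw diagrams, draw VP, that is,
draw out the VPs and NPs, right, to distinguish the] deep structure, surface
structure… ah… sometimes they share the same surface structure, but actually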
when you analyze those sentences they had different deep structures. And here:
John is easy to please.
John is eager to please.
They have the same surface structure. Right? They have the same surface struc-
ture, but actually the meanings are different. The deep structure is different. So
structurally similar sentences might be very different in their meanings, for they
have quite different deep structures. Like these were also the frequent quoted exam-
ples to determine their different structures. Consider the sentence, flying planes can
be dangerous. So flying planes can be dangerous here, flying planes can be… what?
The deep structure, the meaning could be different in different complex, like flying
plane. For instance, in one case, it means you are engaged… you are engaged in this
dangerous activity. So flying planes, here fly… this is definitely a verb. A verb. So
here flying is verb. And plane is the object of the verb. Right? So flying planes
means you are involved in that dangerous activity. But flying planes flying could
also be a pre-modifier that modifies plane, and then flying planes could be an NP. A
non-verb phrase. So flying planes could be dangerous. Because flying planes is an
object that is not stable. And it might collide with some high building, like 911, so
the flying planes… may collide with the high rising buildings…
So you see this sentence actually involves different deep structures.
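And… [in Chinese: I will not go into this; it is really very dull to talk about.
Anyway… the most important things in syntax are these, ah,] surface structure, deep
structure. [in Chinese: Anyway, go over them yourselves; go over some of the basic
theories again.]
[in Chinese: Semantics is also tested often, ah, tested often. For example, something
like this was tested one year, ah, it was tested one year, so let's review what it means.
Let me explain what it means. Ah, it is called] Synonymy, [in Chinese: ah,] Synonymy.
Synonym, we all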
know synonym, antonym, right? So what is the meaning of synonym, antonym?
And Homonymy and Hyponymy, would you please translate that into Chinese? Can
you please translate that into Chinese?
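[in Chinese: Ah,] XXX, please, so what are synonyms?
S: [in Chinese: Same meaning (同义).]
T: [in Chinese: Yes, same meaning, ah, same meaning.] Like small and tiny, they are
synonyms. And what are antonyms?
S: [in Chinese: Opposite meaning (反义).]
T: Like black and white, they are antonyms. So [in Chinese: the relation of opposite
meaning.] Polysemy?
S: En…
T: [in Chinese: That is one word with several meanings, one word with several
meanings.] Polysemy. And… then Homonymy, [in Chinese: same sound but different meaning,
that also belongs to] Homonymy. Ah, that’s right, [in Chinese: then what about]
Hyponymy? [in Chinese: What kind of structure is hyponymy? Ah, what kind of relation?
Hyponymy, yes, the relation between superordinate and subordinate terms. For example,
colour is a superordinate, right? And there can be many hyponyms under it, including
red, yellow, green. So this superordinate-subordinate relation is called hyponymy. This
has been tested, it has been tested, so you still need to remember it. You still need to
remember what it means.] So they are called
the major sense relations, because they are part of the central part of semantics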
Synonymy, right? Synonymy refers to the sameness or close similarity of mean-
ing. Words that are close in meaning… they are called synonyms.
Dialectal synonyms are synonyms used in different regional dialects, like
autumn - fall, biscuit – cracker… so they are called dialectal. They are derived
from… this dialectal is derived from the word dialect. I’m sure that you know dia-
lect… Shanghai dialect, other dialects? So that means synonyms used in different
regions. For instance, the British language, the American language, right? Ah
American English, British English… so some words we use in British English and
some words we use in American English, like fall is more frequently used in
American English and gasoline…
And… So because they are synonyms. They are definitely synonyms. That means
synonymous to each other, but they are used in different regions… ah different
words are used in different regions, but actually they mean the same thing.
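Ah, Stylistic synonyms, [in Chinese: this has been tested, ah,] Stylistic synonyms
[in Chinese: have been tested. That is,] kid, child, offspring, [in Chinese: so why do
we say this is a] Stylistic difference? [in Chinese: Why?]
Well because they are stylistic. So they differ in terms of what? They differ in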
terms of what?
S: En…
T: [in Chinese: Hm?] Why are they called stylistic synonyms? [in Chinese: Ah,] XXX please, why are they called
stylistic synonyms?
S: En…
T: They means the same, right! Offspring, or my offspring… my kid, my child,
they means the same, but they differ in terms of their style. So they differ in terms of
their… what? Which one sounds more formal? Which one sounds more formal? Offspring
or kid?
S: Offspring.
T: Yes, offspring, so they differ in terms of formality. So this is called stylistic. Ah
difference.
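[in Chinese: So in our next class, the next lecture, we will finish this off and then
cover part of the culture section. Anyway, we will try to finish general knowledge
within three lectures. And what should you not forget to do? That is, your practice 1
and practice 3: finish both the general knowledge and the translation parts. I will
check them tomorrow, and tomorrow I will also go over them, that is, check the answers
to the translation.]
[in Chinese: So for tomorrow's class, we will still go to the traditional classroom,] OK? So thank you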
very much. And please hand in your compositions. OK, thank you for your
attention.
We are conducting a study on the impact of TEM on English teaching and learning.
Could you please help us by answering the following questions? Your responses
will be treated in confidence and will not be used for any other purposes. Thank you
very much for your time and cooperation!
1. Do the English majors in your school attend TEM 4/8?
2. Are you familiar with TEM? (test content, method, score allotment, time allot-
ment, marking criteria, etc.) What do you think of the marking criteria of TEM?
We are conducting a study on the impact of TEM on English teaching and learning.
Could you please help us by answering the following questions? Your responses
will be treated in confidence and will not be used for any other purposes. Thank you
very much for your time and cooperation!
1. Have you ever taught TEM related courses? (If no, are you familiar with TEM?)
2. Are you familiar with TEM? (test content, method, score allotment, time allot-
ment, marking criteria, etc.) What do you think of the marking criteria of TEM?
(Is it easy to follow? Do you refer to it in TEM preparation?)
3. How do you evaluate TEM? (Is it difficult? Are the test content and method
appropriate?)
4. How does TEM affect your teaching? (teaching content, method, pressure, etc.)
5. What do you think is most important in TEM preparation? What do you think of
TEM preparation?
6. What do you think is the difference between TEM preparation classes and non-
preparation classes, such as fundamental English course? What do you think of
the difference? Will it lead to students’ better performance in TEM? Will it help
improve English ability?
(If the teacher has not taught TEM preparation courses, go to Question 10.)
7. Do you speak Chinese more in TEM preparation classes? What do you think of
this practice?
8. Do you lecture more in TEM preparation classes? Do students practice less?
What do you think of this practice?
9. Do you have students work on TEM items after class? Do you mark their home-
work? Which item do you mark the most frequently?
10. Do you have any comments and suggestions on TEM and teaching for English
majors?
We are conducting a study on the impact of TEM on English teaching and learning.
Could you please help us by answering the following questions? Your responses
will be treated in confidence and will not be used for any other purposes. Thank you
very much for your time and cooperation!
1. Have you ever taken TEM (4,8)?
2. Are you familiar with TEM? (test content, method, score allotment, time allot-
ment, marking criteria, etc.) What do you think of the marking criteria of TEM?
(Is it easy to follow? Do you refer to it in TEM preparation?)
3. How do you evaluate TEM? (Is it difficult? Are the test content and method
appropriate?)
4. How does TEM affect your English learning? (interest, effort, etc.)
5. How do you evaluate your TEM preparation experience?
6. What do you think is the difference between TEM preparation classes and non-
preparation classes, such as fundamental English course? What do you think of
the difference? Will it lead to better performance in TEM? Will it help you
improve your English ability?
7. Does your teacher speak Chinese more in TEM preparation classes? What do
you think of this practice?
8. Does your teacher lecture more in TEM preparation classes? Do students prac-
tice less? What do you think of this practice?
9. Which item type in TEM do you practice the most? (Do you spend less time on
subjective items such as writing? Why?)
10. Do you have any comments and suggestions on TEM and teaching for English
majors?
Bibliography
C. Alderson, Foreword. [A], in Washback in Language Testing: Research Context and Methods,
ed. by L. Cheng, Y. Watanabe, A. Curtis (Eds), (Lawrence Erlbaum Associates, Inc, Mahwah,
2004), pp. ix–xii
C. Alderson, L. Hamp-Lyons, TOEFL preparation courses: A study of washback [J]. Lang. Test.
13(3), 280–297 (1996)
C. Alderson, D. Wall, Does washback exist? [J]. Appl. Linguis. 14(2), 116–129 (1993)
R. Al-Mahrooqi et al. (eds.), Revisiting EFL Assessment: Critical Perspectives [C] (Springer,
Cham, 2017)
American Educational Research Association (AERA), American Psychological Association (APA),
National Council on Measurement in Education (NCME), Standards for Educational and
Psychological Testing [Z] (AERA, APA, NCME, Washington, DC, 1985)
L. Bachman, Fundamental Considerations in Language Testing [M] (Oxford University Press,
Oxford, 1990)
L. Bachman, A. Palmer, Language Testing in Practice [M] (Oxford University Press, Oxford,
1996)
L. Bachman, A. Palmer, Language Assessment in Practice [M] (Oxford University Press, Oxford,
2010)
K. Bailey, Working for washback: A review of the washback concept in language testing [J]. Lang.
Test. 13(3), 257–279 (1996)
C. Bogdan, K. Biklen, Qualitative Research in Education: An Introduction to Theory and Methods
[M] (Allyn & Bacon, Boston, 2006)
J. Cai, CET iBT: Significance and washback [J]. Comput. Assist. Foreign Lang. Educ. China 5,
3–8 (2009)
L. Cheng, How does washback influence teaching? Implications for Hong Kong [J]. Lang. Educ.
11, 38–54 (1997)
L. Cheng, Changing Language Teaching through Language Testing: A Washback Study [M]
(Cambridge University Press, Cambridge, 2005)
L. Cheng, Washback, impact and consequences. [A], in Encyclopedia of Language and Education,
Vol. 7: Language Testing and Assessment, ed. by E. Shohamy, N. H. Hornberger (Eds), 2nd
edn., (Springer, New York, 2008), pp. 349–364
L. Cheng, Voices from test takers: Further evidence for language assessment and use [J]. Educ.
Assess. 16, 104–122 (2011)
L. Cheng, Y. Watanabe, A. Curtis (eds.), Washback in Language Testing: Research Context and
Methods [C] (Lawrence Erlbaum Associates, Inc., Mahwah, 2004)
D. Cohen, The coming of age of research on test-taking strategies [J]. Lang. Assess. Q. 3, 307–331
(2006)
J. Cronbach, Course improvements through evaluation [J]. Teach. Coll. Rec. 64, 672–683 (1963)
W. Dai, H. Feng, A retrospect of the tests for foreign language majors [J]. Foreign Lang. World 6,
2–8 (2010)
Z. Dang, Improving the Authenticity of the Listening Sub-tests in TEM [D]. Unpublished PhD
Dissertation. Shanghai International Studies University, 2004
A. Davies, Demands of being professional in language testing [J]. Lang. Test. 14(3), 328–339
(1997)
A. Davies et al., Dictionary of Language Testing [Z] (University of Cambridge Local Examinations
Syndicate, Cambridge, 1999)
N. Denzin, Research Act: Theoretical Introduction to Sociological Methods [M] (McGraw-Hill,
New York, 1978)
I. Ferman, The washback of an EFL national oral matriculation test to teaching and learning [A], in
Washback in Language Testing: Research Context and Methods, ed. by L. Cheng, Y. Watanabe,
A. Curtis (Eds), (Lawrence Erlbaum Associates, Inc, Mahwah, 2004), pp. 191–210
N. Frederiksen, The real test bias: The influence of testing on teaching and learning [J]. Am.
Psychol. 39(3), 193–202 (1984)
R. Frederiksen, A. Collins, A systems approach to educational testing [J]. Educ. Res. 18(9), 27–32
(1989)
M. Frohlich, N. Spada, P. Allen, Differences in the communicative orientation of L2 classrooms
[J]. TESOL Q. 19(1), 27–57 (1985)
R. Gorsuch, Factor Analysis [M] (Lawrence Erlbaum, Hillsdale, 1983)
A. Green, IELTS Washback in Context [M] (Cambridge University Press, Cambridge, 2007)
X. Gu, Positive or Negative—an Empirical Study of CET Washback [M] (Chongqing University
Press, Chongqing, 2007)
X. Gu, Y. Peng, A longitudinal study on college English teachers’ perception of CET and its wash-
back [J]. Foreign Lang. Their Teach. 6, 37–41 (2010)
L. Hamp-Lyons, Ethical test preparation practice: The case of the TOEFL [J]. TESOL Q. 33(2),
329–337 (1998)
L. Hamp-Lyons, Fairness in language testing [A], in Fairness and Validation in Language
Assessment, Studies in Language Testing 9, ed. by A. Kunnan (Ed), (Cambridge University
Press, Cambridge, 2000a), pp. 30–34
L. Hamp-Lyons, Social, professional and individual responsibility in language testing [J]. System
28, 579–591 (2000b)
R. Hawkey, Impact Theory and Practice: Studies of the IELTS Test and Progetto Lingue 2000 [M]
(Cambridge University Press, Cambridge, 2006)
D. Ho, Why do teachers ask the questions they ask? [J]. Reg. Lang. Cent. J. 36(3), 297–310 (2005)
A. Hughes, Introducing a needs-based test of English language proficiency into an English
Medium University in Turkey [A], in Testing English for University Study (ELT Documents
#127), ed. by A. Hughes (Ed), (Modern English Publications in association with the British
Council, London, 1988), pp. 134–146
A. Hughes, Backwash and TOEFL 2000 [Z]. Unpublished manuscript, University of Reading,
1993
A. Hughes, Testing for Language Teachers [M] (Cambridge University Press, Cambridge, 2003)
Y. Jin, Backwash effect of CET-SET on the teaching of EFL in China [J]. Foreign Lang. World 4,
56–61 (2000)
Y. Jin, On the improvement of test validity and test washback—The CET washback study [J].
Foreign Lang. World 6, 65–73 (2006)
Y. Jin, J. Fan, Test for English Majors (TEM) in China [J]. Lang. Test. 28(4), 589–596 (2011)
M. Kane, Current concerns in validity theory [J]. J. Educ. Meas. 38(4), 319–342 (2001)
T. Kellaghan, V. Greaney, Using Examinations to Improve Education: A Study of Fourteen African
Countries [M] (World Bank, Washington, DC, 1992)
Y. Kong, J. Nie, Compound dictation in CET and its backwash effect on language teaching [J].
Foreign Lang. World 2, 51–57 (2002)
A. Kunnan, Fairness and justice for all [A], in Fairness and Validation in Language Assessment,
Studies in Language Testing 9, ed. by A. Kunnan (Ed), (Cambridge University Press,
Cambridge, 2000), pp. 1–14
A. Kunnan, Language assessment from a wider context [A], in Handbook of Research in Second
Language Teaching and Learning, ed. by E. Hinkel (Ed), (Lawrence Erlbaum Associates, Inc.,
Mahwah, 2005), pp. 779–794
H. Latham, On the Action of Examinations Considered as a Means of Selection [M] (Deighton,
Bell and Company, Cambridge, 1877)
X. Li, How powerful can a language test be? The MET in China [J]. J. Multiling. Multicult. Dev.
11(5), 393–404 (1990)
S. Li, X. Chen, On the present and future of TEM from the perspective of language testing theories
[J]. Foreign Lang. World 3, 15–20 (2012)
X. Lian, On the testing device and the modifications of the scoring system for dictation in CEPTEM
[J]. Foreign Lang. Educ. 2, 75–79 (1991)
W. Lu, SPSS for Windows Statistical Analysis [M] (Publishing House of Electronics Industry,
Beijing, 2006)
E. Macaro, Codeswitching in the L2 classroom: A communication and learning strategy [A], in
Non-native Language Teachers: Perceptions, Challenges and Contributions to the Profession,
ed. by E. Llurda (Ed), (Springer, New York, 2005), pp. 63–84
F. Madaus, The influence of testing on the curriculum [A], in Critical Issues in Curriculum, ed. by
L. N. Tanner (Ed), (Chicago University Press, Chicago, 1988), pp. 83–121
F. McNamara, Measuring Second Language Performance [M] (Longman, Harlow, 1996)
S. Messick, Validity [A], in Educational Measurement, ed. by R. Linn (Ed), 3rd edn., (ACE and
Macmillan, New York, 1989), pp. 13–103
S. Messick, Validity and washback in language testing [J]. Lang. Test. 13, 241–256 (1996)
K. Morrow, The evaluation of tests of communicative performance [A], in Innovations in Language
Testing: Proceedings of the IUS/NFER Conference, ed. by M. Portal (Ed), (NFER/Nelson,
London, 1986), pp. 1–13
A. Moss, J. Girard, C. Haniford, Validity in educational assessment [J]. Rev. Res. Educ. 30, 109–
162 (2006)
NACFLT, Syllabus for University English Language Teaching [Z] (Shanghai Foreign Language
Education Press/Foreign Language Teaching and Research Press, Shanghai/Beijing, 2000)
NACFLT, Syllabus for TEM4 [Z] (Shanghai Foreign Language Education Press, Shanghai, 2004a)
NACFLT, Syllabus for TEM8 [Z] (Shanghai Foreign Language Education Press, Shanghai, 2004b)
NACFLT, Syllabus for TEM4-Oral [Z] (Shanghai Foreign Language Education Press, Shanghai,
2005)
NACFLT, Syllabus for TEM8-Oral [Z] (Shanghai Foreign Language Education Press, Shanghai,
2008)
Q. Patton, Qualitative Evaluation and Research Methods, 2nd edn. [M]. (Sage, London, 1990)
J. Popham, Two-plus decades of educational objectives [J]. Int. J. Educ. Res. 11(1), 31–41 (1987)
L. Qi, The intended washback effect of the National Matriculation English Test in China: Intentions
and reality [M] (Foreign Language Teaching and Research Press, Beijing, 2004)
L. Qi, A review of washback studies [J]. Foreign Lang. Learn. Theory Pract. 4, 23–28 (2011)
X. Qin, Fully exploiting the positive washback effects of TEM4/8 to arrest the decline in the qual-
ity of teaching and learning [J]. Foreign Lang. World 3, 10–14 (2012)
B. Resnick, P. Resnick, Assessing the thinking curriculum: New tools for educational reform [A],
in Changing Assessments: Alternative Views of Aptitude, Achievement and Instruction, ed. by
B. G. Gifford, M. C. O’Conner (Eds), (Kluwer Academic Publishers, Boston, 1992), pp. 37–75
X. Shi, A longitudinal backwash study of the new CET4/6 listening tests [J]. Foreign Lang. World
3, 80–86 (2010)
C. Shih, Perceptions of the General English Proficiency Test and its Washback: A Case Study at two
Taiwan Technological Institutes [M] (LAP LAMBERT Academic Publishing, Saarbrucken,
2011)
E. Shohamy, Fairness in language testing [A], in Fairness and Validation in Language Assessment,
Studies in Language Testing 9, ed. by A. Kunnan (Ed), (Cambridge University Press,
Cambridge, 2000), pp. 15–19
E. Shohamy, The Power of Tests: A Critical Perspective of the Uses of Language Tests [M]
(Longman, Harlow, 2001)
E. Shohamy, S. Donitsa-Schmidt, I. Ferman, Test impact revisited: Washback effect over time [J].
Lang. Test. 13(3), 298–317 (1996)
L. Smith, Put to the test: The effects of external testing on teachers [J]. Educ. Res. 20(5), 8–11
(1991)
H. Smith, S. Higgins, Opening classroom interaction: The importance of feedback [J]. Camb.
J. Educ. 36(4), 485–502 (2006)
M. Spaan, Enhancing fairness through a social contract [A], in Fairness and Validation in Language
Assessment, Studies in Language Testing 9, ed. by A. Kunnan (Ed), (Cambridge University
Press, Cambridge, 2000), pp. 35–38
B. Spolsky, The examination-classroom backwash cycle: some historical cases [A], in Bringing
About Change in Language Education, ed. by D. Nunan, R. Berry, V. Berry, (The University of
Hong Kong, Department of Curriculum Studies, Hong Kong, 1996), pp. 55–66
SPSS Inc, SPSS Statistics for Windows, Version 17.0 [Z] (SPSS Inc., Released, Chicago, 2008)
Y. Sun, Context, Construct, and Consequences: Washback of the College English Test in China [M]
(LAP LAMBERT Academic Publishing, Saarbrucken, 2016)
Y. Tang, J. Peng, The washback effect of CET spoken English test upon college English learners
[J]. Foreign Lang. World 1, 25–30 (2004)
The TEM Test Centre, The Test for English Majors (TEM) validation study [M] (Shanghai Foreign
Language Education Press, Shanghai, 1997)
R. Ullmann, E. Geva, The Target Language Observation Scheme (TALOS): Handbook [M]
(Modern Language Center, Ontario Institute for Studies in Education, Mimeo, 1982)
L. Van Lier, Reeling, writhing, drawling, stretching and feinting in coils: Oral proficiency inter-
views as conversation [J]. TESOL Q. 23, 489–508 (1989)
D. Wall, Introducing new tests into traditional systems: Insights from general education and from
innovation theory [J]. Lang. Test. 13(3), 334–354 (1996)
D. Wall, The impact of high-stakes testing on teaching and learning: can this be predicted or
controlled? [J]. System 28(4), 499–509 (2000)
D. Wall, The Impact of High-Stakes Examinations on Classroom Teaching: A Case Study Using
Insights from Testing and Innovation Theory [M] (Cambridge University Press, Cambridge,
2005)
D. Wall, J. Alderson, Examining washback: The Sri Lankan impact study [J]. Lang. Test. 10(1),
41–69 (1993)
Y. Watanabe, Does grammar-translation come from the entrance examination? Preliminary find-
ings from classroom-based research [J]. Lang. Test. 13(3), 318–333 (1996)
Y. Watanabe, Constructing a classroom observation scheme for the test impact research [J]. Sophia
Linguistica 41, 297–313 (1997)
Y. Watanabe, Methodology in washback studies [A], in Washback in Language Testing: Research
Context and Methods, ed. by L. Cheng, Y. Watanabe, A. Curtis (Eds), (Lawrence Erlbaum
Associates, Inc, Mahwah, 2004), pp. 9–36
J. Weir, Communicative Language Testing [M] (Prentice Hall, New York, 1990)
Q. Wen, K. Wu, W. Wang, Y. Ren, A study of the feasibility of the National Oral Test for English
majors-band 8 [J]. Foreign Lang. World 4, 67–73 (2002)
Q. Wen, W. Wang, D. Zhou, Y. Wang, A study on the implementation of National Oral Test for
English majors-band 8 [J]. Foreign Lang. World 5, 53–58 (2005)
H. Wesdorp, Backwash effects of language-testing in primary and secondary education [J]. J. Appl.
Lang. Stud. 1(1), 40–55 (1982)
Y. Wu, On the new version of TEM8 syllabus and its wash-back effects [J]. Foreign Lang. Teach.
Abroad 1, 36–40 (2005)
Q. Xi, J. Jiang, TEM4 testing modes and fundamental English teaching reform [J]. Foreign Lang.
Their Teach. 4, 24–27 (2006)
W. Xiao, X. Gu, C. Ni, The washback mechanism of CET: a longitudinal study via multi-group
structural equation modeling [J]. Foreign Lang. Learn. Theory Pract. 3, 37–43 (2014)
Q. Xu, A study on the washback effects of TEM8 [J]. Foreign Lang. World 3, 21–31 (2012)
N. Yao, Giant footsteps: A retrospective account of TEM4/8 from 1990 to 2010 [J]. Foreign Lang.
World 6, 19–25 (2010)
J. Ye, Thoughts on the backwash of CET to teaching [J]. Foreign Lang. World 3, 40–43 (1998)
G. Yu, Y. Jin (eds.), Assessing Chinese Learners of English: Language Constructs, Consequences
and Conundrums [C] (Palgrave Macmillan, London, 2016)
Y. Zhu, Towards the standardization and scientificness of TEM4 and TEM8 [J]. Foreign Lang.
World 1, 24–29 (1995)
P. Zhu, Changes in the new editions of syllabus for TEM4 and TEM8 [J]. Foreign Lang. World 1,
67–70 (2005)
S. Zou, A survey report on TEM4 and TEM8 in some universities and colleges [J]. Foreign Lang.
World 1, 20–23 (1995)
S. Zou, The alignment between teaching syllabus and language tests: In the case of TEM8 design
and administration [J]. Foreign Lang. World 6, 71–78 (2003)
S. Zou, Understanding the washback effects of tests—With special reference to the revision of the
TEM4/8 test battery [J]. Foreign Lang. World 5, 59–66 (2005)
S. Zou, Test for English majors – Its past and future [A], in Foreign Language Educators on
Language Education, ed. by Z. Zhuang (Shanghai Foreign Language Education Press,
Shanghai, 2008), pp. 378–390
S. Zou, Looking back and ahead – In commemoration of the 20th anniversary of TEM4 and TEM8
[J]. Foreign Lang. World 6, 9–25 (2010)
S. Zou, Q. Xu, Criterion-referenced assessment and its washback effects: The case of TEM [J].
Foreign Lang. Learn. Theory Pract. 1, 42–48 (2014)