
Qian Xu · Jun Liu
A Study on the Washback Effects of the Test for English Majors (TEM)
Implications for Testing and Teaching Reforms
Qian Xu
Shanghai International Studies University
Shanghai, China

Jun Liu
Shanghai University of Electric Power
Shanghai, China
ISBN 978-981-13-1962-4 ISBN 978-981-13-1963-1 (eBook)

https://doi.org/10.1007/978-981-13-1963-1
Library of Congress Control Number: 2018951416
© Springer Nature Singapore Pte Ltd. 2018

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the
editors give a warranty, express or implied, with respect to the material contained herein or for any errors
or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims
in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Acknowledgements
This book is funded by the Fundamental Research Funds for the Central Universities
in China (Research Grant No. KY01X0222017082).
Contents
1 TEM: A High-Stakes Standardized Test in the Chinese Context
1.1 The Educational System in China
1.2 The History of TEM
1.2.1 The Launch of TEM
1.2.2 The Development of TEM
1.2.3 The Soaring Test Population
1.2.4 Research on TEM
1.3 TEM Test Characteristics
1.3.1 TEM Test Purpose
1.3.2 TEM Test Content and Format
1.3.3 TEM Administration and Marking
1.3.4 TEM Test Score Report and Interpretation
1.4 TEM Test Characteristics vs. the Teaching Syllabus
1.4.1 The Teaching Syllabus
1.4.2 A Comparison Between the Teaching Syllabus and TEM Characteristics
1.5 Summary
2 Washback: Definitions and Dimensions
2.1 Definitions of Washback
2.1.1 Washback or Backwash
2.1.2 Various Terminology
2.1.3 Washback and Validity
2.2 Dimensions of Washback
2.2.1 Direction
2.2.2 Extent
2.2.3 Intensity
2.2.4 Intentionality
2.2.5 Length
2.2.6 Specificity
3 Theoretical Frameworks for Washback Studies
3.1 Frameworks for Washback Studies
3.1.1 Alderson and Wall’s Washback Hypotheses
3.1.2 Hughes’ Washback Trichotomy
3.1.3 Bailey’s Basic Model of Washback
3.1.4 Green’s Washback Model
3.2 Empirical Studies on Washback
3.2.1 Summary of Empirical Studies
3.2.2 A Critique of Approaches in Washback Studies
4 Research Framework for TEM Washback Study
4.1 Framework for TEM Washback Study
4.1.1 Contributing Factors to TEM Washback
4.1.2 TEM Washback Mechanism
4.1.3 Several Terms Redefined
4.1.4 Research Design
4.2 Participants
4.3 Instruments
4.3.1 Research Design for the Questionnaire Surveys
4.3.2 Classroom Observations
4.3.3 Interviews
4.4 Considerations for Validity and Reliability
4.4.1 The Development of Research Instruments
4.4.2 Data Collection and Analysis
4.4.3 Triangulation
4.5 Summary
5 Stakeholders’ Perceptions of TEM and Its Washback
5.1 Experts’ Perceptions
5.1.1 Demographic Information
5.1.2 Experts’ Evaluation of TEM
5.1.3 Experts’ Perception of TEM Washback
5.1.4 Factors Influencing Teaching
5.2 Program Administrators’ Perceptions
5.2.2 Program Administrators’ Evaluation of TEM
5.2.3 Program Administrators’ Perception of TEM Washback
5.2.4 Washback on Course Design
5.2.5 Individual Differences
5.3 Teachers’ Perceptions
5.3.2 Teachers’ Evaluation of TEM
5.3.3 Teachers’ Perception of TEM Washback
5.3.4 Washback on Classroom Teaching
5.4 Students’ Perceptions
5.4.2 Students’ Evaluation of TEM
5.4.3 Students’ Perception of TEM Washback
5.4.4 TEM Preparation
5.4.5 Learning Effects
5.4.7 Factors Influencing Learning
5.5 Triangulation
5.5.1 Data Triangulation
5.5.2 Methodological Triangulation
5.6 Summary
6 Classroom Practice and the Washback of TEM
6.1 The Teachers Observed
6.2 The English Classes Observed
6.3 Characteristics of English Classroom Practice
6.3.1 Teaching Content
6.3.2 Teaching Methods
6.4 TEM Preparation Classes vs. Non-preparation Classes
6.4.1 Teaching Content
6.4.2 Teaching Methods
6.5 Classroom Practice Affected by Other Factors
6.5.1 English Used in Class
6.5.2 Features of Teacher-Student Interaction
6.5.3 Classroom Atmosphere
6.6 Triangulation
6.6.1 Classroom Observation Findings vs. Questionnaire Findings
6.6.2 Classroom Observation Findings vs. Interview Findings
6.6.3 Classroom Observation Findings vs. Document Analysis
6.7 Summary
7 Conclusions and Implications
7.1 Summary of the Major Findings
7.1.1 Washback Effects Perceived by Experts
7.1.2 Washback Effects Perceived by Program Administrators
7.1.3 Washback Effects Perceived by Teachers
7.1.4 Washback Effects Perceived by Students
7.1.5 Washback Effects on English Classrooms
7.2 Implications for the Reform
7.2.1 Implications for TEM Reform
7.2.2 Implications for the Reform of Teaching and Learning
Appendices
Appendix I Survey Questionnaires
Questionnaire for Experts (EQ)
Appendix II Classroom Observations
Bibliography

List of Acronyms
A                       Advanced stage
AQ-4                    Questionnaire for TEM4 related administrators
AQ-8                    Questionnaire for TEM8 related administrators
C-E                     Chinese to English
CET                     College English Test
E-C                     English to Chinese
EQ                      Questionnaire for experts
ESOL                    English for speakers of other languages
F                       Foundation stage
Gra                     Grammar
HKCEE                   Hong Kong Certificate of Education Examination in English
HKEA                    Hong Kong Examinations Authority
IELTS                   International English Language Testing System
KMO                     Kaiser-Meyer-Olkin
MCQ                     Multiple choice question
MET                     Matriculation English Test
NACFLT                  National Advisory Committee for Foreign Language Teaching
NMET                    National Matriculation English Test
PhD                     Doctor of philosophy
PL2000                  Progetto Lingue 2000
SPSS 17.0               Statistical Product and Service Solutions 17.0
SQ-4                    Questionnaire for TEM4 related students
SQ-8                    Questionnaire for TEM8 related students
TEM                     Test for English Majors
TEM4                    Test for English Majors Grade 4
TEM4-Oral               Test for English Majors Grade 4 Oral Test
TEM8                    Test for English Majors Grade 8
TEM8-Oral               Test for English Majors Grade 8 Oral Test
The Teaching Syllabus   The National Teaching Syllabus for English Majors
TOEFL                   Test of English as a Foreign Language
TQ-4                    Questionnaire for TEM4 related teachers
TQ-8                    Questionnaire for TEM8 related teachers
Voc                     Vocabulary
Wpm                     Words per minute
List of Figures
Fig. 1.1 Educational hierarchy of English teaching for English majors
Fig. 4.1 TEM washback model
List of Tables
Table 1.1 A comparison between the previous TEM and new TEM
Table 1.2 TEM studies published over the years
Table 1.3 Research focus of the journal articles and PhD dissertations
Table 1.4 Test content and format of TEM
Table 1.5 A comparison between the teaching syllabus and TEM characteristics
Table 3.1 Overview of the empirical studies on washback
Table 4.1 A multi-phase, multi-method research design for TEM washback study
Table 4.2 The seven questionnaires
Table 4.3 The number of questionnaires distributed, returned and valid
Table 4.4 Reliability test for AQ-4, TQ-4, SQ-4, AQ-8, TQ-8 and SQ-8
Table 4.5 KMO and Bartlett’s tests
Table 4.6 Rotated component matrix for the questionnaires
Table 4.7 Inter-coder reliability (first round)
Table 4.8 Inter-coder reliability (second round)
Table 4.9 Intra-coder reliability
Table 5.1 Demographic information of program administrators
Table 5.2 Familiarity with TEM
Table 5.3 Evaluation of TEM
Table 5.4 Information to be provided in the test report
Table 5.5 Use and interpretation of TEM scores
Table 5.6 Influence of TEM scores
Table 5.7 Unintended use of TEM scores
Table 5.8 Attitudes towards unintended uses of TEM scores
Table 5.9 Views on TEM washback
Table 5.10 Washback exerted by different aspects of TEM
Table 5.11 Courses introduced by TEM participating universities
Table 5.12 Washback from TEM revisions
Table 5.13 TEM preparation courses
Table 5.14 TEM washback perceived by different program administrators
Table 5.15 Factors affecting English language teaching
Table 5.16 Demographic information of teachers
Table 5.17 Familiarity with TEM
Table 5.18 Evaluation of TEM
Table 5.19 Information to be provided in the test report
Table 5.20 Use and interpretation of TEM scores
Table 5.21 Influence of TEM scores
Table 5.22 Unintended uses of TEM scores
Table 5.23 Attitudes towards unintended uses of TEM scores
Table 5.25 Washback exerted by different aspects of TEM
Table 5.26 TEM preparation experience
Table 5.27 TEM preparation classes
Table 5.28 Difference between TEM preparation and non-preparation classes
Table 5.29 Washback from TEM revisions
Table 5.30 TEM washback on teaching perceived by different teachers
Table 5.31 Factors affecting English teaching
Table 5.32 Sources of pressure for teachers
Table 5.33 Demographic information of students
Table 5.34 Students’ familiarity with TEM
Table 5.35 Students’ evaluation of TEM
Table 5.36 Information to be provided in TEM test report
Table 5.37 Use and interpretation of TEM results
Table 5.39 Positive and negative washback of TEM
Table 5.40 TEM preparation courses
Table 5.41 Activities conducted in TEM preparation classes
Table 5.42 Difference between TEM preparation
Table 5.43 Students’ preference for classroom activities
Table 5.44 After-class preparation efforts
Table 5.45 Learning activities conducted for TEM preparation after class
Table 5.46 Difference between TEM preparation and non-preparation stages
Table 5.47 Washback of TEM on learning effects
Table 5.48 Progress in fundamental/advanced stage
Table 5.49 TEM washback on students with different characteristics
Table 5.50 Factors affecting English learning
Table 5.51 Major aims for learning
Table 6.1 Background information of the teachers observed
Table 6.2 No. of classes taught by each teacher
Table 6.3 No. of classes observed from different schools
Table 6.4 No. of TEM preparation and non-preparation classes observed
Table 6.5 Teaching content of all the classes observed
Table 6.6 Teaching materials used in class
Table 6.7 Classroom activities organized in all the classes observed
Table 6.8 English used in classrooms observed
Table 6.9 Features of teacher-student interaction
Table 6.10 Frequency of students’ laughter
Table 6.11 No. of skills (knowledge) covered in each class
Table 6.12 Mean time devoted to each activity (percentage of time)
Table 6.13 English used by teachers in TEM preparation
Table 6.14 English used by students in TEM preparation
Table 6.15 English used by teachers in classes of different grades
Table 6.16 English used by different teachers in class
Table 6.17 English used by teachers in classes from different schools
Table 6.18 Communicative features of classes delivered by different teachers
Table 6.19 Communicative features of classes from different schools
Table 6.20 Laughter in classes of different grades
Chapter 1
TEM: A High-Stakes Standardized Test
in the Chinese Context
1.1 The Educational System in China
In China, quite a few universities offer a four-year program for undergraduate English majors, who are awarded a bachelor’s degree in English Language and Literature once they have met all the requirements set by their universities, such as credits, a degree thesis and TEM certificates. Fig. 1.1 shows the educational hierarchy of college English teaching for English majors.
As illustrated in Fig. 1.1, teaching for English majors is influenced by many factors, which fall mainly into three levels.
The top level is the national level, which is composed of policy makers, the Teaching Syllabus designers and the TEM test developers. Policy makers are the officials in the Ministry of Education who supervise and evaluate the overall quality of college education, particularly those in the National Advisory Committee for Foreign Language Teaching (NACFLT), who are responsible for establishing policies in English education. The policies they formulate are implemented by the Teaching Syllabus designers and the TEM test developers, who, although they belong to two separate bodies, work closely together and jointly decide how the policies should be managed and how the teaching and testing syllabuses should incorporate the basic principles of the policy.
The second level is the school level, namely the administrators. Based on the Teaching Syllabus, the administrators set the teaching aims and design the curriculum accordingly, deciding, for example, which courses to introduce and which textbooks to use. However, curricula differ from school to school for the following reasons. The first is regional difference. China is a vast country, and the eastern coastal regions far surpass the western inland areas in educational level, so the language requirements set for English majors differ from region to region. The second is the level of the school. In China, universities are divided into key universities, ordinary universities and colleges, and admission standards differ considerably among them, with key universities recruiting the top students. Therefore, English teaching aims and curricula differ markedly among schools of different levels. Third, schools of different types may vary in their English curriculum. For instance, foreign language universities may focus more on language and literature, while technology institutes may introduce more science-related English courses.
[Fig. 1.1 Educational hierarchy of English teaching for English majors. From top to bottom: Ministry of Education; Department of Higher Education; National Advisory Committee for Foreign Language Teaching; Teaching Syllabus Designers and TEM Testing Committee; Teaching Syllabus and TEM; School Administrators; Teachers; Students]
Q. Xu, J. Liu, A Study on the Washback Effects of the Test for English Majors (TEM), https://doi.org/10.1007/978-981-13-1963-1_1
Last, the bottom level is the classroom level, including teachers and students, who are responsible for the implementation of the school curriculum and the Teaching Syllabus. Based on the school curriculum, teachers may design their own course syllabuses, and their previous knowledge and available resources will affect their teaching practice. Moreover, university students come from different regions and backgrounds and may perform differently in the same classroom.
All the people involved in this hierarchy are TEM stakeholders, and the higher
the level, the fewer the stakeholders, but the greater the power they have. Besides,
stakeholders from each level may overlap: some members in the NACFLT have
participated in the development of the Teaching Syllabus, some experts in the TEM
testing committee are also school administrators, and some school administrators
are English teachers at the same time. Therefore, in the current study, although the
research focus is the washback of TEM on English teaching and learning, opinions
of stakeholders from each level will be sought, with teachers and students as the
major participants.
In view of this, a top-down approach will be adopted in investigating the washback effects of TEM, and the investigation will be carried out in three phases. First, the opinions of stakeholders at the national level will be investigated through the analysis of related documents and a semi-open questionnaire survey among the experts, who are members of the NACFLT, syllabus designers, or both. Then, a questionnaire survey will be conducted among school administrators to find out the different practices in the TEM participating schools. Last, teachers and students will be investigated through large-scale questionnaires and case-study classroom observations. As a supplementary instrument, interviews will be conducted in all three phases to obtain follow-up data.
1.2 The History of TEM
The history of TEM can be traced back to the early 1990s, and in its two-decade development, TEM has undergone several revisions. The following part will introduce the launch and development of TEM, its soaring test population, as well as the related research on TEM.
1.2.1 The Launch of TEM
Before reviewing the history of TEM, we will first introduce the formulation and publication of the Teaching Syllabus. After the Cultural Revolution, foreign language education in China entered a new era. The early 1980s witnessed a rapid increase in the demand for English majors in every field, because of the government’s adoption of the policy of reform and opening up to the outside world. As a result, many universities set up English departments and offered English programs. However, for various reasons, huge gaps existed among universities in curriculum design, teaching material selection and course evaluation, which led to an urgent demand for a national teaching syllabus. Hence, the first two national teaching syllabuses for college English majors, one for the foundation stage and one for the advanced stage, were published in 1989 and 1990 respectively. The two teaching syllabuses serve as guidelines for the development of curricula and teaching materials for English majors in universities throughout China. They established specific teaching requirements and English proficiency standards for undergraduate English majors, and it thus became both possible and necessary to develop a nationwide test to check the implementation of the Teaching Syllabus.
In line with this demand, two groups of specialists, based in two different universities, started working on TEM. Shanghai International Studies University was responsible for TEM4, a test for English sophomores to measure their proficiency in the foundation stage, while Guangdong University of Foreign Studies was responsible for TEM8, a test for English majors in their fourth year to measure their proficiency in the advanced stage. The first TEM4 and TEM8 were launched in 1990 and 1991 respectively. Commissioned by NACFLT, Shanghai International Studies University has assumed responsibility for both tests since October 1992. To standardize the test, the first TEM Syllabuses were drafted in 1993 and published in 1994, which marked the official beginning of TEM.
Following the introduction of the TEM4 and TEM8 written tests, Nanjing University was given responsibility for conducting studies on the feasibility of TEM4-Oral and TEM8-Oral in 1994 and 1998 respectively. After several years of trialing, TEM4-Oral and TEM8-Oral were officially launched in 1999 and 2003 respectively, and the Oral Test Syllabuses were published in 2005.
1.2.2 The Development of TEM
It has been more than 20 years since TEM was first implemented, and in the course of its development, TEM has undergone several revisions.
Since their publication in 1994, the TEM Syllabuses (TEM4 and TEM8) have undergone two major revisions, one in 1997 and the other in 2004. A comprehensive validation study of TEM was conducted from 1993 to 1996, and based on its findings, a revised edition of the TEM Syllabuses was published in 1997. The second major revision took place in 2004 in response to the revision of the Teaching Syllabus in 2000. The revisions of the TEM Syllabuses aimed at improving test quality, standardizing test development and administration, and achieving fairness in test result interpretation (Jin and Fan 2011; Zou 2010).
In 2005, the new versions of TEM4 and TEM8 were introduced, employing more
integrative tasks and using lengthier and more authentic materials for listening and
reading. The test content and format of the previous TEM and new TEM are listed
in Table 1.1.
From Table 1.1, three major changes can be identified. First and foremost, the new versions of TEM set higher requirements for the test takers. The listening and reading speed has been increased, with faster listening materials and longer reading passages; short listening materials such as statements have been replaced by longer ones such as passages; a lengthier composition is required for the writing part; and the passages for cloze and proofreading have also been lengthened. Second, speed reading has been dropped from both TEM4 and TEM8. Last, a new section on general knowledge has been included in TEM8.
As for the oral tests, a major reform was adopted in 2008, supplementing the
tape-mediated format with computer-based spoken English tests (Jin and Fan 2011).
Further discussions on these revisions will be provided in Sect. 1.4.
Table 1.1 A comparison between the previous TEM and new TEM
Task | Input | Format | % | Time
TEM4 (previous version)
Writing 1) composition | A topic, chart or graph | A text of about 150 words | 15 | 35
Writing 2) note-writing | Written prompts | A note of 50–60 words | 5 | 10
Dictation | A text of about 150 words, listen 4 times, 120 wpm | Word-for-word dictation | 15 | 15
Listening | Statements, conversations and news broadcasts, 120 wpm | 25 MCQs | 15 | 20
Cloze | A text of about 250 words | 15 MCQs | 10 | 15
Grammar & vocabulary | 25 sentences | 25 MCQs | 15 | 15
Reading 1) careful | 4–5 texts of about 1500 words altogether | 15 MCQs | 15 | 25
Reading 2) speed | 4–5 texts of about 900 words altogether | 10 MCQs | 10 | 5
TEM4 (new version)
Writing 1) composition | A topic, chart or graph | A text of about 200 words | 15 | 35
Dictation | A text of about 150 words, listen 4 times, 120 wpm | Word-for-word dictation | 15 | 15
Listening | Conversations, passages and news broadcasts, 120 wpm | 30 MCQs | 15 | 20
vocabulary | | | |
Reading | 4–5 texts of about 1800 words altogether | 15 MCQs | 20 | 25
TEM8 (previous version)
Listening | Talks, conversations and news broadcasts, 140–170 wpm | 15 MCQs | 15 | 15–20
Listening | Mini-lecture, 700 words, 140 wpm | Note-taking and gap-filling (10 blanks) | 10 | 20
Proofreading and error correction | A text of about 200 words | Error identification and correction | 10 | 15
Reading 1) careful | 4–5 texts of about 2500 words | 15 MCQs | 15 | 30
Reading 2) speed | 6–7 texts of about 3000 words | 10 MCQs | 10 | 10
Translation 1) C-E | A text of about 150 Chinese characters, 250 wpm | Translation | 10 | 30
Translation 2) E-C | A text of about 150 words, 250 wpm | Translation | 10 | 30
Writing | A topic and prompts | A text of 300 words | 20 | 60
TEM8 (new version)
Listening | Mini-lecture, 900 words, 150 wpm | Note-taking and gap-filling (10 blanks) | 10 | 10
Listening | Conversations and news broadcasts, 150 wpm | 10 MCQs | 10 | 25
Reading | 4–5 texts of about 3000 words | 20 MCQs | 20 | 30
General knowledge | 10 sentences | 10 MCQs | 10 | 10
Proofreading | A text of about 250 words | Error identification and correction | 10 | 15
Translation 1) C-E | A text of about 150 Chinese characters, 250–300 wpm | Translation | 10 | 30
Translation 2) E-C | A text of about 150 words, 250–300 wpm | Translation | 10 | 30
Note: %: weighting, with the overall percentage being 100; Time is given in minutes; wpm = words per minute; MCQ = multiple choice question; C-E: Chinese to English; E-C: English to Chinese
1.2.3 The Soaring Test Population
After more than two decades of development, TEM has grown into one of the predominant English tests in China, winning extensive recognition from test takers, relevant institutions and society at large (Jin and Fan 2011). The number of test takers and participating universities has increased rapidly and significantly.
For the written tests, from 1992 to 2015 the number of participants increased from around 10,000 to 270,000 for TEM4 and from 5000 to 210,000 for TEM8. The number of participating schools also increased from around 100 to more than 800 for both TEM4 and TEM8. For the TEM oral tests, the number of test takers and participating schools has also increased significantly, but the test-taking population is no more than one-tenth of that of the written tests: around 18,000 and 10,000 for TEM4 and TEM8 respectively in 2015 (data released by the TEM test office).
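The growth figures quoted above can be made concrete with a few lines of arithmetic. The following Python sketch uses only the approximate numbers given in this section; the variable and function names are illustrative assumptions and do not come from any TEM data release.

```python
# Approximate written-test populations quoted in the text (1992 vs. 2015).
written = {
    "TEM4": {"1992": 10_000, "2015": 270_000},
    "TEM8": {"1992": 5_000, "2015": 210_000},
}
# Approximate oral-test populations in 2015, as quoted in the text.
oral_2015 = {"TEM4": 18_000, "TEM8": 10_000}


def growth_factor(start: int, end: int) -> float:
    """Return how many times larger the end value is than the start value."""
    return end / start


def oral_share(test: str) -> float:
    """Return the 2015 oral-test population as a fraction of the written one."""
    return oral_2015[test] / written[test]["2015"]


for test in ("TEM4", "TEM8"):
    factor = growth_factor(written[test]["1992"], written[test]["2015"])
    print(f"{test}: {factor:.0f}-fold growth, oral share {oral_share(test):.1%}")
```

On these rounded figures, both oral-test shares fall below one-tenth of the written-test population, which is consistent with the comparison made in the text.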
With its substantial improvement and increasing test population, TEM is playing an increasingly important role in English teaching and learning at the tertiary level, exerting profound washback effects on students and teachers, the educational system and even society as a whole.
1.2.4 Research on TEM
Ever since the implementation of TEM in the early 1990s, quite a few studies have been conducted to investigate its feasibility, validation, reform and implications for teaching. The research findings have contributed to the reform of TEM and of English teaching and learning, which can also be considered an aspect of washback, according to Bailey (1996). Therefore, in this part, research findings on TEM are introduced and analyzed to summarize their impact on English teaching and learning. Because of limited space and the vast number of studies on TEM, only influential studies are selected here: (1) articles published in CSSCI journals in linguistics, since these journals are well established and widely read in the academic field and may influence future research; (2) PhD dissertations, because they are usually based on in-depth research in the field and are expected to be of high quality and to offer original insights into the issue.
A search of the CSSCI journals¹ and the CNKI² Corpus of PhD and Master Dissertations, using the keyword “TEM”, shows that by the end of 2015, 99 articles and 18 PhD dissertations had been published. Table 1.2 summarizes the number of studies published over the years.
Table 1.2 TEM studies published over the years
Publication type | 1991–1995 | 1996–2000 | 2001–2005 | 2006–2010 | 2011–2015
CSSCI journal article | 11 | 7 | 33 | 35 | 13
PhD dissertation | 0 | 0 | 4 | 9 | 5
The first journal article on TEM, published in 1991, examined the feasibility of using dictation in TEM4 (Lian 1991), and the first PhD dissertation, completed in 2004, addressed the authenticity of listening comprehension questions in TEM (Dang 2004). Table 1.2 shows a marked increase in the number of studies conducted on TEM since 2001.
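As a quick check on these figures, the counts in Table 1.2 can be tallied by publication type and split at 2001. The Python sketch below transcribes the table values as given; the variable names are illustrative assumptions.

```python
# Counts transcribed from Table 1.2 (five-year periods, 1991-2015).
periods = ["1991-1995", "1996-2000", "2001-2005", "2006-2010", "2011-2015"]
studies = {
    "CSSCI journal article": [11, 7, 33, 35, 13],
    "PhD dissertation": [0, 0, 4, 9, 5],
}

# Row totals, to compare with the 99 articles and 18 dissertations cited in the text.
totals = {kind: sum(counts) for kind, counts in studies.items()}

# Output before 2001 (first two periods) versus output from 2001 onward.
before_2001 = {kind: sum(counts[:2]) for kind, counts in studies.items()}
since_2001 = {kind: sum(counts[2:]) for kind, counts in studies.items()}

for kind in studies:
    print(f"{kind}: total {totals[kind]}, "
          f"before 2001: {before_2001[kind]}, since 2001: {since_2001[kind]}")
```

The row totals agree with the 99 journal articles and 18 PhD dissertations reported for the search, and most of both were published from 2001 onward.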
In terms of research focus, the 99 journal articles and 18 PhD dissertations cover almost every skill assessed in TEM (Table 1.3).
From Table 1.3, it can be noted that more journal articles deal with TEM in general, for example reporting on TEM in a certain year or commenting on the revision of the TEM Syllabus, while PhD dissertations are more focused, usually concentrating on one skill in TEM. Moreover, writing and translation seem to be studied more than the other language skills measured in TEM, probably because of the subjective nature of these two parts: test development, marking criteria and produced texts are all subjects for research.
These researches on TEM may contribute to English teaching and learning in the
following ways:
1. Test reports on TEM analyzed the test items and students’ performance in each
item, which would provide useful feedback information on the strengths and
weaknesses of students;
1
Eleven journals on linguistics (foreign language) included in CSSCI (2013–2014) are searched,
which are Foreign Language Teaching and Research, Foreign Language World, Journal of Foreign
Languages, Modern Foreign Languages, Chinese Translators Journal, Foreign Languages in
China, Foreign Language Research, Foreign Language Education, Foreign Languages and Their
Teaching, Computer-Assisted Foreign Language Education, and Foreign Language Learning
Theory and Practice.
2
CNKI: China National Knowledge Infrastructure, an online resource sharing platform which has
a collection of published articles.
8
1
Table 1.3 Research focus of the journal articles and PhD dissertations
General Listening Reading Writing Speaking Translation Culture Gra& voc Cloze Dictation
CSSCI 28 11 5 16 11 22 2 2 1 1
PhD 1 3 3 6 1 2 1 1 0 0
TEM: A High-Stakes Standardized Test in the Chinese Context
1.3 TEM Test Characteristics 9
2. Validation of a certain item in TEM investigated the construct of that section,

which would assist teaching and learning. For instance, validation of the reading
part in TEM may analyze the strategies needed to accomplish the reading tasks;
while the training of these strategies may also help students improve their read-
ing ability;
3. Some researches focused on development and use of the TEM-based corpus,
both of written and oral English. Analysis of these corpora may help us better
understand the language proficiency of the students so that proper teaching can
be offered.
1.3 TEM Test Characteristics
TEM is a set of criterion-referenced English tests administered by NACFLT on
behalf of the Higher Education Department, Ministry of Education, People’s
Republic of China (Jin and Fan 2011). The following part will introduce TEM test
purpose, TEM test content and format, TEM test administration, and TEM test score
report and interpretation.
1.3.1 TEM Test Purpose
As stated in the Test Syllabus, the purpose of TEM is twofold: (1) to measure the
English proficiency of undergraduate English majors and check whether they have
met the required levels specified in the Teaching Syllabus; (2) to check the
implementation of the Teaching Syllabus and to bring about reform in language
teaching.
1.3.2 TEM Test Content and Format
In the Teaching Syllabus, the four-year undergraduate program for English majors
is divided into the foundation stage (freshman and sophomore year) and the
advanced stage (junior and senior year). Accordingly, TEM4 and TEM4-Oral are
administered at the end of the second year to assess candidates’ English proficiency
in the foundation stage and TEM8 and TEM8-Oral are administered at the end of
the fourth year to assess candidates’ English proficiency in the advanced stage.
The test content of TEM4 includes listening, reading and writing, and two more
skills, translation and general knowledge (linguistics, culture and literature), are
assessed in TEM8. Basic language knowledge of grammar and vocabulary is measured
in both tests, but in different formats: multiple-choice questions in TEM4 and
error correction in TEM8. As for the oral tests, TEM4-Oral focuses on speaking, while
TEM8-Oral assesses not only speaking but also interpretation. The test content and
format of the four tests are listed in Table 1.4.
Table 1.4 shows that various item types are employed in TEM: both objective
and subjective items, and both discrete-point items and integrative tasks. In TEM4
and TEM8, objective multiple-choice questions account for 60% and 40%
of the test respectively, and the rest are all subjective items, such as writing and
translation. Discrete-point items include multiple-choice questions on grammar and
vocabulary in TEM4 and on general knowledge in TEM8, while integrative tasks include
dictation and cloze in TEM4 and mini-lecture and proofreading in TEM8.
Table 1.4 Test content and format of TEM
Section | Input | Task | % | Time
TEM4
Writing: 1) composition | A topic, chart or graph | A text of about 200 words | 15 | 35
Dictation | A text of about 150 words, listen 4 times, 120 wpm | Word-for-word dictation | 15 | 15
Listening | Conversations, passages and news broadcasts, 120 wpm | 30 MCQs | 15 | 20
… vocabulary | … | … | … | …
Reading | 4–5 texts of about 1800 words altogether | 15 MCQs | 20 | 25
TEM8
Listening | Mini-lecture, 900 words, 150 wpm | Note-taking and gap-filling (10 blanks) | 10 | 10
Listening | Conversations and news broadcasts, 150 wpm | 10 MCQs | 10 | 25
Reading | 4–5 texts of about 3000 words | 20 MCQs | 20 | 30
General knowledge | 10 sentences | 10 MCQs | 10 | 10
Proofreading | A text of about 250 words | Error identification and correction | 10 | 15
Translation: 1) C-E | A text of about 150 Chinese characters, 250–300 wpm | Translation | 10 | 30
Translation: 2) E-C | A text of about 150 words, 250–300 wpm | Translation | 10 | 30
TEM4-oral
Retelling | A story of about 300 words, listen twice | Retelling for 3 min, no time for preparation | 20 | 9
Talking on a given topic | A topic related to the above story | Talking for 3 min after 3-min preparation | 20 | 9
Role-playing | A topic for discussion | 4-min discussion between two test takers, after 3-min preparation | 20 | 7
TEM8-oral
Interpreting (E-C) | A speech of about 300 words, with about 150 words for interpretation | Listen to the speech once, listen again and interpret sentence by sentence, 5 sentences for interpretation | 20 | 9
Interpreting (C-E) | A speech of about 400 characters, with about 200 characters for interpretation | Listen to the speech once, listen again and interpret sentence by sentence, 5 sentences for interpretation | 20 | 9
Making a comment on a given topic | A topic for presentation | Talk for 3 min after 4-min preparation | 20 | 7
Note: %: weighting, with the overall percentage being 100; Time is given in minutes; the overall
score of TEM4-Oral and TEM8-Oral is 100, composed of 5 parts: the three tasks, pronunciation
and intonation, and grammar and vocabulary
1.3.3 TEM Administration and Marking
All four tests of TEM are administered once a year: TEM4 and TEM4-Oral in
May for second-year students, TEM8 in March and TEM8-Oral in December for
fourth-year students. The total test time is 135 min for TEM4 and 195 min for
TEM8, and the two oral tests take approximately 25 min each.
For the written tests of TEM4 and TEM8, the test papers are sent directly to the
participating schools before administration, together with detailed guidelines on
how the test should be administered, such as when to distribute test papers or
answer booklets. The test is administered at the same time throughout China and
the listening section is broadcast on a specified radio channel. After
administration, used test papers and answer booklets are sent back to the
TEM Test Office at Shanghai International Studies University.
As for the oral tests, they are administered in the audio laboratories of the
participating schools, and candidates’ responses are recorded on tapes or
discs, which are sent to Nanjing University, Guangdong University of Foreign
Studies, or Dalian University of Foreign Languages, as assigned by the TEM Oral
Test Office at Nanjing University.
In TEM4 and TEM8, the objective multiple-choice items are marked
by machine, and there is no penalty for wrong answers. The subjective items are
scored by well-trained raters according to carefully prepared marking criteria.
Detailed marking schemes are developed for TEM4 dictation and for TEM8 mini-lecture
and proofreading, to determine the marks deducted for each type of mistake.
Moreover, analytic marking is adopted for writing and translation. Writing
performance is judged against the criteria of ideas and argument (50%, including content,
effectiveness and organization), language use (40%, including correctness, richness,
appropriateness and fluency) and mechanics (10%, including spelling, punctuation,
neatness, and so on), while translation performance is evaluated against the
criteria of faithfulness (60%, faithfulness to the original text, both in content and in
style) and fluency (40%, correctness in language and elegance in expression).
TEM4-Oral and TEM8-Oral adopt a mixed marking approach. The three tasks
are marked separately, 20% for each, according to the marking scheme developed
for each task, and the remaining 40% is awarded for the quality of language: 20% for
pronunciation and intonation and 20% for grammar and vocabulary. The performance of each
candidate is marked by two raters independently.
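The analytic weightings described above amount to a weighted average of criterion scores. The following is a minimal sketch of that arithmetic, not the procedure actually used by TEM raters: the function and key names are hypothetical, each criterion is assumed to be scored on a 0-100 scale, and how the two raters' marks are combined is left out because the text does not specify it.

```python
# Weights (in percent) are copied from the marking description above.
# All names are hypothetical; sub-scores are assumed to be on a 0-100 scale.
WRITING = {"ideas_and_argument": 50, "language_use": 40, "mechanics": 10}
TRANSLATION = {"faithfulness": 60, "fluency": 40}
ORAL = {
    "task_1": 20,
    "task_2": 20,
    "task_3": 20,
    "pronunciation_and_intonation": 20,
    "grammar_and_vocabulary": 20,
}


def weighted_score(sub_scores, weights):
    """Combine criterion sub-scores (0-100) into one score using percent weights."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must add up to 100")
    if set(sub_scores) != set(weights):
        raise ValueError("sub-scores must cover exactly the weighted criteria")
    for name, value in sub_scores.items():
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100")
    return sum(weights[name] * sub_scores[name] for name in weights) / 100


if __name__ == "__main__":
    essay = {"ideas_and_argument": 80, "language_use": 70, "mechanics": 90}
    print(weighted_score(essay, WRITING))  # 77.0
```

With these invented sub-scores, ideas and argument contribute 40 points, language use 28 and mechanics 9, which shows how strongly the content-related criterion dominates the writing score.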
1.3.4 TEM Test Score Report and Interpretation
The pass score for all four tests of TEM is 60, and candidates scoring above 60
will receive a certificate from NACFLT, on which their level of performance is
reported as excellent (above 80 for TEM4 and TEM8, above 90 for TEM4-Oral
and TEM8-Oral), good (70–79 for TEM4 and TEM8, 80–90 for TEM4-Oral
and TEM8-Oral), or pass (60–69 for TEM4 and TEM8, 60–79 for TEM4-Oral and
TEM8-Oral). Candidates scoring below 60 receive no certificate. The test scores
are not reported to the test takers, but they can check their composite test scores for
TEM4 and TEM8 through the Academic Affairs Office in their schools. Scores of
TEM4-Oral and TEM8-Oral are not reported.
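The reporting bands can be expressed as a simple lookup. The sketch below is illustrative only: the function name is hypothetical, and the treatment of boundary scores is an assumption, since the text gives "above 80" alongside "70–79" for the written tests and "above 90" alongside "80–90" for the oral tests. Here a written score of exactly 80 counts as excellent, an oral score of exactly 90 counts as good, and a score of exactly 60 counts as a pass.

```python
def tem_band(score, oral=False):
    """Map a composite score (0-100) to a certificate level, or None if no certificate.

    Boundary handling is an assumption (see the accompanying text).
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score < 60:
        return None  # below the pass score: no certificate
    if oral:
        # TEM4-Oral / TEM8-Oral: pass 60-79, good 80-90, excellent above 90
        if score > 90:
            return "excellent"
        if score >= 80:
            return "good"
        return "pass"
    # TEM4 / TEM8 written tests: pass 60-69, good 70-79, excellent from 80
    if score >= 80:
        return "excellent"
    if score >= 70:
        return "good"
    return "pass"


if __name__ == "__main__":
    print(tem_band(85))             # excellent
    print(tem_band(85, oral=True))  # good
```

The same composite score of 85 therefore falls into different bands depending on whether it comes from a written or an oral test.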
Of the four tests in the battery, only TEM4 is required of all undergraduate
English majors, while TEM8 and the two oral tests are optional. Decisions concern-
ing penalties for those failing TEM4 are left in the hands of the participating univer-
sities (Jin and Fan 2011).
As stated in the Test Syllabus, TEM results can be used to evaluate English
teaching and learning in the light of the Teaching Syllabus, and to provide feedback
to teaching and learning. However, due to the increasing social recognition of TEM
as an important benchmark of English proficiency, more and more organizations in
China are using TEM results to select and recruit employees; some schools do not
confer a bachelor’s degree on those who have failed to obtain TEM certificates; and
teachers’ evaluation is connected with students’ performance in TEM in some
schools (Jin and Fan 2011).
The increasing social stakes of TEM will have a twofold effect. On the one hand, more
importance will be attached to TEM, which may enhance its facilitative role in
English teaching and learning. On the other hand, the extensive social weight given to
TEM may lead to overly test-oriented English teaching and learning among
undergraduate English majors, which might be harmful.
1.4 TEM Test Characteristics vs. the Teaching Syllabus
Since the purpose of TEM is to evaluate English teaching and learning in the light
of the Teaching Syllabus, TEM test characteristics should comply with the Teaching
Syllabus. Therefore, this section will first introduce the Teaching Syllabus, and then
make a comparison between the Teaching Syllabus and TEM test characteristics.
1.4.1 The Teaching Syllabus
The Teaching Syllabus is designed to standardize education for undergraduate
English majors throughout China, and may serve as guidelines for the development
of English programs, curricula and textbooks.
As stated in the Teaching Syllabus, the aim of college education is to cultivate
students with solid language knowledge and profound cultural awareness who can
work as teachers, translators, researchers and so on in the fields of foreign affairs,
education, economy and trade, culture, etc. To this end, the courses offered in
undergraduate English programs fall into three main categories: courses on language
skills (phonetics, grammar, listening, speaking, writing, translation, oral
interpretation, fundamental English, advanced English, and so on), courses on general
knowledge of English (linguistics, English literature, English culture, and so on), and
courses on related knowledge (foreign affairs, economics and trade, law, and so on).
Compulsory courses recommended for the foundation stage and advanced stage are:
1. Foundation stage: fundamental English, phonetics, listening, speaking, reading,
writing, grammar, and culture of English-speaking countries;
2. Advanced stage: advanced English, writing, oral interpretation, translation, lin-
guistics, British literature, American literature, and academic writing.
Based on the two stages, the requirements for English teaching and learning are
further divided into eight grades, one grade for each semester, and English proficiency
is divided into ten subcategories: pronunciation, grammar, vocabulary, listening,
speaking, reading, writing, translation (including translation and oral
interpretation), use of reference books (including dictionaries, encyclopedias, Internet
resources, and so on), and cultural awareness (familiarity with Chinese culture and
the culture of English-speaking countries). TEM4 and TEM8 correspond to the required
levels at the end of the fourth and eighth semesters, and the ten subcategories
outline the construct of TEM.
1.4.2 Comparison Between the Teaching Syllabus and TEM Characteristics
TEM is designed to measure the implementation of the Teaching Syllabus, and the
four tests in the TEM battery assess nine out of the ten subcategories of language
abilities listed in the Teaching Syllabus. The subcategory of “cultural awareness”
has been assessed since 2005, when a new section of “General Knowledge” was
included in TEM8. The subcategory of “use of reference books” is not included
because this skill aims at cultivating students’ ability to learn by themselves and do
research, which can be measured by the writing of term papers and the bachelor’s
degree paper. Table 1.5 lists the required language abilities in the Teaching
Syllabus and how they are measured in TEM.
From Table 1.5, it can be concluded that most language requirements in the
Teaching Syllabus are measured in TEM with several items and various test meth-
ods in their corresponding stages. Direct testing methods are employed to measure
writing, speaking, translation and oral interpretation. Listening, grammar and
vocabulary are assessed with various testing methods, both objective multiple-
choice questions and subjective methods of dictation, gap-filling, error-correction,
and so on. Pronunciation, though not tested directly in any item, is adopted as an
essential marking criterion which accounts for 20% of the composite score of
TEM4-Oral and TEM8-Oral. Reading and cultural awareness are the only two abili-
ties employing multiple-choice questions as the sole testing method.
From the above analysis, it can be inferred that TEM characteristics, generally
speaking, overlap with its construct stated in the Teaching Syllabus. However, pos-
sible mismatches lie in the following aspects:
1. Using multiple-choice questions as the sole testing method to measure language
abilities such as reading and cultural awareness will lead to construct-irrelevant
variance, which is caused by the possible guessing factor involved in multiple-
choice questions. Such variance will contaminate the interpretation of the test
score and may encourage test-taking technique training.
2. The small test population of TEM oral tests may lead to construct
under-representation. The number of TEM4-Oral and TEM8-Oral test takers is around
20,000 and 10,000 respectively, which is only 10% of the candidates for the TEM4
and TEM8 written tests. However, speaking is a required ability in the Teaching
Syllabus and should carry the same weight as the other abilities. The lack of
oral tests for most candidates may lead to less importance being attached to
speaking and, in turn, to candidates’ lower speaking ability.
3. The misuse of TEM results is also a possible cause of construct-irrelevant
variance. As mentioned in Sect. 1.3.4, TEM results, as stated in the Test Syllabus, are
meant to measure the English proficiency of undergraduate English majors and to
find out whether they have met the required level specified in the Teaching
Syllabus. However, with its increasing social weight, TEM scores are used by
some organizations and schools to recruit employees, rank schools and evaluate
teachers. These uses of TEM results have not been justified, and may
lead to negative washback effects.
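The guessing factor mentioned in the first point can be made concrete with a small calculation. The sketch below assumes four options per multiple-choice question and purely random guessing; neither assumption comes from the text, and the function names are hypothetical. The item count of 20 matches the number of TEM8 reading items listed in Table 1.4.

```python
from math import comb


def expected_correct(n_items, n_options):
    """Expected number of items answered correctly by random guessing alone."""
    return n_items / n_options


def prob_at_least(n_items, n_options, threshold):
    """Binomial probability of guessing at least `threshold` items correctly."""
    p = 1 / n_options
    return sum(
        comb(n_items, k) * p**k * (1 - p) ** (n_items - k)
        for k in range(threshold, n_items + 1)
    )


if __name__ == "__main__":
    # Assumed: 20 items with 4 options each.
    print(expected_correct(20, 4))   # 5.0
    print(prob_at_least(20, 4, 12))  # chance of reaching 12 of 20 by guessing
```

Under these assumptions, a quarter of the items would be answered correctly by chance on average, so part of a multiple-choice score reflects guessing rather than the reading or cultural knowledge the section is meant to measure.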
Table 1.5 A comparison between the teaching syllabus and TEM characteristics
Category | Courses | Stage | TEM item | Test method
Pronunciation | Phonetics | F | TEM4-oral, TEM8-oral | A marking criterion
Grammar | Grammar, fundamental English, advanced English | F | TEM4: Grammar and vocabulary | Multiple-choice
Grammar | | A | TEM8: Proofreading | Error identification and correction
Grammar | | | TEM4-oral, TEM8-oral | A marking criterion
Vocabulary | Fundamental English, advanced English | F | TEM4: Grammar and vocabulary | Multiple-choice
Vocabulary | | A | TEM8: Proofreading | Error identification and correction
Vocabulary | | | TEM4-oral, TEM8-oral | A marking criterion
Listening | Listening, fundamental English, advanced English | F | TEM4: Dictation, listening | Dictation, multiple-choice
Listening | | A | TEM8: Mini-lecture, listening | Note-taking and gap-filling
Speaking | Speaking, fundamental English, advanced English | F | TEM4-oral: Retelling, presentation, conversation | Retelling, presentation, conversation
Speaking | | A | TEM8-oral: Presentation | Presentation
Reading | Reading, fundamental English, advanced English | F | TEM4: Reading | Multiple-choice
Reading | | A | TEM8: Reading | Multiple-choice
Writing | Writing, fundamental English, advanced English | F | TEM4: Note writing, composition writing | Writing
Writing | | A | TEM8: Writing | Writing
Translation | Translation, oral interpretation, fundamental English, advanced English | F | TEM8: Translation | Translation (C-E, E-C)
Translation | | A | TEM8-oral: Oral interpretation | Oral interpretation (C-E, E-C)
Culture | English culture, fundamental English, advanced English | F | TEM8: General knowledge | Multiple-choice
Culture | | A | |
Note: F = Foundation stage; A = Advanced stage; E-C = from English to Chinese; C-E = from
Chinese to English.
1.5 Summary
This chapter provides an overview of TEM, its context, history and characteristics.
Moreover, a comparison is made between TEM characteristics and the Teaching
Syllabus. Several key points are summarized below.
First, in the past 20 years, TEM has developed into a high-stakes national test
with extensive social recognition. The test results are used not only to examine the
implementation of the Teaching Syllabus, but also to recruit employees, rank
schools and evaluate teachers. Therefore, it is imperative to conduct research on
the washback effects of TEM.
Next, there seems to be an endless list of TEM stakeholders, but within the
educational system, TEM stakeholders can be roughly divided into three levels: policy
makers, Teaching Syllabus designers and TEM constructors at the national level,
administrators at the school level, and teachers and students at the classroom
level. These are the major participants in the current study.
Last, as a criterion-referenced test, TEM is not based on any set of textbooks or
any particular language theory, but adheres to the Teaching Syllabus, which
outlines the construct of TEM. A comparison between TEM characteristics and the
Teaching Syllabus shows that TEM characteristics generally overlap with its
construct, which may lead to positive washback. However, mismatches also exist in the
test content, method and use, which might be a source of negative impact.
Chapter 2
Washback: Definitions and Dimensions
2.1 Definitions of Washback
Tests influence teaching and learning. A test, if well designed and appropriately
employed, may not only help candidates develop a positive attitude and strong
motivation for learning, but also provide them with timely information on their strengths
and weaknesses, thus accelerating their learning process. Teachers can
also benefit from the feedback the test offers, which makes it possible for
them to adjust their teaching accordingly. However, if poorly designed, a test may
not only provide false information, but also lead teaching and learning astray. Such
influence is termed washback.
2.1.1 Washback or Backwash
In the field of applied linguistics, washback and backwash are both used and generally
considered interchangeable (Alderson 2004; Hughes 2003). However, a dictionary
search shows that “washback” is rarely listed, whereas “backwash” can
be found, meaning either “a backward flow of water, caused by an oar, wave,
etc.” or “the bad situation that remains after something bad has happened” (Longman
Dictionary of Contemporary English). The negative connotations of these two
definitions seem to suggest that the effects of testing on teaching and learning are more
likely to be negative than positive. According to Spolsky (1996), the word “backwash”
is often applied to unintended and negative effects, and until the 1980s it
generally carried a negative meaning in the field of applied linguistics.
However, the word “washback” has gained currency and is now more commonly
used in applied linguistics (Cheng et al. 2004; Green 2007; Hawkey 2006).
Attitudes towards it have also become neutral, with many studies aiming at promoting
beneficial washback (Alderson and Wall 1993; Gu 2007; Hughes 2003). For this reason,
the term “washback” will be used throughout this study, except when quoting
from other authors.
https://doi.org/10.1007/978-981-13-1963-1_2
2.1.2 Various Terminology
In applied linguistics, washback is generally defined as the effect of testing on
teaching and learning, which can be harmful or beneficial (Hughes 2003). Messick
(1996: 241) further specified it as “the extent to which the introduction and use of a
test influences language teachers and learners to do things they would not otherwise
do that promote or inhibit language learning”.
The effect can be so powerful that it may drive teaching and learning,
thus leading to the terms “teaching to the test” (Madaus 1988) and “measurement-driven
instruction” (Popham 1987). Consequently, “curriculum alignment” (Smith 1991;
Resnick and Resnick 1992) was proposed, because the test content and
format must overlap with those of the curriculum if a test is to drive teaching in the
desired direction. To make better use of the guiding role of testing in teaching, a new or
revised test can be introduced with the aim that teaching and learning will improve
accordingly, which is termed “systemic validity” (Frederiksen and Collins 1989).
The above terms all focus on the role of testing in teaching, while “impact” is used to
differentiate influence at the macro level from influence at the micro level. Impact refers to the
influence of testing on society, educational systems and the individuals within these
systems, while washback occurs at the micro level, affecting only individuals, mainly
teachers and students (Bachman and Palmer 1996; Hamp-Lyons 1998; McNamara
1996; Shohamy 2001). Later, impact was termed “consequence” in the Assessment Use
Argument developed by Bachman and Palmer (2010). Accordingly, the term “washback”
is used in the present study, because the research mainly focuses on the
effects of TEM on English teaching and learning, which are at the micro level.
2.1.3 Washback and Validity
Washback is such an essential criterion for evaluating a test that some researchers
have suggested it should be considered an aspect of validity, coining the term “washback validity”
(Frederiksen and Collins 1989; Morrow 1986; Weir 1990). Messick (1989, 1996)
further specified it by proposing “six distinguishable aspects of construct validity”,
locating washback within the theoretical concept of consequential validity.
The consequential aspect of construct validity includes evidence and rationales for evaluat-
ing the intended and unintended consequences of score interpretation and use in both the
short- and long-term, especially those associated with bias in scoring and interpretation,
with unfairness in test use, and with positive or negative washback effects on teaching and
learning. (1996: 251)
However, this definition is disputed. Ferman (2004) suggested that it is difficult
to see a direct connection between washback and validity because validity is not a
property of the test, but rather of the meaning of the test scores. Alderson and Wall
(1993) pointed out that the complexity of the washback phenomenon makes it
impossible to relate it directly to validity. Davies (1997: 335) shared the same view: “the
apparent open-ended offer of consequential validity goes too far…it is not possible
for a tester as a member of a profession to take account of all possible social
consequences.”
Messick (1996: 251) also recognized the complexity, since washback involves “a
function of numerous factors in the context or setting and in the persons responding
as well as in the content and form of the test”. Therefore, he advised that “rather
than seeking washback as a sign of test validity, seek validity by design as a likely
basis for washback” (1996: 252).
However, despite the conflicting views, there seems to be little disagreement
over the importance of investigating washback when evaluating a test.
In conclusion, the above discussion on washback definitions can be summarized
as follows:
1. Washback refers to the effects of testing on teaching and learning;
2. Washback can be so powerful that it may direct teaching and learning;
3. A test may affect not only teaching and learning, but also the educational system,
and even the society as a whole;
4. Washback is an essential aspect to consider when evaluating a test.
2.2 Dimensions of Washback
As many studies on washback attest, one of its key features is that it is highly com-
plex (Bailey 1996; Cheng 1997; Watanabe 1996, etc.). Therefore, to better analyze
this phenomenon, it is necessary to investigate washback from different dimensions,
so that we may look into its nature from various aspects. Based on the previous
research (Bachman and Palmer 2010; Green 2007; Hawkey 2006; Watanabe 2004),
six main dimensions are to be introduced here, namely, direction, extent, intensity,
intentionality, length and specificity.
2.2.1 Direction
Washback can be either positive or negative. A test, if well designed and appropri-
ately used, may facilitate teaching and learning, providing useful information and
generating strong motivation. This is positive washback. On the other hand, if
poorly designed and inappropriately used, a test may lead to negative washback.
Whether a test will generate positive or negative washback is the focus of most
washback studies, with many researchers proposing various ways to promote posi-
tive washback (Bailey 1996; Gu 2007; Hughes 2003; Kellaghan and Greaney 1992),
which can be summarized into the following three main aspects:
1. To improve the test, such as using direct and authentic testing, sampling widely,
reflecting the full curriculum, employing various testing formats, and so on;
2. To support the stakeholders, ensuring that they have adequate knowledge and
resources to meet test demands, and that they take a desirable attitude towards the
test;
3. To enhance communication between test designers and stakeholders, such as
providing timely test reports, offering teacher training, and so on.
2.2.2 Extent
A test may influence students, teachers, school administrators, textbook writers, etc.
Therefore, in a washback study, it is important to first determine the research extent,
whether it is the school context or the educational system as a whole. Generally
speaking, the more important a test is, the greater its washback effects will be. A
classroom achievement test may affect a certain class only; while a high-stakes test,
such as a nation-wide selection exam, the National Matriculation Test, may affect
not only the teachers and students involved, but also the related parents, textbook
writers, and even the whole society.
Hence, in a washback study, the extent of the washback effects shall be investi-
gated, so that a better overview of the study can be achieved.
2.2.3 Intensity
Washback can be either strong or weak. If the impact is strong enough, it can
determine much of classroom teaching, such as what to teach and how to
teach, and even stakeholders’ attitudes to teaching and learning. The intensity of
washback is often related to the stakes of a test; that is, the more important the test, the
more intense its washback. Undoubtedly, an important test like the National
Matriculation Test will affect teaching and learning more than a classroom
achievement test. Therefore, a test with more intense washback tends to attract more
attention from researchers, and most washback studies are conducted on
high-stakes tests like TOEFL, IELTS and the National Matriculation Test (Cheng 2005;
Green 2007; Alderson and Hamp-Lyons 1996; Hawkey 2006; Qi 2004; Wall 2005;
Watanabe 1996).
2.2.4 Intentionality
Washback can be either intended or unintended. Intended washback refers to the
effects that test designers can predict or want to encourage. For example, the
intended washback effects of the National Matriculation Test are to motivate students to
learn, to provide feedback, and so on. Moreover, if some new test items
are introduced, the intended washback may be to improve teaching and learning
through the guiding effects of the test. However, a test may also bring about some
unintended effects, such as anxiety among students, long-term training in test-taking
techniques, etc.
Therefore, in a washback study, extra efforts should be made to investigate the
unintended washback that may have occurred in actuality.
2.2.5 Length
Washback may exist for a short time or last for a long time. Effects such as
motivating students to learn may disappear shortly after the test is over, while the learning
habits developed when preparing for the test may last a lifetime. Thus, longitudinal
studies should be carried out to investigate not only the immediate effects of a test,
but also its influence in the long run.
2.2.6 Specificity
Washback can be either general or specific. General washback refers to the effects
that may be generated by any test, such as motivating students to learn and provid-
ing feedback information; while specific washback relates only to a certain test or
test type, such as the washback effects of multiple-choice phonetics questions.
Therefore, when investigating washback, attention should be paid to exploring the
specific effects of a test, apart from reviewing the general washback.
Among the dimensions listed above, the most frequently studied is direction, that is,
whether washback is positive or negative, while the rest of the dimensions are
seldom looked into. However, they can also provide researchers with valuable
information and fresh insight into the role of testing in teaching and learning. Therefore,
viewing washback from different dimensions may be a necessary supplement to the
definition of washback and an important step forward in washback studies, which should
contribute to the improvement and systematization of research in this field.
2.3 Summary
This chapter has reviewed the definitions and dimensions of washback effects,
which suggests that different dimensions should be investigated in a washback
study. It is essential to look into its direction, whether it is positive or negative, or
whether its positive washback outweighs its negative washback. However, besides
direction, other dimensions can also be explored, such as whether a test has realized
its intended washback, whether the washback effects will last a long time or disap-
pear when the test is over, and so on. Therefore, in a washback study, more efforts
can be made to probe into the other dimensions of washback effects, to obtain a
more comprehensive picture of the phenomenon.
Chapter 3
Theoretical Frameworks for Washback
Studies
3.1 Frameworks for Washback Studies
The early 1990s witnessed a great step forward in washback studies—a shift from
making assertions to establishing theoretical frameworks (Gu 2007). Alderson and
Wall (1993) were the first to develop a framework—Fifteen Washback Hypotheses.
Following them, many other scholars have proposed different models. Here, some
of the significant ones will be analyzed in detail, namely, Hughes’ (1993) Washback
trichotomy, Bailey’s (1996) basic model of washback and Green’s (2007) model of
washback, incorporating intensity and direction.
3.1.1 Alderson and Wall’s Washback Hypotheses
In “Does Washback Exist?”, Alderson and Wall (1993) provided a thorough and
comprehensive framework to illustrate the washback effects of language testing.
Based on the most general Washback Hypothesis “a test will influence teaching”,
they further developed fourteen hypotheses:
1. A test will influence teaching.
2. A test will influence learning.
3. A test will influence what teachers teach; and
4. A test will influence how teachers teach; and therefore by extension from 2 above:
5. A test will influence what learners learn; and
6. A test will influence how learners learn.
7. A test will influence the rate and sequence of teaching; and
8. A test will influence the rate and sequence of learning.
9. A test will influence the degree and depth of teaching; and
10. A test will influence the degree and depth of learning.
11. A test will influence attitudes to the content, method, etc. of teaching and learning.
12. Tests that have important consequences will have washback; and conversely
13. Tests that do not have important consequences will have no washback.
14. Tests will have washback on all learners and teachers.
15. Tests will have washback effects for some learners and some teachers, but not for
others. (Alderson and Wall 1993: 120–121. original text in bold as indicated.)

https://doi.org/10.1007/978-981-13-1963-1_3
The hypotheses proposed by Alderson and Wall provide us with a detailed
description of washback effects, clarifying this vague concept. When classifying the
Fifteen Washback Hypotheses, Alderson and Wall tried to make a distinction
between teaching (1, 3, 4, 7, 9 and 11) and learning (2, 5, 6, 8, 10 and 11), content
(3 and 5) and method (4 and 6), and practice (3–10) and attitude (11). Furthermore,
the last four hypotheses (12–15) were raised to acknowledge differences between
tests and participants.
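The grouping just described can be written down as a small lookup table, which may help when coding questionnaire or interview items against the hypotheses. The sketch below is only illustrative: the names HYPOTHESIS_GROUPS and groups_for are hypothetical and do not come from Alderson and Wall (1993); the group memberships are taken from the classification given above.

```python
# Illustrative encoding of the classification described in the text.
# HYPOTHESIS_GROUPS and groups_for are hypothetical names, not part of
# Alderson and Wall (1993).
HYPOTHESIS_GROUPS = {
    "teaching": {1, 3, 4, 7, 9, 11},
    "learning": {2, 5, 6, 8, 10, 11},
    "content": {3, 5},
    "method": {4, 6},
    "practice": set(range(3, 11)),  # hypotheses 3-10
    "attitude": {11},
    "test_and_participant_differences": {12, 13, 14, 15},
}


def groups_for(hypothesis: int) -> list[str]:
    """Return, in alphabetical order, every group containing the hypothesis."""
    if not 1 <= hypothesis <= 15:
        raise ValueError("hypothesis number must be between 1 and 15")
    return sorted(
        name for name, members in HYPOTHESIS_GROUPS.items() if hypothesis in members
    )


if __name__ == "__main__":
    # Hypothesis 11 concerns attitudes to both teaching and learning.
    print(groups_for(11))
```

Looking up hypothesis 11, for example, returns the attitude, learning and teaching groups, which reflects the fact that it is the only hypothesis counted under both teaching and learning.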
According to Alderson and Hamp-Lyons (1996: 282), the purpose of raising
these hypotheses is not to argue for or against any of them, but to “lay out the
territory” so that the questions for further studies on washback can be more clearly seen.
Moreover, these Fifteen Washback Hypotheses constitute an open and adaptable
framework, since it is highly likely that the research context is more complex than
the fifteen hypotheses proposed, so the nature of the test and the educational context
should be taken into account (Alderson and Wall 1993).
Therefore, in their 1996 article, Alderson and Hamp-Lyons (1996: 296) revisited
the Washback Hypotheses, suggesting an expansion of the Fifteenth Washback
Hypothesis: “Tests will have different amounts and types of washback on some
teachers and learners than on other teachers and learners.” Moreover, the amount
and type of washback will vary according to the status of the test, the extent to
which the test is counter to current practice, the extent to which teachers and text-
book writers think about appropriate methods for test preparation and the extent to
which teachers and textbook writers are willing and able to innovate.
3.1.2 Hughes’ Washback Trichotomy
While Alderson and Wall (1993) tried to “lay out the territory” of washback ques-
tions to be addressed in future research, Hughes’ (1993) attempt was to construct a
model of washback mechanism. Hughes (1993: 2) proposed, “In order to clarify our
thinking on backwash, it is helpful, I believe, to distinguish between participants,
process and product in teaching and learning, recognizing that all three may be
affected by the nature of a test”.
According to Hughes (1993:2), participants are “all of whose perceptions and
attitudes towards their work may be affected by a test”, such as teachers, students,
school administrators and material writers. Process refers to “any actions by the
participants which may contribute to the process of learning”, such as syllabus
design, material development, the selection of teaching content and methods, etc.
Last, product is “what is learned (facts, skills, etc.) and the quality of learning (flu-
ency, etc.)”. He further states:
The trichotomy into participants, process and product allows us to construct a basic model
of backwash. The nature of a test may first affect the perceptions and attitudes of the
participants towards their teaching and learning tasks. These perceptions and attitudes in
turn may affect what the participants do in carrying out their work (process), including
practicing the kind of items that are to be found in the test, which will affect the learning
outcomes, the product of that work. (1993:2)
Besides, Hughes added that the following five conditions were necessary for the
washback effects to occur:
1. Success on the test must be important to the learners;
2. Teachers must want their learners to succeed;
3. Participants must be familiar with the test and understand the implications of its
nature and content;
4. Participants must have the expertise which is demanded by the test (including teach-
ing methods, syllabus design and materials writing expertise);
5. The necessary resources for successful test preparation must be available. (1993:
2–3)
3.1.3 Bailey’s Basic Model of Washback
Combining Alderson and Wall’s Washback Hypotheses with Hughes’ Washback
Trichotomy, Bailey (1996) put forward a basic model of washback.
In the basic model, Bailey not only listed the possible participants and products,
but also illustrated the direction of influence. Washback effects are from tests to
participants, and then, to products; while “washforward” effects are possible influ-
ences from the participants on the test. Moreover, she further distinguished between
“washback to the learners” and “washback to the program”, with “the program”
covering not only teachers, but all the other participants apart from the students.
“Washback to the learners” refers to the effects of “test-derived information pro-
vided to the test-takers and having a direct impact on them” (Bailey 1996: 263),
which includes five of Alderson and Wall’s Washback Hypotheses (2, 5, 6, 8 and
10). Moreover, Bailey (1996: 264) listed 10 processes that might be involved, such
as practicing items similar to those on the test and applying test-taking strategies.
On the other hand, “washback to the program” is “results of test-derived
information provided to teachers, administrators, curriculum developers, counselors,
etc.” (1996: 264), which are directly related to Washback Hypotheses 1, 3, 4, 7 and
9. However, no processes are specified, perhaps because of the complicated nature of
the program, which seems to cover an unlimited range.
3.1.4 Green’s Washback Model
In his washback study on IELTS, Green (2007) proposed a model incorporating
two dimensions of washback: direction and intensity.
In his model, Green pointed out that washback direction is mainly determined by
the quality of the test: “the more closely the characteristics of the test reflect the
focal construct as understood by course providers and learners (the greater the
overlap), the greater the potential for positive washback” (2007: 14). Moreover, test use
and test stakes, or the perceived consequences of test scores, will also affect the
direction of washback.
Meanwhile, test stakes and test difficulty, as perceived by the stakeholders, will
influence the intensity of washback, with important and challenging (but attainable)
tests exerting the most intense washback.
However, the washback effects of the same test may vary from participant to
participant because of their different experiences and beliefs, such as their knowl-
edge and understanding of test demands, resources to meet test demands and accep-
tance of test demands.
In sum, from the above models, it can be concluded that when investigating
washback, three questions are to be answered—what, how and why. First, what will
be affected by a test? According to Alderson and Wall (1993), a test will influence
both teaching and learning, such as the content and method, the rate and sequence,
and the degree and depth. Next, how will washback effects work? According to
Hughes (1993) and Bailey (1996), the washback mechanism consists of three
parts—participants, processes and products. The participants, such as teachers, stu-
dents, school administrators and material writers, will first be affected by the test—
their perceptions of and attitudes to the test will influence their concept of language
teaching and learning. Consequently, the teaching and learning processes will
change, and the learning products will differ. Last, why would a test exert washback
effects? According to Green (2007), the direction of washback is related to the over-
lap between test construct and test characteristics; while the intensity of washback
effects is determined by test stakes and test difficulty. Moreover, the washback
effects of the same test may vary from participant to participant because of their
different experiences and beliefs. In short, washback effects are determined by
both the test, such as its validity, stakes and difficulty, and the stakeholders, such as
their values and characteristics.
3.2 Empirical Studies on Washback
Although washback has long been a concept under discussion (Cronbach 1963;
Fredericksen 1984; Latham 1877), empirical studies on the washback effects of
language testing have only a short history of no more than 30 years. Since the late
1980s, washback has gained importance in language testing and there has been a
rapid increase in the number of empirical studies conducted. As a result, Language
Testing, a renowned academic journal in the field, published a special issue on
washback effects in 1996, reporting the latest developments in washback theory and
practice at that time, such as Alderson and Hamp-Lyons’ washback study on
TOEFL, Watanabe’s study on the University Entrance Exam in Japan, Wall’s study
on the O-level exam in Sri Lanka and Shohamy’s study on Arabic and English tests in
Israel. Moreover, in the 2000s, Cambridge University Press published four books on
washback studies (Cheng 2005; Green 2007; Hawkey 2006; Wall 2005), which
indicates that research on washback has advanced into a new era, becoming system-
atic and comprehensive.
In mainland China, the first washback study on language testing might be the one
conducted by Li (1990) on the Matriculation English Test (MET). In the following two
decades, more research was done on high-stakes nation-wide tests, such as
MET, CET and TEM, with CET being the most widely investigated, as indicated by
a search in the CSSCI¹ journals (Cai 2009; Jin 2000, 2006; Xiao et al. 2014; Ye
1998; etc.). Moreover, two books (Gu 2007; Qi 2004) have been published on CET
washback and MET washback respectively. Comparatively, TEM washback is less
studied, with only three articles (Qin 2012; Xu 2012; Zou and Xu 2014) published
in CSSCI journals by the end of 2015. Therefore, it is necessary to conduct more
empirical research on TEM washback, to better understand the role of TEM in
English teaching and learning, so that positive effects can be achieved.
3.2.1 Summary of Empirical Studies
To better clarify the research content and process of washback studies, some of the
large-scale and influential empirical studies on the washback of language testing are
to be introduced, with the following table summarizing the basic information such
as research participants, methods and findings. The following six studies are selected
because books have been published on them, providing detailed information on the
implementation of these studies.
From Table 3.1, it can be concluded that most of the tests studied are large-scale,
high-stakes tests, which mainly fall into two categories—selection tests and profi-
ciency tests.
1. Selection tests: NMET (National Matriculation English Test), HKCEE (Hong
Kong Certificate of Education Examination in English), O-level English Exam.
These tests play such important roles as determining whether candidates will be
allowed to enter higher education.
2. Proficiency tests: IELTS (International English Language Testing System),
PL2000 (PL2000 is a reform project to encourage external certification of learn-
ers’ language proficiency through the examinations of providers such as
Cambridge ESOL), CET (College English Test). These are large-scale profi-
ciency tests, either national or international, which assess candidates’ language
proficiency so that important decisions can be made.
¹ Eleven journals on linguistics (foreign language) included in CSSCI (2014–2015) are searched,
which are Foreign Language Teaching and Research, Foreign Language World, Journal of Foreign
Languages, Modern Foreign Languages, Chinese Translators Journal, Foreign Languages in
China, Foreign Language Research, Foreign Language Education, Foreign Languages and Their
Teaching, Computer-Assisted Foreign Language Education, and Foreign Language Learning
Theory and Practice.
Table 3.1 Overview of the empirical studies on washback

| Author | Test | Participants | Methods | Research findings |
|---|---|---|---|---|
| Qi 2004 | NMET in China | Test constructors; English inspectors; teachers; students | Interviews; classroom observations; questionnaires; document analysis | NMET has largely failed to achieve its intended washback effects; a high-stakes test is a powerful spur to trigger pedagogical responses, but it tends to be ineffective to bring about desirable changes. |
| Cheng 2005 | Revised HKCEE in Hong Kong | Policy makers; textbook publishers; teachers; students | Questionnaires; classroom observations; interviews; document analysis | The revision of a test will lead to efficient washback on teaching materials; teaching content will be affected, but teachers’ attitudes and behaviors will not change much; a change in the test only can hardly fulfill the intended goal of improving teaching and learning. |
| Wall 2005 | O-level test in Sri Lanka | Teachers; teacher advisors | Interviews; questionnaires; classroom observations; document analysis | Tests have washback on what teachers teach, but not how they teach, and there is both positive and negative washback; many factors contribute to or inhibit washback, implying that the nature of curricular innovation is much more complex. |
| Hawkey 2006 | IELTS and PL2000 in Italy | Teachers; students; receiving institution administrators; school heads; parents | Questionnaires; interviews; classroom observations; textbook analysis | IELTS is perceived as a reasonable direct communicative test, using authentic and appropriate content; it is hard but fair and reading and writing are more difficult modules; it is a high-stakes test, causing both anxiety and motivation; PL2000 appears to have positive effect on communicative teaching from teachers’ views, but students’ views differ; it has positive washback on motivation and attitudes, assessment, exam candidacies and performances; it helps school management and provides teachers with professional support. |
| Green 2007 | IELTS | Course providers; teachers; students | Questionnaires; interviews; classroom observations; document analysis; tests | Discrepancies exist between IELTS test and participants’ beliefs about academic writing; IELTS preparation courses include a narrower range of academic writing tasks than other unrelated courses; learners can profit from giving attention to IELTS preparation activities, but the additional benefit is limited. |
| Gu 2007 | CET in China | Conference participants (government officials, researchers, publishers, etc.); teachers; students | Questionnaires; interviews; classroom observations; document analysis; test performance analysis | Most CET stakeholders applaud the test, but some problems are mentioned, such as the overuse of MC format, the lack of direct score reports to the teachers, and so on; College English classroom teaching and learning are mainly regular rather than test-oriented, and most teachers use textbooks and other teaching materials; the coaching materials do not have reliability and validity, and test-oriented practice leads to negative effects on teaching and learning; CET writing and speaking tests have exerted positive washback on teaching and learning; CET is an effective means of measurement, but it is just one of the factors that influence college English teaching and learning. |

Washback studies are often carried out on these large-scale high-stakes tests
because they tend to exert great impact on language teaching and learning.
Qi (2004) investigated the washback effects of NMET, one of the three compul-
sory tests (the other two are Chinese and mathematics) of the college entrance
examinations in China. As a nation-wide high-stakes test, NMET is expected to
induce intended washback effects of reforming teaching in secondary schools, shift-
ing its focus from language form to language use. In her study, Qi used interviews,
questionnaires and classroom observations to collect data from 8 test constructors,
6 English inspectors, 388 teachers and 986 students. The research findings indicated
that NMET failed to achieve its intended washback because of the overwhelming
functions it was expected to perform—the function of selection imposed constraints
on test design and development, and the function of evaluation boosted test stakes
and stimulated intensive test preparation. As a result, Qi came to the conclusion that
“tests whose stakes are too high will fail to exert the intended washback effect
although high-stakes tests have the potential to do so” (2004: 297).
Cheng (2005) examined the washback of the revised HKCEE, which was meant
to cause changes in classroom English teaching in secondary schools. She collected
data both before and after the revised test was introduced (baseline and follow-up
data), so that a comparison could be made. In her study, Cheng developed a three-level
working model, which was carried out in three phases, investigating (1) decision-making
agencies at the top, such as the HKEA (Hong Kong Examinations
Authority); (2) intervening agencies such as textbook publishers and tertiary institu-
tions; and (3) implementing agencies such as teachers and students in secondary
schools. The results show that the revision of a test will have washback on the teach-
ing content, but teachers’ attitudes and behaviors will not change much; therefore
the change in the test can hardly fulfill the goal of educational innovation and what
actually gets taught is far more complicated than what a test can determine.
Wall (2005) explored the washback effects of a new O-level English exam in Sri
Lanka, which, together with a new textbook series, was introduced to bring about
innovations in English classrooms. Both baseline and follow-up data were collected
through interviews and classroom observations. After careful comparison, Wall
came to the conclusion that what teachers teach will be affected more than how
teachers teach.
In his book, Hawkey (2006) reported two washback studies, one on IELTS and
the other on PL2000. The IELTS impact research is part of the IELTS validation
systems, which will contribute to the monitoring and test development process of
IELTS. Interviews, questionnaires and classroom observations were conducted
among candidates, teachers and receiving institution administrators, which led to
generally favorable conclusions: IELTS is perceived as a reasonable direct communicative
performance test, using authentic and appropriate content; IELTS is fair,
but hard, especially the Reading and Writing modules; IELTS is a high-stakes test, causing
motivation and moderate anxiety. The other washback study was conducted on
PL2000, a language education reform project in Italy, which aimed to provide for-
eign language education in state schools to meet the communication and certifica-
tion needs of students. One of its key policies was to encourage the external
certification of learners’ language proficiency through the examinations of providers
such as Cambridge ESOL. Thus, Cambridge ESOL was encouraged to carry out
a PL2000 impact study, collecting data from students, teachers, parents, and so on,
to gather their perceptions and evaluations of this project. The research findings are
mainly positive: PL2000 appears to have positive effects on motivation and atti-
tudes, communicative teaching, school management, etc., although students’ views,
to some extent, differ from teachers’ views on communicative teaching.
Green (2007) focused on the washback of IELTS on academic writing. In his
study, Green compared IELTS preparation courses and unrelated academic writing
courses, through interviews, questionnaires, classroom observations and several
tests administered at course entry and exit. The research findings show that discrep-
ancies exist between IELTS writing test and participants’ beliefs about academic
writing, and IELTS preparation courses seem to include a narrower range of aca-
demic writing tasks than other courses. Besides, although learners can profit from
giving attention to IELTS preparation activities, the additional benefit is limited.
Gu (2007) discussed the washback effects of CET on college English teaching
and learning. A wide range of stakeholders (about 4500 in total), such as
administrators, teachers and students, were investigated, through various research
methods of interviews, questionnaires and classroom observations. The research
findings are mainly positive: most of the stakeholders think highly of the test,
believing that the positive washback is much greater than the negative washback,
and that the negative washback is primarily due to the misuse of the test by test
users. At the end of the study, various suggestions are made to teachers, students,
administrators and test constructors, to promote positive washback of CET.
3.2.2 A Critique of Approaches in Washback Studies
From the above studies, it can be concluded that washback studies are mostly large-
scale, multi-method, multi-phase and longitudinal.
Washback studies are large-scale ones because the tests investigated are often
large-scale and high-stakes tests which affect a wide range of stakeholders.
Therefore, in terms of participants, all the above studies look into washback on
teachers or students, since they are the major stakeholders. Besides, many other
stakeholders are investigated, such as test constructors, school administrators, textbook
writers and parents. For a high-stakes test, the range of stakeholders seems almost
unlimited, so the selection of participants is a vital problem, one that is settled
mainly by the research purpose and practicality. For instance, if the intentionality
of washback is to be explored, it is necessary to investigate not only teachers
and students, but also test constructors, so that both the intended and unintended
washback can be identified.
Various methods have been employed in washback studies, among which ques-
tionnaires, interviews and classroom observations are the most frequently used.
Since washback is a complex phenomenon, it is essential to obtain both quantitative
and qualitative data, to ensure the validity and reliability of the research. Moreover,
document analysis is often conducted, with documents covering related research,
syllabuses, textbooks, and so on. Besides, relevant tests can be designed to compare
the outcome of test-preparation courses and non-preparation courses, as
Green (2007) did in the washback study on IELTS. Meanwhile, it should also be
noted that the data collection instruments, such as language tests and interview
questions, must be validated. When designing the instrument, approaches like
brainstorming, expert opinion, reporting and reviewing, piloting and trialing should
be carefully planned and meticulously conducted. Moreover, validation techniques
can also be employed when processing the data collected, such as factor analysis,
multi-trait multi-method validation, triangulation of data collected from different
sources or by different instruments, and so on.
Washback studies are often multi-phase and longitudinal research because they
involve a wide range of stakeholders and employ various research methods. Besides,
it takes time for some washback effects to emerge. For instance, in Cheng’s (2005)
research, a three-phase study was developed, to investigate policy makers, textbook
writers, teachers and students step by step. Moreover, a baseline study had been
conducted to collect data before the revised test was introduced, and then more
research was done afterwards. Therefore, it is essential to conduct a diachronic
study on washback, to explore its causes, working mechanisms and evolving effects.
As for research findings, different conclusions have been proposed, with differ-
ent research foci in each study. In Qi’s (2004) research, it is concluded that high-
stakes tests can hardly realize their intended washback because of the overwhelming
functions they are expected to perform; while in Gu’s (2007) study, the conclusion
is that the positive washback of CET far outweighs its negative washback and the negative
washback is mainly caused by the misuse of the test. Meanwhile, similar findings
have also been reported. In many studies (Cheng 2005; Qi 2004; Wall 2005), it is
found that a test will have washback effects on the teaching content, but can hardly
affect the teaching method. Another common conclusion is that washback is a com-
plex issue which varies from participant to participant and which is mediated by
many other factors apart from test design and test quality.
To sum up, washback study is comprehensive and systematic, which can be con-
ducted either in the micro context (the school setting) or in the macro context (the
educational system or the society), and which involves various factors, such as the
test factor (test content, test use, etc.), the prestige factor (test stakes) and the per-
sonal factor (characteristics of teachers, students, etc.). Therefore, it is impossible
and impractical to develop an inclusive approach which may cover all aspects of
washback. In this sense, when planning a washback study, it is important to define
the research purpose first, and then relevant participants and methods can be
selected. The research findings, whatever they are, may contribute to establishing a
washback framework.
3.3 Summary
This chapter has reviewed related literature on washback models and empirical
studies, and several key points in washback studies have emerged in the analysis,
which are summarized as follows:
1. To have a clear research purpose. A test may affect not only the classroom set-
ting, but also the educational system, even the society as a whole. Therefore, a
washback study may involve students, teachers, school administrators, textbook
writers, policy makers… It is impractical and impossible to investigate all the
stakeholders, so a clear research purpose is essential. Once a research purpose is
defined, whether to investigate washback in the micro or macro context, the relevant
research participants can be selected, and then the research methods. In
this way, the research will be more focused.
2. To investigate different dimensions of washback. Almost all washback studies
focus on its direction, such as whether it is positive or negative, or whether its
positive washback outweighs its negative washback. However, apart from direc-
tion, other dimensions can also be explored, such as whether a test has realized
its intended washback, and whether the washback effects will last a long time.
Therefore, in a washback study, efforts can be made to probe into various dimen-
sions of washback effects, to obtain a more comprehensive picture of the
phenomenon.
3. To draft a practical washback model. Quite a few washback models have been
proposed, such as Alderson and Wall’s Washback Hypotheses and Bailey’s Basic Model.
However, it is undecided as to which one can best depict the washback phenom-
enon and illustrate its working mechanism. It still remains a challenging issue in
washback studies, since washback is such a complex phenomenon that it seems
impossible to draft a model to cover every aspect of it. Nevertheless, a theoretical
model is essential. Therefore, it is necessary to draft a washback model with
focused aims to guide the empirical research.
4. To employ appropriate research methods. A multi-method, multi-phase and longitudinal
approach will be adopted to investigate the washback effects of
TEM. Large-scale questionnaire surveys can be conducted to collect quantitative data
from a wide range of stakeholders, and interviews and classroom observations
can be conducted as case studies to collect qualitative data. Moreover, data collected
in the previous phase may assist the development and implementation of
research in the coming phases. In this sense, data collected can be triangulated
and data collection instruments can be validated.
In conclusion, a theoretical framework should be developed to guide the present
study, taking the above four points into consideration.
Chapter 4
Research Framework for TEM Washback
Study
4.1 Framework for TEM Washback Study
Based on the above theoretical models and empirical studies, a TEM washback
model (Fig. 4.1) is drafted, to illustrate its causes, contents and mechanisms.
This model, as shown in Fig. 4.1, incorporates Bailey’s basic model and Green’s
model, which demonstrates basic factors that may contribute to the generation of
washback effects, as well as the working mechanism of TEM washback. The model
will be explained in detail in the following part.
4.1.1 Contributing Factors to TEM Washback
From the TEM washback model, it can be seen that TEM washback is mediated
by the following three factors:
1. The overlap between test construct and test characteristics. The test construct of
TEM is stated in its test syllabus, and the test characteristics include test content,
method, test result interpretation and use, etc. The overlap between test
construct and test characteristics may contribute to the direction of washback—
the greater the overlap, the more likely a test will exert positive washback.
Conversely, negative washback will be generated, if a test fails to test what is
stated in its construct (the part to the left of the “overlap”, e.g. the oral part is
missing in some English tests which claim to measure all the four language skills
of listening, speaking, reading and writing), or if a test tests something not stated
in its construct (the part to the right of the “overlap”, e.g. an English reading
exam tests candidates’ knowledge in a specific field, such as law or commerce);
2. Test stakes and test difficulty. These may contribute to the intensity of washback,
with important and difficult, but still attainable tests exerting the most intense
washback. In this sense, washback of TEM4 and TEM8 may be different, and
TEM washback may vary from school to school, because of the difference
between tests and the gap between schools.
3. Participants’ perceptions and characteristics. Washback effects of TEM vary
from participant to participant because of their different perceptions of the test
and their unique characteristics. Participants differ in how they perceive test
difficulty, test importance, test purpose, test use, and so on. Moreover, participant
characteristics and values also differ, such as their previous knowledge and
beliefs, available resources, and so on.

https://doi.org/10.1007/978-981-13-1963-1_4

[Fig. 4.1 TEM washback model. The overlap between the TEM focal construct and the TEM test characteristics (positive where the two overlap, negative on either side), together with test stakes and difficulty, acts on the PARTICIPANTS (students, teachers, material writers, curriculum designers, school administrators, researchers, experts, etc.); their PROCESSES (what and how to learn, teach, design, research and develop) lead to the PRODUCTS (learning outcome, teaching effect, new materials, new curricula, research results, new syllabuses, new approaches, etc.).]
4.1.2 TEM Washback Mechanism
This model (Fig. 4.1) combines Alderson and Wall’s Washback Hypotheses with Bailey’s
basic model, illustrating both the content and the mechanism of TEM washback.
TEM will first influence its stakeholders, whose perceptions of English teaching
and learning will change accordingly. The stakeholders include teachers, students,
school administrators, textbook writers, researchers, and so on. Here, a new
category is added: the experts. Experts in TEM refer to TEM test designers,
constructors and the Teaching Syllabus designers. It is important to investigate TEM
experts because they are familiar with TEM, its implementation, related policies and
the intended washback. Therefore, their opinions can provide an illuminating
insight into TEM and its washback (refer to Sect. 1.1 for more information). Of
course, the stakeholders can be more than those listed in the model, if impact at the
macro level is to be investigated.
Under the influence of TEM, the stakeholders’ learning, teaching, and designing
processes will change, yielding different products. Moreover, every category of par-
ticipants, processes and products may influence each other, as indicated by the
arrows. For instance, teachers’ teaching can be affected by the textbooks, the
research findings and the new syllabuses. Besides, the participants, processes and
products may influence the test, leading to test revision and reform, which is termed
“washforward” by Van Lier (1989).
4.1.3 Several Terms Redefined
To better apply this model to TEM washback study, it is necessary to clarify a few
terms, namely, TEM test construct, TEM test characteristics, overlap, and positive
and negative washback.
4.1.3.1 TEM Test Construct
Test construct is the trait or traits that a test is intended to measure, which, in the
case of a language test, refers to the language ability to be measured (Davies et al.
1999). As for TEM, its construct is stated in its test syllabuses, which include
Syllabus for TEM4, Syllabus for TEM8, Syllabus for TEM4-Oral and Syllabus for
TEM8-Oral, since TEM is composed of these four separate tests. Moreover, lan-
guage abilities stated in the Teaching Syllabus should also be included when defin-
ing TEM construct, since the test purpose of TEM is to examine whether candidates
have met the required levels of English abilities specified in the Teaching Syllabus
(Jin and Fan 2011).
4.1.3.2 TEM Test Characteristics
Test characteristics refer to the test content, method, marking, as well as the use and
interpretation of test results. In this sense, TEM test characteristics include its test
contents (listening, reading, writing, speaking, translation, grammar and vocabulary,
and so on), test methods (item types used, such as multiple choice, blank
filling, error correction, and so on), marking (rated by machines or by human
beings, using holistic marking or analytic marking, etc.), and use and interpretation
of test results (whether TEM results are related to employment, university ranking,
and so on).
4.1.3.3 Overlap
Overlap means the extent to which the test characteristics reflect the test
construct; that is, whether TEM can elicit performance which reflects the English
ability stated in the Syllabuses. For example, it is stated in both the Teaching
Syllabus and the Test Syllabus that candidates should be able to follow news
broadcasts from English-speaking countries, and in both TEM4 and TEM8, news
broadcast is a part of listening comprehension. In this sense, TEM characteristics
overlap with TEM construct. However, if test characteristics and test construct do
not match, the mismatch is termed either “construct under-representation” or
“construct-irrelevant variance” (Messick 1996). Construct under-representation means
that some language abilities in the test construct are not measured. For instance,
speaking is a language skill listed in the Teaching Syllabus, but only a small
fraction of the candidates are eligible to take the oral test, so this skill in TEM
construct is under-represented in the test. Construct-irrelevant variance, by
contrast, refers to the problem that the test measures something outside the test
construct. For instance, in a reading test, construct-irrelevant variance may be
caused by background or cultural knowledge: an item which tests candidates’
cultural knowledge is irrelevant to the reading ability to be measured, and will
lead to misinterpretation of the test scores.
The overlap and mismatch between TEM construct and TEM characteristics
would, to some extent, determine whether the test will exert positive or negative
washback.
4.1.3.4 Positive and Negative Washback
The direction of washback, whether it is positive or negative, is considered the most
important issue in washback studies, and promoting beneficial washback is the
ultimate goal of most empirical studies on washback. However, what is positive
washback and what is negative washback? General criteria have been offered, such as
whether the effects are beneficial or damaging to the development of learners’ abili-
ties (Green 2007), or whether it encourages or inhibits the attainment of educational
goals held by learners, educators, or both (Bailey 1996). However, no one has devel-
oped a set of clear-cut and practical criteria to distinguish positive and negative
washback (Qi 2011). The possible difficulty lies in the fact that it is a highly subjec-
tive matter—“the distinction between positive and negative could usually be made
only by referring to the audience” (Watanabe 2004: 21). What is considered positive
washback by the teachers may be considered negative by the experts.
Therefore, in the current study, two principles will be followed when judging the
direction of washback effects. First, participants’ evaluation of washback will be
sought separately. Students, teachers, school administrators and experts will all be
asked for their opinions on the washback direction in their respective
questionnaires, and in classroom observations, teachers’ and students’ behavior will
be analyzed separately, to determine the different washback on them. Second, a
distinction will be made between positive washback in the micro context and in the
macro context. Positive washback in the micro context refers to immediate but
short-term benefits, such as improving candidates’ test performance, enhancing
efficiency of teaching and learning, and so on, while positive washback in the macro
context refers to long-term benefits that improve candidates’ abilities. For
instance, teachers’ moderate use of the mother tongue may help clarify the testing
demands more efficiently, so it can be considered positive in the micro context;
however, excessive use of the mother tongue, more than the 10–15% suggested by
Macaro (2005), may impede the development of learners’ language ability, so it is
negative in the macro context.
4.1.4 Research Design
As a high-stakes test, TEM will affect not only teachers and students, but also the
educational system, and even the society as a whole. The current study will focus on
the washback effects of TEM on college education for English majors, so only the
stakeholders in the educational system are investigated, while stakeholders like
employers and parents are not included.
As outlined in the TEM washback model (Fig. 4.1), the two major aspects to be
investigated are contributing factors to TEM washback and TEM washback mecha-
nism, and the participants involved are students, teachers, school administrators
(material writers, curriculum designers, etc.), researchers and experts.
In reference to the TEM washback model, a multi-phase and multi-method
research design has been developed to investigate the washback effects of TEM on
experts, school administrators, teachers and students, and to explore their perception
and evaluation of TEM, as well as the impact of TEM on their teaching or learning
practice. Various research methods will be employed, such as document analysis,
questionnaire surveys, classroom observations and interviews.
As indicated in Table 4.1, the current research is conducted in the following
stages. First, such documents as the Teaching Syllabus, TEM syllabuses, and
research findings on washback and TEM are analyzed. The research findings of
TEM studies and washback studies are summarized in Sects. 1.2.4 and 3.2
respectively, and in Sect. 1.4 a comparison is made between the Teaching Syllabus and
TEM characteristics. Next, in the second stage, large-scale questionnaire surveys
are conducted among experts, program administrators, teachers and students, to
collect their basic information and to investigate their perception and evaluation of
TEM, as well as the washback of TEM on them. Altogether seven questionnaires are
designed: one for experts, and six to investigate the washback of TEM4 and TEM8
separately among relevant program administrators, teachers and students. Then,
based on the above research findings, some sample classes are selected for
observation, to explore the washback of TEM on classroom teaching and learning. Last,
supplementary interviews are conducted among selected participants, to seek
answers to problems that emerged in the questionnaire surveys and classroom
observations.
Table 4.1 A multi-phase, multi-method research design for TEM washback study
Focus                           Instruments
TEM-related researches          Document analysis
The Syllabus and TEM Syllabus   Document analysis
Candidates                      Questionnaire survey among candidates
                                Classroom observation
                                Interview
Teachers                        Questionnaire survey among teachers
                                Classroom observation
                                Interview
Administrators                  Questionnaire survey among administrators
                                Interview
Experts                         Questionnaire survey among experts
                                Interview
TEM lesson observation          Classroom observation
4.2 Participants
Major participants of the current study are experts, program administrators, teachers
and students. Experts are specialists in English education who have either partici-
pated in the Teaching Syllabus design or TEM development, and some of them are
also members of NACFLT, an advisory committee in the Ministry of Education
which has been established to guide foreign language education in China. They
have a deep understanding of the language education policies in China, the Teaching
Syllabus and TEM. Moreover, some of them are also administrators in their respec-
tive universities and professors who teach English courses. Therefore, their opin-
ions may provide us with illuminating insights into TEM and its washback, to help
us understand this issue from the perspective of supervisors, administrators and
teachers.
Program administrators are either deans of the English department or course
leaders in charge of English teaching in the foundation stage or advanced stage.
They are responsible for curriculum design and course planning for English majors
in their schools, so they are familiar with both the Teaching Syllabus and the practi-
cal situation in their schools. Furthermore, some of them are teachers, and often
experienced teachers, so they are familiar with the teaching practice in their own
schools as well. In this sense, a survey among them may help us understand how the
Teaching Syllabus is implemented and how TEM has affected English program
design in different universities throughout China.
Teachers are those with TEM-related teaching experience, who have taught either
TEM preparation courses or the required courses in the corresponding stage
as listed in the Teaching Syllabus (e.g. teachers of fundamental English will be
surveyed on the washback of TEM4 since fundamental English is a course
recommended for English majors in the foundation stage). As practitioners, they have
a better understanding of how English is taught in classrooms and how TEM has
affected their teaching practice.
Students are TEM candidates who have just taken the test. They have prepared
for TEM and must have some comments to make on their test preparation experi-
ence, such as how the preparation has affected their English learning and how their
TEM preparation classes differed from non-preparation classes. An investigation
among them will help us understand the washback of TEM on the learning process
and products, which is of utmost importance in this research, since the fundamental
aim of education is to help students learn.
4.3 Instruments
Major research instruments used in the current study are questionnaires, classroom
observations and interviews, which will be introduced in detail in the following part.
4.3.1 Research Design for the Questionnaire Surveys
A questionnaire is a useful research tool for gathering information from a large
number of people. Therefore, in the current research, questionnaire surveys are
conducted first to collect information from program administrators, teachers and
students nationwide, to find out the general situation of TEM washback throughout
China.
4.3.1.1 The Survey Questionnaires
Altogether seven questionnaires have been developed to collect information from
experts (EQ, questionnaire for experts), TEM4-related administrators (AQ-4,
questionnaire to investigate administrators’ opinions on TEM4), TEM4-related teachers
(TQ-4, questionnaire to investigate teachers’ opinions on TEM4), TEM4-related
students (SQ-4, questionnaire for TEM4 candidates), TEM8-related administrators
(AQ-8, questionnaire to investigate administrators’ opinions on TEM8), TEM8-related
teachers (TQ-8, questionnaire to investigate teachers’ opinions on TEM8),
and TEM8-related students (SQ-8, questionnaire for TEM8 candidates).
When designing the seven questionnaires, the researcher has mainly referred to
the following resources:
1. The Teaching Syllabus, TEM Syllabuses, TEM test papers, related documents on
English education policies;
2. Washback theories, such as Alderson and Wall’s hypotheses, Bailey’s washback
model, and so on;
3. Empirical studies on TEM and washback, such as those studies analyzed in
Sects. 1.2.4 and 3.2;
4. Communication with experts, administrators, teachers and students;
5. Personal experience as a teacher for undergraduate English majors.
When designing and revising the questionnaires, the researcher frequently
consulted testing experts, held discussions with fellow PhD candidates, and
communicated with colleagues teaching related courses. Every questionnaire underwent
two rounds of piloting and three rounds of revision before the operational version
was finalized. The content and layout of the questionnaires are listed below in
Table 4.2 (see Appendix I for the complete version of the questionnaires).
Among the seven questionnaires, the expert questionnaire employs quite a few
short-answer questions, 11 altogether, because this is a small-scale survey conducted
among five experts. For practical reasons, the researcher was not able to
interview the experts face to face, so a semi-open questionnaire was designed to
investigate their opinions, and the experts all provided detailed answers to the open
questions. In the other six questionnaires, most of the items are objective
questions that are easy to answer, since they are intended for large-scale surveys.
The content of the questionnaire surveys mainly covers five aspects. First, the
stakeholders’ personal information is collected because, as pointed out in Green’s
washback model (2007), different participants’ characteristics may lead to
washback variance: stakeholders’ educational background and work experience may
contribute to their perception of TEM and its washback, as well as to what aspects of
their work are affected by TEM and how. Second, stakeholders’ evaluation of TEM is
surveyed, such as whether the test content and methods employed in TEM are well
selected. This relates to the validity of TEM (whether TEM assesses what it is
purported to assess), an essential factor that may determine the direction of TEM
washback. Meanwhile, efforts are made to find out how TEM results are interpreted
and used in different schools, because the use and interpretation of test results is
an indispensable part of test validity (Bachman 1990). Third, stakeholders’
perception of TEM washback is investigated, to find out how they think they are
affected by TEM, for example in their teaching and learning content and methods.
Fourth, attempts have been made to explore the TEM preparation practice of teachers
and students, to identify the actual washback effects of TEM on English teaching and
learning. Last, stakeholders are asked which factors affect their curriculum design,
teaching or learning, and which factors motivate them to learn or put pressure on
their teaching. These factors are considered because, according to Wall (2000),
various factors contribute to the outcome of teaching and learning, so it is
essential to weigh these factors before assessing the washback effects of TEM.
Table 4.2 The seven questionnaires
Questionnaire  Theme                                        Question type              No. of questions
EQ             Personal information;                        Multiple choice;           30
               Perception and evaluation of the Teaching    Blank filling;
               Syllabus, TEM Syllabuses, TEM, and TEM       Short answer;
               washback;                                    Ranking.
               General comments.
AQ-4           Personal information;                        Multiple choice;           54
               Perception, views on TEM4 and its washback;  Five-point Likert scale;
               Use & interpretation of TEM4 scores;         Ranking;
               Teaching and test preparation practice;      Short answer.
               Factors affecting teaching;
               General comments.
TQ-4           Personal information;                        Multiple choice;           52
               Perception, views on TEM4 and its washback;  Blank filling;
               Use & interpretation of TEM4 scores;         Five-point Likert scale;
               Teaching and test preparation practice;      Ranking;
               Factors affecting teaching;                  Short answer.
               General comments.
SQ-4           Personal information;                        Multiple choice;           55
               Perception, views on TEM4 and its washback;  Ranking;
               Related learning experience and effects;     Short answer.
               Factors affecting learning;
               General comments.
AQ-8           Personal information;                        Multiple choice;           54
               Perception, views on TEM8 and its washback;  Five-point Likert scale;
               Use & interpretation of TEM8 scores;         Ranking;
               Teaching and test preparation practice;      Short answer.
               Factors affecting teaching;
               General comments.
TQ-8           Personal information;                        Multiple choice;           52
               Perception, views on TEM8 and its washback;  Ranking;
               Teaching and test preparation practice;      Short answer.
               Factors affecting teaching;
               General comments.
SQ-8           Personal information;                        Multiple choice;           57
               Perception, views on TEM8 and its washback;  Ranking;
               Related learning experience and effects;     Short answer.
               Factors affecting learning;
               General comments.
4.3.1.2 Data Collection
In early 2012, five expert questionnaires were distributed through personal contact,
and all were returned and valid. The questionnaires for TEM8 administrators, teachers
and candidates were distributed in March 2012, together with the test booklets, by
special delivery. The administrators’ questionnaire and teachers’ questionnaire were
mailed to all the 809 colleges and universities that had students taking TEM8, with
one copy per school for the administrator and five copies for related teachers. These
questionnaires were distributed and completed before TEM was administered, and
returned together with the test papers, but in a separate package. The students’
questionnaire was distributed to 250 colleges and universities selected according to
university type and region, to ensure that these institutions included both national
key universities and provincial universities in all the provinces in Mainland China.
Seventy copies of the students’ questionnaire were mailed to each selected school
and were completed in the examination room after the candidates had finished their
TEM8 test. There was no time limit for completing the questionnaire, and the
questionnaires were collected soon afterwards. The questionnaires for TEM4
administrators, teachers and candidates were distributed in a similar way in May 2012,
together with TEM4 test booklets. The administrators’ questionnaire and teachers’
questionnaire were mailed to the 869 schools participating in TEM4 that year, and the
candidates’ questionnaire was mailed to 250 selected schools, with 70 copies for
each school.
After the questionnaires were returned, they were carefully sorted, to single out
invalid copies. The following three types of questionnaires were considered invalid:
1. More than 20% of the questions were unfinished;
2. The same answer was selected for all items, for instance, selecting “4” for all
five-point Likert scale questions;
3. There were some questions with obviously contradictory answers, for instance,
when the answer to the question “whether you have used any TEM preparation
materials” was “No”, the option of “several TEM preparation books” was
selected for the next question “if yes, which TEM preparation book have you
used”.
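The three screening rules above can be sketched in code. This is only an illustration, not the procedure actually used in the study, where the questionnaires were sorted by hand. The data layout (a dictionary of answers with None for an unanswered item) and the item names used_prep_materials and prep_books_used are hypothetical, and rule 2 is applied here to the five-point Likert scale items only, following the example given in the rule.

```python
def is_invalid(answers, likert_items, max_missing_ratio=0.20):
    """Return True if a returned questionnaire should be discarded.

    answers: dict mapping item id -> response (None = unanswered).
    likert_items: ids of the five-point Likert scale items.
    All item ids are hypothetical and serve illustration only.
    """
    # Rule 1: more than 20% of the questions were left unfinished.
    missing = sum(1 for value in answers.values() if value is None)
    if missing > max_missing_ratio * len(answers):
        return True

    # Rule 2: the same option was selected for every Likert item.
    likert = [answers.get(item) for item in likert_items]
    if len(likert) > 1 and None not in likert and len(set(likert)) == 1:
        return True

    # Rule 3: contradictory answers, e.g. "No" to using preparation
    # materials, followed by a choice of preparation books.
    if (answers.get("used_prep_materials") == "No"
            and answers.get("prep_books_used") is not None):
        return True

    return False


sample = {"q1": 3, "q2": 4, "q3": 2, "q4": 5,
          "used_prep_materials": "Yes",
          "prep_books_used": "several TEM preparation books"}
print(is_invalid(sample, ["q1", "q2", "q3", "q4"]))
```

Only the first contradiction named in rule 3 is encoded; other contradictory pairs would need their own checks.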
The number of questionnaires distributed, returned and valid is listed below in
Table 4.3.
From Table 4.3, it can be noted that the return rate and valid rate are very high
for experts, administrators and teachers. However, for the students’ questionnaires,
especially the TEM8 candidates’ questionnaire, the return rate and valid rate are
relatively low (69.4% and 56.5% respectively for SQ-8), which is probably due to the
fact that the candidates had just completed a 195-minute-long test, so some of them
gave up answering the questionnaire halfway, leaving more than 20% of the questions
unfinished (refer to Sect. 4.4.2.1 for further explanation).
Table 4.3 The number of questionnaires distributed, returned and valid
Questionnaire   Distributed No.   Returned No. (%)   Valid No. (%)
EQ              5                 5 (100%)           5 (100%)
AQ-4            869               789 (90.8%)        786 (99.6%)
TQ-4            4335              3758 (86.7%)       3574 (95.1%)
SQ-4            17,500            15,383 (87.9%)     11,033 (71.7%)
AQ-8            809               750 (92.7%)        724 (96.5%)
TQ-8            4045              3479 (86.0%)       3222 (92.6%)
SQ-8            17,500            12,146 (69.4%)     6863 (56.5%)
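The percentages in Table 4.3 can be reproduced from the counts, which also makes the two denominators explicit: the return rate is computed against the number of questionnaires distributed, whereas the valid rate is computed against the number returned. The short sketch below takes the counts from Table 4.3; the function and variable names are illustrative only.

```python
# Counts from Table 4.3: (distributed, returned, valid).
TABLE_4_3 = {
    "EQ": (5, 5, 5),
    "AQ-4": (869, 789, 786),
    "TQ-4": (4335, 3758, 3574),
    "SQ-4": (17500, 15383, 11033),
    "AQ-8": (809, 750, 724),
    "TQ-8": (4045, 3479, 3222),
    "SQ-8": (17500, 12146, 6863),
}


def rates(distributed, returned, valid):
    """Return (return rate, valid rate) in percent, to one decimal place."""
    return_rate = 100 * returned / distributed  # share of distributed copies
    valid_rate = 100 * valid / returned         # share of returned copies
    return round(return_rate, 1), round(valid_rate, 1)


for name, counts in TABLE_4_3.items():
    print(name, rates(*counts))
```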
4.3.1.3 Data Analysis
The valid questionnaires were entered into a computer by the present researcher with
the help of six college students and two graduate students. The answers to
multiple-choice and five-point Likert-scale questions were processed with SPSS 17.0
(Statistical Product and Service Solutions 17.0), while answers to short-answer
questions were collected, sorted, double-checked, and saved in a Microsoft Word file.
Because only five expert questionnaires were collected, only descriptive statistical
analyses were applied to the expert questionnaire, reporting such results as frequency
and mean, while inferential statistical analyses such as factor analysis and t-tests
were not conducted. The experts’ perception and evaluation of TEM and its washback
will be reported in Sect. 5.1.
As for the other six questionnaires, both descriptive and inferential analyses were
performed. In this chapter, the results of reliability test and factor analysis are
reported, while a further discussion on the views of administrators, teachers and
candidates will be provided in Chap. 5.
Reliability Test
A reliability test was run on the questionnaire data of all the objective items,
including both five-point Likert scale items and multiple-choice questions. The
questionnaires were analyzed separately, and for each questionnaire, two reliability
tests were performed, one on all the objective items and the other on five-point
Likert scale items only. The results are listed in Table 4.4.
From Table 4.4, it can be seen that Cronbach’s Alpha is above 0.8 for all
questionnaires, and Cronbach’s Alpha for five-point Likert scale items only is above
0.9, which attests to the satisfactory internal consistency of the questionnaires.
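For readers unfamiliar with the statistic, Cronbach’s Alpha for k items is k/(k − 1) multiplied by (1 − the sum of the item variances divided by the variance of the total score). The following is a minimal sketch of that standard formula in plain Python, using made-up scores. It is not the SPSS procedure used in the study, and the function name and data are illustrative assumptions.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's Alpha for a list of respondents.

    rows: one list of numeric item scores per respondent, with no
    missing values (an assumption of this sketch).
    """
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Made-up scores of four respondents on two items.
scores = [[2, 3], [4, 3], [3, 5], [5, 5]]
print(round(cronbach_alpha(scores), 3))
```

Values close to 1 indicate that the items vary together, which is how the figures above 0.8 and 0.9 in Table 4.4 are interpreted.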
Factor Analysis
An independent factor analysis was carried out on all the five-point Likert-scale
items of each questionnaire to better explain the research findings. First, KMO and
Bartlett’s Tests were conducted, the results of which proved satisfactory for further
analysis (Table 4.5).
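As a consistency check on Table 4.5, the degrees of freedom of Bartlett’s test of sphericity depend only on the number of items analyzed: for p items, Df = p(p − 1)/2. The sketch below applies this standard relation and its inverse to the Df values reported in Table 4.5. The implied item counts are an inference rather than figures stated by the authors, except that the 58 items implied for AQ-4 agree with the 58 five-point Likert scale items reported for AQ-4 in Table 4.4. The function names are illustrative.

```python
import math


def bartlett_df(n_items):
    """Degrees of freedom of Bartlett's test of sphericity for n_items."""
    return n_items * (n_items - 1) // 2


def items_from_df(df):
    """Invert df = p(p - 1) / 2; return None if df has no integer solution."""
    root = math.isqrt(1 + 8 * df)
    if root * root != 1 + 8 * df:
        return None
    return (1 + root) // 2


# Df values reported in Table 4.5.
for df in (1653, 1176, 2145, 2415):
    print(df, items_from_df(df))
```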
In factor analysis, principal component extraction and varimax rotation were
employed because they maximize variance and facilitate the interpretation of the
constructs deduced (Lu 2006: 477–483). Since factor extraction is to some extent
arbitrary, and the factors need to be practical and meaningfully interpretable, the
following three criteria were observed in data reduction: (1) a factor was retained
only if its eigenvalue was greater than 1 and it contained more than 3 items; (2)
factor loadings lower than 0.4 were deleted and not counted in any factor; (3) when
double loadings occurred, decisions were made on the basis of meaningful
interpretation.
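Criteria (1) and (2) are mechanical and can be sketched in code, whereas criterion (3) is a matter of judgment, so the sketch merely flags double-loaded items for inspection. The loadings below are invented for illustration, and the function names are assumptions rather than part of the SPSS analysis reported here. The last function uses the standard relation for a principal component analysis of standardized items, in which the percentage of variance explained by a factor is its eigenvalue divided by the number of items; with the 58 Likert scale items reported for AQ-4 in Table 4.4, an eigenvalue of 3.81 gives 6.57%.

```python
def retain_factors(eigenvalues, loadings, min_eigenvalue=1.0,
                   min_items=3, min_loading=0.4):
    """Apply criteria (1) and (2): return {factor number: [item names]}.

    loadings: dict mapping item name -> list of loadings, one per factor.
    """
    retained = {}
    for index, eigenvalue in enumerate(eigenvalues):
        # Criterion (2): loadings below 0.4 are not counted.
        items = [name for name, row in loadings.items()
                 if abs(row[index]) >= min_loading]
        # Criterion (1): eigenvalue > 1 and more than 3 items.
        if eigenvalue > min_eigenvalue and len(items) > min_items:
            retained[index + 1] = items
    return retained


def double_loaded(loadings, min_loading=0.4):
    """Items loading on more than one factor, left to criterion (3)."""
    return [name for name, row in loadings.items()
            if sum(abs(value) >= min_loading for value in row) > 1]


def variance_explained(eigenvalue, n_items):
    """Percentage of total variance explained by one factor."""
    return 100 * eigenvalue / n_items


# Invented example: three factors, six items.
eigenvalues = [3.2, 1.4, 0.8]
loadings = {
    "a": [0.80, 0.10, 0.00],
    "b": [0.70, 0.20, 0.10],
    "c": [0.60, 0.45, 0.00],
    "d": [0.75, 0.00, 0.20],
    "e": [0.10, 0.70, 0.00],
    "f": [0.20, 0.60, 0.50],
}
print(retain_factors(eigenvalues, loadings))
print(double_loaded(loadings))
print(round(variance_explained(3.81, 58), 2))
```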
Table 4.4 Reliability test for AQ-4, TQ-4, SQ-4, AQ-8, TQ-8 and SQ-8
                                      Cronbach’s Alpha   N of Items
AQ-4: all objective items             .869               116
      five-point Likert scale items   .962               58
TQ-4: all objective items             .847               130
SQ-4: all objective items             .936               200
AQ-8: all objective items             .903               116
TQ-8: all objective items             .845               106
SQ-8: all objective items             .948               203
Note: The number of items in the reliability test for five-point Likert scale items may
exceed that in the questionnaires because some multiple-choice questions with five
options arranged by degree are included
Based on the above criteria, factor analysis was conducted on the five-point
Likert scale items in each questionnaire. The factors extracted and the variance
explained are listed in Table 4.6, and the factors were named according to the items
included. The rotated component matrices of the administrators’ questionnaires on
TEM4 and TEM8 are explained together, because the two questionnaires are similar in
content and layout and will be analyzed together in Chap. 5. The same applies to the
teachers’ questionnaires and the students’ questionnaires.
Program Administrators’ Questionnaires:
AQ-4: Seven factors have been extracted and named:
Factor 1: washback of TEM4 on English language teaching (16 items);
Factor 2: intended use and interpretation of TEM4 scores (11 items);
Factor 3: evaluation of TEM4 (8 items);
Factor 4: unintended use and interpretation of TEM4 scores (6 items);
Factor 5: attitude to unintended uses of TEM4 scores (4 items);
Factor 6: familiarity with TEM4 (5 items);
Factor 7: washback of the 2005 TEM4 revision on English language teaching (4 items).
Table 4.5 KMO and Bartlett’s tests
Questionnaire  Kaiser-Meyer-Olkin Measure   Bartlett’s Test of Sphericity
               of Sampling Adequacy         Approx. Chi-Square   Df     Sig.
AQ-4           .940                         15334.058            1653   .000
TQ-4                                                             1176   .000
SQ-4                                                             2145   .000
AQ-8                                                             1653   .000
TQ-8                                                             1176   .000
SQ-8                                                             2415   .000
Originally there were 10 factors whose eigenvalues exceeded 1. Among them,
factors 8 and 9 consisted of three items only, and all the items in factor 10
co-occurred in factor 1. These three factors were therefore deleted according to the
three criteria mentioned above. The items that co-occurred in the deleted factors
were grouped into the named factors and are explained under those factors.
Table 4.6 Rotated component matrix for the questionnaires
AQ-4: Ten-factor rotated component matrix
Item Factor Communality
1 2 3 4 5 6 7 8 9 10
Washback on teaching speed .810 .756
Washback on teaching ideology .809 .821
Washback on teaching method .795 .787
Washback on teaching content .793 .781
Washback on teaching schedule .784 .755
Washback on teaching attitude .772 .764
Washback on teaching depth .771 .751
Washback on course design .725 .745
Washback on teaching aims .696 .705
Washback from TEM4 score allotment .630 .584 .871
Washback from TEM4 test content .613 .482 .786
Washback from TEM4 time allotment .605 .573 .856
Washback from TEM4 test method .603 .511 .785
Washback from TEM4 marking criteria .563 .527 .743
Assessing learning outcome .694 .710
Reflecting quality of teaching .692 .775
Reflecting students’ language proficiency .667 .676
Predicting future performance .660 .688
Revealing problems in teaching .635 .661
Promoting teaching reform .623 .705
Reflecting the Teaching Syllabus implementation .616 .689
Feedback to course design .606 .684
Feedback to teaching aims .592 .650
Feedback to classroom teaching .540 .680
Promoting the Teaching Syllabus implementation .472 .639
Quality of TEM4 score allotment .776 .701
Quality of TEM4 marking criteria .729 .663
Quality of TEM4 test methods .720 .688
Quality of TEM4 time allotment .716 .626
Quality of TEM4 test content .698 .701
Alignment with the Teaching Syllabus .694 .651
Convenience of administration .636 .622
Affecting department ranking .793 .782
Affecting university ranking .782 .753
Affecting teacher evaluation .719 .708
Affecting student ranking .644 .613
Affecting student future career .570 .679
Affecting student further education .476 .692
Attitude to using TEM4 score to assess leaders .875 .813
Attitude to using TEM4 score to assess teachers .864 .789
Attitude to using TEM4 score to confer degrees .751 .591
Attitude to using TEM4 score to rank schools .707 .624
Familiarity with TEM4 Test Syllabus .791 .687
Familiarity with TEM4 time allotment .762 .646
Familiarity with the Teaching Syllabus .697 .576
Familiarity with TEM4 marking criteria .694 .609
Washback from TEM4 writing revision .778 .781
Washback from TEM4 vocabulary revision .757 .792
Washback from TEM4 listening revision .727 .743
Washback from TEM4 reading revision .534 .591
Overall washback on students .723 .773
Overall washback on teachers .470 .682 .799
Overall washback on course design .431 .679 .798
Strictness of administration .760 .748
Quality of TEM test report .676 .636
Fairness of administration .425 .563 .635
Affecting some teachers and students only .649
Familiarity with the TEM4-Oral test syllabus .716
Eigenvalue 20.93 3.81 2.82 2.57 2.36 1.63 1.50 1.36 1.24 1.11
Variance explained (%) 36.08 6.57 4.86 4.43 4.07 2.80 2.59 2.34 2.13 1.92
Accumulated variance explained (%) 36.08 42.65 47.51 51.94 56.00 58.80 61.40 63.74 65.87 67.78
AQ-8: Ten-factor rotated component matrix
1 2 3 4 5 6 7 8 9 10
Washback from TEM8 score allotment .605 .546 .828
Washback from TEM8 time allotment .578 .564 .828
Washback from TEM8 test content .567 .722
Washback from TEM8 test methods .556 .479 .755
Reflecting the Teaching Syllabus implementation .707 .737
Revealing problems in teaching .703 .763
Reflecting quality of teaching .702 .731
Feedback to teaching aims .682 .743
Feedback to course design .680 .751
Feedback to classroom teaching .638 .415 .756
Promoting the Teaching Syllabus implementation .632 .713
Assessing learning outcome .576 .403 .716
Promoting teaching reform .546 .637
Quality of TEM8 test report .495 .507
Fairness of administration .598 .595
Affecting department ranking .775 .769
Affecting university ranking .735 .752
Affecting students’ further education .585 .626
Washback from inclusion of TEM8 General Knowledge .712 .664
Affecting some teachers and students only .301
Attitude to using TEM8 score to assess teachers .893 .812
Attitude to using TEM8 score to assess leaders .875 .794
Attitude to using TEM8 score to confer degrees .829 .718
Attitude to using TEM8 score to rank schools .716 .585
Familiarity with TEM8-Oral Test Syllabus .535 .582
Overall washback on course design .484 .706 .810
Washback from marking criteria .536 .569 .782
Affecting students’ future career .506 .565 .663
Reflecting students’ language proficiency .473 .544 .720
Eigenvalue 20.93 3.77 2.90 2.62 2.30 2.00 1.65 1.40 1.31 1.10
Variance explained (%) 36.08 6.50 4.99 4.52 3.96 3.44 2.84 2.41 2.27 1.89
Accumulated variance explained (%) 36.08 42.57 47.57 52.08 56.05 59.49 62.33 64.75 67.01 68.91
TQ-4: 15-factor rotated component matrix
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Washback from TEM4 test methods .843 .799
Washback from TEM4 score allotment .830 .813
Washback from TEM4 time allotment .813 .771
Washback from TEM4 marking criteria .801 .772
Overall washback on teachers .668 .600
Convenience of administration .565 .418 .575
Quality of TEM4 test report .518 .512
Reading activities in TEM4 preparation classes .769 .706
Translation activities in TEM4 preparation classes .755 .696
Writing activities in TEM4 preparation classes .703 .684
Other activities in TEM4 preparation classes .675 .574
Speaking activities in TEM4 preparation classes .669 .705
Listening activities in TEM preparation classes .548 .615
Difference in explaining textbook exercisesᵃ .878 .826
Difference in practicing textbook exercises .825 .764
Difference in explaining texts .764 .679
Difference in other activities .577 .619
Difference in listening activities .825 .757
Difference in reading activities .779 .698
Difference in writing activities .746 .710
Difference in translation activities .625 .638
Difference in speaking activities .440 .620 .707
Reflecting students’ language proficiency .692 .699
Assessing learning outcome .681 .708
Promoting teaching reform .614
Assessing teaching quality .547 .592
Feedback to teaching .410 .480 .638
Motivation to learn .428 .605
Familiarity with TEM4-Oral Test Syllabus .427 .602
Working on textbook exercises in TEM4 preparation classes .793 .554
Explaining textbook exercises in TEM4 preparation classes .760 .784
Explaining texts in TEM4 preparation classes .759 .808
Washback from TEM4 vocabulary revision .725 .715
Working on test papers in TEM4 preparation classes .847 .495
Explaining test papers in TEM4 preparation classes .814 .794
Difference in working on test papers .563 .471 .769
Affecting students’ future career .727 .663
Affecting students’ further education .727 .661
Affecting the organization of routine teaching activities .621 .586
Strictness of administration .450 .637 .663
Difference in explaining test papers .500 .661
Attitude to using TEM4 score to assess teachers .806 .727
Attitude to using TEM4 score to confer degrees .726 .717
.658
Eigenvalue 15.7 15.7 4.6 3.3 2.7 2.3 2.0 1.8 1.5 1.3 1.3 1.2 1.1 1.1 1.1
Variance explained (%) 22.1 9.3 6.4 4.6 3.8 3.2 2.9 2.5 2.2 1.9 1.9 1.7 1.6 1.6 1.4
Accumulated variance explained (%) 22.1 31.5 38.0 42.6 46.5 49.8 52.7 55.3 57.5 59.4 61.4 63.1 64.8 66.4 67.8
ᵃ Difference between TEM4 preparation and non-preparation classes
TQ-8: 15-factor rotated component matrix
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Washback from TEM8 time allotment .754 .771
Washback from TEM8 test methods .753 .817
Washback from TEM8 score allotment .751 .819
Washback from TEM8 marking criteria .663 .421 .777
Feedback to teaching .570 .404 .615
Promoting teaching reform .555 .526 .703
Assessing learning outcome .533 .460 .680
Predicting future performance .436 .426 .715
Quality of TEM8 test report .427 .423 .601
Motivation to learn .426 .659
Speaking activities in TEM8 preparation classes .761 .721
Working on textbook exercises in TEM8 preparation classes .760 .811
Reading activities in TEM8 preparation classes .759 .799
Explaining textbook exercises in TEM8 preparation classes .746 .814
Writing activities in TEM8 preparation classes .706 .752
Explaining texts in TEM8 preparation classes .691 .764
Listening activities in TEM8 preparation classes .666 .708
Other activities in TEM8 preparation classes .649 .447 .711
Translation activities in TEM8 preparation classes .645 .671
Difference in translation activitiesᵇ .816 .766
Difference in writing activities .773 .742
Difference in listening activities .753 .739
Difference in reading activities .744 .725
Difference in speaking activities .670 .682
Difference in explaining test papers .504 .742
Difference in working on test papers .449 .626
Familiarity with TEM8 marking criteria .632 .698
Familiarity with TEM8-Oral Test Syllabus .532 .470 .769
Affecting the organization of routine teaching activities .511 .557
Difference in working on textbook exercises .885 .843
Difference in explaining textbook exercises .824 .828
Difference in explaining texts .716 .729
Washback from inclusion of TEM8 General Knowledge .455 .659 .754
Washback from TEM8 writing revision .419 .642 .728
Washback from TEM8 listening revision .514 .583
Explaining test papers in TEM4 preparation classes .752 .796
Working on test papers in TEM4 preparation classes .724 .709
Assessing teaching quality .533 .596 .758
Reflecting students’ language proficiency .495 .496 .637
Attitude to using TEM8 score to confer degrees .863 .807
Attitude to using TEM8 score to evaluate teachers .826 .742
Affecting students’ future career .679 .730
Affecting students’ further education .454 .517 .683
Difference in other activities .462 .515 .657
Overall washback on students .424 .557 .767
Eigenvalue 18.5 6.9 4.9 3.3 2.7 2.3 1.9 1.8 1.5 1.5 1.3 1.2 1.2 1.2 1.1
Variance explained (%) 26.1 9.7 6.9 4.7 3.8 3.3 2.7 2.5 2.1 2.1 1.9 1.8 1.7 1.7 1.5
Accumulated variance explained (%) 26.1 35.8 42.8 47.6 51.4 54.7 57.4 60.0 62.2 64.3 66.2 68.1 69.8 71.5 73.1
b
SQ-4: 24-factor rotated component matrix
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
Improving .77 .70
English
proficiency
Promoting .76 .72
English learning
Improving .74 .65
learning
efficiency
Better arranging .70 .63
learning
activities
Clarifying .69 .60
learning
requirements
Spending more .69 .59
time learning
Providing .69 .60
learning aims
Improving .68 .62
language
proficiency by
working on test
papers
Identifying .67 .57
weaknesses
Enlarging .42 .42
vocabulary by
reciting TEM4
words
Difference in .75 .71
TEM4 cloze
practice after
class
TEM4 reading
practice after
class
TEM4 grammar
practice after
class
TEM4 dictation
practice after
class
TEM4 listening
practice after
class
TEM4 writing
practice after
class
TEM4 note
writing practice
after class
reciting TEM4
vocabulary after
class
reading
grammar books
after class
Washback on .78 .74
learning
schedule
Washback on .76 .71
learning content
Washback on .73 .71
course selection
Washback on .73 .69
learning method
Washback on .72 .69
learning attitude
Washback on .72 .65
learning
sequence
Washback on .71 .69
learning depth
Washback on .63 .67
learning aims
Overall .43 .47
washback on
students
Preference for .75 .73
speaking
reading
writing
translation
listening
other activities
text explanation
Preference for .61 .41 .70
textbook
exercise
explanation
Preference for .57 .40 .66
doing textbook
exercises
TEM4 test
paper
explanation
doing TEM4
test papers
speaking
practice after
class
free writing
practice after
class
reading practice
after class
Time spent on .63 .57
speaking before
TEM4
Time spent on .63 .44 .70
free writing
before TEM4
reading books
before TEM4
TEM4 affecting .54 .45 .63
oral proficiency
Difference in .47 .51 .60
listening to the
radio after class
listening to the
radio before
TEM4
TEM4 affecting .66 .72
listening
proficiency
reading
proficiency
writing
proficiency
dictation
proficiency
grammar
proficiency
MCQ test taking
techniques
vocabulary
proficiency
TEM4 listening
before TEM4
TEM4 dictation
before TEM4
TEM4 reading
before TEM4
TEM4 cloze
before TEM4
TEM4 grammar
before TEM4
Reading .70 .70
proficiency
improved in the
advanced stage
Speaking .68 .67
proficiency
improved in the
advanced stage
Writing .67 .65
proficiency
improved in the
advanced stage
Vocabulary .65 .65
enlarged in the
advanced stage
Listening .63 .67
proficiency
improved in the
advanced stage
Grammar .61 .62
proficiency
improved in the
advanced stage
Reflecting .68 .56
listening
proficiency
Reflecting .68 .65
overall language
proficiency
Reflecting .67 .57
mastery of
language
knowledge
Reflecting .66 .55
reading
proficiency
Reflecting .64 .52
writing
proficiency
Assessing .41 .59 .58
learning
outcome in the
fundamental
stage
Predicting .40 .48 .59
future
performance
Reading .78 .76
activities in
TEM4
preparation
classes
Speaking .74 .76
activities in
TEM4
preparation
classes
Listening .69 .70
activities in
TEM4
preparation
classes
Writing .68 .69
activities in
TEM4
preparation
classes
explaining
textbook
exercisesᶜ
doing textbook
exercises
explaining texts
speaking
activities
listening
activities
reading
activities
speaking
activities
Satisfied with .75 .59
guessing the
correct answer
to MCQ
Taking up the .75 .67
time for skill
training
Hindering .73 .63
learning
No washback .56 .59
at all
Affecting .54 .45 .53
student ranking
Time spent on .77
TEM4 writing
before TEM4
TEM4
note-writing
before TEM4
Explaining .74 .71 .62
TEM4 test
papers in TEM4
preparation
classes
Working on .66 .57
TEM4 test
papers in TEM4
preparation
classes
working on
TEM4 test
papers
explaining
textbook
exercises
Familiarity with .74 .71
the Teaching
Syllabus
TEM4-Oral Test
Syllabus
TEM4 Test
Syllabus
Familiarity with .52 .44 .66
TEM4 marking
criteria
Explaining .79 .78
textbook
exercises in
TEM4
preparation
classes
Doing textbook .79 .77
exercises in
TEM4
preparation
classes
Explaining texts .71 .64
in TEM4
preparation
classes
Other activities .67 .69
in TEM4
preparation
classes
other activities
Translation .45 .57 .68
activities in
TEM4
preparation
classes
translation
activities
TEM4
preparation after
class
Time spent daily .65 .62
on TEM4
preparation after
class
Mock test .64 .59
papers done
extra-curricular
TEM4 training
course
Affecting .85 .81
students’ future
career
Affecting .83 .79
students’ further
education
time allotment
score allotment
TEM4
vocabulary
before TEM4
reading
grammar books
before TEM4
Causing .40 .58 .58
pressure
Attitude to .63 .51
using TEM4
score to confer
degrees
Eigenvalue 25.1 6.8 6.1 4.3 3.6 3.0 2.7 2.4 2.3 2.1 1.9 1.9 1.7 1.6 1.5 1.4 1.3 1.2 1.2 1.1 1.1 1.0 1.0 1.0
Variance explained (%) 21.3 5.8 5.2 3.6 3.1 2.6 2.3 2.0 2.0 1.8 1.6 1.6 1.4 1.4 1.2 1.2 1.1 1.0 1.0 0.9 0.9 0.9 0.9 0.8
Accumulated variance explained (%) 21.3 27.1 32.3 36.0 39.1 41.7 44.1 46.2 48.2 50.0 51.6 53.3 54.7 56.2 57.4 58.7 59.8 60.9 61.9 62.9 63.8 64.7 65.6 66.5
c
SQ-8: 24-factor rotated component matrix
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
writing
reading
speaking
translation
listening
other activities
textbook
exercise
explanation
text explanation
TEM8 test
paper
explanation
doing textbook
exercises
doing TEM8
test papers
writing
activitiesᵈ
reading
activities
listening
activities
speaking
activities
translation
activities
explaining texts
explaining
textbook
exercises
doing textbook
exercises
other activities
explaining
TEM8 test
papers
doing TEM8
test papers
Washback on .78 .75
learning depth
Washback on .78 .75
learning
schedule
Washback on .75 .71
learning attitude
Washback on .74 .69
course selection
Washback on .73 .69
learning method
Washback on .72 .72
learning content
Washback on .72 .70
learning aims
Washback on .68 .66
learning speed
Overall .44 .54
washback on
students
TEM8 writing
practice after
classᵉ
TEM8 reading
practice after
class
TEM8
mini-lecture
practice after
class
reading culture
books after class
TEM8 general
knowledge after
class
TEM8
translation
practice after
class
listening to the
radio after class
TEM8
error-correction
after class
free writing
practice after
class
reading practice
after class
TEM8 listening
practice after
class
speaking
practice after
class
translation
proficiency
listening
proficiency
reading
proficiency
writing
proficiency
grammar
proficiency
note-taking
proficiency
MCQ test taking
techniques
speaking
proficiency
general
knowledge
proficiency
vocabulary
proficiency
Better arranging .74 .68
learning
activities
Clarifying .74 .67
learning
requirements
Improving .73 .67
learning
efficiency
Promoting .70 .70
English learning
Improving .66 .64
English
proficiency
Spending more .66 .64
time learning
Identifying .65 .58
weaknesses
Providing .62 .63
learning aims
Improving .52 .55
language
proficiency by
doing test
papers
TEM8 listening
before TEM8
TEM8 reading
before TEM8
TEM8
mini-lecture
before TEM8
TEM8
error-correction
before TEM8
TEM8 general
knowledge
before TEM8
Reading .80 .78
activities in
TEM8
preparation
classes
Writing .80 .80
activities in
TEM8
preparation
classes
Speaking .79 .78
activities in
TEM8
preparation
classes
Listening .77 .71
activities in
TEM8
preparation
classes
Translation .76 .72
activities in
TEM8
preparation
classes
Other activities .62 .59
in TEM8
preparation
classes
Reflecting .73 .67
translation
proficiency
Reflecting .72 .64
reading
proficiency
Reflecting .71 .62
listening
proficiency
Reflecting .70 .63
writing
proficiency
Reflecting .70 .68
overall language
proficiency
Assessing .62 .63
learning
outcome
Reflecting .53 .47
general
knowledge
proficiency
Predicting .52 .63
future
performance
free writing
before TEM8
reading before
TEM8
speaking before
TEM8
listening to the
radio before
TEM8
TEM8 writing
before TEM8
Time spent .42 .51 .60
reading culture
books before
TEM8
Reading .72 .71
proficiency
improved in the
advanced stage
Speaking .71 .71
proficiency
improved in the
advanced stage
Listening .70 .73
proficiency
improved in the
advanced stage
Writing .68 .70
proficiency
improved in the
advanced stage
Translation .61 .69
ability improved
in the advanced
stage
General .54 .61
knowledge
improved in the
advanced stage
Hindering .79 .69
language
learning
Taking up time .78 .67
for skill training
Satisfied with .71 .59
guessing the
correct answer
for MCQ
No washback at .52 .61
all
Affecting .52 .58
student ranking
the Teaching
Syllabus
TEM8-Oral Test
Syllabus
TEM8 marking
criteria
TEM8 Test
Syllabus
Doing TEM8 .75 .68
test papers in
TEM8
preparation
classes
Explaining .74 .69
TEM8 papers in
TEM8
preparation
classes
Doing textbook .74 .74
exercises in
TEM8
preparation
classes
Explaining text .70 .75
exercises in
TEM8
preparation
classes
Explaining texts .65 .65
in TEM8
preparation
classes
TEM8 time
allotment
TEM8 score
allotment
Affecting .82 .76
students’ future
career
Affecting .80 .74
students’ further
education
TEM8
preparation after
class
The number of .62 .63
mock test
papers done
extra-curricular
TEM8 training
course
Time spent daily .48 .58
on TEM8
preparation
TEM8
translation after
class
TEM8
vocabulary after
class
TEM8
vocabulary
practice after
class
Causing .60 .58
pressure
Attitude to .73 .60
using TEM8
score to confer
degrees
Enlarging .63 .64
vocabulary by
reciting TEM8
words
Eigenvalue 27.3 8.2 5.6 4.1 3.5 3.4 2.9 2.5 2.4 2.2 2.1 1.9 1.7 1.7 1.5 1.4 1.3 1.2 1.2 1.2 1.0 1.0 1.0 1.0
Variance explained (%) 22.6 6.8 4.6 3.8 2.9 2.8 2.4 2.0 2.0 1.8 1.7 1.5 1.4 1.4 1.2 1.1 1.0 1.0 1.0 1.0 0.9 0.8 0.8 0.8
Accumulated variance explained (%) 22.6 29.4 34.1 37.9 40.8 43.6 46.0 48.1 50.1 51.9 53.7 55.3 56.7 58.1 59.4 60.6 61.7 62.7 63.8 64.8 65.7 66.6 67.4 68.2
d
ᵉ Difference in after-class learning efforts between TEM8 preparation and non-preparation stages
4.3 Instruments 91
AQ-8: Seven factors have been extracted and named:

items);
Factor 7: familiarity with TEM8 (5 items).
Altogether 10 factors with eigenvalues greater than 1 were extracted. Excluding
items that also loaded on other factors, Factors 8, 9 and 10 each contained fewer
than three items, so they were deleted. The items that co-occurred in the deleted
factors were grouped into the named factors and explained accordingly.
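The screening rule described above (keep factors whose eigenvalue exceeds 1, then drop those with fewer than three items of their own) can be sketched in a few lines of Python. The eigenvalues are those printed for the AQ-8 solution; the per-factor item counts, the function names and the total of 58 questionnaire items (inferred from 20.93 / 0.3608) are illustrative assumptions, not figures reported in the study.

```python
def variance_explained(eigenvalues, n_items):
    """Share of total variance per component, in percent (eigenvalue / n_items)."""
    return [100.0 * ev / n_items for ev in eigenvalues]


def retained_factors(eigenvalues, item_counts, min_items=3):
    """Return 1-based indices of factors that pass both screening rules."""
    kept = []
    for index, (ev, n_unique) in enumerate(zip(eigenvalues, item_counts), start=1):
        # Rule 1: eigenvalue above 1. Rule 2: at least `min_items` unique items.
        if ev > 1.0 and n_unique >= min_items:
            kept.append(index)
    return kept


# Eigenvalues as printed for the AQ-8 ten-factor solution.
aq8_eigenvalues = [20.93, 3.77, 2.90, 2.62, 2.30, 2.00, 1.65, 1.40, 1.31, 1.10]

# ASSUMPTION: placeholder counts of items unique to each factor (not from the book).
item_counts = [12, 9, 6, 5, 5, 4, 5, 2, 2, 1]

# ASSUMPTION: 58 questionnaire items, inferred from 20.93 / 0.3608.
shares = variance_explained(aq8_eigenvalues, n_items=58)
kept = retained_factors(aq8_eigenvalues, item_counts)

print(kept)
print([round(s, 1) for s in shares[:3]])
```

With these placeholder counts the last three factors fail the item-count rule, which mirrors the seven-factor outcome reported for AQ-8.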
Based on the seven factors extracted from the program administrators’ questionnaires,
the further discussion in Chap. 5 will be conducted from the following two
aspects:
1. evaluation of TEM (Factors 2, 3, 4, 5 and 6 in AQ-4, Factors 2, 3, 4, 6 and 7 in
AQ-8);
2. perception of TEM washback (Factors 1 and 7 in AQ-4, Factors 1 and 5 in AQ-8).
Teachers’ Questionnaires
TQ-4: Eleven factors have been extracted and named:
Factor 3: language skill training activities in TEM4 preparation classes (5 items);
Factor 4: difference in textbook activities between TEM preparation and non-prep-
aration classes (4 items);
Factor 5: difference in skill training activities between TEM preparation and non-
preparation classes (5 items);
Factor 7: familiarity with TEM4 (5 items);
items);
Factor 9: exercise practice in TEM4 preparation classes (4 items);
Altogether 15 factors with eigenvalues greater than 1 were extracted. Factors
8, 11, 13 and 14 included no more than three items, so they were deleted. Factor 15
included only two items, but it has been treated as a separate factor because both
items concern teachers’ attitudes to unintended uses of TEM scores, which is also a
factor in the program administrators’ questionnaires. This factor has therefore been
included for further explanation in Chap. 5.
TQ-8: Nine factors have been extracted and named:

Factor 2: evaluation of TEM8 and intended use and interpretation of TEM8 scores
(17 items);
Factor 3: textbook and skill training activities in TEM8 preparation classes (9 items);
Factor 4: difference in TEM preparation and skill training activities between TEM
preparation and non-preparation classes (7 items);
Factor 5: familiarity with TEM8 (5 items);
Factor 7: difference in textbook activities between TEM preparation and non-prep-
items);
However, Factors 9, 10, 12, 13, 14 and 15 included no more than three items, so they
were not explained separately. Factor 6, unintended use and interpretation of TEM8
scores, was considered a separate factor because it was treated as a separate factor
in all other questionnaires. The same applies to Factor 11, the teachers’ attitude to
unintended uses of TEM scores. Moreover, Factor 2 combined teachers’ evaluation of
TEM8 with the intended use and interpretation of TEM8 scores, possibly because the
perceived quality of TEM8 is closely related to how the scores are used. Nevertheless,
these two aspects will be explained separately in the further discussion.
Based on the factors extracted from the teachers’ questionnaires, the further
discussion in Chap. 5 will be conducted from the following aspects:
1. evaluation of TEM (Factors 2, 6, 7, 10 and 11 in TQ-4, Factors 2, 5, 6 and 9 in
TQ-8);
2. perception of TEM washback (Factors 1 and 8 in TQ-4, Factors 1 and 8 in TQ-8);
3. TEM washback on English classroom teaching (Factors 3, 4, 5 and 10 in TQ-4,
Factors 3, 4 and 7 in TQ-8).
Students’ Questionnaires
SQ-4: Seventeen factors have been extracted and named:
Factor 1: possible positive effects of TEM4 (10 items);
Factor 2: difference in after-class TEM related learning activities between TEM
preparation and non-preparation period (9 items);
Factor 3: washback of TEM4 on English language learning (9 items);
Factor 4: preference for classroom activities (11 items);
Factor 5: non-TEM related activities in TEM4 preparation classes and their differ-
ence between TEM preparation and non-preparation classes (8 items);
Factor 6: the effect of TEM4 on the learning effects (8 items);
Factor 7: TEM4-related learning activities conducted after class (5 items);
Factor 8: language proficiency improved in the foundation stage (6 items);

Factor 11: difference in skill training activities between TEM preparation and non-
preparation classes (4 items);
Factor 12: possible negative effects of TEM4 (5 items);
Factor 13: TEM related activities in TEM4 preparation classes and their difference
between TEM preparation and non-preparation classes (4 items);
Factor 14: familiarity with TEM4 documents (4 items);
Factor 15: translation and other activities in TEM4 preparation classes and their dif-
ference between TEM preparation and non-preparation classes (4 items);
Factor 16: TEM preparation activities after class (4 items);
Factor 17: attitude to unintended uses of TEM4 scores (1 item);
Altogether 24 factors with eigenvalues greater than 1 were extracted. Factors
11, 14, 17, 20, 21, 22 and 23 included no more than three items, so they were
deleted. Factor 24 included only one item, but it has been treated as a separate
factor because the item concerns students’ attitude to unintended uses of TEM
scores, which is also a factor in the program administrators’ and teachers’
questionnaires. This factor has therefore been included for further explanation in Chap. 5.
SQ-8: Fifteen factors have been extracted and named:
Factor 1: preference for classroom activities (11 items);
Factor 2: difference in classroom activities between TEM preparation and non-prep-
Factor 3: washback of TEM8 on English language learning (9 items);
Factor 4: difference in after-class learning activities between TEM preparation and
non-preparation period (12 items);
Factor 5: the effect of TEM8 on the learning effects (10 items);
Factor 6: possible positive effects of TEM8 (9 items);
Factor 7: TEM8-related learning activities conducted after class (5 items);
Factor 10: non-TEM related activities conducted after class (6 items);
Factor 11: language proficiency improved in the advanced stage (6 items);
Factor 12: possible negative effects of TEM8 (5 items);
Factor 13: familiarity with TEM8 documents (4 items);
Factor 14: TEM preparation activities after class (4 items);
Factor 15: attitude to unintended uses of TEM8 scores (1 item);
However, Factors 14, 15, 16, 17, 19, 20, 21, 22 and 24 included no more than three
items, so they were not explained separately. Factor 23, the students’ attitude to
unintended uses of TEM scores, has been considered a separate factor since it was
explained as a separate factor in program administrators’ and teachers’
questionnaires.
Based on the factors extracted from the students’ questionnaires, the further
discussion in Chap. 5 will be conducted from the following aspects:
1. evaluation of TEM (Factors 9, 14 and 17 in SQ-4, Factors 9, 13 and 15 in SQ-8);

2. perception of TEM washback (Factors 1, 3 and 12 in SQ-4, Factors 3, 6 and 12 in
SQ-8);
3. TEM preparation (Factors 2, 4, 5, 7, 10, 11, 13, 15 and 16 in SQ-4, Factors 1, 2,
4, 7, 8, 10 and 14 in SQ-8);
4. Learning effects (Factors 6 and 8 in SQ-4, Factors 5 and 11 in SQ-8).
Generally speaking, the factor structure of the questionnaires complies with the
original design, with only a few exceptions. First, in the teachers’ and students’
questionnaires, the various classroom activities, which had been designed as a
single factor, were divided into several factors, such as TEM-related activities,
textbook activities and skill training activities. As a result, some factors (e.g.
Factor 14 in SQ-4 and SQ-8) contain only two items. Although they were not listed
as separate factors, they will be explained together with other activities in the
further discussion in Chap. 5. The same practice has been applied to the factors of
“difference between TEM preparation classes and non-preparation classes”,
“after-class learning activities”, and “difference in after-class learning
activities”. Second, in the students’ questionnaires, familiarity with TEM was split
into two factors: familiarity with TEM documents and familiarity with TEM practice.
Familiarity with TEM practice contained only two items, so it was not considered a
separate factor; in the further discussion, however, the items of these two factors
will be grouped together. Last, in AQ-8, the item “quality of TEM8 test report” had
been designed to go with the other items on “TEM8 quality”, which elicit
participants’ evaluation of TEM8 (Factor 3), but it fell into Factor 2, “intended
use and interpretation of TEM8 scores”. This is probably because a good test report
is essential for the intended use and interpretation of test scores; the item will
nevertheless be explained together with the other qualities of TEM8.
In short, the results of the factor analysis supported the design of the
questionnaires and helped simplify the explanation. The further discussion will
therefore follow both the factor analysis results and the questionnaire design.
4.3.2 Classroom Observations
The advantage of questionnaire surveys is that information can be collected from a
large number of people over a short period of time. Therefore, in the current
research, questionnaire surveys were conducted first to paint a rough picture of the
general situation throughout China. Based on the findings of the questionnaire
surveys, classroom observation plans were then developed, which involved selecting
appropriate English classes to observe and designing a relevant classroom
observation scheme.
4.3.2.1 English Classrooms Observed
In the current study, altogether 52 classes were observed, each around 45 min
long. The following three principles were followed in selecting the English
classrooms to be observed.
First, both top universities and less established schools were selected.
Questionnaire survey findings indicate that students of various language proficiency
levels have been affected differently by TEM, and that teachers from schools of
different levels responded differently to TEM (see Chap. 5). Therefore, in classroom
observation, special effort was made to select English classrooms from both
universities and colleges (schools that offer a three-year program for their English
majors). However, because TEM8 is not required for candidates taking a three-year
program, only university classrooms were observed for TEM8-related courses.
Second, both TEM preparation courses and non-preparation courses were
selected. As shown in the findings of the questionnaire surveys (see Chap. 5),
some schools offered TEM preparation courses, mainly comprehensive ones
teaching all language skills covered in TEM. Moreover, teachers and candidates
claimed that TEM preparation courses differed from non-preparation ones. Both
types of courses were therefore observed to investigate whether they do differ and
whether they differ in the way claimed by teachers and students.
Since TEM is a comprehensive test and the TEM preparation courses observed are
often comprehensive ones, the equivalent non-preparation courses should also be
comprehensive ones which deal with all language skills. Therefore, fundamental
English and advanced English courses were selected. Moreover, it is common practice
for some schools to offer TEM preparation in fundamental English and advanced
English courses in the semester when TEM4 and TEM8 are held or in the previous
semester. The TEM preparation classes observed here are therefore of two types:
those in a separate course named TEM Preparation Course, and those delivered
within fundamental English or advanced English courses. Accordingly, whether a
class is considered a TEM preparation class or a non-preparation class is
determined by the teaching content of that class.
Last, teachers who teach both TEM preparation and non-preparation courses
were selected. It is hoped that the TEM preparation and non-preparation courses can
be delivered by the same teacher, so that teacher variance can be limited.
4.3.2.2 Teachers Observed
Based on the above principles, six teachers were selected for observation. They are
all English teachers with TEM preparation experience. They have taught both TEM
preparation and non-preparation classes, but their educational background and
teaching experience differ considerably (refer to Sect. 6.1 for detailed
information).
4.3.2.3 The Observation Scheme
To better conduct the observation, it is necessary to develop an observation scheme
so that attention can be paid to the relevant aspects of the classroom events. The
following sources were drawn on to develop the observation scheme for the
current research:
1. COLT (Communicative Orientation in Language Teaching), an observation
scheme developed by Fröhlich et al. (1985) to compare “the effects of
instructional differences on the development of L2 proficiency”;
2. Observation schemes used in previous washback studies (Cheng 2005; Gu 2007;
Qi 2004; Wall 2005; Watanabe 1997, and so on);
3. Related theories on washback, as well as the washback models;
4. The Teaching Syllabus, TEM Syllabuses, and other documents on English
teaching at the tertiary level in China;
5. Extensive consultation with experts, discussion with PhD candidates, and
communication with English teachers and students;
6. The researcher’s personal experience as an English teacher.
Based on the above sources, an analytic observation scheme was developed,
consisting of eight categories: content, organization, materials, classroom language,
teacher’s communicative features, student’s communicative features, teacher’s
feedback, and atmosphere (see Appendix II). In the first three categories, the time
consumed is calculated and recorded as a percentage; in categories four to seven,
five-point Likert scale questions are employed; and in the last category,
“atmosphere”, the frequency of students’ laughter is counted. A summary of the
content and format of the observation scheme is as follows:
1. Content, the teaching and learning content, such as reading practice, listening
practice, TEM writing training, and so on;
2. Organization, the way the class is organized, such as teacher lecture, student
presentation, pair work, and so on;
3. Materials, i.e. the teaching materials used, such as textbooks, TEM preparation
materials, and so on;
The items in the above three categories are calculated by time. For example, in a
45-min class, if 10 min is devoted to student presentation, then 20% will be recorded
(10/45 = 0.22, which is rounded down to 20%).
4. Classroom language, the language used by teachers and students, whether it is
the target language English or the mother tongue Chinese;
5. Teacher’s communicative features, how the teacher communicates with the
students, such as the authenticity of information and the length of questions;
6. Student’s communicative features, how the student communicates with the
teacher, such as the authenticity of questions and the length of answers;
7. Teacher’s feedback, how the teacher responds to students’ answers, whether it is
positive or negative, and whether the teacher focuses on meaning or on form;
Categories four to seven employ a five-point Likert scale to measure the
frequency of each activity, for example whether the teacher uses English in class
extremely frequently, frequently, sometimes, occasionally or seldom (scored from 5 to 1).
8. Atmosphere, how often students laugh in class, counted as the number of
times students laugh.
The observation scheme was piloted in three English classes and revised
three times before it was finalized.
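The time-based recording rule for the first three categories can be illustrated with a short sketch. It assumes that shares are rounded down to a multiple of 10%, which is consistent with the 10/45 example but is not stated as a general rule in the text; the function name `time_share` and the `step` parameter are hypothetical.

```python
import math


def time_share(minutes, class_minutes=45, step=10):
    """Percentage of class time spent on an item, rounded down to a multiple of `step`."""
    raw_percent = 100.0 * minutes / class_minutes
    # ASSUMPTION: round down to the nearest multiple of `step` (10% by default).
    return int(math.floor(raw_percent / step) * step)


# The worked example from the text: 10 min of student presentation in a 45-min class.
presentation_share = time_share(10)
print(presentation_share)
```

If the scheme in fact rounds to the nearest 10% rather than down, only the `math.floor` call would need to change; the single example in the text does not distinguish the two readings.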
4.3.2.4 Data Collection
Altogether 52 classes taught by the six teachers were observed. Observing all of
them took several semesters (from 2012 to 2014), because in some schools TEM
preparation and non-preparation courses were not offered in the same semester.
To avoid interfering in the classes, the researcher did not sit in the classrooms,
but used a digital voice recorder to record all the classes, which the researcher
analyzed afterwards. Of the six teachers, two agreed to have their class
video recorded once, so two individual class hours were recorded for each of them,
and the recordings were transcribed to provide a clearer picture of the
characteristics of the English classes (see Appendix II for the transcript). To
achieve variety, the classes video recorded were one TEM8 preparation class and one
non-preparation class of fundamental English.
4.3.2.5 Data Analysis
The analysis was conducted mainly by coding the recorded classes with the help of
the observation scheme. The following procedure was followed when coding each
lesson:
1. Listen to the recording once without doing any coding in the observation scheme.
Just note down on a blank piece of paper the major classroom events—what was
taught, how they were organized, and what kind of teaching materials were used.
In this way, the class was divided into several parts;
2. Listen to the recording again, using a stopwatch and the timer on the computer
to record the time spent on each activity. After listening, calculate the overall
time spent on each item under content, organization and materials, and fill in
the blanks in the first three categories;
3. Listen to the recording once more, focusing on the communicative features of
the classroom activities, such as the English used, how the teacher and students
responded to each other, and what the teacher focused on when they provided
students with feedback information;
4. Listen to the recording for the last time, check the previous information and
count how many times students laugh.
After all the classes had been coded, the data were entered into a computer and
processed with SPSS 17.0. Findings of classroom observation will be discussed in Chap. 6.
4.3.3 Interviews
Interviews were a supplementary instrument used in this research. They were
conducted to facilitate the development of the questionnaires and the classroom
observation scheme, to identify possible causes of conflicting research findings,
and to verify statistical results from the questionnaire surveys. The main
participants were teachers (including the six teachers observed), students
(including those whose English classes had been observed), PhD candidates (most of
whom were also teachers), and school administrators. The interviews were conducted
either individually (mainly with teachers and school administrators) or in groups
(mainly with students and PhD candidates).
Four major rounds of interviews were carried out, before and after the
questionnaire surveys and the classroom observations. Pre-questionnaire and
pre-observation interviews were conducted to facilitate instrument development, so
their outcomes were the questionnaires and the observation scheme themselves. The
participants were asked for their opinions on the questionnaires and the observation
scheme. Therefore, no fixed questions were designed, and the interviews were not
recorded.
Post-questionnaire and post-observation interviews were meant to resolve puzzling
findings, so relevant questions were designed before the interviews (see
Appendix III for the interview questions). Moreover, the interviews were recorded
using a digital voice recorder for further analysis. Key findings of the interviews
were collected, sorted, double-checked, and saved in a Microsoft Word file; they
will be discussed in Chaps. 5 and 6.
4.4 Considerations for Validity and Reliability
Washback is such a complicated phenomenon that both qualitative and quantitative
research methods have been employed to collect data in previous research studies.
In the present research, qualitative data are collected through classroom observation
transcription, interviews and the open questions in the questionnaires, and
quantitative data are collected through objective items in the questionnaires and
classroom observation coding schemes. In such multifaceted research, it is
necessary to establish the reliability and validity of the quantitative data and the
credibility, transferability, dependability and confirmability of the qualitative
data (Watanabe 2004).
The 1995 UCLES (University of Cambridge Local Examinations Syndicate)
brief and resultant papers on IELTS impact recommend the following key actions
and approaches to ensuring validity and reliability of the impact research instru-
mentation and data (quoted in Hawkey 2006: 36):
1. a careful exploration of constructs and advance hypotheses by “insiders” and
“outsiders” before the drafting of instruments;
2. checks that every item contributes to the corroboration or refutation of washback
and impact hypotheses;
3. the prediction of participant responses to compare with actual responses at pilot-
ing and trialing stages, with adjustments to instruments made accordingly;
4. the use of expert and stakeholder judgments on draft instruments;
5. the comparison of draft instruments with other instruments;
6. the use of a combination of methods of data collection, approaching validation
from a number of angles with data triangulation used as a check.
In the present research, practical measures were taken in each stage to ensure the
validity and reliability of the instrumentation and the data collected.
4.4.1 The Development of Research Instruments
The following measures have been taken in the development of the research instru-
ments—the questionnaires, classroom observation scheme and interview guiding
questions.
4.4.1.1 The Questionnaires
Questionnaires are a useful tool to collect information from a large number of
people. However, it is difficult to design a valid and reliable questionnaire because
the researcher and the participants cannot communicate in person when the
questionnaire surveys are conducted. Therefore, it is of vital importance for the
researcher to anticipate all possibilities and try to avoid ambiguity,
misunderstanding and missing information. This does not mean that the questionnaire
should be perfect, but a great deal of effort should be devoted to making the
questionnaires as valid and reliable as possible. In this regard, the current
researcher took the following steps in the design, trialing and revision of the
seven questionnaires used in the research.
First, before the questionnaires were drafted, a detailed analysis was made of
washback theories, such as the definitions, dimensions and models, to explore
the construct of washback studies.
Second, a careful review was carried out of all the available empirical studies on
washback, to gain insight from relevant questionnaire surveys conducted before.
Third, the researcher consulted experts and school administrators, held
discussions with PhD candidates and colleagues, and communicated with teachers and
students, to obtain a clear picture of the possible TEM washback.
Last, two rounds of piloting were performed among school administrators, teachers
and students, while the expert’s questionnaire was piloted only once due to the
limited number of experts available. Based on the results of the pilot study, a
comparison was made between the predicted participant responses and their actual
responses. Meanwhile, every item was checked again to discern its contribution
to the corroboration or refutation of the washback hypotheses established. Based
on these findings, revisions were made and the operational version of the
questionnaires was finalized for large-scale administration.
4.4.1.2 The Classroom Observation Scheme
To explore the washback of TEM, it is necessary to obtain direct information from
the language classrooms. However, it remains a difficult problem as to which
instrument to use. Numerous classroom observation schemes have been developed to
describe classroom teaching and learning practice, such as the Communicative
Orientation of Language Teaching Observation Scheme (Frohlich et al. 1985) and the
Target Language Observation Scheme (Ullmann and Geva 1982), but they were
designed to capture the classroom features that their developers were interested
in. Therefore, these observation schemes are hardly applicable to the current
research context. For this reason, the present researcher had to develop a new
observation scheme to examine the washback effects of TEM on classroom teaching and
learning. The following measures were taken to ensure the validity and reliability
of the observation scheme.
First, before designing the observation scheme, the researcher had made a
detailed analysis of the relevant observation schemes used in previous washback
studies (Cheng 2005; Gu 2007; Qi 2004; Wall 2005; Watanabe 1997; etc.), as well
as the established observation schemes in language teaching and learning research
(Frohlich et al. 1985; Ullmann and Geva 1982; etc.).
Second, predictions had been made on the possible washback of TEM, based on
the following research findings: (1) the analysis of the Teaching Syllabus, TEM Test
Syllabuses, and other relevant documents; (2) the discussion with experts, PhD can-
didates, school administrators, teachers and students; and (3) empirical findings of
previous washback studies.
Last, after the observation scheme was drafted, it was piloted in three English
classrooms: one TEM4 non-preparation, one TEM8 non-preparation and one TEM4
preparation class. After each pilot lesson, the observation scheme was revised
based on the research findings and the advice from the observed teachers and the
researcher’s PhD classmates. After three rounds of piloting and revision,
the operational draft of the TEM washback observation scheme was finalized.
4.4.1.3 The Interview Questions
In the current research, quite a few interviews were conducted to seek the
opinions, advice, and suggestions of experts, PhD candidates, school administrators,
teachers and students. Among all the interviews, the post-questionnaire and
post-observation interviews were conducted formally, with guiding questions prepared
beforehand. The purpose was to collect qualitative data to confirm research findings
from the questionnaire surveys and classroom observations, to seek explanations for
contradictory findings, and to obtain supplementary information when further
questions needed to be asked based on previous research findings. When the interview
questions were designed, the following steps were taken to ensure validity and
reliability.
First, a detailed analysis was made of the research findings of the questionnaire
surveys and classroom observations, to locate the problems that needed to be tackled.
Second, the researcher consulted the experts, PhD candidates, school
administrators, teachers and students for their opinions on the interview questions.
Last, the interview questions were piloted twice among teachers and students,
and revised accordingly, before the final version was completed.
4.4.2 Data Collection and Analysis
Practical measures have been taken when collecting and analyzing data through
questionnaire surveys, classroom observations and interviews, to ensure validity
and reliability of the data.
4.4.2.1 The Questionnaire Surveys
Altogether seven questionnaire surveys were conducted among experts, TEM4 program
administrators, TEM8 program administrators, TEM4 teachers, TEM8 teachers,
TEM4 students and TEM8 students. The expert’s questionnaire was distributed
at a TEM construction conference, and no time limit was set for the survey. Four
experts returned the questionnaires during the conference, and one took it
home to finish and mailed it to the researcher 2 weeks later. All the experts provided
their names and e-mail addresses and were willing to offer additional information if
needed.
The other six questionnaires were mailed to the TEM participating schools
together with the TEM test papers in 2012, but in a separate package which could
be opened before the administration of the test. One copy of program administra-
tor’s questionnaire and five copies of teacher’s questionnaire were sent to each TEM
participating school. The personnel in charge of TEM in each school were requested
to find the relevant administrators and teachers to complete the questionnaires. No
time limit was set, and they could return the questionnaires at any time before the
administration of TEM. The student’s questionnaire was distributed to 250 schools
selected from all the TEM participating schools, based on region, school type and
school level, to ensure that the research covered a representative sample. The ques-
tionnaire survey was conducted in the test room after the TEM test was completed,
and no time limit was set. All the six questionnaires were conducted anonymously,
to ensure that the participants would feel free to voice their opinions. Moreover, if
they were unwilling to complete the questionnaires, they could just refuse or leave
the questionnaires unfinished. Blank questionnaires and questionnaires with
more than 20% of the questions unanswered were sorted out and marked as
invalid. This may account for the fact that the valid rate of the student’s
questionnaire is comparatively low: 71.7% for the TEM4 student’s questionnaire and
56.5% for the TEM8 student’s questionnaire. Some candidates were unwilling to
participate in the survey after taking TEM, so they did not complete the questionnaires.
Since it was an anonymous questionnaire, they felt free to hand in a blank or incom-
plete questionnaire. This practice would, to some extent, uphold the validity of the
questionnaire surveys, because the participants were more likely to voice their true
beliefs when they were willing to complete the questionnaire.
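The sorting rule stated above (blank questionnaires and those with more than 20%
of the questions unanswered are invalid) can be sketched in Python as follows. This
is only an illustration, not the procedure actually used; the function names and the
sample data are hypothetical, and the other sorting criteria mentioned in
Sect. 4.3.1.2 are not modeled here.

```python
from typing import Optional, Sequence


def is_valid(answers: Sequence[Optional[str]], max_unanswered: float = 0.20) -> bool:
    """Return False for a blank questionnaire or one with more than
    `max_unanswered` of its questions left unanswered."""
    total = len(answers)
    if total == 0:
        return False
    unanswered = sum(1 for a in answers if a is None or a.strip() == "")
    return unanswered / total <= max_unanswered


def valid_rate(questionnaires: Sequence[Sequence[Optional[str]]]) -> float:
    """Percentage of questionnaires that pass the validity check."""
    valid = sum(1 for q in questionnaires if is_valid(q))
    return 100.0 * valid / len(questionnaires)


# Hypothetical returns, ten questions each (None = unanswered).
returned = [
    ["A"] * 10,               # complete
    ["A"] * 8 + [None] * 2,   # exactly 20% unanswered: still valid
    ["A"] * 7 + [None] * 3,   # 30% unanswered: invalid
    [None] * 10,              # blank: invalid
]
rate = valid_rate(returned)
```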
Three criteria were followed in sorting the questionnaires (refer to Sect.
4.3.1.2), and the researcher did all the sorting herself, because the sorting
criteria are more likely to be applied consistently when one person does all the
work. This may also contribute to the reliability of the questionnaire surveys.
The questionnaire data were entered into a computer by the present researcher with
the help of six college students and two graduate students. Answers to
multiple-choice and five-point Likert scale questions were processed with SPSS 17.0,
while answers to short answer questions were stored in a Microsoft Word file. The
SPSS data files and the Word files were created by the researcher, and the eight
research assistants were asked to work in pairs to enter data, one reading the
answers aloud and the other inputting them, so that their work was double-checked,
which helped to ensure the reliability of the data.
Moreover, a reliability test was performed on the questionnaires to check their
internal consistency, and factor analysis was conducted to examine their validity
(due to its limited sample, the expert’s questionnaire did not undergo these tests).
The test results indicate that the reliability and validity of the questionnaire
surveys were satisfactory (refer to Sect. 4.3.1.3 for the test results).
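As an illustration of what an internal consistency check involves, the sketch below
computes Cronbach's alpha in Python. The passage does not name the statistic used,
so Cronbach's alpha is an assumption here (it is a common choice for Likert-scale
questionnaires), and the function name and data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for `items`, a list of item columns.

    Each column holds one item's scores, one score per respondent.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    n = len(items[0])
    if n < 2 or any(len(col) != n for col in items):
        raise ValueError("every item needs the same number (>= 2) of scores")
    item_var_sum = sum(variance(col) for col in items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent totals
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical five-point Likert data: 3 items answered by 5 respondents.
likert = [
    [5, 4, 3, 2, 4],
    [4, 4, 3, 2, 5],
    [5, 3, 3, 1, 4],
]
alpha = cronbach_alpha(likert)
```

Values closer to 1 indicate that the items vary together; the threshold regarded as
satisfactory is a matter of convention and is not specified in this section.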
4.4.2.2 The Classroom Observations
According to Patton (1990), one major threat to the reliability and validity of
observation data is the presence of a “stranger” in the classroom, that is, the
effect of the observer on what is observed. To reduce this negative effect, the
researcher decided to record the classes for further analysis, instead of sitting in
the classrooms observing. Each teacher observed was provided with a digital voice
recorder, and they were free to decide which classes to record and how many classes
to record. For TEM preparation classes, it was suggested that they record classes
on different themes, such as writing, translation, reading, and so on, so that the
classes observed could cover a wide range. For non-preparation classes, they were
asked to record several consecutive classes because these classes would usually
cover one unit, thus focusing on various teaching content, such as vocabulary
building, text analysis and exercise explanation. When the teachers felt ready to
have their classes recorded, they could just switch on the digital voice recorder on
their desk, which would not interfere with their classroom teaching. In this way,
the threat to the validity of the observation data was reduced.
Table 4.7 Inter-coder reliability (first round)

                              A        B
A    Pearson Correlation      1        .875a
     Sig. (2-tailed)                   .000
     N                        220      220
B    Pearson Correlation      .875a    1
     N                        220      220
a Correlation is significant at the 0.01 level (2-tailed)
Table 4.8 Inter-coder reliability (second round)

                              A        B
A    Pearson Correlation      1        .967a
     N                        220      220
B    Pearson Correlation      .967a    1
     N                        220      220
a Correlation is significant at the 0.01 level (2-tailed)
Another threat to reliability is inconsistent coding, which has been tackled in the
following three ways.
First, a four-step coding procedure was developed to standardize the coding
process, so that no important information would be missed (refer to Sect. 4.3.2.5).
Second, a PhD candidate majoring in English pedagogy and testing was invited
to recode five classes (10% of the total classes observed), which were randomly
selected from all the 52 classes observed. A correlation test showed that the reli-
ability was 0.875, and a further analysis of the data indicated that major differences
arose from the following three aspects: (1) the distinction between teacher-fronted
interaction and teacher-student interaction; (2) the definition of genuine informa-
tion; and (3) length of questions and responses (Table 4.7).
After extensive discussion and consultation, these three terms were further
defined. Teacher-fronted interaction would include only the pattern in which the
teacher asks and the whole class responds. Genuine information refers to any
information that is unknown to the person who asks the question, so it would be
considered a genuine question if the teacher asks, “How do you understand …?”, while
“What is the meaning of arbitrary?” would be considered not genuine. As for the
length of questions and responses, it was agreed that on the five-point Likert scale,
(1) means a single word; (2), a phrase; (3), a single sentence; (4), 2–3 sentences;
and (5), more than 3 sentences.
With the agreed criteria, five more classes were randomly selected for recoding,
and the reliability increased to 0.967, which suggested satisfactory inter-coder
consistency (Table 4.8).
Table 4.9 Intra-coder reliability

                              A1       A2
A1   Pearson Correlation      1        .987a
     N                        440      440
A2   Pearson Correlation      .987a    1
     N                        440      440
a Correlation is significant at the 0.01 level (2-tailed)
Last, ten classes (20% of the total classes observed) were selected randomly for
recoding by the researcher after an interval of 1 month, and the intra-coder
reliability was 0.987, which indicated satisfactory intra-coder consistency (Table 4.9).
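The inter-coder and intra-coder checks above rest on the Pearson correlation between
two sets of codes for the same items. The Python sketch below shows the calculation
on a small made-up sample; it is an illustration only, the codes are hypothetical,
and the study itself ran the test in SPSS on 220 and 440 coded values.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant codes")
    return cov / (sd_x * sd_y)


# Hypothetical five-point Likert codes given by two coders to the same items.
coder_a = [5, 4, 3, 4, 2, 1, 5, 3]
coder_b = [5, 4, 3, 3, 2, 1, 4, 3]
r = pearson_r(coder_a, coder_b)
```

A coefficient near 1 means the two sets of codes rise and fall together. It does not
by itself show that the coders assigned identical values, which is one reason the
disagreements were also examined qualitatively.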
4.4.2.3 The Interviews
The interviews were conducted either individually or in groups, and both telephone
interviews and face to face interviews were done, depending on the situation. To
ensure validity and reliability of interview data, the researcher took the following
measures.
First, a list of questions was prepared beforehand, but in the interview, additional
questions might be asked based on the responses of the participants. Questions were
repeated and explained if the participants found them unclear.
Second, a digital voice recorder was used to record the post-questionnaire and
post-observation interviews for further analysis.
Last, the participants might be interviewed again if any problems arose during
data collection and analysis.
4.4.3 Triangulation
According to Denzin (1978), triangulation refers to the combination of
methodologies in the study of the same phenomenon, which means two or more methods
are used in a study in order to check results. This is a powerful technique that
facilitates validation of data through verification from two or more sources (Bogdan
and Biklen 2006). Therefore, in the current research, triangulation is achieved
through different approaches, namely data triangulation and methodological
triangulation, to enhance the validity of the research.
4.4.3.1 Data Triangulation
Data triangulation refers to the use of different sources of information to increase
the validity of a study, which is well suited to the current research since
different stakeholder groups were investigated. Therefore, in the study, responses
from the following stakeholders are compared: (1) experts; (2) program
administrators; (3) teachers; and (4) students. Through the comparison and analysis,
areas of agreement and divergence can be identified, which may contribute to the
validation of the study.
4.4.3.2 Methodological Triangulation
Various research methods have been employed in the present research, such as doc-
ument analysis, questionnaire surveys, classroom observations and interviews.
Therefore, data collected by different means can be compared, which are listed as
follows:
1. closed and open-ended questionnaire responses;
2. questionnaire responses and classroom observation findings;
3. questionnaire responses and interview findings;
4. classroom observation and interview findings;
5. current research findings and previous research findings collected through docu-
ment analysis.
4.5 Summary
This chapter began with the washback model employed in the current study, listing
the contributing factors and working mechanism of TEM washback. A multi-phase
and multi-method research framework was developed accordingly, employing various
research methods to investigate the washback effects of TEM on the perceptions
and practice of related experts, program administrators, teachers and students.
Then, a detailed description was provided of the research participants and instru-
ments—how the participants were selected, and how the research instruments, such
as questionnaires and classroom observation scheme, were developed, piloted,
revised and finalized.
Next, data collection and analysis procedures were explained, with the prelimi-
nary analysis results listed in tables, such as the reliability test and factor analysis
results. The research data would be further analyzed in Chaps. 5 and 6.
Last, a detailed account was provided of the measures taken to ensure the validity
and reliability of the research. Washback is such a complicated phenomenon that it
is essential to bear validity and reliability in mind throughout. The research
instruments were validated, the data collection procedures were standardized and the
research findings from various sources were triangulated, so the research results
are expected to be valid and reliable.
Chapter 5
Stakeholders’ Perceptions of TEM and Its
Washback
This chapter presents research findings from the questionnaire surveys. The
responses of experts, program administrators, teachers and students are analyzed
in turn, to identify their respective evaluations of TEM and its washback.
The discussion proceeds from experts to program administrators and then to
teachers and students, following the three levels of the educational
hierarchy (Fig. 1.1). The experts are at the top level, responsible for Teaching
Syllabus development and test construction, and are thus more familiar with the
intended washback of TEM; the program administrators are in the middle, developing
their school syllabuses according to both the Teaching Syllabus and the
situation of their schools, so they are familiar with the washback of TEM at the
school level; teachers and students are the practitioners of the Teaching Syllabus
and the TEM Test Syllabuses, so they are familiar with the washback of TEM on
classroom practice. After the four groups have been discussed, their opinions are
compared to identify areas of agreement and divergence.
5.1 Experts’ Perceptions
Five experts participated in the questionnaire survey and gave detailed answers to
the open-ended questions. Due to the limited number of participants, the findings
are presented mainly as verbal accounts rather than statistical illustrations.

https://doi.org/10.1007/978-981-13-1963-1_5
5.1.1 Demographic Information
The five experts surveyed are all university professors in English with more than
20 years of teaching experience. Two are from comprehensive universities, two
from foreign language universities and the fifth from a university of science and
technology.
Among the five experts, three have participated in the development of both the
Teaching Syllabus and the TEM Test Syllabuses, and four have participated in the
TEM test construction.
5.1.2 Experts’ Evaluation of TEM
As asserted by the experts, the TEM Test Syllabuses are based on the Teaching
Syllabus and the TEM tests are constructed according to the TEM Test Syllabuses,
so TEM can measure whether students have met the required English proficiency
specified in the Teaching Syllabus.
5.1.2.1 TEM Characteristics
Generally speaking, TEM was positively evaluated by all the experts, who approved
of, or generally approved of the design of TEM, including its content, item types
employed, time allotment, score allotment and marking criteria. Moreover, some
suggestions have been made on the improvement of TEM, which are summarized as
follows.
First, the test content of TEM, especially that of TEM8, should include more
“professional knowledge for English majors”, such as linguistics, literature and
culture of English-speaking countries. This may help teachers and students develop
a keener awareness of the difference between English majors and non-English
majors.
Second, the item types employed in TEM are mainly traditional, such as multiple
choice, translation and writing, so they could be appropriately modified to include
more integrative items. Besides, the test content and test method of TEM should be
more closely integrated, and appropriate test method can be selected according to
different test content.
Last, the marking criteria of TEM are well developed and easy to follow, but
more importance should be attached to its guiding role in language teaching. For
instance, the marking criteria of writing should be able to guide the teaching of writ-
ing for English majors.
5.1.2.2 Use and Interpretation of TEM Scores
Validity involves the quality of test use and interpretation, or the extent to which the
inferences or decisions made on the basis of test scores are meaningful, appropriate
and useful (AERA, APA & NCME 1985: 9). Therefore, it is essential to investigate
the use and interpretation of TEM scores.
According to the TEM8 Test Syllabus, the intended use and interpretation of the
test scores are:
1. to check and promote the implementation of the Teaching Syllabus;
2. to measure students’ language proficiency and to evaluate the overall language
teaching standard of a university;
3. to provide feedback to language teaching and learning.
However, due to the increasing social recognition of TEM, the test results have
been used to perform other functions, such as conferring bachelor’s degrees, ranking
schools and evaluating teachers.
According to the experts, these unintended uses of TEM scores should not be
encouraged because they will lead to some negative washback. First, linking TEM
certificates to the awarding of bachelor’s degrees may encourage test-oriented
learning, and some test takers may even take the risk of cheating because the
certificate means a lot to them. Next, teaching effects may depend on many factors,
so it is unjustifiable to use TEM results as the sole criterion to evaluate teaching.
Last, ranking schools based on TEM results may lead to undesirable competition
between schools, which is harmful to the healthy development of the schools.
5.1.3 Experts’ Perception of TEM Washback
All the experts agreed that TEM has exerted positive washback on teaching and learn-
ing. TEM can check and promote the implementation of the Teaching Syllabus; TEM
provides feedback information to teachers and students, who can adjust their teaching
and learning accordingly; and TEM may help improve teaching and promote reform.
If TEM is to better play its positive role in teaching and learning, the test reports
should be improved, as most experts pointed out. They suggested that more infor-
mation should be provided in the test reports, to include not only a rank, such as
Excellent, Good, Pass and Fail, but also the total score and scores for each section.
If possible, verbal description of the language ability for each score band should be
provided, so that teachers and candidates can better interpret the test results.
Moreover, the number of students in each score band can be provided to give stu-
dents a rough idea of how they rank in the test population. In a word, the purpose of
providing a test report is to facilitate teaching and learning, so the teachers’ and
candidates’ needs should be considered.
However, it is also likely that TEM may have some negative effects, such as
interfering with the teaching plan and promoting training in test-taking techniques.
Nevertheless, negative effects can be avoided if the TEM test and its scores are
properly used.
5.1.4 Factors Influencing Teaching
It is agreed that TEM exerts washback on teaching and learning. However, is it a
decisive factor compared with other factors involved in teaching? According to the
experts, the top three factors were curriculum design, teacher quality and student
efforts, while no one considered TEM a decisive factor in teaching.
5.2 Program Administrators’ Perceptions
Program administrators are the stakeholders at the school level, who set the
teaching aims and design the curriculum in their respective schools, based on the
Teaching Syllabus and other related factors, such as the language proficiency of the
students, the available teaching staff and resources, and perhaps, the TEM
Syllabuses. Therefore, it is important to survey their opinions on the validity of
TEM and TEM washback, to find out the effects of TEM on the English curriculum
in different schools.
5.2.1 Demographic Information
Altogether 1510 program administrators responded to the questionnaires, including
724 TEM4 program administrators and 786 TEM8 program administrators. Among
them, some are deans of the English department at their respective universities,
some are directors in charge of English language teaching in departments of foreign
languages, and the rest are also leaders, responsible for English language teaching
in the foundation or advanced stage (i.e. the first 2 years or the last 2 years of the
undergraduate program) (Table 5.1).
From Table 5.1, it can be concluded that most program administrators surveyed
were well educated, with a master’s or PhD degree, and that they were experienced
teachers as well, with the majority having more than 15 years of teaching experience.
Therefore, their opinions on TEM and its washback could provide illuminating
insight into this issue.
5.2.2 Program Administrators’ Evaluation of TEM
Program administrators are often responsible for designing school curricula and
setting teaching objectives for English majors in their schools, so they should be
more concerned about the Teaching Syllabus and TEM. As a result, they are
expected to make sensible evaluation of TEM.
Table 5.1 Demographic information of program administrators

Position:
       Dean of English department   Director in charge of English teaching   Other
TEM4   27.3%                        51.1%                                    21.6%
TEM8   33.4%                        47.1%                                    19.5%
Gender:
       Male    Female
TEM4   45.1%   54.9%
TEM8   49.9%   50.1%
Teaching experience:
       0–5 years   6–10 years   11–15 years   More than 15 years
TEM4   3.5%        11.0%        17.9%         67.6%
TEM8   2.4%        8.1%         15.6%         73.9%
Work experience as a leader:
                                              15 years
TEM4   49.8%       31.2%        10.8%         8.2%
TEM8   45.4%       34.6%        12.0%         8.5%
Degree:
       Bachelor   Master   PhD     Other
TEM4   20.1%      61.6%    17.2%   1.2%
TEM8   21.9%      54.4%    22.7%   1.0%
University type:
       Comprehensive   University of science   Language     Normal       Other
       university      and technology          university   university
TEM4   38.8%           26.0%                   4.9%         15.5%        14.9%
TEM8   40.4%           27.5%                   4.2%         14.9%        12.9%
Note: the percentages in each row may add up to 100.1% or 99.9%, because each value is rounded to
one decimal place
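The rounding note can be made concrete with a short check. The sketch below is illustrative only: the helper name `row_sum_ok` and the tolerance rule (half a unit in the last decimal place per cell) are assumptions introduced here, not part of the study. It sums three rows copied from Table 5.1 and tests whether each total lies within the range that rounding alone can explain.

```python
from decimal import Decimal

def row_sum_ok(percentages, places=1):
    """Return (total, ok) for one table row of percentage strings.

    ok is True when the total differs from 100 by no more than the
    accumulated rounding error: half a unit in the last place per cell.
    This tolerance rule is an assumption made for illustration.
    """
    values = [Decimal(p) for p in percentages]
    total = sum(values)
    half_unit = Decimal(5) / Decimal(10) ** (places + 1)  # 0.05 for one decimal place
    tolerance = half_unit * len(values)
    return total, abs(total - Decimal(100)) <= tolerance

# Rows copied from Table 5.1 (values as printed).
rows = {
    "TEM4 degree": ["20.1", "61.6", "17.2", "1.2"],
    "TEM8 university type": ["40.4", "27.5", "4.2", "14.9", "12.9"],
    "TEM4 gender": ["45.1", "54.9"],
}

for label, cells in rows.items():
    total, ok = row_sum_ok(cells)
    print(f"{label}: total = {total}%, within rounding tolerance: {ok}")
```

Working with `Decimal` rather than binary floats keeps the sums exact, so a total of 100.1 is compared as 100.1 and not as a nearby floating-point value. A row whose total falls outside the tolerance would point to a transcription problem rather than to rounding.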
5.2.2.1 Familiarity with TEM
Before investigating their evaluation of TEM, the current research first surveyed the
program administrators on their familiarity with TEM, such as the Test Syllabus,
marking criteria, and so on.
Most program administrators surveyed were familiar with TEM, as indicated in
Table 5.2. Over 90% of the participants were familiar with the Teaching Syllabus
and the TEM Test Syllabus. However, the marking criteria were less well known
(familiar to 72.6% of the TEM4 and 65.4% of the TEM8 participants), and the Oral
Test Syllabus was the least known (57.8% and 51.3%). Comparatively, TEM4 seems to
enjoy a higher familiarity rate, especially in terms of the Oral Test Syllabus
and the marking criteria.
Table 5.2 Familiarity with TEM

TEM4 TEM8
Familiar Neutral Unfamiliar Familiar Neutral Unfamiliar
Teaching syllabus 96.3% 3.5% 0.2% 96.5% 2.8% 0.7%
Test syllabus 93.6% 5.8% 0.7% 92.7% 5.9% 1.4%
Oral test syllabus 57.8% 22.3% 19.9% 51.3% 21.3% 27.4%
Time allotment 90.7% 6.9% 2.4% 87.8% 9.9% 2.3%
Marking criteria 72.6% 16.6% 10.9% 65.4% 19.9% 14.7%
Compared with TEM, which reaches an annual test population of over 200,000, the
number of Oral Test takers is much smaller, about 10,000–20,000 every year,
which means that many universities do not have students taking the test. It
naturally follows that only about half of the program administrators surveyed
were familiar with the Oral Test Syllabus. However, the finding deserves close
attention. As TEM is supposed to assess the language proficiency specified in the
Teaching Syllabus, and speaking is an indispensable part of it, the relatively
low familiarity with the TEM-Oral test is not conducive to promoting positive
washback. Therefore, it is necessary to popularize TEM-Oral and make it better
known to teachers and students, so as to promote balanced development of
students’ language proficiency.
Marking criteria usually embody a test’s purpose and focus, and tests, especially
criterion-referenced tests like TEM, can hardly achieve expected goals if their mark-
ing criteria are unfamiliar to teachers and students. For instance, a scoring scheme
combining features of holistic and analytic marking is employed to grade TEM
writing scripts. The scheme focuses on language use as well as ideas and arguments.
Familiarity with the scoring scheme can help teachers set course objectives and plan
their lessons in line with the writing requirements in the Teaching Syllabus, because
the scoring scheme incorporates these writing requirements. In this respect,
regular communication between test constructors and test users, teachers in
particular, is of much importance. Various ways of communication can be employed,
such as providing detailed marking criteria with samples and offering video
training sessions, so as to foster or intensify positive washback on teaching and
learning.
5.2.2.2 Evaluation of TEM Test Quality
Another area of inquiry in the questionnaire is program administrators’ evaluation
of TEM, including alignment with the Teaching Syllabus, test content, test method
and test administration. As is shown in Table 5.3, TEM was positively evaluated by
most program administrators, with a satisfaction rate of over 75% for all the items.
Among the 10 items in the questionnaire, test content received the highest
satisfaction rate whereas test report had the lowest one. Moreover, of all the
items, marking criteria received the highest rate of “Not clear”, which is
consistent with the previous finding of its unfamiliarity to program
administrators. Comparatively, TEM4 had a higher satisfaction rate than TEM8,
except for the item of “test report”.
Table 5.3 Evaluation of TEM

TEM4 TEM8
Satisfy Not clear Dissatisfy Satisfy Not clear Dissatisfy
Alignment with the teaching syllabus 93.9% 5.2% 0.9% 92.2% 6.4% 1.4%
Content 97.0% 1.9% 1.1% 95.7% 3.0% 1.2%
Item type 94.7% 3.7% 1.5% 93.8% 4.0% 2.2%
Time allotment 91.0% 6.1% 2.8% 87.5% 8.8% 3.7%
Score allotment 91.4% 6.8% 1.8% 90.4% 7.7% 1.9%
Convenient administration 89.4% 7.8% 2.8% 80.7% 10.9% 8.4%
Fair administration 87.9% 8.8% 3.3% 85.5% 8.8% 5.6%
Strict administration 91.7% 5.1% 3.2% 90.2% 5.3% 4.5%
Test report 78.5% 11.7% 9.8% 80.9% 9.6% 9.5%
As is shown in Table 5.3, the overall satisfaction rate is fairly high, with five
items above 90% for both TEM4 and TEM8. Among these five items, four are concerned
with the test itself: alignment with the Teaching Syllabus, test content, item
type, and score allotment; one is related to test administration, namely
strictness of administration. Whether these five items refer to internal or
external factors, a high satisfaction rate seems to suggest that TEM, as a
criterion-referenced test, has satisfactory validity.
Feedback from the questionnaire not only informs test developers of the test’s
achievements, but also helps them identify possible areas for revision. As
Table 5.3 shows, convenience of test administration for TEM8 received a
comparatively low satisfaction rate (80.7%). From participants’ responses to the
short-answer question, it becomes clear that the problem lies with the first task
in the TEM8 listening section, which combines note taking with gap filling. The
invigilators’ manual stipulates a series of steps to be followed in test
administration, which some program administrators said were “quite complicated”.
As TEM8 is a large-scale high-stakes test, a minor mistake might cause the
listening section to be improperly delivered, which could then result in
unfairness. Thus, a review of the test invigilation procedures is essential, with
a view to improving practicality and thus maintaining validity.
Moreover, the satisfaction rate for the test report, for both TEM4 and TEM8, is
comparatively low, at around 80%. To make better use of the feedback information
from TEM, a detailed test report is essential. The TEM test report is just a
certificate, which classifies test takers as excellent, good or pass (those who
fail the test get no certificate). Test takers can check their overall score at
their own schools, but the scores for each section are not provided. In this
sense, the information provided in the TEM test report is quite limited.
As shown in Table 5.4, most program administrators considered it necessary to
report not only a composite score, but also section scores and ability description.
Moreover, information of students’ ranking in the test population was considered
important by around half of the program administrators, and some of them wanted
other information, such as ranking of the schools, strengths and weaknesses reflected
in the tests, and so on. Therefore, it is desirable to improve the test report of TEM,
to provide more information, so that TEM can play a better role in teaching and
learning. Just as an expert pointed out, when drafting a test report, importance
should be attached to “the relation between scores and teaching, the feedback a test
report can provide for teaching and learning”.
Table 5.4 Information to be provided in the test report

TEM4 TEM8
Necessary Unnecessary Necessary Unnecessary
Composite score 78.4% 21.6% 78.9% 21.1%
Section score 89.7% 10.3% 85.7% 14.3%
Ability description 67.9% 32.1% 66.6% 33.4%
Rank in the test population 57.3% 42.7% 54.5% 45.5%
Other information 2.1% 97.9% 2.4% 97.6%
Table 5.5 Use and interpretation of TEM scores

TEM4 TEM8
Agree Not clear Disagree Agree Not clear Disagree
Reflecting syllabus implementation 83.1% 10.7% 6.2% 79.6% 13.3% 7.1%
Promoting syllabus implementation 84.4% 11.6% 4.1% 83.3% 12.0% 4.7%
Promoting teaching reform 79.5% 13.7% 5.0% 79.0% 12.7% 8.3%
Revealing problems & achievements in teaching 80.0% 12.2% 7.8% 77.7% 13.3% 9.0%
Reflecting teaching quality 72.4% 15.6% 11.9% 72.0% 16.0% 12.0%
Reflecting students’ language proficiency 80.4% 13.6% 6.0% 77.8% 13.6% 8.6%
Assessing students’ learning outcome 86.3% 9.5% 4.2% 78.7% 12.1% 9.2%
Predicting students’ future performance 63.5% 24.2% 12.3% 63.6% 20.2% 16.2%
Feedback to course design 77.0% 13.7% 9.3% 77.7% 13.3% 9.0%
Feedback to teaching aim 83.6% 11.4% 5.0% 78.6% 11.3% 10.1%
Feedback to classroom teaching 81.3% 11.8% 6.8% 78.7% 13.2% 8.1%
As indicated in Table 5.5, around 80% of the program administrators believed that
TEM scores can reflect and promote syllabus implementation at the university level,
and that they can measure students’ language proficiency. Meanwhile, TEM scores can
provide feedback to course design and classroom teaching, thus promoting teaching
reform. As to whether TEM scores can reflect teaching quality and predict students’
future performance, fewer program administrators held a positive view. Therefore, it
was doubtful whether TEM results could be used as a criterion for teacher evaluation
and job application.
Table 5.6 Influence of TEM scores

TEM4 TEM8
Affect university ranking 61.1% 25.9% 13.0% 60.5% 25.1% 14.4%
Affect department ranking 58.8% 23.0% 18.3% 57.5% 25.1% 17.4%
Affect teacher evaluation 39.6% 27.4% 33.0% 35.2% 24.9% 39.8%
Affect student ranking 41.8% 26.3% 31.9% 37.5% 24.2% 38.4%
Affect employment 79.8% 11.6% 8.6% 86.2% 8.4% 5.4%
Affect further education 67.7% 18.2% 14.2% 72.0% 16.0% 12.0%
Table 5.6 provides information on the impact of TEM scores. As can be seen in
the table, TEM scores seem to affect students’ future career and education much
more than rankings of various kinds.
It can be concluded from the above data that TEM has, to some extent, achieved
its intended purpose as specified in the Test Syllabus. In this sense, TEM scores
are used and interpreted in the intended way. Of all the items in Table 5.5,
“predicting students’ future performance” received the lowest agreement rate
(63.5% and 63.6%). This has, from a different perspective, attested to the
intended use of TEM, which, as a criterion-referenced test, is designed to
evaluate learning outcome according to predetermined criteria, rather than to
predict what students can do in the future. However, as Table 5.6 indicates, TEM
scores were perceived to have a much greater impact on students’ future career
and further education, especially the former, than on rankings. This has aroused
concern from TEM test developers, for TEM is not designed to predict students’
performance in their future career. Using TEM scores to make employment decisions
may incur problems.
Besides their unintended use in job recruitment, TEM scores were believed to
affect school ranking and teacher evaluation (Table 5.6). However, were these
practices common in the participating universities? According to Table 5.7, the
most common practice is to relate the TEM4 score to the conferring of the
bachelor’s degree, which was selected by one third of the program administrators.
For the other uses, fewer than 20% of the program administrators selected “yes”,
claiming that TEM scores were used to rank schools, evaluate teachers or leaders,
or (in the case of TEM8) confer degrees in their schools, which indicates that
these are not common practices. However, program administrators were almost
equally divided in their attitudes towards the four practices, as is shown in
Table 5.8.
Two points emerge from Tables 5.7 and 5.8: (1) unintended use of TEM scores in
participating universities was not common, which is conducive to maintaining and
promoting positive washback; (2) more than half of the program administrators
surveyed were not against using TEM scores, especially TEM4 scores, in unintended
ways, which poses a potential risk of misuse of test scores.
Table 5.7 Unintended use of TEM scores

TEM4 TEM8
Yes No Not clear Yes No Not clear
Conferring degrees 33.5% 65.1% 1.4% 12.4% 86.9% 7.0%
Evaluating teachers 14.7% 83.7% 1.6% 8.3% 91.4% 3.0%
Evaluating leaders 18.6% 78.4% 3.0% 13.2% 84.7% 2.1%
Ranking schools 14.6% 59.4% 26.0% 11.8% 57.0% 31.2%
Table 5.8 Attitudes towards unintended uses of TEM scores

TEM4 TEM8
Agree Neutral Disagree Agree Neutral Disagree
Evaluating leaders 52.1% 17.8% 30.0% 42.1% 14.6% 43.3%
Ranking schools 48.0% 20.6% 41.1% 30.4% 17.5% 52.1%
Therefore, test developers are expected to “anticipate the common uses of certain
kinds of tests, and the potential consequences of such use, even if these uses are not
explicitly advocated” (Kane 2001: 338). It is advisable that TEM test developers
identify possible uses and interpretations of TEM scores and provide guidelines on
how decisions about scores are to be made.
5.2.3 Program Administrators’ Perception of TEM Washback
It is generally assumed that tests may exert influence on teaching and learning, and
the higher the stakes, the more intense the washback. The existing research on
washback effects indicates that washback effects do exist, and they may take
different forms in different settings or among different stakeholders. Moreover, a
test is more likely to influence teaching content than teaching method (Cheng 2005;
Gu 2007; Wall 1996).
Table 5.9 provides program administrators’ perception of TEM washback.
Generally speaking, their views on washback were positive, though their opinions
varied on some items.
From Table 5.9, it can be inferred that most program administrators held a positive
attitude towards TEM, especially its overall washback on program design and
(classroom) teaching. With regard to other items, TEM was perceived to have
exerted more positive influence on setting teaching aims (85.2% and 83.5%) than
on scheduling courses (75.0% and 71.4%). Teaching method seemed to be influenced
less than teaching content: “no washback” was selected by 27.6% and 31.0% of the
respondents for teaching method, compared with 18.9% and 22.2% for teaching
content, which agrees with previous research findings (Cheng 2005; Gu 2007; Wall
2005). In addition, TEM also seemed to have a positive impact on teaching ideology
and attitude. On the whole, the statistics show that TEM has had positive effects
on teaching, but the intensity of its effect varied across different aspects of the
teaching context.
Table 5.9 Views on TEM washback

TEM4 TEM8
Positive No washback Negative Positive No washback Negative
Overall washback 86.8% 11.6% 1.5% 84.5% 14.6% 0.9%
on program design
on teaching
Teaching aims 85.2% 13.6% 1.1% 83.5% 15.7% 0.8%
Course design 78.3% 20.1% 1.7% 75.7% 23.2% 1.1%
Teaching schedule 75.0% 22.3% 2.7% 71.4% 26.5% 2.1%
Teaching content 78.2% 18.9% 2.8% 76.1% 22.2% 1.7%
Teaching method 68.6% 27.6% 3.8% 67.2% 31.0% 1.8%
Teaching ideology 68.4% 27.1% 4.6% 70.6% 26.4% 2.9%
Teaching attitude 72.9% 25.6% 1.6% 73.0% 25.7% 1.3%
Teaching depth 75.6% 21.5% 2.9% 75.8% 22.1% 2.1%
Teaching speed 66.3% 30.0% 3.7% 61.2% 35.3% 3.5%
A further investigation was made to find out which aspect of TEM exerted more
washback effects (Table 5.10).
Table 5.10 Washback exerted by different aspects of TEM

TEM4 TEM8
Test content 83.4% 15.3% 1.3% 82.7% 16.3% 0.9%
Test method 81.7% 16.8% 1.5% 78.3% 20.1% 1.5%
Score allotment 71.0% 27.6% 1.5% 64.7% 33.7% 1.5%
Time allotment 65.6% 32.7% 1.7% 60.5% 37.9% 1.6%
According to most program administrators, various aspects of TEM exerted positive
washback on course design, and the influence of test content and method surpassed
that of the other aspects. However, marking criteria, an aspect that should
play a guiding role, were not considered important, with around 30% of the program
administrators selecting “no washback”. A possible reason is that some program
administrators were not familiar with the marking criteria (see Table 5.2).
5.2.4 Washback on Course Design
Table 5.9 also indicates that more than 75% of the program administrators held a
positive view of TEM washback on course design (78.3% for TEM4 and 75.7% for
TEM8), which could further be manifested in two aspects: the introduction of
TEM-related courses and of TEM preparation courses, as the following tables show.
TEM4 consists of six parts: dictation, listening, grammar and vocabulary, cloze,
reading and writing. TEM8 also consists of six parts: listening, reading, general
knowledge, proofreading, translation and writing. According to the Teaching
Syllabus, the recommended courses for students in the fundamental and advanced
stages are:
1. Fundamental stage: fundamental English, phonetics, listening, speaking,
reading, writing, grammar, and culture of English speaking countries;
2. Advanced stage: advanced English, writing, translation, oral interpretation,
academic writing, linguistics and literature.
As is indicated in Table 5.11, TEM-related courses seem to be introduced more
often than non-related courses. This is especially clear for TEM8, where
non-related courses such as academic writing and interpretation were introduced
least often, which may be viewed as a sign of TEM washback on teaching as well.
Moreover, in the fundamental stage, skill training courses (listening, reading,
speaking and writing) were more often introduced than courses on language
knowledge, such as grammar and phonetics, which may indicate that the focus of
college English teaching is language use rather than language knowledge.
TEM underwent some substantial revisions in 2005. In TEM4, the requirements on
listening, writing and vocabulary were raised, and the fast reading item was
deleted; in TEM8, the requirements on listening, reading and writing were raised,
and a new component, General Knowledge, was introduced to assess students’
knowledge of linguistics, literature and culture. The current research thus makes
an initial investigation into the influence of TEM revisions on course design, as
shown in Table 5.12.
Table 5.11 Courses introduced by TEM participating universities

TEM4
                 Fundamental   Phonetics   Listening   Speaking   Reading   Writing   Grammar
                 English
Not introduced   3.0%          10.2%       2.7%        3.4%       5.9%      4.1%      12.5%
Introduced       97.0%         89.8%       97.3%       96.6%      94.1%     95.9%     87.5%
TEM8
                 Advanced   Writing   Translation   Interpretation   Academic   Linguistics   Literature   Culture
                 English                                             writing
Not introduced   4.1%       4.5%      12.7%         16.9%            24.1%      8.3%          5.4%         7.7%
Introduced       95.9%      95.5%     87.3%         83.1%            75.9%      91.7%         94.6%        92.3%
Note: Culture (of English Speaking Countries) is included in TEM8 because it is the content domain of 3–4 questions in the General Knowledge Section of
TEM8, despite the fact that it is a compulsory course for the sophomore year
Table 5.12 Washback from TEM revisions

TEM4
                 Listening   Fast reading   Writing   Vocabulary
Not influenced   6.2%        17.9%          4.7%      6.3%
Not clear        9.0%        14.4%          8.6%      8.1%
Influenced       84.9%       67.7%          86.7%     85.6%
TEM8
                 Listening   Reading   Writing   General Knowledge
Not influenced   8.5%        17.4%     10.1%     12.6%
Not clear        11.8%       13.4%     9.4%      11.0%
Influenced       79.8%       69.2%     80.4%     76.4%
As for the TEM4 revision, it seems that the raised requirements exerted more
influence on course design than the removal of a particular test component. A
possible reason is that fast reading is only one kind of reading skill, so the
deletion of fast reading would not influence the introduction of the reading
course. For the TEM8 revision, the newly introduced component, General Knowledge,
was expected to exert more influence on course design. However, our research
findings indicate that the listening and writing revisions seemed to have affected
course design to a greater extent. This phenomenon was explained by some program
administrators in the short-answer question. Special courses on listening and
writing had been introduced in some institutions for juniors and seniors because
these skills were generally supposed to be more difficult for students. For
example, many institutions offered advanced listening courses, even though
listening is not a compulsory subject for the advanced stage in the Teaching
Syllabus. Thus, higher test requirements prompted universities to continue to
focus on skill training, which is an encouraging sign of TEM8 washback on course
design. As Table 5.12 shows, participants’ perception of the General Knowledge
component varied. Responses to the short-answer question indicate that some had
revised their school curriculum to introduce related courses; some believed that
what was tested was too easy and not worth preparing for; still others claimed
that linguistics, culture and literature had long been part of their curriculum,
and thus there was no need to revise the existing curriculum. In a word, TEM has
exerted some influence on course design, but different schools responded in
different ways.
As for TEM preparation courses, they were offered by some schools, but mostly
in the form of lectures, as is indicated by Table 5.13. It seems that TEM4
preparation courses were more often introduced than TEM8 preparation courses, and
the length of preparation was longer. However, for both TEM4 and TEM8, when
special training courses were offered, more than half lasted less than 20 h, so
they covered no more than half a semester. Thus, in terms of test preparation,
TEM has had only a mild effect on course planning.
5.2.5 Individual Differences
As proposed by many researchers, the washback effects of a test may vary from
participant to participant, due to their different characteristics, such as educational
background and work experience (Alderson and Wall 1993; Green 2007). In this
sense, program administrators’ perception of TEM washback is further analyzed to
identify individual differences. Since program administrators are often responsible
for course planning, the washback effects on course design were further analyzed,
and it turned out that the washback effects perceived by TEM4 program
administrators varied significantly: the overall washback on course design
differed among program administrators with different educational backgrounds and
teaching experience.
Table 5.13 TEM preparation courses

TEM4
Preparation courses               Less than 10 h   10–20 h   20–30 h   30–40 h   More than 40 h
No                        18.1%
Several lectures          36.3%
Overall training          33.5%   40.9%            26.3%     13.4%     12.1%     7.3%
Skill training            16.0%   38.9%            33.7%     11.6%     8.5%      7.3%
Grammar & Vocabulary      4.3%    46.8%            31.2%     8.2%      7.4%      6.3%
TEM8
Preparation courses               Less than 10 h   10–20 h   20–30 h   30–40 h   More than 40 h
No                        27.1%
Skill training            15.4%   55.2%            26.0%     7.6%      6.3%      4.8%
General knowledge         8.1%    69.3%            17.1%     5.4%      2.9%      5.4%
training
Note: The percentage may exceed 100%, because participants could select more than one answer
Table 5.14 TEM washback perceived by different program administrators

Overall washback perceived by program administrators with different teaching experience
(χ² = 8.394; p = .039)
           0–5 years   5–10 years   10–15 years   >15 years
Positive   96.2%       94.0%        89.0%         84.7%
Neutral    0           4.8%         9.5%          13.7%
Negative   3.8%        1.2%         1.5%          1.6%
Overall washback perceived by program administrators with different educational background
(χ² = 9.010; p = .029)
           Bachelor   Master   Doctor   Others
Positive   88.4%      87.0%    84.0%    77.8%
Neutral    10.9%      11.7%    13.0%    11.1%
Negative   0.6%       1.3%     3.0%     11.1%
As shown in Table 5.14, program administrators with longer teaching experience
and higher education took a more neutral view of the washback effects of TEM4 on
course design. Perhaps they were more confident and had clearer ideas as to how
teaching should be conducted for English majors, so they were affected less by
TEM4.
However, no significant difference was identified among TEM8 program
administrators, which is possibly due to the fact that TEM8 is considered to play
a less important role in college English teaching (see Table 5.15), so the
washback it exerts is less intense, and so is the difference in the washback
perceived by the program administrators.
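The statistics reported in Table 5.14 can be sanity-checked with a few lines of code. The sketch below rests on an assumption that the text does not state: that each reported statistic is referred to a chi-square distribution with 3 degrees of freedom (four groups minus one). The function name `chi2_sf_df3` is introduced here for illustration. Under that assumption, the upper-tail probabilities computed from the two statistics can be compared with the p values printed in the table.

```python
import math

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom.

    Uses the closed form for df = 3:
        erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)
    The choice of df = 3 is an assumption (four groups minus one),
    not a value reported in the source.
    """
    if x < 0:
        raise ValueError("the statistic must be non-negative")
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Statistics and p values as printed in Table 5.14.
reported = {
    "teaching experience": (8.394, 0.039),
    "educational background": (9.010, 0.029),
}

for label, (statistic, p_reported) in reported.items():
    p = chi2_sf_df3(statistic)
    print(f"{label}: statistic = {statistic}, computed p = {p:.3f}, reported p = {p_reported}")
```

If the computed and reported values agree to three decimals, the reported figures are at least internally consistent with a 3-degree-of-freedom test. The check says nothing about which test the authors actually ran, and the underlying counts are not available in the table.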
Table 5.15 Factors affecting English language teaching

                    TEM4    TEM8
Teacher quality     70.9%   64.3%
Student effort      80.8%   61.0%
Student quality     58.0%   69.0%
Leader attention    61.9%   40.1%
Course design       36.7%   20.8%
Teaching facility   42.8%   9.6%
Teaching material   28.9%   4.3%
Teacher income      22.4%   22.4%
Research            20.8%   8.3%
TEM                 22.0%   2.5%
The above discussion shows that TEM has more positive than negative washback
effects on English language teaching, and it seems to have affected those courses
recommended in the Teaching Syllabus that are more closely related to the test
content domain. Moreover, TEM has had some impact on the continued delivery of
skill training courses in some universities, even though the skills concerned are
not compulsory subjects for the advanced stage. However, is TEM a decisive factor
compared with other factors involved in language teaching?
From Table 5.15, it can be noted that program administrators consider student
quality, teacher quality, student effort and leader attention the four most
important factors. In contrast, the influence of TEM was limited. Comparatively,
TEM4 played a more important role in college English teaching, acknowledged by
22.0% of the program administrators, while TEM8 was considered important by only
2.5% of the program administrators surveyed. In other words, although TEM has
exerted some washback on English language teaching, its effect is limited as
compared with other factors. This finding is desirable because the relationship
between teaching and testing should be “that of partnership”: a good test should
be “supportive of good teaching” and “exert a corrective influence on bad
teaching” (Hughes 2003: 2). In this view, the role TEM plays in English teaching
should be facilitative rather than decisive, so it is natural that TEM was not
considered an important factor by most program administrators. In addition, this
is in line with the opinion of the experts, who believed that the top three
factors affecting English teaching and learning effects should be curriculum
design, teacher quality and students’ efforts, while none of them selected TEM as
an important factor.
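The ranking claim can be checked directly against the figures in Table 5.15. The sketch below is a minimal illustration; the dictionary layout and the helper name `top_factors` are assumptions made here, and the percentages are copied from the table as printed.

```python
# Share of program administrators selecting each factor (Table 5.15),
# stored as (TEM4 percentage, TEM8 percentage).
factors = {
    "Teacher quality": (70.9, 64.3),
    "Student effort": (80.8, 61.0),
    "Student quality": (58.0, 69.0),
    "Leader attention": (61.9, 40.1),
    "Course design": (36.7, 20.8),
    "Teaching facility": (42.8, 9.6),
    "Teaching material": (28.9, 4.3),
    "Teacher income": (22.4, 22.4),
    "Research": (20.8, 8.3),
    "TEM": (22.0, 2.5),
}

def top_factors(table, column, n=4):
    """Return the n factor names with the highest percentage in the given column
    (0 for TEM4, 1 for TEM8), in descending order."""
    ranked = sorted(table, key=lambda name: table[name][column], reverse=True)
    return ranked[:n]

for column, test_name in enumerate(["TEM4", "TEM8"]):
    print(test_name, top_factors(factors, column))
```

The order of the four leading factors differs between the two groups, but the same four factors lead in both columns, which is what the paragraph above asserts.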
5.3 Teachers’ Perceptions
Teachers are the practitioners of the Teaching Syllabus and their school curricula.
Meanwhile, their teaching activities are influenced by TEM. Therefore, question-
naire surveys were conducted among TEM4 and TEM8 teachers, to investigate their
evaluation of TEM, perception of TEM washback, classroom teaching practice, and
perception of the factors influencing teaching for English majors.
Altogether 6796 teachers participated in the questionnaire surveys, including 3574
TEM4 teachers who teach students in the foundation stage and 3222 TEM8 teachers
who teach students in the advanced stage. They may not have taught TEM preparation
courses, but the students they have taught would take TEM4 or TEM8 (Table 5.16).
Table 5.16 Demographic information of teachers

Gender:
       Male    Female
TEM4   25.0%   75.0%
TEM8   28.3%   71.7%
Title:
       Professor   Associate professor   Lecturer   Teaching assistant   Other
TEM4   3.1%        22.3%                 58.7%      14.7%                1.2%
TEM8   4.3%        26.6%                 56.0%      12.3%                0.8%
Teaching experience:
                                   15 years
TEM4   28.0%   34.2%   18.9%   19.0%
TEM8   25.0%   32.0%   19.6%   23.4%
Degree:
       Bachelor   Master   PhD    Other
TEM4   15.8%      77.7%    5.9%   0.6%
TEM8   14.7%      76.1%    8.7%   0.4%
University type:
       Comprehensive   University of science   Language     Normal       Other
       university      and technology          university   university
TEM4   43.9%           20.3%                   6.4%         15.9%        13.5%
TEM8   44.3%           23.6%                   5.7%         14.9%        11.6%
From Table 5.16, it can be concluded that the teachers surveyed constitute a
representative sample, covering teachers from various types of universities, with
different educational backgrounds and teaching experience. The academic titles of
the teachers ranged from teaching assistant to professor, with lecturers being
the majority, and their academic degrees ranged from Bachelor to Doctor, with
Master being the most common. There are new teachers with no more than 5 years of
teaching experience, and also experienced teachers with more than 15 years of
experience. TEM8 teachers seemed to hold slightly higher academic titles and
degrees than TEM4 teachers, and to have longer teaching experience. The number of
female teachers far exceeds that of male teachers, both for TEM4 and TEM8.
The wide coverage of the sample supports the representativeness of the research
findings, and meanwhile, a comparison can be made between teachers with different
characteristics.
5.3.2 Teachers’ Evaluation of TEM
Teachers are the practitioners of the Teaching Syllabus and their school curricula,
and TEM is one of the methods used to measure their teaching effects. Therefore,
they may have developed their unique perceptions of TEM and its washback. In this
sense, teachers were surveyed on their familiarity with TEM, evaluation of TEM,
and their use of TEM results.
Before investigating their evaluation of TEM, the current research first surveyed the
teachers on their familiarity with TEM, such as the Teaching Syllabus, the Test
Syllabus, and so on. Whether teachers are familiar with TEM, to some extent, is
determined by whether they are teaching TEM-related courses, whether they are
preparing students for TEM, and whether they are concerned about the test.
The majority of the teachers surveyed were familiar with the Teaching Syllabus,
the TEM Test Syllabus and TEM time allotment, as indicated in Table 5.17. Around
80% of the participants claimed to be familiar with the Teaching Syllabus and the
TEM Test Syllabus. However, the marking criteria were less well known, familiar to
63.5% of the TEM4 teachers and 49.3% of the TEM8 teachers, and the Oral Test
Syllabus was the least known, familiar to only about one third of the participants
(39.6% and 33.6%). Both TEM4 and TEM8 teachers were familiar with the Teaching
Syllabus, but TEM4 seemed to enjoy a higher familiarity rate, especially in terms of
the Test Syllabus and the marking criteria.
Table 5.17 Familiarity with TEM

TEM4 TEM8
Test syllabus 81.8% 14.6% 3.7% 72.9% 20.4% 5.6%
Oral test syllabus 39.6% 29.1% 31.3% 33.6% 27.9% 38.5%
Time allotment 81.5% 14.6% 3.9% 73.2% 19.9% 6.9%
The findings of the teachers’ questionnaires coincide with those of the program
administrators’ questionnaires, and the possible causes are the same: the small
test population of the TEM Oral Tests and the lack of communication between test
developers and test users.
5.3.2.2 Evaluation of TEM Test Quality
Since most teachers were familiar with TEM, they are likely to be able to make a
sensible judgment on the quality of TEM; hence, another area of inquiry in the
questionnaire is teachers’ evaluation of TEM, including alignment with the
Teaching Syllabus, test content, test method and test administration. As is shown
in Table 5.18, TEM was positively evaluated by most teachers, with satisfaction
rates of about 75% or higher for all the items. Among the 10 items in the
questionnaire, test content received the highest satisfaction rate whereas test
report had the lowest one. Moreover, of all the items, marking criteria received
the highest rate of “Not clear”, which is consistent with the previous finding of
its unfamiliarity to teachers. Comparatively, TEM4 generally had a higher
satisfaction rate than TEM8, except for the items of content, item type and test
report, where the TEM8 rates were marginally higher.
Table 5.18 Evaluation of TEM

TEM4 TEM8
Alignment with the teaching syllabus 90.6% 7.5% 1.9% 89.5% 8.9% 1.7%
Content 91.7% 5.5% 2.7% 92.0% 5.8% 2.3%
Item type 90.6% 6.6% 2.9% 90.7% 6.8% 2.5%
Time allotment 86.9% 8.4% 4.7% 83.7% 10.7% 5.6%
Score allotment 88.6% 8.7% 2.7% 87.0% 10.5% 2.6%
Convenient administration 86.1% 9.9% 3.9% 75.4% 14.6% 10.0%
Fair administration 84.0% 10.9% 5.0% 79.0% 12.7% 8.2%
Strict administration 90.1% 6.9% 3.0% 86.6% 8.9% 4.5%
Test report 74.3% 14.9% 10.8% 75.5% 17.0% 7.6%
As shown in Table 5.18, the items of test report and convenient administration for
TEM8 received the lowest satisfaction rates, which agrees with the findings of the
program administrators’ questionnaires. Possible reasons are analyzed in
Sect. 5.2.2.2.
Moreover, teachers’ opinions were sought as to what should be included in the
test report.
Table 5.19 Information to be provided in the test report

TEM4 TEM8
Composite score 65.5% 34.5% 62.6% 37.4%
Section score 81.4% 18.6% 78.5% 11.5%
As shown in Table 5.19, what teachers wanted most is the section score, which
can reflect students’ strengths and weaknesses, such as whether they are better at
reading or listening. With this information, learning activities can be organized
to cater to the needs of the students.
Test characteristics include not only test content and method, but also the use and
interpretation of test results. In this view, the current research attempted to
investigate teachers’ perception of how TEM scores should be used, as well as how
the scores were actually used in their respective schools. As shown in Table 5.20,
more teachers agreed on the use of TEM scores to assess students’ learning outcome
(82.7% and 79.0%) and provide feedback to teaching (80.0% and 75.3%) than to predict
students’ future performance (57.7% and 58.1%). This suggests that TEM has been
used in the intended way: to measure students’ language proficiency and to provide
feedback information to facilitate English teaching. As to whether TEM results
can be used in job applications (“predict students’ future performance”), teachers
were more doubtful, as were the program administrators.
Table 5.20 Use and interpretation of TEM scores

TEM4 TEM8
Feedback to teaching 80.0% 13.1% 6.8% 75.3% 15.7% 9.0%
Assessing teaching quality 68.5% 16.3% 15.1% 64.5% 19.1% 16.5%
Promoting teaching reform 73.0% 15.6% 11.4% 67.3% 19.3% 13.5%
Reflecting students’ language proficiency 77.7% 14.1% 8.1% 74.8% 15.9% 9.4%
Assessing students’ learning outcome 82.7% 11.6% 5.7% 79.0% 13.0% 7.9%
Predicting students’ future performance 57.7% 25.4% 16.8% 58.1% 25.6% 16.3%
Motivating students to learn 80.0% 12.6% 6.5% 76.4% 14.7% 8.9%
Table 5.21 Influence of TEM scores

TEM4 TEM8
Affect teacher evaluation 33.4% 25.7% 40.8% 24.8% 25.3% 49.9%
Affect student ranking 35.0% 34.5% 30.4% 32.7% 32.0% 35.3%
Table 5.22 Unintended uses of TEM scores

TEM4 TEM8
Yes No Not clear Yes No Not clear
Table 5.21 provides information on the impact of TEM scores. As can be seen in
the table, TEM scores seem to affect students’ future career and further education
much more than teacher evaluation and student ranking.
Further investigation was made into the actual practice in different universities—
how TEM scores were used and the teachers’ attitudes towards such uses.
From Tables 5.22 and 5.23, it can be noted that TEM scores were seldom used to
evaluate teachers, but one third of the schools linked TEM4 scores to the bachelor’s
degree. However, teachers’ attitudes towards these uses deserve our attention:
more teachers agreed to these uses than not.
Table 5.23 Attitudes towards unintended uses of TEM scores

TEM4 TEM8
Table 5.24 Teachers’ perception of TEM washback

TEM4 TEM8
on teaching
on learning
Teaching aims 84.6% 11.8% 3.6% 82.9% 15.3% 1.7%
Course design 75.1% 21.4% 3.5% 73.2% 24.9% 1.9%
Teaching schedule 73.7% 19.8% 6.5% 70.1% 26.8% 3.1%
Teaching content 76.6% 16.9% 6.6% 76.2% 20.5% 3.2%
Teaching method 68.8% 22.5% 8.7% 67.6% 28.0% 4.4%
Teaching ideology 66.0% 24.5% 9.5% 67.0% 28.2% 4.9%
Teaching attitude 71.8% 24.0% 4.1% 71.5% 26.2% 2.2%
Teaching depth 71.9% 19.8% 8.3% 76.1% 20.0% 3.9%
Teaching speed 63.3% 24.4% 12.4% 62.3% 32.0% 5.7%
Therefore, it is advisable that TEM test developers identify possible uses and
interpretations of TEM scores and provide guidelines on how decisions about scores
are to be made.
5.3.3 Teachers’ Perception of TEM Washback
Table 5.24 is about teachers’ perception of TEM washback. Generally speaking,
their views on washback were positive, though their opinions varied on some items.
From Table 5.24, it can be concluded that most teachers held a positive attitude
towards TEM, especially its overall washback on learning, with more than 90% of
the teachers claiming that TEM exerted positive washback on learning. As for
teaching, most teachers agreed that TEM exerted positive washback on the overall
teaching and teaching aims, while TEM seemed to affect teaching method, ideology,
attitude and speed less, as indicated by the number of teachers selecting “no
washback”. This finding is consistent with previous research findings (Cheng 2005;
Gu 2007; Wall 2005) and the program administrators’ perception.
Teachers’ perception of TEM washback was further explored to find out which
aspect of TEM exerted more washback effects (Table 5.25).
Table 5.25 Washback exerted by different aspects of TEM

TEM4 TEM8
Test content 82.2% 14.4% 3.3% 80.4% 17.9% 1.6%
Test method 78.2% 17.8% 4.1% 77.3% 20.8% 1.9%
Score allotment 71.7% 25.6% 2.7% 67.7% 30.5% 1.9%
Time allotment 68.5% 28.3% 3.2% 65.1% 32.9% 2.0%
From Table 5.25, it can be inferred that various aspects of TEM exerted more
positive than negative effects on teaching. Test content and method seem to have
exerted more influence than score allotment, time allotment and marking criteria,
which is similar to the perception of program administrators. However, it deserves
our attention that the marking criteria failed to achieve the expected effects. Marking
criteria should have played a guiding role in English teaching, but they failed to do so
since some teachers were not familiar with them (see Table 5.17).
5.3.4 Washback on Classroom Teaching
According to Alderson and Wall (1993), a test will influence teaching, such as its
content and method, rate and sequence, degree and depth, as well as teachers’ atti-
tude towards teaching. Therefore, in the present study, teachers were surveyed on
the washback of TEM on their classroom teaching practice.
5.3.4.1 TEM Preparation Classes
One possible washback of TEM on teaching is the introduction of TEM preparation
courses, which may differ from non-preparation courses in both the teaching
content and the teaching method. Teachers were therefore surveyed on whether they
had prepared students for TEM and, if so, how these courses were delivered and
whether they differed from non-preparation courses. Table 5.26 summarizes
whether teachers had prepared students for TEM and how long they had spent on
TEM preparation.
From Table 5.26, it can be inferred that TEM4 preparation was more often intro-
duced than TEM8 preparation, since around half (45.0%) of the teachers claimed
that they had never prepared students for TEM8, while only 25% of the teachers
selected “no” for TEM4 preparation. For those who had prepared students for TEM,
around one fourth (27.0% and 27.4%) employed the format of lectures, while for
those who prepared students for TEM in their classrooms, most of them spent no
more than 10 h on TEM preparation, which indicates that the washback of TEM on
course design is not intense.
Table 5.26 TEM preparation experience

TEM4
TEM preparation  Less than 10 h  10–20 h  20–30 h  30–40 h  More than 40 h
No 25.3%
Skill training 26.5% 50.6% 29.8% 11.7% 4.3% 3.6%
Grammar & Vocabulary 11.4% 56.4% 25.2% 10.0% 5.2% 3.2%
TEM8
Preparation courses  Less than 10 h  10–20 h  20–30 h  30–40 h  More than 40 h
No 45.0%
Skill training 15.9% 60.2% 24.1% 9.5% 3.4% 2.7%
General knowledge training 5.5% 76.3% 14.5% 6.1% 1.8% 1.4%
Furthermore, questions were asked to find out how teachers taught in TEM
preparation classes, that is, the major activities conducted in classes.
Activities conducted in TEM preparation classes were varied, because in most
schools TEM preparation was carried out in fundamental English or advanced
English classes, rather than in a separate course. From Table 5.27, it can be inferred
that TEM preparation, such as having students work on the test paper and explaining
the exercises in the test paper, was often conducted in classes, especially for TEM4.
However, textbooks were still used and skill training activities were often carried
out. It seems that TEM8 preparation was much less intense than TEM4 preparation,
with only one third of the teachers claiming that test preparation activities were
frequently conducted in class.
From Table 5.28, it can be inferred that TEM would have some washback on the
teaching content. More time would be devoted to working on the test paper and
explaining it. Also, TEM related skills would be practiced more, such as listening,
reading and writing, while non-related skills would be practiced less, such as
translation for TEM4 and speaking. It seems that most activities had been intensified in
TEM preparation classes, which, according to findings from the interviews, may
be attributable to the tight schedule, for teachers had to finish the textbook and
prepare students for TEM at the same time. As a result, more activities would be
conducted in the classes.
Table 5.27 TEM preparation classes

TEM4 TEM8
Frequent Sometimes Seldom Frequent Sometimes Seldom
Work on test papers 55.1% 26.6% 19.3% 37.2% 34.7% 28.0%
Explain test papers 52.9% 29.3% 17.9% 35.1% 34.8% 30.2%
Explain texts 62.9% 20.7% 16.4% 65.3% 19.6% 15.1%
Work on textbook exercises 59.4% 21.2% 19.4% 57.6% 24.5% 18.0%
Explain textbook exercises 57.2% 22.8% 20.0% 52.3% 27.5% 20.2%
Listening activity 61.0% 20.4% 18.6% 54.6% 26.5% 18.8%
Speaking activity 53.7% 24.9% 21.4% 46.5% 30.2% 23.3%
Reading activity 64.0% 21.2% 14.9% 63.3% 23.2% 13.5%
Writing activity 57.8% 25.4% 16.8% 52.7% 30.9% 16.5%
Translation activity 48.2% 27.1% 24.7% 55.8% 29.6% 14.7%
Other activity 42.9% 32.5% 24.5% 34.2% 33.9% 31.9%
Table 5.28 Difference between TEM preparation and non-preparation classes

TEM4 TEM8
More Same Less More Same Less
Explain texts 23.7% 35.9% 30.4% 27.1% 46.1% 26.9%
Other activity 31.1% 41.9% 27.0% 34.8% 37.1% 28.2%
5.3.4.2 Washback from TEM Revision
TEM underwent some substantial revisions in 2005. Since many teachers have more
than 10 years of teaching experience, several questions were designed to find out
how TEM revision has affected them in classroom teaching. (See Sect. 5.2.4 for the
content of revision).
It can be noted from Table 5.29 that, for the TEM4 revision, the raised
requirements (listening, writing and vocabulary) seem to have exerted more influence
on course design than the deleted item (fast reading), which is the same as the
program administrators’ opinion.
For TEM8, the newly added item, General Knowledge, seems to have
exerted more influence on English teaching. Moreover, the listening and writing
revisions seemed to influence English teaching more than the revision in reading. It is
natural that General Knowledge, as a new item, would affect teachers more, because
they had to familiarize themselves with both its content and method, so that they could
better prepare students for the test. According to the findings from the interviews
and the open questions in the questionnaires, the reason why listening and writing
exerted more influence is that these two parts were very difficult, so teachers were
more concerned about their revision.
Table 5.29 Washback from TEM revisions

TEM4
Listening  Fast reading  Writing  Vocabulary
Not influenced 4.8% 18.7% 4.1% 5.1%
Not clear 11.1% 17.5% 10.3% 9.5%
Influenced 84.0% 63.8% 85.6% 85.4%
TEM8
Listening  Reading  Writing  General Knowledge
Not influenced 5.2% 15.5% 5.0% 3.9%
Not clear 13.2% 17.9% 12.3% 10.3%
Influenced 81.6% 66.6% 82.7% 85.8%
The current research covers teachers from various universities with different
backgrounds, so an investigation was made to find out whether their perceptions of TEM
washback differ (Table 5.30).
Several Samples Tests were run, and the differences among teachers with different
educational and teaching backgrounds were found to be significant, while there
seemed to be no significant difference among teachers from various universities. As
shown in Table 5.30, those affected less by TEM (selecting “no washback”) were
teachers with higher titles, longer teaching experience, or higher degrees. Moreover,
they seemed to take a more critical attitude towards TEM (selecting “negative”).
Possible reasons are: (1) better-educated and more experienced teachers tend to be
more confident about their teaching, so it is natural that they would be affected less
by TEM; (2) the longer teachers teach, the more problems they may identify in the
educational system, including TEM, so they are more likely to hold a negative
attitude towards TEM; and (3) more teachers with bachelor’s degrees were
found to be negatively influenced by TEM8, which is probably due to the high
difficulty level of the test.
Table 5.30 TEM washback on teaching perceived by different teachers

TEM4 TEM8
Positive  No washback  Negative  Positive  No washback  Negative
Professor 83.4% 11.0% 5.5% 74.2% 21.3% 4.4%
Associate Prof. 80.6% 14.8% 4.6% 79.0% 17.8% 3.1%
Lecturer 85.4% 10.1% 4.5% 85.1% 13.1% 1.8%
Teaching Assist. 88.5% 9.3% 2.1% 87.7% 11.0% 1.3%
Other 90.0% 7.5% 2.5% 75.0% 16.7% 8.4%
TEM washback on teaching perceived by teachers with different titles (TEM4: χ² = 23.417,
p = .000; TEM8: χ² = 40.568, p = .000)
0–5 years 88.4% 8.8% 2.8% 88.1% 10.8% 1.0%
6–10 years 85.8% 10.0% 4.1% 85.2% 13.1% 1.7%
11–15 years 83.9% 11.0% 5.2% 82.3% 15.3% 2.5%
>15 years 78.3% 16.2% 5.5% 76.9% 19.1% 4.0%
TEM washback on teaching perceived by teachers with different teaching experience (TEM4:
χ² = 31.797, p = .000; TEM8: χ² = 51.992, p = .000)
Bachelor 85.0% 11.2% 3.9% 81.2% 14.8% 4.0%
Master 85.3% 10.8% 4.0% 84.8% 13.5% 1.7%
PhD 78.2% 14.6% 7.3% 72.7% 23.6% 3.7%
Other 73.7% 15.8% 10.6% 73.2% 23.1% 0.0%
TEM washback on teaching perceived by teachers with different degrees (TEM4: χ² = 13.096,
p = .004; TEM8: χ² = 20.038, p = .000)
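The p values reported under Table 5.30 can be related to the test statistics with a short calculation. The sketch below is illustrative only and rests on two assumptions that the text does not state: that the Several Samples Tests refer their statistic to a chi-square distribution (as a Kruskal-Wallis-type test does), and that the degrees of freedom equal the number of groups minus one. The function name `chi2_sf` is hypothetical, and only the Python standard library is used.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^k / k!.
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: complementary error function plus a finite sum of gamma terms.
    total = 0.0
    for k in range(1, (df - 1) // 2 + 1):
        total += half ** (k - 0.5) / math.gamma(k + 0.5)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total

# Statistics copied from the notes to Table 5.30.
# Assumption: df = number of groups - 1 (5 titles, 4 experience bands, 4 degrees).
reported = {
    "titles, TEM4": (23.417, 4),
    "titles, TEM8": (40.568, 4),
    "experience, TEM4": (31.797, 3),
    "experience, TEM8": (51.992, 3),
    "degrees, TEM4": (13.096, 3),
    "degrees, TEM8": (20.038, 3),
}

for label, (statistic, df) in reported.items():
    print(f"{label}: p = {chi2_sf(statistic, df):.3f}")
```

Under these assumptions, the statistic of 13.096 for degrees (TEM4) gives a p value that rounds to .004, and the other statistics give p values that round to .000, in line with the figures reported in the table notes.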
Generally speaking, most teachers agreed that TEM had exerted some positive
effects on English teaching. However, is TEM an important factor that may deter-
mine their teaching effects?
From Table 5.31, it can be noted that TEM4 and TEM8 teachers’ responses were
quite similar: both groups considered student quality, student effort and teacher
income the three most important factors. In contrast, the influence of TEM is limited,
considered important by no more than 10% of the teachers (8.1% and 6.3%). On this
point, teachers’ views coincided with those of the program administrators, and this
is the desirable role of TEM: to be facilitative rather than decisive in college
English teaching.
As a result, TEM was not considered a major source of pressure by most teach-
ers, as indicated in Table 5.32.
Teachers’ pressure mainly came from their sense of responsibility and their
workload, while TEM preparation and being evaluated based on TEM results did not
lead to pressure for most teachers.
Table 5.31 Factors affecting English teaching

TEM4 TEM8
Student quality 62.0% 61.2%
Student effort 57.5% 51.1%
Teacher income 49.8% 42.0%
Leader attention 44.1% 41.8%
Course design 42.5% 21.3%
Teacher quality 37.9% 36.7%
Class size 40.2% 34.2%
Teaching facility 29.4% 25.8%
Research 26.7% 23.6%
TEM 8.1% 6.3%
Table 5.32 Sources of pressure for teachers

TEM4 TEM8
Teacher’s responsibility 64.2% 65.4%
Too many classes 41.9% 39.2%
Students’ non-cooperation 13.6% 12.1%
TEM preparation 10.7% 10.3%
Evaluation based on TEM 6.8% 6.6%
Other 5.7% 4.9%
5.4 Students’ Perceptions
Altogether 17,896 TEM candidates participated in the questionnaire surveys,
including 11,033 TEM4 candidates and 6863 TEM8 candidates. The following part
will discuss the students’ evaluation of TEM, their perception of TEM washback,
and their TEM preparation practice.
TEM4 candidates were mainly second-year English majors and TEM8 candidates
were English majors in their fourth year. However, there were also test takers major-
ing in other fields, such as law, economics, and so on, but they often had English as
their minor. Moreover, there were candidates majoring in other foreign languages,
such as Japanese, French, and so on. The candidates were from universities of all
types, such as comprehensive universities, universities of science and technology,
language universities, and so on (Table 5.33).
Based on the figures listed in Table 5.33, the following findings can be
summarized.
First, the number of female test takers far surpasses that of male test takers,
which might be attributed to the common phenomenon that more women than men
major in English.
Second, TEM4 is mainly taken by second-year students, but quite a few third-year
students take the test as well, because those who have failed TEM4 in the
second year can take it again in their third year. Likewise, those who
have failed TEM8 in the fourth year can take TEM8 the next year, perhaps
after their graduation. However, the number of TEM8 test takers in their fifth year
is much smaller than that of TEM4 test takers in their third year. This is not because
TEM8 has a much higher pass rate, but because some candidates who fail TEM8 for the
first time do not take the test again after their graduation.
Table 5.33 Demographic information of students

Gender
Male Female
TEM4 15.2% 84.8%
TEM8 16.4% 83.6%
Grade
First year Second year Third year Fourth year Fifth year
TEM4 1.6% 76.3% 21.8% 0.3% 0
TEM8 0.1% 0.2% 2.9% 96.5% 0.3%
Major
English major  English minor  Other languages
TEM4 79.8% 17.3% 2.8%
TEM8 92.5% 6.7% 0.8%
University type
Comprehensive university  University of science and technology  Language university  Normal university  Other
TEM4 38.6% 15.5% 17.2% 22.4% 6.3%
TEM8 36.4% 17.1% 14.9% 25.8% 5.9%
Last, more candidates of other majors take TEM4 than TEM8, which is possibly
due to the fact that TEM8 is not a required test for non-English majors in most
schools.
5.4.2 Students’ Evaluation of TEM
Test takers’ opinions are seldom sought as to whether a test is valid or not (Cheng
2008, 2011; Cohen 2006; Hamp-Lyons 2000a, b). However, research in language
assessment has demonstrated strong evidence of test validation from multiple stake-
holder perspectives (Cheng 2011; Moss et al. 2006). Therefore, it is essential to
investigate test takers’ perception in the validation of TEM. In this view, test takers’
evaluation of TEM is explored in the questionnaire surveys, and the following part
will report their familiarity with TEM and their evaluation of TEM.
Before investigating test takers’ evaluation of TEM, it is necessary to find out how
familiar they are with TEM, such as TEM Test Syllabus, TEM test administration,
and so on.
As indicated in Table 5.34, TEM candidates were more familiar with the
administration of the test, such as its time allotment and score allotment. Meanwhile, they
were more concerned about the TEM Test Syllabus than the Teaching Syllabus. This
conforms to our expectation because candidates care more about the test content (as
listed in the Test Syllabus), time allotment and score allotment, so that they can
perform better in the test, while the Teaching Syllabus is of little concern to them
since it is meant to guide teaching for English majors.
Table 5.34 Students’ familiarity with TEM

TEM4 TEM8
TEM test syllabus 54.8% 29.9% 15.4% 57.0% 30.4% 12.6%
TEM-oral test syllabus 32.5% 31.7% 35.8% 32.4% 32.2% 35.4%
TEM marking criteria 47.9% 32.2% 19.9% 39.3% 37.3% 23.5%
TEM time allotment 69.6% 20.4% 10.0% 69.1% 22.2% 8.8%
TEM score allotment 66.5% 20.3% 13.2% 64.8% 20.8% 14.5%
It should also be noted that the candidates were not so familiar with TEM mark-
ing criteria, especially TEM8 marking criteria. 47.9% of the TEM4 candidates
claimed to be familiar with TEM4 marking criteria and 39.3% of the TEM8 candi-
dates were familiar with TEM8 marking criteria. This might be attributed to the fact
that more subjective items are employed in TEM8, such as translation and writing,
which account for 40% of the total score, while there is no translation in TEM4 and
the writing part only accounts for 25% of the total. As confirmed by the interview
findings, most candidates surveyed claimed that they were not so concerned about the
marking criteria, especially those of writing and translation, because they either “had
no access to them” or “had difficulty understanding them”. Moreover, candidates
seemed to believe that the marking of writing and translation was “teachers’
business”, while they “had no say” in this matter. This may also account for the fact that
translation and writing were less practiced in TEM preparation after class (see
Table 5.45). Candidates may feel that their efforts are in vain because they cannot get
prompt feedback on their performance in translation and writing.
As shown in Table 5.34, the TEM-Oral Test Syllabus was the least familiar to test
candidates (32.5% for TEM4 candidates and 32.4% for TEM8 candidates). A possible
explanation is that the test-taking population of the TEM oral tests (TEM4-Oral and
TEM8-Oral) is much smaller than that of the TEM written tests (TEM4 and TEM8). No
more than 10% of the written test takers would take the oral tests, which makes
speaking an untested ability in TEM for most English majors. As a result, speaking
proficiency might be neglected in teaching and learning.
5.4.2.2 Evaluation of TEM
Candidates’ opinions are sought as to whether TEM can assess their language abili-
ties validly, both general language ability and individual language skills.
Table 5.35 Students’ evaluation of TEM

TEM4 TEM8
Overall language proficiency 59.2% 22.6% 18.2% 53.8% 27.4% 18.8%
Learning achievement 57.4% 21.0% 21.6% 53.5% 23.8% 22.7%
Predict future performance 34.6% 26.2% 39.2% 38.1% 26.5% 35.5%
Listening 72.2% 13.8% 14.1% 65.6% 17.3% 17.1%
Reading 69.9% 17.9% 12.3% 68.2% 17.9% 13.8%
Writing 63.3% 18.8% 17.9% 67.2% 19.4% 13.4%
Grammar & vocabulary (TEM4) 71.1% 15.9% 12.9%
Translation (TEM8) 68.6% 19.6% 11.9%
General knowledge (TEM8) 66.0% 18.1% 15.9%
Table 5.35 shows that most TEM candidates agreed that the individual sections
in TEM were able to assess their language ability; for example, that the listening
part can validly assess their listening ability. However, they seemed to be less satisfied
with the role of TEM in assessing their overall language ability and learning
achievement. There are two possible reasons. First, the overall language profi-
ciency may include more than what is assessed in TEM. For instance, most TEM
candidates would not take TEM oral test, so oral proficiency, an indispensable
part of overall language ability, is not assessed in TEM. Second, learning achieve-
ment in college involves much more than language ability, so it can hardly be
assessed by a single test.
As to whether TEM can predict future performance, opinions of candidates vary.
Around one third of the candidates believed that it could predict their future perfor-
mance, one third were not sure, and the rest believed that it could not serve such a
function. As stated in the test syllabus, the purpose of TEM is to measure whether
candidates have met the required language proficiency specified in the Teaching
Syllabus. Therefore, TEM is not designed to predict candidates’ future performance,
such as whether they are competent for a certain job or whether they are qualified
for a bachelor’s degree. On this question, more TEM8 candidates (38.1%) agreed on
the predictive role of TEM than TEM4 candidates (34.6%), which might be
attributed to the fact that TEM8 is taken at the end of the fourth year when candidates are
about to graduate, so TEM8 results are more likely to be used to predict candidates’
performance in their future jobs.
Whether a test can achieve its intended washback depends, to a large degree, on
the feedback information it provides, and test report is a basic means of providing
information. Therefore, candidates’ opinions are sought as to what information
should be provided in the TEM test report.
From Table 5.36, it can be inferred that TEM candidates wanted more
information than what is provided in the present TEM test report. The scores of each section
are considered necessary by most candidates, even more important than the
composite score. One possible reason is that candidates would be able to identify their
strengths and weaknesses based on the section scores. Moreover, some candidates
believed it necessary to describe the language proficiency for each score band, so
that they would know what their marks truly mean. Only one fourth of the test takers
believed it necessary to provide their rank in the test population, which suggests that
candidates may be averse to the competition entailed in TEM.
Table 5.36 Information to be provided in TEM test report

TEM4 TEM8
Composite score 56.5% 43.5% 61.5% 38.5%
Section score 61.0% 39.0% 63.0% 37.0%
Table 5.37 Use and interpretation of TEM results

Whether TEM score is related to degree  Attitudes to relating TEM score to degree
Yes Not clear No Agree Neutral Disagree
TEM4 59.5% 15.5% 25.0% 44.4% 19.3% 36.3%
TEM8 16.0% 11.9% 72.1% 44.9% 21.6% 33.6%
TEM4 TEM8
Affect ranking 34.4% 32.8% 32.9% 26.9% 29.0% 44.1%
5.4.2.3 Use and Interpretation of TEM Results
Whether test scores are used in the intended way may, to some extent, affect the
validity of a test. However, in test use and interpretation, test takers often play a
passive role, powerless as to how test scores should be interpreted and used.
Nevertheless, it is still important to investigate how they believe the test scores are
used and how they are affected by them, so that a comparison can be made between the
beliefs of various stakeholders (Table 5.37).
According to the candidates, TEM4 scores (59.5%) were more often related to
conferring bachelor’s degrees than TEM8 scores (16.0%). However, TEM4
and TEM8 candidates held similar attitudes towards relating TEM scores to
conferring bachelor’s degrees, with about 44% agreeing and roughly one third disagreeing.
Besides conferring bachelor’s degrees, TEM results may also affect candidates’
school rank, future employment and further education. The candidates’ responses
suggest that TEM results have affected their future employment more than school
rank, especially for TEM8 candidates who were to graduate soon.
Table 5.38 Students’ perception of TEM washback

TEM4 TEM8
Positive Neutral Negative Positive Neutral Negative
Learning aims 76.8% 15.2% 8.0% 76.1% 16.8% 7.1%
Course selection 63.8% 26.3% 9.9% 62.8% 29.5% 7.9%
Learning plan 68.9% 18.8% 12.4% 69.7% 20.1% 10.2%
Learning content 70.8% 18.1% 11.1% 70.9% 20.7% 8.4%
Learning method 68.1% 20.6% 11.3% 66.0% 24.0% 10.1%
Learning attitude 72.3% 17.5% 10.1% 71.5% 20.0% 8.5%
Learning depth 69.5% 18.5% 12.0% 71.4% 19.8% 8.8%
Learning speed 68.0% 20.4% 11.6% 68.1% 22.5% 9.4%
5.4.3 Students’ Perception of TEM Washback
According to Alderson and Wall (1993), a test will affect what learners learn and
how learners learn, such as the learning content, method, rate and sequence, and
degree and depth. Therefore, the candidates were first surveyed on whether they
were positively or negatively affected by TEM.
As indicated in Table 5.38, most TEM candidates believed that TEM has exerted
more positive washback on their English learning, especially the overall washback
of TEM. Moreover, it seems that TEM has helped candidates set their learning aims,
so most of them were quite positive about the washback of TEM on their learning
aims. Besides, TEM seems to affect the learning content more than the learning
method, with more candidates selecting “Neutral” on the question of the washback of
TEM on the learning method. What is least affected by TEM is course selection,
with 26–30% of the candidates choosing “Neutral”, which is possibly due to the
fact that most TEM related courses are compulsory courses, such as fundamental
English and advanced English.
A further investigation is made on the candidates’ attitudes towards the possible
positive and negative washback of TEM—whether they agreed that TEM would
generate washback effects, and the findings are listed below.
From Table 5.39, it can be concluded that a lot more candidates agreed on the
positive washback than the negative washback. More than 70% of the candidates
agreed that TEM would provide them with learning aims, so that they would spend
more time learning English. Moreover, preparing for TEM could help them clarify
the learning requirements and identify their weaknesses, so that they could better
arrange their learning tasks, allocating time for each skill training. Besides, most
candidates believed that their vocabulary had been enlarged and their learning effi-
ciency improved. As a result, 78.9% of the TEM4 candidates and 72.7% of the
TEM8 candidates agreed that TEM had promoted their language learning.
Table 5.39 Positive and negative washback of TEM

TEM4 TEM8
Increase pressure 81.1% 9.8% 9.1% 82.0% 10.1% 7.9%
Provide learning aims 75.3% 13.9% 10.8% 71.6% 16.0% 12.4%
Promote learning 78.9% 11.9% 9.1% 72.7% 15.5% 11.8%
Improve learning efficiency 68.4% 17.6% 13.9% 61.7% 22.1% 16.2%
Spend more time learning 82.2% 9.9% 7.9% 76.3% 13.0% 10.7%
Clarify learning requirement 64.5% 20.3% 15.3% 59.9% 21.8% 18.3%
Help better arrange skill training 64.1% 19.3% 16.6% 62.7% 19.6% 17.7%
Help identify weaknesses 79.1% 12.8% 8.1% 74.4% 14.5% 11.1%
Improve English proficiency 74.4% 14.7% 10.9% 72.1% 16.5% 11.5%
Improve English through mock tests 66.7% 18.1% 15.3% 65.4% 19.9% 14.6%
Enlarge vocabulary 85.4% 8.4% 6.3% 75.5% 14.5% 10.0%
Interfere with regular learning 40.3% 20.5% 39.2% 34.8% 20.9% 44.3%
Interfere with skill training 51.6% 18.6% 29.8% 43.2% 20.6% 36.2%
Be satisfied with identifying the correct answer for MCQ 38.6% 19.6% 41.8% 40.6% 21.8% 37.5%
However, it is also possible that TEM will exert some negative washback on
candidates. For example, preparing for TEM may interfere with their regular learn-
ing practice; they would focus more on test paper work than language skill training;
and they may lower their requirement, satisfied with identifying the correct answers
to the multiple choice questions, instead of seeking the reasons. As indicated in
Table 5.39, such effects did exist, with roughly 35% to 52% of the candidates
reporting each of these effects.
5.4.4 TEM Preparation
According to Hughes (1993), a test will influence the participants, the processes and
the products. TEM may first affect the perceptions and attitudes of the students, and
then, their learning activities will differ, so will their learning products. In this sense,
a further investigation was made into students’ learning processes and learning
effects, to find out how their in-class and after-class learning activities are affected
by TEM and what kind of learning effects have been achieved.
5.4.4.1 In-Class Learning
TEM preparation courses are provided by some schools and TEM preparation activ-
ities are conducted in some classes, so students’ learning in classes may differ from
that of the non-preparation stage. In this sense, the students were surveyed on their
in-class learning experience, to find out possible differences.
Table 5.40 TEM preparation courses

TEM4
TEM preparation  Less than 10 h  10–20 h  20–30 h  30–40 h  More than 40 h
No 28.4%
Skill training 25.2% 41.9% 29.0% 15.4% 7.0% 6.7%
Grammar & Vocabulary 14.6% 48.0% 25.3% 13.3% 7.4% 6.0%
TEM8
Preparation courses  Less than 10 h  10–20 h  20–30 h  30–40 h  More than 40 h
No 45.9%
Skill training 18.2% 48.9% 27.7% 12.7% 5.6% 5.0%
General knowledge training 8.8% 60.2% 19.8% 10.5% 5.1% 4.5%
Table 5.41 Activities conducted in TEM preparation classes

TEM4 TEM8
Explain texts 57.7% 20.3% 22.0% 46.4% 22.8% 30.8%
Other activity 30.2% 30.9% 37.8% 30.1% 31.2% 38.8%
From Table 5.40, it can be noticed that TEM4 preparation was more often
introduced than TEM8 preparation, since around half (45.9%) of the TEM8 students
selected “no preparation”, while only 28.4% of the students selected “no” for TEM4
preparation. When TEM preparation courses were offered, most of them would last
no more than 20 h. This roughly coincides with program administrators’ and teach-
ers’ responses.
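The statement about course length can be checked by adding the first two hour bands in Table 5.40. The following sketch is illustrative only: the figures are copied from the table, while the names `hours_share` and `share_up_to_20h` are hypothetical.

```python
# Share of preparation courses (%) per hour band, copied from Table 5.40.
# Bands, in order: less than 10 h, 10-20 h, 20-30 h, 30-40 h, more than 40 h.
hours_share = {
    ("TEM4", "Skill training"): [41.9, 29.0, 15.4, 7.0, 6.7],
    ("TEM4", "Grammar & Vocabulary"): [48.0, 25.3, 13.3, 7.4, 6.0],
    ("TEM8", "Skill training"): [48.9, 27.7, 12.7, 5.6, 5.0],
    ("TEM8", "General knowledge training"): [60.2, 19.8, 10.5, 5.1, 4.5],
}

def share_up_to_20h(bands):
    """Combined share (%) of the two shortest bands, i.e. courses of 20 h or less."""
    return round(bands[0] + bands[1], 1)

up_to_20h = {course: share_up_to_20h(bands) for course, bands in hours_share.items()}

for (test, course), share in up_to_20h.items():
    print(f"{test} {course}: {share:.1f}% lasted no more than 20 h")
```

On these figures, roughly 71% to 80% of the preparation courses reported by students ran for 20 h or less, which supports the reading that most courses were short.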
Furthermore, questions were asked to find out the major activities conducted in
classrooms (Table 5.41).
Table 5.42 Difference between TEM preparation and non-preparation classes

TEM4 TEM8
Explain texts 35.5% 30.8% 33.8% 35.8% 38.5% 25.6%
Other activity 36.2% 37.8% 25.9% 35.1% 38.0% 26.9%
According to the students, various learning activities were conducted in TEM
preparation classes, including test preparation, textbook learning, and language skill
training. However, activities not directly related to TEM seemed to receive less atten-
tion, such as translation (33.5%) in TEM4 preparation classes and speaking (43.7%)
in TEM8 preparation classes. Meanwhile, TEM8 preparation seemed to be less
intense than TEM4 preparation, both in test paper practice (45.6% for TEM8 and
57.5% for TEM4) and test paper explanation (36.8% for TEM8 and 52.2% for TEM4).
From Table 5.42, it can be inferred that TEM had some washback on the
learning content. Students spent more time working on the test papers and
listening to teachers’ explanations. Meanwhile, TEM-related skills, such as
listening, reading and writing, were practiced more, while skills not tested, such as
translation for TEM4 and speaking, were practiced less. It seems that TEM
preparation classes were more tightly organized, with more activities conducted, which is
in line with the teachers’ responses.
Moreover, students’ preference was also investigated. It seems that more than
half of the students preferred all kinds of learning activities, and skill training activi-
ties enjoyed a slightly higher preference rate. The learning activity with the highest
preference rate was listening and the lowest, working on textbook exercises.
Generally speaking, students’ preference was in line with the frequency of activities
conducted in class (Table 5.43).
5.4.4.2 After-Class Learning
It is common practice that students would spend some time preparing for the test
before the exam, so an investigation was made into students’ after-class learning
activities before TEM, to seek possible differences between preparation and non-
preparation stages (Table 5.44).
Table 5.43 Students’ preference for classroom activities

TEM4 TEM8
Like Neutral Dislike Like Neutral Dislike
Explain texts 53.2% 28.0% 18.7% 52.5% 27.7% 19.8%
Other activity 54.4% 30.6% 15.0% 57.4% 29.4% 13.1%
Table 5.44 After-class preparation efforts

Length of TEM preparation
No preparation 1 month Half a semester One semester Even longer
TEM4 7.5% 43.2% 38.0% 8.1% 3.3%
TEM8 7.6% 61.1% 21.9% 6.3% 3.0%
TEM preparation time every day
No preparation <1 h 1–2 h 2–3 h Even longer
TEM4 7.6% 22.5% 32.1% 22.2% 15.6%
TEM8 9.7% 18.0% 20.9% 20.8% 30.6%
TEM mock tests
No preparation <5 copies 5–10 copies 10–20 copies Even more
TEM4 4.7% 20.3% 41.7% 26.9% 6.3%
TEM8 7.6% 26.3% 43.4% 18.5% 4.1%
Extracurricular TEM training course
No preparation <10 h 10–20 h 20–30 h Even longer
TEM4 82.7% 6.0% 7.2% 2.9% 1.3%
TEM8 88.9% 4.6% 4.5% 1.4% 0.7%
Table 5.44 describes the general situation of TEM preparation. It can be inferred
that most students spent some time on TEM preparation, but that the preparation was
not intense. The majority of them prepared for no more than half a semester. Most
TEM4 candidates (62.2%) and just under half of the TEM8 candidates (48.6%) spent
no more than 2 h every day on TEM preparation. They worked on mock test
papers, but mostly no more than 20 copies, and very few of them took any
extracurricular training courses to prepare for TEM.
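The cumulative shares behind statements such as “no more than half a semester” can be checked directly against Table 5.44. The following is a minimal sketch: the percentages are transcribed from the table, while the names `TABLE_5_44`, `CUTOFFS` and `cumulative_share`, and the choice of cut-off bands, are assumptions introduced here for illustration only.

```python
# Percentages transcribed from Table 5.44, ordered from least to most preparation.
TABLE_5_44 = {
    "length of preparation": {
        "TEM4": [7.5, 43.2, 38.0, 8.1, 3.3],
        "TEM8": [7.6, 61.1, 21.9, 6.3, 3.0],
    },
    "daily preparation time": {
        "TEM4": [7.6, 22.5, 32.1, 22.2, 15.6],
        "TEM8": [9.7, 18.0, 20.9, 20.8, 30.6],
    },
    "mock tests": {
        "TEM4": [4.7, 20.3, 41.7, 26.9, 6.3],
        "TEM8": [7.6, 26.3, 43.4, 18.5, 4.1],
    },
}

# Assumed cut-offs (number of leading bands to add up), chosen to match the text:
# up to "half a semester", up to "1-2 h", and up to "10-20 copies".
CUTOFFS = {
    "length of preparation": 3,
    "daily preparation time": 3,
    "mock tests": 4,
}


def cumulative_share(percentages, bands):
    """Add up the first `bands` response bands, rounded to one decimal place."""
    return round(sum(percentages[:bands]), 1)


summary = {
    item: {test: cumulative_share(values, CUTOFFS[item]) for test, values in by_test.items()}
    for item, by_test in TABLE_5_44.items()
}

for item, shares in summary.items():
    print(f"{item}: TEM4 {shares['TEM4']}%, TEM8 {shares['TEM8']}%")
```

Under these cut-offs, the share of TEM8 candidates preparing for no more than 2 h a day comes to 48.6%, just under half, whereas the other five cumulative shares are all above 60%.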
Next, the students were surveyed on the specific activities that they were engaged
in while preparing for TEM, and on whether differences existed between the TEM
preparation and non-preparation stages.
Table 5.45 Learning activities conducted for TEM preparation after class

                                                                         TEM4                   TEM8
TEM listening tasks                                                      62.3%  24.6%  13.1%    54.4%  28.9%  16.7%
Listening to the radio                                                   43.3%  28.9%  27.8%    25.8%  29.1%  45.1%
TEM4 dictation/TEM8 mini-lecture                                         61.9%  24.9%  16.0%    41.1%  30.3%  28.6%
TEM reading tasks                                                        52.1%  28.5%  19.5%    42.2%  29.0%  18.9%
Reading English books                                                    23.3%  26.4%  50.3%    24.9%  31.0%  44.0%
TEM4 grammar and vocabulary/TEM8 general knowledge                       54.6%  25.5%  19.9%    53.3%  26.8%  19.9%
Reading grammar books/Reading books on linguistics, literature, culture  31.4%  30.6%  38.0%    35.4%  30.7%  33.9%
TEM4 cloze/TEM8 translation                                              51.6%  29.3%  19.1%    36.5%  33.7%  29.8%
TEM4 note writing/TEM8 proofreading                                      27.7%  29.5%  42.8%    50.8%  27.9%  21.3%
TEM writing tasks                                                        30.8%  30.8%  38.4%    24.3%  29.9%  45.7%
Writing in English                                                       17.3%  20.6%  62.2%    15.6%  22.4%  62.0%
Oral English practice                                                    25.0%  27.5%  47.4%    20.0%  25.5%  54.4%
Reciting new words                                                       52.2%  26.5%  22.3%    34.5%  29.6%  35.8%
As indicated in Table 5.45, students were engaged in various learning activities
while preparing for TEM. However, TEM-related learning activities, such as TEM
reading and TEM listening, seemed to be conducted more often. Moreover,
objective items were more frequently practiced than subjective items. For example,
the multiple choice questions in the listening, reading, grammar & vocabulary, and
general knowledge parts were claimed to be done more frequently than translation and
writing.
Table 5.46 shows how candidates’ learning activities in the TEM preparation
stage differed from those in the non-preparation stage. It seems that the majority of
the students worked harder before TEM, spending more time on most of the learn-
ing activities, especially those related to TEM, such as TEM listening, TEM read-
ing, TEM grammar and vocabulary, and TEM general knowledge. It is worth noting
that speaking, a skill not measured for most candidates, was practiced less by around
one third of the candidates, the highest rate among all the activities for both TEM4
and TEM8 (31.0% of the TEM4 candidates and 30.1% of the TEM8 candidates
selected “less”, claiming that this skill was less practiced in the preparation stage).
Table 5.46 Difference between TEM preparation and non-preparation stages

                                                                         TEM4                   TEM8
TEM listening tasks                                                      75.5%  16.7%  7.8%     71.0%  20.8%  8.2%
Listening to the radio                                                   42.0%  39.4%  18.5%    40.8%  43.2%  15.9%
TEM4 dictation/TEM8 mini-lecture                                         73.8%  18.3%  7.9%     59.7%  28.4%  11.8%
TEM reading tasks                                                        68.1%  21.2%  10.7%    65.5%  23.6%  10.9%
Reading English books                                                    33.3%  37.0%  29.7%    36.4%  39.2%  24.3%
TEM4 grammar and vocabulary/TEM8 general knowledge                       66.0%  21.5%  12.5%    66.3%  22.8%  10.9%
Reading grammar books/Reading books on linguistics, literature, culture  51.8%  31.5%  16.7%    49.1%  32.5%  18.2%
TEM4 cloze/TEM8 translation                                              68.5%  20.9%  10.5%    57.2%  29.7%  13.1%
TEM4 note writing/TEM8 proofreading                                      52.9%  30.8%  16.2%    66.3%  22.6%  11.0%
TEM writing tasks                                                        55.4%  30.0%  14.5%    47.4%  35.5%  17.0%
Writing in English                                                       29.9%  41.1%  29.1%    29.4%  45.6%  25.0%
Oral English practice                                                    32.2%  36.8%  31.0%    30.1%  39.8%  30.1%
Reciting new words                                                       66.1%  21.8%  12.1%    50.4%  30.7%  18.9%
5.4.5 Learning Effects
From the above findings, we may come to the conclusion that TEM would motivate
students to learn—they would spend more time learning and their learning activities
seemed to be intensified, both in and after class. However, would extra efforts lead
to better learning effects?
As shown in Table 5.47, on the whole, TEM has helped candidates improve
their English proficiency. Vocabulary showed some of the largest gains, with
80.0% of the TEM4 candidates and 74.2% of the TEM8 candidates claiming that TEM
has helped enlarge their vocabulary. Moreover, various English skills of the candidates
have been strengthened, especially those measured in TEM, such as listening, reading
and writing. Comparatively, skills not covered in the TEM written test seemed to
improve less, such as speaking, with only around 40% of the candidates claiming that
TEM has helped them improve their speaking proficiency. One possible reason is that
only 5–10% of the TEM written test candidates would have the opportunity to take
TEM oral tests. Another item that received a low “strengthen” rate is “techniques in
answering multiple choice questions”: only around 50% of the candidates claimed
that TEM has helped improve their ability to figure out the correct answer to
multiple choice questions. This is desirable because the purpose of TEM preparation is
to help candidates improve their language proficiency, rather than their test-taking
techniques.
Table 5.47 Washback of TEM on learning effects

                                         TEM4                       TEM8
                                         Strengthen  Same   Weaken  Strengthen  Same   Weaken
Vocabulary                               80.0%       14.6%  5.4%    74.2%       21.0%  4.8%
Grammar                                  76.1%       18.3%  5.6%    57.0%       35.9%  7.0%
Listening                                73.7%       19.8%  6.4%    68.6%       25.3%  6.1%
Reading                                  69.3%       23.8%  7.0%    67.4%       26.3%  6.4%
Writing                                  64.0%       27.5%  8.5%    56.0%       36.3%  7.6%
Speaking                                 41.2%       40.9%  17.9%   38.0%       46.1%  15.9%
Techniques in multiple choice questions  55.5%       34.1%  10.5%   48.9%       40.5%  10.6%
TEM4 dictation/TEM8 translation          73.5%       19.0%  7.5%    59.8%       33.6%  6.6%
General knowledge (TEM8)                                            75.9%       17.9%  6.2%
Note taking while listening (TEM8)                                  62.9%       28.8%  8.2%
Table 5.48 Progress in fundamental/advanced stage

                                                       Fundamental                Advanced
                                                       Strengthen  Same   Weaken  Strengthen  Same   Weaken
Listening                                              70.7%       14.1%  5.2%    74.7%       18.5%  6.8%
Reading                                                72.1%       19.7%  8.2%    71.9%       20.5%  9.2%
Writing                                                71.4%       20.1%  8.6%    65.8%       25.0%  9.2%
Speaking                                               65.8%       25.8%  8.3%    60.6%       26.6%  12.7%
Grammar (fundamental)/Translation (advanced)           69.6%       19.5%  11.0%   72.7%       20.0%  7.4%
Vocabulary (fundamental)/General knowledge (advanced)  78.8%       12.9%  8.4%    78.0%       14.9%  7.2%
Table 5.48 summarizes candidates’ evaluation of their learning effects in the
fundamental and advanced stages in college. Most students claimed that their
language proficiency had improved, but speaking proficiency, the skill not measured in
TEM for most candidates, seemed to improve less, which is consistent with the
findings in Table 5.47.
According to Alderson and Wall (1993), tests will have washback effects for some
learners, but not for others. In view of this, a further investigation was made into the
washback effects of TEM on various students, to identify possible factors that may
affect the washback of TEM (Table 5.49).
Table 5.49 TEM washback on students with different characteristics

           TEM4                              TEM8
           Positive  No washback  Negative   Positive  No washback  Negative
Male       78.3%     11.5%        10.2%      77.5%     12.7%        9.8%
Female     83.2%     8.0%         8.8%       83.0%     10.0%        7.0%
TEM washback on learning perceived by candidates of different genders (TEM4: χ² = 12.844,
p < .001; TEM8: χ² = 16.731, p < .001)

           TEM8
           Positive  No washback  Negative
Excellent  76.5%     15.5%        8.1%
Good       85.3%     8.8%         5.9%
Pass       83.6%     9.3%         7.1%
Fail       74.9%     14.7%        10.4%
TEM washback on learning perceived by candidates with different language proficiency (TEM8:
χ² = 42.110, p < .001)
Tests for several samples were conducted to identify possible differences between
male and female candidates, as well as among candidates with various language
proficiency levels. Male and female candidates differed significantly in their
perception of TEM washback on learning. It seems that male candidates were
affected less by TEM, with more of them selecting “no washback”, and female
candidates seemed to be more positive about the washback of TEM.
As for the difference between candidates with various language proficiency levels,
only TEM8 candidates were investigated, because most of them had taken
TEM4 by then and their TEM4 scores could be used to predict their current
language proficiency. From Table 5.49, it can be inferred that candidates with high
and low language proficiency (candidates who got Excellent and Fail in TEM4)
seemed to be affected less by TEM, while candidates in the middle seemed to take
a more positive attitude towards TEM washback. This finding is in line with Green’s
model (2007), which proposes that a difficult but attainable test would exert the
most intense washback.
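The statistics reported under Table 5.49 compare response distributions across groups. The source gives percentages only, not cell counts, and does not spell out the exact procedure, so the following is only a sketch of one common approach, Pearson's chi-square test of independence. The function name `pearson_chi_square` and the counts are hypothetical and are not taken from the study.

```python
def pearson_chi_square(table):
    """Return (statistic, degrees of freedom) for a contingency table of counts.

    `table` is a list of rows; each row holds the observed counts for one group.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if group and response were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


# HYPOTHETICAL counts, for illustration only (not data from the study).
# Rows: male, female. Columns: positive, no washback, negative.
hypothetical_counts = [
    [78, 12, 10],
    [166, 16, 18],
]

statistic, dof = pearson_chi_square(hypothetical_counts)

# Critical value of the chi-square distribution at the 0.05 level with 2 df.
CRITICAL_VALUE_DF2 = 5.991
print(f"chi-square = {statistic:.3f}, df = {dof}")
print("significant at 0.05" if statistic > CRITICAL_VALUE_DF2 else "not significant at 0.05")
```

With real data, the observed counts for each group would replace the hypothetical table, and the statistic would be compared with the critical value for the corresponding degrees of freedom.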
5.4.7 Factors Influencing Learning
From the above analysis, it can be inferred that TEM has exerted some washback on
the learners, and the positive washback seemed to outweigh the negative washback.
However, is TEM an important factor that may determine their learning effects?
From Table 5.50, it can be concluded that the role TEM played in determining
candidates’ learning effects was very limited. Both TEM4 and TEM8 candidates
believed that their learning effects were affected mainly by their own characteristics
and the teachers’ quality. This is consistent with the view of program administrators and
teachers, which further supports the view that TEM plays a facilitative rather than
decisive role in teaching and learning.
Table 5.50 Factors affecting English learning

Factor             TEM4   TEM8
Interest           68.2%  64.2%
Attitude           72.0%  68.6%
Efforts            65.9%  63.0%
Teacher quality    64.6%  56.4%
Course design      47.0%  42.2%
Teaching facility  33.7%  26.3%
Textbook used      35.3%  28.7%
Class size         19.2%  16.1%
TEM                5.7%   4.4%
Other              0.3%   0.1%
Table 5.51 Major aims for learning

Aim                            TEM4   TEM8
Improve one’s quality          56.4%  56.9%
Interest                       48.4%  46.4%
Prepare for further education  61.1%  61.8%
Obtain credits                 15.5%  11.3%
Pass TEM                       37.9%  32.6%
No clear aim                   5.1%   5.1%
Other                          1.1%   0.4%
As a result, TEM was not considered a major learning purpose by most candi-
dates, as shown in Table 5.51.
As indicated in Table 5.51, candidates’ major aim for learning is to prepare for
their future career and further education. From this point of view, TEM is important
to some extent, because many candidates believed that TEM results would affect
their future career and education (see Table 5.37), which may account for the fact
that around one third of the candidates considered passing TEM one of their aca-
demic aims.
5.5 Triangulation
A major weakness of questionnaire surveys is that they are subjective: the
participants’ responses may be based on their own perceptions and interpretations.
Therefore, it is necessary to triangulate questionnaire data to ensure their validity. To
this end, data triangulation and methodological triangulation have been conducted
to verify the responses of experts, program administrators, teachers and students,
and to corroborate data collected through various methods, such as closed questions
and open-ended questions in the questionnaire, as well as interviews.
5.5.1 Data Triangulation
Questionnaire surveys have been conducted among participants at various levels:
experts at the national level, program administrators at the school level,
and teachers and students at the classroom level. Therefore, a comparison is
made between their responses to the same questions.
Program administrators, teachers and students were all surveyed on their familiarity
with TEM (see Tables 5.2, 5.17, and 5.34), and the following findings can be sum-
marized, based on a comparison between their responses. First, program adminis-
trators were more familiar with TEM than teachers, and teachers had a better
knowledge than students. Second, program administrators and teachers were more
concerned about the Teaching Syllabus and the Test Syllabus, while students were
more familiar with the structure and testing procedures of TEM. Third, compared
with TEM4, TEM8 was less familiar to program administrators and teachers, on
such items as the Test Syllabus, marking criteria and time allotment. However, stu-
dents of TEM4 and TEM8 showed little difference in their familiarity with the test.
Last, the Oral Test Syllabus and the marking criteria were less familiar as compared
with other items, especially to program administrators and teachers; while students’
response showed that many of them were not so familiar with the Teaching Syllabus.
These findings were consistent with our expectation. As school curriculum
designers, program administrators should have a better knowledge of the Teaching
Syllabus and TEM, while students should be more concerned about the practice of
the test than how teaching should be conducted in classrooms.
Moreover, the Oral Test Syllabus and the marking criteria were less familiar not
only to students, but also to teachers and program administrators, which seems to
suggest that information had been inadequately provided and thus more communica-
tion between test constructors and test users was necessary. Furthermore, the TEM-
Oral test should be popularized, so that more importance would be attached to it.
5.5.1.2 Evaluation of TEM
All the participants were surveyed on the quality of TEM, but different questions
were asked based on their respective characteristics. A comparison between
their responses may lead to the following conclusions (see Tables 5.3, 5.18, and
5.35).
First, generally speaking, participants’ opinions towards TEM were fairly posi-
tive, and program administrators seemed to be more satisfied with TEM than teach-
ers, and teachers more satisfied than students.
Second, compared with other items, the test report of TEM received a low satis-
faction rate, both from program administrators and from teachers. Moreover, all
participants agreed that more information should be provided in TEM test reports,
especially the section scores which were considered necessary by most participants
(see Tables 5.4, 5.19, and 5.36). This consensus of opinion seems to suggest that
adequate reform should be adopted to modify the TEM test report, to provide more
information so that TEM can better play its facilitative role in teaching and
learning.
Last, compared with TEM4, test administration for TEM8 was not as satisfac-
tory, as unanimously agreed by program administrators and teachers. Therefore, it
may indicate that the invigilation procedure of TEM8 should be simplified, so that
it can be better followed by both teachers and program administrators.
According to the experts who designed the Teaching Syllabus and the TEM
Test Syllabus, the intended uses of TEM scores were those listed in the Test Syllabus,
while other uses, such as conferring degrees, evaluating teachers and ranking
schools, can hardly be justified or should not be promoted.
However, such unintended uses, especially those of TEM4 scores, were sup-
ported by around half of the program administrators and teachers and over 40% of
the students (see Tables 5.8, 5.23 and 5.37). This seems to indicate that test users
had not been adequately informed of the purpose of TEM, which would pose poten-
tial hazards in misuse of test scores.
Another point for triangulation is the actual use of TEM scores in different
schools. Program administrators, teachers and students were surveyed on how TEM
scores were used in their schools, and it was noted that their responses to the same
question varied to some extent. As to whether TEM scores were used to confer
bachelor’s degrees, the responses of program administrators and teachers
were quite similar, while students’ responses showed a marked difference, especially
for TEM4 scores. Around 30% of the program administrators and teachers claimed
that TEM4 scores were used to confer bachelor’s degrees in their schools,
while around 60% of the students believed that TEM4 scores were used for this
purpose (see Tables 5.7, 5.22 and 5.37). Several reasons may account for this mis-
match. First, students were not well informed of how TEM4 scores were used, so
they took it for granted that the result of such an important test should be related to
degree conferment. Second, the teachers may deliberately tell students that TEM4
scores would be used in degree conferment, so that students would attach more
importance to the test and work harder. Third, program administrators’ and teachers’
questionnaires were distributed to all TEM participating institutions, while the
students’ questionnaire was administered only in the 250 selected schools, so there might be
some variance caused by sampling. Last, the subjective nature of the questionnaire
means that there might be misunderstanding or misinterpretation, which
may also lead to errors in data collection and analysis. In short, the mismatch
indicates that participants’ perceptions of the same question may vary, so
it is important to triangulate data and to employ various research methods.
5.5.1.4 Introduction of TEM Preparation Courses
An investigation was made to find out whether TEM preparation courses were intro-
duced and how they were conducted. The questions were slightly different among
different target groups: program administrators were expected to describe how TEM
preparation courses were introduced in their school curriculum; teachers were to
describe what kind of courses they had taught to prepare students for TEM; and
students were asked about the TEM preparation courses they had taken (see Tables
5.13, 5.26 and 5.40). Their responses varied in the following aspects.
First, the percentage of program administrators selecting “no preparation” is
much lower than that of the teachers and students. This is quite reasonable, since not
all English teachers would be assigned to teach TEM preparation courses and not all
students would take TEM preparation classes, especially when they are delivered
in the form of lectures.
Second, compared with program administrators, fewer teachers and students
selected “several lectures”, which might be attributed to the same reason listed
above.
Third, the percentage of program administrators and students selecting “overall
training” was quite similar, while fewer teachers selected it. This seemed to indicate
that most students would take the TEM preparation course which offered them over-
all training. However, as for teachers, only a few of them would, perhaps, have the
chance to teach this course.
Last, the number of teachers and students selecting “skill training” and “vocabu-
lary and grammar training” or “general knowledge training” outweighs that of the
program administrators. This probably suggests that TEM preparation activities
were conducted in courses that were not intended for TEM preparation. For example,
in a listening class, the teacher may train the students for TEM, employing
certain TEM tasks, even though it is a normal listening course rather than a TEM
preparation course. In this sense, TEM may have washback not only on TEM
preparation classes, but also on non-preparation classes.
5.5.1.5 Activities Conducted in TEM Preparation Classrooms
Both teachers and students were surveyed on how TEM preparation classes were
conducted and how they differed from non-preparation classes. Since the teachers
and students were not from the same class, it is natural that their responses may vary
to some extent. However, some common messages can be reinforced by their
responses (see Tables 5.27, 5.28, 5.41 and 5.42).
On the one hand, various activities were conducted in TEM preparation classes,
rather than just focusing on test papers. Activities based on textbooks or aiming at
skill training were still conducted, and it seems that they took up as much time as
TEM preparation activities. This is a desirable finding, since TEM preparation
should, by no means, become the sole purpose of classroom teaching.
On the other hand, TEM preparation classes seemed to be more intensive than
non-preparation classes, with teachers and students spending more or the same
amount of time on most activities. The activities were not only working on test
papers, but also skill training tasks, while the time spent on textbooks was not
changed much. This indicates that TEM preparation activities were integrated with
other classroom tasks.
5.5.1.6 Factors Influencing Teaching and Learning
As for the factors influencing teaching and learning, the responses of experts, pro-
gram administrators, teachers and students varied to some extent, because of their
different status. For experts, the important factors were curriculum design, teacher
quality and student effort; program administrators considered teacher quality, stu-
dent effort and quality and leader attention the most important factors; teachers
rated student quality and effort, teacher income and leader attention the most impor-
tant; and to students, their interest, attitude, effort and teacher quality were the most
important factors (see Tables 5.15, 5.31 and 5.50).
From the above analysis, it could be concluded that the effect of teaching was
mainly determined by teachers and students, and the curriculum would also contrib-
ute to it. However, the role TEM played seemed to be quite limited, as unanimously
agreed by the participants.
5.5.2 Methodological Triangulation
To help explain the quantitative data collected in the questionnaire surveys, an open-
ended question was asked in each questionnaire and supplementary interviews were
conducted among relevant participants, to collect qualitative data to corroborate
research findings.
5.5.2.1 Closed vs. Open-Ended Questionnaire Responses
In each questionnaire, an open-ended question was asked to seek participants’
general comments on TEM and its washback. Their responses were collected,
summarized and saved in a Microsoft Word file. It has been noted that most participants
held a positive attitude towards TEM and its washback, which is in line with the
results yielded from closed questions.
Besides, some participants raised problems that they had encountered in TEM
preparation and administration, which accounted for some of the closed question
findings. Some of the important findings are summarized as follows.
First, some program administrators complained that none of the teachers in their
schools had participated in TEM marking, so they had little knowledge of the mark-
ing criteria. This may account for the low familiarity rate with the marking criteria,
not only for program administrators, but also for teachers and students (see Tables
5.2, 5.17 and 5.34).
Second, some program administrators and teachers considered TEM8 adminis-
tration too complicated, especially the listening task of “mini-lecture”. This is an
integrative task combining note taking while listening with gap filling, and the
invigilators’ manual has stipulated a series of steps to be followed in test administra-
tion, which some program administrators and teachers said were “quite compli-
cated”. This is a possible reason for the low satisfaction rate for the item of
“convenient administration” (Tables 5.3 and 5.18).
Third, TEM8 was considered “too difficult” by quite a few program administra-
tors, teachers and students, which may account for the fact that TEM8 seemed to
exert less washback than TEM4 (see Tables 5.5, 5.9, 5.20, 5.24, 5.38 and 5.39). As
Green pointed out in his washback model (2007), the intensity of washback is
determined by the importance and difficulty level of the test, with a difficult but still
attainable test exerting the most intense washback. In this sense, the washback of
TEM8 is less intense than that of TEM4 because TEM8 is unattainable to some test
takers. Meanwhile, this may also account for the research findings that TEM4
results were claimed to be more often related to degree conferment, teacher evalua-
tion and school ranking (see Tables 5.6, 5.7, 5.21, 5.22, and 5.37). Moreover, TEM4
preparation courses seem to be more widely introduced (Tables 5.13, 5.26, and
5.40).
Apart from the above issues, participants also made quite a few suggestions on
the reform of TEM and teaching, which will be summarized in Chap. 7.
5.5.2.2 Questionnaire vs. Interview Findings
In the current research, interviews were conducted among program administrators,
teachers and students, to find out their general comments on TEM and its washback,
to verify questionnaire findings, and to resolve puzzles that emerged from
questionnaire findings.
Interview participants’ attitudes towards TEM seemed to vary greatly, some
quite positive and some negative. However, generally speaking, positive opinions
outweigh negative ones. Meanwhile, program administrators seemed to be more
positive than teachers, and TEM4 candidates seemed to be more positive than TEM8
candidates. This is in line with questionnaire findings.
Nevertheless, some participants pointed out possible disadvantages of TEM,
problems they had encountered concerning TEM, as well as the negative washback
of TEM on their teaching and learning. These remarks have provided answers to
some of the issues that emerged from the questionnaire surveys, and in a way, cor-
roborated questionnaire findings. The major interview findings are summarized
below.
First, some participants were not familiar with TEM marking criteria. In the
interviews, not only students but also teachers and program administrators claimed
that they were not clear as to how TEM subjective items were marked. It seems that
teachers and program administrators all had access to the TEM marking criteria, but
they still claimed to be unfamiliar with them because according to some of them, “it
is far from enough just to read the marking criteria, if you don’t know how they are
operationalized”. Therefore, just to offer a written copy of the marking criteria is far
from enough, and to familiarize participants with the marking criteria involves
much more work: to introduce the marking procedures, to provide sample scripts,
and so on. As for students, they were not interested in the marking criteria because,
according to some of them, “anyway, it is impossible for me to mark my own essay
since I can hardly identify my own mistakes”, so “it is the teacher’s business to mark
compositions and translation”. This may, to some extent, also explain the
questionnaire finding that students practiced writing and translation less after class, since
they could have no feedback as to whether it was well done or not.
Second, interview results showed that some of the students considered TEM
preparation classes more condensed and fruitful, and some teachers also claimed
that they were often in a rush in TEM preparation classes. This may account for the
questionnaire finding that most activities were intensive in TEM preparation classes.
Last, interview findings seem to suggest that many participants, both teachers
and students, considered TEM8 mini-lecture the most difficult part, so relevant
preparation had to be made, thus leading to the questionnaire finding that TEM8 had
exerted more influence on the course design and teaching of listening.
In addition, participants offered many suggestions on the reform of TEM, and on
teaching and learning, which shall be discussed in Chap. 7.
5.6 Summary
This chapter has reported research findings from four questionnaire surveys among
experts, program administrators, teachers and students. These large-scale surveys
aimed at eliciting participants’ perceptions of TEM and its washback, which can be
summarized as follows.
First, TEM enjoyed a high familiarity rate with the participants: the program
administrators and teachers were familiar with the current national Teaching
Syllabus and the TEM Test Syllabus, and the students were familiar with TEM test
content and testing procedures. However, they were less familiar with the TEM-Oral
test and the marking criteria.
Second, participants’ evaluation of TEM was more positive than negative, and
the program administrators seemed to hold a more favorable opinion than teachers,
and teachers more positive than students. The test content and method of TEM
received a higher satisfaction rate than its administration, and the test report seemed
to be the least satisfactory item.
Third, the washback of TEM was considered to be more positive than negative,
and it has achieved most of the purposes specified in the Test Syllabus.
Fourth, decisions based on TEM were mostly as intended, but there were also
unintended uses such as ranking schools, evaluating teachers, conferring degrees,
and recruiting employees. Moreover, participants’ attitudes towards these unin-
tended uses varied, with around half holding a supportive view.
Fifth, TEM preparation was conducted in most institutions, but half in the form
of lectures. Moreover, most TEM preparation courses would last for no more than
half a semester, which indicates that TEM preparation was by no means intensive.
Sixth, various activities were carried out in TEM preparation classes, but
TEM-related activities seemed to be more frequently conducted. Besides, there seemed to
be a heavy teaching and learning load in TEM preparation classes.
Seventh, students spent more time learning English before TEM and TEM was
believed to help them improve their English proficiency, especially those skills
assessed in TEM.
Last, the important factors affecting teaching and learning were quite similar for
various participants, such as student effort and teacher quality. Moreover, it is unan-
imously agreed that the role TEM played was quite limited.
Apart from the above findings, open questions and interviews were employed to
collect qualitative data to corroborate questionnaire findings.
Though tentative to some extent, these findings have offered a relatively clear
profile of TEM and its washback, and will thus have implications for further research,
as well as the reform of TEM and teaching.
Chapter 6
Classroom Practice and the Washback of TEM
Based on the questionnaire findings presented in the previous chapter, a classroom
observation scheme was developed to analyze TEM preparation and non-preparation
classes, to discern possible differences which may be attributed to the washback of
TEM.
This chapter will report relevant research findings from the classroom observa-
tions, namely, the teaching content, organization, materials used, language used in
classrooms, the communicative features of teachers and students, feedback from
teachers, and the classroom atmosphere. Moreover, a comparison will be made
between TEM preparation and non-preparation classes, to identify variance caused
by TEM. Furthermore, comparisons will be made between classes delivered by dif-
ferent teachers, classes conducted among students of different language proficiency
levels, and classes from schools of different types, to distinguish other factors that
may contribute to the variance between different classes.
6.1 The Teachers Observed
Six teachers were selected for observation. They were all English teachers with
TEM preparation experience. Their educational background and teaching experi-
ences are listed in Table 6.1.
As shown in Table 6.1, of the six teachers observed, half held a Master’s degree and
the other half a PhD. Four of them were lecturers and the other two were associate
professors. They were all experienced teachers, each with between 5 and 15 years of
teaching experience.

https://doi.org/10.1007/978-981-13-1963-1_6
Table 6.1 Background information of the teachers observed

Teacher          A         B         C         D                    E                    F
Gender           Female    Male      Female    Female               Female               Female
Degree           Master    PhD       Master    PhD                  PhD                  Master
Title            Lecturer  Lecturer  Lecturer  Associate Professor  Associate Professor  Lecturer
Teaching years   10–15     5–10      5–10      10–15                10–15                5–10
6.2 The English Classes Observed
Altogether 52 classes were observed and each class was around 45 min long. It
would be ideal if TEM preparation and non-preparation courses could be delivered
by the same teacher, so that teacher variance could be limited. However, for
practical reasons, two of the six teachers observed had only TEM8 preparation
courses recorded. In the following semester, when they were supposed to teach
advanced English, one (Teacher E) was assigned to teach American Literature, so
Teacher E had no advanced English classes to be observed; the other (Teacher F)
was transferred from the ordinary university where she worked to a key university,
where she was assigned to teach fundamental English. Since fundamental English
was, in terms of teaching content, a comprehensive English course and thus
compatible with TEM8 preparation courses, her fundamental English classes were
also observed. As for the rest of the teachers observed, they all taught
compatible TEM preparation courses and non-preparation courses, that is, both
TEM4 preparation course and fundamental English, or both TEM8 preparation
course and advanced English. Table 6.2 lists the number of classes taught by them.
Moreover, classes from various schools were observed, to investigate not only
top universities, but also less established schools (Table 6.3).
Last, both TEM-preparation courses and non-preparation courses were surveyed.
Since TEM is a comprehensive test and the TEM preparation courses observed are
often comprehensive ones, equivalent non-preparation courses should also be com-
prehensive in that they deal with all language skills. Therefore, fundamental English
and advanced English courses were selected as the non-preparation courses to be
observed. Moreover, it is common practice for some schools to offer TEM prepara-
tion in fundamental English and advanced English courses in the semester when
TEM4 and TEM8 are held or in the previous semester. Therefore, the TEM
preparation classes observed here are of two types, one is in a separate course named
TEM Preparation Course, and the other is the TEM preparation classes delivered in
fundamental English or advanced English courses. In this sense, whether a class is
considered TEM preparation class or non-preparation class is determined by the
teaching content of that class. The number of TEM preparation and non-preparation
classes observed is listed in Table 6.4, together with the time when these courses
were offered.
Table 6.2 No. of classes taught by each teacher

Teacher               A    B    C    D    E    F
TEM4 preparation      7    0    4    0    0    0
Fundamental English   5    0    6    0    0    6
TEM8 preparation      0    6    0    1    3    6
Advanced English      0    5    0    3    0    0

Table 6.3 No. of classes observed from different schools

        Key university   Ordinary university   College
TEM4    18               0                     10
TEM8    7                17                    0
Note: Key universities are those listed in the “211 Project”, a project aiming at establishing 100 top
universities in China in the twenty-first century

Table 6.4 No. of TEM preparation and non-preparation classes observed

              TEM4 preparation          TEM4              TEM8 preparation          TEM8
              Separate   Fundamental    non-preparation   Separate   Advanced       non-preparation
              course     English                          course     English
First year    0          0              6                 0          0              0
Second year   0          11             11                0          0              0
Third year    0          0              0                 0          1              8
Fourth year   0          0              0                 15         0              0

From Table 6.4, it can be noted that TEM4 preparation was often included in
fundamental English courses for second-year students who were to take TEM4 at the
end of the fourth semester, while TEM8 preparation was mostly offered in a separate
course to fourth-year students who were to take TEM8 at the beginning of the eighth
semester. One TEM8 preparation class was offered to third-year students in an
advanced English course. Does this mean that TEM8 preparation ran for a whole
year? A follow-up interview with the teacher (Teacher D) showed that it did not.
TEM8 preparation classes were held in the third year simply because no advanced
English courses were to be offered in the fourth year, so TEM8 preparation had to
be brought forward.
6.3 Characteristics of English Classroom Practice
According to Alderson and Wall (1993), a test will influence teaching and learning,
such as the content and method, rate and sequence, degree and depth, and the atti-
tude towards teaching and learning. Therefore, in the observation, the classroom
activities were closely analyzed, to discern the content and method involved, as well
as the classroom atmosphere.
Table 6.5 Teaching content of all the classes observed

Minimum Maximum Mean Std. Deviation
General knowledge .00 .00 .00 .00
Reading .00 100.00 (6) 31.63 41.28
Listening .00 10.00 .38 1.67
Writing .00 20.00 .57 3.07
Speaking .00 70.00 2.88 10.81
Translation .00 90.00 3.46 13.95
Integrated skills .00 .00 .00 .00
Grammar & vocabulary .00 85.00 10.09 22.67
TEM listening .00 45.00 1.25 6.33
TEM dictation .00 10.00 .19 1.38
TEM mini-lecture .00 100.00 (1) 3.36 15.67
TEM vocabulary .00 100.00 (2) 8.07 23.74
TEM reading .00 100.00 (1) 6.53 22.10
TEM cloze .00 60.00 3.46 12.81
TEM writing .00 100.00 (5) 13.65 30.76
TEM general knowledge .00 100.00 (4) 7.88 26.86
TEM proofreading .00 100.00 (1) 2.88 14.39
TEM translation .00 80.00 3.65 15.69
6.3.1 Teaching Content
The teaching content of both the TEM preparation and non-preparation classes has
been analyzed, and it is noted that these classes covered a wide range, encompass-
ing various language knowledge and language skills, as well as all the test content
of TEM.
Table 6.5 summarizes the teaching content of all the classes observed. The
percentage of time devoted to each task was calculated and reported. For example, in a
45-min class, if the teacher spent 20 min explaining new words in the textbook, then
45 would be recorded in the row for grammar and vocabulary (20/45 = 0.44,
which is rounded up to 45%, with 5% being the minimum unit for calculation).
Accordingly, the numbers listed in the above table are the minimum and the maxi-
mum percentage of time devoted to each task of all the classes observed, as well as
the percentage of time devoted to each task on average. The number in brackets in
the column of “Maximum” is the number of classes devoted solely to a single task.
For example, reading is the only skill tackled in 6 classes. Moreover, while calculat-
ing the average percent of time spent on each task, TEM preparation and non-
preparation classes were calculated together because it is likely that some TEM
tasks may be dealt with in non-preparation classes, and vice versa.
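The coding rule described above can be sketched in a few lines of Python. This is only an illustration, assuming that percentages are rounded to the nearest multiple of 5 (the source states only that 5% is the minimum unit); the function name `coded_percentage` and its parameters are hypothetical and not part of the original observation scheme.

```python
import math


def coded_percentage(minutes_on_task, class_minutes=45, unit=5):
    """Share of class time spent on a task, rounded to the nearest `unit` percent.

    Assumption: nearest-multiple rounding with halves rounded up. The source
    only says that 5% is the minimum unit for calculation.
    """
    raw = minutes_on_task / class_minutes * 100
    return int(math.floor(raw / unit + 0.5)) * unit


# 20 min of vocabulary explanation in a 45-min class (the example in the text)
vocabulary_share = coded_percentage(20)
print(vocabulary_share)
```

Under this assumption, 20 min in a 45-min class is coded as 45, in agreement with the worked example in the text.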
In this sense, the following findings can be summarized from the above table.
Firstly, the classes observed had a wide coverage, which encompassed all the
tasks listed except two: general knowledge and integrated tasks. General knowledge
refers to the special knowledge for English majors, such as linguistics, English
literature and the culture of English-speaking countries. It was covered in TEM
preparation classes as a TEM task (TEM general knowledge), but was not handled in
non-preparation classes. This might be attributed to the fact that the
non-preparation classes observed were fundamental and advanced English courses,
which mainly dealt with intensive reading, while general knowledge would be taught
in separate courses such as linguistics, English literature (British or American), and
culture of English-speaking countries. Since these courses were available in most
schools surveyed (see Table 5.11), the teachers of fundamental and advanced English
would not spend time on them. However, according to the Teaching Syllabus, the
purpose of teaching and learning for English majors is not only to improve their
English proficiency, but also to broaden their horizon and enhance their cultural
awareness. As clearly stated in the course description of advanced English (NACFLT
2000), this is a comprehensive English course which aims at widening students’
knowledge scope, deepening their understanding of life and culture, and cultivating
their ability to appreciate literature. Therefore, it might be helpful if teachers could
spend some time introducing related cultural and literary knowledge in class.
Another type of task not covered is the integrated task. In the current observation
scheme, integrated tasks mainly refer to dictation and cloze tasks, which have their
equivalents in TEM. A possible reason why these tasks were not covered is that they
were quite time-consuming, so they were seldom done in class time.
Secondly, non-preparation classes covered a wide range of language knowledge
and language skills, but reading was by far the most frequently tackled skill. This
finding is in line with our expectation because the courses observed were fundamental
and advanced English, which mainly dealt with intensive reading skills. Besides
reading, knowledge of grammar and vocabulary also received much attention in
class, accounting for more than 10% of the class time on average. Less emphasis
was laid on the rest of the skills, such as listening, writing, speaking and translation,
which account for no more than 5% of the class time on average. Comparatively,
translation and speaking tasks were more often conducted, because teachers may
ask students to translate certain sentences in the textbook to check their understand-
ing, and speaking activities may involve various tasks, such as reading aloud, stu-
dent presentation, discussion and so on. However, listening tasks are quite difficult
to conduct, especially in classrooms not equipped with audio facilities, so they are
often done in a separate listening course. As for writing, this is a time-consuming
task, so it is seldom conducted within the class.
Thirdly, all TEM tasks were covered in the classes observed, and the average time
spent on each task seemed to be more evenly distributed than for the
non-preparation tasks. This is mainly because most TEM preparation classes
observed were comprehensive ones, which were supposed to cover all items of
TEM. Comparatively, however, the writing task received the most attention, with 5
classes devoted solely to it, and general knowledge seemed to be the other favored
topic for TEM preparation, while TEM listening seemed to be the least emphasized.
Several reasons may account for this finding. First, general knowledge, as mentioned
by Teacher F in the TEM preparation class, is comparatively easy among all TEM8
items, so it was believed that a brief preparation would help students score higher in
this part. Next, writing received more attention because it is an item included in both
TEM4 and TEM8, and it is common for teachers to explain its marking criteria and
comment on students’ writing assignments in class. Finally, TEM listening received
less attention not because it was unimportant, but because it was difficult to practice
in classrooms without audio facilities and, as the questionnaire findings indicate, some
schools had offered a separate advanced listening course to cope with this task.
Last, it is noted that activities conducted in some classes were quite monotonous,
covering one task only, and in this respect, TEM preparation classes seemed to be
more narrowly focused than non-preparation classes. Of the 27 TEM preparation
classes, 14 focused on one skill only, whereas only 6 of the 25 non-preparation
classes did so. This seems to suggest that non-preparation classes offered
a wider range of teaching content than TEM preparation classes.
In addition, a brief pre-observation interview was conducted with the teachers to
find out the teaching materials they would use in the classes observed.
Table 6.6 Teaching materials used in class

       Textbook   TEM test paper   TEM mock test   Other
Yes    55.8%      51.9%            51.9%           26.9%
No     44.2%      48.1%            48.1%           73.1%
Table 6.6 shows that various teaching materials had been used in the classes
observed, including not only textbooks and TEM test papers, but also TEM mock
tests and other materials such as the authentic materials from newspapers, novels,
and so on. This is a desirable trend since the various materials used may help stu-
dents broaden their horizon and extend their knowledge of English.
6.3.2 Teaching Methods
The following aspects have been observed while investigating the teaching methods
employed, namely, the activities organized, the use of English by both teachers and
students, and the features of teacher-student interaction.
6.3.2.1 Classroom Activities Organized
Classroom observation shows that various activities had been organized, such as
teacher lecture, student presentation, individual seat work, teacher-student
interaction, and so on. The percentage of time devoted to each activity was counted
and is reported in Table 6.7. For example, in a 45-min class, if a student spent
10 min making a presentation, then 20 would be recorded in the row for student
presentation (10/45 = 0.22, which is rounded down to 20%, with 5% being the minimum
calculation unit). Moreover, in the column of “Maximum”, the number of classes
adopting a single activity is reported in brackets, so 4 in “teacher lecture” means
that 4 classes employed only one type of activity: teacher lecture.
Table 6.7 Classroom activities organized in all the classes observed

Minimum Maximum Mean Std. Deviation
Teacher lecture 10.00 100.00 (4) 76.92 19.97
Student presentation .00 50.00 4.61 10.37
Individual seat work .00 70.00 5.19 12.16
Pair work .00 .00 .00 .00
Group work .00 55.00 1.17 7.71
Teacher-fronted interaction .00 50.00 3.03 8.72
Teacher-student interaction .00 50.00 9.42 8.94
Table 6.8 English used in classrooms observed

>80% 60–80% 40–60% 20–40% <20% Mean
Teachers’ use of English 38.5% 7.7% 13.5% 34.6% 5.8% 3.38
Students’ use of English 22.9% 20.8% 4.2% 25.0% 27.1% 2.87
From Table 6.7, it can be inferred that teacher lecture was the dominant activity
in classrooms, which took up more than three fourths of the class time. Moreover,
there were 4 classes in which teacher lecture was the only type of activity con-
ducted. Among the other activities, teacher-student interaction was the most fre-
quent one, which accounted for around 10% of the class time. The above findings
seem to indicate that teachers still played a dominant role in the classroom and the
time for students to perform was quite limited, so the classrooms were still teacher-
centered, rather than student-centered.
6.3.2.2 English Used in Classrooms
Both teachers and students were observed on how frequently they spoke the target
language, English. A five-point Likert scale was employed to measure the
frequency: 5 was recorded if they spoke English for more than 80% of the time,
4 for 60 to 80%, and so on down to 1 for less than 20%.
From Table 6.8, it can be noted that teachers used English more frequently than
students, but both teachers and students varied vastly among themselves in the use
of English. On average, teachers used English a little more often than not, while
students were just the opposite. First language can play a facilitative role in second
language classrooms; however, teachers’ excessive use of the first language will
deprive students of the opportunity to get enough English input, and students will
lose the opportunity to practice English if they overuse their first language in the
classrooms. According to Macaro (2005), 10–15% constituted an acceptable thresh-
old for the use of the first language.
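The means in Table 6.8 can be recovered from the band percentages as a weighted average of the Likert codes. The sketch below is only a consistency check, not the authors' own computation; the helper name `likert_mean` is hypothetical, and dividing by the sum of the shares is an assumption made to absorb rounding in the published percentages.

```python
# Likert codes for the bands >80%, 60-80%, 40-60%, 20-40%, <20%
SCORES = [5, 4, 3, 2, 1]

# Percentage of classes falling in each band, copied from Table 6.8
teacher_shares = [38.5, 7.7, 13.5, 34.6, 5.8]
student_shares = [22.9, 20.8, 4.2, 25.0, 27.1]


def likert_mean(shares):
    """Weighted mean of the Likert codes, normalised by the total share."""
    weighted_sum = sum(score * share for score, share in zip(SCORES, shares))
    return weighted_sum / sum(shares)


teacher_mean = likert_mean(teacher_shares)
student_mean = likert_mean(student_shares)
print(f"teachers: {teacher_mean:.2f}, students: {student_mean:.2f}")
```

The results should land close to the reported means of 3.38 for teachers and 2.87 for students, which places teachers slightly above and students slightly below the scale midpoint of 3.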
Table 6.9 Features of teacher-student interaction

Authenticity of questions and responses
Authentic Neutral Pseudo
Teachers’ question 4.2% 14.6% 81.2%
Teachers’ response 33.3% 12.5% 54.2%
Students’ question 66.7% 8.3% 25.0%
Students’ response 4.2% 52.1% 43.8%
Length of questions and responses
Word Phrase Sentence Sentences Paragraph
Teachers’ question 0 0 31.3% 56.3% 12.5%
Teachers’ response 0 0 54.2% 6.3% 39.6%
Students’ question 0 27.1% 72.9% 0 0
Students’ response 4.2% 27.1% 58.3% 10.4% 0
Teachers’ feedback
Never Seldom Sometimes Frequent Always
Positive feedback 0 0 87.0% 13.0% 0
Negative feedback 0 13.0% 87.0% 0 0
Feedback on meaning 0 20.8% 56.3% 0 22.9%
Feedback on form 0 8.3% 66.7% 2.1% 22.9%
6.3.2.3 Features of Teacher-Student Interaction
Teacher-student interaction took up around 10% of the class time, so further inves-
tigation was made to discern the communicative features of their interaction,
namely, the authenticity of the questions asked, the length of questions and
responses, as well as the way teachers offered feedback (Table 6.9).
The authenticity of questions and responses is mainly determined by the unpre-
dictability of the information provided. An authentic question seeks relatively
unpredictable information, that is, “information not easily anticipated in that a wider
range of information can be given”, and its corresponding response would also be
considered authentic (Watanabe 1997: 303). In this sense, it would be considered an
authentic question if the teacher asks, “How do you understand …?”, while “What
is the meaning of arbitrary?” would be considered pseudo. The frequency of authen-
tic questions asked and responses given would be calculated subsequently. The
teacher’s questions would be coded as “authentic” if authentic questions were asked
frequently; while “neutral” means around half of the questions asked were authentic
and “pseudo” means most questions asked were not authentic. As for the length of
questions and responses, a five-point scale was employed, with 1 referring to
a word; 2, a phrase; 3, a single sentence; 4, 2–3 sentences; and 5, more than 3
sentences. Teachers’ feedback was also measured on a five-point Likert scale, with
1–5 referring to the different frequency bands listed in Table 6.9.
The following findings can be summarized based on Table 6.9.
First, in terms of authenticity, classroom communication was not quite authentic,
especially for teachers’ questions. It was noted that most questions asked by teachers
were about the meaning of a certain word or sentence, or the answer to a certain
exercise question, the aim of which was just to check whether students’
understanding was consistent with the key, instead of seeking any unknown
information. It seems that teachers preferred to ask these closed display questions,
that is, questions with one possible answer that the teacher already knows. In this
way, teachers may avoid unpredictability in students’ responses, so that they can
have better control of the class (Smith and Higgins 2006). However, students’
questions were more authentic, usually about the puzzles that they had encountered
while learning.
Second, teachers’ questions and responses were often in a sentence or a cluster
of sentences while students’ questions and responses were much shorter, often in
the form of phrases or a single sentence. This might be caused by the students’
limited language ability, and another possible reason is that the questions asked by
teachers were often closed display ones, which “elicit short, mechanical responses”
(Ho 2005: 298). In this sense, it seems that more efforts should be made to
encourage students to say more, so that they would have more opportunities to
practice their English.
Last, as to teachers’ feedback, it seems that teachers gave feedback sometimes,
and their feedback was more positive than negative. Moreover, their feedback
seemed to focus more on form than meaning.
6.3.2.4 Classroom Atmosphere
The frequency of laughter is often counted as an indicator of the classroom
atmosphere (Watanabe 1997; Gu 2007), and it is noted that students did not laugh a
lot in classrooms. As shown in Table 6.10, in most classes (34 of 52), students did
not laugh even once.
Table 6.10 Frequency of students’ laughter

Frequency of laughter   No. of classes observed   Percent
0                       34                        65.4%
1                       7                         13.5%
2                       5                         9.6%
3                       5                         9.6%
4                       1                         1.9%
6.4 TEM Preparation Classes vs. Non-preparation Classes
A comparison has been made between TEM preparation classes and non-preparation
classes, and it is noted that differences exist in the following aspects.
Table 6.11 No. of skills (knowledge) covered in each class

No. of skills         1       2       3       4      5
TEM4 preparation      45.5%   27.3%   18.2%   9.1%   0
TEM8 preparation      56.3%   12.5%   6.3%    0      25%
Fundamental English   35.3%   52.9%   5.9%    5.9%   0
Advanced English      0       50.0%   50.0%   0      0
6.4.1 Teaching Content
It is natural that the teaching content of TEM preparation classes would be different
from that of non-preparation classes, since the former would be more TEM-oriented,
focusing on the test content of TEM. In addition, TEM preparation classes seemed to
be either more focused or more diverse.
From Table 6.11, it can be noted that around half of the TEM4 and TEM8 prepa-
ration classes focused on one skill only, while most non-preparation classes covered
two to three skills. This seems to suggest that TEM preparation classes were more
focused. However, 25% of the TEM8 preparation classes covered five skills, sur-
passing all non-preparation classes, which, on the other hand, seems to suggest that
TEM preparation classes were more diverse. A close observation of the TEM prepa-
ration classes shows that it was because TEM preparation classes were mainly of
two types, one was organized by section, so each class focused on one section of
TEM, and the other was comprehensive preparation usually in the form of test paper
analysis, so the teacher would try to cover all the skills measured in TEM.
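As a rough consistency check, the percentages in the first column of Table 6.11 can be converted back into class counts. The sketch below assumes that each percentage is a share of the number of classes of that type (11, 16, 17 and 8, as given by N in Table 6.12); the helper `to_count` is hypothetical and introduced only for illustration.

```python
# Number of classes of each type (N in Table 6.12)
n_classes = {
    "TEM4 preparation": 11,
    "TEM8 preparation": 16,
    "Fundamental English": 17,
    "Advanced English": 8,
}

# Share of classes covering exactly one skill (first column of Table 6.11)
single_skill_pct = {
    "TEM4 preparation": 45.5,
    "TEM8 preparation": 56.3,
    "Fundamental English": 35.3,
    "Advanced English": 0.0,
}


def to_count(pct, n):
    """Convert a published percentage back to a whole number of classes."""
    return round(pct / 100 * n)


single = {name: to_count(single_skill_pct[name], n) for name, n in n_classes.items()}
prep_single = single["TEM4 preparation"] + single["TEM8 preparation"]
nonprep_single = single["Fundamental English"] + single["Advanced English"]
print(prep_single, nonprep_single)
```

Under this assumption the counts agree with the figures given in Sect. 6.3.1: 14 of the 27 TEM preparation classes and 6 of the 25 non-preparation classes dealt with one skill only.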
The above finding further suggests that TEM affects the teaching content, as
well as the organization of the teaching content. The skills or knowledge taught in a
single TEM preparation class may not facilitate or reinforce each other, unlike what
is taught in a non-preparation class. To illustrate, the listening and reading activities
conducted in non-preparation classes were usually related, focusing on the same
topic; while the listening and reading activities conducted in TEM-preparation
classes were often separate tasks on different topics listed in a test paper. In this
sense, the teaching content of non-preparation classes seems to be better
organized.
6.4.2 Teaching Methods
A comparison has been made between the teaching methods used in TEM preparation
and non-preparation classes, and it is found that they differed significantly in
the classroom activities organized and the use of English in class, but no significant
difference has been identified in the features of teacher-student interaction or the
classroom atmosphere.
Table 6.12 Mean time devoted to each activity (percentage of time)

                      N    Mean    Std. Deviation   Std. Error   Minimum   Maximum
Teacher lecture (F = 3.77, p = 0.016)
TEM4 preparation      11   84.54   12.73            3.84         60.00     100.00
TEM8 preparation      16   85.62   18.96            4.74         30.00     100.00
Fundamental English   17   67.05   21.79            5.28         10.00     90.00
Advanced English      8    70.00   16.25            5.74         40.00     85.00
Student presentation (F = 3.297, p = 0.028)
TEM4 preparation      11   1.36    2.33             .70          .00       5.00
TEM8 preparation      16   .00     .00              .00          .00       .00
Fundamental English   17   9.11    12.40            3.00         .00       40.00
Advanced English      8    8.75    17.06            6.03         .00       50.00
Teacher-student interaction (F = 6.622, p = 0.001)
TEM4 preparation      11   8.63    7.10             2.14         .00       20.00
TEM8 preparation      16   3.43    3.96             .99          .00       10.00
Fundamental English   17   11.76   6.10             1.48         .00       25.00
Advanced English      8    17.50   14.88            5.26         .00       50.00
Note: N: the number of classes of each type
6.4.2.1 Classroom Activities Organized
One-Way ANOVA has been employed to compare the means of the four indepen-
dent groups of classes: TEM4 preparation classes, TEM8 preparation classes, fun-
damental English and advanced English, and the research findings indicate that they
differed significantly (significance level: 0.05) in the mean time devoted to teacher
lecture, student presentation and teacher-student interaction.
Table 6.12 shows that teacher lecture took up around 85% of the class time in
TEM preparation classes, while in non-preparation classes, around 70% of the time
was devoted to it. As for student presentation, it was seldom conducted in TEM
preparation classes (1.36% for TEM4 preparation classes and 0 for TEM8 prepara-
tion classes). Moreover, teacher-student interaction took place less frequently in
TEM preparation classes than in non-preparation classes. All the above findings
seem to suggest that TEM preparation classes were more teacher-centered, with
more time devoted to teacher talk and less time to student practice.
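For readers who wish to check the F values, a one-way ANOVA F statistic can be approximately reconstructed from the group sizes, means and standard deviations in Table 6.12. The following is a minimal sketch, assuming that the reported standard deviations are sample standard deviations (n − 1 in the denominator); the function `anova_f` is hypothetical, and the p values are not computed here because they require an F-distribution function from a statistics package.

```python
def anova_f(groups):
    """One-way ANOVA F statistic from (n, mean, sd) summaries of each group.

    Assumption: sd is the sample standard deviation of the group.
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    # Between-group and within-group sums of squares
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# (N, mean, SD) for TEM4 preparation, TEM8 preparation,
# fundamental English and advanced English, copied from Table 6.12
teacher_lecture = [(11, 84.54, 12.73), (16, 85.62, 18.96),
                   (17, 67.05, 21.79), (8, 70.00, 16.25)]
student_presentation = [(11, 1.36, 2.33), (16, 0.00, 0.00),
                        (17, 9.11, 12.40), (8, 8.75, 17.06)]
interaction = [(11, 8.63, 7.10), (16, 3.43, 3.96),
               (17, 11.76, 6.10), (8, 17.50, 14.88)]

f_lecture = anova_f(teacher_lecture)
f_presentation = anova_f(student_presentation)
f_interaction = anova_f(interaction)
print(f"{f_lecture:.2f} {f_presentation:.2f} {f_interaction:.2f}")
```

Because the table reports rounded means and standard deviations, the reconstructed values can differ slightly from the published F = 3.77, 3.297 and 6.622, but they should agree to within a few hundredths.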
Interviews with teachers and students have identified the following reasons for
this variance.
First, the limited time of TEM preparation classes. As suggested by the question-
naire findings (Tables 5.13, 5.26, and 5.40), most schools offered a TEM prepara-
tion course of around 10 class hours, which, according to the teachers interviewed,
was not enough. Therefore, to complete this demanding teaching task, the teachers
lectured a lot in class to cover all the teaching content, thus leaving little time for
students to practice.
Second, teachers’ and students’ understanding of test preparation. The interview
findings show that some teachers believed that the purpose of test preparation was
to help students perform better in the test, that is, to get a higher score in
TEM. Moreover, this attitude was shared by many students. In this sense, some of
them, especially students, considered it necessary for teachers to lecture more in
class, so that they could get more information about TEM.
Third, teachers’ and students’ attitudes towards classroom practice. According to
interview findings, some participants, especially students, seemed to consider
teacher lecture superior to student practice. As one student said, “It is desirable that
teachers lectured more in TEM preparation classes. We can practice after class by
ourselves, so the precious class time would not be wasted on student practice.” This
opinion is echoed by the questionnaire finding (see Table 5.43) that more students
preferred explaining test papers to working on test papers, both for TEM4 and
TEM8.
6.4.2.2 English Used in Class
In TEM preparation classes, teachers seemed to lecture more, but which language
did they use more often, the first language Chinese or the target language English?
A comparison was made and it was noted that significant difference exists between
TEM preparation and non-preparation classes (Table 6.13).
A five-point Likert scale has been employed to measure the frequency of English
used in class: 5 means over 80% of the class was delivered in English, 4 means
60–80%, 3 means 40–60%, 2 means 20–40%, and 1 means less than 20%. From
Table 6.13, it can be concluded that English was used much more frequently in
non-preparation classes than in TEM preparation classes. In non-preparation classes,
especially advanced English, English was used most of the time (above 80%); while
in TEM preparation classes, especially TEM4 preparation classes, Chinese was
used more often than not.
The interviews with teachers and students show that both teachers and students
preferred Chinese to English in a TEM preparation class, because for teachers, it
was more efficient to explain in Chinese and for students, they claimed to have a
better understanding of TEM when delivered in Chinese.
As for students’ use of English, a similar pattern has been identified (Table 6.14).
Table 6.13 English used by teachers in TEM preparation and non-preparation classes

                      N    Mean   Std. Deviation   Std. Error   Minimum   Maximum
TEM4 preparation      11   2.00   .00              .00          2.00      2.00
TEM8 preparation      16   2.68   1.25             .31          1.00      5.00
Fundamental English   17   4.17   1.18             .28          2.00      5.00
Advanced English      8    5.00   .00              .00          5.00      5.00
(F = 21.051, p = 0.000)

Table 6.14 English used by students in TEM preparation and non-preparation classes

                      N    Mean   Std. Deviation   Std. Error   Minimum   Maximum
TEM4 preparation      10   1.40   .51              .16          1.00      2.00
TEM8 preparation      13   1.76   1.09             .30          1.00      5.00
Fundamental English   17   3.82   1.23             .29          1.00      5.00
Advanced English      8    4.50   .53              .18          4.00      5.00
(F = 25.160, p = 0.000)
It seems that students used English less frequently than teachers, but the same
pattern of use existed between TEM preparation and non-preparation classes, with
students using English more frequently in non-preparation classes.
6.4.2.3 Features of Teacher-Student Interaction
A comparison has been made of the features of teacher-student interaction in TEM
preparation and non-preparation classes, but no significant difference has been
identified. However, differences do exist between classes of different learning stages
(classes in the fundamental stage and advanced stage) and classes delivered by
different teachers, which shall be explained later in Sect. 6.5.
6.4.2.4 Classroom Atmosphere
Students’ laughter has been used as an indicator of classroom atmosphere and a
comparison has been made between TEM preparation and non-preparation classes.
However, no significant difference has been identified.
To sum up, the major differences between TEM preparation and non-preparation
classes lie in the teaching content and methods employed: in TEM preparation
classes, more TEM-related knowledge was taught, teachers dominated the class more,
and more Chinese was used as the medium of instruction. However, whether these
constitute positive or negative washback of TEM remains unsettled. On the one hand,
the major aim of a short-term TEM preparation course is to familiarize students with
TEM, so it is natural that the course focuses on the knowledge of TEM, which is
delivered by teacher lecture in Chinese, a more efficient way to impart the knowledge,
according to both teachers and students. In this sense, the washback of TEM on the
preparation classes can be termed positive, but only in the micro context, as defined
in 4.1.3.4. On the other hand, the ultimate goal of any English course should be to
help students improve their language proficiency, and the TEM preparation course is
no exception. In this sense, overuse of the first language in class, which should not
exceed 15% of the time as suggested by Macaro (2005), would bring some undesirable
effects on language teaching and learning. Moreover, a teacher-centered classroom
setting would also lead to some negative effects. Therefore, the washback effects of
TEM might be negative in the macro context (see 4.1.3.4). In a word, TEM preparation
courses can hardly help students improve their language proficiency, although they
may familiarize students with TEM so that they will perform better in the test.
6.5 Classroom Practice Affected by Other Factors
According to the questionnaire survey findings, teaching and learning were affected
by many factors, such as teacher quality, student effort, and student quality,
while the role played by TEM was quite limited (Tables 5.15, 5.31, 5.50). A further
analysis has therefore been made to discern differences associated with other
factors, namely the teacher, the type of school (key university, ordinary university
or college), and the students’ year of study (first, second, third or fourth year).
The significant differences are reported below.
6.5.1 English Used in Class
Further analysis seems to indicate that TEM was an overwhelming factor affecting
the language used in class, since the significant differences located in other factors
were, to some extent, still attributed to it.
As shown in Table 6.15, a significant difference existed between classes delivered
to students of different grades. However, close inspection of the data reveals that
English was more frequently used in first-year and third-year classes simply because
these mainly consisted of fundamental English and advanced English classes,
while most TEM preparation classes were delivered in the second and fourth years.
In this sense, the finding further attests to the washback effects of TEM on the
language used in class.
From Table 6.16, it can be noted that Teacher D used English most frequently
and Teacher E used English least frequently. A closer look at the courses they
taught clarifies this finding. Teacher E had only three TEM8 preparation
classes observed, so her use of English in class was minimal, less than 20%, while
Teacher D had three advanced English classes and one TEM8 preparation class
observed; moreover, her TEM8 preparation class was taught within the advanced
English course, so it retained the style of non-preparation classes. As a result, all
the classes of Teacher D were delivered in English (over 80% of the class time for all
her classes). This finding, like the previous one, may serve as further evidence of
the washback of TEM on the language that teachers use in class.

Table 6.15 English used by teachers in classes of different grades
Grade         N    Mean   Std. deviation   Std. error   Minimum   Maximum
First year    6    5.00   .00              .00          5.00      5.00
Second year   22   2.86   1.24             .26          2.00      5.00
Third year    9    5.00   .00              .00          5.00      5.00
Fourth year   15   2.53   1.12             .29          1.00      4.00
(F = 17.799, p = 0.000)
Note: N: the number of classes of each grade

Table 6.16 English used by different teachers in class
Teacher   N    Mean   Std. deviation   Std. error   Minimum   Maximum
A         12   3.25   1.54             .44          2.00      5.00
B         11   3.54   1.43             .43          2.00      5.00
C         10   2.40   .51              .16          2.00      3.00
D         4    5.00   .00              .00          5.00      5.00
E         3    1.00   .00              .00          1.00      1.00
F         12   4.25   .96              .27          2.00      5.00
(F = 7.202, p = 0.000)
Note: N: the number of classes taught by each teacher

Table 6.17 English used by teachers in classes from different schools
School type           N    Mean   Std. deviation   Std. error   Minimum   Maximum
Key university        25   3.68   1.67             .33          1.00      5.00
Ordinary university   17   3.52   1.23             .29          2.00      5.00
College               10   2.40   .51              .16          2.00      3.00
(F = 3.186, p = 0.050)
Note: N: the number of classes of each type of university
Another factor that may have contributed to the difference in teachers’ use of
language in class is the school type, that is, whether the class was from a key uni-
versity, ordinary university, or college.
As shown in Table 6.17, the level of school seemed to have some effect on the
English used by teachers in class, with teachers from key universities speaking
English most frequently. This is possibly attributed to differences in teacher and
student quality in various schools. However, since the p value is 0.050, which is
exactly the threshold value for significant difference in the one-way ANOVA test,
the inference should be made with prudence.
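The F values reported in Tables 6.15 to 6.17 can be roughly cross-checked from the descriptive statistics alone. The following is a minimal sketch, not the authors' procedure: it assumes that the six numeric columns of Table 6.15 are N, mean, standard deviation, standard error, minimum and maximum, and the function name anova_from_summary is hypothetical. Because the published means and standard deviations are rounded to two decimals, the result can only be expected to fall close to the reported F = 17.799, not to reproduce it exactly.

```python
from math import fsum


def anova_from_summary(groups):
    """One-way ANOVA F statistic from per-group (n, mean, sd) summaries.

    `groups` is a list of (n, mean, sd) tuples; sd is the sample
    standard deviation (n - 1 in the denominator).
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = fsum(n * mean for n, mean, _ in groups) / total_n

    # Between-groups sum of squares: spread of group means around the grand mean.
    ss_between = fsum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Within-groups sum of squares: recovered from each group's variance.
    ss_within = fsum((n - 1) * sd ** 2 for n, _, sd in groups)

    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# (N, mean, SD) for first- to fourth-year classes, copied from Table 6.15.
TABLE_6_15 = [
    (6, 5.00, 0.00),
    (22, 2.86, 1.24),
    (9, 5.00, 0.00),
    (15, 2.53, 1.12),
]

f_stat, df_b, df_w = anova_from_summary(TABLE_6_15)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

The same function can be applied to the rows of Tables 6.16 and 6.17; any small gap between the recomputed and the published F is most plausibly due to rounding in the printed summaries.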
6.5.2 Features of Teacher-Student Interaction
No significant difference has been detected between TEM preparation and non-
preparation classes. However, further analysis shows that some differences exist
between classes delivered by different teachers in different schools.
Table 6.18 shows that significant differences existed in almost all communicative
features of teacher-student interaction across the classes delivered by different
teachers. This suggests that every teacher had their own teaching method, which was
not significantly affected by TEM, since no significant difference has been
located between TEM preparation and non-preparation classes. This finding is, to
some extent, in line with previous washback studies, which found that a test
influences what teachers teach more than how they teach (Cheng 2005; Gu 2007;
Wall 2005; and so on).

Table 6.18 Communicative features of classes delivered by different teachers
                                        Sum of squares   df   Mean square   F        Sig.
Authenticity of teachers’ questions
  Between groups                        12.07            5    2.41          5.511    .001
  Within groups                         18.40            42   .43
  Total                                 30.47            47
Authenticity of teachers’ responses
  Within groups                         26.66            42   .63
  Total                                 125.25           47
Length of teachers’ questions
  Between groups                        12.03            5    2.40          13.898   .000
  Within groups                         7.27             42   .17
  Total                                 19.31            47
Length of teachers’ responses
  Between groups                        26.31            5    5.26          12.511   .000
  Within groups                         17.66            42   .42
  Total                                 43.97            47
Authenticity of students’ questions
  Within groups                         30.63            42   .72
  Total                                 142.66           47
Authenticity of students’ responses
  Within groups                         24.10            42   .57
  Total                                 50.97            47
Length of students’ questions
  Between groups                        6.06             5    1.21          14.905   .000
  Within groups                         3.41             42   .08
  Total                                 9.47             47
Length of students’ responses
  Between groups                        6.69             5    1.33          3.449    .011
  Within groups                         16.30            42   .38
  Total                                 23.00            47
Teachers’ positive feedback
  Between groups                        2.49             5    .49           7.304    .000
  Within groups                         2.72             40   .06
  Total                                 5.21             45
Teachers’ negative feedback
  Between groups                        2.49             5    .49           7.304    .000
  Within groups                         2.72             40   .06
  Total                                 5.21             45
Teachers’ feedback on meaning
  Between groups                        64.48            5    12.89         27.871   .000
  Within groups                         19.43            42   .46
  Total                                 83.91            47
Teachers’ feedback on form
  Between groups                        38.46            5    7.69          18.102   .000
  Within groups                         17.84            42   .42
  Total                                 56.31            47
Moreover, it has been noted that the level of school would also have some effect
on teacher-student interaction (Table 6.19).
Table 6.19 Communicative features of classes from different schools
                                        Sum of squares   df   Mean square   F        Sig.
Authenticity of teachers’ questions
  Within groups                         18.60            45   .41
  Total                                 30.47            47
Authenticity of teachers’ responses
  Within groups                         77.47            45   1.72
  Total                                 125.25           47
Length of teachers’ questions
  Between groups                        10.46            2    5.23          26.620   .000
  Within groups                         8.84             45   .19
Length of teachers’ responses
  Between groups                        14.90            2    7.45          11.530   .000
  Within groups                         29.07            45   .64
Authenticity of students’ questions
  Within groups                         37.70            45   .83
Authenticity of students’ responses
  Within groups                         27.03            45   .60

Descriptives
                                        N    Mean   Std. deviation   Std. error   Minimum   Maximum
Authenticity of teachers’ questions
  Key                                   23   2.30   .76              .15          1.00      4.00
  Ordinary                              15   1.86   .63              .16          1.00      3.00
  College                               10   1.00   .00              .00          1.00      1.00
  Total                                 48   1.89   .80              .11          1.00      4.00
Authenticity of teachers’ responses
  Key                                   23   3.60   1.40             .29          2.00      5.00
  Ordinary                              15   3.00   1.55             .40          1.00      5.00
  College                               10   1.00   .00              .00          1.00      1.00
  Total                                 48   2.87   1.63             .23          1.00      5.00
Length of teachers’ questions
  Key                                   23   4.21   .51              .10          3.00      5.00
  Total                                 48   3.81   .64              .09          3.00      5.00
Length of teachers’ responses
  Key                                   23   4.39   .89              .18          3.00      5.00
  Total                                 48   3.85   .96              .13          3.00      5.00
Authenticity of students’ questions
  Ordinary                              15   4.20   1.47             .38          1.00      5.00
  Total                                 48   3.83   1.74             .25          1.00      5.00
Authenticity of students’ responses
  Ordinary                              15   2.53   .74              .19          1.00      3.00
  Total                                 48   2.35   1.04             .15          1.00      5.00
Note: N: number of classes in each category
Table 6.20 Laughter in classes of different grades
Grade         N    Mean   Std. deviation   Std. error   Minimum   Maximum
First year    6    2.00   1.41             .57          .00       4.00
Second year   22   .72    1.20             .25          .00       3.00
Third year    9    .11    .33              .11          .00       1.00
Fourth year   15   .46    .74              .19          .00       2.00
(F = 4.602, p = 0.007)
Note: N: number of classes of each grade
It seems that teachers from different schools adopted different questioning skills,
with teachers from key universities asking the most authentic and the longest
questions. This may be attributed to differences in teacher and student quality across
schools. Teachers with a better command of English are, to some extent, less
reluctant to ask authentic questions that seek unknown information; such questions
add to the unpredictability of the class and thus pose some challenge to the teacher.
Students with higher language proficiency, in turn, are more capable of answering
these open, lengthy questions.
6.5.3 Classroom Atmosphere
Laughter has been used as an indicator of classroom atmosphere, and it is noted that
no significant difference exists between TEM preparation and non-preparation
classes, classes delivered by different teachers, and classes delivered in different
schools. However, classes of different grades are found to be significantly
different.
Table 6.20 shows that first-year and second-year students laughed more frequently
than students in the third and fourth years. This seems to suggest that the classroom
atmosphere became less lively as students entered the advanced stage. Interview
findings supported this view, as some teachers complained that students’ passion for
learning dwindled as their college education progressed. Moreover, some
students complained that classes became less and less “meaningful”. It therefore
seems that more effort should be made to improve classes, so that students’
interest in learning can be sustained.
6.6 Triangulation
To ensure the validity of classroom observation data, methodological triangulation
has been employed to corroborate data collected through various methods, such as
questionnaire surveys, interviews, and document analysis of previous studies.
Classroom observation findings show that the major differences between TEM
preparation and non-preparation classes are as follows: (1) the teaching content of
TEM preparation classes was more TEM-oriented, focusing on the skills and knowledge
measured in TEM; (2) TEM preparation classes were more teacher-centered, with more
time spent on teacher lectures; (3) Chinese was more frequently used in TEM
preparation classes as the medium of instruction and communication, by both teachers
and students. Moreover, it has been noted that various factors contributed to the
variance in classroom practice, such as teachers’ individual differences, different
levels of schools, and differences among students.
Triangulation of the above findings will be presented in the following part.
6.6.1 Classroom Observation Findings vs. Questionnaire Findings
In the questionnaire surveys, both teachers and students answered questions on their
practice in TEM preparation classrooms and the factors affecting teaching and
learning, which can be employed to corroborate classroom observation data.
6.6.1.1 Teaching Content in TEM Preparation and Non-preparation Classes
According to classroom observation findings, TEM preparation classes were
more TEM-oriented, covering the skills and knowledge measured in TEM. This is in
line with the questionnaire findings (Tables 5.27, 5.28, 5.41, and 5.42), which
showed that more time was spent on TEM test papers in class and that TEM-related
skills, such as reading and listening, were more frequently practiced
than non-related skills, such as speaking and translation (for TEM4).
Moreover, questionnaire findings seem to suggest that teaching activities in TEM
preparation classes have been intensified, which can be echoed by the classroom
observation finding that teaching content in some TEM preparation classes is more
diverse, covering more skills and knowledge than non-preparation classes.
6.6.1.2 Other Factors Influencing Classroom Practice
Questionnaire findings indicate that teaching and learning were affected by many
factors, such as teacher quality, student quality, and student effort (Tables
5.15, 5.31, and 5.50), while the role played by TEM was quite limited. This
supports the classroom observation finding that classroom practice,
especially the features of teacher-student interaction, was affected by many
factors other than TEM.
Furthermore, the factors identified in the questionnaire surveys coincide
with those identified in the classroom observation. According to the questionnaire
findings, teaching and learning were affected by such factors as leader attention,
teacher quality, student quality, student effort, and student interest (the opinions
of program administrators, teachers and students varied slightly because of their
different perspectives). These factors correspond to those identified in the
classroom observation. For example, the factors affecting the features of
teacher-student interaction include teachers’ individual differences and the level
of schools, which encompass such factors as teacher quality and student quality.
In this sense, classroom observation data could be corroborated by questionnaire
findings.
6.6.2 Classroom Observation Findings vs. Interview Findings
Interviews were conducted among relevant teachers and students to verify class-
room observation findings, and it seems that the interview data collected are quite
supportive, providing answers to most of the puzzles that emerged in the analysis of
the observation data. Corroboration has been offered in the analysis of observation
data in Sects. 6.3, 6.4 and 6.5, and the following is a summary.
First, it has been noted that teachers lectured more in the classes observed, and
interview findings offered several possible reasons, such as the limited time of the
TEM preparation classes, teachers’ and students’ attitudes towards teacher lecture
and student practice (see Sect. 6.4.2.1).
Second, in the analysis of the observation data, teachers and students were found
to use Chinese more often in TEM preparation classes than non-preparation classes,
and interview data suggest that Chinese was preferred by both teachers and students
because most of them believed that it was more effective and efficient to communi-
cate in the mother tongue.
Last, in the classes observed, students in the fundamental stage were found to
laugh more often than those in the advanced stage, which, according to interview
findings, could possibly be attributed to the declining interest and passion that stu-
dents had for language learning.
6.6.3 Classroom Observation Findings vs. Document Analysis
As suggested by the findings of some previous washback studies (Cheng 2005; Gu
2007; Wall 2005; and so on), tests influence teaching content more than teaching
methods, which has also been noted in the current observation. For instance,
TEM preparation classes were oriented towards the test, covering various sections
of TEM, while non-preparation classes focused more on textbooks, mainly text
comprehension, which shows that teaching content varied considerably between TEM
preparation and non-preparation classes. In terms of teaching methods, however,
teachers were found to lecture more and use Chinese more, but no significant
difference has been identified in the features of teacher-student interaction
between TEM preparation and non-preparation classes. This suggests that TEM has had
some influence on teaching methods, but not on teachers’ teaching style, which might
be affected by other factors.
6.7 Summary
This chapter has presented and discussed classroom observation findings, and the
research results can be summarized as follows.
First, the teaching content of TEM preparation classes varied greatly from that of
the non-preparation classes, with TEM preparation classes focusing more on test
content.
Second, the organization of TEM preparation classes differed from that of the
non-preparation classes in that more teacher lectures were conducted.
Third, the language used for class instruction and communication in TEM prepa-
ration classes was, in some cases, different from that of the non-preparation classes,
since TEM preparation classes were more often delivered in Chinese and the stu-
dents also tended to use Chinese more frequently.
Fourth, no significant difference was identified in the features of teacher-student
interaction between TEM preparation and non-preparation classes, but variance
existed among teachers, as well as different schools.
Fifth, teacher lecture accounted for around 70% of the class time in
non-preparation classes and around 85% of the class time in TEM preparation classes
(Table 6.12), which seemed to suggest that the classroom was still dominated by
teachers, be it in TEM preparation or non-preparation classes.
Sixth, research findings on the features of teacher-student interaction showed that
teachers’ questions were not authentic and students’ responses were quite limited.
In this sense, it seems that more effort should be made to improve the questioning
techniques of teachers so that students are encouraged to communicate more.
Last, laughter seemed to occur more frequently among students in the fundamental
stage than among those in the advanced stage; moreover, students did not laugh
much in either type of class, with four times being the maximum. This seems to
suggest that more effort should be made to enliven the classroom atmosphere, so that
students’ interest in and passion for language learning can be aroused.
In addition to the above findings, pre-observation interviews and post-observation
interviews were conducted to collect supplementary information about the classes,
and to seek answers to the problems that emerged in the observation process and the
data analysis process. The data collected from various approaches have corrobo-
rated each other.
Chapter 7
Conclusions and Implications
7.1 Summary of the Major Findings
The present research is a multi-phase and multi-method study on the washback
effects of TEM. Questionnaire surveys have been conducted among experts, program
administrators, teachers and students; relevant classroom observations have
been carried out, based on the questionnaire findings; and interviews have been held
at every stage to collect supplementary information. The major findings
are presented below according to participant or activity focus.
7.1.1 Washback Effects Perceived by Experts
All five experts surveyed held a positive attitude towards both TEM and its
washback, believing that TEM was reasonably designed and that it would facilitate and
promote teaching and learning for English majors. In addition, they were all against
the unintended use and interpretation of TEM results, questioning whether such uses
were justified. Moreover, they offered some suggestions on the reform of TEM,
concerning its test content, test method, test report, and so on, which are listed in
Sect. 7.2, together with the suggestions from the program administrators, teachers
and students.
7.1.2 Washback Effects Perceived by Program Administrators
The program administrators were, in general, quite positive about TEM and its
washback, but their attitudes differed in some aspects. Their evaluation of the TEM
test report and the convenience of TEM8 administration seemed to be less favorable
than that of the other aspects of TEM quality, but TEM4 received a slightly higher
satisfaction rate. Moreover, their attitudes towards the unintended use and
interpretation of TEM results varied widely, with slightly more proponents than
opponents. According to the program administrators, TEM has exerted some influence
on their curriculum design, but different practices were adopted in different
schools. TEM preparation was offered in many schools, but mostly in the form of
lectures. TEM4 preparation seemed to be offered more often than TEM8 preparation,
but neither was intense, lasting for no more than one semester. Compared with other
factors such as teacher quality and student effort, TEM had limited effects on
teaching.
Furthermore, it was noted that program administrators’ perceptions of TEM
were affected by personal factors, such as their teaching experience and
educational background.
https://doi.org/10.1007/978-981-13-1963-1_7
7.1.3 Washback Effects Perceived by Teachers
Teachers’ perceptions of TEM and its washback were, generally speaking, quite
positive, although they seemed to be less favorable than those of the program
administrators. Like the program administrators, they were less satisfied with the
test report and the convenience of TEM8 administration. Moreover, their attitudes
towards the unintended use and interpretation of TEM results also varied, with half
agreeing and half disagreeing. According to the teachers, TEM affected their teaching
aims and content more than their teaching method and ideology, and their TEM
preparation classes differed from the non-preparation classes mainly in teaching
content, as the practice and explanation of TEM test papers were conducted more
frequently. However, the role TEM played in their teaching was quite limited
compared with that of other factors such as student quality and student effort.
In addition, teachers’ personal factors, such as their teaching experience and
educational background, were found to affect their perception of TEM washback.
7.1.4 Washback Effects Perceived by Students
Students’ perceptions of TEM and its washback were more positive than negative,
especially their perceptions of the test content of TEM and the overall washback of
TEM. More than half of the students believed that TEM4 scores were used in conferring
bachelor’s degrees, and their attitude towards this practice was more supportive
than opposed.
TEM influenced the learning practice of students, both in class and after
class. In class, TEM-related content, such as test paper explanation, was more often
covered; after class, students spent more time doing exercises on the
skills measured in TEM, such as multiple-choice listening and reading exercises.
However, TEM preparation did not last long, with most students
spending no more than half a semester on it and the time spent daily seldom exceeding
3 hours. The positive washback of TEM on learning effects has been acknowledged
by most students, who claimed that TEM had helped enhance their English proficiency,
especially the language knowledge that can be improved in a short time,
such as their vocabulary, grammar knowledge and general knowledge of English
linguistics, literature and culture. However, compared with other factors such as
their learning effort and interest, TEM had limited effects on students’ learning.
Besides, students’ individual differences, such as gender and language profi-
ciency, were found to affect their perception of TEM washback.
7.1.5 Washback Effects on English Classrooms
TEM has exerted some influence on the classroom practice, such as teaching con-
tent and teaching methods. The teaching content of TEM preparation classes
focused more on the measured language knowledge and skills in TEM, and the
classes were more teacher-centered, with teacher lecture being the dominant teach-
ing format. Moreover, in TEM preparation classes, Chinese was more often used,
both by teachers and by students.
However, besides TEM, many other factors were found to affect classroom
teaching and learning, such as teachers’ individual differences and students’ levels
of language proficiency.
7.2 Implications for the Reform
The aim of a washback study can be twofold. The fundamental aim is to describe
this complicated phenomenon: what aspects of teaching and learning a test affects,
how it affects them, and what factors contribute to the washback effects. The
ultimate aim is to promote positive washback and avoid negative washback, that is,
to revise the test so that it will produce more positive washback and to improve
teaching and learning so that the positive washback of a test can be fully realized.
In this sense, the following part will discuss the reform of TEM and teaching and
learning for English majors, based on the research findings of the current washback
study.
7.2.1 Implications for TEM Reform
As indicated by the perceptions and evaluation of TEM by experts, program
administrators, teachers and students, efforts could be made to improve TEM in the
following aspects.
7.2.1.1 TEM Test Content and Method
First and foremost, the test content of TEM could be more comprehensive to better
match the TEM test construct, thus improving the validity of TEM and bringing
more beneficial washback. If possible, the TEM-Oral test needs to be popularized.
Whereas TEM reaches an annual test population of over 200,000, the number of
TEM-Oral test takers remains much smaller, about 10,000–20,000 every year,
which means that in some universities no students take the oral test. It naturally
follows that many participants surveyed were not familiar with the TEM Oral Test
Syllabus (Tables 5.2, 5.17, and 5.34), less emphasis was attached to the speaking
skill in class (Tables 5.27, 5.28, 5.41, and 5.42), and less time was spent on
speaking practice after class (Table 5.45). These findings deserve close attention.
As TEM is supposed to assess test takers’ language proficiency, which includes
speaking, the relatively low familiarity with the TEM-Oral test and the lesser
importance attached to it are not conducive to promoting positive washback.
Therefore, it is necessary to encourage more students to take the TEM-Oral test and
to make it better known to teachers and students, so as to promote balanced
development of students’ language proficiency. Moreover, the test content of TEM
could be more practical and cater to students’ needs, both in the test materials
selected and in the test items designed. On the one hand, the test materials could be
more practical, including texts that students may use in their daily life and future
career. For example, the translation materials could include not only literary works
but also texts of various kinds, such as business letters and news. On the other
hand, the test items should focus on the common problems of the students. For
instance, the proofreading part in TEM8 could employ more of the common mistakes made
by students, selected from corpora of students’ essays.
Furthermore, the test methods employed in TEM could be modified to include
more item types, such as blank filling and short-answer questions, so that the test
method effect could be reduced, thus improving the validity of TEM. For practical
reasons, the multiple-choice question is still the most frequently employed item type
in TEM, especially in TEM4. Moreover, some language abilities, such as reading,
cloze and general knowledge, are assessed solely through multiple-choice questions.
It is therefore recommended that more subjective item types be added as a supplement
to better assess test takers’ ability in these aspects.
7.2.1.2 TEM Test Administration
First of all, TEM test administration could be simplified. As is shown in Tables 5.3
and 5.18, convenience of test administration for TEM8 received the lowest
satisfaction rate, which, as indicated by the responses to the short-answer question,
might have resulted from the complicated administration procedure required for a TEM8
listening task that integrates listening and note-taking with gap-filling. In this
task, test takers are required to take notes while listening to a mini-lecture, and
the answer sheets (a gap-filling passage) should be distributed only after the
listening. If the answer sheets were distributed before the listening, test takers
would gain an advantage, as they could simply fill in the gaps without taking notes.
Even a small mistake in test administration (such as distributing answer sheets
prematurely) could thus result in construct-irrelevant variance, thereby undermining
washback. Therefore, a review of the test invigilation procedures is needed with a
view to improving practicality and thus maintaining validity.
Second, TEM test administration procedures should be clarified to ensure that
they are strictly followed in every participating school. The following suggestions
have been offered by both teachers and program administrators. First, if possible,
training should be offered to TEM invigilators, so that they will have a better idea of
the administration procedures. Second, a more detailed test administration manual
should be provided, so the teachers who are unfamiliar with TEM can better follow
the administration procedures. Third, directions for the start and end of each section
should be more precise, such as “from 10:15 to 10:25”, instead of “about 10 min”.
Last, additional suggestions have been made on the improvement of TEM test
administration. First, it might be better to collect the answer sheets for the TEM8
mini-lecture and TEM4 dictation after the listening part has been completed, since
students do not want to be interrupted when they are doing listening tasks. Second,
it might be better if TEM8 could be advanced to the end of the third year or the first
semester of the fourth year, so that it would not interfere with students’ internships
at the beginning of the last semester. Moreover, if TEM8 were administered earlier,
students could get the TEM8 certificate earlier, which may put them at an advantage in
the job market. Third, TEM4 might be administered at the end of
the second year, in June, so that it would not interfere with routine teaching.
Fourth, it is advised that TEM be administered twice a year and that students be
allowed to take and retake the test as they wish, so that they would be under less
pressure when they sit for the test.
7.2.1.3 Communication Between TEM Test Developers and TEM Users
As suggested by the research findings, it is essential to promote communication
between test developers and test users. Some of the findings seem to suggest that the
program administrators and teachers did not have adequate knowledge of the test, as
shown by their unfamiliarity with the TEM marking criteria, the unintended use of TEM
scores, and their attitude towards potential unintended use.
As is shown in Tables 5.2, 5.17 and 5.34, some of the program administrators,
teachers and students were not familiar with the marking criteria. However, marking
criteria usually embody a test’s purpose and focus, and tests, especially
criterion-referenced tests like TEM, can hardly achieve their expected goals if their
marking criteria are unfamiliar to teachers and students. For instance, a scoring
scheme combining features of holistic and analytic marking is employed to grade TEM
writing scripts. The scheme focuses on language use as well as on ideas and arguments.
Familiarity with the scoring scheme can help teachers set course objectives and plan
their lessons in line with the writing requirements in the Teaching Syllabus, because
the scoring scheme incorporates these writing requirements. In this respect, regular
communication between test constructors and test users, teachers in particular, is of
great importance.
Another evidence for the lack of communication between test developers and test
users is the participants’ attitude towards the unintended use and interpretation of
TEM scores. As is shown in Tables 5.7, 5.8, 5.22, 5.23 and 5.37, unintended use of
TEM score was not common, but about half of the participants endorsed such uses.
This indicates that some participants have not been adequately informed of the pur-
pose of TEM. Although it is difficult for test developers to decide how test results
are actually used, they are expected to “anticipate the common uses of certain kinds
of tests, and the potential consequences of such use, even if these uses are not
explicitly advocated” (Kane 2001, p. 338). Therefore, it is advisable that TEM test
developers identify possible uses and interpretations of TEM scores and provide
guidelines on how decisions about scores are to be made. In this way, communication
between test developers and test users could be promoted.
Various channels of communication can be employed, such as the test syllabus, the
marking criteria and the test report.
First, the TEM Test Syllabuses could be revised to provide more information on
test purpose, content, method and administration, so that test users may have a clear
idea about all aspects of the test.
Second, detailed marking criteria could be developed, together with samples, to
help teachers and students better understand requirements of the tasks that call for
human judgment in grading. As suggested by some teachers and program adminis-
trators, it is desirable to involve teachers from each TEM participating school in
TEM marking. Owing to concerns about reliability and practicality, this practice can
hardly be followed. However, it is possible to offer a TEM information packet every
year after TEM administration, including the test papers, the standard answers,
detailed explanations of the answers, and the marking criteria. Moreover, nationwide
TEM seminars could be held, so that teachers from various schools would have
equal access to information on TEM practice.
Last, an informative test report is needed to provide adequate feedback to teach-
ers and students, as well as other stakeholders, so that appropriate decisions can be
made. As is described in Tables 5.3 and 5.18, the item of “quality of TEM test
report” received a fairly low satisfaction rate. A possible explanation is that the
information in the current TEM test report is quite limited, including only a band of
“excellent”, “good”, “pass” or “fail”, but with no composite scores. If TEM scores
were to be used for more important decisions on teaching and learning, it would be
necessary to revise the present TEM score reporting system to supply more
information. As suggested by the participants, the following information could be
provided: (1) scores for each part (e.g. listening, reading); (2) a verbal description of
language behavior for each score band, which would render TEM scores more
meaningful; (3) the rate of correct responses for each item, especially in the grammar
and vocabulary part, so that students’ strengths and weaknesses can be identified; (4)
the average performance of each school in a given section, which, like the rate of
correct responses, may help identify students’ strengths and weaknesses; and (5) student
ranking in terms of composite score and section scores, so that students may have a
better idea of their strengths and weaknesses. In addition, some teachers suggested
reporting the composite score rather than the band of “excellent”, “good” or “pass”,
because it would encourage students to work for a better score, not just to pass the
test. Moreover, those students who narrowly failed the test, getting 59 or 58, would
feel less disappointed.
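As a rough illustration of three of the report elements suggested above (the rate of correct responses per item, the school average per section, and student ranking by composite score), the following minimal Python sketch may be helpful. It is not part of the TEM reporting system: the record layout, the sample data, the function names, and the definition of the composite score as the plain sum of section scores are all assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical data (not real TEM results): section scores and
# 0/1 item responses (1 = correct) for four students in two schools.
RECORDS = [
    {"student": "S1", "school": "A", "sections": {"listening": 18, "reading": 15}, "items": [1, 1, 0, 1]},
    {"student": "S2", "school": "A", "sections": {"listening": 12, "reading": 17}, "items": [1, 0, 0, 1]},
    {"student": "S3", "school": "B", "sections": {"listening": 15, "reading": 11}, "items": [0, 1, 0, 1]},
    {"student": "S4", "school": "B", "sections": {"listening": 9, "reading": 13}, "items": [1, 0, 0, 0]},
]


def item_correct_rates(records):
    """Proportion of students who answered each item correctly."""
    n_items = len(records[0]["items"])
    return [sum(r["items"][i] for r in records) / len(records) for i in range(n_items)]


def school_section_means(records):
    """Mean score of each school in each section."""
    scores = defaultdict(lambda: defaultdict(list))
    for r in records:
        for section, score in r["sections"].items():
            scores[r["school"]][section].append(score)
    return {
        school: {section: sum(vals) / len(vals) for section, vals in sections.items()}
        for school, sections in scores.items()
    }


def rank_by_composite(records):
    """Rank students by composite score (assumed here to be the sum of
    section scores); students with equal scores share the same rank."""
    totals = [(r["student"], sum(r["sections"].values())) for r in records]
    ordered = sorted(totals, key=lambda pair: pair[1], reverse=True)
    ranked, prev_score, prev_rank = [], None, 0
    for position, (student, score) in enumerate(ordered, start=1):
        rank = prev_rank if score == prev_score else position
        ranked.append((student, score, rank))
        prev_score, prev_rank = score, rank
    return ranked


if __name__ == "__main__":
    print(item_correct_rates(RECORDS))
    print(school_section_means(RECORDS))
    print(rank_by_composite(RECORDS))
```

In such a sketch, an item with a low rate of correct responses would point to a weakness shared by many students, whereas the school averages and rankings place an individual result in context.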
7.2.2 Implications for the Reform of Teaching and Learning
Several implications concerning teaching and learning for English majors arise
from the participants’ perception of TEM8 washback, their account of the teaching
and learning practice in their schools, as well as the classroom observation
findings.
7.2.2.1 Course Design
First, TEM may provide useful feedback for course design, because it is
advisable that course design at the school level not only follow the Teaching
Syllabus, but also cater to the needs of the students, which can be reflected in their
TEM performance. As shown in Table 5.11, most schools offered courses recom-
mended in the Teaching Syllabus, although TEM related courses were, compara-
tively speaking, more often introduced. Moreover, short-answer question findings
indicate that additional writing and listening courses were introduced in some
schools because the program administrators thought their students might need more
training in order to meet the required language proficiency standards measured in
TEM. The above findings show that TEM8 has provided some useful feedback for
course design: students’ weaknesses were identified so that appropriate teaching
activities were arranged. Thus, TEM results could help program administrators
design courses that are better targeted at students’ needs.
Second, TEM preparation courses should by no means interfere with regular
teaching. As is shown in Table 5.13, most schools offered TEM preparation, either
as lectures or as a special course, which is, to some extent, necessary, if we want to
keep our students well informed of the test, so that they can better demonstrate their
English proficiency in TEM. However, at the same time we need to be aware that
excessive preparation, especially test-taking technique training, might not only
deprive students of class time for other courses, but also arouse a negative attitude
towards the test. Moreover, classroom observation findings show that TEM
preparation courses were more teacher-dominated and that Chinese was used more
frequently in them, so such courses were not beneficial to the development of
language proficiency in the long run. Thus, it is important that test preparation be
kept at an appropriate level and not become excessive.
7.2.2.2 English Teaching and Learning
The role of TEM in teaching should be facilitative rather than decisive. As is stated
in the TEM Test Syllabus, the purpose of TEM is to measure the language profi-
ciency of undergraduate English majors, to check whether they have met the
required level specified in the Teaching Syllabus. Therefore, TEM may provide
important feedback on the strengths and weaknesses of the students, so that better
targeted teaching can be organized. In this sense, TEM may influence teaching, but
its influence should be limited. It would put the cart before the horse if teaching
were to be organized with passing TEM as the aim. As is stated by Qi (2004) in her
study on the washback of NMET, if the aim of teaching and learning is to
achieve higher scores rather than to develop the desired ability, high-stakes tests
are not likely to achieve their intended washback effects. Therefore, it is advisable
that TEM exert influence on teaching in a manner that is both discreet and
positive.
Next, the washback of TEM on classroom teaching was mainly reflected in the
following three aspects: (1) the teaching content was more TEM-oriented, covering
various sections in TEM; (2) the classes were more teacher-centered, with teacher
lecture being the major form of instruction; and (3) Chinese was more often used in
class, both for teachers and for students. In this regard, the immediate aim of TEM
preparation classes was to familiarize students with TEM, rather than to improve
their English proficiency. Therefore, the teaching content and method of TEM prep-
aration classes should not be set up as a model for non-preparation classes. However,
as indicated by classroom observation findings, non-preparation classes were also
teacher-centered, with teacher lecture covering about 70% of the class time (Table
6.12); meanwhile, there was no significant difference in the features of teacher-
student interaction between TEM preparation and non-preparation classes, both of
which were inauthentic. In view of this, efforts should be made to improve English
classroom teaching and to strengthen the role of students, so that classes
become more student-centered. Moreover, teachers should be encouraged to
improve their communicative skills and their questioning techniques, so that stu-
dents will get more involved in classroom activities. As a result, the classroom
atmosphere would be more active and lively.
Finally, efforts should be made to arouse students’ intrinsic interest in English
learning. According to questionnaire surveys, around one third of the students con-
sidered passing TEM one of their learning aims (Table 5.51). In addition, some
teachers expressed their worries that students’ motivation for English learning
would decline when the TEM season was over. In this sense, efforts should be
made to enliven English classes and to improve their practicality, so as to maintain
students’ interest in English learning. It would be desirable for program administrators
to design courses that cater to students’ needs, for teachers to stress the role of
students in class and involve them in classroom learning, and for students to develop an
intrinsic interest in English, so that their enthusiasm for English learning can be
maintained.
Appendices
Appendix I Survey Questionnaires
Questionnaire for Experts (EQ)

Table: Test items in TEM4 and TEM8
Test   Task                      Format       No. of questions   %    Time
TEM4   Dictation                 Subjective   1                  15   15
       Listening                 Objective    30                 15   15
       Cloze                     Objective    20                 10   15
       Grammar & vocabulary      Objective    30                 15   15
       Reading                   Objective    20                 20   25
       Writing
         1. Composition          Subjective   1                  15   35
         2. Note-writing         Subjective   1                  10   10
TEM8   Listening
         1. Lecture              Subjective   10                 10   25
         2. News                 Objective    10                 10
       Reading                   Objective    20                 20   30
       General knowledge         Objective    10                 10   10
       Proofreading              Objective    10                 10   15
       Translation
         1. C-E                  Subjective   1                  10   60
         2. E-C                  Subjective   1                  10
       Writing                   Subjective   1                  20   45
Note: %: weighting, with the overall percentage being 100; Time is given in minutes; C-E: Chinese
to English; E-C: English to Chinese
Questionnaire for TEM4-Related Program Administrators (AQ-4)

Questionnaire for TEM8-Related Program Administrators (AQ-8)

Questionnaire for TEM4-Related Teachers (TQ-4)

Questionnaire for TEM8-Related Teachers (TQ-8)

Questionnaire for TEM4-Related Students (SQ-4)

Questionnaire for TEM8-Related Students (SQ-8)

Appendix II Classroom Observations
Classroom Observation Scheme
No.  Classroom observation coding scheme           Percentage of time devoted (minimum 5%)
1    Content        General knowledge
                    Reading
                    Listening
                    Writing
                    Speaking
                    Translation
                    Integrated skills
                    Grammar & vocabulary
                    TEM-listening
                    TEM-dictation
                    TEM-mini-lecture
                    TEM-gra & voc MCQ
                    TEM-reading
                    TEM-cloze
                    TEM-writing
                    TEM-general knowledge
                    TEM-proofreading
                    TEM-translation
2    Organization   Teacher lecture
                    Student presentation
                    Individual seat work
                    Pair work
                    Group work
                    Teacher-fronted interaction
                    Teacher-student interaction
3    Materials      Textbook
                    TEM-related material
                    Materials encouraging language use
                    Other materials

5 = extremely frequent, 4 = frequent, 3 = sometimes, 2 = occasionally, 1 = seldom
No.  Classroom coding scheme                                                          5  4  3  2  1
4    Classroom language                 Teacher’s use of English
                                        Students’ use of English
5    Teacher’s communicative features   Teacher’s asking for genuine information
                                        Teacher’s giving genuine information
                                        Length of teacher’s questions
                                        Length of teacher’s answers
6    Students’ communicative features   Students’ asking for genuine information
                                        Students’ giving genuine information
                                        Length of students’ questions
                                        Length of students’ answers
7    Teacher’s feedback                 Teacher’s positive feedback
                                        Teacher’s negative feedback
                                        Teacher’s correction on meaning
                                        Teacher’s correction on form
8    Atmosphere                         Laughter (frequency)
Transcripts of the Two Teachers’ Lessons
Note: T=teacher, S=student, Ss=students
Teacher A (Fundamental English, Grade 2)
T: Ok, let’s begin our class. Today, en Unit 7, The Chaser. First we’ll finish the
words and phrases because last time there were three groups who haven’t done this
presentation. So first we will finish the words and phrases. Then we will look at the
text itself. So let me see, the first group, Group One. Ok, I forgot to mention, today,
Ms Xu, is sitting at the behind of our class because she is going to videotape our
class for her research, just for research purpose. Ok, Group One.
(Students explain the meaning of new words one by one.)
S: 这段的第一个单词…
T: OK, can you make the words larger?
S: Oh, 第一个单词出现在第一段中,kitten, 它的意思是小猫,小动物,然后有一
只很可爱的小猫;作为不及物是产小猫。
第二个单词是creaky, 作为及物动词来说是使咯吱咯吱响;作为名词是 creak,
是指发出咯吱咯吱的声音;作为形容词的话它也是首先是咯吱咯吱作响的,然
后第二个意思是腐朽的,破烂的,要坍塌的,在文中creaky stairs就是非常不牢的,
发出咯吱咯吱响的楼梯。
第三个词是peer,peer作为不及物动词是凝视、细看、盯着看。例句是She
peered at the neighbours from behind the curtain. 还有可以作为隐约出现,微现 ,
比如形容月亮,就是The moon peered through the clouds.作为名词来说peer可以
表示同辈、同事、同龄人、同等地位的人,就是和你有相同职位的人,或者是
贵族和相匹敌的人,在以前的古语里也用作同伴和伙伴。作为及物动词它也
是过去的用法是与什么相等,还是有相同的意思;作为不及物动词是同等、比
得上,用法就是peer with sb. Your brother can peer with him.
然后第四个单词是dim,作为形容词来说它可以表示暗淡的、昏暗的、模糊
的、看不清的、悲观的、怀疑的,就是表示比较消极阴暗的意思。然后作为
及物动词也是使变模糊、使变暗淡、使变朦胧。在文中是on the dim hallway
也是表示昏暗的意思。第二个意思是相比之下黯然失色,比什么更不明显,更
不鲜亮。The light of a candle is dimmed by electric light. 作为不及物动词它可
以表示变暗的、变模糊。When the music started, the lights dimmed. 还是表示
相同的意思,就是不同的用法。
第五个单词是obscure,作为形容词它的意思是不清楚的、不分明的,表示它
的意思表达不清,或是作为语言文体,可以表示含糊的、晦涩的、不清晰的。
第三个意思也可以表示不显著的、不引人注目的。然后第四个是无名的、微
贱的。当然也可以表示偏僻的、隐藏的。就是他们这些意思都相互联系的,
可以意会,都差不多。然后作为及物动词来说,它可以表示掩盖、遮盖。The
moon was obscured by clouds. The aims were obscured by means.都是表示被遮
盖、隐藏起来,被掩盖起来的。然后第二个意思就是使含混,使转弯抹角、使
难以理解。The item he intentionally obscured would make several interpretations
possible. 就是也是表示使朦胧、使含混的意思。然后第三个意思是使黑暗、
使失色、使不显著,都是作为它形容词用法的动词。它的副词是obscurely,是
费解地、晦涩的,隐蔽的。名词可以是obscurity, obscureness, 都是很常见的名
词变形。
然后第六个单词是buff, 它可以表示浅黄色的、暗黄色的、米色的、橘黄
色的,也可以表示软皮,作为及物动词它可以表示由软皮摩擦的,缓冲、擦亮、
抛光某物。在文中这个应该就是一个形容颜色的。
第七个是大家以前学过的acquaintance。作为名词它可以表示相识、认
识、熟悉,一般是用作acquaintance with someone,或是熟悉、了解、知
道,acquaintance with sb. 也可以表示相识的人们、熟人, 最重要的是Make
acquaintance with sb. 它的动词是acquaint,使了解、使知道、使熟悉, 告知、通
知可以用Acquaint with:You must acquaint with your new duties. 它的名词是
acquaintanceship, 这是表示认识、相识、交往关系,是一个抽象的表示。
第八个单词是deal,deal作为不及物动词可以表示论述、谈及、讨论、谈
论、谈到,通常与with或in连用,比如说Science deals with facts. 科学是论述客观
事实的。第二个可以表示惩处、采取行动,也是通常与with 或in连用,Law
courts must deal with lawbreakers. 第三个是表示平等待人,也是与with连用。第
四个就是我们最常用的to deal with a problem. 第五个就是交易、经营、与什
么做生意,其实也是处理问题的另一种引申的意思 to deal with a firm.。然后作
为及物动词,很常见的是分配,分发。然后在游戏中打牌就是发牌。第三个就
是经营的意思 to deal grapes to a supermarket.
第九个单词就是laxative, 它是表示泻药,然后作为形容词就是通便的。
单词就是这些,谢谢!
S: (from the second group):第一个单词是charge,它的意思就是ask sb. to pay
money for sth. we’re selling to them or doing for them. 意思就是向某人要钱。作
为不及物动词的用法是Charge for sth.这个就不解释了。然后作为及物动词的
话就是Charge sb. sth for sth.,然后 be charged at sth. 两个例句。第二个意思是To
arrange for payment be made later. 是的及物动词,它是Charge sth to sb. or sth.的
意思。
第二个单词是Costly: 第一个意思是costing a lot of money,examples: a costly
business process exercise or costly skills/ perfume;第二个意思是Costing a lot of
problems.
第三个单词是Oblige,意思是make it necessary for sb. to do sth., 经常用作被
动,中文意思是迫使,然后有一个比较有用的词组是feel or be obliged to do sth.
它的动词形式是obligate, 形容词是Obligatory. 第二个意思是do sb a favor, fulfill
the wishes of them, be willing to oblige. 形容词是Obliging,就是愿意的,willing to
help的意思。
第四个是detachment, the state of not reacting or being involved in sth. in an
emotional way. 就是She studied the blood in detachment. 动词是detach, 用法是
detach oneself from sth.,detach himself from their conflicts就是不让他自己牵涉
到他们的争吵中。形容词是detached, feel detached from sth.
OK, impulse: a sudden strong feeling that you must do sth.这是作为名词的用
法。feel an impulse to do sth., or control or resist an impulse. 然后用法是on
impulse, on an impulse, act on impulse就是出于冲动做某事。
第六个单词是bountifully, 它的形容词是bountiful.
哎呀,前面这个impulse 还有个意思就是sexual impulse/ creative impulse就是
创造性、性冲动,它的名词是impulsion, 形容词是impulsive.
第六个单词是bountifully, 第一个意思是generous, 近义词是bounteous, 第二
个意思是available in large quantities 这里它的近义词是abundant.
S: (from the third group): 第一个词是divorce,就是离婚的意思,它可以做动词或
名词,动词,都是离婚的意思,它的名词是divorcement.
T: 字太小,大一点。
S: 第二个词是indulge,在第三十九段的第三行,give free rein to, 这个indulge in
的意思就是沉溺于,相当于addict,成瘾,还有一个absorb; 另外一个意思就give
satisfaction to,就是满足的意思; 第三个意思是enjoy to excess, 就是过度地沉溺
于,有点贬义的意思。比如说She’s indulged in ice- cream意思就是她过度食用
冰激淋。第四个意思是treat with excessive indulgence, 其实这几个意思都是相
似的。就是沉溺于、享受的意思。indulge 的形容词indulgent. 名词
indulgency,indulgency. 还有一个形式就是indulgence.
第三个是save up for, 为什么储蓄、贮存。反正这个意思很简单,大家记一
下就行了。
第四个是potion, 在文中它是在第四十段love potion, 就是春药的意思。这个
potion的意思是magical or poisonous beverage。它可以做药来讲,但通常是指药
性不太好的药。
然后是phial, 第四十一段的第二行,红色字体,它是a small bottle that contains
a drop, 就是中学时候做实验的那个药剂瓶, 还有就是药店的那个药剂瓶。
下面是oblige, oblige它作动词经常用被动passive, force or compel sb. to do
sth. by law or because it’s a duty. 这儿有个例句,就是 Netherlands and Switzerland
oblige their residents to buy insurance. 就是让私人一定要买保险. 下面是,第二个
意思是provide service or favor for someone. 是帮助, oblige you. Oblige的形容词
有这三种: obliged, obliging, obligated.
下面最后一个是better off, 第四十三段的第二行,in a more fortunate or pros-
perous condition,就是富裕、富足的意思嗯就这样。
T: Very good! You have made a quite thorough introduction of the new words in
our text, but still I have something more to add.
Are the words large enough for you to see? Yes? OK.
Ok, just ignore foster. Peer, this time, some group mentioned the word peer. Peer
means look very carefully at something, especially because you are having diffi-
culty seeing it. Peer means to… I mean, please don’t confuse peer with another
word peek. They are similar in spelling, while peek means look at something quickly
and secretly, look at something quickly and secretly, especially through a hole or
other small opening, especially through a hole or other small opening. For example,
for example, now she peeked to see if he was noticing her. Here I didn’t mention this
word peek in the word list, but I suddenly think of this word peek, so peek means to
look at sth. quickly or secretly through a small hole or other small opening. Please
pay attention to it.
Then, the next word is obscure. Obscure has several different meanings, which
are clearly used. The first meaning, not clear and the second not well known, and
usually not very important. For example, an obscure poet. Obscure can also mean
difficult to understand. For example, obscure legal phrases. means it is hard to
understand. Obscure can also be used as a verb, as a transitive verb, “to hide, to
make it difficult to see or understand”. And its noun obscurity. This one is used most
often.
And the next word I didn’t mention is stock. Stock is, let me see, in paragraph 5,
the first sentence. “My dear sir,” replied the old man, “my stock in trade is not very
large…” Stock, pay attention, can be used both as a countable noun and an uncountable
noun, refers to a supply of a particular type of thing that a shop has available
for sale. In Chinese, it means商店里所出售的商品,存货。For example, we have a
huge stock of quality carpets on sale. Here it means?Can you translate it into
Chinese? Here quality carpets means very good carpets, carpets of high quality.我
们正在出售大量的质量上乘的地毯。And two phrases, out of stock and the
opposite is in stock, means unavailable or available in a particular shop. For exam-
ple, I’m sorry that swimming suit is completely out of stock in your size. So, it
means whether you can buy your sizes of swimming suit or not in this shop? No,
yeah.
Ok, next, perceive, because in our text, imperceptible, you have already men-
tioned it.
Beverage, a hot or cold drink. And I remember someone mentioned beverage,
refers to a drink, which actually means any drink that is not water. I remember you
mentioned that.
Or a phrase that or it’s or that’s just as well. Do you know this phrase? This
phrase is in paragraph 11, the first sentence. “Probably it’s just as well,” said the old
man. Because Alan says, I want nothing of that sort, so the old man answers
“Probably it’s just as well”. Do you know this phrase? This is often used in spoken
English. It is used to say that it is fortunate that something is true or happen, because
otherwise there would be problems. 正好,幸好,刚好。“Probably it is just as well”
means 也许这样更好,你不想买,这样更好。For example, two examples. “It just
as well that we have prepared everything beforehand.” It’s very fortunate that we
have prepared everything in advance. Then the next, a dialogue: A says, “We’re too
late to see the film.” B answers, “Just as well, I hear it isn’t very good.” Just as well
没关系,没什么。
Ok, oblige, this is very formal. You have already mentioned. Well, please pay
attention to the third and fourth one. And here are two expressions we use quite
often, especially on formal occasions. “I’d be obliged if…” It is a polite way to
make a request. “I’d be obliged if…” For example, I’d be obliged if you treat this
letter as strictly confidential. It means…? I will be very grateful if you don’t tell this
letter to anyone else. I’ll be very grateful. Another, I’m much obliged to you. This is
a bit old fashioned. It is used to thank someone very politely.
Ok, that substitute. Last time you’ve already mentioned the word substitute. Here
I would like to emphasize this phrase. Substitute A for B. Pay attention; let’s look at
this example first. The recipe says that you can substitute honey for sugar. Its
Chinese translation 用蜂蜜代替食糖。So you should pay attention to this honey
and sugar, where to put something to be replaced, and where to put this thing, and
this thing which is used to replace something else. So substitute honey for the sugar
means to use honey to replace the sugar. Do not confuse this…this phrase.
Then scorn, you may have already known, scorn means contempt, scorn for.
Rapture, this word is quite literary. Rapture, it means great excitement, happi-
ness. For example, the boys gazed up at him in rapture. And the phrase be in rap-
tures or go into raptures, especially used in British English, is very formal, means to
express or feel great pleasure and happiness about something. Be in raptures or go
into raptures over or about or at sth. or sb. And its adjective, rapturous, very happy
and excited.
Overwhelm, here in our text, if somebody is overwhelmed by an emotion. They
feel it so strongly that they cannot think clearly. We have “overwhelmed” and “over-
whelming”, two adjectives.
Well, another phrase, by any chance, you didn’t mention. By any chance is in
Paragraph 35, the second line. “And, by the way, since there are always sirens, if by
any chance you should, later on, slip a little, you need not worry.” Used in spoken
English, it is used to ask politely if sth. is true. 万一,也许,perhaps. For example, if
by any chance somebody comes to see me, ask him to leave a message.万一有人来
看我,请让他给我留个口信吧。By any chance. Another, “Are you Mrs. Brown, by
any chance?” Please don’t confuse by any chance with by chance. By chance means
the way something happens without being planned or caused by people or any issue.
For example, I bumped into him quite by chance in Oxford Street. So by chance, or
by accident. And in this adverbial clause, another word, “Here, if by any chance,
you should slip a little”.
Here, should, this is a modal auxiliary. It is used to talk about sth that may pos-
sibly happen or be true. Pay attention, “should” often appears in clause, an adverbial
clause, 用在条件状语从句中表示对将来的一种推测,猜测的语气,也许。For
example, “Naturally, he was nervous in case anything should go wrong” means
anything may, or may probably go wrong. Another example, “What if I should fall
sick and not be able to work.” Here “what if”. We can regard it as a phrase; “what if
something should happen” just means “what should I do if something should
happen”.
There is still a word in paragraph 37, the last line. Let’s read the last sentence.
“And, of course, she will never give you the least, the very least, grounds for uneasi-
ness.” Do you know the word ground used in this sense? Yes or no? No? Ok. Have
you found the word in the text? It’s in paragraph 37. Here grounds, pay attention,
first, it’s in plural form. Grounds, it refers to good reason for doing sth, or for saying
sth. We often say grounds for sth. or grounds for doing sth. For example, there’re
strong grounds for believing his statement. Next, have grounds to do sth. “Do the
police have grounds to arrest you?” Can you understand the sentences? Yes, Ok.
Next, on what kind of grounds, a phrase, on moral, or legal, or medical etc. grounds.
“The proposal was rejected on environmental grounds.” The proposal hasn’t consid-
ered the environmental matters. Maybe it doesn’t provide any measures to protect
the environment, so it was rejected on environmental grounds. Or on grounds of, or
on the grounds of sth. “Flying was ruled out on grounds of cost.” Because it is too
expensive, so they decided not to travel by air, travel by plane. So flying was ruled
out. On grounds that, it can also be followed by a clause, for example. “We oppose
the view on grounds that it discriminates against women.” On the grounds that just
means because.
OK, indulge, you already mentioned.
S: 老师,grounds是ground的复数,还是grounds就是一个词?
T: It means when ground is used in plural form, grounds, it can mean good reason.
when it means reason, we usually use grounds, not ground. So you mean, it whether?
S: 所以这就是单复数形式?
T: OK, anyway I’ll E-mail this list to you. So you can read them more carefully
after class. Any questions?
Ok, then we’ll look at the text, The Chaser. This time we will deal with it in a
different way. In the past, I ask you to answer my questions, just prepare the text
comprehension part before the class.
Today I will give you 6 questions, just in class. The 6 questions are here. Can you
see them clearly? Altogether 6 questions. We have 6 groups here, and here I have 6
pieces of paper, with a number on each paper. So each group will pick up one piece,
and the number in it means the number of the question you will answer. I want you
to have a discussion, try to answer my question. Then after this discussion. I’m not
sure how much time is enough for you. So let’s just go and see. Then after the dis-
cussion, each group will have to answer these questions. Each group can send a
representative, to come here and get a paper?
OK, first, let’s just go through the question list. Actually these questions cover
the whole story.
So the first question, “What does the title The Chaser refer to?”
Two, “Why do you think the old man told Austen about the life cleaner before
selling the love potion?” Because he talked a lot about his life cleaner. So why does
he mentioned or introduced life cleaner in detail before he sold him the love potion.
The third questions, “Why is the love potion sold so cheap and the life cleaner so
expensive?” How does it relate to the theme of the story? Or put it in another way,
it means do they have any symbolic meanings? Love potion sold so cheap while the
life cleaner so expensive.
The fourth question, “What kind of love can the love potion produce in a per-
son?” Please pay attention, attention please, I hope you can describe the effect of
love potion in your own words. Do not simply read the part in the text. Try to sum-
marize it and describe it in your own words. And then what do you think of this kind
of love? So make your own comments on it.
And the fifth one, “How do the young man and the old man differ in their char-
acters and their outlook of love?” Outlook of love means, how to put it, I mean, what
do they think of love. Maybe translate into Chinese, means 爱情观。Their attitude
towards love. And how does the author makes this difference clear to the reader?
Here how does the author make this difference clear to the reader refers to what
technique that the author uses to make this difference clear to the reader.
The last question, six, “It is said that the story is a satire, do you agree? Why or
why not?” So, I mean, the six questions are all essay questions, which means it can’t
be answered by one or two sentences. You need to think about them over.
So you can form groups and have a discussion. Or whether you need to read
through the text carefully before you start your discussion. I mean think about it.
Do you have any questions? If you are still not clear about how to answer the
question, you can just ask.
(The students discuss in groups and the teacher walks from group to group to talk
to the students. 25 minutes)
T: OK, have you all finished the discussion? Are you ready to answer the ques-
tions? Yes or no?
Ss: No.
T: (wait a while) Ok, so can we answer the questions? Just tell me, are you ready?
Ss: Yes.
T: Ok, and the rest of you? Are you ready?
Actually we can still discuss together when we are trying to answer these ques-
tions. And, I mean, after each group answers these questions, and the other group, if
you have different answers, if you want to argue, you can also talk.
Ok, the first question, what does the title The Chaser refer to? Maybe our first…I
mean, the immediate reaction, means chaser refers to a person, “er” usually refers
to a person. Ok, let’s see. Ok, that group, you may come up here, so we can hear
better.
S: First of all, I think the chaser refers to Alan Austen, who is deeply in love with
the beautiful girl. And the second, I think the chaser means the life cleaner because
the chaser in the vocabulary means a milder drink taken after a strong drink of
liquor. So if the love potion is the strong drink of liquor, then I think Alan Austen
needs the life cleaner to clean the love potion.
T: Do you all agree? So, first the chaser refers to Alan Austen.
S: Because he is in love with a beautiful girl.
T: So actually he is in pursuit of a girl. In that sense he is a chaser. While the chaser,
according to our word list, so please read on page 118. A chaser can mean a milder
drink taken after a strong drink of liquor. So it refers to some kind of drink taken
after something else is taken. So here the chaser refers the life cleaner.
Do you agree? Because the old man believes Alan some day will come back
again for this life cleaner, so the chaser was the life cleaner. Do you agree? Can you
think of any other indication of a chaser? No, OK, thank you. Yeah, most probably,
you get the chaser as the life cleaner.
T: Now, the second question “Why do you think the old man told Austen about the
life cleaner before selling the love potion?” Have you all got the answer?
S: I’ll divide the question into two parts. The first part is why do we think the old
man told Austen about the life cleaner? First, we think the things done firstly will
not get an agreeable result. Maybe after the girl take the love potion mentioned in
the chapter, maybe she will change into another different person.
At that time, for myself opinion, I don’t think he will like her as before. As my
partner’s opinion, it’s human’s nature. If you really get sth., that you are eager for a
long time, you won’t consider it as valuable as before. So maybe the young man
wouldn’t like the girl as before.
The second question is why do the old man told him before selling him the love
potion. As the ordinary transcript? 童话故事剧本?
T: Script.
S: As the script of most of the fairy tales or novels. There is the magician gives the
man some medicines that have some very bad results? 副作用?
T: Side effects.
S: Yeah, very bad side effects. But they often tell the man after giving them the
medicine. So I think the old man told him before he sold him the love potion,
because he wants him to pay more attention. We think the true purpose is to get the
5000 dollars.
T: What do you think? Actually, the first part of your answer is not very clear. You
said, it’s natural that people will not cherish sth. or someone after they have got it or
the person.
S: Yes, that’s my partner’s opinion, because they think the chaser means the young
man. And after he get the girl’s love, he won’t be a chaser again.
T: How would he?
S: That’s the reason.
T: Then how does it relate to the love potion and the life cleaner? The question is
why the old man told the young man about the life cleaner before selling the love
potion.
S: She means if the girl takes the love potion, the young man got the love and won’t
cherish her again, and want to get rid of the love. And he’ll need the life cleaner to
get rid of the girl.
T: And before means he really wants the ordinary ones to sell the life cleaner.
S: He wants to pay more attention to the life cleaner.
T: Do you all agree, the rest of you? Yes, it’s because his true purpose is to sell the
life cleaner. Actually, we can infer from the story, that Alan is not the first person to
come to the old man. There have been a lot of young people like Alan who are fer-
vently in love and try to get the love potion. So this old man has already seen a lot
of such young people. He knows from his experience that some day Alan will come
back for sure. That’s why he mentioned this life cleaner first, because he’s sure that
some day Alan will come back for it. This is his real purpose.
T: Actually the third question is quite closely related to the second one. “Why is
the love potion sold so cheap? why the life cleaner so expensive?” How does it
relate to the theme of the story or whether they have any symbolic meaning?
S: The first question is from paragraph 39 to 43. It implies that love is far from
being precious and desirable. It is easy for them to fall in love but it is hard for them
to escape it. He should regret some day he would have to pay a much higher price
to put himself out of it. So for the customers, the life cleaner is more important than
the love potion. And the second answer is to the young man, the love means the love
of a lover. And I think, to anyone it is easy to fall in love, but hard to save it. And we
should cherish it and always pay much attention to it.
T: Do you got it? Do you understand her? Yes, Ok, thank you.
The love potion is sold so cheap because according to the old man in our story,
who says, because young man who need it are seldom rich. So if he does not sell it
so cheap, they will not come back later for something more expensive. So this is a
sales technique.
Or if you relate it to the theme. Any of you have something to say?
Liang has already given us a quite good version. Why one is so cheap while the
other so expensive if relate to the theme?
Here the young man buys the love potion, it means at the same time he gets the
love from the girl. So in a sense, he is trying to buy love. If love can be bought, it’s
cheap. Love got in this way is not so dear. It is cheap, because it can be bought. It’s
not priceless. Why life cleaner so expensive? Because you have the strong idea that
in the future you have to pay a heavy price. That’s why life cleaner is so expensive.
It is related to the theme, its symbolic meaning.
Four, “what kind of love can love potion give a person?”
S: For question 4, if the girl takes this love potion, she would regard the man as the
whole thing of her life. She will care of this man every time, every hour, even every
minute. She may be jealousy of other women to have some relation with her boy-
friend. And even if the man have some affairs with other women, she will be hurted,
but she will forgive him. They will not be divorce. That’s the effect brought by this
love potion. We think this kind of love may be not look rightly like love. You know
love may need more space and maybe trust. And they get too close, the girl care too
much of the man. He would finally be bored of this girl. And they would finally get
departed.
T: So what a coincidence that this question is got by the three young men. Because
in our story Alan is also a young man, and I’m curious about their comment on this
kind of love. So you don’t like this kind of love, this kind of girl.
OK, actually they have made quite thorough summary. However, I don’t think
you have included everything. It’s already time, we’ll continue this question to fin-
ish them next time. And the rest of you please will you think about it? Would you
like a love potion?
Please read the part from paragraph 17 to 37, and next time I will ask some of
you “do you have anything to add about the effect of the love potion”.
Teacher F (TEM8 Preparation Course, Grade 4)
T: So everybody, please… Well, we just start our lecture right now. And would you
please keep quiet? And I hope that the students sitting in the back row, would you
please move forward a little bit, so that, yeah, so that I can see your faces. And last
time, we said from this month we are going to say a little bit about… we’ll just have
lectures on linguistics and so-called general knowledge. So general knowledge will
also be divided into three parts. First of all, we’ll just address the issue of linguistics.
Afterwards, we need to… we also need to say something about, let me see, the cul-
tural part… I mean the general knowledge of the English-speaking countries.
Afterwards, we also need to learn English literature, but of course, we will not
review all of the concepts in literature, but at least we need to know some of the
important concepts and notions. And today, we should say, I’ll first address linguis-
tics. Of course you can take notes, and if you really don’t want to take any notes, it
doesn’t matter. I’ll just upload my ppt. I have already uploaded the key to error cor-
rection and proofreading, and I wonder if anybody checked the key. Did anybody
check that?
Ss: No.
T: No? I didn’t change my mail box, so we will still use the public mail-box we
used last semester. Do you still remember that account and password?
Ss: (Shaking heads.)
T: Ah, nobody remembers that. The account and the password, I hope that you can
check the notes for last semester. And I have already uploaded the keys to that
proofreading part and please check it. And today we will start with the linguistics
part. And tomorrow we will just shift back to that classroom, but that classroom is
not equipped with mass media, so tomorrow we will just say, go on with our work
sheets. And somebody still didn’t get the work sheet, I hope that you can contact
your monitor. Do you have for some extra copies, XXX (the name of a student) ?
Have you got any extra copies? Oh yeah?
S: No.
T: No? So, see if I can get you one.
S: I’ll see to it.
T: And…OK, fine.
And you’d better turn to your work sheet B; in work sheets B you will find three
parts. The first part is listening comprehension. So Listening Comprehension will
be postponed to November.
So this month we will be focused on the part of general knowledge and translation.
So first of all, tomorrow… we will check two translations, and please do practice
test 1 and 2. OK? In your work sheet A. In your work sheet A, you can see there
are eight tests, altogether eight tests. So I hope that you can finish test 1 and test 2.
You will just leave Listening Comprehension part, and please fulfill the second part
general knowledge and also the translation. But make Chinese-English translation,
Practice 1 and Chinese-English translation. 《文化不限于语言那么简单》, this
one. And Practice 3, so tomorrow I will just check these two practice tests,
OK. Practice 1 and Practice 3. Are you clear? Practice 1 and Practice 3. I mean
especially the Chinese-English translation part. I need to check it. So I hope that you
can fulfill that two translations today.
Now we will just go on with our lecture.
It’s a pretty long PPT, a pretty long file… Of course, I don’t think I have time to,
you know, elaborate on each slide, because this file actually covers all the contents,
all the contents in linguistics. And these were the reference books and I guess you
are familiar with these two books, right? 何兆熊’s book and 胡壮麟’s book.
Actually… you did linguistics… Ah sorry, sorry, well…(The teacher forgot to turn
on the head projector.)
Well, which one is the textbook for your linguistics course? Which one is your
textbook? 《新编简明英语语言学教程》, isn’t it? Ah, so that. That one is your
textbook.
For the general knowledge, actually I scanned some of the test items from 2010
to 2012, and some of the items are really difficult, so I hope that if… Of course, if
time permits, you will still go over all of the basic concepts in the realm of linguis-
tics. And here today I’ll go through the basic concepts in the sub branches, such as
semantics and pragmatics and well and… for TEM 8. Of course we still need to be
very patient. It takes two or three minutes.
Thank God, it works. Pay attention to the reference books, and here, three reference
books, but our focus is still this one, 《简明语言学教程》. And it should be
your textbook right? It is your course textbook. So we will go over some basic
concepts.
And first of all, what is language, because some of the test items also address the
concept of language. So what are the basic concepts of language? Do you know
anything about it? What are basic concepts and what are the features of language?
Ss: (murmuring…)
T: Features of language?
Ss: (Shaking heads.)
T: No idea? You’ve heard about it?
So language can be generally defined as a system of arbitrary vocal symbols used
for human communication. Here some of the key words, so which one is the first key
word? Arbitrary. And please remember all these words. Sometimes they will test
you what are the basic features of language. So arbitrary is the first feature. What is
the meaning of arbitrary?
Ss: 任意的 (arbitrary).
T: And then vocal symbols, this is easy to comprehend. So usually language is
used orally, right? For instance, I am now delivering a lecture for the information
conveyed orally. Let us say the vocal system, right? It is the oral system. The vocal
system for human communication. So language is for communication. This is the
reason why now we always say conduct… a lot of educators just promoted the…
communicative approach of language teaching, right? So language is for the purpose
of communication. Language is a system. It is systematic. It is rule-governed.
This is the reason why we need to learn grammar. And their elements should be
arranged according to certain rules. It cannot be combined at will. So are you famil-
iar with these concepts? Are you familiar with them? Arbitrary system is arranged
according to certain rules like they are also some of the distinctive features. Some
differences between Chinese language and English language. And English language
is considered more systematic especially in syntax. I mean the sentence patterns
should be arranged more systematically. But Chinese is different, actually espe-
cially in the syntactical level. The Chinese language is more flexible. So usually you
would see a lot of Chinese sentences with more than one maybe more than two
predicates. But in English language is really not permitted. If there are two predi-
cates, you need to use compound sentences. You need to use connectives or other
system. So they are different.
Language should be a system. It is systematic.
And language is arbitrary. So arbitrary… like this one, is its definition. So no
intrinsic connection between the word and the thing it denotes, e.g. “pen” by any
other name is the thing we use to write with. So you can use it as a pen and in
Chinese it’s 钢笔. So we can use the different words to represent one thing.
What does arbitrary originally mean? Random, arbitrary. So it is an arbitrary system to say no
intrinsic, no innate connection between the noted objects and also the reference. You
can use different types of references. OK. Now you know that. So that means no
basic… no relationship between the words and the referent and to the objects. It is
denoted. No intrinsic relationship… actually no relationship by nature between the
objects and the word. And are you familiar with that semantic triangle? That is the
“semantic triangle” (语义三角). As for the semantic triangle, in it we have… So this is a…
It is a very important notion in semantics. You call that semantic triangle. Because
whenever we refer to something, it should be the object. E.g. the chalk. It is the
object. And then I used the word chalk. And that is a reference. That is a word or
phrase I use in that context.
And then the first thing that pops up into your mind should be the reference. The
meaning of the word. Right? I refer to the object. That is the chalk. But when I utter
the word chalk, the first thing pops up in your mind is the meaning of chalk. Ah, 粉笔 (chalk).
Of course your imagination is the picture of that chalk and piece of chalk. So
those are the basic concepts that form semantic triangle. Semantic triangle goes the
notions such as the reference, the reference is actually the thinking. The meaning of
the word in your mind. And then the word. So there’s no direct connection between
the word and the objects. So the first thing that appears in your mind is definitely the
meaning of the word, your thought. And then afterwards, you would associate the
meaning of the word and the object. So the object is called referent. And pay atten-
tion to these basic notions.
So Language is symbolic in nature, so words are always associated with objects,
I’ve already talked about it. Actions ideas by convention. “A rose by any other name
would smell as sweet” so you can call it rose or you can give it a different name. It
all depends. Right? You can call it 玫瑰花, or you can call it in some other languages.
I mean to use a different word, but there must be some association between
the word and the object. So that is what it refers to. So it should be symbolic.
And it is primarily vocal. So the primary medium is sounds for all languages;
writing system came much later than spoken form. And we all know that in Chinese
history, can be date back to 3300BC. Why? Because we have a long history of writ-
ten records. And you know the first records is called oracles. We carved something
on those turtle shells and bones. So written language can be dated back to 3300BC. So
this is why people, those archeologists, say China has a history of more than 3000
years. Well, of course, there will be more evidence to show that China may have a
longer history of written language. But anyway, vocal system usually comes first. It
is prior to the written system. So language is primarily vocal.
And language is human-specific. So it is different from the human communica-
tion systems than other forms of life possess, e.g. bird songs, bee dance, animal
cries. So it designs and defines human language. So these are the keywords. And I
hope that you can memorize them…Ah I don’t mean that you memorize them by
rote, but at least be familiar with those terms after the lecture… What are the
ARBITRARINESS and PRODUCTIVITY, CREATIVITY. So that we can learn
English language, because you always…say learn something by heart, and always
need to create sentences. Also creativity is also one of the basic features.
And Duality, so what is the meaning of duality? So let’s have a look. I’ll not go
on with this one. So productivity. So human languages are always productive and
creative. So here are two examples… A gibbon call system… A gibbon? Gibbons
often call, don’t they? There is a poem, some poem, with something… something…
about gibbons crying… something… They call… So what is this call like? So it is a fixed repertoire
which is rapidly exhausted. So this one is exhaustible. It is a fixed system. It is a
closed system. But it is not an open system. But human language is an open system.
So that makes a great difference. And bee dance is also limited. It is also limited in
productivity. Only human language is productive and creative. But of course you
cannot be too creative. Especially in the foreign language learners you must obey
grammatical rules and you also need to obey the morphological rules because some
times Chinese students, you would like to coin a new word. But this is really not
advisable because it might result in some deduction of your score in TEM8 writing.
So I hope that you will avoid the coinage of new words.
Next is Duality. So double articulation. So what is duality? Duality refers to the
lower level and higher level of the linguistic system, the language system. So the
lower level, the sounds… when you listen to a foreign language, it might not mean
anything to you. So the sounds, the basic levels of language. And then high level.
High level means larger units of meaning. So high level is meaning and of course
we need to know the meaning of those words, we need to know. You first listen to
those syllables and then you address the different syllables to different word units,
then meaning. Right? And then, afterwards, the larger units of meaning would make
sense to you. Otherwise, language doesn’t make sense to you. So language is also
the system of duality. It is a communication system with duality. It is considered
more flexible. Yeah, far greater number of messages can be sent. A small number of
sounds can be grouped and regrouped in a large number of units of meaning, and the
units of meaning can be arranged and rearranged into an infinite number of sen-
tences. So that is duality. Did you get it?
Last semester we also talked about the important notions and corpus linguists
and I asked you to remember the collocations. And usually those corpus linguists,
they emphasize larger units of words and phrases. Before we always use words or
isolated words. And Chinese students also memorize the isolated words. And after-
wards the corpus linguists, they turn the larger units of words, the word items. They
would like to ask students to not only memorize, but at least you should be familiar
with the collocations and items, and yeah, remember them. That would be really
useful in your writing, especially when I am grading on your writings, I really found
some students, well, they witness their great improvement especially in terms of
collocation. So last semester of learning is still very effective. And some student
also checked, I mean, you also referred to them to the corpus and then find those
useful collocations.
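But grammar is still a big issue. A lot of students still make very simple grammatical
mistakes. E.g. the subject-predicate concord… That is, subject-verb agreement. We
have gone over it many times when doing error correction. With subject-verb agreement,
the object, the possessive pronouns and so on all have to change accordingly! But
very often this “agreement” is still done poorly… This grammatical error still shows
up frequently. When TEM8 papers are marked, the Chinese teachers still pay a great
deal of attention to your grammar. If your grammar is good, your score will generally
be higher. So try to avoid these major errors. Some students also make fairly minor
mistakes, which surely means you did not check at the end. Ah, you must do a check
in the last 5 minutes. However tight the time is, even if your Conclusion is a bit
shorter, the 5-minute check is a must. Mm, because you cannot get through a draft
without making some grammatical mistakes; check it once and it will be relatively
better, and that way your score will be higher too. We still place considerable
emphasis on grammar. Ah of course content is also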
a major issue but to Chinese teachers I don’t think that content makes any sense. Of
course they will see if you delivered very creative ideas, but if your composition is
full of mistakes, I don’t think they will give you a good score. So remember that…
ah… grammar still goes first.
And then, displacement. This might be a new word to you. So what is displace-
ment? What does displacement mean? Language can be used to refer to things,
which are not present: real or maybe it also can refer to the imagined matters in
the… Here are some examples.
A gibbon never utters a call about something he ate last year. No, because a gib-
bon’s call system… of course, is rather limited. So to say exhaustible… right? It’s
limited system, and they will not refer to a call to…ah… something they ate… ah
last year. But language is different. We can refer to something we will do in future.
We can also move back to our history, to use the simple past tense to refer to some-
thing we did last year, or maybe… ah… in the previous time. So it is called dis-
placement. Yeah!
There is something special about the bee dance. And bee dance is something
similar to gibbon calls. I will not repeat it. But anyway it shows that human lan-
guage is more creative. We can refer to something in our imagination, like dragon.
Dragon is never a real animal, but we also use dragon as our cultural symbol. You
see… ah, it is also an example of it. It is an example of displacement.
And Cultural transmission. This is an easy term to understand. Language is culturally
transmitted. Chinese speaker and English speaker… they are not mutually
intelligible, why? Because language carries culture. And when we talk about translation
we call… ah… 翻译墙. What is 翻译墙? 翻译墙 just means if this translator
is not familiar with two language systems, and then they will mechanically translate
the target language, translate the foreign language into the native language. Maybe
that translation doesn’t conform to the grammatical rules of the native language. So
that is called 翻译墙. The mechanical translation. So you see that translation… I
mean language also carries the cultural element, right? For example, you hit two
birds with one stone. So we can say “一石双鸟” (one stone, two birds), but “一石双鸟”
does not really exist in our Chinese, right? We do not really say “一石双鸟”; that is
actually English. So it carries a trace of translation. But if we translate it as
“一箭双雕” (one arrow, two eagles), you immediately feel it is very idiomatic,
right? Because Chinese people are
more familiar with this “一箭双雕”. So language is the way to translate culture.
Ah OK, so much for these… Ah, let’s just review those key words of the basic
features of language. Ah those five words, what are the five words? What is the first
one? The one that starts with A…
Ss: Arbitrary.
T: Ah… Arbitrariness. And then? The human language… Do you think human
language somewhat exhaustible? Can you exhaust linguistics system? So human
language is creative. It is productive or creative. And? Human language is what?
Human language is the system of duality. Right? It is a system of Duality. And what
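is the fifth one? Cultural Transmission. Yeah! And displacement. Let us look at what
we covered earlier. So at the very least, I am not saying… that you have to recite
all of it, but at least you need to be familiar with it. Sometimes, I think one year
they tested this Arbitrariness, if I remember correctly… Anyway, that is to say…
they will, they will… you need to be familiar with those terms, so that you can
recognize those terms at a glance. They will not test very abstruse things; they will
not test you on very abstruse concepts in literature or linguistics. But why is there
this general knowledge section? Because you are English majors, English majors,
you are supposed to have these basic knowledge.
Ah. Linguistics, Culture and Literature. So it should be a basic requirement! A basic
requirement for English majors. It was not there before; there was no general
knowledge before, no humanities knowledge section… Although it is only 10 points,
think about it: if you lose 5 of these 10 points, it is very hard to win those 5 points
back from writing or reading. Writing is basically fixed too; you have all written
again and again, and I know your level, roughly which tier you fall into, I know all
that. But if you say that all at once, ah, from a… that is, your language still has
many errors, and raising it all at once to no errors at all is very hard. Right? But
something like general knowledge is just a matter of memorizing, so I think you
should basically get 7 or 8 points at least, ah. Getting 7 or 8 points in proofreading
is basically very hard. Right? When we did proofreading last month, basically, ah,
half were wrong. Or, at best, 6 or 7 correct, which is already quite good. Next time
we will talk about lecture comprehension, ah, and it is the same, not easy either.
So I hope that with general knowledge you score whatever points you can; this is
relatively easier. And it can be done in 5 minutes.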
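Ah, so these were called the design features. Ah, that is, structural ones. I think
one year the test question was something like: Which of the followings is not the
design feature? is not the designed feature of language. I think one year it was like
that. We will also go through them in a moment, the past years’ test questions. OK.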
So these were the five basic features of… so called design features. Arbitrariness,
productivity, duality, displacement and cultural transmission.
Well let’s just move to something new.
What is linguistics?
---- So linguistics is the scientific study of language.
---- A person who studies linguistics is known as a linguist.
So linguistics are usually considered very boring. And sometimes say… a little
bit difficult. But linguistics is a scientific study of language. So that is linguistics. It
is scientific. So some people consider it… very boring. Well again, I hope that you
can still choose topics in linguistics. I still believe linguistics is not really hard.
Actually a lot of students thought to choose literature, but they are not familiar with
that manuscript of that author. Maybe they were to write something about one
author, but actually they didn’t read the… I mean the books written by the author.
So that kind of graduation thesis is really of not good quality. I still promote
linguistics.
And the scope or major branches of linguistics. So these were the sub branches
and phonetics. Of course language is primarily vocal system. Right? It is primarily
vocal. So phonetics is a core of linguistics. But it’s rather pity that fewer and fewer
linguists would like to study phonetics. So phonetics, phonology, and… are very
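hard. And… Morphology. Morphology is what we call 词法. When we talked about
corpora, much of the time the emphasis was actually on a kind of combination between
words. But it perhaps has not yet reached syntax, has not reached the level of syntax;
it lies between Morphology and Syntax, but what it emphasizes more is the semantic
level. For example, semantic prosody and the like, which you may all have forgotten
already… But, that is to say, collocation is still very important, at least in your
writing, right? We emphasized it at one point, ah, for example, how to say
“提高竞争力” (enhance competitiveness). For “提高竞争力”, many students would still
use improve one’s competition, which is obviously not idiomatic, so idiomatic uses
“we need to sharpen our competitive edge”, right? So… that is called idiomatic collocation.
Semantics. OK. Actually there’s also a part called pragmatics. Here pragmatics
has probably been put inside semantics, which I do not think is particularly good.
Actually the circle for pragmatics should be added outside; it should be separated
from semantics, and the outermost one is
social linguistics. Psycholinguistics, So… and other branches of linguistics…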
So, theoretical linguistics. Theoretical linguistics include phonetics, phonology,
morphology, syntax, semantics, and pragmatics. Right?
Use of Linguistics. Applied linguistics. Actually language teaching, language
education is part of applied linguistics so also a sub branch of applied linguistics.
And some students would apply for TESOL. Actually TESOL is also applied lin-
guistics. Ah TESOL is also a sub branch of applied linguistics. And sociolinguistics,
psycholinguistics. Yeah… I… will not repeat them but this is just the scope, I mean
major branches of linguistics. And theoretical linguistics… So phonetics, which is
phonetics. What is phonetics called in Chinese? 语音学, right? It is a study on speech sounds. So
it is also divided into articulatory phonetics, acoustic phonetics, auditory phonetics.
You do not need to memorize this; normally they will not test in that much detail,
ah, though sometimes they may, may, may ask you to tell them apart.
Can you distinguish phonetics and phonology? So can you distinguish
phonetics and phonology?
And the use of linguistics, applied linguistics… The narrow sense of applied
linguistics is language teaching, but the broader sense is not confined to language
teaching. For example, now a lot of linguists would like to apply the theories of
linguistics to analysis of utterance, analysis of text. And it is also called applied
linguistics. So some of the applied linguists would like to use corpus tools and cor-
pus theories to analyze for instance the press conference. So this is definitely… an
example of applied linguistics. So applied linguistics in narrow sense refer to lan-
guage teaching, but its broader sense is never confined to linguistics teaching.
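Remember that. So applied linguistics actually goes beyond language teaching;
language teaching can only be called one branch of it. Because applied linguistics
now has a very wide range of application. So nowadays, things like, ah, discourse
analysis, and then multimodal discourse analysis… ah, and so on,
are actually applied linguistics too.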
Social linguistics, so in the field of social linguistics, of course social factors
would be… yeah… considered. And we talk a lot about pidgin, Creole, in social
linguistics. So…. Ah… Socio linguistics, social factors will also be considered.
And Psycholinguistics. So linguistic behavior and psychological process. Like
some experts believe SLA (Second Language Acquisition) is a sub branch of
psycholinguistics.
And Stylistics. Stylistics is 文体学. You may have some similar courses too, right?
Stylistics is also considered quite popular now. So stylistics…. Some scholars would like
to… They use linguistics approaches. They employ linguistics approaches to ana-
lyze literature. So this is also very cutting-edge field of research.
Some other applications, huh? Computational linguistics like machine translation.
Many people are now researching machine translation as well, and that also belongs
to computational linguistics. Ah, I will not go into this.
So descriptive and prescriptive. This one is a bit difficult, ah, a bit difficult,
but… especially when we talk about pragmatics, we will talk about descriptive and
prescriptive. Take grammar, for example: do you think grammar is descriptive or
prescriptive? Grammar, hm.
徐闻, please. Grammar is descriptive or prescriptive? Just base on your intuition.
Based on your intuition, do you think grammar is something that is fixed or is it
prescriptive? Or descriptive. Prescriptive! Prescriptive, 规定性的 (normative). If something is
prescriptive, that means you can still distinguish the right and the wrong. Right? So
is grammar prescriptive or descriptive? Prescriptive! Yes, that is it. So grammar is
prescriptive, but when we talk about pragmatic rules, for example, so called conver-
sation… ah … what is it? Cooperative principles. Ah, the pragmatics-principles, do
you think those pragmatics rules are descriptive or prescriptive? Pragmatics rules,
Ah, XXX (name of a student), please. Do you think that pragmatics rules, for example,
the cooperative principles? The four cooperative principles? Ah the Maxim
principle, do you think these principles are prescriptive or descriptive? Can you lay
down the rules? Can you lay down the rules? Can you lay down the rules that will
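define a language? So, in terms of pragmatic rules, they are descriptive or prescriptive?
Things in pragmatics, right, are said to be descriptive. We cannot call them
prescriptive. Ah, this goes a little deep. With prescriptive, you can tell right from
wrong. We say grammar is what is called prescriptive; it is normative. You know that
if you say I is, that is definitely wrong, right, definitely wrong, so that is
prescriptive. But when we come to pragmatics, ah, pragmatics, and the cooperative
principle, we can only say these are descriptive. You can never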
exhaust, you can suppose that you employ the Grice’s theory that… ah… so called
cooperative principle as framework or research. You can always find those extra
examples that violate all those principles. So you can never exhaust your examples
in your framework. So the framework can never cover a language. So, it is usually
descriptive. We describe the language phenomena and finally draw some conclu-
sions. And those conclusions are so called principles. But those principle can never
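exhaust concrete examples. So the biggest difference between descriptive and
prescriptive is that one is normative: it can state what is right and what is wrong.
But things in pragmatics can only be called descriptive. If I say violate the
cooperative principle, ah, that is also possible. Ah, for example, Chinese people all
speak in a roundabout way, and speaking in a roundabout way definitely does violate the cooperative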
principle. If we conform to cooperative principle, we use, of course, very precise
and very direct language. But no, in some times, in some situations, we will never
use direct language, for the sake of our cultural conventions. Right? So this shows
these examples violate the principles. So those principles are never prescriptive,
they are descriptive.
So Synchronic vs. diachronic.
So Synchronic study is description of a language at some point of time. But
Diachronic study is the… I didn’t remember any students who did diachronic study.
Usually they will just do synchronic study, because diachronic study will take a long
time. So usually diachronic study will take 2 to 3 years. So there is too much time
consuming. We also do… we will not suggest that. But synchronic study is easier.
Ah, Speech and writing.
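There is not much to say about this one, ah. But this one you must remember, it is
tested often: Langue vs parole. This is Saussure; Saussure is the father of
linguistics, the father of modern linguistics. So you need to remember this person’s
name, ah. You must remember it no matter what. The father of modern linguistics.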
And he used to be called the father, the grandfather of modern linguistics and this
father of modern linguistics and never wrote a book. Actually the book was compiled
by his students. The students compiled all of his lecture notes. And these lecture
notes are compiled into book. So he didn’t write a book, but at least he is the
most important linguists. One of the most important linguists, ah.
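Langue refers to the abstract linguistic system shared by all members of the
speech community. So it refers to the form of language, the form of language, ah,
Langue, the abstract linguistic system, ah. So we divide it into 语言 (langue) and
言语 (parole); some people translate them as 语言 and 言语. So the difference
between the so-called 语言 and 言语… 语言 refers to an
abstract system, abstract linguistic system, but parole is different. Parole is the
actualization, the realization of that linguistics system. So it is divided into 语言
and 言语. You must remember this, ah, it will definitely be tested, ah. Anyway, and
also you must definitely remember this person’s name. And… OK.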
So Saussure takes a sociological view of language and his notion of langue is a
matter of social conventions. So he believes we must obey those social conventions.
And most of the social conventions is somewhat defined by linguistics… ah… lin-
guistics system. And to them parole is different. Parole is just the realization, the
actual use of language.
This one is also important. It might be tested. Because Chomsky. I’m sure that
you all know Chomsky. Ah, that syntax, where you are asked to analyze, things like VP, XP,
VP… NP… So he defined, he made the difference between competence and perfor-
mance. So what is competence and what is performance? They appeared in pairs,
like langue and parole appeared in pairs. And then competence and performance
also appeared in pairs. Competence is the user’s knowledge of the rules of his lan-
guage. So this… Ah… According to Chomsky, competence is something that is
innate. Right? Ah… People just acquire language they can acquire language, so why
Second Language Acquisition is called acquisition but not learning. Ah… So it is
also the root Chomsky defined this competence as the uses of knowledge of rules
and this knowledge is actually… it is… latent in your mind. And the only thing is to
trigger that part of knowledge. So this competence is innate competence.
And Performance. Performance just means the actual realization of this knowledge
in linguistic communication. So now, when you are using language, either you are
using language vocally or in the written form. That is your actual use of language.
So actual use of language is performance. Usually there won’t be a gap, right? There
wouldn’t be a gap between them, competence and performance. No competence is
somewhat innate, but there might be critical age especially for your native language.
And the critical age. If you grow beyond that critical age, this linguistic knowledge
cannot be triggered any more and of course, it undermines your performance. There
is usually a gap between competence and performance.
So Chomsky, ah, you must remember this person’s name too. He is also, you could
say, well, ah, the greatest modern linguist. And Chomsky looks at language from a
psychological point of view. And to him competence is the property of mind. It is the
property of mind of each individual. So Chomsky then… afterwards… put forward
universal. So language can be universal. The competence is the property that
everybody is equipped with.
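So this… let’s have a look… Phonology, let’s leave this one, let’s just skip it.
It doesn’t seem to have been tested much so far. But we still need to know some of
the more basic things. Phonology and phonetics are still different. So phonology,
here, phonetics.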
The difference between phonology and phonetics, is… phonology actually
focuses on the variation of speech sounds, but phonetics is the wider range. It should
be a much wider range. Then phonology… so phonetics actually treat language as a
system. So that is a difference between phonetics and phonology. So phonetics is a
sub branch of linguistics. It studies the characteristics of the speech sounds and
provides methods for their description, classification and transcription.
So three branches of phonetics. I will skip that. And major Articulatory Variables.
This one, ah, this is also very… This is also rather detailed, but there is no harm
in knowing a bit about it. Because one year, in 2010 or 2011, the mini-lecture
tested it. So what did the mini-lecture test? It tested paralinguistic features.
And these paralinguistic features, actually, its theme is, the associations
between paralinguistic features and their cultural background. At first, the actually
… the speaker just categorized the paralinguistic features. And there’s nasal sounds
and what? Another word… what is… a hoarse voice. So here nasal sounds. You can
see those variables… That is phonetics… can also be used to describe the
paralinguistic features.
So that means you still need to be familiar with these notions. Voicing, like voiced
and voiceless sounds, nasal sounds and then aspiration. Aspirated sounds and
unaspirated sounds.
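Ah, you still need to know this, vowels, consonants and so on. So this is, ah, what
we call paralanguage, ah, ah, you still, still need, ah, metalanguage. We still need
to remember this metalanguage, ah, vowels and consonants. So the essential difference
between the two classes is what? For vowels, is there any obstacle? Is there any
obstacle to the airstream? No obstacle to the airstream. So these are called… these
are called vowels. Consonants are different. We say that when the airstream passes
without obstruction, it is called a vowel; when the airstream passes with
obstruction, what is it called? A consonant. Ah, consonants.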
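Classification of consonants. So manner of articulation and place of articulation.
So we use the manner of articulation and the place of articulation to make some
distinctions. Well, I won’t go into this. Anyway, just get a rough idea of it. What
is called a fricative, what is called a plosive, explosive, right, ah, plosives,
fricatives and so on, this is place of articulation.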
So the classification of vowels. Classification of vowels is easy. Like
Monophthongs. What are monophthongs? Single vowels, they are also called pure vowels
and Diphthongs or gliding vowels, double vowels, diphthongs. [ai] [ɔi] [əu] and so
on, ah, diphthongs. So that is the first category. See, diphthongs and monophthongs.
So close, semi-close, open vowels. This is the diagram of single vowel
classification.
According to the shape of the lips or the degree of lip rounding. So rounded, so
what kind of vowels are called rounded vowels? XXX, what vowels are called
rounded vowels? Can you name some of the rounded vowels? Do you think that [a]
is a rounded vowel? Is it rounded vowel?
Ss: En...
T: [a], it’s not rounded, right? So rounded, like [ɔ:] [ɔ], so these are rounded vowels.
And actually, you need to make some strength, right? Your lips need to make some
strength. So they are called rounded vowels. The lips need to make some effort like
[ɔ:] [u:] [ɔ], so these are called rounded vowels. And others are called unrounded
vowels. Long or short. Long vowels or short vowels.
So Phonology studies the patterning of speech sounds, ah, the ways in which
speech sounds form systems and patterns in human languages. So phonology, the study
of phonemes. And phonology is usually focused on the isolated sound patterns. And one
phoneme could be actualized in different ways, in different words. Right? So, there’s
still some difference between phonology and phonetics. Of course both are concerned
with the same aspect of language, the speech sounds. Both study speech
sounds. But they differ in their approach and focus. Phonetics is of general nature.
It is a wider range. Ah, its meaning is broader. It is interested in all the speech
sounds used in all human languages; it aims to answer questions like: how they are
produced, how they differ from each other, what phonetic features they have, how they
can be classified, etc. But phonology is different. Phonology aims to discover how
speech sounds in language form patterns and how these sounds are used to convey
meaning in linguistic communication.
So this is the reason why I said phonology is… phonology emphasizes the iso-
lated sound patterns, and it also emphasizes the variation of sound patterns or iso-
lated sound items. One phoneme might have different allophones, in different
words.
A phone is a phonetic unit or segment. But phoneme is different. It is a
phonological unit, like [i:] is a phone or phoneme? [i:] [i] [e] [æ]… So these are
called isolated phonemes. They are phonemes, right? But [i:] in different words have
their variations. There are variations of the phoneme [i:] in different words. So
phoneme refers to a phonological unit; it is a unit of distinctive value; an abstract
unit, not a particular sound, like [i:] is an abstract unit, it is not a particular
sound. Because in particular words, it is pronounced differently. Like [i:] in beep
must be different in pee. Right? So actually this [i:], this abstract unit of phoneme
[i:] has got its variation, in different phonological contexts. Do you understand?
A phoneme is a sound unit, and as a sound unit it is an abstract concept, right?
Take the sound [i:], it is an abstract concept, because in different words it is
actually pronounced differently, hm, it is pronounced differently. Take, for example,
[p], [p], right! In polite, you may pronounce it as [p], but in speak, it changes!
Ah, it becomes voiced! Once it is voiced, it becomes [b]. So this is what we say
phonology needs to study: different phonemes change their pronunciation in different
words. There are also some rules here; voicing, for example, is one rule, right,
that is what phonology needs to study. So it is a phonological unit, not a real
sound, not a particular real sound, it is abstract sound. So it is represented. It
has got its variations in different phonetic context.
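Allophones so the phones that can represent a phoneme in different phonetic
environments. That is what I just said: a sound unit, a phoneme, may have different
allophones. Take the sound [i:], or the sound [p]: in speak, it is one allophone,
right! In polite, it is another allophone. So actually the allophone is… you can
think of it as a hyponym of the phoneme; it is actually a concrete form of the
phoneme in different words, a concrete… this… I don’t know whether the test will go
into that much detail, but let’s go through it all anyway.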
Ah, this phonemic contrast, this is what always comes up when you study phonetics,
[p] [b] [t] [d] [k] [g]… so this is the phonemic contrast. Phonemic contrast,
complementary distribution…
And…Minimal pair. Minimal pair is what we used in phonetics because these
days I also teach phonetics, so I would like to review these basic notions. So I hope
that you guys will be familiar with some important concepts. So these are called
minimal pairs, because only the middle vowels are different and then the consonants
are just the same. So this is called minimal pair.
So stress, tone, intonations, they are called Suprasegmental features… stress,
tone, intonations… ah… And then we always have to pay attention to intonation.
This is also called something between the American sound system and British sound
system. And different accents represent the different intonation. And we call that
American intonation as a staircase intonation, because the Americans usually use
very vivid intonations, but British, they do not use vivid intonations. So that also
makes difference between RP (Received Pronunciation) and general American.
Americans are very lazy to move their lips. So you can hardly read their lips, because
the vowels produce in a somewhat backward way… and they seldom move their
lips. So we can hardly read their… American’s lips. So we usually mock at
Americans because they are lazy, right? But there must be variations of intonation.
That is, with American pronunciation, its intonation rises and falls a lot, so we
call it staircase intonation. If you want to understand the difference between
American and British pronunciation, well, you can go and look at it again; their
intonation is always different. So that is all for phonetics.
Then there is Morphology. With morphology, we have some… er… some specific… well…
technical terms, because it has been tested too, morphology has been tested too.
Let’s look at some of its technical terms, which are different. This is something
we should know, Open class word and closed class word. So what kind of words could
be categorized as open class words, hm, XX, please? What kind of words could be
categorized as closed class words? Closed class words… open class words.
S: Open class words is that you can add new words.
T: Such as a… what? Nouns. Right? Nouns, adjectives. They were called the con-
tent words, ah, open class words. But what kind of words are called closed class
words, for example, articles, a, an, the… are these words open class words? Or
closed class words? Are they closed or open?
S: Yes?
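T: Can you add any more? Say I add one more article? I add a “b”, is there one? No,
right? So this is called closed class. Ah, if the test goes into more detail, this
could be tested too, ah, say, what are the following choices are closed class words,
that could be tested too, right? Ah, which of the following, except, then a blank,
is the so called open class words. Right?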
So Morpheme is the minimal unit of meaning… like one morpheme. This… er, you all
have an idea of this, right? One morpheme, right, a boy is a one morpheme word. But
boys… boys one morpheme or two morphemes? Boys, one morpheme or two
morphemes? Two morphemes, right? Ah, so three morphemes and four morphemes,
like gentlemanliness. So they mean something like a gentle… and man… they mean
something, gentleman. It’s a combined word. And -ly, ly represents what? So ly
usually represents the part of speech, it is the adverb. But here gentlemanly is an
adjective. Because an plus ly, it could be an adjective and -ness it is definitely a mark
of noun, right? So Morpheme is the minimal unit of meaning.
Affix, prefix, what does it mean? Prefix, an affix added at the front. And suffix? An
affix added at the end, ah, no problem with this.
Free morpheme & bound morpheme.
Free morpheme is one that may constitute a word in free form… in free form.
But a bound morpheme means those morphemes cannot appear in single words.
Can they appear in single words? Like -ly or -s or -ness can they appear as single
words? Independent words. No, never, ah, they can never appear as single words. So
they usually appear a… what? They usually appear with those free morphemes. So
bound morpheme just refers to one that may appear with at least one other
morpheme. They cannot stand by themselves. Ah, like this -s… ah, -ness… -ed… -ly…
these are all called bound morphemes. Because they cannot appear independently.
So they must appear together with other morphemes.
Derivational morphemes. Usually we say… ah, foolish is derived from fool. So
these are called derivational morphemes.
So Inflectional morphemes are lexical… they have lexical. They’re morphemes
which are for the most part purely grammatical markers, signifying such concepts
as tense, number, case and so on. So actually inflectional morphemes include what?
Ah, include number. Ah -s… ah plural s… or aspect. Right? Or case. Or tense…
-ed… so these are called inflectional morphemes. So, that is all for morphemes.
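Now syntax, syntax, let’s still go through it briefly. At the lowest level, we
started from the most central phonetics, phonology, and then, what is it? What is
the next level up? Ah, what is it called? Hm, what is the study of word structure
called? Morphology. And then, what is above word structure? Sentences. Study of
sentences. The study of sentences is called syntax. So the adjective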
form is syntactic. It is a branch of linguistics to study how words are combined to
form sentences, and rules that govern the formation of sentences. Chomsky says…
well Chomsky of course studies syntax. And he is the first one that initiated XP
analysis, right? So he is more interested in similarities and surface structure, deep
structure. So I guess you are familiar with these terms? Right? And Chomsky is
more interested in similarities. More than their differences. So Chomsky proposes
that grammars of all languages share the same framework. So that is called universal
grammar. And nowadays, with that new notion, because of that new notion of
Context. This context is... So it seems Chomsky’s universal grammar doesn’t have
very large market now. But of course, traditionally in America, lots of linguists still
study the universal grammar, but of course, Chomsky’s own students also betrayed
him, right? They founded a new sub branch, a new branch called Cognitive
Linguistics. But anyway, Chomsky proposes that he is more interested in the univer-
sal features. The similarities between different languages, so he proposes different
grammar, different linguistics share the same grammatical rules of the same gram-
matical framework. And that framework is called universal grammar.
Eh, well, so here are some important notions. Like category. Category refers to a
group of linguistic items which fulfill the same or similar functions in a particular
language such as a sentence, a noun phrase or a verb. The most central categories to
the syntactic study are the word-level categories, for example, part of speech is a
category. The criterion on which the category is determined, meaning, Inflection,
Distribution and the most reliable criterion on which determining a word’s category
is its distribution. Distribution of the place. Of…Yeah… the place of that word.
Coordinative rule. Coordinative sentences. Are you familiar with the coordina-
tive sentences? Coordinative sentences are also called compound sentences. They
are different from the complex sentences. And would you please translate some-
thing into Chinese? What are called complex sentences and what are called com-
pound or coordinate sentences? E.g. and, but… sentences with and or but are called
coordinative sentences or complex sentences. Are they compound sentences? XX,
are they compound sentences?
S: En..
T: They are compound sentences, coordinative sentences. That is, coordinate,
coordinate, this kind of sentence, this form of clause. Coordinative sentences.
Complex sentences are different; they are sentences with main and subordinate
clauses. Ah, e.g. the attributive clauses. Sentences with attributive
clauses are definitely called complex sentences.
So, OK, we will just skip that one. Let’s just… Also very detailed… very detailed
rules in syntax. I hope that you can review them yourselves and phrase and
elements…
And are also called specifier head…
And Complements…specifiers…
Of course they are of semantic functions
Semantically, specifiers make more precise the meaning of the head; syntacti-
cally, they typically mark a phrase boundary. Specifiers can be determiners as in
NP. NP means nominal phrases, VP, verb phrases.
Complements, so… Complements themselves can be a phrase, they provide
information they provide complementary information about entities and locations
whose existence is implied by the meaning of the head, e.g. a story about a senti-
mental girl; There can be no complement, one complement, or more than one com-
plement in a phrase. So a story about a sentimental girl. Of course story is the head,
about the sentimental girl, the complement.
And modifiers, so modifiers could be pre modifiers and post modifiers.
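Deep structure & surface structure, you must remember this, you must remember
this, ah, deep structure, surface structure, because this is often covered in
syntax, and then you are asked to draw diagrams, draw VP, that is, draw out the VP,
NP and so on, right, so as to distinguish the deep structure, surface
structure… ah… sometimes they share the same surface structure, but actually
when you analyze those sentences they had different deep structures. And here: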
John is easy to please.
John is eager to please.
They have the same surface structure. Right? They have the same surface struc-
ture, but actually the meanings are different. The deep structure is different. So
structurally similar sentences might be very different in their meanings, for they
have quite different deep structures. Like these were also the frequent quoted exam-
ples to determine their different structures. Consider the sentence, flying planes can
be dangerous. So flying planes can be dangerous here, flying planes can be… what?
The deep structure, the meaning could be different in different complex, like flying
plane. For instance, in one case, it means you are engaged… you are engaged in this
dangerous activity. So flying planes, here fly… this is definitely a verb. A verb. So
here flying is verb. And plane is the object of the verb. Right? So flying planes
means you are involved in that dangerous activity. But flying planes flying could
also be a pre-modifier that modifies plane, and then flying planes could be an NP. A
non-verb phrase. So flying planes could be dangerous. Because flying planes is an
object that is not stable. And it might collide with some high building, like 911, so
the flying planes… may collide with the high rising buildings…
So you see this sentence actually involves different deep structures.
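And… I won’t go into this; it really is, well, rather dry to talk about. Anyway,
this is… the most important thing in syntax is, ah, surface structure, deep
structure. Anyway, go through it yourselves, go through some of the basic theories
again.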
Now semantics is also often tested, ah, it is also often tested. For example, like
this one, it was tested one year, ah, it was tested one year, so let’s review what
it means. Let me explain what it means, ah, it is called Synonymy, ah, Synonymy.
Synonym, we all know synonym, antonym, right? So what is the meaning of synonym,
antonym? And Homonymy and Hyponymy, would you please translate that into Chinese?
Can you please translate that into Chinese?
Ah, XXX, please, so what are synonyms?
S: Same meaning.
T: Yes, same meaning, ah, same meaning. Like small and tiny, they are synonyms. And
what are antonyms?
S: Opposite meaning.
T: Like black and white, they are antonyms. So the relation of opposite meaning.
Polysemy?
S: En…
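T: That is, one word with several meanings, one word with several meanings.
Polysemy. And… then Homonymy, same sound but different meaning, that also belongs
to Homonymy. Ah, that’s right. And what about Hyponymy? What kind of structure is
Hyponymy? Ah, what kind of relation? Hyponymy, yes, the relation between
superordinate and subordinate terms. So we say, for example, colour is a
superordinate, right? And there can be many hyponyms, including red, yellow, green.
So this is the superordinate and subordinate relation, called Hyponymy. This has
been tested, it has been tested, so you still need to remember it. You still need
to remember what it means. So they are called the major sentence relations, because
they are part of the central part of semantics.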
Synonymy, right? Synonymy refers to the sameness or close similarity of mean-
ing. Words that are close in meaning… they are called synonyms.
Dialectal synonyms are synonyms used in different regional dialects, like
autumn - fall, biscuit – cracker… so they are called dialectal. They are derived
from… this dialectal is derived from the word dialect. I’m sure that you know dia-
lect… Shanghai dialect, other dialects? So that means synonyms used in different
regions. For instance, the British language, the American language, right? Ah
American English, British English… so some words we use in British English and
some words we use in American English, like fall is more frequently used in
American English and gasoline…
And… So because they are synonyms. They are definitely synonyms. That means
synonymous to each other, but they are used in different regions… ah different
words are used in different regions, but actually they mean the same thing.
Ah, Stylistic synonyms, this has been tested, ah, Stylistic synonyms have been
tested. That is, kid, child, offspring. So why do we say this is a Stylistic
difference? Why? Well because they are stylistic. So they differ in terms of what?
They differ in terms of what?
S: En…
T: Hm? Why are they called stylistic synonyms? Ah, XXX please, why are they called
stylistic synonyms?
S: En…
T: They means the same, right! Offspring, or my offspring… my kid, my child,
they means the same, but they differ in terms of their style. So they differ in terms of
their… what? Which one sounds more formal? Which one sounds more formal? Offspring
or kid?
S: Offspring.
T: Yes, offspring, so they differ in terms of formality. So this is called stylistic. Ah
difference.
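So in our next class, in the next lecture, we will finish this, and then cover
part of the culture section. Anyway, we will try to finish the general knowledge
within three lectures. And what should you not forget to do? Your Practice 1, and
also Practice 3; finish both the general knowledge and the translation. I will
check them tomorrow, and tomorrow I will also go over, well, check the answers to
the translation.
So for tomorrow’s class, we will still go to the traditional classroom, OK? So
thank you very much. And please hand in your compositions. OK, thank you for your
attention.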
Appendix III Interviews
Guiding Questions in the Program Administrator’s Interview
We are conducting a study on the impact of TEM on English teaching and learning.
Could you please help us by answering the following questions? Your responses
will be treated in confidence and will not be used for any other purposes. Thank you
very much for your time and cooperation!
1. Do the English majors in your school attend TEM 4/8?
2. Are you familiar with TEM? (test content, method, score allotment, time allot-
ment, marking criteria, etc.) What do you think of the marking criteria of TEM?
(Is it easy to follow? Do you refer to it in TEM preparation?) Do the teachers in
your school participate in TEM marking?
3. How do you evaluate TEM? (Is it difficult? Are the test content and method
appropriate?)
4. How does TEM affect the course design in your school? (What are the English
courses introduced in your school?)
5. Do you have TEM preparation courses in your school? If yes, what is the differ-
ence between TEM preparation classes and non-preparation classes, such as fun-
damental English course? What do you think of the difference? Will it lead to
students’ better performance in TEM? Will it help improve English ability?
6. Do you have any comments and suggestions on TEM and teaching for English
majors?
Guiding Questions in the Teacher’s Interview
1. Have you ever taught TEM related courses? (If no, are you familiar with TEM?)
(Is it easy to follow? Do you refer to it in TEM preparation?)
appropriate?)
4. How does TEM affect your teaching? (teaching content, method, pressure, etc.)
5. What do you think is most important in TEM preparation? What do you think of
TEM preparation?
6. What do you think is the difference between TEM preparation classes and non-
preparation classes, such as fundamental English course? What do you think of
the difference? Will it lead to students’ better performance in TEM? Will it help
improve English ability?
(If the teacher has not taught TEM preparation courses, go to Question 10.)
7. Do you speak Chinese more in TEM preparation classes? What do you think of
this practice?
8. Do you lecture more in TEM preparation classes? Do students practice less?
What do you think of this practice?
9. Do you have students work on TEM items after class? Do you mark their home-
work? Which item do you mark the most frequently?
majors?
Guiding Questions in the Student’s Interview
1. Have you ever taken TEM (4,8)?
(Is it easy to follow? Do you refer to it in TEM preparation?)
appropriate?)
4. How does TEM affect your English learning? (interest, effort, etc.)
5. How do you evaluate your TEM preparation experience?
6. What do you think is the difference between TEM preparation classes and non-
preparation classes, such as fundamental English course? What do you think of
the difference? Will it lead to better performance in TEM? Will it help you
improve your English ability?
7. Does your teacher speak Chinese more in TEM preparation classes? What do
you think of this practice?
8. Does your teacher lecture more in TEM preparation classes? Do students prac-
tice less? What do you think of this practice?
9. Which item type in TEM do you practice the most? (Do you spend less time on
subjective items such as writing? Why?)
majors?
Bibliography
C. Alderson, Foreword. [A], in Washback in Language Testing: Research Context and Methods,
ed. by L. Cheng, Y. Watanabe, A. Curtis (Eds), (Lawrence Erlbaum Associates, Inc, Mahwah,
2004), pp. ix–xii
C. Alderson, L. Hamp-Lyons, TOEFL preparation courses: A study of washback [J]. Lang. Test.
13(3), 280–297 (1996)
C. Alderson, D. Wall, Does washback exist? [J]. Appl. Linguis. 14(2), 116–129 (1993)
R. Al-Mahrooqi et al. (eds.), Revisiting EFL Assessment: Critical Perspectives [C] (Springer,
Cham, 2017)
American Educational Research Association (AERA), American Psychological Association (APA),
National Council on Measurement in Education (NCME), Standards for Educational and
Psychological Testing [Z] (AERA, APA, NCME, Washington, DC, 1985)
L. Bachman, Fundamental Considerations in Language Testing [M] (Oxford University Press,
Oxford, 1990)
L. Bachman, A. Palmer, Language Testing in Practice [M] (Oxford University Press, Oxford,
1996)
L. Bachman, A. Palmer, Language Assessment in Practice [M] (Oxford University Press, Oxford,
2010)
K. Bailey, Working for washback: A review of the washback concept in language testing [J]. Lang.
Test. 13(3), 257–279 (1996)
C. Bogdan, K. Biklen, Qualitative Research in Education: An Introduction to Theory and Methods
[M] (Allyn & Bacon, Boston, 2006)
J. Cai, CET iBT: Significance and washback [J]. Comput. Assist. Foreign Lang. Educ. China 5,
3–8 (2009)
L. Cheng, How does washback influence teaching? Implications for Hong Kong [J]. Lang. Educ.
11, 38–54 (1997)
L. Cheng, Changing Language Teaching through Language Testing: A Washback Study [M]
(Cambridge University Press, Cambridge, 2005)
L. Cheng, Washback, impact and consequences. [A], in Encyclopedia of Language and Education,
Vol. 7: Language Testing and Assessment, ed. by E. Shohamy, N. H. Hornberger (Eds), 2nd
edn., (Springer, New York, 2008), pp. 349–364
L. Cheng, Voices from test takers: Further evidence for language assessment and use [J]. Educ.
Assess. 16, 104–122 (2011)
L. Cheng, Y. Watanabe, A. Curtis (eds.), Washback in Language Testing: Research Context and
Methods [C] (Lawrence Erlbaum Associates, Inc., Mahwah, 2004)
D. Cohen, The coming of age of research on test-taking strategies [J]. Lang. Assess. Q. 3, 307–331
(2006)

J. Cronbach, Course improvements through evaluation [J]. Teach. Coll. Rec. 64, 672–683 (1963)
W. Dai, H. Feng, A retrospect of the tests for foreign language majors [J]. Foreign Lang. World 6,
2–8 (2010)
Z. Dang, Improving the Authenticity of the Listening Sub-tests in TEM [D]. Unpublished PhD
Dissertation. Shanghai International Studies University, 2004
A. Davies, Demands of being professional in language testing [J]. Lang. Test. 14(3), 328–339
(1997)
A. Davies et al., Dictionary of Language Testing [Z] (University of Cambridge Local Examinations
Syndicate, Cambridge, 1999)
N. Denzin, Research Act: Theoretical Introduction to Sociological Methods [M] (McGraw-Hill,
New York, 1978)
I. Ferman, The washback of an EFL national oral matriculation test to teaching and learning [A], in
Washback in Language Testing: Research Context and Methods, ed. by L. Cheng, Y. Watanabe,
A. Curtis (Eds), (Lawrence Erlbaum Associates, Inc, Mahwah, 2004), pp. 191–210
N. Fredericksen, The real test Bias: The influence of testing on teaching and learning [J]. Am.
Psychol. 39(3), 193–202 (1984)
R. Frederiksen, A.A. Collins, Systems approach to educational testing [J]. Educ. Res. 18(9), 27–32
(1989)
M. Frohlich, N. Spada, P. Allen, Differences in the communicative orientation of L2 classrooms
[J]. TESOL Q. 19(1), 27–57 (1985)
R. Gorsuch, Factor Analysis [M] (Lawrence Erlbaum, Hillsdale, 1983)
A. Green, IELTS Washback in Context [M] (Cambridge University Press, Cambridge, 2007)
X. Gu, Positive or Negative—an Empirical Study of CET Washback [M] (Chongqing University
Press, Chongqing, 2007)
X. Gu, Y. Peng, A longitudinal study on college English teachers’ perception of CET and its wash-
back [J]. Foreign Lang. Their Teach. 6, 37–41 (2010)
L. Hamp-Lyons, Ethical test preparation practice: The case of the TOEFL [J]. TESOL Q. 33(2),
329–337 (1998)
L. Hamp-Lyons, Fairness in language testing [A], in Fairness and Validation in Language
Assessment, Studies in Language Testing 9, ed. by A. Kunnan (Ed), (Cambridge University
Press, Cambridge, 2000a), pp. 30–34
L. Hamp-Lyons, Social, professional and individual responsibility in language testing [J]. System
28, 579–591 (2000b)
R. Hawkey, Impact Theory and Practice: Studies of the IELTS Test and Progetto Lingue 2000 [M]
(Cambridge University Press, Cambridge, 2006)
D. Ho, Why do teachers ask the questions they ask? [J]. Reg. Lang. Cent. J. 36(3), 297–310 (2005)
A. Hughes, Introducing a needs-based test of English language proficiency into an English
Medium University in Turkey [A], in Testing English for University Study (ELT Documents
#127), ed. by A. Hughes (Ed), (Modern English Publications in association with the British
Council, London, 1988), pp. 134–146
A. Hughes, Backwash and TOEFL 2000 [Z]. Unpublished manuscript, University of Reading,
1993
A. Hughes, Testing for Language Teachers [M] (Cambridge University Press, Cambridge, 2003)
Y. Jin, Backwash effect of CET-SET on the teaching of EFL in China [J]. Foreign Lang. World 4,
56–61 (2000)
Y. Jin, On the improvement of test validity and test washback—The CET washback study [J].
Foreign Lang. World 6, 65–73 (2006)
Y. Jin, J. Fan, Test for English Majors (TEM) in China [J]. Lang. Test. 28(4), 589–596 (2011)
M. Kane, Current concerns in validity theory [J]. J. Educ. Meas. 38(4), 319–342 (2001)
T. Kellaghan, V. Greaney, Using Examinations to Improve Education: A Study of Fourteen African
Countries [M] (World Bank, Washington, DC, 1992)
Y. Kong, J. Nie, Compound dictation in CET and its backwash effect on language teaching [J].
Foreign Lang. World 2, 51–57 (2002)
A. Kunnan, Fairness and justice for all [A], in Fairness and Validation in Language Assessment,
Studies in Language Testing 9, ed. by A. Kunnan (Ed), (Cambridge University Press,
Cambridge, 2000), pp. 1–14
A. Kunnan, Language assessment from a wider context [A], in Handbook of Research in Second
Language Teaching and Learning, ed. by E. Hinkel (Ed), (Lawrence Erlbaum Associates, Inc.,
Mahwah, 2005), pp. 779–794
H. Latham, On the Action of Examinations Considered as a Means of Selection [M] (Deighton,
Bell and Company, Cambridge, 1877)
X. Li, How powerful can a language test be? The MET in China [J]. J. Multiling. Multicult. Dev.
11(5), 393–404 (1990)
S. Li, X. Chen, On the present and future of TEM from the perspective of language testing theories
[J]. Foreign Lang. World 3, 15–20 (2012)
X. Lian, On the testing device and the modifications of the scoring system for dictation in CEPTEM
[J]. Foreign Lang. Educ. 2, 75–79 (1991)
W. Lu, SPSS for Windows Statistical Analysis [M] (Publishing House of Electronics Industry,
Beijing, 2006)
E. Macaro, Codeswitching in the L2 classroom: A communication and learning strategy [A], in
Non-native Language Teachers: Perceptions, Challenges and Contributions to the Profession,
ed. by E. Llurda (Ed), (Springer, New York, 2005), pp. 63–84
F. Madaus, The influence of testing on the curriculum [A], in Critical Issues in Curriculum, ed. by
L. N. Tanner (Ed), (Chicago University Press, Chicago, 1988), pp. 83–121
F. McNamara, Measuring Second Language Performance [M] (Longman, Harlow, 1996)
S. Messick, Validity [A], in Educational Measurement, ed. by R. Linn (Ed), 3rd edn., (ACE and
Macmillan, New York, 1989), pp. 13–103
S. Messick, Validity and washback in language testing [J]. Lang. Test. 13, 241–256 (1996)
K. Morrow, The evaluation of tests of communicative performance [A], in Innovations in Language
Testing: Proceedings of the IUS/NFER Conference, ed. by M. Portal (Ed), (NFER/Nelson,
London, 1986), pp. 1–13
A. Moss, J. Girard, C. Haniford, Validity in educational assessment [J]. Rev. Res. Educ. 30, 109–
162 (2006)
NACFLT, Syllabus for University English Language Teaching [Z] (Shanghai Foreign Language
Education Press/Foreign Language Teaching and Research Press, Shanghai/Beijing, 2000)
NACFLT, Syllabus for TEM4 [Z] (Shanghai Foreign Language Education Press, Shanghai, 2004a)
NACFLT, Syllabus for TEM8 [Z] (Shanghai Foreign Language Education Press, Shanghai, 2004b)
NACFLT, Syllabus for TEM4-Oral [Z] (Shanghai Foreign Language Education Press, Shanghai,
2005)
NACFLT, Syllabus for TEM8-Oral [Z] (Shanghai Foreign Language Education Press, Shanghai,
2008)
Q. Patton, Qualitative Evaluation and Research Methods, 2nd edn. [M]. (Sage, London, 1990)
J. Popham, Two-plus decades of educational objectives [J]. Int. J. Educ. Res. 11(1), 31–41 (1987)
L. Qi, The intended washback effect of the National Matriculation English Test in China: Intentions
and reality [M] (Foreign Language Teaching and Research Press, Beijing, 2004)
L. Qi, A review of washback studies [J]. Foreign Lang. Learn. Theory Pract. 4, 23–28 (2011)
X. Qin, Fully exploiting the positive washback effects of TEM4/8 to arrest the decline in the
quality of teaching and learning [J]. Foreign Lang. World 3, 10–14 (2012)
B. Resnick, P. Resnick, Assessing the thinking curriculum: New tools for educational reform [A],
in Changing Assessments: Alternative Views of Aptitude, Achievement and Instruction, ed. by
B. G. Gifford, M. C. O’Connor (Eds), (Kluwer Academic Publishers, Boston, 1992), pp. 37–75
X. Shi, A longitudinal backwash study of the new CET4/6 listening tests [J]. Foreign Lang. World
3, 80–86 (2010)
C. Shih, Perceptions of the General English Proficiency Test and its Washback: A Case Study at two
Taiwan Technological Institutes [M] (LAP LAMBERT Academic Publishing, Saarbrücken,
2011)
E. Shohamy, Fairness in language testing [A], in Fairness and Validation in Language Assessment,
Studies in Language Testing 9, ed. by A. Kunnan (Ed), (Cambridge University Press,
Cambridge, 2000), pp. 15–19
E. Shohamy, The Power of Tests: A Critical Perspective of the Uses of Language Tests [M]
(Longman, Harlow, 2001)
E. Shohamy, S. Donitsa-Schmidt, I. Ferman, Test impact revisited: Washback effect over time [J].
Lang. Test. 13(3), 298–317 (1996)
L. Smith, Put to the test: The effects of external testing on teachers [J]. Educ. Res. 20(5), 8–11
(1991)
H. Smith, S. Higgins, Opening classroom interaction: The importance of feedback [J]. Camb.
J. Educ. 36(4), 485–502 (2006)
M. Spaan, Enhancing fairness through a social contract [A], in Fairness and Validation in Language
Assessment, Studies in Language Testing 9, ed. by A. Kunnan (Ed), (Cambridge University
Press, Cambridge, 2000), pp. 35–38
B. Spolsky, The examination-classroom backwash cycle: some historical cases [A], in Bringing
About Change in Language Education, ed. by D. Nunan, R. Berry, V. Berry, (The University of
Hong Kong, Department of Curriculum Studies, Hong Kong, 1996), pp. 55–66
SPSS Inc, SPSS Statistics for Windows, Version 17.0 [Z] (SPSS Inc., Chicago, 2008)
Y. Sun, Context, Construct, and Consequences: Washback of the College English Test in China [M]
(LAP LAMBERT Academic Publishing, Saarbrücken, 2016)
Y. Tang, J. Peng, The washback effect of CET spoken English test upon college English learners
The TEM Test Centre, The Test for English Majors (TEM) validation study [M] (Shanghai Foreign
Language Education Press, Shanghai, 1997)
R. Ullmann, E. Geva, The Target Language Observation Scheme (TALOS): Handbook [M]
(Modern Language Center, Ontario Institute for Studies in Education, Mimeo, 1982)
L. Van Lier, Reeling, writhing, drawling, stretching and fainting in coils: Oral proficiency
interviews as conversation [J]. TESOL Q. 23, 489–508 (1989)
D. Wall, Introducing new tests into traditional systems: Insights from general education and from
innovation theory [J]. Lang. Test. 13(3), 334–354 (1996)
D. Wall, The impact of high-stakes testing on teaching and learning: can this be predicted or
controlled? [J]. System 28(4), 499–509 (2000)
D. Wall, The Impact of High-Stakes Examinations on Classroom Teaching: A Case Study Using
Insights from Testing and Innovation Theory [M] (Cambridge University Press, Cambridge,
2005)
D. Wall, J. Alderson, Examining washback: The Sri Lankan impact study [J]. Lang. Test. 10(1),
41–69 (1993)
Y. Watanabe, Does grammar-translation come from the entrance examination? Preliminary
findings from classroom-based research [J]. Lang. Test. 13(3), 318–333 (1996)
Y. Watanabe, Constructing a classroom observation scheme for the test impact research [J]. Sophia
Linguistica 41, 297–313 (1997)
Y. Watanabe, Methodology in washback studies [A], in Washback in Language Testing: Research
Context and Methods 9, ed. by L. Cheng, Y. Watanabe, A. Curtis (Eds), (Lawrence Erlbaum
Associates, Inc, Mahwah, 2004), pp. 9–36
J. Weir, Communicative Language Testing [M] (Prentice Hall, New York, 1990)
Q. Wen, K. Wu, W. Wang, Y. Ren, A study of the feasibility of the National Oral Test for English
majors-band 8 [J]. Foreign Lang. World 4, 67–73 (2002)
Q. Wen, W. Wang, D. Zhou, Y. Wang, A study on the implementation of National Oral Test for
English majors-band 8 [J]. Foreign Lang. World 5, 53–58 (2005)
H. Wesdorp, Backwash effects of language-testing in primary and secondary education [J]. J. Appl.
Lang. Stud. 1(1), 40–55 (1982)
Y. Wu, On the new version of TEM8 syllabus and its wash-back effects [J]. Foreign Lang. Teach.
Abroad 1, 36–40 (2005)
Q. Xi, J. Jiang, TEM4 testing modes and fundamental English teaching reform [J]. Foreign Lang.
Their Teach. 4, 24–27 (2006)
W. Xiao, X. Gu, C. Ni, The washback mechanism of CET: a longitudinal study via multi-group
structural equation modeling [J]. Foreign Lang. Learn. Theory Pract. 3, 37–43 (2014)
Q. Xu, A study on the washback effects of TEM8 [J]. Foreign Lang. World 3, 21–31 (2012)
N. Yao, Giant footsteps: A retrospective account of TEM4/8 from 1990 to 2010 [J]. Foreign Lang.
World 6, 19–25 (2010)
J. Ye, Thoughts on the backwash of CET to teaching [J]. Foreign Lang. World 3, 40–43 (1998)
G. Yu, Y. Jin (eds.), Assessing Chinese Learners of English: Language Constructs, Consequences
and Conundrums [C] (Palgrave Macmillan, London, 2016)
Y. Zhu, Towards the standardization and scientificness of TEM4 and TEM8 [J]. Foreign Lang.
World 1, 24–29 (1995)
P. Zhu, Changes in the new editions of syllabus for TEM4 and TEM8 [J]. Foreign Lang. World 1,
67–70 (2005)
S. Zou, A survey report on TEM4 and TEM8 in some universities and colleges [J]. Foreign Lang.
World 1, 20–23 (1995)
S. Zou, The alignment between teaching syllabus and language tests: In the case of TEM8 design
and administration [J]. Foreign Lang. World 6, 71–78 (2003)
S. Zou, Understanding the washback effects of tests—With special reference to the revision of the
TEM4/8 test battery [J]. Foreign Lang. World 5, 59–66 (2005)
S. Zou, Test for English majors – Its past and future [A], in Foreign Language Educators on
Language Education, ed. by Z. Zhuang (Ed), (Shanghai Foreign Language Education Press,
Shanghai, 2008), pp. 378–390
S. Zou, Looking back and ahead – In commemoration of the 20th anniversary of TEM4 and TEM8
S. Zou, Q. Xu, Criterion-referenced assessment and its washback effects: The case of TEM [J].
Foreign Lang. Learn. Theory Pract. 1, 42–48 (2014)
