TESOL International Journal
Teaching English to Speakers of Other Languages
Special Issue
Teaching, Learning, and Assessing Vocabulary
Guest Editors
Marina Dodigovic
Stephen Jeaco
Rining Wei
Chief Editor
Xinghua Liu
Published by the TESOL International Journal
http://www.tesol-international-journal.com
This book is in copyright. Subject to statutory exception no reproduction of any part may take place without
the written permission of English Language Education Publishing.
No unauthorized photocopying
All rights reserved. No part of this book may be reproduced, stored in a retrieval system or transmitted in any
form or by any means, electronic, mechanical, photocopying or otherwise, without the prior written permission
of English Language Education Publishing.
ISSN. 2094-3938
TESOL International Journal
Chief Editor
Xinghua Liu
Shanghai Jiao Tong University, China
Associate Editors
Hanh thi Nguyen - Hawaii Pacific University, USA
Dean Jorgensen - Gachon University, South Korea
Joseph P. Vitta - Queen’s University Belfast, UK
Khadijeh Jafari - Islamic Azad University of Gorgan, Iran
Editorial Board
Ai, Haiyang - University of Cincinnati, USA
Anderson, Tim - University of British Columbia, Canada
Arabmofrad, Ali - Golestan University, Iran
Batziakas, Bill - Queen Mary University of London, UK
Behfrouz, Behnam - University of Buraimi, Oman
Bigdeli, Rouhollah Askari - Yasouj University, Iran
Bretaña-Tan, Ma Joji - University of the Philippines, Philippines
Çakir, İsmail - Erciyes University, Turkey
Chang, Tzu-shan - Wenzao Ursuline University of Languages, Taiwan
Choi, Jayoung - Georgia State University, USA
Chuenchaichon, Yutthasak - Naresuan University, Thailand
Chung, Edsoulla - University of Cambridge, UK
Cutrone, Pino - Nagasaki University, Japan
Dang, Doan-Trang Thi - Monash University, Australia
Deng, Jun - Central South University, China
Derakhshan, Ali - Golestan University, Iran
Dodigovic, Marina - American University of Armenia, Armenia
Farsani, Mohammad Amini - Kharazmi University, Iran
Floris, Flora Debora - Petra Christian University, Indonesia
Hos, Rabia - Zirve University, Turkey
Ji, Xiaoling - Shanghai Jiao Tong University, China
Jiang, Xuan - St. Thomas University, USA
Kambara, Hitomi - University of Oklahoma, USA
Khajavi, Yaser - Shiraz University, Iran
Lee, Sook Hee - Charles Sturt University, Australia
Li, Chili - Hubei University of Technology, China
Li, Liang - Jilin Normal University, China
Li, Yiying - Wenzao Ursuline University, Taiwan
Lo, Yu-Chih - National Chin-Yi University of Technology, Taiwan
Nguyen, Ha Thi - Monash University, Australia
Niu, Ruiying - Guangdong University of Foreign Studies, China
O'Brien, Lynda - University of Nottingham Ningbo, China
Rozells, Diane Judith - Sookmyung Women’s University, S. Korea
Salem, Ashraf Atta Mohamed Safein - Sadat Academy for Management Sciences, Egypt
Sultana, Shahin - B. S. Abdur Rahman University, India
Ta, Thanh Binh - Monash University, Australia
Tran-Dang, Khanh-Linh - Monash University, Australia
Ulla, Mark B. - Mindanao State University, Philippines
Witte, Maria Martinez - Auburn University, USA
Wu, Chiu-hui - Wenzao Ursuline University of Languages, Taiwan
Yan, Yanxia - Xinhua University, China
Yu, Jiying - Shanghai Jiao Tong University, China
Zhang, Xinling - Shanghai University, China
Zhao, Peiling - Central South University, China
Zhao, Zhongbao - Hunan Institute of Science and Technology, China
Contents
A New Inventory of Vocabulary Learning Strategy for Chinese Tertiary EFL Learners
Xuelian Xu, Wen-Cheng Hsu 7
Stephen Jeaco
Xi’an Jiaotong-Liverpool University, China
Rining Wei
Xi’an Jiaotong-Liverpool University, China
*Tel.: +374 60612-740; Email: mdodigovic@aua.am; 40 Marshal Baghramyan Ave., Yerevan, 0019, Armenia
through which a new word must pass as it gains entry into the learner’s lexicon (p. 373). According to Gu and
Johnson (1996), vocabulary learning strategies are classified into four groups: metacognitive, cognitive, memory,
and activation strategies. Metacognitive strategies include selective attention as well as self-initiation strategies,
while cognitive strategies include the use of dictionaries, guessing and note-taking strategies. Memory strategies
consist of rehearsal and encoding strategies. Finally, activation strategies are those that learners utilize in order to
use new words in various contexts. Schmitt (1997) classifies vocabulary learning strategies into two groups. The
first group serves to determine the meaning of new vocabulary items which the learners face for the first time, and
contains determination and social strategies. The second group, on the other hand, entails strategies which
consolidate the meaning of vocabulary items when encountered again by the learners. This group consists of
cognitive, metacognitive, memory, and social strategies. However, not much is known about the relative
frequency or effectiveness of each of the above strategies. Xu and Hsu (this issue) look into such strategies and
their representation.
There are two types of vocabulary knowledge: receptive and productive (Nation, 2006). Receptive
vocabulary enables the learner to comprehend written and spoken texts. In this volume, Masrai and Milton (this
issue) discuss this aspect of vocabulary. Productive vocabulary, on the other hand, facilitates the productive skills
of speaking and writing. In addition to vocabulary size, which is expressed in the number of words a learner
knows, vocabulary is also measured in terms of depth (Beglar & Nation, 2007). Depth concerns everything a
learner knows about a word, including ways of spelling and pronouncing it, the sentence structure it requires, its
part of speech, the functions it can have in connected discourse, the contexts in which it can possibly occur, other
words that may accompany it, the idiomatic expressions it is known to build and the connotations it can have
(Folse, 2004). Brumbaugh and Heift (this issue) build on the concept of vocabulary depth. It is expected that in
productive skills, such as speaking and writing, a larger vocabulary size would have the effect of a greater lexical
range used, while a greater depth of vocabulary knowledge would result in a more accurate and skillful use of
vocabulary.
Tests such as the Vocabulary Size Test (VST) are often used to measure the size of learners’ vocabulary
(Beglar & Nation, 2007). This test has been specifically developed to “provide a reliable, accurate, and
comprehensive measure” (Beglar, 2010, p. 103) of L2 English learners’ receptive vocabulary in its written form,
covering the 14,000 most frequent word families in English. Other such tests are described in this issue.
However, it is more difficult to measure vocabulary depth in relation to productive vocabulary size. This is further
discussed by Roghani and Milton (this issue).
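The scaling logic behind such sampled size tests can be sketched in a few lines. The sketch below is illustrative only, not taken from Beglar and Nation’s materials; it assumes the commonly described VST design of 140 items sampled evenly across the 14,000 most frequent word families, so that each correct item stands for 100 families, and the function name is ours.

```python
# Illustrative sketch: scaling a sampled vocabulary test score up to a
# size estimate. Assumption: 140 items evenly sampling 14,000 word
# families, i.e. each correct item represents 100 families.

def estimate_vocabulary_size(correct_items, total_items=140, families_covered=14000):
    """Scale the number of correct answers up to the sampled frequency range."""
    if not 0 <= correct_items <= total_items:
        raise ValueError("correct_items must lie between 0 and total_items")
    families_per_item = families_covered // total_items  # 100 for the VST
    return correct_items * families_per_item

print(estimate_vocabulary_size(84))  # a learner with 84 correct items
```

Under these assumptions, a learner with 84 correct items would be credited with knowledge of roughly 8,400 word families.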
Instances of language use lacking in accuracy, otherwise known as language errors, are significant in three
respects: they inform the teacher about what should be taught; they inform the researcher about the course of
learning; and they are outcomes of the learner’s target language hypothesis testing (James, 1998). The sources of
error are deemed to be the redundancy of code (intralingual), various sources of interference (interlingual) and
unsuitable presentation (George, 1972). Similarly, James (1998) distinguished between a slip, a mistake and a
systematic error: a slip is expected to result in self-correction, a mistake calls for feedback, while an error requires
full correction of the erroneous utterance. In this volume, Augustin-Llach (this issue) examines a range of lexical
errors.
According to Cook and Singleton (2014), second language acquisition (SLA) is primarily concerned with the
interplay between a learner’s first (L1) and an additional language (L2). Thus Li (2014) identifies such an
interplay in the interlanguage of Chinese learners of English. According to Wang (2014), this is characterized by
the structural and lexical patterns of Chinese in the learner’s grammatical and lexical choices in English, which
are not necessarily transparent to other speakers of English, thus potentially obscuring comprehension. In
particular, lexis in L2 often adopts the L1 semantic features (Cook & Singleton, 2014). An example of this is a
Chinese student asking at the end of a presentation: “Do you have a problem?” The Chinese equivalent “问题
(wen ti)” means both a question and a problem. Collocations or multi-word units present another challenge for
L2 learners (Yamashita & Jiang, 2010). An example of the influence of L1 on collocations in English as L2 is “eat
medicine” (rather than “take medicine”), based on the Chinese “吃药 (chi yao)”. These examples represent evidence
of subordinate bilingualism, which according to Cook and Singleton (2014) has its roots in translation as a
teaching/learning method. Dodigovic (2014) found that learners with limited vocabulary use bilingual
dictionaries with only one English translation equivalent, which also restricts the depth of their English
vocabulary (Schmitt, 2010). In line with this, Dodigovic, Ma and Jing (this issue) pursue such patterns in the
writing of Chinese learners of English.
Vocabulary is ideally suited to corpus linguistic approaches in research and teaching. The term corpus
commonly “refers to an electronic text” (Holmes, 1999, p. 241) and is often in fact a compilation of text samples
that one wants to examine for vocabulary use or other features. Special software is applied to find out, for
example, which words or expressions are most frequently used by an author or a group of authors. Having a
corpus of authentic language data gives one the opportunity to either postulate very specific hypotheses or
identify patterns through corpus data analysis (Tognini-Bonelli, 2001). As a method, corpus linguistics allows for
a quantitative approach, in that it counts the occurrences of the examined linguistic phenomena.
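The quantitative step described above, counting occurrences across a compilation of text samples, can be sketched minimally as follows; the two sample sentences are invented for illustration, and real corpus tools of course use far more careful tokenisation.

```python
# A minimal sketch of corpus frequency counting: tokenise a small
# compilation of text samples and count word occurrences.
from collections import Counter
import re

corpus = [
    "The learner guessed the meaning of the new word from context.",
    "Learners often guess new words from context rather than a dictionary.",
]

def word_frequencies(texts):
    """Tokenise naively on letters/apostrophes and count occurrences."""
    tokens = []
    for text in texts:
        tokens.extend(re.findall(r"[a-z']+", text.lower()))
    return Counter(tokens)

freq = word_frequencies(corpus)
print(freq.most_common(3))
```

Even this toy example surfaces the kind of pattern corpus software reports at scale: function words dominate the top of the frequency list, while content words such as “context” recur across samples.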
When applied to language learning, this method can be very helpful. It can be used to gain a better
understanding of how the target student population uses language and what misconceptions the students might
have about the additional language they are learning. While some researchers prefer to profile the vocabulary
(Cobb, 2004) of either the learners or their learning resources, others use learner corpora to gain a better
understanding of learner errors (Granger, 2003). The latter include Dodigovic, Ma and Jing (this issue).
Furthermore, many have used target language corpora to teach language, a technology enhanced approach that
is sometimes called data-driven (Allan, 1999; Levy, 1997). This topic is also pursued by Jeaco (this issue).
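The vocabulary-profiling idea mentioned above can also be sketched briefly. The band lists below are tiny invented placeholders, not the established frequency lists used by real profilers such as Cobb’s (2004) Lexical Tutor; the sketch only shows the mechanism of assigning each token to a frequency band.

```python
# Toy sketch of lexical profiling: assign each token of a text to a
# frequency band and tally the result. Band contents are invented.
import re

BAND_1K = {"the", "a", "of", "and", "to", "in", "is", "was", "for", "on"}
BAND_2K = {"medicine", "question", "problem", "answer", "teacher"}

def profile(text):
    """Return per-band token counts and the total token count."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = {"1k": 0, "2k": 0, "off-list": 0}
    for tok in tokens:
        if tok in BAND_1K:
            counts["1k"] += 1
        elif tok in BAND_2K:
            counts["2k"] += 1
        else:
            counts["off-list"] += 1
    return counts, len(tokens)

counts, total = profile("The teacher explained the question and the answer.")
print(counts, total)
```

A profile of this kind lets a teacher see at a glance what proportion of a learner text, or of a teaching resource, falls outside the frequency bands the learners can be assumed to know.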
Vocabulary is also important in educational needs analysis. Needs analysis refers to a procedure in
language planning (Nunan, 1988). This procedure serves three main purposes (Richards, 1984, cited in Nunan,
1988). Firstly, it can be used to obtain wider input into content, design and implementation of a language
programme. Secondly, it can be used to develop goals, objectives and content for a language programme. Finally,
it can provide data for programme evaluation. It can be based either on soft data, such as opinions, or on hard
data, such as linguistic facts (Johns, 1997).
With its force of hard data evidence, the corpus approach is particularly useful in raising the teacher’s
awareness of their students’ learning needs, but it can also be used to demonstrate to the students and the
respective institution how their use of language differs from the targeted standard. Indeed, the level of
institutional language awareness can be raised to the point at which the institution becomes able to anticipate
learning problems and better facilitate teaching, learning and assessment. In particular, corpus analysis can help
institutions decide whether the teaching materials and methods used are conducive to learning success.
Technology plays a key role in making that hard evidence readily available. In this volume, Quero (this issue) as
well as McGarrell and Nguyen (this issue) take this approach. Other uses of technology with respect to
vocabulary are described by Jeaco (this issue) as well as Brumbaugh and Heift (this issue).
study compares the improvements in productive use of target structures for a treatment group, who received
explicit instruction, and a contrast group. The results demonstrate the gains of explicit instruction for the
production of topic-induced phrases and the paper explores some of the attitudes of the language learners
through analysis of interviews.
Jones and Waller present a quasi-experimental study examining textual and aural input enhancement for
vocabulary teaching at an elementary level in a higher education context. The enhancements provided for the
treatment group consisted of the bolding of target words in a menu and three repetitions of the modeling of the
words by the teacher. Their results demonstrate some clear benefits of both kinds of enhancement when
teaching lexis.
Augustin-Llach takes the evidence of lexical errors for a theoretical exploration of EFL vocabulary teaching,
reviewing previous research and suggesting new ways to engage pedagogically with lexical errors. By drawing on
a solid research base, the fusion of analyses from different studies in this important area leads directly into some
practical implications and calls for broader appreciation of the need for explicit vocabulary instruction through a
range of approaches.
Dodigovic, Ma and Jing reveal insights into first language (L1) lexical transfer within the context of L1
Chinese learners of English through analysis of individual words, collocations and multi-word units. In a cross-
sectional study of written work from university students, they demonstrate that the most frequent cause of errors
is the L1 polysemy of individual words, followed by multi-word unit (MWU) and collocation errors. They also
find a slight but not statistically significant drop in the frequency of lexical transfer errors in the more advanced
learner group in all three of these areas.
Jeaco discusses the use of corpora in vocabulary learning and reports on an evaluation of a concordancing
tool which was designed for English language learners and teachers. The software tool, called The Prime Machine
(Jeaco, 2015), includes support features for conducting searches on vocabulary and language patterns,
encouraging language discovery processes for the comparison of specific words and collocations. This paper
introduces some of the pedagogical perspectives on the software design, and reports on the positive reception of
the software from students with little or no prior experience in concordancing work.
Brumbaugh and Heift present an empirical investigation into the use of a Computer Assisted Language
Learning (CALL) tool for the assessment of the depth of vocabulary knowledge of intermediate L2 English
learners. The study introduces the design and use of Bricklayer and the 2ndings provide evidence of the validity of
this assessment tool, and the paper explains how such an approach strengthens models of both knowledge and
behavior for CALL adaptive systems.
Masrai and Milton’s paper explores predictors of academic achievement, building on work on general and
academic vocabulary knowledge (Townsend et al., 2012) and general intelligence (Laidra et al., 2007). Their
examination of these and additional factors adds to a predictive model, drawing on L1 vocabulary knowledge,
L2 general and academic vocabulary knowledge, and intelligence (IQ). They demonstrate the way in which each
element in the model makes unique contributions, and how the four elements explain different aspects of
variance in the academic achievement data.
Roghani and Milton investigate the usefulness and effectiveness of a category generation task for assessing
productive vocabulary size. In the task, learners are asked to list words within a specific category, and the
resulting list of words can then be compared with receptive vocabulary size estimates. Through analysis of
results from learners at different levels of performance, and comparison with two standardised tests, they
demonstrate that the category generation tasks are reliable and effective.
McGarrell and Nguyen tackle the question of optimal language input for institutional contexts where
textbooks form the basis for instruction. They present an analysis of lexical bundles in a popular textbook of
General English, comparing these with frequently occurring lexical bundles in corpora. The analysis examines
the functions of the lexical bundles covered and their usefulness. Their findings demonstrate limitations in the
usefulness of the lexical bundles in the textbook, and the authors argue for more attention to be paid to lexical
bundles in language teaching and materials development.
Last but not least, Quero reports on a subject-specific study into the vocabulary load of English medical
handbooks, considering the lexical demands in terms of the number of words needed for comprehension of
medical texts. The study used a corpus approach, drawing on existing word lists and making comparisons
between the medical text corpus and a corpus built from seven general English corpora. The results provide
insights into the vocabulary needs of medical students and health professionals, with a long list of subject-specific
(medical) words having been generated through this approach.
Conclusion
This issue covers a range of topics related to teaching, researching, learning and assessing vocabulary in an
additional language. Each of the papers furthers our understanding of issues such as incidental and deliberate
vocabulary learning in terms of vocabulary depth or size, and each considers their roles in areas such as
academic success, teaching of lexical phrases and their representation in textbooks as well as the vocabulary
required to succeed in certain academic disciplines. The editors are confident that each reader will be able to
identify at least some points of relevance in relation to their own research or practice.
References
Alderson, J. C. (2005). Diagnosing foreign language proficiency. London: Continuum.
Allan, M. (1999). Language awareness and the support role of technology. In R. Debski and M. Levy (Eds.)
WORLDCALL. Global Perspectives on Computer-Assisted Language Learning (pp. 303 – 318). Lisse: Swets &
Zeitlinger.
Beglar, D. (2010). A Rasch-based validation of the vocabulary size test. Language Testing, 27 (1), 101-118.
Beglar, D., & Nation, P. (2007). Vocabulary Size Test.
Cobb, T. (2004). The Compleat Lexical Tutor, www.lextutor.ca. Accessed 4 June 2004.
Cook, V. & Singleton, D. (2014). Key Topics in Second Language Acquisition. Bristol: Multilingual Matters.
Coxhead, A. (2000). The academic word list. TESOL Quarterly. 34 (2), 213 – 238.
Dodigovic, M. (2014). Strategies used by successful learners within the context of incidental vocabulary
acquisition in an additional language. RDF Project Completion Report, XJTLU.
Erman, B. (2009). Formulaic language from a learner perspective: What the learner needs to know. In R.
Corrigan, E. A. Moravcsik, H. Ouali, & K. M. Wheatley (Eds.), Formulaic Language, Volume 2 (pp. 323-
346). Philadelphia: John Benjamins Publishing Company.
Folse, K. (2004). Vocabulary myths: Applying second language research to classroom teaching. Ann Arbor: University of
Michigan Press.
George, H. (1972) Common Errors in Language Learning. Rowley: Newbury House.
Granger, S. (2003). Error-tagged learner corpora and CALL: A promising synergy, CALICO Journal 20 (3), 465 –
480.
Gu, Y., & Johnson, R. K. (1996). Vocabulary learning strategies and language learning outcomes. Language
Learning, 46, 643-679.
Hatch, E., & Brown, C. (1995). Vocabulary, Semantics and Language Education. New York: Cambridge University Press.
Holmes, G. (1999). Corpus CALL: Corpora in language and literature. In K. Cameron, (Ed.) CALL: Media, design
and applications (pp. 239 – 270). Lisse: Swets and Zeitlinger.
James, C. (1998). Errors in language learning and use: Exploring error analysis. Routledge.
Johns, A. M. (1997). Text, Role and Context: Developing Academic Literacies. New York: Cambridge University Press.
Levy, M. (1997). Theory-driven CALL and the Development Process. Computer Assisted Language Learning, 10 (1),
41-56.
Li, W. (2014). New Chinglish: Translanguaging Creativity and Criticality. Keynote speech, AILA World Congress
2014, Brisbane.
Nation, I. S. P. (2006). Language education - vocabulary. In K. Brown (ed.) Encyclopaedia of Language and Linguistics,
2nd Ed. Oxford: Elsevier. Vol 6: 494-499.
Nunan, D. (1988). The Learner-Centred Curriculum. Cambridge: Cambridge University Press.
Paribakht, T.S. & Wesche, M. (1996). Enhancing Vocabulary Acquisition Through Reading: A Hierarchy of
Text-Related Exercise Types. The Canadian Modern Language Review, 52(2), 155-178.
Schmitt, N. (1997). Vocabulary learning strategies. In N. Schmitt & M. McCarthy (Eds.), Vocabulary: Description,
acquisition and pedagogy (pp. 199-227). Cambridge: Cambridge University Press.
Schmitt, N. (2010). Researching vocabulary: A vocabulary research manual. London: Palgrave Macmillan.
Song Y. & Fox, R. (2008). Integrating Incidental Vocabulary Learning Using PDAs into Academic Studies:
Undergraduate Student Experiences. Lecture Notes in Computer Science 5169:2008, 238 – 249.
Tognini-Bonelli, E. (2001). Corpus linguistics at work. Amsterdam: John Benjamins.
Townsend, D., Filippini, A., Collins, P., & Biancarosa, G. (2012). Evidence for the importance of academic word
knowledge for the academic achievement of diverse middle school students. The Elementary School Journal,
112(3), 497-518.
Wang, Y. (2014). Chinese Speakers’ Attitudes Towards Their Own English: ELF or Interlanguage. Teaching
English in China, 5, 7 – 12.
Yamashita, J. & Jiang, N. (2010). L1 Influence on the Acquisition of L2 Collocations: Japanese ESL Users and
EFL Learners Acquiring English Collocations. TESOL Quarterly, 44 (4), 647 – 668.
Stephen Jeaco, PhD, is an Associate Professor at Xi'an Jiaotong-Liverpool University. He has worked in China
since 1999 in the fields of EAP, linguistics and TESOL. His PhD was supervised by Professor Michael Hoey and
focused on developing a user-friendly corpus tool based on the theory of Lexical Priming.
Rining WEI (Tony), PhD, is a Lecturer in the Department of English, Xi'an Jiaotong-Liverpool University,
Suzhou, China. His areas of research include argumentation, TESOL, and quantitative research methods. His
papers have appeared in journals including English Today, Asian EFL Journal, Journal of Multilingual and Multicultural
Development, and World Englishes.
Xuelian Xu*
Xi’an Jiaotong-Liverpool University, China
Wen-Cheng Hsu**
Xi’an Jiaotong-Liverpool University, China
Abstract
The past three decades have witnessed a surge of interest in vocabulary learning in EFL contexts since Meara (1980)
identified it as ‘a neglected aspect of language learning’ (p. 221). A mushrooming amount of literature has emerged on
various aspects of vocabulary and its acquisition (e.g., Carter, 1998; Coady & Huckin, 1997; Manyak, 2010; Meara, 1995,
2005; Nation, 1990, 2006; Read, 2000; Schmitt, 2000; Schmitt & McCarthy, 1997). With the movement from teaching-
orientedness to learner-centredness and learner autonomy, vocabulary learning strategies seem to have gained legitimacy
as one auxiliary approach to vocabulary learning. Despite this, there appears to be no satisfactory instrument for
assessing vocabulary learning strategy use in an EFL context, although a few researchers have attempted to develop one
(e.g., Gu & Johnson, 1996; Schmitt, 1997). To this aim, a new inventory for vocabulary learning, the Strategies Inventory
for Vocabulary Learning (SIVL), was proposed for Chinese EFL university learners. To validate the instrument, confirmatory and
exploratory factor analyses were employed to assess its psychometric properties. Results showed that the hypothesized
theoretical model proved to be a good representation of the sample data, and that the SIVL exhibited satisfactory
psychometric features. This positive evidence indicates that the SIVL can serve as a reliable and valid research instrument
for assessing Chinese EFL university learners’ vocabulary learning strategy use. It is suggested that the SIVL can be a
valuable resource for EFL learners and practitioners in that it can raise their awareness of strategy use and strategy training
by employing this instrument, leading to more successful vocabulary teaching and learning.
Key words: Vocabulary learning, Learning strategies, Vocabulary learning strategies, Strategy classification,
Strategy inventory, Factor analysis
* Tel: (+86) 512 88161328. Email: xuelian.xu@xjtlu.edu.cn. Language Centre, Xi’an Jiaotong-Liverpool University, No. 111
Ren’ai Road, HET, SIP, Suzhou, Jiangsu Province, P R China 215123
** Tel: (+86) 512 88161144. Email: wencheng.hsu@xjtlu.edu.cn. Language Centre, Xi’an Jiaotong-Liverpool University, No.
111 Ren’ai Road, HET, SIP, Suzhou, Jiangsu Province, P R China 215123
Introduction
To date, vocabulary learning strategies (VLS) have drawn increasing attention as an auxiliary approach to
vocabulary learning, in line with the movement from teaching-orientedness to learner-centredness and learner
autonomy. This shift reflects the complexity of word knowledge and the range of factors involved in knowing,
processing, storing, and applying a word (Carter, 1998), which calls for varying strategies. VLS are even more
important for low-frequency words: ‘because of the large number of low-frequency words and because of their
infrequent occurrence and narrow range, it is best to teach learners strategies for dealing with these words rather
than to teach the words themselves’ (Nation, 1990, p. 159). Moreover, classroom teaching time in foreign
language settings is notoriously limited, and it is impossible to teach everything about a word, so students must
become independent word learners (Waring, 2002). The use of VLS can help students deal with their vocabulary
learning independently. Schmitt (2000) claims that, in contrast to language tasks that involve several linguistic
skills, many learners do seem to use strategies for their vocabulary learning, possibly because the ‘relatively
discrete’ nature of vocabulary learning, compared to ‘more integrated’ language activities, makes it easier to
utilize strategies effectively. In addition, Nation and Newton (1997) point out that ‘[t]ime may be set aside for the
learning of strategies and learners’ mastery of strategies may be monitored and assessed’ (p. 241). VLS have thus
become essential inside and outside the classroom.
Theory and practice of VLS mainly stem from language learning strategies (LLS). The earlier literature in
SLA usually treats strategies as a cognitive learning process (e.g., O'Malley & Chamot, 1990), while scholars in
educational psychology view strategies from a social cognitive point of view which stresses the metacognitive,
affective and social domains (e.g., Schunk, 2001; Zimmerman, 1989, 2000). In recent decades, a few scholars in
SLA have attempted to look at strategies from a volitional perspective focusing on the metacognitive and
affective domains (e.g., Dörnyei, 2005; Tseng, Dörnyei, & Schmitt, 2006). Although different scholars claim
their own theoretical underpinnings, a consideration of metacognitive, cognitive and social cognitive perspectives
can offer a more holistic picture of VLS. It is under this proposition that an inventory tapping into all the phases
of vocabulary learning strategy use can be produced.
interact with others or ideationally control over affect. The three types are further categorised into several
subgroups. Metacognitive Strategies include four subgroups, which are defined as follows:
Selective attention: focusing on special aspects of learning tasks, as in planning to listen for key words or phrases.
Planning: planning for the organisation of either written or spoken discourse.
Monitoring: reviewing attention to a task, comprehension of information that should be remembered, or production while it is occurring.
Evaluation: checking comprehension after completion of a receptive language activity, or evaluating language production after it has taken place. (O’Malley & Chamot, 1990, p. 46)
Another influential classification of LLS is Oxford’s (1990) system (Table 1), which is divided into Direct
Strategies for handling the target language and Indirect Strategies for generally managing the learning of the
target language. The former is composed of Memory Strategies, Cognitive Strategies, and Compensation
Strategies. The latter includes Metacognitive Strategies, Affective Strategies and Social Strategies.
Table 1
Oxford’s (1990) System
Learning Strategies

Direct Strategies
  Memory Strategies
    Creating mental linkages: Grouping; Associating/elaborating; Placing new words into a context
    Applying images and sounds: Using imagery; Semantic mapping; Using keywords; Representing sounds in memory
    Reviewing well: Structured reviewing
    Employing action: Using physical response or sensation; Using mechanical techniques
  Cognitive Strategies
    Practising: Repeating; Formally practising with sounds and writing systems; Recognising and using formulas and patterns; Recombining; Practising naturalistically
    Receiving and sending messages: Getting the idea quickly; Using resources for receiving and sending messages
    Analysing and reasoning: Reasoning deductively; Analysing expressions; Analysing contrastively (across languages); Translating; Transferring
    Creating structure for input and output: Taking notes; Summarising; Highlighting
  Compensation Strategies
    Guessing intelligently: Using linguistic clues; Using other clues
    Overcoming limitations in speaking and writing: Switching to the mother tongue; Getting help; Using mime or gesture; Avoiding communication partially or totally; Selecting the topic; Adjusting or approximating the message; Coining words; Using a circumlocution or synonym

Indirect Strategies
  Metacognitive Strategies
    Centring your learning: Overviewing and linking with already known material; Paying attention; Delaying speech production to focus on listening
    Arranging and planning your learning: Finding out about language learning; Organising; Setting goals and objectives; Identifying the purpose of a language task; Planning for a language task; Seeking practice opportunities
    Evaluating your learning: Self-monitoring; Self-evaluating
  Affective Strategies
    Lowering your anxiety: Using progressive relaxation, deep breathing, or meditation; Using music; Using laughter
    Encouraging yourself: Making positive statements; Taking risks wisely; Rewarding yourself
    Taking your emotional temperature: Listening to your body; Using a checklist; Writing a language learning diary; Discussing your feelings with someone else
  Social Strategies
    Asking questions: Asking for clarification or verification; Asking for correction
    Cooperating with others: Cooperating with peers; Cooperating with proficient users of the new language
    Empathising with others: Developing cultural understanding; Becoming aware of others’ thoughts and feelings
From the above we can clearly see that there is a substantial amount of overlap between the two LLS
classification systems. First, O’Malley and Chamot’s (1990) Metacognitive Strategies have a direct counterpart
in Oxford’s (1990) system. This category generally covers planning, organising, and evaluating one’s own
language learning. Second, both systems involve strategies handling affect and social interaction. Affective
Strategies are techniques for learners to manage their emotional and motivational states, while Social Strategies
are techniques for learning the target language with other people. O’Malley and Chamot classify Affective
Strategies and Social Strategies as one single type, Socio-affective Strategies, whereas Oxford categorises them as
separate groups and lists far more affective and social strategies than O’Malley and Chamot. Third, O’Malley
and Chamot’s Cognitive Strategies roughly match a combination of Oxford’s Memory Strategies and Cognitive
Strategies, with the exception of ‘guessing from context (inferencing)’, which is part of O’Malley and Chamot’s
cognitive category but is listed by Oxford as a compensation strategy that makes up for missing knowledge. Unlike
O’Malley and Chamot, Oxford intentionally separates Memory Strategies from Cognitive Strategies because
‘Memory Strategies appear to have a very clear, specific function that distinguishes them from many Cognitive
Strategies’ (Hsiao & Oxford, 2002, p. 371). In other words, although Memory Strategies assist cognition, the
operations referred to as Memory Strategies are particular mnemonic devices that help learners store and
transfer information to long-term memory and retrieve it whenever necessary. Most Memory Strategies tend to
be associated with shallow processing, while Cognitive Strategies tend to contribute to deep processing
(Dörnyei, 2005). Lastly, Oxford classifies Compensation Strategies as a separate category because she seems to
believe that it is essential to make up for missing knowledge in any of the four language skills: listening, reading,
speaking, or writing. This category is intended to enable learners to use the target language for either
comprehension (i.e., listening, reading) or production (i.e., speaking, writing) in spite of the missing knowledge.
In the more specific area of VLS, some researchers have also sought to develop a vocabulary-specific strategy
classification system. There are two main typologies: Schmitt’s (1997) and Stoffer’s (1995). Schmitt (1997) claims
that Oxford’s classification system is generally suitable for VLS but unsatisfactory in a number of respects:
1. No category in Oxford’s system satisfactorily depicts the type of strategies employed by an individual
learner when he/she is faced with discovering a new word’s meaning without others’ help;
2. In Oxford’s system, it seems difficult to classify some strategies which could easily fit into two or more
groups;
3. In Oxford’s system, it remains unclear whether some strategies should be categorised as Memory Strategies
or Cognitive Strategies.
Therefore, Schmitt (1997) offers a vocabulary-specific strategy classification system by grouping VLS into
two broad categories: Discovery Strategies, i.e., strategies for the discovery of a new word’s meaning, and
Consolidation Strategies, i.e., strategies for consolidating a word once it has been encountered. The former
category involves two subcategories: Determination Strategies and Social Strategies. The latter group includes
Social Strategies, Memory Strategies, Cognitive Strategies, and Metacognitive Strategies. Schmitt also stresses
that, since the goal of both Cognitive Strategies and Memory Strategies is to aid recall of words through some
form of language manipulation, further criteria are needed to separate Memory Strategies from Cognitive
Strategies. He therefore adopted as additional criteria the five areas of storing and memory strategies from
Purpura (1994, cited in Schmitt, 1997, pp. 205-206), namely Repeating, Using mechanical means, Associating,
Linking with prior knowledge, and Using imagery.
Schmitt’s (1997) system seems to be the most comprehensive VLS taxonomy to date, and is a useful attempt
to display where general LLS and VLS intersect. However, Schmitt’s (1997) system still has its weaknesses. Firstly,
it does not include affective strategies. Secondly, a number of items fall into more than one subcategory; for
instance, ‘flashcards’ is grouped into both Determination Strategies and Cognitive Strategies, which causes
confusion in defining and classifying strategy categories. Lastly, there is no clear-cut distinction between
Discovery and Consolidation strategies.
While the systems discussed above are all based on theoretical induction, Stoffer (1995) attempted to
categorise strategies empirically. She developed an inventory of nine categories from the analysis of data
collected with a self-composed 53-item vocabulary learning strategy questionnaire. The nine factors
resulting from a factor analysis are listed below:
1. Strategies involving authentic language use
2. Strategies involving creative activities
3. Strategies used for self-motivation
4. Strategies used to create mental linkages
5. Memory strategies
6. Visual/auditory strategies
7. Strategies involving physical action
8. Strategies used to overcome anxiety
9. Strategies used to organize words
However, such a classification tends to result in an unidentifiable group of strategies in each factor. For
example, Item 13 ‘Use rhymes to remember new words’ falls into three factors: 5, 6 and 7. Item 18 ‘Break lists
into smaller parts’ falls into both factors 5 and 9.
Considering the strengths and limitations of the above classification systems, we developed an all-encompassing
inventory.
Table 2
Classification of VLS in This Study

Metacognitive Strategies (MET)
  Paying Attention: Deciding in advance to pay attention in general to a vocabulary learning task and to ignore distractions by directed attention, and/or to pay attention to specific aspects of vocabulary learning tasks or to situational details.
  Arranging & Planning: Finding out about vocabulary learning, organising the schedule, setting goals and objectives, considering task purposes, planning for tasks, and seeking chances to practise words.
  Monitoring & Evaluation: Identifying errors in understanding or producing the new word, tracking the source of important errors, trying to eliminate such errors, and evaluating one’s own progress in vocabulary learning.

Cognitive Strategies (COG)
  Guessing: Seeking and using linguistic or other (e.g., background knowledge) clues in order to guess the meaning of a new word.
  Using Dictionaries: Using dictionaries as a resource to find out the meaning and use of a new word, and ways of looking up a word in the dictionary.
  Using Study Aids: Using resources other than dictionaries to help learn or practise new words.
  Taking Notes: Putting synonyms or antonyms together in the notebook, or writing down the meaning of vocabulary when it is thought to be commonly used or interesting, when it is looked up in the dictionary, or when it can help distinguish between the meanings of words.
  Repetition: Saying, listening to, or writing a new word over and over.
  Word Lists: Using word lists and flashcards for the initial exposure to a word and reviewing it afterwards.
  Activation: Practising new words in listening, speaking, reading and writing, and practising new words in imaginary/realistic settings.

Memory Strategies (MEM)
  Grouping: Classifying words based on topic, type of word, practical function, similarity and opposition, etc.
  Association/Elaboration: Relating new words to known words or concepts, or relating one piece of information to another, to create associations in memory.
  Word Structure: Structurally analysing a new word to determine or consolidate its meaning.
  Auditory Encoding: Representing a new word’s phonological form to facilitate recall by creating a meaningful, sound-based association between new words and known words, using phonetic spelling, and using rhymes.
  Semantic Encoding: Producing semantic networks or grids to remember words.
  Contextual Encoding: Memorising new words in a context.
  Structured Reviewing: Going over new words soon after the initial meeting, and then at carefully planned intervals.
  Using Keywords: Remembering a new word by using auditory and visual links.
  Paraphrasing: Reformulating a word’s meaning to improve recall of the word.
  Physical Action: Physically acting out a new word, or meaningfully relating a new word to a physical feeling or sensation.

Socio-affective Strategies (SOC)
  Questioning for Clarification/Correction: Asking others to explain, paraphrase, correct, or give examples.
  Cooperation: Working with peers or proficient English users inside and/or outside class.
  Managing Emotion: Relaxing, encouraging and rewarding oneself, paying attention to signals given by the body, and discussing feelings with someone else.
Socio-affective strategies relate to the social and affective domains. The two domains are interrelated,
complementary, and not mutually exclusive: language learners, especially Chinese EFL learners, who are
inclined to be shy and reticent, tend, while using social strategies such as ‘I interact with native speakers’, to use
affective strategies such as ‘I try to relax whenever I am afraid of using a word’ at the same time, in order to keep
the conversation going. This is why the two dimensions are classified into one single group of strategies.
Socio-affective Strategies thus refer to the ways in which learners choose to interact with others or to exercise
ideational control over affect (O’Malley & Chamot, 1990). They include Questioning for
Clarification/Correction, Cooperation, and Managing Emotion.
As a result, the four strategy categories are further divided into 25 subcategories. The four categories and
their subcategories formed the basic framework for the SIVL.
Oxford’s (1990) SILL (Version for Speakers of Other Languages Learning English)
Oxford developed the SILL based on her strategy taxonomy. It has been the most popular and practical
instrument for assessing language learning strategy use in different cultural ESL/EFL contexts. The SILL is a 5-
point Likert-type scale containing 50 individual items, divided into six parts as discussed above. With regard to
the psychometric properties of the instrument, much evidence has shown that the SILL has utility,
reliability and validity in varying EFL contexts. The SILL has proved particularly useful in EFL
classrooms, with the main goal of revealing the relationships between strategy use and language performance,
and between strategy use and individual differences such as gender, motivation, and learning styles (Oxford &
Burry-Stock, 1995). Dörnyei (2005) also admits that the SILL is ‘a useful instrument for raising student
awareness of L2 learning strategies and for initiating class discussions’ (p. 183). In addition, the reliability of the
SILL has been checked across many cultural groups. For example, in the Taiwanese/Chinese EFL context, the
SILL has obtained a high reliability coefficient (Cronbach alpha) of .94 (Yang, 1999). As for the criterion-related
and construct validities of the SILL, there is considerable evidence based mainly on ‘its predictive and correlative
link with language performance (course grades, standardised test scores, ratings of proficiency), as well as its
confirmed relationship to sensory preferences’ (Oxford & Burry-Stock, 1995, p. 1).
The Vocabulary Learning Strategies section of Gu and Johnson’s (1996) questionnaire includes 91 items,
divided into two broad categories: Metacognitive Regulation and Cognitive Strategies. Metacognitive
Regulation was further categorised into Selective Attention (7 items) and Self-initiation (5 items). Cognitive
Strategies were further categorised into six main groups. The internal consistency reliabilities of the majority of
the categories and subcategories in the Vocabulary Learning Strategies section were over .60 (Cronbach alpha),
as suggested by Dörnyei (2003). Evidence for the validity of the instrument can be assumed to some extent, given
that ‘the questionnaire, written in Chinese, reflected previous quantitative and qualitative research (e.g., Ahmed,
1989; Oxford, 1990; Politzer & McGroarty, 1985) and item analyses that removed redundant items from two
earlier pilot versions’ (Gu & Johnson, 1996, p. 648). However, it should be noted that although this questionnaire
was developed particularly for Chinese university EFL learners, it lacks items related to social or affective
strategies.
Procedure
The 110-item SIVL was administered to 125 randomly selected undergraduates at a Chinese university. After an
initial elimination of unusable data, 107 valid cases remained, including 59 males and 48 females. Most students
spent about 20 minutes finishing the questionnaire.
Three statistical methods were employed to reduce the strategy items: item analysis using the reliability
procedure, descriptive analysis, and correlation analysis. First, an item analysis for a single construct was
conducted. Items whose item-to-total correlations were below 0.30 were considered for removal, since,
according to Denscombe (2003, p. 263), any correlation coefficient between 0.30 and 0.70 (plus or minus) is
generally regarded as a reasonable correlation between two variables. Descriptive statistics were then used to
obtain the means of the remaining individual items; items whose means were below 2.35 were deleted.
The reasons for setting 2.35 as the cut-off point were two-fold. First, the gap between the means above and
below it was relatively clear and wide (0.06) compared with other possible cut-off points among the individual
strategies of low frequency use. Second, after further items were deleted at this cut-off, the items remaining in
the SIVL could still reflect a comprehensive profile of strategy frequency use. Next, correlation analyses
(Spearman rho) on the remaining items were executed to test the relationships between individual items and the
four strategy categories. Any item whose correlation with its own strategy category was weaker than its
correlation with any of the other three categories was then considered for omission from the SIVL.
Lastly, reliability analysis was run to assess the reliability and validity of the newly developed SIVL. A
combination of item analysis and correlation analysis for multiple sub-constructs was run to validate the
theoretically assumed four-fold categorisation system in the SIVL (cf. Green & Salkind, 2003).
All the procedures above resulted in a new 72-item SIVL, involving 16 items under Metacognitive Strategies,
25 under Cognitive Strategies, 24 under Memory Strategies, and 7 under Socio-affective Strategies (See
Appendix A).
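The first two reduction steps described above can be sketched in code. The item names, Likert responses, and helper function below are invented for illustration; the study itself used standard reliability and descriptive procedures, not this code.

```python
# Illustrative sketch of the first two reduction steps:
# (1) drop items whose item-to-total correlation is below .30, then
# (2) drop items whose mean falls below the 2.35 cut-off point.
from statistics import mean

def pearson(x, y):
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def reduce_items(responses, r_cut=0.30, mean_cut=2.35):
    """responses: dict of item name -> one Likert rating per respondent."""
    # Step 1: keep items whose correlation with the total of the
    # remaining items reaches the r_cut threshold.
    kept = {}
    for name, scores in responses.items():
        rest_total = [sum(responses[o][i] for o in responses if o != name)
                      for i in range(len(scores))]
        if pearson(scores, rest_total) >= r_cut:
            kept[name] = scores
    # Step 2: drop low-frequency items (mean below mean_cut).
    return {n: s for n, s in kept.items() if mean(s) >= mean_cut}

data = {"A": [2, 3, 4, 5, 3], "B": [2, 3, 4, 5, 4],
        "C": [4, 2, 5, 2, 4], "D": [1, 2, 3, 3, 2]}
print(sorted(reduce_items(data)))  # "C" fails step 1; "D" fails step 2
```

With this toy data, item C is dropped for a weak item-to-total correlation and item D for a mean below 2.35, mirroring the two deletion criteria described above.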
Table 3 provides a detailed assessment of the model in terms of model fit criteria, levels of acceptable fit,
and evaluation of the instrument (i.e., the SIVL). In addition to Bagozzi and Yi (1988), the criteria provided in
Table 3 also draw on Bagozzi (1981), Byrne (2001), Doll, Xia, and Torkzadeh (1994), Hair et al. (1998), Marsh
and Hocevar (1985), and Tabachnick and Fidell (2001).
In terms of the preliminary fit criteria, three aspects of the statistics were checked: the correlations among the
four variables were very good, ranging between .60 and .73; the factor loadings fell within the acceptable range,
varying between .71 and .87; and their standard errors were appropriate. These results suggested that no
redundant variables existed, that the four subscales (i.e., strategy categories) were properly distinguished from
each other, and that the construct validity of the SIVL was acceptable.
Regarding the overall model fit, three clusters of goodness-of-fit measures were adopted. The first cluster of
fit statistics yielded a χ2 value of 5.06 with 2 degrees of freedom and a probability greater than .05 (p = .08), a
standardised RMR value of 0.01, and an RMSEA value of 0.054, thereby suggesting that the hypothesised model
was an adequate representation of the sample data and could be accepted. In addition, both the GFI (.995) and
the AGFI (.977), which essentially compare the hypothesised model with the null model, consistently reflected a
good fit to the sample data.
Regarding the second set of fit statistics, a number of incremental/comparative fit indices were all well
beyond the suggested value (>.90). The NFI and CFI, which compare the hypothesised model with the
independence model and provide a measure of complete covariation in the data (Byrne, 2001), were .996
and .997 respectively, as shown in Table 3, consistently indicating that the hypothesised model represented an
excellent fit to the sample data. The IFI, which addresses the issues of parsimony and sample size and is
acknowledged to be linked to the NFI, was .997, likewise pointing to a well-fitting model. The TLI/NNFI, like
the other indices discussed above, produces values ranging from zero to 1.00, with values close to .95 (for large
samples) reflecting good fit (Hu & Bentler, 1999). Accordingly, its value of .992 for the hypothesised model once
again suggested an excellent fit. Although these indices evaluate a model from slightly different perspectives, they
unanimously suggested that the hypothesised model was appropriate.
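These incremental indices, together with the RMSEA, can be computed from the model and null-model chi-square statistics. In the sketch below, the model values (χ2 = 5.06, df = 2, N = 528) come from the text, while the null (independence) model values are hypothetical figures chosen purely for illustration.

```python
import math

# Model fit statistics reported in the text.
chi2_m, df_m, n = 5.06, 2, 528
# Null (independence) model statistics: HYPOTHETICAL values for illustration.
chi2_0, df_0 = 1200.0, 6

# Normed fit index: proportional improvement over the null model.
nfi = (chi2_0 - chi2_m) / chi2_0
# Comparative fit index: improvement in noncentrality (chi-square minus df).
cfi = 1 - max(chi2_m - df_m, 0) / max(chi2_0 - df_0, chi2_m - df_m, 0)
# Tucker-Lewis index: improvement in chi2/df ratios, penalising complexity.
tli = (chi2_0 / df_0 - chi2_m / df_m) / (chi2_0 / df_0 - 1)
# RMSEA depends only on the model itself, not the null model.
rmsea = math.sqrt(max(chi2_m - df_m, 0) / (df_m * (n - 1)))

print(round(nfi, 3), round(cfi, 3), round(tli, 3), round(rmsea, 3))
```

Note that the RMSEA, which does not involve the null model, matches the reported .054; the other three indices depend on the assumed null-model values.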
Table 3
Evaluation of the Measurement Model

Model Fit Criteria | Levels of Acceptable Fit | Evaluation of the SIVL

Preliminary Fit Criteria
Correlations among variables | Not too close to or greater than 1.00 | Very good (.60 ~ .73)
Factor loadings | .50 < λ < .95 | Very good (see Figure 1)
Standard errors | Absence of too large or too small standard errors | Very good (.04 ~ .06)

Overall Model Fit
Chi-square value | Nonsignificant with p-value ≥ .05 | Good (χ2 = 5.06, df = 2, p = .08)
χ2/df | ≤ 2 ~ 5 | Good (2.53)
Standardised root mean square residual (RMR) | ≤ .05 | Good (0.01)
Goodness-of-fit index (GFI) | > .90 | Very good (.995)
Adjusted goodness-of-fit index (AGFI) | > .90 | Very good (.977)
Incremental fit index (IFI) | > .90 | Very good (.997)
Normed fit index (NFI) | > .90 | Very good (.996)
Comparative fit index (CFI) | > .90 | Very good (.997)
Tucker-Lewis index (TLI) | Close to .95 | Very good (.992)
Root mean square error of approximation (RMSEA) | < .05 ~ .08 | Good (.054)
Hoelter’s Critical N (CN) | Hoelter’s .05 and .01 CN values > 200 | Very good (N = 625 at .05, N = 961 at .01)
Ratio of sample size to number of free parameters | Ratio > 5:1 | Very good (ratio ≈ 66:1)

Fit of Internal Structure of a Model
Individual item reliability | ≥ .50 | Good (see Figure 1)
Composite reliability | ≥ .70 | Very good (.89)
Variance extracted | ≥ .50 | Very good (.66)
Significant parameter estimates confirming hypotheses | t value > ±1.96 at p < .05, or t value > ±2.576 at p < .01 | Very good (all > 17 at p < .01)
Regarding the last set of fit statistics, Hoelter’s Critical N (CN) (labelled as Hoelter’s .05 and .01 indices) is
considerably different from the indices discussed earlier in that it focuses directly on the adequacy of the
sample size rather than on model fit (Byrne, 2001). In other words, it estimates the sample size that would be
large enough to yield an adequate model fit for a test. A value over 200 indicates that a model sufficiently
represents the sample data (Hoelter, 1983). As displayed in Table 3, both the .05 and .01 CN values for the
hypothesised model were in excess of 200 (625 and 961 respectively). This finding indicates that the sample size
in this study (N = 528) was satisfactory. Moreover, the ratio of sample size to number of free parameters was
about 66:1, providing additional evidence that the hypothesised model was well-fitting and meaningful.
All the results and findings regarding overall model fit pointed to the overall adequacy of the
hypothesised model; that is, the model was an excellent representation of the sample data. Nevertheless, they did
not explicitly provide information on the nature of individual parameters or other aspects of the internal
structure of the model. It is critical to examine such information in the present situation, as certain parameters
corresponding to hypothesised relations could still be nonsignificant, and/or measures of low reliability could
exist even when the overall model fit reflects a satisfactory model (Bagozzi & Yi, 1988). In other words, overall
model fit is necessary but insufficient proof of model adequacy. Therefore, the fit of the internal structure of the
model was scrutinised for the reliability of the construct. As listed in Table 3, four criterion aspects were
examined, i.e., individual item reliability, the composite reliability of the whole scale, the average variance
extracted from a set of measures of a latent variable, and significant parameter estimates confirming the
hypotheses. While the individual item reliabilities, i.e., the squared multiple correlations of the four indicators,
and the significant parameter estimates can be obtained directly from Amos 20, the composite reliability and
average variance extracted need to be calculated manually using the following two formulas:
1. Construct reliability = (Sum of standardised loadings)2 / [(Sum of standardised loadings)2 + Sum of indicator measurement error]

2. Variance extracted = Sum of squared standardised loadings / [Sum of squared standardised loadings + Sum of indicator measurement error]
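These two formulas can be checked numerically. In the sketch below, the four standardised loadings are illustrative values within the range reported for the strategy categories (.71–.87), not the study's exact estimates; for standardised loadings, each indicator's measurement error is 1 − λ2.

```python
# Composite (construct) reliability and average variance extracted,
# following the two formulas above. Loadings are illustrative values
# within the reported range (.71-.87), not the study's exact estimates.

def construct_reliability(loadings):
    # (sum of loadings)^2 / [(sum of loadings)^2 + sum of measurement errors]
    errors = [1 - l * l for l in loadings]   # error = 1 - lambda^2
    num = sum(loadings) ** 2
    return num / (num + sum(errors))

def variance_extracted(loadings):
    # sum of squared loadings / [sum of squared loadings + sum of errors]
    squared = [l * l for l in loadings]
    errors = [1 - s for s in squared]
    return sum(squared) / (sum(squared) + sum(errors))

loadings = [0.83, 0.87, 0.83, 0.71]  # one per strategy category (illustrative)
print(round(construct_reliability(loadings), 2))
print(round(variance_extracted(loadings), 2))
```

With these illustrative loadings the formulas give values close to the .89 and .66 reported in Table 3.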
As shown in Table 3, the individual item reliabilities for the four strategy categories ranged from moderate to
high in value. The composite reliability was quite high, with the value of .89 greatly exceeding the recommended
threshold value of .70. As a complementary measure to the construct reliability value, the overall amount of
variance extracted in the four indicators (i.e., strategy categories) accounted for by the latent construct of VLS
reached 66%, which also went beyond the suggested level of .50. These results imply that the SIVL is a practical
construct with a satisfactory overall reliability. As for parameter estimates, all the parameter estimates turned out
to be significant at the .01 level, with all t-values greater than ±2.576 (all actually above 17 at p < .01). This
finding suggests that all four indicators were justifiable and key to the hypothesised model (Bagozzi & Yi,
1988; Byrne, 2001).
The evaluation of the measurement model discussed above established the construct reliability and validity
of the SIVL but did not explicitly provide information in terms of the unidimensionality of the scale. An EFA
using the principal axis factoring method was conducted for this purpose. The EFA results revealed that the
majority of the variance was explained by one single factor (above 74%), and the eigenvalue of the second
largest factor was marginal in comparison with the first (.45 vs 2.98). The factor loadings of the four strategy categories on the
one unrotated factor were .83 for Metacognitive Strategies, .87 for Cognitive Strategies, .83 for Memory
Strategies, and .71 for Socio-affective Strategies, which displayed a consistently high pattern. All the above results
provide good evidence for the unidimensionality of the scale; that is, the four strategy categories tapped into one
single underlying trait.
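The variance share reported above follows directly from the eigenvalues: with four standardised observed variables the total variance is 4, so the first factor's share is its eigenvalue divided by 4.

```python
# Share of total variance explained by the first (unrotated) factor,
# using the eigenvalue reported in the text.
first_eigenvalue, n_variables = 2.98, 4
share = first_eigenvalue / n_variables
print(round(share * 100, 1))  # percentage of total variance
```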
Once the unidimensionality of the instrument had been ensured, it was more justifiable to assess its internal
consistency reliability using Cronbach alpha, since one of the assumptions of Cronbach alpha is that the scale is
unidimensional (Hair et al., 1998). As a whole, the 72-item SIVL turned out to have very good internal
consistency reliability, with a Cronbach alpha of .95. The four theoretically assumed strategy categories also
showed consistent reliability, with acceptable Cronbach alpha indices of .84 (MET), .89 (COG), .91 (MEM) and
.75 (SOC) respectively, all beyond the recommended threshold level of .60 (Dörnyei, 2003).
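Cronbach alpha itself is straightforward to compute from raw item responses; a minimal stdlib sketch follows (the response data here are invented for illustration).

```python
# Cronbach alpha: k/(k-1) * (1 - sum of item variances / variance of totals).
from statistics import pvariance

def cronbach_alpha(items):
    """items: one list of responses per item, respondents in the same order."""
    k = len(items)
    totals = [sum(resp) for resp in zip(*items)]  # each respondent's total
    return k / (k - 1) * (1 - sum(pvariance(i) for i in items) / pvariance(totals))

# Perfectly consistent (identical) items give an alpha of 1.
print(cronbach_alpha([[1, 2, 3, 4, 5]] * 3))
```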
The last stage of the validation procedures was to statistically explore the theoretically assigned subcategories
within each of the four strategy categories by running an EFA.
Within Metacognitive Strategies, four factors were retained, explaining a total variance of about 54%. Table
4 demonstrates the factor loadings of each item on its corresponding factor(s). Although four items seemed to
load on two factors, their loadings on one of the factors (e.g. .72 for Item 9 on F1) were far higher than those on
the other (e.g. .35 for Item 9 on F4). Therefore, Metacognitive Strategies could be decomposed into four
subcategories, which were identified and labelled as follows:
Organising and Monitoring (F1, 5 items, i.e., Items 2, 9-10, 15-16)
Directed Attention (F2, 4 items, i.e., Items 11-14)
Selective Attention (F3, 4 items, i.e., Items 3-6)
Learning to Learn (F4, 3 items, i.e., Items 1, 7-8)
Consequently, the three theoretically assumed subcategories (Paying Attention (Items 1-6), Arranging &
Planning (Items 7-13), and Monitoring & Evaluation (Items 14-16)) were replaced by the four newly yielded
counterparts, which seemed to be supported both statistically and practically.
Table 4
Factor Loadings of the Four Subcategories within Metacognitive Strategies
Metacognitive Strategies
Item Brief Description F1 F2 F3 F4
stra9 Plan schedule to have enough time for word study .72 .35
stra10 Have clear goals of improving vocabulary .65 .32
stra16 Self-test vocabulary .62
stra15 Think about progress in learning words .62 .33
stra2 Break lists into parts .47
stra12 Use various means to make clear unsure words .72
stra11 Care about words the teacher doesn’t emphasise .68
stra14 Aware when I incorrectly used a word and use the information to do better .66
stra13 Associate a new word with a known one that sounds similar .63
stra4 Know when a new word is essential for comprehension .72
stra3 Know when to skip a new word .72
stra5 Know important words for learning .70
stra6 Look up interesting words .46 .37
stra7 Try to find as many ways as possible to use new words .73
stra8 Try to find ways to become a better word learner .30 .72
stra1 Pay attention to vocabulary use in speech .49
Within Cognitive Strategies, seven factors were extracted, accounting for a total variance of over 63%. Seven
out of the 25 items turned out to load on two factors. We decided to place each of them under the factor on
which it had the higher loading, although one of the seven items (i.e., Item 26) loaded highly on both factors (F4
and F6). This may be because F4 and F6 are both concerned with referring to resources. The results turned out
to be generally consistent with the theoretically assigned subcategories, except that Using Dictionaries was split
into two factors (i.e., F3 and F4). On closer inspection, the 5 items pooled together under F3 seemed to focus on
referring to dictionaries as a lexical resource, while the 4 items under F4 seemed to be more concerned with how
to look up a word. Consequently, the seven factors within Cognitive Strategies were identified and labelled as follows:
Activation (F1, 5 items, i.e., Items 37-41)
Guessing (F2, 4 items, i.e., Items 17-20)
Choosing Dictionaries as a Lexical Resource (F3, 5 items, i.e., Items 21-25)
Looking Up (F4, 4 items, i.e., Items 26-29)
Taking Notes (F5, 3 items, i.e., Items 32-34)
Using Study Aids (F6, 2 items, i.e., Items 30-31)
Repetition (F7, 2 items, i.e., Items 35-36)
Within Memory Strategies, five factors, accounting for a total variance of over 54%, were retained. Fourteen
of the 24 items loaded on more than one factor. Thirteen of them were placed under whichever factor they
loaded on more highly. The only exception (Item 61), which loaded on both F2 and F3, was put under F3 even
though its loading there was slightly lower than on F2 (.43 vs .48). As a result, the five factors were identified and
labelled as:
Association/Elaboration (F1, 7 items, i.e., Items 43-46, 48-50)
Word Structure (F2, 4 items, i.e., Items 42, 51-53)
Other Memory Strategies (F3, 6 items, i.e., Items 60-65)
Applying Images (F4, 4 items, i.e., Items 47, 54-55, 59)
Visual Encoding (F5, 3 items, i.e., Items 56-58)
Compared with the theoretically assumed subcategories within Memory Strategies, three factors (i.e., F1, F2
& F5) were named after three of the theoretically assumed subcategories, as they were generally consistent with
each other, although several individual items under the three subcategories were relocated. F4 was termed
Applying Images because the four items under it were all concerned with using images to memorise vocabulary.
As for F3, which involved the three items theoretically assigned to Contextual Encoding plus three others
originally representing three theoretically assumed subcategories (i.e., Reviewing, Using Keywords, and
Paraphrasing), its items had far less in common than those of the other factors, so it could not be labelled in the
same way. Therefore, we labelled it Other Memory Strategies.
Within Socio-affective Strategies, two clear factors were extracted, explaining a total variance of about 61%.
This finding turned out to be in accordance with the theoretically assumed two subcategories: Questioning for
Clarification (F2, 2 items, i.e., Items 66-67) and Managing Emotion (F1, 5 items, i.e., Items 68-72).
Therefore, it seems that the theoretically assumed subcategories within each of the four strategy categories
are generally supported by the results of the factor analyses. On the one hand, the results provide plenty of
evidence for the existence of the theoretically assumed subcategories within Cognitive Strategies and Socio-
affective Strategies. On the other hand, the four subcategories within Metacognitive Strategies resulting from the
factor analysis seem more justifiable than the original three, although they do share similarities with them to a
certain extent. Memory Strategies turned out to be a more complicated category with multiple subcategories.
Conclusion
The Strategy Inventory for Vocabulary Learning (SIVL) was developed through three stages. In the first stage,
170 items were pooled from various existing inventories and reduced to 110 items. In the second stage, the
instrument was shortened, mainly using the results of descriptive statistics and item analysis, and was then
validated using reliability analysis and a combination of item analysis and correlation analysis. As a result, a
shorter, 72-item version of the SIVL emerged with good reliability and content-related and construct-related
validity. Finally, the psychometric properties of the refined SIVL were assessed using confirmatory and
exploratory factor analyses. The results revealed that the SIVL had satisfactory psychometric features and that
the hypothesised theoretical model had a good fit to the sample data. This confirming evidence implies that the
SIVL can serve as a reliable and valid research instrument for evaluating Chinese EFL learners’ vocabulary
learning strategy use at the tertiary level.
References
Ahmed, M. O. (1989). Vocabulary learning strategies. In P. Meara (Ed.), Beyond Words (pp. 3-14). London: CILT.
Bagozzi, R. P. (1981). An examination of the validity of two models of attitude. Multivariate Behavioural Research,
16, 323-359.
Bagozzi, R. P., & Yi, Y. (1988). On the evaluation of structural equation models. Journal of the Academy of Marketing
Science, 16(1), 74-94.
Byrne, B. M. (2001). Structural Equation Modeling with AMOS: basic concepts, applications, and programming. Mahwah,
New Jersey: Lawrence Erlbaum Associates.
Carter, R. (1998). Vocabulary: applied linguistic perspectives (2nd ed.). London: Routledge.
Chesterfield, R., & Chesterfield, K. B. (1985). Natural order in children's use of second language learning
strategies. Applied Linguistics, 6(1), 45-59.
Coady, J., & Huckin, T. (Eds.). (1997). Second Language Vocabulary Acquisition. Cambridge: CUP.
Dörnyei, Z. (2003). Questionnaires in Second Language Research: construction, administration, and processing. Mahwah, New
Jersey: Lawrence Erlbaum Associates, Inc.
Dörnyei, Z. (2005). The Psychology of the Language Learner: individual differences in second language acquisition. Mahwah,
NJ: Lawrence Erlbaum.
Denscombe, M. (2003). The Good Research Guide: for small-scale social research projects (2nd ed.). Buckingham: Open
University Press.
DeVellis, R. F. (1991). Scale Development Theory and Applications. London: Sage Publications.
Doll, W. J., Xia, W., & Torkzadeh, G. (1994). A confirmatory factor analysis of the end-user computing
satisfaction instrument. MIS Quarterly, 18(4), 453-461.
Gefen, D. (2003). Assessing unidimensionality through LISREL: an explanation and example. Communications of
the Association for Information Systems, 12, 23-47.
Green, S. B., & Salkind, N. J. (2003). Using SPSS for Windows and Macintosh: analyzing and understanding data (3rd ed.).
Upper Saddle River: Prentice Hall.
Gu, Y., & Johnson, R. K. (1996). Vocabulary learning strategies and language learning outcomes. Language
Learning, 46(4), 643-679.
Hair, J. F., Anderson, R. E., Tatham, R. L., & Black, W. C. (1998). Multivariate Data Analysis (5th ed.). Upper
Saddle River, New Jersey: Prentice-Hall.
Hatch, E., & Lazaraton, A. (1991). Design and Statistics for Applied Linguistics: the research manual. New York: Newbury
House Publishers.
Hoelter, J. W. (1983). The analysis of covariance structures: goodness-of-fit indices. Sociological Methods & Research,
11, 325-344.
Hsiao, T., & Oxford, R. (2002). Comparing theories of language learning strategies: a confirmatory factor
analysis. The Modern Language Journal, 86, 368-383.
Hu, L.-T., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: conventional
criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6, 1-55.
Kudo, Y. (1999). L2 vocabulary learning strategies (NFLRC NetWork #14) [HTML document]. Honolulu:
University of Hawai`i, Second Language Teaching & Curriculum Center. Retrieved June 24, 2003, from
http://www.nflrc.hawaii.edu/NetWorks/NW14/
Manyak, P. C. (2010). Vocabulary instruction for English learners: Lessons from MCVIP. The Reading Teacher,
64(2), 143-147.
Marsh, H. W., & Hocevar, D. (1985). Application of confirmatory factor analysis to the study of self-concept:
first- and higher order factor models and their invariance across groups. Psychological Bulletin, 97(3), 562-
582.
Meara, P. (1980). Vocabulary acquisition: a neglected aspect of language learning. Language Teaching and Linguistics:
abstracts, 15(4), 221-246.
Meara, P. (1995). The importance of an early emphasis on L2 vocabulary. The Language Teacher, 19(2), available at
http://jalt-publications.org/tlt/files/95/feb/meara.html (date of access: 19 January 2005).
Meara, P. (2005). Lexical frequency profiles: A Monte Carlo analysis. Applied Linguistics, 26(1), 32-47.
Nation, I. S. P. (1990). Teaching and Learning Vocabulary. Boston: Heinle & Heinle.
Nation, I. S. P. (2006). Vocabulary: Second Language. In K. Brown (Ed.), Encyclopedia of Language & Linguistics
(2nd ed., pp. 448–454). Oxford: Elsevier.
Nation, I. S. P., & Newton, J. (1997). Teaching vocabulary. In J. Coady & T. Huckin (Eds.), Second Language
Vocabulary Acquisition (pp. 238-254). Cambridge: CUP.
O'Malley, J. M., & Chamot, A. U. (1990). Learning Strategies in Second Language Acquisition. Cambridge: CUP.
Oxford, R. (1990). Language Learning Strategies: what every teacher should know. Boston: Heinle & Heinle.
Oxford, R., & Burry-Stock, J. (1995). Assessing the use of language learning strategies worldwide with the
ESL/EFL version of the strategy inventory for language learning (SILL). System, 23(1), 1-23.
Politzer, R., & McGroarty, M. (1985). An exploratory study of learning behaviors and their relationship to gains
in linguistic and communicative competence. TESOL Quarterly, 19, 103-123.
Read, J. (2000). Assessing Vocabulary. Cambridge: CUP.
Schmitt, N. (1997). Vocabulary learning strategies. In N. Schmitt & M. McCarthy (Eds.), Vocabulary: description,
acquisition and pedagogy (pp. 199-227). Cambridge: CUP.
Schmitt, N. (2000). Vocabulary in Language Teaching. Cambridge: CUP.
Schmitt, N., & McCarthy, M. (Eds.). (1997). Vocabulary: description, acquisition and pedagogy. Cambridge: CUP.
Schunk, D. H. (2001). Social cognitive theory and self-regulated learning. In B. J. Zimmerman & D. H. Schunk
(Eds.), Self-regulated Learning and Academic Achievement: theoretical perspectives (2nd ed., pp. 125-151). Mahwah,
NJ: Lawrence Erlbaum Associates.
Stoffer, I. (1995). University Foreign Language Students' Choice of Vocabulary Learning Strategies as Related to Individual
Difference Variables. Doctoral dissertation, The University of Alabama, Tuscaloosa, Alabama.
Tabachnick, B. G., & Fidell, L. S. (2001). Using Multivariate Statistics (4th ed.). Boston: Allyn and Bacon.
Tseng, W., Dörnyei, Z., & Schmitt, N. (2006). A new approach to assessing strategic learning: the case of self-
regulation in vocabulary acquisition. Applied Linguistics, 27(1), 78-102.
Waring, R. (2002). Basic principles and practice in vocabulary instruction. Retrieved 19 January, 2005, from
http://www.jalt-publications.org/tlt/articles/2002/07/waring
Yang, N. (1999). The relationship between EFL learners' beliefs and learning strategy use. System, 27, 515-535.
Zimmerman, B. J. (1989). A social cognitive view of self-regulated academic learning. Journal of Educational
Psychology, 81, 329-339.
Zimmerman, B. J. (2000). Attaining self-regulation: a social cognitive perspective. In M. Boekaerts, P. R. Pintrich
& M. Zeidner (Eds.), Handbook of Self-regulation (pp. 13-39). San Diego: Academic Press.
Appendix A
The 72-item SIVL
29. I try to integrate dictionary definitions into the context where the unknown was met and arrive at a
contextual meaning by adjusting for complementation and collocation, part of speech, and breadth of
meaning.
30. I use audio, video, computer aids to learn or consolidate my vocabulary.
31. I learn words written on commercial items.
32. I make a note of the meaning of a new word when I think it is commonly-used or interesting.
33. I take notes when I look up a word.
34. I make notes when I want to help myself distinguish between the meanings of two or more words.
35. I remember a new word by saying it repeatedly.
36. I memorise a new word by writing it repeatedly.
37. I try to read as much as possible so that I can make use of the words I tried to remember.
38. I make up my own sentences using the words I just learned.
39. I try to use the newly learned words as much as possible in speech and writing.
40. I try to use newly learned words in real situations.
41. I try to use newly learned words in imaginary situations in my mind.
61. I deliberately read books in my areas of interest so that I can find out and remember the special terminology
that I know in Chinese.
62. I associate a new word with its preceding/following words to remember it better.
63. I review new words soon after the initial meeting.
64. I link new words to similar sounding Chinese words.
65. I paraphrase the word’s meaning.
Dr. Wen-Cheng Hsu obtained his PhD from the University of Nottingham. His teaching experience spans more
than 15 years across various levels and cultures. He is now teaching in the Language Centre, Xi’an Jiaotong-
Liverpool University. His research interests revolve around learning strategies, learner autonomy, teacher
autonomy, and EAP.
Jelena Colovic-Markovic*
West Chester University of Pennsylvania, USA
Abstract
This study attempts to determine whether the students who receive explicit instruction make more gains in their abilities to
use topic-induced phrases in their writing than those who do not. Additionally, through interviews with a selected group of
students from the treatment group, the study attempts to glean insights into the approaches learners use for written
production of the target phrases. Data was collected from 54 ESL students in high-intermediate writing classes at an IEP
who were assigned to the contrast (N=19) and treatment (N=35) groups based on their class enrollment. Over a period of
four days, the treatment group received training on 15 target structures. The contrast group received no vocabulary
instruction. Both groups were exposed to the target phrases through reading the same course materials and discussing them
in class. The data included the scores participants received on the production of the target structures in their essays at the
beginning and end of term. A repeated-measures ANOVA revealed that while both groups improved, the
treatment group made significantly greater gains in its ability to produce topic-induced phrases than the contrast
group did. The interview findings indicated that students' perceptions of the usefulness of the target structures may
influence whether or not learners employ them in writing. The study findings suggest that explicit instruction
benefits writers' abilities to produce topic-induced phrases. These findings have implications for ESL writing pedagogy.
Key words: explicit instruction, topic-induced phrases, topic-related vocabulary, ESL writing.
* Tel: + 1 610-436-3371; E-mail: jmarkovic@wcupa.edu; 233 Mitchell Hall, West Chester University of Pennsylvania, West
Chester, PA 19383, United States of America
The importance of vocabulary in writing is also seen from the perspective of ESL learners. In a survey that
Leki and Carson (1994) employed with 128 ESL undergraduate students to gather data on the student perceived
effectiveness of an English for academic purposes writing course, learners reported that it was vocabulary
instruction that they had needed the most. Similarly, and more recently, in an interview that Coxhead (2012)
conducted with learners of English as an additional language in New Zealand, learners reported the need for
technical, academic, or professional words to express their ideas in writing.
The evidence coming from the literature on vocabulary and writing, research on the factors contributing to
ESL essay quality, assessment tools used in evaluation of ESL essays, and students’ perceptions of what needs to
be included in ESL writing instruction emphasizes the need for focused attention on vocabulary in ESL
writing instruction.
Research Questions
To fill this gap in the literature, the present study aims to answer the two questions presented below. The first
question is addressed through quantitative and the second through qualitative data elicitation and analysis, as
described in the next section of the paper.
1. Do the students who receive explicit instruction make more gains in their abilities to use topic-induced
phrases in their writing than those who do not?
2. If so, how do the students receiving explicit instruction go about producing topic-induced phrases in
their writing?
Methodology
Overview of the research design
This study was a part of a larger study investigating the effects of explicit teaching of multi-word phrases on ESL
writers. The research project uses a quasi-experimental design in which the study participants are assigned to
treatment and contrast groups based on the class in which they are enrolled. The study was conducted in writing
classes for intermediate-level proficiency students at an Intensive English Program (IEP). Instructional periods in
the IEP are divided into terms of eight weeks, with two terms occurring each semester. The writing teacher
taught the contrast group first and then the treatment group.
The class focused on writing argumentative essays. For the writing course, participants wrote three multi-
draft essays. Both groups followed the same syllabus. They read and discussed the same reference materials prior
to submission of the final draft of each essay. They completed the same activities from the textbook for the
course and were taught by the same instructor, who was different from the researcher, to reduce the effects of the
teacher variable on the results. The contrast and treatment groups were given the same composition assignments.
As noted previously, students wrote three multi-draft essays for the class. This study concerns the
essays written on the third topic examined in the class. The treatment group was taught topic-induced
phrases on the topics of the two other essays prior to submission of their respective final drafts.
The contrast group received no explicit instruction on topic-induced phrases. The teacher was directed to
instruct students in this group as she had been doing prior to the participation in the present study. It is possible,
however, that the teacher explained the meaning of specific vocabulary, including the target items, when students
asked about or appeared confused by some words during post-reading activities and in-class discussions. The
group was exposed to the target phrases only through reading, in-class discussions and textbook activities, that is,
in a manner of delivery that the teacher had been using prior to the present study.
Besides gathering data for quantitative analysis, the study attempts to glean insights into the approaches
learners used for written production of the target phrases, specifically the strategies that would distinguish
the learners whose production was limited (low performing) from those whose production was more
extensive (high performing). For this purpose, individual semistructured interviews were conducted at the end of
the treatment with a subset of students from the treatment group. The interview questions referred to
the students' writing topic and were as follows:
a) How did you go about incorporating the phrases about international adoptions in your first/second
essay?;
b) In the writing class, your teacher used many different activities to help you learn the phrases on the topic
of international adoptions. In your opinion, which of these activities helped you learn the phrases best?; and
c) Which of the activities were not helpful to you?
The wording and sequence of the interview questions remained the same for each informant; however,
probes were used to elicit additional information as the need arose.
Participants
Data was collected from 54 ESL students from five intact high-intermediate writing classes at an Intensive
English Program (IEP) in the western United States. The ESL courses at the IEP are designed to support
development of language skills for academic studies primarily, but also professional communication. The study
participants had all taken a standardized English proficiency placement exam for the IEP. Some were placed
directly in the high-intermediate level class by the internal placement test; others moved from the intermediate to
the high-intermediate level after passing the final exams at the previous level. There were 19 students in
the contrast and 35 students in the treatment group. The participants came from various language backgrounds
(Arabic=11, Bambara=1, French=1, Japanese=26, Korean=6, Mandarin=1, Portuguese=1, Russian=1,
Spanish=2, Thai=2, and Turkish=1). 41% were male and 59% were female. 46% of the participants were under
the age of 20, 50% were between the ages of 21 and 30, 2% were between the ages 31 and 40, and 2% were
over the age of 41.
Target Items
The target topic-induced word combinations were taken from the passages in Numrich’s (2009) Raise the Issues: An
integrated approach to critical thinking, Unit 3. The texts were a part of required reading materials on the topic of
international adoption. The target items were initially located using KeyWords Extractor v.1 (2007) and N-gram
Phrase Extractor v.4 (Cobb, 2010) and subsequently submitted to manual investigation. Both programs were
available at no cost at www.lextutor.ca. KeyWords Extractor v.1, lexical software used to identify
single words that appear in a text with unusual frequency compared to a reference, calculates word
frequencies on a per million word basis and uses the Brown corpus, a corpus of one million words of American
English, as a reference. The N-gram Phrase Extractor program generates a list of n-grams occurring with the
frequency of two and higher in the texts under investigation.
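The two extraction steps can be approximated in a few lines. The sketch below is illustrative only and is not the actual lextutor code: `extract_ngrams` mimics the frequency-two-and-higher cutoff described for the N-gram Phrase Extractor, and `per_million` shows the normalisation that a keyword comparison against the Brown corpus relies on. All function names and the sample text are hypothetical.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of length n in a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def extract_ngrams(text, n_values=(2, 3), min_freq=2):
    """N-grams occurring min_freq times or more, mirroring the
    frequency-two-and-higher cutoff described for the extractor."""
    tokens = text.lower().split()
    counts = Counter()
    for n in n_values:
        counts.update(ngrams(tokens, n))
    return {g: c for g, c in counts.items() if c >= min_freq}

def per_million(count, total_tokens):
    """Normalise a raw frequency to a per-million-words basis, as the
    keyword comparison against a reference corpus requires."""
    return count / total_tokens * 1_000_000

sample = ("adoption agencies screen prospective adoptive parents "
          "and adoption agencies place a child for adoption")
print(extract_ngrams(sample))  # {'adoption agencies': 2}
```

A real keyness calculation would then compare each word's per-million frequency in the study texts against its per-million frequency in the reference corpus.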
The words identified through KeyWords Extractor v.1 (2007) were manually compared to the formulaic
sequences produced by the N-gram Phrase Extractor (Cobb, 2010) analysis because there were several instances in
which words the former program identified as key words were not found in the list generated by the N-gram
Phrase Extractor program. Since the writing assignment required that the students write in favor of or in opposition
to international adoptions, it was important to select word combinations that could be used in support of both
sides of the controversial issue. The subsequent manual investigation yielded additional word combinations that
were included in the final list of target vocabulary items. There were 15 topic-induced word combinations used
in explicit instruction (i.e., a victim/s of violence; adoption agency/ies; corruption in a country/adoption; criteria for adoption;
foreign adoptions; inter-country adoptions; international adoptions; orphaned children; place a child for adoption; place a child in a
foreign family; prospective adoptive parents; reopen adoption to foreigners; requirements for adoption; to be adopted into; to be placed
with a family/families).
Table 1
Overview of the Research Design
Week Treatment group Contrast group
2 Data collection (pre-test)
7-8 Explicit teaching No explicit teaching
8 Data collection (post-test)
Over a period of four classes, the students completed five multi-step activities (see Table 2). They spent
about 60 minutes of class time on the activities. The teacher referred to the topic-induced word combinations as
“phrases”, monitored students’ production of the target phrases and provided feedback when necessary.
At the start of the first class, students were given a 249-word passage on the topic of international
adoptions to read as many times as they could within a five-minute time frame. The passage was created by the
researcher based on the reading materials from the course textbook. After the students read the text, they were
presented with the same text but with segments of the target vocabulary removed. They were asked to fill in the
missing word parts and, upon completion, to compare answers with a partner. Next, the students were
presented with a set of questions designed to elicit productive recall of the target items.
At the end of the second class, the treatment group was asked to do a matching cloze-type activity
consisting of selected topic-induced word combinations offered in a box and referred to as a “word bank” and
sentences with blanks. The activity required that students a) examine selected phrases in a word bank and
sentences below the phrases and b) complete the sentences using the items in the word bank. They were directed
to make changes to the phrases in order to produce grammatical sentences. Students worked in pairs.
In the next class, students engaged in the 2/1/30 activity, a modified version of the 4/3/2
activity (Nation & Gu, 2007). They sat in two rows facing one another. The learners sitting in one row were
assigned the role of speaker and those sitting in the other row the role of listener. They were given a copy of
the text used in the previous day's activity to read as many times as they could within three minutes. After
reading the passage, the speakers were directed to retell the passage to one partner within 2 minutes, to another
within one minute, and finally to the third within 30 seconds. The listeners were directed to listen, take notes, and
not to interrupt the speakers. Having delivered their speeches to three different partners, learners changed roles.
When done, learners were asked to briefly review their notes and compare their own performance to the
performance of their partners when doing the speaking task.
The final activity was entitled “Build an argument.” It was a two-part writing activity. The students worked
in pairs. First, the students were directed to utilize a selected subgroup of target word combinations in building
three arguments and write them down. One argument had to be written in support of international adoption.
The second argument had to be created in opposition to international adoption. For the third argument, students
could choose whether to support or refute the controversial issue. Once the writing was completed, students
underlined the target phrases in the written arguments and exchanged them with another pair of students for
peer review. In the second part of the activity, students were asked to revise and edit the three arguments
completed by the other group to the best of their abilities. They were asked to focus their attention on the use of
the target phrases (those that had been underlined in the arguments).
All of the activity types (e.g., matching, fill in the blanks, build an argument), with the exception of
the 2/1/30 activity, were piloted with a group of high-intermediate students not included in the study. Based on
the input received from the teacher, the matching activity was modified from a group to a pair activity, and less
material was removed from the target phrases in the fill-in-the-blanks activities.
Table 2
Overview of the Activities by Lessons for the Treatment Group
Lesson Treatment group
1 5-minute read and word completion
Answering questions
2 Matching cloze
3 “2/1/30” activity
4 “Build an argument” activity
The posttest was administered at the end of the treatment, which coincided with the end of the term.
During the posttest, participants in the study were allowed access to the reading materials on the topic of
international adoptions required for the course, just as writers normally have access to their writing resources. The texts
accessible to the treatment group had no target phrases in bold type. The students wrote essays by hand. The
essays were collected in the classroom.
The contrast group, as noted previously, read and discussed the same texts as the treatment group. While
the treatment group was receiving explicit instruction, the contrast group was engaged in extended discussion
tasks based on the content of the reading materials, analysis of the arguments presented in the texts, and brief
writing-oriented tasks, as the teacher devised.
3 - correct phrase; spelling issues possible but cannot be mistaken for issues with inflectional
and/or derivational affixation;
2 - correct phrase; problems with inflectional morphology (e.g., reopen adoption to foreigner instead of
reopen adoption to foreigners);
1 - incorrect phrase, but an evident attempt at producing the correct phrase, with one of the following:
a) problems with derivational morphology (e.g., victims of violent instead of victims of violence);
b) substitution of a preposition (e.g., place a child of adoption instead of place a child for adoption);
c) omission of a function word inside the phrase (e.g., place a child adoption instead of place a child
for adoption);
0 - no attempt to produce a target phrase OR any combination of the issues described under the
rating of 1.
Figure 1. Scale for Measuring the Production of Topic-Induced Word Combinations in Writing.
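The logic of the Figure 1 scale can be expressed as a small scoring function. This is a sketch of the rubric only: the boolean flags stand in for the manual morphological judgements the researcher made by hand (they are not automated here), the spelling tolerance at level 3 is left to the human rater, and every name is hypothetical rather than part of the study's materials.

```python
def score_phrase(produced, target, *,
                 inflection_issue=False,
                 derivation_issue=False,
                 preposition_substituted=False,
                 function_word_dropped=False):
    """Score one phrase occurrence on the Figure 1 scale (0-3).
    The flags encode the manual judgements described in the rubric."""
    flaws = [derivation_issue, preposition_substituted, function_word_dropped]
    if produced == target:
        return 3                       # correct phrase
    if inflection_issue and not any(flaws):
        return 2                       # inflectional morphology only
    if sum(flaws) == 1 and not inflection_issue:
        return 1                       # exactly one level-1 flaw, attempt evident
    return 0                           # no attempt, or combined issues

print(score_phrase("reopen adoption to foreigner",
                   "reopen adoption to foreigners",
                   inflection_issue=True))          # 2
print(score_phrase("victims of violent",
                   "victims of violence",
                   derivation_issue=True))          # 1
```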
Two computer programs, namely, Text-Lex Compare v.2.2 (Cobb, 2010) and Microsoft Word version
2007, were used for identification of the target topic-induced word combinations in participants' compositions.
While the former was employed to detect the presence of the target items in the students’ texts, the latter, with its
search feature “Find”, was used to identify the location of the target structures in the participants’ compositions.
Each time the target structure was located, the researcher examined the topic-induced word combination to
determine whether a) the form and use of the structure matched the form and use of the target item; b) the word
combinations were a part of students’ prose or the quoted and/or unquoted reference materials; c) there were
instances of an overlap of two or more target items. The researcher bolded all of the target items in the
document and recorded her notes in the table along with the results of the Text-Lex Compare program.
After the topic-induced word combinations identified by the Text-Lex Compare program were located and
marked in bold in the text, the researcher continued the examination of the compositions using the Microsoft
Word program and its "Find" feature to locate possibly flawed structures (e.g., issues with spelling, problems with
morphology, dropped words within the formulaic sequences). The search was conducted by entering partially
realized forms of the target items as search criteria. To illustrate, when the essays were examined for the
occurrences of victims of violence, the following search criteria were submitted: victim and violen. The topic-induced
phrases that appeared in the essay prompt (orphaned children, international adoption) were included in the analysis.
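The partial-form search can also be mimicked programmatically. The sketch below is a rough stand-in for the manual Word "Find" procedure described above, not a tool the study used: it reports character offsets where every partial stem co-occurs within a short window; the window size and example sentence are arbitrary assumptions.

```python
import re

def candidate_spans(text, stems, window=40):
    """Offsets where the first stem occurs and all remaining stems
    appear within `window` characters of it -- a rough programmatic
    stand-in for searching with partial forms such as 'victim' and
    'violen' to catch flawed realisations of a target phrase."""
    hits = []
    for m in re.finditer(re.escape(stems[0]), text, re.IGNORECASE):
        span = text[m.start():m.start() + window]
        if all(re.search(re.escape(s), span, re.IGNORECASE) for s in stems[1:]):
            hits.append(m.start())
    return hits

essay = "Many victims of violent need protection."
print(candidate_spans(essay, ["victim", "violen"]))  # [5]
```

Each flagged offset would still need manual inspection, just as in the study, since a stem match does not guarantee an attempt at the target phrase.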
The process of identification of the target items in the students' compositions was repeated three times
over a period of two days to assure the reliability of scoring of data. The researcher took 15- to 30-minute breaks
between searches after every 5 target items.
After the researcher located and bolded the target structures in the students' compositions, she reviewed the
essays to exclude from the analysis the word combinations that appeared to be a part of the material borrowed
from reference sources and not student-generated text. The researcher evaluated the formulaic sequences using
the scoring guide presented in Figure 1. The final score given to an essay was the sum of the scores given to each
phrase occurrence in the text. If there were multiple occurrences of the same topic-induced word combination,
an average of the scores assigned to each occurrence was computed and included in the calculation of the final
score.
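The aggregation rule (sum across phrases, with repeated occurrences of a phrase contributing the mean of their scores) is easy to state in code. A minimal sketch, with hypothetical names and made-up occurrence scores for illustration:

```python
from statistics import mean

def essay_score(occurrence_scores):
    """Final essay score: sum over distinct target phrases, where a
    phrase produced several times contributes the mean of its
    occurrence scores, per the scoring procedure described above.

    occurrence_scores maps each target phrase to the list of rubric
    scores (0-3) its occurrences received."""
    return sum(mean(scores) for scores in occurrence_scores.values())

scores = {
    "adoption agencies": [3, 3],      # two correct occurrences -> 3
    "victims of violence": [1],       # one flawed occurrence   -> 1
    "criteria for adoption": [3, 1],  # mean of 3 and 1         -> 2
}
print(essay_score(scores))  # 6
```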
The data for the study included the scores students received on the pre- and posttests on the production of
topic-induced word combinations in an unannounced in-class 40-minute argumentative essay.
Results
Differences Between Contrast And Treatment Groups
Table 3 offers the means and standard deviations for the scores participants received on the production of topic-
induced word combinations in essays at the start and end of the term.
Table 3
Mean Scores and Standard Deviations for both Measures by Group
                 Contrast                    Treatment
Measure          n      M       SD           n      M       SD
Pretest          19     4.08    2.16         35     3.79    1.90
Posttest         19     4.84    2.27         35     8.71    5.40
The research question that motivated this study was whether the students who received explicit instruction
improved their abilities to use the target topic-induced phrases in writing more, from pre-test to posttest, than
those who did not. To compare the gains over time between the two groups, an ANOVA with repeated measures
was performed with time (pretest vs. posttest) as a within-subjects factor and group (treatment vs. contrast) as a
between-subjects factor. The assumptions of normal distribution of data and homogeneity of variances were not met.
Larson-Hall (2016) explains that the problem with violating these assumptions is that statistical differences that
exist between groups of participants may not be found (p. 100). The analysis for this study nonetheless finds
statistically significant results, as described below.
There was a statistical interaction between group and time, meaning that the groups did not perform the
same way at the two time points (F(1, 52) = 10.84, p = .0017886, generalized eta-squared = .08). The interaction
between group and time accounted for 8% of the variance in the model. Because each factor had only two
levels (two times and two groups), no sphericity statistics were offered.
There was also a statistical effect for time (F(1, 52) = 32.75, p < .0001, generalized eta-squared = .20). In this
model, time made the bigger difference, accounting for 20% of the variance. Since only two times were
tested, it can be concluded from the mean scores (see Table 3) that the participants did better on the
posttest than on the pretest. There was a statistical effect for group (treatment vs. contrast) as well (F(1, 52) = 5.27,
p = .03, generalized eta-squared = .06). The effect for group was not as great as the effect for time, accounting for
6% of the variance. Since there are only two groups, it can be concluded from the mean scores (see Table 3) that the
treatment group performed better than the contrast group.
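In a 2x2 mixed design, the group-by-time interaction is equivalent to a one-way ANOVA on gain scores (posttest minus pretest), which makes the key comparison easy to check from first principles. The sketch below computes the F ratio by hand on made-up gain scores; the numbers are illustrative only and are not the study's data.

```python
def one_way_f(*groups):
    """F statistic for a one-way ANOVA, computed from first principles:
    between-groups mean square divided by within-groups mean square."""
    all_vals = [x for g in groups for x in g]
    grand = sum(all_vals) / len(all_vals)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_vals) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

contrast_gains = [1, 0, 2, 1, 0]    # modest pre-to-post gains
treatment_gains = [5, 4, 6, 5, 4]   # larger gains under instruction
f = one_way_f(contrast_gains, treatment_gains)
print(round(f, 1))  # 57.1
```

A large F on gains corresponds to the significant group-by-time interaction reported above; a dedicated statistics package would also supply the p value and effect size.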
The results suggest that both groups made gains in their abilities to produce topic-induced word
combinations from pretest to posttest, but that the treatment group made greater gains than the contrast group.
These findings suggest that, at least for intermediate ESL writers, students who receive direct instruction seem
to improve their abilities to employ the topic-induced word combinations in their compositions more than the
learners who do not.
Interviews
Follow-up interviews were conducted with a subset of participants from the treatment group who were selected
on the basis of their abilities to produce topic-induced word combinations on the posttest. Three informants
were male and two were female. Interviews followed a semistructured guide comprised of open-ended questions
about the students’ backgrounds, academic goals, English language training, and, more importantly, about the
strategies students applied to producing the phrases and the attitudes towards the instructional intervention (see
section Overview of the research design for specific questions). Interviews were conducted and tape-recorded by the
researcher. The researcher listened to the information as many times as was necessary in order to represent the
information accurately and take notes while listening. The researcher analyzed the data from the interview by
looking for patterns in the responses of the informants. Pseudonyms are used for all of the informants to ensure
confidentiality. Their language and education profiles are presented in Table 4.
Table 4
Informants’ Language and Education Profiles
Informant      Language      Educational background                              Future plans
Al             Japanese      High-school diploma from a home country             University education in home country
Jumi           Japanese      High-school diploma from a home country             University education in the US
Jack, Jihan    Arabic,       Master's degrees in business and in business        Employment in home country
               Turkish       administration from a home country
Ju             Portuguese    Bachelor's degree in business from a home country   Employment in the US
Al was a recent high-school graduate from Japan. He had lived in the US for two months. His academic
goal was to pursue a degree in teaching English as a foreign language in his native country. Prior to enrolling in
the ESL classes in the US, he had had little opportunity to write extensively in English. He reported that when
writing the in-class essay at the beginning of the semester, he was focused on the content of his composition and
the ideas to use in support of his position on international adoptions; however, when writing in-class on the same
topic at the end of the semester and after having been taught the target phrases, he was focused more on the
vocabulary, paying attention not only to what to say but also how to say it. He used the target phrases in the end-
of-the-semester timed essay because they were important in the discussion of the topic and because he felt that
the phrases could help him express ideas clearly. His attitude towards all of the in-class vocabulary activities was
positive. He felt that all activities provided substantial practice in production. From the course syllabus and
previous writing experience in class, he knew that he would be expected to write another essay, so he paid
attention to the activities in class.
Jumi, like Al, had recently graduated from high school in Japan. She had been studying English in the U.S. for
10 months during which time she had completed four terms at the IEP. Her plan was to study sports medicine at
a university in the United States. Although she had taken four writing classes prior to participation in the study,
she found writing difficult. She explained that a lack of knowledge of the essay topic and of the words to use to
discuss it accounted for her limited use of the topic-induced phrases in the in-class pretest essay. This was,
however, not the case on the posttest when she employed the target phrases in her composition. Among the
activities used in teaching topic-induced phrases in the writing class, Jumi found the one with an immediate
connection to her own writing the most useful. When discussing the pros and cons of international adoptions,
she was in favor of inter-country adoptions, which is why she found it useful to write an argument for foreign
adoptions, a segment of the Build an Argument activity, and why she viewed negatively the other segment of the
activity, which asked her to write against inter-country adoptions. Speaking under time constraints for the 2/1/30 activity was
not enjoyable. The remainder of the activities used in class she found moderately useful.
Ju was a female participant from Brazil. She held a bachelor’s degree in business from her native country
and had been attending ESL classes at the IEP for thirteen months with the goal of finding employment in the
US. Similar to Jumi and Al, Ju was searching for ideas to use in the essay, paying limited attention to the
vocabulary to use. On the posttest, however, having realized that the target phrases were important and necessary
in a discussion of the topic, she purposely used the target phrases and alternated synonymous phrases (e.g., inter-
country adoption, foreign adoption) to improve the quality of her text. Ju reported learning the target phrases on the
topic of adoption in class and was proud that at posttest, she was able to write them down from memory. She
found the phrases taught in class very useful because they related to the topic of the essay she would be asked to
write next. She pointed out how the teacher had been using them in class, how the peers produced them in class
discussions, and how the authors employed them in the texts she read. She saw a purpose in using the topic-
induced phrases in her writing. Similar to Al, she reported that the phrases were important. They helped her
express ideas clearly and talk about the same idea without repeating the same phrase. She concluded her answer
to the question on how she went about using the target phrases with a falling intonation, indicating that there was
nothing more to be said except I used them because I had to use them! Ju reported that, among the activities used in
teaching topic-induced formulaic sequences, Build an Argument was the most useful, and she could not think
of any activities used in instruction that were not helpful to her.
Jihan was a male participant from Turkey. He held one master’s degree in business and another in
engineering. He had been in the US for about nine months. He had completed four sessions of ESL classes. His
professional goal was to find employment in a prestigious foreign firm in his home country. He explained that the
phrases taught in class were not the vocabulary he felt he needed to learn. The vocabulary he explored and
focused on was the vocabulary he self-selected, either because the items were new or interesting to learn. He said
that when writing essays for the class, he was focusing on creating a well-organized, unified, and coherent essay; it
was a problem for him to focus on vocabulary. His approach was to think in his native language and then
translate to English, paying special attention to the writing conventions taught in the writing class. Although
Jihan’s attitude towards the instruction on topic-induced phrases in the writing class was generally negative, he
thought that the Build an Argument activity was useful.
Jack was a male student from the United Arab Emirates. He had been in the United States for a year and
three months. His goal was to continue his academic studies in the United States. He did not use the target
phrases in his writing because he wanted to talk about adoptions in general, not necessarily about international
adoptions. His attitude towards the activities used in instruction of topic-induced phrases was generally neutral
but he, similar to other informants, had a more positive attitude towards the activity Build an Argument.
In summary, while each interview participant reported difficulties in focusing on the vocabulary aspect of
their writing at pretest, only those with a positive attitude towards the instructional intervention knew the
content well enough at posttest to allocate attention to the use of target phrases. This group of students concurred that the
topic-induced word combinations taught in class helped them express their ideas more clearly, which is one
of the main reasons why students employed them in writing. On the other hand, those who failed to recognize
the contribution the target phrases make to the discussion of the topic as well as to appreciate most of the in-
class vocabulary-focused activities, also failed to use the topic-induced phrases in their writing. Interestingly
enough, when a vocabulary-focused activity was both integrated with the writing task and also closely aligned
with the major writing assignments, all of the interview participants expressed appreciation for the teaching
strategy.
Discussion
The findings of this study suggest that ESL learners can improve their abilities to use topic-induced word
combinations in writing when reading texts on a given topic and discussing their content with peers in class.
These results are not surprising. The target phrases the study considered are essential for an effective discussion
of a topic (Erman, 2009); that is, even when relatively few topic-induced phrases are used, they key a reader into
content of the text. Students seemed to recognize this. Also, in class they had exposure to the target items; for
four days, they read on the topic and discussed the readings. Because the students had access to the reading
materials as they wrote their essays both at pretest and posttest, it cannot be claimed that they produced the
target phrases from memory. This may apply particularly to the production of the two target phrases (international
adoption and orphaned children) that were additionally present in the writing prompt (see section Instruments for
qualitative data elicitation and evaluation), given that previous research reports that ESL writers often borrow lexical
phrases from the writing prompts they are given (e.g., Ohlrogge, 2009). There were also no participants from the
contrast group interviewed to provide further evidence on how they went about using the phrases in their essays.
What we do know, however, is that through extended exposure they became familiar enough with the target
phrases to recognize their usefulness and employ them in their own writing. What we still need to find out
through a qualitative analysis is which types and forms of the target topic-induced phrases the students in the
contrast group used in their essays.
The study findings also suggest that the ESL students who receive explicit instruction improve their abilities
to employ the topic-induced word combinations in their compositions more than the learners who do not receive
this instructional intervention. These findings, to an extent, support the findings of Lee (2003) and Lee and
Muncie (2006) on the positive effects of direct instruction on topic-related vocabulary use in writing. The
findings of this study were a result of carefully planned explicit instruction consisting of giving students reading
materials with topic-induced word combinations in bold type; stressing the contribution of the target phrases to
the message of a text; having students produce the topic-induced phrases in controlled situations; directing them
to read, listen, speak, and write the target phrases in an activity under time constraints; and asking them to use
the target phrases in a writing task that is aligned in purpose with the very next major written assignment.
Another very important feature of the instructional intervention was that the target phrases were assumed to be
useful to L2 writers because they had an immediate application to their writing. The findings of the study
support the call for integration of the explicit teaching of vocabulary in writing (i.e., Coxhead & Byrd, 2007;
Folse, 2008; Schmitt, 2000), particularly the teaching of vocabulary students need for their writing (Folse, 2008).
Where discrete differences in the use of the target topic-induced phrases lie between the two groups of students
may be more directly observed through a qualitative analysis of the types and forms of the target topic-induced
phrases in the students’ essays. It might be that the students receiving direct instruction were able to use a greater
variety of the phrases with, perhaps, better accuracy in their end-of-the-term essays. If so, it might be that the
instructional intervention helped students improve vocabulary use overall in their essays. However, whether
or not the use of the topic-induced phrases helped students in the contrast group improve the lexical quality of
their writing remains to be investigated.
As noted previously, the students were allowed to use the reading materials as they wrote their essays, so it may
or may not be the case that they were producing the phrases from memory. What we do know is that due to explicit
teaching, they recognized the relevance and utility of the target phrases to their own writing more than the
students in the contrast group did; and thus incorporated the phrases better in their compositions written at the
end of the term. This suggests, as previous research within the contexts of L1 (Cortes, 2006) and L2 (Jones and
Haywood, 2004) academic writing has indicated, that due to direct instruction, students may increase their
awareness about the importance of the use of multi-word combinations in writing. It is possible, however, that
some of the students in the present study learned, due to the treatment, the target phrases well enough to
produce them from memory. One of the informants, who was considered a high performing participant based on
her ability to use the target phrases at posttest, claimed to have recalled the target items from memory.
The interview data provided details about the students’ strategies for production of the target phrases in
writing, their attitudes towards the target phrases, and the activities used in explicit instruction. The participants
concurred that their written production was affected by their perceived need to employ the target phrases in their
writing. The informants who understood how relevant the target phrases were to the topic their essays examined
were those who employed them more in their writing, while those who did not chose, for the most part, to
disregard them. In addition, it seemed that most of the time, the production of the target phrases was motivated
by students’ intention to showcase knowledge on the topic.
Additionally, helping students realize the utility of the topic-induced phrases in the reading materials on a
specific subject is worth noting. Some students were alerted to the importance of the topic-induced phrases upon
receipt of the reading materials with the target phrases in bold type recurring in a single text and/or across
multiple texts.
With respect to the strategies for production of the target phrases in writing on the first timed essay,
students grappled with generating content for their essays, which ultimately affected the vocabulary choices they
made, so fewer target phrases were used. On the second timed essay, the high performing students felt they knew
the content well enough to pay attention to how to convey meaning with the precision and clarity that the
topic-induced phrases allowed.
Relative to the activities used in the instructional intervention, the interview data indicate that high
performing students value all of the activities focusing on the topic-induced phrases while low performing
students enrolled in writing classes appreciate activities with a direct connection to their own writing. All of the
informants, low performing and high performing alike, noted that one activity that resembled the upcoming
major assignment in purpose and content was most useful.
There are several limitations to be noted in the present study. First, the number of participants in the study
was small and they were all at one level of language proficiency (i.e., high-intermediate). To obtain more
generalizable results and to compare the effect of treatment across proficiency levels, future research would need
to include more participants at various levels of language proficiency. In addition, since the reading materials
were accessible to the students during the writing sessions, the study could not gather information on the
effects of explicit instruction on the students’ abilities to produce topic-induced phrases in free production. Third,
in an effort to minimize the task effects on the students in the treatment group and also to avoid possibly alerting
students to the study being conducted, the target phrases related to the topics of the two other essays were
explicitly taught prior to submission of their respective final drafts. Although the topic-induced phrases
concerned topics different from the one used in data collection, the explicit teaching sessions were similar to the
treatment activities before the data collection in that the students received reading material with the target
phrases marked in bold and completed activities that focused on the production of the target phrases. Future
research could control for this variable. Fourth, in an attempt to minimize the teacher-investigator variable in the
study, the course instructor was different from the study investigator. The researcher was present on the days
when the data for the study was collected. She was in regular contact with the course instructor to provide
materials for the study, to confirm with the teacher that vocabulary was not explicitly taught during the data
collection from the contrast group, and to receive reports on the delivery of the explicit teaching sessions;
however, observations of actual teaching were not conducted. Future research should consider including
observations of the teaching sessions or possibly recording the session for later viewing and review. Fifth, the
present study did not examine descriptively the types and forms of the target topic-induced phrases in the essays
written by the contrast and treatment groups, nor did it explore whether and to what extent the treatment had an
impact on the students’ quality of writing. Further research on the aforementioned limitations is warranted
to refine our understanding of the effects of explicit teaching of the topic-induced phrases on ESL writers.
Relative to the design of tasks that integrate vocabulary and writing, teachers may want to link them as closely as
possible to the purpose for which students are writing their major assignments. By so doing, they are more likely
to contextualize explicit teaching of the topic-induced word combinations thus making instruction meaningful to
the students.
References
Barkaoui, K. (2010). Do ESL essay raters’ evaluation criteria change with experience? A mixed-methods,
cross-sectional study. TESOL Quarterly, 44(1), 31-57.
Biber, D., & Barbieri, F. (2007). Lexical bundles in university spoken and written registers. English for Specific
Purposes, 26, 263-286.
Biber, D., & Conrad, S. (1999). Lexical bundles in conversation and academic prose. In H. Hasselgard and S.
Oksefjell (Eds.), Out of Corpora: Studies in Honor of Stig Johansson (pp. 181-190). Amsterdam:
Rodopi.
Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). Longman grammar of spoken and
written English. London: Longman.
Cobb, T. (2010). N-gram Phrase Extractor (Version 4). Retrieved from http://lextutor.ca/tuples/eng/
Cobb, T. (2010). Text Lex Compare (Version 2.2). Retrieved from http://www.lextutor.ca/text_lex_compare/
Cortes, V. (2006). Teaching lexical bundles in the disciplines: An example from a writing intensive history class.
Linguistics and Education, 17, 391-406.
Cowie, A. P. (1992). Multiword lexical units and communicative language teaching. In P. Arnaud & H. Bejoint
(Eds.), Vocabulary and applied linguistics. London: MacMillan.
Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34, 213-238.
Coxhead, A. (2008). Phraseology and English for academic purposes: Challenges and opportunities. In F.
Meunier and S. Granger (Eds.), Phraseology in foreign language learning and teaching (pp. 149-162).
Amsterdam: John Benjamins Publishing Company.
Coxhead, A. (2012). Academic vocabulary, writing and English for academic purposes: Perspectives from second
language learners. RELC Journal, 43, 137-145.
Coxhead, A. (2014). New Ways in Teaching Vocabulary. Alexandria, Virginia: TESOL Inc.
Coxhead, A., & Byrd, P. (2007). Preparing writing teachers to teach the vocabulary and grammar of academic
prose. Journal of Second Language Writing, 16, 129–147.
Ellis, N., Simpson-Vlach, R., & Maynard, C. (2008). Formulaic language in native and second language speakers:
Psycholinguistics, corpus linguistics, and TESOL. TESOL Quarterly, 42, 375-396.
Ellis, R. (2009). The Study of Second Language Acquisition. Oxford: Oxford University Press.
Engber, C. A. (1995). The relationship of lexical proficiency to the quality of ESL compositions. Journal of
Second Language Writing, 4, 139-155.
Erman, B. (2009). Formulaic language from a learner perspective: What the learner needs to know. In R.
Corrigan, E. A. Moravcsik, H. Ouali, and K. M. Wheatley (Eds.), Formulaic language, Volume 2 (pp.
323-346). Philadelphia: John Benjamins Publishing Company.
Erman, B., & Warren, B. (2000). The idiom principle and the open choice principle. Text, 20, 29–62.
Ferris, D. (1994). Lexical and syntactic features of ESL writing by students at different levels of L2 pro1ciency.
TESOL Quarterly, 28(2), 414-420.
Ferris, D. (2015). Supporting multilingual writers through the challenges of academic literacy: Principles for
English for academic purposes and composition. In N. W. Evans, N. J. Anderson, & W. G. Eggington (Eds.),
ESL readers and writers in higher education: Understanding challenges. New York: Routledge.
Folse, K. (2008). Myth 1: Teaching vocabulary is not the writing teacher's job. In J. Reid (Ed.), Writing Myths:
Applying Second Language Research to Classroom Teaching (pp.1-17). Ann Arbor, MI: University of Michigan
Press.
Granger, S. (1998). Prefabricated patterns in advanced EFL writing: Collocations and formulae. In A. P. Cowie
(Ed.), Phraseology: Theory, analysis, and applications (pp. 146-160). Oxford: Oxford University Press.
Harley, B., & King, M. L. (1989). Verb lexis in the written compositions of young L2 learners. Studies in Second
Language Acquisition, 11, 415-439.
Hinkel, E. (2004). Teaching academic ESL writing: Practical techniques in vocabulary and grammar. Mahwah,
NJ: Lawrence Erlbaum Associates.
Howarth, P. (1998). The phraseology of learners’ academic writing. In A. P. Cowie (Ed.), Phraseology: Theory,
analysis and applications (pp. 161-187). Oxford: Clarendon Press.
Hyland, K. (2008). Lexical clusters: Text patterning in published and post-graduate writing. International Journal
of Applied Linguistics, 18, 41-61.
Jacobs, Hartfiel, Hughey, & Wormuth (1981). Testing ESL composition: A practical approach. Boston: Newbury
House.
Jones, M., & Haywood, S. (2004). Facilitating the acquisition of formulaic sequences. In N. Schmitt, (Ed.),
Formulaic sequences (pp. 269-300), Philadelphia: John Benjamins Publishing.
KeyWords Extractor (Version 1).
Larson-Hall, J. (2016). A Guide to Doing Statistics in Second Language Research Using SPSS and R (2nd ed.). New York:
Routledge.
Lee, S. H. (2003). ESL learners’ vocabulary use in writing and the effects of explicit vocabulary instruction.
System, 31, 537-561.
Lee, S. H., & Muncie, J. (2006). From receptive to productive: Improving ESL learners’ use of vocabulary in a
postreading composition task. TESOL Quarterly, 40, 295-320.
Leki, I., & Carson, J. (1994). Students’ perceptions of EAP writing instruction and writing needs across the
disciplines. TESOL Quarterly, 28, 81-101.
Lewis, M. (1997). Pedagogical implications of the lexical approach. In J. Coady & T. Huckin (Eds.), Second
language vocabulary acquisition: A rationale for pedagogy (pp. 255-270). Cambridge: Cambridge University
Press.
Li, J., & Schmitt, N. (2009). The acquisition of lexical phrases in academic writing: A longitudinal case study.
Journal of Second Language Writing, 18, 85-102.
Linnarud, M. (1986). Lexis in composition: A performance analysis of Swedish learners’ written English.
Malmö, Sweden: Liber Förlag Malmö.
McClure, E. (1991). A comparison of lexical strategies in L1 and L2 written English narratives. Pragmatics and
Language Learning, 2, 141-154.
Nattinger, J. R., & DeCarrico, J. S. (1992). Lexical phrases and language teaching. Oxford: Oxford University Press.
Nation, I. S. P. (2005). Teaching and learning vocabulary. In E. Hinkel (Ed.), Handbook of Research in Second
Language Teaching and Learning (pp. 581-596). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Nation, I.S.P. & Gu, P. Y. (2007). Focus on Vocabulary. Sydney, Australia: National Center for English Language
Teaching and Research Macquarie University.
Numrich, C. (2009). Raise the Issues: An integrated approach to critical thinking. Pearson Education ESL.
Ohlrogge, A. (2009). Formulaic expressions in intermediate EFL writing assessment. In R. Corrigan, E. A.
Moravcsik, H. Ouali, and K. M. Wheatley (Eds.), Formulaic language, Volume 2 (pp. 375-386).
Philadelphia: John Benjamins Publishing Company.
Santos, T. (1988). Professors’ reactions to the academic writing of nonnative-speaking students. TESOL
Quarterly, 22, 69-90.
Scott, M., & Tribble, C. (2006). English for academic purposes: Building an account of expert and apprentice
performances in literary criticism. In M. Scott and C. Tribble (Eds.), Textual patterns: key words and
corpus analysis in language education (pp. 131-159). Amsterdam: John Benjamins Publishing
Company.
Simpson-Vlach, R., & Ellis, N. C. (2010). An academic formulas list: New methods in phraseology research.
Applied Linguistics, 31, 487-512.
Song, B., & Caruso, I. (1996). Do English and ESL faculty differ in evaluating the essays of native-English
speaking and ESL students? Journal of Second Language Writing, 5, 163-182.
Thornbury, S. (2002). How to Teach Vocabulary. Harlow: Pearson Longman.
Yorio, C. A. (1989). Idiomaticity as an indicator of second language proficiency. In K. Hyltenstam & L. K. Obler
(Eds.), Bilingualism across the lifespan (pp. 55-72). Cambridge: Cambridge University Press.
Zimmerman, C. B. (2009). Word knowledge: A vocabulary teacher’s handbook. Oxford: Oxford University Press.
Christian Jones
University of Liverpool, UK*
Daniel Waller
University of Central Lancashire, UK**
Abstract
This article reports on a quasi-experimental study investigating the effectiveness of two different teaching approaches,
explicit teaching and explicit teaching combined with textual and aural input enhancement, used to teach lexical items to
elementary level learners of Turkish in a higher education context. Forty participants were divided into two equal groups
and given a pre-test measuring productive and receptive knowledge of nine targeted lexical items naming common types of
food and drink. Each group was then given sixty minutes of instruction on ‘restaurant Turkish’, using a direct communicative
approach. Group one (contrast group) received explicit teaching only, while group two (treatment group) received the same
teaching but also used a menu where the target words were bolded (textual input enhancement) and listened to the target
words modelled by the teacher three times (aural input enhancement). Following the treatment, tests measuring productive
and receptive knowledge of the target items were administered. This process was repeated with a delay of two weeks
following the treatment. Analysis of gain scores for receptive and productive tests made at the pre-, post- and delayed stage
reveal larger gains for the treatment group in each test. These were statistically significant when compared with the contrast
group’s scores for production at the immediate post-test stage. Within-group tests showed that each treatment had a
significant impact on receptive and productive knowledge of the vocabulary targeted, with a larger short-term effect on the
treatment group. Previous studies in this area have tended to focus on the use of input enhancement in relation to the
learning of grammatical forms, but these results demonstrate some clear benefits when teaching lexis, which have clear
implications for further research and teaching.
Key words: Input enhancement; textual enhancement; aural enhancement; Turkish vocabulary; beginners
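The gain-score comparison described in the abstract can be sketched as follows. This is an illustrative sketch only: the scores, group sizes, and helper functions below are hypothetical and are not the authors' data or their actual statistical procedure.

```python
# Illustrative sketch of a between-groups gain-score comparison:
# per-participant gains (post-test minus pre-test) are computed for a
# contrast and a treatment group, then compared with Welch's t-statistic.
from statistics import mean, variance
from math import sqrt

def gain_scores(pre, post):
    """Per-participant gains: post-test score minus pre-test score."""
    return [b - a for a, b in zip(pre, post)]

def welch_t(x, y):
    """Welch's t-statistic for two independent samples (unequal variances)."""
    vx, vy = variance(x), variance(y)
    return (mean(x) - mean(y)) / sqrt(vx / len(x) + vy / len(y))

# Hypothetical productive-test scores out of 9 target items
contrast_pre,  contrast_post  = [2, 1, 3, 2, 2], [4, 3, 5, 4, 3]
treatment_pre, treatment_post = [2, 2, 1, 3, 2], [7, 6, 5, 8, 6]

g_contrast  = gain_scores(contrast_pre,  contrast_post)
g_treatment = gain_scores(treatment_pre, treatment_post)

print(mean(g_contrast), mean(g_treatment))          # mean gain per group
print(round(welch_t(g_treatment, g_contrast), 2))   # larger t => larger group difference
```

Welch's statistic is used here because it does not assume equal variances between the groups; the significance tests the authors actually ran may have used a different procedure.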
Introduction
The importance of learning vocabulary explicitly from the early stages of studying a second language is now
well-established (e.g., McCarthy, 1999; Schmitt, 2000). While there has also been a great deal of research which
gives clear suggestions about how many and which lexical items and chunks may be of primary importance
to teach learners (e.g., O’Keeffe, McCarthy & Carter, 2007; Shin & Nation, 2007), there is less consensus about
how instruction can best aid this process. There is evidence that explicit teaching of grammatical and lexical
items has a greater impact upon learning than implicit teaching (Norris & Ortega, 2000, 2001; Spada & Tomita,
2010) but as yet there are no definitive answers to the type of explicit teaching which results in the most effective
learning of second languages. It may also be the case that the effects of explicit teaching can be increased
* Tel. 44 1517942724, Email: Christian.jones2@liverpool.ac.uk, Department of English, 19-23 Abercromby Square, University
of Liverpool, Liverpool, L69 7ZG
**Tel. + 44 1772 893672, Email: dwaller@uclan.ac.uk, School of Language and Global Studies, University of Central
Lancashire, Preston, PR1 2HE
through making the input learners receive as salient as possible. One area of consistent focus in the research has
been upon the use of input enhancement (IE) as a means of promoting noticing and learning and in particular
upon the use of textual enhancement (TE) of various kinds. TE commonly involves enhancing a text through
making target items bold, italicised or underlined.
The impact of TE has been researched in regard to a range of second languages alongside forms of
explicit teaching (e.g., Alanen, 1995) and as a variable in its own right (e.g., Petchko, 2011) but results have been
mixed (Han, Park & Combs, 2008). Aural enhancement (AE), whereby listening texts are manipulated to increase
the saliency of target items (such as making the recording of those items louder or repeating target items) has
been researched a great deal less, and the results which are available are similarly inconclusive (e.g., Reinders &
Cho, 2011). However, much TE research has focused upon grammatical structures as opposed to lexical items
and AE and TE have been under-researched in combination with explicit teaching. This study is an attempt to
fill this gap and provide some evidence that TE can be a useful addition to vocabulary learning, as it can quickly
draw learners’ attention to the form and use of a word, something Nation (1999) suggests can be helpful. As a
strategy, TE has the benefit of being potentially extremely versatile. It could be used for incidental learning or
targeted learning based on resources such as Coxhead’s Academic Word List (2000) or even used by learners
themselves as a deliberate learning strategy. The use of such strategies by learners has been identified by Folse
(2004) as an essential feature of successful vocabulary learning.
Input Enhancement
The term ‘input enhancement’ is credited to Sharwood Smith (1991, 1993), who suggested that some form of
enhancement may be helpful to make input more salient to learners. Without such salience, he suggests, learners
may fail to notice forms within the input they receive because much input is likely to be processed for meaning.
Noticing, as described by Schmidt (1990, 1995, 2001, 2010), can be defined broadly as ‘conscious registration of
attended specific instances of language’ (Schmidt, 2010, p.725). It is this conscious registration which is
considered to be the first step needed to convert input into intake, and input enhancement may be viewed as one
type of ‘consciousness raising’ (Sharwood Smith, 1981) activity, which teachers and researchers can use to help
learners notice forms within input they comprehend. Sharwood Smith (1993) suggests a number of methods
which might be used to enhance input, including the bolding of texts for visual input and repeating targeted
items for aural input. The use of input enhancement would seem to be of particular relevance in the instruction
of beginners since in the initial stages of learning a new language all input is potentially significant to the learner
and there is often little indication as to which pieces of language are essential or more useful in the long term.
There have been a number of studies aimed at investigating the effect of input enhancement upon the
learning of targeted forms. Many studies of this nature have focussed upon textual enhancement (TE), either as
a variable of its own or in combination with other variables such as input flood or explicit rule-based instruction.
In a review of the research in this area, Han, Park and Combs (2008) found that most research has sought to
compare TE with another form of instruction such as explicit rule-based instruction or output practice and that
TE has often been combined with additional means to augment its effect, such as asking learners to attend to the
targeted form. Studies have generally focused on grammatical forms in a variety of languages, including English
relative clauses (Doughty, 1991), Spanish impersonal imperative (Leow, 2001), French past participle agreement
in relative clauses (Wong, 2003) and English passive forms (Lee, 2007), although some studies have concentrated
upon lexical items (e.g., Kim, 2003; Petchko, 2011). Treatments have varied greatly in length from fifteen minutes
to two weeks, as have sample sizes, which have varied from fourteen to two hundred and fifty-nine participants
(Han, Park & Combs, 2008). Generally, studies have employed an experimental design, employing a pre-test,
treatment and immediate post-test design, with a tendency not to include a delayed-test. Results have mostly
been measured by analysing productive and receptive tests statistically, although there is inconsistency in the type
of test employed. For example, some studies (e.g. Reinders & Cho, 2011) have just used one receptive test type,
commonly a grammaticality judgement test. Several studies have attempted to measure noticing through
measures such as think-aloud protocols (e.g., Alanen, 1995; Rosa & O’Neill, 1999) and to employ such data to
demonstrate that learners who noticed aspects of the targeted language achieved better results in tests.
Perhaps because of the varied nature of the studies, results indicating positive effects for TE have
themselves been mixed in terms of its impact upon noticing and learning of the targeted forms. Doughty (1991),
Shook (1994) and Alanen (1995) for example, all report that TE had some positive effects on learning of the
targeted forms, whilst Izumi (2002) and Wong (2003) report that there were no positive effects on learning. Other
studies report mixed results, with TE having a positive impact in the area of noticing but not in terms of learning
(e.g., Izumi 2002), and in some cases there were no discernible effects with regard to noticing or learning (e.g.,
Leow, Egi, Nuevo & Tsai, 2003; Petchko, 2011). Jourdenais, Stauffer, Boyson and Doughty (1995) did find that
TE had a significant impact upon noticing and immediate production of the targeted forms but the lack of a
delayed test makes it difficult to suggest the forms were acquired. A possible cause of the mixed results may also
be simply that not all studies have sought to measure both noticing and learning (Han, Park & Combs, 2008,
p.602); there has often been a presumption that TE will cause noticing, that learning will therefore follow, and
that it is thus only noticing of the forms in focus which needs to be measured. This is in itself not entirely
unreasonable if we accept Schmidt’s often quoted assertion that ‘noticing is the necessary and sufficient condition
for converting input to intake’ (Schmidt, 1990, p.129), but there is a case for suggesting that researchers need to
differentiate between what learners have noticed and what they appear to have acquired and are able to produce.
Although these are not mutually exclusive, we would suggest that not every aspect of language which learners
notice will be acquired, in the sense that learners will be able to produce it. Measuring an internal process
(noticing) is also not without difficulty, and the use of measures such as think-aloud protocols has been criticised.
Barkaoui (2011, p. 53) identifies the issue of veridicality in such approaches, where relationships between the
unconscious processing and the measurement process are indirect at best and relationships can only be inferred.
Dornyei (2007, p. 148) notes that thinking aloud whilst performing a task is not a natural process and therefore
requires some training. This training may result in what Stratman and Hamp-Lyons (1994) term reactivity, in that
it influences the kind of data produced so that learners produce more (or fewer) instances of noticing than they
would otherwise do. The method also relies on a learner’s ability to verbalise what they have noticed and it will
clearly be the case that some learners may be more confident at expressing this in a written form, either as they
notice, or after noticing. For these reasons, as we will discuss, we suggest that noticing can be measured through
testing receptive knowledge and learning through testing productive knowledge and areas of crossover can then
be analysed.
While studies in TE have been frequent, those employing aural enhancement (AE) have been much less
frequent. H. Y. Kim (1995) reports on an early study which attempted to explore whether AE could influence the
phonological aspects which learners perceive in connected speech. Two groups of Korean learners of English
were asked to listen to a series of short texts and complete a visual comprehension task, choosing a picture which
best matched each passage. For one group, the speed of speech was slower with more frequent pausing at phrase
boundaries, while the other group listened to the texts at normal speed. Immediately following the listening,
students were asked what they had heard and why they chose each picture. Student reports suggested that the
elements of speech which students comprehended most easily were words which contained tonic syllables within
a tone unit, suggesting that slower speech may allow a greater chance to perceive these elements. However,
results indicated that there were no statistically significant differences between the groups in terms of
comprehension.
There have also been a number of studies conducted using enhanced listening materials, particularly with
video (e.g., Baltova, 1999; Hernandez, 2004; Grgurović & Hegelheimer, 2007). However, these studies have
focussed upon different effects of TE upon listening, such as the extent to which listening comprehension and
intake can be aided by subtitled video or by using transcripts while listening. Although the effects have been
positive in some cases (e.g., Baltova, 1999) these results have not been consistent across a number of studies
(Perez & Desmet, 2012). In addition, use of subtitles and transcripts is perhaps better described as TE and not
AE because nothing has been done to enhance the recordings themselves.
Jensen and Vinther (2003) did examine the use of repetition of listening materials as a form of AE.
Students learning Spanish were played the same DVD material three times and given different treatments. Each
group heard the clip three times, either fast-slow-fast, fast-slow-slow or fast-fast-fast as treatment between pre-
and post-tests. No significant differences were found between the treatment types but there was a significant effect
of all treatments when compared to a control group. This suggests that all forms of repetition as AE had a
positive effect in this study. Reinders and Cho (2010, 2011) conducted a study using digital technology to aurally
enhance adverb placement and passives with sixteen Korean learners of English. The volume was raised on each
instance of the targeted structures in an audio file given to students, whilst a contrast group was given the same
audio file but without the targeted structures being enhanced. Each group was asked to listen to the audio file
once and was given no further instructions. Despite the interesting nature of the study, no statistical differences
were found in the test results of each group and some participants even reported that the raised volume was
distracting.
Whilst the body of research in TE in particular is plentiful, there are clearly some elements which have
been under-researched and aspects of study design which have been inconsistent. The first of these is the failure
in some cases to provide both receptive and productive tests as a measure of the treatment given, something
Schmitt (2010) suggests is vital when assessing receptive and productive knowledge as aspects of vocabulary
learning. Clearly, if a learner can recognise a correct form in a measure such as a grammaticality judgement test,
this only provides evidence of receptive knowledge of the item in question. It cannot be equated with an ability
to produce the target items. Providing both types of test can help us to measure noticing (receptive tests) and
learning (productive tests). Tests of lexis also have to be developed in order to assess the aspects of lexical
knowledge that are relevant to the situation. Nation (1999, p. 340) sets out a table detailing different levels of
word knowledge for both reception and production, in terms of both written and spoken contexts. At the early
stages of learning, such as the situation in which the learners in this study were, assessment is most likely
to be mainly receptive in nature with only limited production being possible.
Secondly, as we have noted, not all studies have employed a longitudinal element, in the form of a
delayed-test, something Schmitt (2010) also suggests is essential if we wish to provide evidence of durable
learning. This weakness is also one which Han, Park and Combs (2008) recognise and one which they argue must
be addressed if we hope to provide more reliable results in future TE studies. Although there is disagreement
about what constitutes an acceptable delay, it is generally recognised that a week or more is needed after
treatment in order to establish longer term effects of any intervention (Schmitt, 2010).
Thirdly, there have been notably fewer studies which have attempted to assess the impact of TE and AE
on the learning of lexical items. Those that have focussed upon lexical items (e.g., Bishop, 2004; Cho, 2016;
Y. Kim, 2003; Petchko, 2011) have not employed both TE and AE as treatment variables and have also found little
effect for TE. Y. Kim (2003) sought to investigate the effect of TE and implicit, explicit or no lexical elaboration
(explicit = meaning plus definition, implicit = appositive phrase following the target items) on two hundred and
ninety-seven Korean learners of English. The findings show that TE alone did not have a significant effect on
learners’ ability to recognise form or meaning of the lexical items, whilst lexical elaboration of both types aided
meaning recognition of the item. Bishop (2004) assessed the effects of TE on noticing formulaic sequences in a
reading text and overall comprehension of that text. Two groups were compared—a control group which read
an unenhanced text and an experimental group which read a text with targeted formulaic sequences
typographically enhanced. Students were able to click on words or sequences they were unsure of, and these were
often provided with an explanation of the meaning. They then answered a series of comprehension questions on
the text. Results showed that the TE group clicked on the enhanced formulaic sequences significantly more than
single words and also performed significantly better on the comprehension test when compared with the
control group. Petchko (2011) explored the impact of TE upon incidental vocabulary learning whilst reading
with forty-seven intermediate students of English as a foreign language. Students in the treatment group had
twelve non-words enhanced, whilst the control group did not. Non-words were chosen to ensure that the
effect of the treatment alone was measured. Both groups were given productive and receptive tests to measure
the effects of the treatment upon their recognition of word meaning and recall of the target items’ meanings.
Although both groups made gains when recognising form and recalling meaning in post-tests, there were no
statistically significant differences found between the groups’ scores in either test. Cho (2016) investigated the
effect of TE on the learning of collocations. Two groups were compared – one which read a passage with target
collocations enhanced in the text and another group which read the text without the collocations being
enhanced. Groups received a post-test on the target collocations following the reading and a test to check their
recall of the whole text. They also had their length of eye fixation measured using eye-tracking software. Results
showed that the TE group performed significantly better than the contrast group on the target collocations test
and also spent more time looking at the enhanced forms. However, they also recalled significantly less of the non-
enhanced text. This suggests that while TE can increase noticing of targeted lexis, the increased attention on
these items may reduce the ability to recall texts. As these results found mixed effects for TE alone, there seems to
be a clear need for more studies which attempt to investigate the impact of TE alongside AE on the learning of
lexical items. Such attempts are particularly merited when we consider the argument that one important way for
learners to increase their vocabulary is to notice form and meaning (Schmidt, 1990) as much as possible when
they encounter them and TE and AE are one way this could be achieved. This would seem to be particularly the
case when investigating the impact upon beginners learning an L2, as a large part of their time can usefully be
spent trying to acquire a basic vocabulary as quickly as possible (McCarthy, 2004). Nation (2006) emphasises the
need for a deliberate approach to the learning of vocabulary and TE and AE potentially offer a way to direct
learning to the most important vocabulary. While the current study investigates the use of TE and AE in the
classroom, both types of input enhancement could also form the basis for self-directed study or independent
learning strategies.
Research Questions
To our knowledge, no studies have attempted to combine AE and TE with explicit instruction. Whilst the effects
of TE are mixed and AE has been under-researched, there is a great deal of evidence which demonstrates the
benefits of explicit instruction in language teaching, in developing lexical, grammatical and pragmatic
competency (e.g., Alsadhan, 2011; Halenko & Jones, 2011; Norris & Ortega, 2000, 2001; Spada & Tomita,
2010). The current study is an attempt to address some of these issues through a focus on comparing TE/AE
alongside explicit vocabulary teaching, in comparison to explicit vocabulary teaching alone. It also attempts to
address the lack of a longitudinal element in some studies through the inclusion of a delayed-test, which can
provide evidence of durable learning (Schmitt, 2010, p.268) and to measure both receptive and productive
knowledge through these tests. The study seeks to answer the following research questions:
1. To what extent does TE and AE + explicit teaching improve the receptive knowledge of the target lexical
items when compared to explicit teaching alone?
2. To what extent does TE and AE + explicit teaching improve the productive knowledge of the target
lexical items when compared to explicit teaching alone?
Methodology
Participants
The participants consisted of two groups of 20 first year undergraduate students. All students were studying for a
degree in TESOL and Modern Languages, combining TESOL with Arabic, Chinese, French, German, Japanese
or Spanish as their main second language. English was the first language of all participants. The research was
conducted as part of four hours of classes which students undertook in order to experience learning a second
language through Direct Method teaching, as beginners. Students had undertaken just two hours of classes in
Turkish prior to the study taking place and none had studied the language previously. In total there were
nineteen male and twenty-one female participants, with a mean age of 21.5 in the contrast group and 22.6 in the
treatment group. Participants were randomly assigned to each group.
Research Design
The study followed a quasi-experimental classroom research design, as outlined by Dornyei (2007) and Cohen,
Manion and Morrison (2011), and is described as such here because there was no control group employed but rather
two groups who received different types of instruction, which took place within a classroom setting. Although a
control group (receiving no instruction but undertaking each test) would have been a useful addition to the study, this
was not possible, as the participants undertook instruction as part of their undergraduate programme. In
addition, the intention was to measure the effects of a key variable in the instruction upon the learning of the
targeted lexis (in this case types of input enhancement) and not whether instruction itself has any effect. The
study employed a pre-test, treatment, post- and delayed test structure, with the delayed tests taking place two
weeks after instruction and representing the longitudinal aspect of the study. The design can be summarised in
Table 1.
Table 1
Research Design
Contrast group (N = 20)
Pre-test: Receptive and productive vocabulary tests focused on the target items, e.g. ayran (a drink made from
yoghurt, salt and water)
Treatment: One hour of explicit teaching only, focused on ‘restaurant Turkish’, including the food and drink
items tested in the pre-, post- and delayed tests
Post-test: Receptive and productive vocabulary tests focused on the target items
Delayed post-test (2 weeks after instruction): Receptive and productive vocabulary tests focused on the target
items

Treatment group (N = 20)
Pre-test: Receptive and productive vocabulary tests focused on the target items, e.g. ayran (a drink made from
yoghurt, salt and water)
Treatment: One hour of explicit teaching with textual and aural input enhancement for the target lexical items,
focused on ‘restaurant Turkish’, including the food and drink items tested in the pre-, post- and delayed tests
Post-test: Receptive and productive vocabulary tests focused on the target items
Delayed post-test (2 weeks after instruction): Receptive and productive vocabulary tests focused on the target
items
A number of items were included in the tests, based upon several factors. Firstly, two items were chosen as
they contained a potential cognate (salata [salad] and alkollu [alcoholic]) but also contained a word which would
not be recognisable to the learners. The second set of items were not recognisable but were used multiple times
in various forms (içecekler [drinks]), and finally words were chosen which would be entirely unfamiliar and would
not be easily translatable into English (ayran [a drink made from yoghurt, salt and water] and beyaz/kırmızı şarap
[white/red wine]). All students were first given a productive and then a receptive test focussing on the target
items, for reasons outlined in the literature review (see appendix A for the target items and appendix B for a
sample of the tests). The productive test entailed learners translating the target items into English and the
receptive test entailed learners reporting whether they believed they knew the target item or not. As noted earlier,
tests of lexis have to be developed in order to assess the aspects of lexical knowledge that are relevant to the
situation. As the classes focused on learners at beginner level, this meant that the test needed to centre on
establishing meaning of new lexis and then the linking of form to this (Batstone & Ellis, 2009) and thus the focus
was on whether learners were able to recognise the words and link them to the appropriate forms. To ensure
reliability each receptive test also contained an equal number of real and invented words following Nation’s
(1999) format for vocabulary recognition tests. The addition of these words reduces the likelihood of participants
simply ticking all of the options. The order of items was changed for each test.
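As an illustration of how a yes/no recognition test of this kind might be scored, the sketch below counts ‘yes’ responses to real target words (hits) against ‘yes’ responses to invented words (false alarms). The item lists, the learner’s answers and the hits-minus-false-alarms adjustment are all hypothetical; this is one common way of discouraging indiscriminate ticking, not necessarily the scoring procedure used in the present study.

```python
# Sketch: scoring a yes/no vocabulary recognition test that mixes real
# target words with invented (pseudo) words, following Nation's format.
# The adjusted score (hits minus false alarms) is illustrative only.

def score_yes_no_test(responses, real_words, pseudo_words):
    """responses maps each item to True ('I know this word') or False.
    Returns (hits, false_alarms, adjusted_score)."""
    hits = sum(1 for w in real_words if responses.get(w, False))
    false_alarms = sum(1 for w in pseudo_words if responses.get(w, False))
    return hits, false_alarms, hits - false_alarms

# Hypothetical item lists and one learner's answers:
real = ["ayran", "salata", "içecekler"]
pseudo = ["borat", "kelim", "sulap"]  # invented distractor items
answers = {"ayran": True, "salata": True, "içecekler": False,
           "borat": True, "kelim": False, "sulap": False}

hits, fas, adjusted = score_yes_no_test(answers, real, pseudo)
print(hits, fas, adjusted)  # → 2 1 1
```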
Each group received an hour of explicit instruction about ‘restaurant Turkish’ using a direct,
communicative method, meaning all instruction was delivered in the target language. Implicit teaching was
taken to be ‘learning without awareness of what has been learned’ whilst explicit teaching was taken to mean
‘the learner is aware of what has been learned’ (Richards & Schmidt, 2002, p. 250). This was realised by the
teacher explicitly stating the aims and intended outcomes of the class before it started. The lesson followed a
presentation and practice framework. Students were first shown pictures of the items and drilled on them.
Explanations of items which were not immediately obvious from the picture were given in Turkish (e.g. ayran [a
drink made from yoghurt, salt and water]). Later on in the lesson the menu was presented in enhanced and
unenhanced form (see appendix A). Finally, the students did a short role-play based on a model dialogue where
they took the part of customers in a café while the teacher took the role of the waiter.
The treatment group were given identical materials to the contrast group but each targeted word was
bolded for this group, in order to operationalise TE. Aural enhancement was operationalised by the instructor
modelling each targeted item three times for the experimental group and only once for the contrast group. This
procedure was intended to replicate the oral repetition which Sharwood Smith (1991) suggests can be used for
aural input enhancement. Students in the treatment group were not given any additional instruction, such as
asking them to pay attention to the enhanced words. Both groups were asked not to revise the words between
classes.
Test data was analysed for statistical significance using between-group and within-group measures. To
answer the two research questions, gain scores at the pre-post, post-delayed and pre-delayed stages were compared
using an independent samples t-test to compare groups. Productive and receptive gains were also compared for
each group using paired samples t-tests. Effect sizes were measured where significance was found, using Pearson’s
r, which Cohen (1988) suggests can be considered in the following ways: small effect = 0.10, medium effect =
0.30, large effect = 0.50.
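The analysis described above can be sketched as follows. The gain scores below are invented for illustration and are not the study’s data; the helper functions simply implement the standard pooled-variance and paired t-test formulas and the conversion of t to Pearson’s r.

```python
# Sketch of the statistical analysis described above, using invented
# gain scores rather than the study's actual data.
import math
import statistics

def independent_t(a, b):
    """Independent samples t-test (pooled variance); returns (t, df)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        pooled * (1 / na + 1 / nb))
    return t, na + nb - 2

def paired_t(x, y):
    """Paired samples t-test on two measures from the same learners; (t, df)."""
    d = [xi - yi for xi, yi in zip(x, y)]
    t = statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))
    return t, len(d) - 1

def effect_size_r(t, df):
    """Pearson's r from t: r = sqrt(t^2 / (t^2 + df)).
    Cohen (1988): 0.10 small, 0.30 medium, 0.50 large."""
    return math.sqrt(t ** 2 / (t ** 2 + df))

# Hypothetical pre-to-post gain scores (NOT the study's data):
contrast_gains = [3, 5, 4, 6, 5, 4, 7, 5, 6, 4]
treatment_gains = [6, 7, 5, 8, 7, 6, 9, 7, 6, 8]
t, df = independent_t(treatment_gains, contrast_gains)
print(round(effect_size_r(t, df), 2))  # → 0.66 for these invented scores
```

Applying the same conversion to a value reported later, for instance t(19) = 7.430, gives r = √(55.2/74.2) ≈ 0.86.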
Results
Research Question 1: To what extent does TE and AE + explicit teaching improve the receptive
knowledge of the target lexical items when compared to explicit teaching alone?
Table 2
Receptive Test Results
Pre-test Post-test Delayed test
Group 1 (contrast) M = 1.5000 M = 6.5500 M = 5.6500
N = 20 SD = 1.73205 SD = 2.21181 SD = 2.60111
Group 2 (treatment) M = 1.5000 M = 8.2000 M = 6.9000
N = 20 SD = 1.60591 SD = 1.43637 SD = 1.94395
Note. Maximum score = 9
It is clear from this data that both groups made gains from pre- to post- and pre- to delayed tests. Paired sample t-
tests show that these gains were significant for both groups. For the contrast group, gains at the pre-post stage
were most positive (M = 5.05000, SD = 2.68475), t(19) = 8.412, p < .001, r = 0.88, and were maintained to
some degree at the pre-delayed stage (M = 4.15000, SD = 2.49789), t(19) = 7.430, p < .001, r = 0.86.
Although there was notable attrition from the post to delayed stage (M = -.9000, SD = 2.82657), this was not
found to be significant. For the treatment group, gains were largest at the pre-post stage (M = 6.70000,
SD = 2.51522), t(19) = 11.913, p < .001, r = 0.94, and were maintained to some degree at the pre-delayed stage
(M = 5.4000, SD = 3.20197), t(19) = 7.542, p = .001, r = 0.87. There was also attrition at the post to delayed test
stage (M = -1.3000, SD = 1.97617), but this was not found to be significant. These results show that both types of
instruction had a durable benefit for the receptive knowledge of the target lexis. They also show that the gains
were larger in general and the effect sizes larger at the pre-post and pre-delayed stages for the treatment group,
indicating a clear short-term benefit for explicit teaching combined with TE and AE. However, despite these
notable gains, when compared with independent samples t-tests, no statistically significant differences were found
between the groups at any of the test stages.
Research Question 2: To what extent does TE and AE + explicit teaching improve the productive
knowledge of the target lexical items when compared to explicit teaching alone?
Table 3 gives the descriptive statistics for the productive tests, for group 1 (the contrast group, who received
explicit teaching only) and for group 2 (the treatment group, who received explicit teaching and AE/TE).
Table 3
Productive Test Results
Pre-test Post-test Delayed test
Group 1 (contrast) M = .9000 M = 4.9500 M = 3.6000
N = 20 SD = .96791 SD = 2.13923 SD = 2.23371
Group 2 (treatment) M = .0000 M = 6.3500 M = 3.5000
N = 20 SD = .0000 SD = 2.32322 SD = 2.94690
Note. Maximum score = 9.
It is again clear from this data that both groups made gains from pre- to post- and pre- to delayed tests. Paired
sample t-tests show that these gains were also significant for both groups. For the contrast group, gains at the pre-
post stage were most positive (M = 4.0500, SD = 2.13923), t(19) = 8.467, p < .001, r = 0.88, and were
maintained to some degree at the pre-delayed stage (M = 2.70000, SD = 2.22663), t(19) = 5.423, p < .001,
r = 0.78. Again there was notable attrition from the post to delayed stage (M = -1.3500, SD = 2.51888), and this
was found to be significant, t(19) = -2.397, p = .027, r = 0.47. For the treatment group, gains were again largest
at the pre-post stage (M = 6.35000, SD = 2.32322), t(19) = 12.224, p < .001, r = 0.78, and were maintained to
some degree at the pre-delayed stage (M = 3.50000, SD = 2.94690), t(19) = 5.132, p < .001, r = 0.47. There was
also attrition at the post to delayed test stage (M = -2.85000, SD = 2.79614), and this was found to be significant,
t(19) = -4.558, p < .001, r = 0.72.
These results show that both types of instruction had a durable benefit for the productive knowledge of
the target lexis. They also again show that the gains were larger in general at the pre-post stage for the
treatment group, indicating a clear short-term benefit for explicit teaching combined with TE and AE.
An independent samples t-test also revealed that there was a significant difference (with a medium effect size)
between the groups in terms of their pre-post gains, demonstrating the superiority of the results for the treatment
group (contrast group: M = 4.0500, SD = 2.13923; treatment group: M = 6.3500, SD = 2.32322),
t(38) = -3.257, p = .002, r = 0.46.
Overall, results for both tests show what we might expect at this level: both types of treatment helped
learners to improve their receptive and productive knowledge of the target lexical items. The effects of the
instruction were not fully sustained over time, but gains made at the pre-delayed stage were significant for both
groups and for both test types. The greater gains for the treatment group in general, and at the post-test stage in
particular, indicate that explicit teaching plus AE/TE had a stronger effect in this study, particularly in terms of
productive knowledge. This suggests that an addition of AE/TE to explicit teaching can aid learning of lexis and
could heighten noticing and retention of targeted lexis. The absence of significant differences between the
groups at the delayed test stages may be due to the fact that TE/AE are a relatively implicit form of input
enhancement (Gascoigne, 2006) and may impact on learners for a short time only. To ensure a longer lasting
effect, students at elementary levels in particular may need very explicit forms of TE and AE to accompany
explicit teaching. Gascoigne (2006), for example, found a positive effect for explicit input enhancement in a study
investigating diacritics in beginners learning French and Spanish. Her study found that learners who were asked
to re-type a passage in either French or Spanish and given keycodes showing them how to produce diacritics had
a significantly higher recall of diacritics than a control group. This suggests that explicit measures such as asking
students to pay attention to the enhanced forms may be more effective at this level, particularly if combined with
repeated and longer exposure to the targeted items. Lastly, it is possible that administering a post-treatment
questionnaire to assess whether the AE and TE did in fact draw learners’ attention to the targeted items could
have demonstrated the impact of these enhancements upon noticing. White (1998), for example, found in a study
of TE with French texts that participants in her study believed that TE did make them attend to the targeted
forms. If there is evidence that learners are paying more attention to the targeted forms as a result of TE then it
can be argued that this is likely to lead to more noticing and durable learning.
Conclusion
The results of this study demonstrate that TE and AE did, to some extent, produce a more positive effect upon
durable learning than explicit teaching alone. When the groups were compared, this was significant in the short
term in the gains of productive knowledge for the experimental group and for both measures, gains were larger
for the treatment group. Within group tests demonstrated that instruction had a positive and signi(cant impact
on both receptive and productive knowledge for both groups, when we compare gains made from the pre-post
and pre-delayed test. Given that both groups were beginners, we would of course hope and expect that this
would happen. However, the results do indicate that the use of enhanced input, particularly for beginners, could
be extremely beneficial. Koprowski (2005) makes the salient point that materials often present learners with
possible language without any signal of which language may be more useful. For example, the chunk ‘play
football’ is more likely to be useful than ‘do judo’. The issue is that at the outset of learning a language all words
and phrases presented are potential input and the learner does not necessarily know which words or patterns are
more worthy of attention. Enhanced input, directed to high-frequency/highly useful lexis would seem to provide
a potential way of signalling to learners that certain pieces of language are noteworthy, as well as guiding
teachers to provide particular emphasis on these.
As mentioned in the literature review, TE and AE also have the potential to be utilised not only by
teachers to guide explicit vocabulary learning in class, but as a possible strategy for independent study for
language learners. This could be done informally, with learners simply highlighting post-reading lexis that they
feel is useful to them. It could also be carried out in conjunction with the use of word lists such as Coxhead’s
Academic Wordlist (2000) or the lists provided by English Profile (2014). There are a number of tools available to
learners (and teachers) which can profile vocabulary using a range of input word lists, such as the Compleat
Lexical Tutor (Cobb, 2017) site. AE could be carried out by learners recording (on a smartphone or similar
device) texts and pausing before the key lexis they wish to remember, or by repeating those words a number of
times.
There are, however, certain limitations of the study which may have impacted upon the results. Firstly, as
discussed above, a more explicit form of TE and AE may have produced superior long term results. This could
have been realised with more listening for the AE aspect, such as playing the experimental group dialogues with
the target items repeated a number of times and asking learners to pay attention to the items they hear most
often. For TE, their attention could also have been drawn to the bolded words by simply asking them to try and
remember those words. Although this may seem unnecessarily mechanical, it may be the case that beginners
learning a second language focus their attention on all aspects of the input they receive and implicit input
enhancement may not be processed. Secondly, although we were able to assess both receptive and productive
knowledge, it can be argued (e.g., Schmitt, 2010) that a test battery is the most effective measure of vocabulary
learning. This could involve the type of tests used plus a constrained constructed response test (such as a gap-fill)
and a freer productive test (such as an elicited role play). If vocabulary learning is measured in these ways, it can
allow for a more robust analysis and tell us under what conditions learners really know a set of target items.
It is clear that the results of this study offer some evidence that TE and AE can have a positive impact
upon learning. If this is indeed the case, and if it were followed by other studies demonstrating similar results, it
would be a simple and easy change for second language teachers to make to classroom practice. Teachers could
simply use TE to enhance target language within written texts and AE to enhance listening texts. Clearly though,
more research is needed, particularly in regard to the effects of AE. Future studies could focus on a greater use of
AE realised through measures such as teacher repetition and increased volume and stress on target items in
listening texts when combined with explicit teaching. It would also be useful to replicate studies such as this at
different levels, as we would suspect that AE and TE are likely to be more effective beyond elementary levels,
when learners can begin to focus on different aspects in the input they receive.
References
Alanen, R. (1995). Input enhancement and rule presentation in second language acquisition. In R. W. Schmidt
(Ed.), Attention and awareness in foreign language learning (pp. 259–302). Hawai’i: University of Hawai’i Press.
Alsadhan, R.O. (2011). Effect of textual enhancement and explicit rule presentation on the noticing and
acquisition of L2 grammatical structures: A meta-analysis. Unpublished Master’s dissertation, Colorado
State University, USA.
Baltova, I. (1999). The effect of subtitled and staged video input on the learning and retention of content and
vocabulary in a second language. Unpublished Doctoral dissertation, University of Toronto, Canada.
Barkaoui, K. (2011). Think-aloud protocols in research on essay rating: An empirical study of their veridicality
and reactivity. Language Testing, 28(1), 51–75.
Batstone, R., & Ellis, R. (2009). Principled grammar teaching. System, 37, 194–204.
Bishop, H. (2004). The effect of typographical salience on the look up and comprehension of formulaic
sequences. In N. Schmitt (Ed.), Formulaic sequences: Acquisition, processing and use (pp. 227–244). Amsterdam:
John Benjamins.
Cambridge English Language Assessment; Cambridge University Press; The British Council; The University of
Cambridge; The University of Bedfordshire; English UK. (2014). English Vocabulary Profile. Retrieved from
www.englishprofile.org.
Cobb, T. (2017). Compleat Lexical Tutor. Retrieved from www.lextutor.ca.
Cho, S. (2016). Processing and learning of English collocations: An eye movement study. Language Teaching
Research. Advance online publication.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. New York: Lawrence Erlbaum Associates.
Cohen, L., Manion, L., & Morrison, K. (2011). Research methods in education (Seventh Edition). New York:
Routledge.
Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34(2), 213–238.
Dörnyei, Z. (2007). Research methods in applied linguistics. Oxford: Oxford University Press.
Doughty, C. (1991). Second language instruction does make a difference: Evidence from an empirical study of SL relativization. Studies in Second Language Acquisition, 13, 431–469.
Folse, K. (2004). Vocabulary myths: Applying second language research to classroom teaching. Ann Arbor: University of Michigan Press.
Gascoigne, C. (2006). Explicit input enhancement: Effects on target and non-target aspects of second language acquisition. Foreign Language Annals, 39(4), 551–564.
Grgurović, M., & Hegelheimer, V. (2007). Help options and multimedia listening: Students' use of subtitles and the transcript. Language Learning and Technology, 11(1), 45–66.
Halenko, N., & Jones, C. (2011). Teaching pragmatic awareness of spoken requests to Chinese EAP learners in the UK: Is explicit instruction effective? System, 39(1), 240–250.
Hernandez, S. S. (2004). The effects of video and captioned text and the influence of verbal and spatial abilities on second language listening comprehension in a multimedia environment. Unpublished Doctoral dissertation, New York University, USA.
Izumi, S. (2002). Output, input enhancement and the noticing hypothesis: An experimental study on ESL relativization. Studies in Second Language Acquisition, 24(4), 541–577.
Jensen, E.D., & Vinther, T. (2003). Exact repetition as input enhancement in second language acquisition.
Language Learning, 53(3), 373–428.
Jourdenais, R., Ota, M., Stauffer, S., Boyson, B., & Doughty, C. (1995). Does textual enhancement promote noticing? A think-aloud protocol analysis. In R. Schmidt (Ed.), Attention and awareness in second language learning (pp. 183–216). Hawai'i: University of Hawai'i Press.
Kim, H.Y. (1995). Intake from the speech stream: Speech elements that L2 learners attend to. In R. W. Schmidt
(Ed.), Attention and Awareness in Foreign Language Learning (pp. 65–84). Hawai’i: University of Hawai’i Press.
Kim, Y. (2003). Effects of input elaboration and enhancement on second language vocabulary acquisition through reading by Korean learners of English. Unpublished Doctoral dissertation, University of Hawai'i, USA.
Koprowski, M. (2006). Investigating the usefulness of lexical phrases in contemporary coursebooks. ELT Journal,
59(4), 322–332.
Lee, S. (2007). Effects of textual enhancement and topic familiarity on Korean EFL students' reading comprehension and learning of passive form. Language Learning, 57(1), 87–118.
Leow, R. P. (2001). Do learners notice enhanced forms while interacting with the L2? Hispania, 84(3), 496–509.
Leow, R., Egi, T., Nuevo, A., & Tsai, Y. (2003). The roles of textual enhancement and type of linguistic item in adult L2 learners' comprehension and intake. Applied Language Learning, 13(2), 1–16.
McCarthy, M. (1999). What constitutes a basic vocabulary for spoken communication? Studies in English Language
and Literature 1, 233–249.
McCarthy, M. (2004). Touchstone: From corpus to coursebook. Cambridge: Cambridge University Press.
Nation, I.S.P. (1999). Learning Vocabulary in Another Language. Wellington: Victoria University of Wellington.
Nation, I.S.P. (2006). Language education – vocabulary. In K. Brown (ed.) Encyclopaedia of Language and Linguistics,
2nd Ed. Oxford: Elsevier. Vol 6: 494-499.
Norris, J. M., & Ortega, L. (2000). Effectiveness of L2 instruction: A research synthesis and quantitative meta-
analysis. Language Learning, 50(3), 417–528.
Norris, J. M., & Ortega, L. (2001). Does type of instruction make a difference? Substantive findings from a meta-analytic review. In R. Ellis (Ed.), Form-focused instruction and second language learning (pp. 157–213). Oxford: Blackwell.
O'Keeffe, A., McCarthy, M., & Carter, R. (2007). From Corpus to Classroom. Cambridge: Cambridge University
Press.
Petchko, K. (2011). Input enhancement, noticing and incidental vocabulary acquisition. The Asian EFL Journal
Quarterly, 3(4), 228–255.
Perez, M. M., & Desmet, P. (2012). The effect of input enhancement in L2 listening on incidental vocabulary learning: A review. Procedia – Social and Behavioral Sciences, 34, 153–157.
Reinders, H., & Cho, M. (2010). Extensive listening and input enhancement using mobile phones: Encouraging out-of-class learning with mobile phones. TESL-EJ, 14(2). Available from http://www.tesl-ej.org/wordpress/issues/volume14/ej54/ej54m2/
Reinders, H., & Cho, M. (2011). Encouraging informal language learning with mobile technology: Does it work? Journal of Second Language Teaching and Research, 1(1), 3–29. Available from www.uclan.ac.uk/jsltr
Richards, J. C., & Schmidt, R. W. (2002). Longman dictionary of language teaching and applied linguistics (Third Edition).
Harlow: Pearson Education Limited.
Rosa, E., & O'Neill, M. (1999). Explicitness, intake and the issue of awareness: Another piece to the puzzle.
Studies in Second Language Acquisition, 21(4), 511–566.
Schmidt, R. W. (1990). The role of consciousness in second language learning. Applied Linguistics, 11(2), 129–158.
Schmidt, R. W. (1993). Awareness and second language acquisition. Annual Review of Applied Linguistics, 13, 206–
226.
Schmidt, R. W. (1995). Consciousness and foreign language learning: A tutorial on the role of attention and
awareness in learning. In R. W. Schmidt, (Ed.), Attention and Awareness in Foreign Language Learning (pp. 1–
63). Hawai’i: University of Hawai’i Press.
Schmidt, R. W. (2001). Attention. In P. Robinson (Ed.), Cognition and Second Language Instruction (pp. 3–32).
Cambridge: Cambridge University Press.
Schmidt, R. W. (2010). Attention, Awareness and Individual Differences in Language Learning. In W. M. Chan,
S. Chi, K. N. Cin, J. Istanto, M. Nagami, J. W. Sew, T. Suthiwan, & I. Walker (Eds.), Proceedings of CLaSIC
2010, Singapore, December 2–4 (pp. 721–737). Singapore: University of Singapore Centre for Language
Studies.
Schmitt, N. (2000). Vocabulary in language teaching. Cambridge: Cambridge University Press.
Schmitt, N. (2010). Researching vocabulary: A vocabulary research manual. London: Palgrave Macmillan.
Sharwood Smith, M. (1981). Consciousness-raising and the second language learner. Applied Linguistics, 2(2), 159–168.
Sharwood Smith, M. (1991). Speaking to many different minds: On the relevance of different types of language information for the L2 learner. Second Language Research, 7(2), 118–132.
Sharwood Smith, M. (1993). Input enhancement in instructed SLA. Studies in Second Language Acquisition, 15, 165–179.
Shin, D., & Nation, P. (2007). Beyond single words: The most frequent collocations in spoken English. English Language Teaching Journal, 62(4), 339–348.
Shook, D. (1994). FL/L2 reading, grammatical information, and the input-intake phenomenon. Applied Language Learning, 5(2), 57–93.
Spada, N., & Tomita, Y. (2010). Interactions between type of instruction and type of language feature: A meta-analysis. Language Learning, 60(2), 263–308.
Stratman, J. F., & Hamp-Lyons, L. (1994). Reactivity in concurrent think-aloud protocols: Issues for research. In P. Smagorinsky (Ed.), Speaking about writing: Reflections on research methodology (pp. 89–111). Thousand Oaks, CA: Sage Publications.
VanPatten, B. (1990). Attending to form and content in the input: An experiment in consciousness. Studies in
Second Language Acquisition, 12, 287-301.
White, J. (1998). Getting the learners' attention: A typographical input enhancement study. In C. Doughty & J. Williams (Eds.), Focus on form in classroom second language acquisition (pp. 85–113). Cambridge: Cambridge University Press.
Wong, W. (2003). The effects of textual enhancement and simplified input on L2 comprehension and acquisition of non-meaningful grammatical form. Applied Language Learning, 14, 109–132.
Appendix A
Sample Enhanced menu (target items in bold, translations not given to learners)
Içecekler [drinks]
Çay [tea]
Kahve [coffee]
Kola [cola]
Fanta [fanta]
Cips [chips]
Appendix B
Sample tests
Productive test: Write what you think is the English equivalent of each word.
Bira …………………………………………………..
Fanta …………………………………………………..
Çay …………………………………………………..
Içecekler …………………………………………………..
Soğuk içecekler …………………………………………………..
Sıcak içecekler …………………………………………………..
Kola …………………………………………………..
Maden suyu …………………………………………………..
Ayran …………………………………………………..
Beyaz şarap …………………………………………………..
Kırmızı şarap …………………………………………………..
Vodka …………………………………………………..
Alkollu içecekler …………………………………………………..
Kahve …………………………………………………..
Rakı …………………………………………………..
Tavuk şiş …………………………………………………..
Iskembe çorbası …………………………………………………..
Balık …………………………………………………..
Peynır salatası …………………………………………………..
Patlıcan salatası …………………………………………………..
Daniel Waller works at the University of Central Lancashire in the School of Language & Global Studies. He has
taught in various countries including Turkey, China and Cyprus. He completed his PhD in Language Assessment
at the Centre for Research in English Language Learning and Assessment at the University of Bedfordshire and
his research interests include written assessment, corpus linguistics and lexis.
Mª Pilar Agustín-Llach*
Universidad de La Rioja, Spain
Abstract
This paper offers a theoretical approach to vocabulary instruction based on the evidence provided by lexical errors as the main sources of difficulty in the EFL acquisition process. It reviews previous research and from it suggests new ways of dealing with lexical errors in the classroom. Some practical implications are drawn which rely on lexical error categories identified in previous studies. Our main starting point is that lexical errors can serve as a guideline for teachers and researchers to improve vocabulary instruction. Identifying the main causes of lexical errors can help teachers understand the difficulties of their learners and assist them in planning and designing lessons and materials for the vocabulary class. Embarking from this premise, we have reviewed the main lexical error sources identified in the literature and provided some suggestions for vocabulary instruction.
Keywords: lexical errors, cross-linguistic influence (CLI) in vocabulary, remedying strategies, vocabulary instruction, explicit teaching
Introduction
Previous research on lexical errors has revealed a series of difficulty areas within lexical acquisition. Descriptive studies reporting on lexical errors allow researchers, teachers or material designers to identify the nature as well as the origin or source of lexical errors. We believe that this information can be used to act upon the problematic aspects identified and help deal with them. Lexical learning is a difficult and lifelong task, and lexical errors are most undesirable since they distort communication and can have a negative impact on the image of the learners. However, they are also positive signs of vocabulary development. We believe that teaching learners the origin and causes of their lexical misuse and how to remedy and prevent it is a good start for successful and effective lexical acquisition (Agustín-Llach, 2004, 2015; Hemchua & Schmitt, 2006). This paper intends to compile the main findings and tendencies drawn from lexical error analysis in English as a Foreign Language (EFL) vocabulary acquisition as a starting point to propose a set of actions to help learners overcome those difficulties.
Analysis of these studies shows that we need to go into further detail beyond simple L1 versus L2 influenced errors. In fact, these studies show that considering the L1 as a unitary source of influence is an oversimplification. L1 influence intermingles and collaborates with other sources, mainly L2 influence via overgeneralization or confusion, in originating lexical errors. Descriptive studies of lexical errors have achieved a refinement in etiologies which has allowed us to identify the most problematic areas which should be dealt with in the foreign language classroom.
In what follows, we intend to, first, give an account of the most frequent lexical error types found by previous research and of their outstanding role in vocabulary acquisition, and then to propose some pedagogical interventions or actions aimed at teaching vocabulary and remedying and preventing lexical errors in the interlanguage of EFL learners in the light of those previous findings.
*Tel: (34) 941 299435; Fax: (+34) 941 299433; E-mail: maria-del-pilar.agustin@unirioja.es; C/ San José de Calasanz, 33,
26004, Logroño, La Rioja, Spain
educational context. Establishing the source or main causes of lexical errors in EFL productions will allow us to draw some pedagogical implications for vocabulary instruction, as hinted above. Among the most frequent and important lexical error types in EFL, previous findings highlight the following (Agustín-Llach, 2011; Bouvy, 2000; James, 1998; Warren, 1982):
1) Borrowings, which are bare L1 insertions into the L2 syntax; for instance, from Spanish L1:
My ciudad is very big (Eng. city).
We need to acknowledge that, while the use of native words is a very frequent cause of error in EFL learners with L1s typologically closer to English, like French, Spanish, or German, it is a much rarer cause of interference or difficulty in learners whose native languages are distant from English, such as Chinese, Thai, Hebrew, or Arabic. Nevertheless, code switching from the L1 is a communication strategy learners use to overcome a lack of lexical knowledge and to scaffold their acquisition process. In this sense, borrowings tend to be marked in the students' productions with e.g. inverted commas, capital letters, a change of intonation or pronunciation, or underlining. If the teacher and students share an L1, then inserting L1 words into the L2 discourse is a communication strategy which can result in successful message transmission regardless of the source L1.
2) Lexical adaptation of an L1 word to the L2 morphological or phonological rules so that it sounds or looks
English (Celaya & Torras, 2001, p.7). An example of such lexical error appears in the following sentence:
My favorite deport is football (Eng. sport, Sp. deporte).
Psychotypological perceptions of similarity, or rather of transferability (e.g. Kellerman, 1979), might explain these types of adaptations. If learners perceive that a lexical item can be transferred or is similar to the L2 target, then they will try to tailor it to the L2 norm. This strategy frequently succeeds, e.g. contribution from contribución (Sp.) or come from kommen (Ger.).
3) Semantic confusion originates when the learner confounds two words which are semantically related in
the L2 such as for example in
My uncle’s name is Ana (for aunt) or in
In my city there are very shops (for many).
Especially conspicuous is the confusion of two auxiliary verbs: to have and to be. It is frequent to find sentences in learners' data in which these two verbs are confused:
I’m an older sister, her name is Ana (for I have), or
I have eleven years old (for I am).
Some instances of this confusion can be traced back to L1 influence; however, in some other cases the explanations are unfortunately not so straightforward, and finding a plausible interpretation for this confusion is extremely difficult. Confusions can also have a formal origin, thus giving rise to lexical errors of the type:
I’m board (for bored) or
I lake playing basketball (for like).
We tend to call these phonetic or formal confusions. Semantic and formal confusions reveal a certain degree of word knowledge, albeit incomplete or imperfect. We might wonder whether the learner knows both the target and the error word and confuses them because of their similarity, or whether they ignore the target word and use a proximal, close word they do know. The first example might illustrate the first case, and the second example the latter:
My hear is blond (for hair)
My favourite eat is pasta with meat (for food)
4) Learners also tend to calque L1 words or expressions when they lack exact lexical knowledge of the L2 equivalents. A calque or literal translation originates when a learner literally translates an L1 word and transfers the semantic and even syntactic properties of the L1 word onto an L2 equivalent which has a different contextual distribution (cf. Zimmermann, 1986). Adjectival and verbal structures or word order in compounds or phrases are likely candidates for literal translation. The following sentences are good examples of this phenomenon:
I like ballhand (for Eng. handball, Sp. balonmano) and
My favourite plate is pasta and rice (from Sp. plato, Eng. dish)
5) Previous research with (for example) Spanish EFL learners has revealed that they display
wrong cognate use such as in the sentence, In the evenings, I go to an academy (Eng. private tuition
school, Sp. academia), where the word is used as it is in Spanish with the semantic and contextual
restrictions of the L1 and not of the L2. German EFL learners display a similar behaviour and
tend to use cognates in the L1 sense (Agustín-Llach, 2014) (for examples from other languages
see e.g. Bouvy, 2000; Ringbom, 2001; Warren, 1982). This type of lexical error could also be
considered as an extension or particular manifestation of word confusion (see above).
6) Spelling problems are probably the most frequent category of lexical errors in EFL learners’
writings (cf. Bouvy, 2000; Fernández, 1997; Lindell, 1973). These are violations of the
orthographic conventions of English. The lack of congruence between spelling and
pronunciation so characteristic of the English language is mostly responsible for these difficulties.
EFL learners face the problem of having to cope with the complicated English encoding system
in which one sound, especially vowel sounds, can be rendered in multiple ways, i.e. through
different letters, and vice versa where one letter can be pronounced in different ways. Double
letters, silent letters, or triphthongs also cause problems for learners. Thus, we find the following misspellings as examples: beautifull, verday, ritting, inteligent for beautiful, birthday, writing, and
intelligent, respectively. A particular type of spelling error arises as the result of what is called
phonetic spelling, i.e. writing the words the way they are pronounced. Thus, we find the
following examples that illustrate this phenomenon: Reichel for Rachel, keik for cake, spik for speak,
braun for brown, or saebyet for subject.
7) Construction errors make up the last category of lexical errors. These are the result of a faulty use of constructions regarding, for instance, choice of prepositions, reflexivity, or transitiveness. Very recent research trends within cognitive linguistics have identified constructions as central units of the language, which therefore take a relevant role in SLA (cf. Goldberg, e.g. 2006). Constructions represent the lexical-grammatical interface, and thus errors in the arguments of the verb could be termed "construction errors". Learning a new language implies learning new ways of encoding or conceptualizing reality; hence errors with transitive and reflexive verbs, with prepositions, phrases or characteristics of verb arguments (e.g. animate/inanimate) tend to be frequent, especially at higher levels of proficiency (Verspoor et al., 2012). In previous lexical error-related research, we were able to identify some lexical errors which could originate in constructions (Agustín-Llach, 2015):
I donate at poor, for I donate to the poor.
I can relax me, for I can relax.
I am writing to introduce you myself, for I am writing to introduce myself (to you).
I meet friends for play, for I meet friends to play.
He visit to me always, for He visits me always.
Films romantic doesn't love with me for I do not like/love romantic films.
In the examples above, we observe the misuse of a preposition in the first one, the reflexivization of a non-reflexive verb in the second one, the wrong use of the dative in the third one, the wrong preposition in a finality clause in the fourth example, the transformation of a transitive verb into a non-transitive one in the fifth example sentence, and, in the last one, the learner uses an inanimate subject in a sentence where an animate subject is necessary.
Construction errors could traditionally also have fallen under the heading of literal translations. The main difference is that construction errors pertain to more fixed expressions, whereas calques or literal translations appear in freer word combinations or compound words.
Focusing on these tendencies of lexical inconsistency identified in previous research on lexical errors, we are going to propose some instructional actions to tackle these problems in the classroom on the way to L2 vocabulary teaching and acquisition. This is not a treatise on error correction; rather, our intention is to take a deep look into the vocabulary areas which cause major problems for Spanish EFL learners and describe possible pedagogical interventions to remedy them. We have departed from identifying lexical errors in order to learn from them and use them as a starting point for lines of vocabulary instruction. The following section offers some suggestions for remedial and preventive vocabulary instruction.
where lexical items are presented and practiced in isolation deprived of communicative context. Computer assisted
instruction can be very useful to implement this focus on forms approach to minimize the effect of lexical errors.
Computer resources can enhance and facilitate vocabulary teaching, as well. We illustrate some possibilities of
computer enhanced vocabulary teaching below for the corresponding lexical problem. This manifold practice
approach should be the basis of an effective vocabulary teaching intervention.
This double-step approach mirrors the input-output orientation (cf. interactionist SLA perspectives, e.g. Long, 1996). First, learners are provided with input in the form of explicit explanations of the causes of errors and of what the correct version should look like. These explanations trigger noticing. Dictionaries, corpora, and thesauruses can also be used to provide learners with explicit lexical input (e.g. McWhinney, 2005). Additionally, promoting self-discovery and developing learners' autonomy are two crucial steps to remedy their lexical errors. Then, learners are encouraged to produce L2 lexical items, i.e. pushed output. This approach is believed to enhance lexical learning and provide learners with multiple opportunities for acquisition. Furthermore, pushing learners further towards lexical progress can also help prevent fossilization and help them move past a possible "plateau effect". If learners' attention is not drawn to recurrent errors, they might simply be unable to spot and correct them. Similarly, either consciously or, most frequently, unconsciously, learners stop developing their lexical accuracy when they have reached communicative success (cf. Richards, 2008). They need to be urged to continue learning and to be accurate.
Still, we can think of a transversal approach which is central to lexical learning and lexical error prevention, namely explicit vocabulary strategy training. Lexical errors are on many occasions the result of a faulty application of vocabulary learning or communication strategies. In this sense, it is recommended to train learners in the use of effective vocabulary strategies to improve their lexical production. Using cognate knowledge, using word parts (inflectional or derivational prefixes or suffixes, Latin and Greek roots), or using the dictionary sensibly will arguably result in fewer lexical errors and better lexical use (Graves et al., 2012). McWhinney (2005) proposes, together with dictionary use, two other strategies to maximize learners' full learning potential, namely recoding, or constructing new images or new concepts for new words or phrases, and linking word forms and meanings by relating them to L1 equivalents, as in the keyword method. We agree with McWhinney (2005) that these strategies can be very helpful in coping with learners' lexical learning challenge. Finally, learning collocations and chunks or fixed expressions is highly recommended to prevent and remedy lexical errors and to increase learners' vocabulary knowledge (cf. Richards, 2008).
successful vocabulary acquisition, as Hemchua and Schmitt (2006) probed in their analysis of L1-originated errors in learners' compositions. Moreover, warning learners of the dangers of literal translation and of the lack of straightforward semantic and contextual equivalence between L1 and L2 words or expressions is essential (cf. Warren, 1982). This is of special relevance since, as Schmitt (2008) notes, learners firmly believe that translating will help them learn vocabulary words, idioms, and phrases.
5) Nevertheless, English shares a number of cognates with other languages, not to mention international words, most of which come from English, which can be very helpful in articulating discourse. Learners can be instructed in cognates and international words so that they can exploit these similarities and use them to their advantage. Instructing learners to draw on their L1 lexical knowledge by resorting to cognate use is a good way to increase their vocabulary competence. Moreover, teaching them false friends will presumably prevent erroneous word meaning inference.
Translation activities could also promote instances of positive L1 influence at the lexical level. Laufer and Girsai (2008) found translation to be a particularly successful instructional condition, since it is an ideal task for pushed output and fosters the mobilization of linguistic resources through contrastive comparisons. Schmitt (2008) also points to the benefits of using the L1 to establish initial form-meaning links; since at the first stages of acquisition learners are unlikely to absorb much contextualized knowledge about the words, there are few possible negative effects of L1 use. However, at more advanced stages of acquisition the value of the L1 lessens and words should be presented in context, because learners can learn more from this (Schmitt, 2008).
Phonetic or formal confusions arise when two similar-looking or similar-sounding words are mixed up. Warren (1982) suggests that learners should be taught the form-meaning link of both words, the incorrectly used word and the target word, contrasting them. Furthermore, teachers should also instruct learners on homophones and give them examples. Homophones, or words which sound the same but have a different meaning, can be a potential source of formal confusion. Similarly, words which have a similar meaning but a slightly different contextual distribution in the L1 and L2 are also strong candidates for explicit instruction. The teaching of formally, and especially of semantically, similar words has been prey to some controversy. Some authors have called for the simultaneous teaching of semantically related words such as synonyms, antonyms, or hyponyms (e.g. Nation, 2001; Tagashira et al., 2010). The idea that these semantic webs reflect the way the mental lexicon is organized underlies and justifies this technique (Nation, 2001; Stoller & Grabe, 1995). However, a different trend in research (Nation, 1990; Waring, 2007) has highlighted the higher likelihood of confusion when related words are taught together at the same time, and advocates teaching one member of the pair/triad first, and the other only when the first one has been properly mastered. From the evidence of lexical error production, we believe that contrasting formally or semantically similar words and teaching them accordingly might be an adequate approach to solving these problems of confusion. In this line of reasoning, we agree with Warren (1982) when she calls for the identification of the common semantic trait(s) or seme(s) of the confused words and the isolation of the distinguishing feature(s) to understand the confusion. This identification can proceed in two ways: either the teacher gives an explicit account of it, or they let learners deduce those features from a series of contextualized examples. Using pictures, as we suggest below, can be an efficient way of contextualizing lexical items.
Creating a meaningful context in which lexical learning is related to feelings and experiences is a technique which will surely enhance vocabulary acquisition through deep processing (cf. Arnold & Foncubierta, 2013). Establishing emotional links between lexical items and learners' personal memories and experiences will not only help them better remember the words they wish to learn, but will also presumably prevent lexical errors. This trend of exploring sensory-emotional intelligence and linking it to lexical learning has recently been brought to light by researchers such as Arnold and Foncubierta (2013), who propose tasks and exercises that exploit this relationship in the FL classroom. We certainly believe this is a very fertile avenue for lexical instruction and lexical error remediation.
We can think of a series of activities that can help learners reinforce the form-meaning link of new words, activate prior knowledge, and contrast word meanings (cf. Graves et al., 2012). Semantic or conceptual maps with both pictures and word forms make the relationships between words evident and the specific traits patent (cf. Barreras Gómez, 2004); in this sense, virtual tools are of special interest, since they can help the teacher and the students create conceptual mappings allowing for different colors, sizes, or movement. Semantic feature analysis has been found to lead to robust word learning, surpassing traditional vocabulary instruction (Bos, Allen, & Scanlon, 1989). This technique would allow learners to dissect the meaning(s) of the target and the error word and compare and contrast them. Additionally, a semantic field bingo (food, family, school) can be a fun tool for practicing related words while highlighting common and distinctive features (cf. Barreras Gómez, 2004). Finally, providing learners with the L1 equivalents of the target and error words might be the most effective intervention (Warren, 1982). In this vein, Webb and Kagimoto (2011) found a beneficial effect of providing glosses to help learners learn collocations in the L2.
6) Spelling problems are, as commented above, very numerous in EFL learners' productions, including the particular group of phonetic spellings. In traditional EFL classrooms, spelling and the link between spelling and pronunciation were not paid much attention. However, more recent teaching methodologies include explanations concerning the different written renderings of vowel and consonant sounds as well as the multiple plausible pronunciations of specific letters.
Grouping words according to their spelling and/or pronunciation is a good activity for learning how to write and pronounce them. These kinds of explanations and subsequent exercises can help learners become familiar with the grapho-phonological rules of the English language and thus overcome the problems posed by the discordance between spelling and pronunciation. Using morphological knowledge of e.g. inflectional suffixes, derivational prefixes or suffixes, or knowledge of Latin or Greek roots can greatly enhance spelling abilities and reduce misspellings considerably. Teaching learners English morphology and morphological patterns, building words from word parts (roots plus affixes), breaking words into morphemes, identifying lexical units within compound words, and teaching how this relates to lexical knowledge and how to apply it to avoid lexical errors can be a useful and interesting idea; for instance, morphemes such as -less, -ful, -able, in-, im-, un-, roots such as "tract" or "voc", or the units of complex or compound words such as screwdriver, schoolbag, blackboard.
Computer-assisted vocabulary instruction can be very useful in preventing learners from committing
misspellings and phonetic spelling errors. By using a sound and recording device, learners can be encouraged to
produce the problematic lexical items and to check the gap between the native pronunciation, their own
pronunciation, and the written rendering of the words. Similarly, using still and motion graphics and colors to
highlight new or difficult orthographic patterns, e.g. double consonants, silent letters, or affixes, can also be very
helpful.
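A minimal sketch of such highlighting, assuming nothing more than a regular expression over doubled consonants (the bracket notation is our own illustrative device, not drawn from any CALL tool cited here):

```python
import re

def highlight_double_consonants(word):
    # Wrap doubled consonants in brackets as a simple visual cue;
    # a real tool would render these in color or bold instead.
    return re.sub(r"([bcdfghjklmnpqrstvwxz])\1", r"[\1\1]", word)

for w in ["address", "committee", "occurrence"]:
    print(highlight_double_consonants(w))
# address   -> a[dd]re[ss]
# committee -> co[mm]i[tt]ee
```

The same backreference pattern could be extended to silent letters or affixes by swapping in other expressions.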
But teaching cannot stop at controlled and guided focus-on-forms activities; increasing free written and oral
production within communicative tasks would be a great step towards remedying and preventing misspellings.
Furthermore, if these communicative tasks include (language) games or ludic activities, such as crosswords, word
search puzzles, or hangman, their effectiveness towards the desired EFL learning outcomes could be augmented.
EFL teaching to young learners is starting to incorporate the “Jolly Phonics” method. This method was
designed by Lloyd and Wernham (1992, 2012) and has traditionally been used in English language teaching to
native-speaking children. It consists in relating pronunciation and spelling, joining isolated sounds to make up larger sound
combinations and form words. The segmentation of words into sounds is the complementary side of the method.
When generalizations or systematizations do not work, e.g. with words defying graphophonetic rules,
learners are encouraged to practice those words in extra activities. This method is especially appropriate for
children, since it presents words whose meanings can be inferred from actions, mimicry, pictures, flashcards, or
objects. But its multisensory character, which links new words with learners’ multiple intelligences such as
musical, kinesthetic, intrapersonal, or spatial (e.g. García de Celis, 2005; Gardner, 1994), makes it a good
candidate technique for vocabulary teaching at all levels. Furthermore, this relates to the above-mentioned idea
of linking new words to old experiences and making vocabulary teaching and acquisition an experiential and sensory
activity (cf. Arnold & Foncubierta, 2013). With these considerations in mind, we might contend that this method
is instructionally more helpful than previous attempts at teaching the pronunciation-spelling link.
7) Researchers working in cognitive linguistics and construction theory also advocate explicit instruction
of new and problematic structures in the lexico-grammar continuum. They have given this approach the
name “pedagogical grammar” (e.g. Dirven, 2001). Together with explicit explanations and L1-L2 comparisons,
cognitive linguistic and sociocultural approaches to language pedagogy call for other techniques and activities,
such as input enhancement, to increase the perceptual salience of lexical items, help retention, or
highlight their communicative relevance (Della Putta, 2015; Della Putta & Visigalli, 2012). We agree with Della
Putta (2015) on the need to help learners “unlearn” certain linguistic features and encourage them to
reconceptualize the reality around them according to the rules and codes of the L2. We need to clarify the way
the L2 embodies reality through explicit explanations, mimicry, or pictures, by promoting interaction and meaning
negotiation (e.g. Long, 1996), and by manipulating the input to lead learners to notice the new lexical items.
Lexical errors not only have pedagogical applications in teaching; we can also think of them as indicators of
quality. In general, their presence in learners’ productions makes them score lower. However, the
correlation is not straightforward. Lexical creations or misspellings do not represent important communication
breakdowns (cf. Agustín-Llach, 2011), but borrowings or calques are relevant communication disturbers. Their
seriousness resides in whether they cause intelligibility problems (Hughes & Lascaratou, 1982; Johansson, 1978).
Their relative importance also derives from the acquisition stage at which the learner finds him- or herself.
Research has been able to associate lexical error types with specific acquisition stages (cf. Agustín-Llach,
2011; Hemchua & Schmitt, 2006), so that if a learner commits lexical errors typical of later stages of
acquisition, these cannot be considered serious. Lyster et al. (2013) also highlight the importance of lexical
errors as crucial instruments in comprehension and call for the need to target them in L2 vocabulary instruction.
Conclusion
This is an exploratory theoretical paper in which we try to join two research trends. First, the examination and
systematization of lexical errors constitutes a major research area within SLA and lexical studies. Here, we have
not accounted for the lexical error results of a particular population, but have rather presented general findings of
some of the main previous lexical error studies of EFL learners. Research-based generalizations of lexical error
production guide our pedagogical implications. Thus, secondly, banking on these frequent developmental
lexical errors, we have tried to propose some lines of pedagogical action and intervention in vocabulary
instruction in EFL - a vivid line of research.
In the present study, we give no frequencies of lexical errors because we rely on generalizations from
previous studies. The review of lexical error types stems from the need to find tendencies or systematizations of
those lexical error categories, and mainly of their causes, since these are the stepping stones upon
which we propose some pedagogical actions. Similarly, we do not intend to rank lexical errors or
vocabulary teaching activities or tasks according to their impact on vocabulary acquisition, but rather to show some
general possibilities for the EFL classroom, always with the information on problem areas from lexical errors in
mind. The systematization of the causes of lexical errors in EFL learners allows us to suggest some vocabulary
instruction approaches and practical implementations.
This paper takes a theoretical stance with some aspiration to practical application. Likewise, we have not
conducted a specific study with actual informants and derived our proposal from its findings. Rather, we have
generalized findings from previous studies addressing the exploration of lexical errors in actual EFL learners’
productions, extracted common tendencies, and devised some lines for vocabulary instruction based on
those observed trends. Among the main conclusions to be drawn from this theoretical paper, however, we can
highlight one which affects foreign language teaching policies and refers to the need for explicit instruction of
vocabulary, not only as concerns the form-meaning link, but also its relation to L1 equivalents, the
spelling-pronunciation link, and its contextual distribution in syntactic, semantic, and pragmatic contexts. From
these observations and considerations, we also agree with Schmitt (2008), who concludes that the evidence from
research studies suggests that different teaching methods may be appropriate at different stages of vocabulary
learning.
Further research should focus on experimentally testing these suggestions in the EFL classroom to check
their effectiveness in vocabulary acquisition. A thorough analysis of lexical errors which extends over
several years can help us better understand the process of lexical development. Moreover, identifying the
variables that affect this process, such as learner age, gender, native language, instructional approach, or
intralexical factors, will be of great help in maximizing lexical learning. Applying the results of such studies to
practical vocabulary instruction is a task which should receive far more attention in future research.
Endnotes
1 At this point we need to make two clarifications. First, lexical errors can also derive simply from lack of word knowledge,
faulty rule application, overgeneralization, or transfer. Second, the application of vocabulary learning and communication
strategies does not necessarily lead to the commission of a lexical error. On the contrary, myriad are the examples of
successful application of vocabulary strategies that result in correct language use.
References
Agustín Llach, M.P. (2004). Pedagogical implications and application of lexical inconsistencies errors in second
language classroom. APAC of News, 52, 34-39.
Agustín-Llach, M.P. (2011). Lexical errors and accuracy in foreign language writing. Bristol: Multilingual Matters.
Agustín-Llach, M.P. (2014). Early Foreign Language Learning: The case of Mother Tongue In'uence in
Vocabulary use in German and Spanish Primary-School EFL Learners. European Journal of Applied
Linguistics. 2 (2), 287-310.
Agustín Llach, M.P. (2015). Lexical errors in writing at the end of Primary and Secondary Education:
description and pedagogical implications. Porta Linguarum. 23, 109-124.
Albrechtsen, D., Henriksen, B., & Faerch, C. (1980). Native Speaker Reactions to Learners’ Spoken
Interlanguage. Language Learning 30, 365-396.
Arnold, J. & Foncubierta, J.M. (2013). La atención a los factores afectivos en la enseñanza del español. Madrid: Edinumen.
Barreras Gómez, A. (2004). Vocabulario y edad: pautas para su enseñanza en las clases de Inglés de Educación
Primaria. Aula Abierta, 84, 63-84.
Bos, C. S., Allen, A. A., & Scanlon, D. J. (1989). Vocabulary instruction and reading comprehension with
bilingual learning disabled students. National Reading Conference Yearbook, 38, 173-179.
Bouvy, C. (2000). Towards the construction of a theory of cross-linguistic transfer. In J. Cenoz & U. Jessner (Eds),
English in Europe. The acquisition of a third language. (pp. 143-156). Clevedon: Multilingual Matters.
Celaya, M.L. & Torras, M.R. (2001). L1 influence and EFL vocabulary: do children rely more on L1 than adult
learners? Proceedings of the 25th AEDEAN Meeting. December 13-15, University of Granada. 1-14.
Corder, S.P. (1967). The Significance of Learner’s Errors. IRAL, 5, 161-170.
Della Putta, P. (forthcoming). How to discourage constructional negative transfer: Theoretical aspects and
classroom activities for Spanish learners of Italian. In K. Masuda & C. Arnett (Eds.), Cognitive Linguistics
and Sociocultural Theory in Second and Foreign Language Teaching, Mouton De Gruyter.
Della Putta, P. & Visigalli, M. (2012). A Classroom-based Study: Teaching the Italian Noun Phrase to
Anglophones. Paper presented at the Ireland International Conference on Education, 16th -18th April 2012,
Dublin.
Dirven, R. (2001). English phrasal verbs: theory and didactic application. In M. Pütz et al. (Eds.), Applied Cognitive
Linguistics II: Language Pedagogy. (pp. 3-27) Berlín/Nueva York: de Gruyter.
Ellis, R. (1994). The study of second language acquisition. Oxford: Oxford University Press.
Engber, C.A. (1995). The Relationship of Lexical Proficiency to the Quality of ESL Compositions. Journal of
Second Language Writing, 4, 139-155.
Fernández, S. (1997). Interlengua y análisis de errores en el aprendizaje del español como lengua extranjera. Madrid: Edelsa.
Ferris, D. R. (1999). The case for grammar correction in L2 writing classes: A response to Truscott (1996). Journal
of Second Language Writing, 8, 1–10.
García de Celis, G. (2005). Inteligencias múltiples y didáctica de lenguas extranjeras. Iberpsicología: Revista
Electrónica de la Federación Española de Asociaciones de Psicología, 10, 7.
Gardner, H. (1994). Multiple intelligences theory. In R. J. Sternberg (Ed.), Encyclopedia of human intelligence (Vol. 2).
(pp. 740-742). New York: Macmillan.
Goldberg, A. (2006). Constructions at Work: the nature of generalization in language. Oxford University Press.
Graves, M. F., August, D. & Mancilla-Martínez, J. (Eds.). (2012). Teaching vocabulary to English language learners. NY:
Teachers College Press.
Hemchua, S. & Schmitt, N. (2006). An analysis of lexical errors in the English compositions of Thai learners.
Prospect, 21(3), 3-25.
Hughes, A. and Lascaratou, C. (1982). Competing criteria for error gravity. ELT Journal, 36(3), 175-182.
James, C. (1998). Errors in language learning and use. Exploring error analysis. London: Longman.
Jiménez Catalán, R.M. (1992). Errores en la producción escrita del inglés y posibles factores condicionantes. Madrid: Editorial
de la Universidad Complutense. Colección Tesis Doctorales n° 73/92.
Johansson, S. (1978). Problems in studying the communicative effect of learner’s errors. Studies in Second Language
Acquisition 1, 10, 41-52.
Kellerman, E. (1979). Transfer and Non-Transfer: Where We Are Now. Studies in Second Language Acquisition,
2(1), 37-57.
Laufer, B. & N. Girsai. (2008). Form-focused instruction in second language vocabulary learning: a case for
contrastive analysis and translation. Applied Linguistics 29,4, 694−716.
Lindell, E. (1973). The Four Pillars: On the Goals of a Foreign Language Teaching. In J. Svartvik (Ed), Errata:
Papers in Error Analysis. (pp. 90-101). Lund: GWE Gleerup.
Lloyd, S. & Wernham, S. (1992). The Phonics Handbook. Essex: Jolly Learning Ltd.
Lloyd, S. & Wernham, S. (2012). Guía para padres/profesores. Essex: Jolly Learning Ltd.
Long, M. H. (1991). Focus on form: A design feature in language teaching methodology. In K. de Bot, R.B.
Ginsberg & C. Kramsch (Eds.), Foreign language research in cross-cultural perspective. (pp. 39-52). Amsterdam:
John Benjamins.
Long, M. H. (1996). The role of linguistic environment in second language acquisition. In W. Ritchie & T. Bhatia
(Eds.), Handbook of research on second language acquisition. (pp. 413−468). Malden, M.A.: Blackwell.
Lyster, R., K. Saito & M. Sato. (2013). Oral corrective feedback in second language classrooms. Language Teaching,
46, 1-40.
MacWhinney, B. (2005). Emergent Fossilization. In Z. Han & T. Odlin (Eds), Studies of Fossilization in Second
Language Acquisition. (pp. 134-156). Clevedon, U.K.: Multilingual Matters.
Meara, P. (1984). The Study of Lexis in Interlanguage. In A. Davies, C. Criper & A. P. R. Howatt (Eds),
Interlanguage. (pp. 225-239). Edinburgh: Edinburgh University Press.
Meara, P. (1996). The dimensions of lexical competence. In G. Brown, K. Malmkjaer & J. Williams (Eds),
Performance and Competence in Second Language Acquisition. (pp. 35-53). Cambridge: CUP.
Nation, P. (1990). Teaching and Learning Vocabulary. Boston: Heinle and Heinle Publishers.
Nation, P. (2001). Learning vocabulary in another language. Cambridge: Cambridge University Press.
Richards, J. (2008). Moving Beyond the Plateau: From Intermediate to Advanced Levels in Language Learning. New York:
Cambridge University Press.
Ringbom, H. (2001). Lexical Transfer in L3 Production. In J. Cenoz, B. Hufeisen & U. Jessner (Eds), Cross-linguistic
Influence in Third Language Acquisition: Psycholinguistic Perspectives. (pp. 59-68). Clevedon: Multilingual Matters.
Schmidt, R. (2001). Attention. In P. Robinson (Ed.), Cognition and Second Language Instruction. (pp. 3−32).
Cambridge University Press.
Schmitt, N. (2008). Teaching Vocabulary. Pearson Longman. Available online:
http://www.longmanhomeusa.com/content/FINAL-HIGH%20RES-Schmitt-Vocabulary
%20Monograph%20.pdf Accessed 22nd December 2014.
Solís Hernández, M. (2011). Raising Student Awareness about Grammatical and Lexical Errors via Email.
Revista de Lenguas Modernas, 14, 263-281.
Stoller, F. & Grabe, W. (1995). Implications for L2 vocabulary acquisition and instruction from L1 vocabulary
research. In T. Huckin, M. Haynes & J. Coady (Eds), Second language reading and vocabulary learning. (pp. 24-
45). Norwood, N.J.: Ablex Publishing Corporation.
Sunderman, G. & Kroll, J.F. (2006). First language activation during second language lexical processing. Studies in
Second Language Acquisition 28, 387-422.
Tagashira, K., Kida, S., & Hoshino, Y. (2010). Hot or gelid? The influence of L1 translation familiarity on the
interference effects in foreign language vocabulary learning. System, 38, 412-421.
Verspoor, M., Schmid, M.S., & Xu, X. (2012). A dynamic usage based perspective on L2 writing. Journal of Second
Language Writing, 21, 239-263.
Waring, R. (1997). The negative effects of learning words in semantic sets: A Replication. System 25,2, 261-274.
Webb, S. & Kagimoto, E. (2011). Learning collocations: do the number of collocates, position of the node word,
and synonymy affect learning? Applied Linguistics, 32(3), 259-276.
Zimmermann, R. (1986). Classification and Distribution of Lexical Errors in the Written Work of German
Learners of English. Papers and Studies in Contrastive Linguistics, 21, 31-40.
Acknowledgements
This research has been funded by the Spanish Ministry of Economy and Competitiveness through grant number
FFI2010-19334/FILO.
Chengchen Ma
Xi’an Jiaotong-Liverpool University, China
Song Jing
Australian National University, Australia
Abstract
This study aims to further the understanding of first language (L1) lexical transfer within the context of L1 Chinese learners
of English. Previous transfer research has often focused on a small subset of grammar errors, without examining how lexical
choices, especially in collocations and multi-word units (MWU), might have been influenced by L1 or L1-based assumptions
about vocabulary use. There is therefore a need to look for evidence of L1 transfer or word-for-word translation from the
native language in L2 production at each of the three levels: individual words, collocations, and MWU. Such errors point to
subordinate bilingualism, which is rooted in translation as a teaching/learning method (Cook, 2014), a practice common in
China (Edmunds, 2013). Therefore this paper addresses the following research questions: 1) To what extent does the transfer
of L1 word polysemy, collocations, and MWU impact Chinese learners’ English vocabulary use? 2) Are more advanced
learners as prone to L1 lexical transfer errors as the less advanced ones? The approach used here is corpus-linguistic. The
main research task is to examine an existing corpus of Chinese student writing in English and to analyze and classify the
identified lexical transfer errors. The findings indicate that the most common of these are errors caused by L1 polysemy in
individual words, followed by MWU and collocation errors. More advanced learners appear to be slightly but not
significantly less prone to lexical transfer errors. Instruction which follows the recommendations made in this paper is likely
to prevent the onset of such errors.
Introduction
First language (L1) transfer, sometimes also called L1 interference or ‘cross-linguistic influence’ (Jarvis &
Pavlenko, 2008), has been found at different linguistic levels, from phonology and spelling to discourse. Although
transfer research has so far mainly focused on grammar, MacWhinney (1992) suggested that a significant number
of L1 transfer cases were found at the lexical level as well. However, only a modest amount of research has been
conducted to investigate the impact of L1 transfer on L2 vocabulary acquisition. Yet, vocabulary is of
tremendous importance, as ‘without vocabulary nothing can be conveyed’ (Wilkins, cited in Thornbury, 2002).
Words build language structures and convey L2 learners’ intended meaning, but only when they are
appropriately selected and used (Shalaby, Yahya & El-Komi, 2009). Inappropriate use of words can lead to
errors and miscommunication. Such errors are called lexical errors (Agustín-Llach, 2011).
Errors have served as indices of writing quality in formal contexts. In other words, there is ‘a negative
correlation between quality writing and linguistics errors in general and lexical errors in particular’ (Agustín-
Llach, 2011, p. 67). In order to deal adequately with lexical errors found in learner writing, language
teachers should be aware of the sources of such errors. This is particularly important in Chinese contexts, where
corpus studies have already recognized a significant presence of lexical errors (Chan, 2010; Liu, 2011; Edmunds,
2013).
Relevant research (Hemchua & Schmitt, 2006; Zhou, 2010; Xia, 2013) suggests that lexical errors do not
only occur at the single word level, but also at the collocation and multi-word unit (MWU) levels (Gray & Biber, 2013).
Among those are lexical transfer errors, which have been identified at every level (Yang, Ma & Cao, 2013; Li,
2005; Yamashita & Jiang, 2010), although not necessarily all within the same study. In short, lexical transfer
errors are lexical errors caused by L1 transfer. The identification of lexical transfer errors in the English output
of Chinese learners coincides with the finding that the grammar-translation method happens to be the prevalent L2
instruction approach in China (Edmunds, 2013). If confirmed, such findings could have profound implications
for language teaching practice in Chinese contexts. Hence, the present study aims to identify and analyze the
cases of negative lexical transfer from Chinese to English caused by 1) Chinese word polysemy at the single word
level, 2) Chinese collocations, and 3) Chinese MWU.
knowledge. This is not surprising in view of Pienemann’s (2003) processability theory, which posits that L2
acquisition follows predictable developmental sequences, allowing learners to acquire only those linguistic
forms which are appropriate for their developmental stage. Thus, Pienemann et al. (2005) are more inclined to
believe that early errors are a consequence of stagewise L2 development, i.e. that they are developmental.
What Cook (2014) calls subordinate bilingualism is akin to the notion of L1 transfer. The concept of
‘transfer,’ a product of behaviorism, was used extensively in the early years of SLA to refer to the process in
which the learners’ L1 influences the L2 in a positive or negative way (Gass, 2013). ‘Transfer’ or ‘cross-linguistic
influence’ is preferred by linguists in SLA, while the term ‘interference’ is used more commonly in
psycholinguistic approaches to SLA (Jarvis & Pavlenko, 2008). More recently, there have been attempts to
redefine the terms. Thus, Grosjean (2001) pointed out that the term ‘transfer’ should be used to refer to the
permanent influence of L1 on L2, while ‘interference’ should be used to describe L1 features occurring from
time to time in L2. In this study, transfer is used in the sense of the definition rendered by Gass (2013). In the
same vein, L2 errors caused by L1 transfer are called transfer errors.
Transfer errors as well as developmental errors are deemed to contribute to the state of interlanguage
(Yip, 1995), or the language of L2 learners, which is found somewhere along the continuum between the L1 and
L2 (Han & Selinker, 1999). Within the interlanguage framework, those errors which defy correction and persist
despite repeated instruction are called fossilized errors (Gass & Selinker, 2008). Fossilization resembles Grosjean’s
(2001) narrow definition of transfer. It is connected with the Multiple Effects Principle (MEP), “which predicts
that when language transfer works in tandem with one or more second language acquisition processes” (Han &
Selinker, 1999, p. 248), interlanguage structures are more likely to stabilize, leading to a permanent influence of
L1 on L2 (Han & Odlin, 2006).
Cases of L1 transfer can be exacerbated by polysemous L2 words, that is, words that have more than one
meaning sense (Schmitt, 2000). For example, Lennon (1996) found that in speech production, L2 English
learners, even advanced ones, frequently made errors while using polysemous words such as ‘go,’ ‘put,’ and
‘take.’ Morimoto and Loewen (2007) conducted a quasi-experimental study involving 58 Japanese high-school L2
English learners to compare the effectiveness of two kinds of vocabulary instruction, image-schema-based
instruction and translation-based instruction, on the learning of English polysemous words. The results revealed that
image-schema-based instruction is more effective than translation-based instruction.
Conversely, few studies have addressed the influence of L1 word polysemy on L2 vocabulary acquisition.
Amongst the few is the study by Duan and Qin (2012), which argues that, unlike English, Chinese makes use of
the same word (character) to express a number of meanings, which allows the Chinese language to be
economical. However, this misleads Chinese learners into believing that English follows the same pattern, which
can result in L1 lexical transfer errors.
Another lexical area with potential for L1 transfer is collocation. Collocations are described as ‘the
combinations of words which occur naturally with greater than random frequency’ (Lewis, 1997, p. 25).
Collocational knowledge, as a parameter which ‘distinguishes native speakers from nonnative speakers’ (Schmitt,
2000, p. 79), is not easy for L2 learners to acquire. Even advanced learners appear to have difficulty with L2
collocations (Yamashita & Jiang, 2010).
L1 influence was found to play a critical role in a number of studies of L2 collocation, in which
researchers either asked learners to take elicitation tests of collocations or collected and analyzed collocations in
learner production. The former type is more common. Thus, a translation test was developed by Biskup (1990,
cited in Nesselhauf, 2005) to investigate Polish learners’ knowledge of English collocations, and it was discovered
that participants seldom made errors in translation from L2 collocations to L1, but a considerable number of
errors occurred when translating collocations from L1 to L2. Yamashita and Jiang (2010) conducted a study
involving 47 Japanese learners of English, in which participants were shown collocations on a computer
screen, after which they needed to judge whether or not those were acceptable English collocations.
The findings showed that English collocations which are congruent with Japanese collocations are more easily
learned than those that are incongruent. Nesselhauf (2003) analyzed the use of English collocations in a learner
corpus consisting of 32 essays written by L2 learners of English whose L1 was German. The study discovered
that 56% of collocation errors could be attributed to negative L1 transfer.
Studies with Chinese speakers have yielded similar results. Based on a cloze test administered to Chinese
learners, Shei (1999, cited in Nesselhauf, 2005) concluded that advanced Chinese learners found it more difficult
to learn English collocations than their European counterparts did. The research by Lombard
(1997, cited in Nesselhauf, 2005), which investigated the collocations produced by Chinese learners, also found
that 10% of non-native-like collocations might be due to L1 transfer, another source being the incorrect use of
English synonyms. Wang (2011) used three collocation tests to investigate language transfer in the acquisition of
light verb + noun collocations by Chinese learners. He found that 61.84% of the participants’ uses of English
light verb + noun collocations may be traced to either positive or negative transfer from Chinese. Hence he
concluded that the influence of Chinese on the acquisition of English collocation was obvious and significant. He
also suggested that priority in teaching L2 collocations should be given to the L1-incongruent ones. In Chen
and Lin’s (2011) study, 355 first-year college students from three different universities in Taiwan were asked to
complete a 50-item multiple-choice collocation test and a questionnaire. The results showed that L1 transfer,
together with overgeneralization and misapplication of synonyms, was one of the top three factors leading to
collocation errors.
Other scholars analyzed learner writing to explore common collocation errors made by Chinese learners
(Li, 2005; Duan & Qin, 2012; Yang, Ma, & Cao, 2013). For example, Duan and Qin (2012) found that some
common English collocation errors, such as ‘eat medicine,’ ‘find an object,’ and ‘pay time,’ are due to negative transfer
from Chinese collocations. More evidence of transfer of Chinese collocations comes from Yang, Ma and Cao
(2013). They argue that due to L1 transfer, Chinese learners of English frequently produce unacceptable
collocations in English, such as ‘learn knowledge’ or ‘strong competition.’
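By way of illustration only (the studies above classified such errors manually; the calque list and function below are hypothetical, not drawn from any of the cited work), calqued verb + noun pairs like these could in principle be flagged against a hand-built list:

```python
# Hypothetical mini-list mapping calqued collocations (discussed above)
# to native-like targets; a real study would need a far larger inventory.
CALQUES = {
    ("eat", "medicine"): ("take", "medicine"),
    ("learn", "knowledge"): ("acquire", "knowledge"),
    ("pay", "time"): ("spend", "time"),
}

def flag_calques(tokens):
    """Return (index, found_pair, suggested_pair) for each adjacent
    token pair that matches the calque list."""
    hits = []
    for i in range(len(tokens) - 1):
        pair = (tokens[i].lower(), tokens[i + 1].lower())
        if pair in CALQUES:
            hits.append((i, pair, CALQUES[pair]))
    return hits

sample = "Students often eat medicine and learn knowledge at school".split()
for i, found, suggested in flag_calques(sample):
    print(f"position {i}: {' '.join(found)} -> {' '.join(suggested)}")
```

Such a lookup captures only verbatim, adjacent pairs; inflected or discontinuous collocations would require lemmatization and a window-based search.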
Beyond collocations, which are usually restricted to two words, there are longer strings of words called
multi-word units or MWU (Schmitt, 2000). They have been categorized into four linguistic categories: ‘phrasal
verbs,’ ‘fixed phrases,’ ‘idioms’ and ‘proverbs’ (Schmitt, 2000). From the point of view of language production,
MWU are regarded as ‘formulaic expressions,’ ‘lexical phrases,’ or ‘lexical chunks’ which are stored in long-term
memory and can be easily activated (Pawley & Syder, 1983). Therefore, they are one of the key elements of
accurate, fluent and efficient linguistic production, playing a critical role in SLA (Biber et al., 1999; Erman and
Warren, 2000; Hyland, 2008).
Furthermore, except for idioms, whose meaning does not equal the meanings of their component parts,
most MWU are to some extent compositional (Nation, 2013). In other words, the meaning of each component
part in an MWU contributes to the meaning of the whole. As the selection of words in an MWU is not arbitrary,
knowledge of its parts is conducive to understanding of the whole MWU (Bogaards, 2001; Boers &
Lindstromberg, 2009). MWU are also regarded as transferable because they are semantically and
syntactically compositional. L2 MWU studies (e.g. Rafiee, Tavakoli & Amirian, 2011; Adel & Erman, 2012;
Karabacak & Qin, 2013) indicate that, compared with native speakers, L2 learners use fewer MWU, while also
being likely to overuse certain MWU and underuse others.
Few studies have so far addressed the L1 influence on the acquisition of L2 MWU. One of the
exceptions is Peromingo (2012), who explored the L1 influence on L2 learners’ production of both correct and
incorrect English MWU by analyzing argumentative writing from several learner corpora. The findings suggest
that L2 learners tend to overuse the MWU which are similar to L1 ones.
Paquot (2013) analyzed the writing of French learners of English from the first version of the International
Corpus of Learner English (ICLE), with special focus on their use of English 3-word sequences with a lexical
verb (marked). The results indicated that French learners made few errors in using English 3-word sequences
with a lexical verb. However, more errors were found in their selection of English unmarked word combinations,
whose French translation equivalents are easy to trace.
In conclusion, studies of lexical transfer have so far examined this phenomenon in three different
contexts: single word polysemy, collocation, and MWU, although not necessarily all at once. These contexts
coincide with the scope of the term lexis, which subsumes not only single words, but also collocations and MWU
(Schmitt, 2000; Thornbury, 2002). Hence, any comprehensive study of lexical transfer should not be restricted
to individual words, but would need to observe the effects of L1 on words in both collocational and MWU
contexts (Gray & Biber, 2013). In accordance with this, Dodigovic, Wei and Jing (2015) proposed the following
taxonomy of the contexts for lexical transfer from Chinese to English: 1) Chinese word polysemy, 2) Chinese
collocations, 3) Chinese MWU. This taxonomy is followed in the current study.
Cook (2014) identifies the grammar-translation approach to teaching as one of the causes of subordinate
bilingualism or lexical transfer. This approach appears to be the dominant L2 instruction method in China
(Edmunds, 2013). Vocabulary teaching in this method encourages establishing links between L1 and L2 single
words, but does not necessarily pay attention to collocations or MWU, which sets the stage for word-for-word
translation and hence lexical error. It is therefore important to investigate the evidence of lexical transfer in China
in all of its lexical contexts. The findings could have significant implications for L2 teaching practice in the
Chinese-speaking world. Hence, the current study aims to identify and analyze the cases of negative lexical
transfer from Chinese to English caused by 1) Chinese word polysemy at the single word level, 2) Chinese
collocations, and 3) Chinese MWU. In doing so, it addresses the need for a comprehensive approach to lexis in
lexical transfer research, since corpus studies have to some extent examined individual aspects, but not necessarily
the entire scope of lexis in Chinese-English interlanguage.
Research Questions
To obtain a better understanding of the influence of Chinese as L1 on English vocabulary learning, the present
study attempts to address the following research questions: 1) To what extent does the transfer of L1 word
polysemy, collocations, and MWU impact Chinese learners’ English vocabulary use? 2) Are more advanced
learners as prone to L1 lexical transfer errors as less advanced ones? Both Chinese learners of English and
their teachers stand to gain from the answers to these questions and the ensuing implications for teaching
practice. It is hoped that the insights gained through this study will contribute to increased English language
proficiency in Chinese contexts.
Methodology
Data
The learner corpus used in the present study consists of 100 samples of writing (541,482 words in total). Fifty of
those were written by first year students and 50 by fourth year students at a Sino-British English-medium
university in China. The students whose writing was included were native speakers of Chinese from the
department of English. They were aged between 18 and 23, and had been learning English for at least 6 years
(three years in middle school and three years in high school) prior to university enrolment. Research ethics
protocols were followed in accordance with university policy.
The genre and topics of year one and year four student writing are somewhat different. First year
students were required to write a 1,000-word essay to demonstrate their appreciation of a given movie, a poem
or a novel chapter, on which they were able to work for several weeks. In contrast, year four students presented
their Final Year Projects (FYP), on which they had worked for an entire year. Each FYP was at least 10,000 words
long. Despite the considerable differences, each type of assignment was level-appropriate, thus being a true
indicator of the writers’ ability in written English.
In the present study, each instance of negative lexical transfer from Chinese to English, in a word,
collocation, or MWU, was counted as one error. Instances of L1 lexical transfer, highlighted and marked in the
corpus, were grouped into three pre-defined categories: 1) those caused by Chinese word polysemy, 2) those caused by
Chinese collocations and 3) those caused by Chinese MWU. The average length of writing by first year students was 1,000 words,
while that by final year students was 10,000. Due to this considerable difference in word count, it was deemed
that raw frequencies could be misleading. In order to make the results comparable across sub-samples,
there was a need to ‘norm’ the raw frequencies (Biber, Conrad & Reppen, 1998). In the present study, the counts
were normed to a basis of 1,000 words using the following formula:
Normed error frequency = (raw number of errors / total word count) × 1,000
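The norming step above is simple proportional scaling, and can be sketched in a few lines of Python. The figures used below are illustrative only, not counts from the study:

```python
# Normalize raw error counts to a common basis of 1,000 words, so that
# samples of very different lengths can be compared directly.
def normed_frequency(error_count, total_words, basis=1000):
    """Return the error frequency per `basis` words."""
    return error_count / total_words * basis

# Hypothetical figures: 3 errors in a 1,000-word essay versus
# 30 errors in a 10,000-word Final Year Project.
print(normed_frequency(3, 1000))    # 3.0 errors per 1,000 words
print(normed_frequency(30, 10000))  # 3.0 errors per 1,000 words
```

On this basis, the two samples show the same error rate even though their raw counts differ tenfold, which is exactly the distortion the norming is meant to remove.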
The data was subsequently processed using the IBM SPSS Statistics Version 20 for both descriptive and
inferential analysis.
Procedure
The corpus of student writing was manually analysed for lexical errors by initially three and later two Chinese-
English bilinguals, who used dictionaries and the English corpora available through lextutor.ca to check the
accuracy of their linguistic judgement. The accuracy of their judgement was subsequently verified by a native
English speaker. Based on their native-speaker Chinese competence, the analysts also decided whether the
context causing an error was the polysemy of the Chinese equivalent of the target English word, an underlying
Chinese collocation or a Chinese MWU. Each instance was counted as one error. For example, in cases where the
error was based on an underlying Chinese collocation which also included a polysemous Chinese word, the error
was counted as a collocation error. Thus, larger lexical units took precedence over smaller ones in the error count. The
accuracy of the Chinese aspect of this research was subsequently verified by an L1 Chinese rater.
In order to investigate whether the polysemous Chinese words identified in the learner corpus as the
cause of lexical transfer are frequently used by native speakers of Chinese, the corpus of ‘Texts Of Recent
Chinese’ (ToRCH; TORCH2009, Texts of Recent Chinese, Brown family, 2009, 2013 summer
edition), available from http://111.200.194.212/cqp/torch09/, was used in the study. The ToRCH project was
initiated under the name CC2009 (Chinese Corpus 2009) by the Corpus Research Group at Beijing
Foreign Studies University. The current version was finalized on 20 July 2014 after the removal of some
duplicated portions of text. The corpus contains 1,066,347 tokenized Chinese words, or 1,670,356 Chinese
characters, drawn from texts of 15 types (Press: Reportage, Press: Editorial, Press: Reviews, Religion, Skills and hobbies,
Popular lore, Belles-lettres, Miscellaneous: Government and house organs, Learned, Fiction: General, Fiction:
Mystery, Fiction: Science, Fiction: Adventure, Fiction: Romance, and Humor). While the polysemy of the
Chinese words was judged by at least two native speakers of Chinese and later verified by an L1 Chinese rater,
frequency was measured in terms of a word’s frequency ranking within the ToRCH corpus.
Table 1
The Frequency of Some Chinese Polysemous Words in the Corpus of Texts of Recent Chinese
NO. Chinese Words/characters Frequency
1 有 6,539
2 都 3,747
3 大 3,078
4 还 2,581
5 好 2,160
6 后 1,893
7 看 1,811
8 用 1,548
9 做 1,365
10 国家 1,023
11 重要 940
12 需要 825
13 通过 681
14 主要 650
15 方面 632
16 情况 567
17 提高 564
18 作用 508
19 变化 500
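The frequencies in Table 1 are raw token counts within the 1,066,347-word ToRCH corpus. A common corpus-linguistic convention, not applied in the table itself, is to normalize such counts per million words so they can be compared with counts from corpora of other sizes. A minimal Python sketch, using the corpus size stated above:

```python
# Raw-count to per-million-words conversion for ToRCH frequencies.
# Corpus size (tokenized words) as reported for the ToRCH corpus.
TORCH_SIZE = 1_066_347

def per_million(raw_count, corpus_size=TORCH_SIZE):
    """Return a word's frequency per million words of running text."""
    return raw_count / corpus_size * 1_000_000

# Two entries from Table 1: 有 (6,539 tokens) and 变化 (500 tokens).
print(round(per_million(6539)))  # ~6132 occurrences per million words
print(round(per_million(500)))   # ~469 occurrences per million words
```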
The second research question (Are more advanced learners as prone to L1 lexical transfer errors as the
less advanced ones?) was answered by comparing the error frequency in the writing of year one and year four
students. Compared with the writing of year one students (less advanced learners), fewer lexical transfer errors
were identified in the writing of year four students (more advanced learners).
Figure 2. Error frequency in writing by year one and year four students
Figure 2 compares the frequency of lexical errors in the writing of first year and final year students
in terms of the three transfer categories on a 1,000-word basis. Specifically, in the papers written by first year
students, 0.4404 errors per 1,000 words could be attributed to Chinese word polysemy, while the number
of errors caused by the transfer of Chinese collocations was the same as the number caused by MWU
(0.2936 and 0.2936 respectively). These figures decreased in the writing of final year students: while
0.3605 errors were caused by the transfer of Chinese word polysemy, 0.1478 were caused by Chinese collocations
and 0.1924 by Chinese MWU.
An independent-samples t-test was conducted using SPSS. The difference in the frequency
of lexical transfer from Chinese at all levels was not statistically significant, t(56.609) = 1.788, p = .079. The
corresponding medium effect size, r = .231, suggests however that the additional years of tertiary study that year
four students had over year one students might have had a moderate bearing on the decline in lexical transfer errors.
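The reported effect size can be recovered from the t statistic and its degrees of freedom. The sketch below assumes the standard conversion r = sqrt(t² / (t² + df)); the paper does not state which formula was used, but this conversion reproduces the reported value:

```python
import math

def effect_size_r(t, df):
    """Convert an independent-samples t statistic to the effect size r,
    using the conventional formula r = sqrt(t^2 / (t^2 + df))."""
    return math.sqrt(t**2 / (t**2 + df))

# Values reported in the study: t(56.609) = 1.788
r = effect_size_r(1.788, 56.609)
print(round(r, 3))  # 0.231, a medium effect by conventional benchmarks
```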
A medium-frequency word is ‘认识 (ren shi),’ which can be used both as a verb and as a noun. Its English
translation equivalents include: know, realize, acquaint oneself with, be familiar with, recognition, and understanding.
It has a frequency of 360 in the ToRCH corpus and caused two errors in the present study.
1) have his/her own recognition of …
*[sb.] has his/her own understanding of
[某人]对…有自己的认识
2) they have the ability to realize these western goods.
* They have ability to get to know/get familiar with these western goods.
他们有能力去认识这些西方的商品。
Chinese collocations caused over one fifth of the instances of lexical transfer. Three types were
identified: verb + noun, adjective + noun and noun + noun. An example of the first is ‘学习 (xue xi)’ and ‘知识
(zhi shi),’ which always collocate with each other to make up ‘学习知识 (xue xi zhi shi),’ whose English word-for-
word translation could be ‘learn knowledge.’ However, the correct English collocation should be ‘gain knowledge’
or ‘acquire knowledge.’ Here are two instances of this transfer type in the corpus:
Verb + Noun Collocations
1) The most important task for student in university should be learning
knowledge.
*The most important task for students in university should be gaining
knowledge.
大学生的主要任务是学习知识。
2) to learn the real practical knowledge
*to gain the practical knowledge
去学习知识
Finally, errors caused by Chinese noun + noun collocations are discussed here.
Noun + Noun Collocations
1) tool books
*reference books
工具书
2) development space
*room for improvement
发展空间
In the present study, close to one third of transfer errors were caused by Chinese MWU. Some typical
examples are discussed in this section. Many English MWU that indicate the author’s point of view were
negatively transferred from the Chinese ones. For instance:
1) standing in perspective of…
*walk in the shoes of…/from the point of view of…/from the
perspective of…/in the perspective of…
站在…的角度
2) from this point to consider
*from this point of view
从这点考虑
3) from this point to see
*from this point of view
从这点看
Another type of transfer error appeared to be caused by differences in the order of language elements
in Chinese and English MWU.
1) have some extent impact
*have impact to some extent
在某种程度上有影响
2) have some degree of influence
*have some influence to some degree
在某种程度上有影响
3) In the present day world
*In the world today/in the contemporary world
在当今世界
4) I not only can
* I can not only
我不仅能够
Discussion
The results show that Chinese word polysemy caused the most transfer errors, followed by Chinese MWU and
Chinese collocations. The frequency of transfer errors in the three categories was lower in the writing of year
four students than in the writing of year one students. This indicates that while writing in English, more
advanced learners tend to make fewer connections to the Chinese lexical network. This finding is consistent with
Kroll and Stewart’s (1994) as well as Jiang’s (2002) argument that in the minds of less proficient learners,
L2 words are directly connected to their L1 equivalents. This finding, however, appears to counter Pienemann et
al.’s (2005) view that more advanced learners are more prone to L1 transfer than less advanced ones.
Despite the fact that final year students made fewer transfer errors, the difference in error frequency between
the two groups was not statistically significant. As statistical significance is dependent on sample size, a large
enough sample would yield a p value small enough to reach the desired level of statistical significance;
this was not the case in the current study. It is therefore more noteworthy that the additional years of
English-medium instruction seem to have had a moderate impact on the decline in the number of lexical transfer
errors.
However, the more advanced learners persisted with lexical transfer. This could be partly explained by
‘fossilization,’ a feature of L2 interlanguage (Yip, 1995). Moreover, as argued by Han and Selinker
(1999, p. 248), ‘there is a greater tendency for interlanguage structures to stabilize, leading to possible
fossilization in spite of repeated pedagogical intervention.’ In addition, the finding is consistent with Jiang’s
(2000) L2 processing model, which stipulates that the transition from the L1 lemma mediation stage to the final
stage could hardly happen due to the cessation of lexical development, or, more specifically, due to fossilization.
Based on the discussion above, it appears that the findings of the present study are in agreement with those
made by Yamashita and Jiang (2010). They concluded that L2 collocations which are not congruent with L1
collocations are more likely to cause negative transfer. In other words, the L2 collocations that cannot be
accurately represented through word-for-word translation from L1 would lead to transfer errors. This in turn
points to translation in the English language classroom as one of the likely precipitating factors in the case of
Chinese collocation transfer. The other is a possible lack of attention to collocations as such.
Conclusions
Intrigued by the role of L1 in L2 vocabulary acquisition, and the paucity of corpus-based research focusing on
L1 lexical transfer in Chinese contexts, the present study attempted to explore the lexical transfer errors caused
by 1) Chinese word polysemy, 2) Chinese collocations, and 3) Chinese MWU. A learner corpus containing 100
writing samples by 100 Chinese learners of English who were at the time studying at a Sino-British university in
China was compiled and manually analyzed. The results show that the majority of lexical transfer errors could
be attributed to Chinese word polysemy. Although less advanced learners made more lexical transfer
errors overall than the more advanced ones, the difference was not statistically significant.
The fact that more advanced learners did not significantly outperform the less advanced ones could be
explained by fossilization. Two possible underlying reasons for this were considered. The first could be the
Chinese learners’ lack of adequate depth of English vocabulary knowledge, due to the lack of extensive exposure
to English and the lack of awareness of the lexical features of English vocabulary. The second reason is the over-
reliance on the Chinese conceptual network while learning English, which is exacerbated by the grammar-
translation approach to English instruction.
well as the associated registers and contexts in which they can be used. All of the above also require English
teachers to have in-depth English vocabulary knowledge.
Secondly, teachers should make the learners aware of the fact that there are no exact overlaps between
translation equivalents across languages. Moreover, in order to reduce the negative transfer from L1, the use of
bilingual dictionaries should decrease, especially for intermediate or advanced L2 learners. In contrast, the use of
monolingual learners’ English dictionaries should be encouraged since they could provide L2 learners with more
accurate and in-depth lexical knowledge, and offer them the contexts in which the words are used.
Thirdly, as argued by Ellis (2008), production could facilitate acquisition only if the learner is pushed, so
teachers should require learners to produce L2 as frequently as possible. For instance, learners should be
encouraged to try to think in English while writing English papers. In this manner, the role played by Chinese
could be reduced, thus preventing negative L1 transfer.
Fourthly, different approaches to teaching L2 lexis should be employed with learners at different levels.
Novice L2 learners most likely correspond to the initial stage in Jiang’s (2000) L2 lexical processing model. L2
learners at this stage are hardly able to establish a direct connection between the concept and the L2 word.
Instead, they connect the L2 word with its L1 translation equivalent. Therefore, Jiang (2000) suggests that an
interlingual teaching approach, namely the use of L1 translation, could be used in moderation to help the novice
L2 learners establish the forms and core senses of L2 words. However, lexical teaching strategies should change
with intermediate or advanced learners, who are already at the L1 mediation stage. In order to help intermediate
or advanced learners overcome the lexical or semantic fossilization, which leads to subordinate bilingualism, the
use of L1 equivalents should be avoided, and authentic and contextualized L2 materials should be used.
In addition, as suggested by Shalaby, Yahya and El-Komi (2009), word lists containing L2 words that are
difficult to acquire could be very helpful in L2 teaching. This is particularly the case with the multiple English
equivalents of high-frequency polysemous Chinese words.
Similarly, since L2 collocations which are not congruent with L1 are found to cause transfer errors, lists of
English collocations that cannot be directly deduced from their L1 translation equivalents should be generated.
Furthermore, English collocations should be taught as unified wholes rather than as separate words. This is
especially important for beginners, who are vulnerable to the negative influence of L1 collocations. Finally,
learners should be made aware of MWU, especially the ones that do not translate to English word for word.
Teachers and learners could turn to English language corpora for help concerning many aspects of
vocabulary, in particular collocation and context of use. The Compleat Lexical Tutor available at
http://www.lextutor.ca is a website enabling access to several corpora and analytical tools, which could be
successfully used for this purpose.
References
Ädel, A., & Erman, B. (2012). Recurrent word combinations in academic writing by native and non-native
speakers of English: A lexical bundles approach, English for Specific Purposes, 31(2), 81-92.
Bialystok, E., Craik, F. I., & Freedman, M. (2007). Bilingualism as a protection against the onset of symptoms of
dementia, Neuropsychologia, 45(2), 459-464.
Biber, D., Johansson, S., Leech, G., Conrad, S., Finegan, E., & Quirk, R. (1999). Longman grammar of spoken and
written English (p. 1204) Harlow: Pearson Education Limited.
Biber, D., Conrad, S., & Reppen, R. (1998). Corpus linguistics: Investigating language structure and use. Cambridge:
Cambridge University Press.
Biber, D. (2009). A corpus-driven approach to formulaic language in English: Multi-word patterns in speech and
writing, International Journal of Corpus Linguistics, 14(3), 275-311.
Bogaards, P. (2001). Lexical units and the learning of foreign language vocabulary, Studies in second language
acquisition, 23(03), 321-343.
Boers, F., & Lindstromberg, S. (2009) Optimizing a lexical approach to instructed second language acquisition . Basingstoke:
Palgrave Macmillan.
Carroll, D. (2007). Psychology of language. Belmont: Thomson Higher Education.
Chan, A. Y. W. (2010). Toward a Taxonomy of Written Errors: Investigation Into the Written Errors of Hong
Kong Cantonese ESL Learners, TESOL Quarterly, 44 (2), 295 – 319.
Chen, M.-H. & and Lin, M. (2011). Factors and Analysis of Common Miscollocations of College Students in
Taiwan. Studies in English Language and Literature, 28, 57 – 72.
Cook, V. (2014). How Do Different Languages Connect in Our Minds? In Cook, V. & Singleton, D. (Eds.) Key
Topics in Second Language Acquisition. (pp. 1-16) Bristol: Multilingual Matters.
Dodigovic, M. & Wang, S. (2015). The misuse of academic English vocabulary in Chinese student writing, US-
China Foreign Language 13(5), 349-356.
Dodigovic, M. (2005). Arti>cial intelligence in second language learning: Raising error awareness . Clevedon: Multilingual
Matters.
Duan, M. & Qin, X. (2012). Collocation in English Teaching and Learning, Theory and Practice in Language Studies,
2(9), 1890-1894.
Edmunds, K. (2013). Chinese ESL Learners’ Overuse of the Denite Article: A Corpus Study, BA thesis; Emory
University.
Ellis, R. (2008). The study of second language acquisition. 2nd edition. Oxford University Press.
Erman, B., & Warren, B. (2000). The idiom principle and the open choice principle. Text-Interdisciplinary Journal for
the Study of Discourse, 20(1), 29-62.
Folse, K. S. (2006). The Effect of Type of Written Exercise on L2 Vocabulary Retention. TESOL Quarterly: A
Journal For Teachers Of English To Speakers Of Other Languages And Of Standard English As A Second Dialect, 40(2),
273-293.
Gass, S. (2013). Second Language Acquisition: An Introductory Course (4th Edition). New York: Routledge/Taylor Francis.
Gass, S. & L. Selinker (2008). Second Language Acquisition: An Introductory Course (3rd Edition). New York:
Routledge/Taylor Francis.
Gray, B. & Biber, D. (2013). Lexical frames in academic prose and conversation. Intemational Journal of Corpus
Linguistics, 18 (1), 109-135.
Grosjean, F. (2001). The bilingual’s language modes. In Nicol, J. (Ed.). One Mind, Two Languages: Bilingual Language
Processing (pp. 1-22). Oxford: Blackwell.
Han, Z. & Odlin, T. (Eds.), Studies of Fossilization in Second Language Acquisition . Clevedon, U.K.: Multilingual
Matters, 134-156.
Han, Z., & Selinker, L. (1999). Error resistance: Towards an empirical pedagogy. Language Teaching Research, 3(3),
248-275.
Hyland, K. (2008). As can be seen: Lexical bundles and disciplinary variation, English for Specific Purposes, 27(1), 4-
21.
Jarvis, S., & Pavlenko, A. (2008). Crosslinguistic influence in language and cognition. New York: Routledge.
Jiang, N. (2000). Lexical representation and development in a second language, Applied linguistics, 21(1), 47-77.
Kroll, J. F., Bobb, S. C., & Wodniecka, Z. (2006). Language selectivity is the exception, not the rule: Arguments
against a fixed locus of language selection in bilingual speech, Bilingualism: Language and Cognition, 9(2), 119-
135.
Kroll, J. F. & Stewart, E. (1994). Category interference in translation and picture naming: Evidence for
asymmetric connections between bilingual memory representations, Journal of Memory and Language, 33, 149 –
174.
Karabacak, E., & Qin, J. (2013). Comparison of lexical bundles used by Turkish, Chinese, and American
university students, Procedia-Social and Behavioral Sciences, 70, 622-628.
Lennon, P. (1996). Getting ‘easy’ verbs wrong at the advanced level, IRAL-International Review of Applied Linguistics
in Language Teaching, 34(1), 23-36.
Lewis, M. (1997). Implementing the lexical approach: putting theory into practice. Hove: Language Teaching Publications.
Li, S. (2005). An Investigation into Lexical Misuses by Chinese College Students under the Negative Influence of Their First
Language. Master’s thesis: Zhejiang University.
Liu, Z. (2011). Negative Transfer of Chinese to College Students’ English Writing, Journal of Language Teaching and
Research, 2 (5), 1061-1068.
Llach, M. P. A. (2011). Lexical errors and accuracy in foreign language writing. Claredon: Multilingual Matters.
MacWhinney, B. (1992). Transfer and competition in second language learning, Advances in Psychology, 83, 371–
390.
Morimoto, S., & Loewen, S. (2007). A comparison of the effects of image-schema-based instruction and
translation-based instruction on the acquisition of L2 polysemous words, Language Teaching Research, 11(3),
347-372.
Nation, I. S. P. (2013). Learning vocabulary in another language. 2nd edition. Cambridge: Cambridge University Press.
Nesselhauf, N. (2005). Collocations in a learner corpus. Amsterdam: John Benjamins Publishing.
Nesselhauf, N. (2003). The use of collocations by advanced learners of English and some implications for
teaching, Applied linguistics, 24(2), 223-242.
Pawley, A. & Syder, F. H. (1983). Two puzzles for linguistic theory: Native-like selection and native-like fluency. In
J. Richards & R. Schmidt (Eds.) Language and Communication. (pp. 121 – 225) London: Longman.
Pienemann, M., Di Biase, B., Kawaguchi, S., & Håkansson, G. (2005). Processing constraints on L1 transfer. In J.
F. Kroll & A. M. B. de Groot (Eds.), Handbook of bilingualism: Psycholinguistic approaches (pp. 128-153) Oxford:
Oxford University Press.
Paquot, M. (2013). Cross-linguistic influence and formulaic language: French EFL learners’ use of recurrent
word sequences under scrutiny, Learner Corpus Research 18(3), 391-417.
Peromingo, J. P. (2012). Corpus analysis and phraseology: Transfer of multi-word units, Linguistics and the Human
Sciences, 6(1-3), 321-343.
Rafiee, M., Tavakoli, M. & Amirian, Z. (2011). Structural Analysis of Lexical Bundles Across Two Types of
English Newspapers Edited by Native and Non-native Speakers, The Modern Journal of Applied Linguistics, 3
(2), 1 – 15.
Schmitt, N. (2000) Vocabulary in language teaching. Ernst Klett Sprachen.
Schmitt, N., & Meara, P. (1997). Researching vocabulary through a word knowledge framework: Word
associations and verbal suffixes. Studies in Second Language Acquisition, 19, 17 – 36.
Shalaby, N. A., Yahya, N., & El-Komi, M. (2009). Analysis of lexical errors in Saudi college students’
compositions, Ayn, Journal of the Saudi Association of Language and Translation, 2(3), 65-92.
Schmitt, N., & Hemchua, S. (2006). An analysis of lexical errors in the English composition of Thai learners,
Prospect: an Australian journal of TESOL, 21(3), 3-25.
Thornbury, S. (2002) How to teach vocabulary. Harlow : Longman.
Wang, D. (2011). Language Transfer and the Acquisition of English Light Verb + Noun Collocations by Chinese
Learners, Chinese Journal of Applied Linguistics, 34 (2), 107 – 125.
Wray, A. (2002) Formulaic sequences and the lexicon. New York: Cambridge University Press.
Wolter, B. (2006). Lexical network structures and L2 vocabulary acquisition: The role of L1 lexical/conceptual
knowledge, Applied Linguistics, 27(4), 741-747.
Xia, L. (2013). A Corpus-Driven Investigation of Chinese English Learners’ Performance of Verb-Noun
Collocation: A Case Study of Ability, English Language Teaching, 6 (8), 119 – 124.
Yip, V. (1995). Interlanguage and learnability: from Chinese to English. Philadelphia: John Benjamins Publishing.
Yamashita, J., & Jiang, N. (2010). L1 influence on the acquisition of L2 collocations: Japanese ESL users and EFL
learners acquiring English collocations, TESOL Quarterly, 44(4), 647-668.
Yang, L., Ma, A. P., & Cao, Y. (2013). Lexical Negative Transfer Analysis and Pedagogic Suggestions of Native
Language in Chinese EFL Writing. The proceedings of the 2013 Conference on Education Technology and Management
Science (ICETMS 2013). (pp. 669 – 672) Atlantis Press.
Zhou, S. (2010). Comparing receptive and productive academic vocabulary knowledge of Chinese EFL learners.
Asian Social Sciences, 6(10), 14–19.
Chengcheng Ma, MA in TESOL, is a graduate of Xi’an Jiaotong-Liverpool University and the University of
Liverpool. She served as a research assistant at the Research Centre for Language Technology at Xi’an Jiaotong-
Liverpool University. Currently, she is teaching English in Kunming, PR China.
Song Jing is a graduate of Xi’an Jiaotong-Liverpool University. He served as a research assistant on the
project titled Lexical Transfer from Chinese to English in the Writing of XJTLU Students at Xi’an Jiaotong-Liverpool
University. At present, he is pursuing an MA degree at the Australian National University.
Stephen Jeaco*
Xi’an Jiaotong-Liverpool University, China
Abstract
While studies exploring the overall effectiveness of Data Driven Learning activities have been positive, learner participants
often seem to report difficulties in deciding what to look up, and how to formulate appropriate queries for a search (Gabel,
2001; Sun, 2003; Yeh, Liou, & Li, 2007). The Prime Machine (Jeaco, 2015) was developed as a concordancing tool to be used
specifically for looking up, comparing and exploring vocabulary and language patterns for English language teaching and
self-tutoring. The design of this concordancer took a pedagogical perspective on the corpus techniques and methods to be
used, focusing on English for Academic Purposes and including important software design principles from Computer Aided
Language Learning. The software includes a range of search support and display features which try to make the
comparison process for exploring specific words and collocations easier. This paper reports on student use of this
concordancer, drawing on log data records from mouse clicks and software features as well as questionnaire responses from
the participants. Twenty-three undergraduate students from a Sino-British university in China participated in the
evaluation. Results from logs of search support features and general use of the software are compared with questionnaire
responses from before and after the session. It is believed that The Prime Machine can be a very useful corpus tool which,
while simple to operate, provides a wealth of information for language learning.
Key words: Concordancer, Data Driven Learning, Lexical Priming, Corpus linguistics.
Introduction
This paper presents the results of an evaluation of a concordancing program which the author developed as part
of his doctoral studies (Jeaco, 2015). After presenting a brief introduction to why the software was developed,
some of the theories and studies which had an influence on this work will be discussed. Then the basic design of
the software will be introduced and the evaluation itself will be presented.
* Tel: + 86 51288161301; E-mail: Steve.Jeaco@xjtlu.edu.cn; HS431, Xi’an Jiaotong-Liverpool University, 111 Ren’Ai Lu,
Suzhou Industrial Park, Suzhou, P. R. China
Given the limited time available in class and a deep sense of the need to help my Chinese learners of
English develop skills to explore language themselves, one of the main reasons for developing the concordancing
tool was to give my students an additional language resource to which they could turn: to
check the meaning and use of words as they were composing, to consider alternative wordings as they were
proof-reading and editing their own work or the work of a peer, and to explore in their own time some of the
vocabulary which they had encountered briefly in a class session and the different contexts and environments in
which it typically occurs.
effective than online dictionaries as reference resources in error correction in writing, but participants also
showed strong attitudes regarding difficulties related to the time needed, unknown words in the concordance
lines, rule induction, having cut-off sentences and having too many examples. In addition to these issues, as
Anthony (2004) argues as he presents his classroom concordancer (AntConc), software for concordance exploration
is not usually designed specically with learners in mind. It is true that AntConc goes some way towards
simplifying the interface of a concordancer, but there are still many obstacles to getting started and knowing
enough about the tools and functions in order to use them. It has been argued that effort should be put into
trying to make concordancing software better in terms of its user-friendliness and its suitability for language
learners (Horst, Cobb, & Nicolae, 2005; Krishnamurthy & Kosem, 2007).
The Prime Machine aims to make insights about language based on Hoey’s theory of Lexical Priming (2005)
accessible and rewarding. The software has been designed to provide a multitude of examples from corpus texts
and additional information about typical contextual environments. Hoey argues that priming is “the result of a
speaker encountering evidence and generalising from it” (2005, p. 185), and also considers some of the
challenges that learners of a foreign language face due to limited opportunities to encounter language data
naturally, and also due to the severe limitations of wordlists and isolated grammar rules. The Prime Machine was
developed following key principles from Second Language Acquisition. First and foremost, the concordancer
and concordancing activities are a means of leading language learners to read multiple examples from authentic
texts. The SLA principle of exposing language learners to target language in use (Krashen, 1989; Nation, 1995-
6) provides a basis for this. Another fundamental principle from SLA is that of focussed attention and noticing
(Doughty, 1991). Schmidt claims that “intake is what learners consciously notice” (1990, p. 149). A link
between concordancing activities and Laufer and Hulstijn’s involvement load hypothesis (Laufer & Hulstijn, 2001) has
also been made clear by Lee et al. (2015). Tomlinson argues that the positive effects of noticing language features
within authentic texts, and of learners’ recognition of a gap in their own language use, can be strengthened if
the discovery process is one in which the language learners uncover features for themselves (Bolitho et al.,
2003; Tomlinson, 1994, 2008). It is hoped that The Prime Machine goes some way towards providing a platform for
these kinds of discovery, as it has been designed specifically to facilitate noticing of patterns and tendencies
(Jeaco, 2017).
It is possible to evaluate a piece of software like The Prime Machine by carrying out a series of system
evaluations or by conducting a user evaluation. A user evaluation considers how well the system meets the
expectations of its users and how performance and accuracy affect their attitudes and actions. These can be
measured both through feedback mechanisms such as questionnaires, interviews or focus groups, and through
looking at the preferences expressed in records of users’ interactions with the software. Following a
user evaluation, priorities for further development become clear as software engineers can focus on ways to build
on the more positively viewed aspects of the software, or they can look at which parts of the system were
underappreciated or neglected and use system evaluation techniques to focus on these in isolation and attempt to
improve them. As the software was designed for language learning and teaching, it is important to consider how
principles from Computer Aided Language Learning (CALL) could be applied for the evaluation. Chapelle
(2001) makes suggestions for the judgemental analysis of CALL software (pp. 53-54), the appropriateness of tasks
(p. 59) and the empirical evaluation of tasks (p. 68). She provides a list of six qualities as follows:
Language learning potential
Learner fit
Meaning focus
Authenticity
Impact
Practicality
Each of these qualities should be considered when evaluating the effectiveness of a concordancing tool for
language learning. However, as Krishnamurthy and Kosem (2007) point out, it is also important for software
designers to get feedback from teachers in a pilot scheme in order to ensure teachers will want to use it. Scott’s
own reflections on perceptions of the user-friendliness of WordSmith Tools include an important point: teachers
need to have confidence in their own abilities to use software and in what it should be used for; otherwise,
their fear of losing face can be an inhibiting factor (Scott, 2008).
Figure 1. The Tabs Across the Top of the Screen in The Prime Machine Concordancer.
Search Tab
The usual starting point for language learners and teachers using the software is a specific word or collocation.
The search tab provides two boxes where words or phrases can be entered. As users start to type, the corpus
which is currently selected is accessed, bringing up lists of words and collocations for complete words. If the
word or phrase entered into the system is not found in the current corpus, the user can seek additional spelling
support, or click to check whether the word or phrase exists in any of the other corpora which are loaded into
the system. The software was designed to make it easy to compare two words, two word forms from the same
family, words with similar meanings, and related collocations, by providing search suggestions based on words
entered, and by presenting results for two searches side-by-side on screen. The search tab also allows
for comparisons of the same item across two corpora, and some other tools more tailored to corpus linguistic
research.
complete sentences above and below the sentence containing the node, with gentle highlighting of the line of
text which contains the node. At the top of each card, the caption shows strong collocations within the nearby
context of the node and the source type and citation is also prominently shown. The Cards Tab presents the list
of concordance lines in the form of cards, but compared with the Lines Tab, obviously fewer concordance lines
are visible.
Figure 2. Example of the Lines Tab Showing the Card for the Currently Selected Concordance Line.
(Incidental Data from a Query for the Word consequences in The British National Corpus)
learners with the opportunity to experience the phenomena introduced in one of Firth’s ([1951]1957)
memorable assertions: “A word in a usual collocation stares you in the face just as it is” (p. 182).
Other Tabs
Additional information about the typical environments in which the search query may be found in the corpus is
shown on the other tabs. When the user looks up a specific vocabulary item, icons indicating strong tendencies
draw attention to different aspects of its typical context. The Graphs Tab shows the proportion of concordance
lines within specific contexts, and should draw learners’ attention to a selection of features that will resonate with
language teachers and will help learners engage with the data in the concordance lines more easily, including the
use of articles and prepositions, passive voice and modal verbs. Pre-calculated summaries for words and
collocations are also provided covering a range of features from the theory of Lexical Priming. Information on
the other tabs also makes it possible for language learners and teachers using The Prime Machine to explore the
patterns of words or collocations occurring in texts or sections labelled with a wide range of metadata, and as
they occur with other words and collocations in different text categories. Finally, the Corpus Info. Tab provides
information about the currently viewed corpus and its division into text categories.
Research Questions
This paper follows a user evaluation and reports on attitudes of language learners who used the software in a
language learning activity. The following research questions are considered:
1. Can the students find examples which they consider helpful?
2. Which kinds of information do they look at most? How many results do they look at?
3. Which of the search support features are used most frequently?
4. How do they feel about the software? Would they want to use it in the future?
Methodology
Participants
Volunteers from an English-medium university in Eastern China were invited to participate in the project
through short announcements before lectures and through the student email system. None of the students were
currently studying modules taught by the researcher. Three sessions were scheduled for the same day, and these
face-to-face sessions took place on a Saturday to avoid any conflict with class teaching. Students were able to
indicate a preferred slot through the university’s virtual learning environment (VLE) system (Moodle version 1.9),
and an information sheet was also provided for them to review before the first session.
Materials
The materials for the evaluation included two questionnaires, a set of instructions demonstrating various aspects
of the software, a brief user manual for the software and a set of essay question prompts. The first questionnaire
included demographic questions as well as questions relating to the students’ own views on their use of a range
of language learning reference tools such as dictionaries, electronic dictionaries and search engines, etc.
Therefore, prior to using the new software, participants were presented with a broad range of relevant study
resources available as choices in the early part of the first questionnaire, and for the questions relating to student
habits and their attitudes regarding the best resource for several specific language learning issues, the option of
concordance lines was not in any way foregrounded. The first questionnaire also included questions about peer
review and more general attitudes towards language study.
The second questionnaire explicitly picked up on one of the questions from the first questionnaire and
asked students whether their view of the importance of examples had changed as a result of taking part in the
project. There were also questions about how much they used several of the main features of the software and
how useful they perceived them to be. There were also a range of questions designed to gather their views on
appropriate future uses of the software and any suggestions for improvements.
Both of these questionnaires were delivered electronically through the VLE. Examples of resources were
provided on a printed A3 sheet, so that students would not need to flip between screens. This sheet had examples of
dictionary entries, popular search engines or mobile phone apps and a picture of concordance lines.
Printed instructions were given to the participants, providing step-by-step guidance on the overall
procedure: answering the first questionnaire, downloading the software, working through the examples,
writing the essay, and performing the follow-up tasks later. In order to make the writing task relevant to students
from a wide range of university programmes, prompts were written on a range of topics related to contentious
but non-threatening issues which had been discussed in the news, following the style of popular language
proficiency examinations.
Procedure
Participants volunteering to take part in the project were required to attend a face-to-face session in one of the
university computer labs. At the beginning of each session, the information sheets and consent forms were
distributed and then students were invited to complete the first questionnaire on the VLE. After completing the
questionnaire, the students were free to start working through the instruction sheet, download the software and
look through the user manual. When the questionnaires had been completed, the researcher worked through all
the examples using a computer attached to a data projector. The participants were free to just watch or to try
using the software themselves. At the end of the presentation, blank lined sheets were distributed to students who
preferred writing essays by hand, while others loaded Microsoft Word and started to work on their essays on the
computers. The students were then given one hour to write their essays. During this time, they were free to
consult any other resources and to make use of the software. Formal examination conditions were not enforced.
Once students had submitted their essay to the researcher, they were free to leave. Within the next two
days, individual feedback on each essay was sent to each participant. The template used by the researcher for
this feedback included some comments based on each of the four criteria from the public band descriptors for
IELTS (www.ielts.org). The feedback also included three screen shots showing sets of concordance lines related
to three words or phrases used in the essay, as well as two Microsoft Excel spreadsheet attachments showing up to
100 more of the lines for these. A table of other single items or pairs of items to compare was also given. This
feedback was then sent to each participant and he or she was invited to complete the second questionnaire online
once he or she had reviewed the feedback, making use of the software again if he or she wished.
Four students participated in a pilot study several days before the main sessions took place, and a few minor
changes were made to the procedure, the wording of some items, and some small aspects of the software’s
operation.
Logs
For research into the use of corpus tools with language learners, Pérez-Paredes, Sanchez-Tornel et al. (2011)
argue that tracking of user actions through logs is essential in order to determine actual use rather than reported
use. The Prime Machine was designed to include the capability of collecting logs of various actions triggered by
mouse or keyboard movements during the evaluation.
Table 1 shows a summary of the kinds of actions which are logged. During formal evaluations where
participants have consented to the collection of this kind of data, logs are sent when the application is not busy
retrieving data from the server or when the application closes.
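As a sketch of this deferred delivery, the buffering logic might look something like the following. This is purely illustrative Python (The Prime Machine is not written in Python, and all names here are invented); it shows events queuing up and only being transmitted when the client flushes, either during an idle moment or at application close.

```python
import atexit
import json
import queue

class LogBuffer:
    """Queue user-action events and only ship them when the client is idle
    or when the application closes, so logging never delays retrievals."""

    def __init__(self, send):
        self.send = send              # callable that transmits one JSON batch
        self.pending = queue.Queue()
        atexit.register(self.flush)   # flush remaining events on application close

    def log(self, category, action, detail=None):
        self.pending.put({"category": category, "action": action, "detail": detail})

    def flush(self):
        """Called whenever the application is not busy retrieving data."""
        batch = []
        while not self.pending.empty():
            batch.append(self.pending.get())
        if batch:
            self.send(json.dumps(batch))
```

The point of the design is that a slow or unavailable logging server never blocks a concordance retrieval: events simply accumulate until the next quiet moment.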
Table 1
User Actions Which Can Be Automatically Logged by the Software
Action Category | Examples | Details logged
Search Support | Auto-complete for single words; auto-complete for collocations; suggestions for words with similar meanings; spelling support request; request for a word or collocation to be checked in other corpora; alternative corpus selected after other corpora have been checked for a word or collocation not found in the current corpus; use of other navigation buttons (“Back”, “Forward”, “Home” or “Swap”). | Words / collocation clicked
Query Blocked | Rules for query syntax not followed; too few or too many words entered in a single query; word or collocation not found in the currently selected corpus; combination of words not found in the currently selected corpus. | Search string
Query | Single search; compare mode search for two different queries; compare mode search for two different corpora; requests for more lines or collocation data. | Search string
Tab | Cards Tab; Lines Tab; Collocations Tab; Graphs Tab; Tags Tab; Associates Tab. | Number of seconds viewed
Other | A variety of other actions, including the use of filters, access to help screens, changes to options, changes of the main corpus and use of various visual elements including the “Priming Dock”. | Details such as the number of lines/cards viewed
As can be seen, a range of categories have been created, allowing the grouping of log data in terms of
search support features, actual queries, viewing of results and other features such as changes to options and
access to help.
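The grouping described above can be illustrated with a few lines of Python. The records below are invented for the sketch; the category names follow Table 1, but the field names and schema are assumptions, not the software’s actual log format.

```python
from collections import defaultdict

# Hypothetical log records; field names are illustrative, not the real schema.
logs = [
    {"user": "s01", "category": "Search Support", "action": "auto-complete word"},
    {"user": "s01", "category": "Query", "action": "single search"},
    {"user": "s02", "category": "Query Blocked", "action": "spelling error"},
    {"user": "s02", "category": "Tab", "action": "Lines Tab"},
    {"user": "s03", "category": "Tab", "action": "Cards Tab"},
]

def summarise(records):
    """Group raw events by action category, counting events and distinct users."""
    events = defaultdict(int)
    users = defaultdict(set)
    for rec in records:
        events[rec["category"]] += 1
        users[rec["category"]].add(rec["user"])
    return {cat: {"events": n, "users": len(users[cat])} for cat, n in events.items()}

summary = summarise(logs)
# e.g. summary["Tab"] == {"events": 2, "users": 2}
```

Aggregating per category in this way is what allows the Findings section below to report event counts and distinct-user counts side by side.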
Findings
A total of 25 students attended one of the face-to-face sessions, completing the questionnaire and submitting an
essay. All 25 participants were Chinese and came from Mainland China. The vast majority of the participants
were female, with just 3 male participants. In terms of the academic programmes from which the students came,
the most common was Financial Mathematics with 14 students, and this was followed by English and Finance (5
students), and 3 from engineering or computer science programmes, 2 from Chemistry and 1 from Economics.
The ages of the participants ranged from 18 to 22, with 3 students from Year 1, 7 students from Year 2 and 15
students from Year 3. Given the programmes represented, the gender balance and the home provinces of the
participants broadly reflected the whole student population from which they were drawn. The participants
reported that they had studied English for between 7 and 15 years, with 19 out of 25 students having studied
English for 10 years or more.
Following the demographic questions, the first set of questions in the questionnaire was related to the
students’ reported use of reference tools to help them with their English. As can be seen from Figure 3, by far
the most popular choice was mobile phone dictionary apps, with 21 students claiming to use these very often,
and 3 students selecting 4 out of 5 for this item. Just one student reported a lower score (2/5) tending towards
never. Interestingly, this student was the same student who indicated very often for concordance lines and one of
the four students who indicated 5/5 for English-English dictionary with Chinese translations. Following mobile
phone or electronic dictionaries, the next most popular choice was search engines. It is also clear that paper
dictionaries are disfavoured, and electronic means through mobile phone apps or search engines are clearly
favoured. As expected, the other clear finding was that for the majority of students concordance lines are not at
all regularly used, with 72% of respondents claiming never to use them at all, and a further 20% choosing the
second lowest rating. Three of the 5 students who chose 2/5 for concordance lines did not rate any of the
resources below 2. The student who rated concordance lines 3/5, also selected neutral scores for half of the
resources and did not select 1 or 5 for anything.
The next set of questions was related to which resource listed on the handout students thought would be
the most useful for five specific kinds of language problems. Figure 4 shows the number of students who selected
each of these.
Figure 4. Judgements Given by Participants on the Best Resource for a Variety of Language Issues.
It is clear that mobile phone or electronic dictionaries were perceived to be the best choice for spelling and
meaning, while English-English dictionaries were considered best to check prepositions, collocations or to find
examples. Interestingly, search engines were not considered the best choice by any students when checking the
meaning of words and were less popular than all three paper dictionary types and mobile phone dictionaries as a
source for examples. The only three areas where search engines were considered the first choice by 16% or more
of the students were for spelling (24%), prepositions (20%) and collocations (16%). This would suggest that
search engines are used for language purposes by the students to check spelling and co-text rather than to provide
information about meaning or examples.
Again, it is evident that concordance lines were not considered the best resource for any of these problems
by the vast majority of students. There was also an interesting mismatch between the answers to the previous
question about reported frequency of use and the resources which were considered most useful. Only three
students chose concordance lines for any of the problems, and all three of these students had reported actual use
of concordance lines as being 1 (never) or 2. The student who had rated concordance line usage so highly in the
earlier question chose the option for “Chinese-English or English-Chinese dictionary” and the option for
“Mobile phone or electronic dictionary” for all of the problems. This suggests that the student who had reported
using concordance lines very frequently was perhaps using them for other work or considered them to be a
supplementary resource rather than a key one.
Another obvious conclusion which can be drawn from these data is that the majority of students (16
out of 25) consider translation dictionaries or mobile phone and electronic dictionaries to be suitable resources to
check meanings. The wording of this question was “Checking a word which has several different meanings” and
it is surprising that students place confidence in dictionaries which often only have a limited range of translations.
As explained earlier, after submitting the essay, students left the first session and were sent individual
feedback within the next two days. They were then invited to complete the second questionnaire. Although 25
students took part in the face-to-face session, two students did not complete the second questionnaire.
Figure 5. Reported Frequency of Use During Different Stages of the Writing Task.
Average ratings were 2.57 for planning, 3.87 for writing, 3.74 for checking or editing before submission and
3.74 for reviewing feedback from the teacher. The similar average scores for Writing, Checking/Editing and
Reviewing (Wilcoxon Signed Ranks Tests: Checking/Editing-Writing z=-.408, p=.683, effect size r=-.060;
Reviewing-Writing z=-.408, p=.683, effect size r=-.060; Reviewing-Checking/Editing z=-.037, p=.971, effect
size r=-.005) mask individual differences, however, as different students reported use of the software at different
levels. Only three students rated these three areas equally.
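The effect sizes reported here are consistent with the common convention r = z/√N, where z comes from the normal approximation to the Wilcoxon signed-rank statistic and N is the total number of observations across both conditions (here 2 × 23 = 46). A minimal sketch of that calculation, using invented ratings rather than the study’s data and applying tie-averaged ranks but no continuity correction, might look like this:

```python
import math

def rank_abs(diffs):
    """1-based ranks of |d|, with tied values sharing the mean of their positions."""
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def wilcoxon_z_and_r(x, y):
    """z from the normal approximation to the Wilcoxon signed-rank test on
    paired ratings x, y, plus effect size r = z / sqrt(N) with N = 2 * len(x)."""
    d = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(d)
    ranks = rank_abs(d)
    w_pos = sum(r for r, diff in zip(ranks, d) if diff > 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_pos - mu) / sigma
    return z, z / math.sqrt(2 * len(x))
```

In practice a statistics package would be used for the p-value; the sketch is only meant to make the relationship between z, N and r explicit.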
However, it is hard to find evidence of actual use of the software in the logs, which suggests that students
were either exaggerating their use of the software or reporting attitudes rather than actual use. The strength of
the results is somewhat weakened if the question is interpreted as being representative of attitudes, but the varied
results do suggest that different students feel that the software would be useful for different stages of the writing
process.
Figure 6. Evaluation of the Usefulness of Some of the Main Features of the Software
From the graph in Figure 6, it is clear that students rated both the cards and lines tabs quite positively, with
approximately 74% of those who answered the second questionnaire choosing Useful or Very Useful. It is worth
noting that although the Cards Tab seems more mixed with 2 students reporting it was not very useful, 6 of the
23 students (26%) rated the Cards Tab above the Lines Tab. Having both ways of viewing the data may cater
for different learner preferences and different uses.
The Graphs Tab received the least positive feedback, with a much lower average rating (Wilcoxon Signed Ranks
Tests: Graphs-Cards z=-3.456, p=.001, effect size r=-.510; Graphs-Lines z=-3.337, p=.001, effect size r=-.492;
Collocations-Graphs z=-3.072, p=.002, effect size r=-.453). However, it is worth noting that 6 out of the 23
students (26%) rated it as very useful or useful. The student who rated the Graphs Tab as “Very useful” had
lower ratings for all the other features except the Cards Tab.
Feedback on the Collocations Tab was generally very positive. The student who rated the Collocations Tab at 2 also
rated the Cards Tab and Graphs Tab as 2, but rated the Lines Tab as 4 (useful). Clearly, this student preferred
looking at the information in the KWIC view, but from the logs it seems that he or she did not view the tables for
collocations.
By far the most striking result from Figure 6, however, is that being able to compare results side-by-side was
rated very highly indeed.
The results of the questionnaire questions related to the frequency of use during different stages of the task
and the students’ evaluation of the usefulness of some of the main features provide evidence that the first
research question has been positively answered: the learners reported that they could find examples which they
considered to be helpful.
Table 2
Logs Showing the Number of Views and Time Spent on Different Tabs in the Software
Tab | Number of views | Total time (s) | Average number of seconds
Cards Tab | 160 | 6485 | 40.5
Lines Tab | 113 | 9328 | 82.5
Graphs Tab | 53 | 2479 | 46.8
Collocations Tab | 70 | 4325 | 61.8
Tags Tab | 35 | 813 | 23.2
Associates Tab | 48 | 6615 | 137.8
Table 2 shows that the logs seem to support the views regarding the usefulness of different tabs, with Cards and
Lines having much higher event counts and generally more time being spent on Cards, Lines and Collocations.
When looking at these figures, however, it is worth noting that the Cards Tab was set as the default results tab for
all users, so this will have received a log for every search which was completed. However, looking at the number
of cards viewed for each event, the logs show that an average of 15.1 cards were viewed with a range between 1
and 65. Only 17 out of the 160 events had fewer than 10 cards marked as having been viewed. Since only a few
cards are visible unless the user scrolls down, this seems to confirm that some users viewed quite a few results on
the Cards Tab.
It is worth bearing in mind, however, that the vast majority of the events were from the sessions on
Saturday, and the time in Table 2 should be treated with caution since it is likely that students may have left a tab
visible when stopping to listen to another part of the demonstration. The times are calculated for the whole time
that the application is “active” (in the sense of being the window with the current focus), so this kind of data is
more reliable when students are completing a task in another window rather than switching attention to a data
projector during a demonstration or working on a paper-based activity.
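A sketch of this kind of focus-sensitive timing is shown below. The implementation details of The Prime Machine’s logging are not given here, so this is an invented illustration of the principle: time accrues against the visible tab only while the application window is active.

```python
class TabTimer:
    """Accumulate viewing seconds per tab, but only while the application
    window is active (has focus). All names are invented for illustration."""

    def __init__(self):
        self.totals = {}       # tab name -> accumulated seconds
        self.current = None    # tab currently visible
        self.focused = False   # does the application window have focus?
        self.since = 0.0       # timestamp when the current state began

    def _flush(self, now):
        # Credit the elapsed interval to the visible tab, if the window was active.
        if self.focused and self.current is not None:
            self.totals[self.current] = self.totals.get(self.current, 0.0) + (now - self.since)
        self.since = now

    def switch_tab(self, tab, now):
        self._flush(now)
        self.current = tab

    def focus(self, now):
        self._flush(now)
        self.focused = True

    def blur(self, now):
        self._flush(now)
        self.focused = False
```

Under this scheme, switching attention away from the application (a blur event) stops the clock, which is why the Saturday figures, gathered while students were also watching a projected demonstration, are less reliable than later, single-window use.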
From the logs, only 4 students seem to have made use of the software after Saturday, and figures for use
across different tabs for later use are shown in Table 3.
Table 3
Number of Views, Time Spent and the Number of Different Users for the Results Tabs After the Main Input Session.
Tab | Number of views | Total time (s) | Users
Cards Tab | 10 | 186 | 4
Lines Tab | 9 | 1679 | 4
Graphs Tab | 4 | 91 | 3
Collocations Tab | 7 | 74 | 4
Tags Tab | 4 | 22 | 2
Associates Tab | 3 | 26 | 2
Again, it is clear that most time was spent on the Lines Tab. Although figures for the Graphs Tab may
seem a little disappointing, it is worth noting that there were a total of 188 clicks on the priming icons on the
dock, and 18 users made use of this feature to switch to the Graphs Tab.
In terms of use of the ability to compare results side by side, the logs show that a fair proportion of
searches were made like this. Of the 281 logs from 22 users, 56% of searches were for one term only, while 44%
were made in compare mode. Three users did not appear to make any queries. Using the logs for the right-
hand retrieval only, 85% of the compare mode searches were to compare different queries across the same
corpus, while 15% were comparing the same query across two different corpora.
The summary of the log data which has been provided here addresses the second research question, which
was concerned with the kinds of information viewed and the number of results. It is clear that overall the students
spent most time on the Lines Tab, followed by the Cards and Collocations tabs. The logs also showed some
engagement of the students with the different kinds of information and the number of results, measured by the
number and range of events logged and the number of concordance cards viewed.
Support Features
As well as being able to compare results easily, another set of important design features were related to search
support. The third research question was to ascertain which of these search support features would be used most
frequently. A total of 54 queries from 16 users were logged as having been blocked by the software. Six of these
were related to spelling errors, and 1 was because a Chinese word had been entered. Nine blocked queries contained
collocations where the incorrect format had been given (lack of spaces or additional full stops, etc.), and 20
blocked queries were because the phrase was not stored as a collocation in the system. Four queries were blocked
because it seems nothing had been entered in the search box. A further 14 queries were blocked but information
is not provided in the logs.
As well as preventing users from making queries only to discover that no results are found, the
software also included other features such as auto-complete, collocation suggestions, synonyms, other word forms
and spelling support. From the logs, auto-complete for words was used 12 times, and 9 of these were for words
or word forms which did not form part of the demonstration. Collocations were selected from the drop-down
box 9 times, 8 of which were for collocations not part of the demonstration. Spelling support was requested 5
times, but from the logs it does not seem to be the case that the student made a subsequent search using the
correct spelling. This suggests that either the spelling component was too slow or did not provide useful
suggestions, or perhaps that students were trying it out rather than actually wanting to use it to assist with their
spelling.
A quarter of all the search queries in the concordancer were made for words or word forms not part of
the worksheet, and these were made by 13 different users. In the second questionnaire, students were asked to
report on whether or not they had looked up words or phrases not connected with their task. Eleven students
reported that they had; 7 said that the search was useful and 4 said it was interesting/fun, including one student who
chose both useful and interesting/fun. Just 1 student said that this was a waste of time, but it is worth noting that
overall this student was highly positive in his/her responses to the questions about the usefulness of each tab,
having rated everything 5/5 except the Graphs Tab which was still rated positively at 4/5. These results might
suggest that overall the software is likely to have potential for the kind of serendipitous learning which has been
reported in DDL and “discovery learning” activities (e.g. Bernardini, 2004).
These results provide an answer to the third research question, demonstrating that the most frequently
used search support features seem to be those which can be found on the main search screen such as the spelling
support and the auto-complete features for words and collocations.
to think differently and get some information”, and another mentioned that corpus examples were useful because
students have little opportunity to see how native speakers express themselves.
The second question in this group related to the importance of understanding collocations. All 23 students
responded positively to this question. In the comments, 9 students mentioned the need for this kind of
information to avoid making errors or to improve accuracy, and 8 students mentioned the importance of
knowing how to use words.
The last question in this group asked students whether the software tool was useful. Out of the twenty-three
students, all but one responded positively. The student who selected “no” was one of the two students who
used the software most after the Saturday session. However, the actual comment made by this student is still
positive about the software’s usefulness; as is clear from the full response, his/her reservation is due to his/her
belief that other software packages may be able to provide similar information in a more convenient way:
“It has many many tools and looks useful, but some important usage can be replace[sic] by other APP.”
Overall, it seems that the software was received very positively, especially considering that from the results
of the first questionnaire it is very clear that very few students had used concordancers before. All but one of the
students responded positively to the question about the usefulness of the software, and even the student who
responded negatively did so in a highly positive manner. As explained earlier, two students chose not to complete
the second questionnaire and their reasons for dropping out are not known. Neither student withdrew formally
from the project, and it is likely that other pressures such as coursework deadlines and mounting pressure for the
final examinations may have influenced their choice not to complete the second questionnaire. Nevertheless,
even if the non-participation of these students is interpreted as being lukewarm or negative towards the
usefulness of the software overall, the proportion of positive responses as a total of all 25 participants is still 88%.
Students who completed the second questionnaire gave a variety of reasons why they thought it was useful, with
4 mentioning being able to compare or see differences between words. Two mentioned the resources specically.
One student simply stated “It help [sic] students like a teacher”. Another student demonstrated a good
understanding of how different resources will be suitable for different occasions:
“This software may not be my first choice when I look up a word, because [an] electronic dictionary is much more
convenient. However, [the] function of the software is complete and I would like to use it as the complement of my first
choice.”
One other student mentioned that it was not so “convenient” to use; however, 4 other students commented
favourably on the “convenience” of the software. Another student focused specically on the way in which the
software can help students discover semantic associations of words, writing:
“I think, it can tell us whether a word is positive or negative. This is interesting and useful!”
Other comments included positive evaluations of the software in terms of helping students to learn
effectively (1), the amount of detail (3), and its potential in helping with academic writing (3). One student also
said that it was useful for students from different “levels”.
The positive response is also evident in all of the responses to the question “In future, do you think you
would like to use software like this again?” 10 out of 23 students chose “Yes, definitely”, and the remaining 13
chose “probably”. None of the students chose “Not sure”, “Probably not”, or “Definitely not”. When asked to
select from three situations when the software should be used, 7 chose “In class with a teacher”, 16 chose “In
class for pairwork activities”, and 14 chose “Outside class independently”. Given that almost 70% of the
students thought the software was suitable for pairwork, and 2 of these students had reported in the first
questionnaire that they did not think peer activities were useful, it seems that the software may have potential as a
teaching tool to enhance pairwork tasks.
The positive responses to the questions about corpus examples, collocation information and the software
itself, coupled with these highly positive responses to questions about possible future uses of the software go some
way to addressing the fourth research question. However, one factor which needs to be considered in relation to
these largely positive responses is that in China there is a cultural desire to please. It is hoped that the influence
of this on the questionnaire responses was reduced through the precaution of not revealing who had created the
software until the debrief message was sent. Nevertheless, the results should be considered in the light of these
cultural influences.
Discussion
Taking these results as a starting point for the evaluation of The Prime Machine, this section will return to the six
qualities of CALL software drawn from Chapelle (2001).
The first quality, “Language Learning Potential”, when applied to this project might include a judgemental
analysis of the level of interactivity and the suitability of the range of target forms the software can provide. It
would seem fair to award the software highly in this area since its very design encourages students to look up
vocabulary themselves and to interact with the different tabs of data which are presented, and it also supports a
wide range of comparisons between words and collocations or between corpora. It is also clear that the software
has great potential for providing students access to a very wide range of target forms, both in terms of the level
of analysis from individual word types, to similar words and collocations, and in terms of the range of text types
from different disciplines and genres which are contained in the corpora which have so far been used. The
question of whether target forms are acquired and retained, as has been mentioned above, is still one which
needs to be explored, but the responses to the second questionnaire as presented here suggest that students were
able to identify the importance of the software in supporting language use and accuracy and as a means of
obtaining information about language.
In terms of the second quality, “Learner fit”, the software would also seem to stand up very well. As a tool
for exploring words and phrases the software provides a great amount of control. The questionnaire responses
indicating how students viewed exploration of words or phrases not directly related to their essay writing also
provides evidence that the software has potential for incidental or less directed learning. To facilitate autonomy
and unsupervised exploration, one of the main aims for the design of the software was to provide more adequate
support, hints and guidance to learners, as compared with other leading concordancers. Within the context of
higher education, the software seems to have been very well received by students of different levels. The
evidence from the questionnaire on how students reported using the software, the variation in their preference for
different tabs of information and also the different views on how it could be used in future suggest that it might
cater well for different learners with different learning styles. Since students were overwhelmingly positive, but
positive about different aspects, it could be claimed that there is some empirical evidence that the software has
succeeded in this respect. Based on the positive responses from students, it would seem that the innovations in
the design of The Prime Machine alleviate some of the difficulties reported in previous DDL studies using other
software. The difficulties or frustration in formulating and performing search queries which were observed in
previous studies (Gabel, 2001; Sun, 2003; Yeh, et al., 2007) may have been alleviated by the search support
features. The availability of the Card view and being able to compare results side-by-side, could also explain why
there appeared to be fewer of the kinds of difficulties related to time or the presentation of results reported in
other studies (Luo & Liao, 2015; Thurstun, 1996). However, clearly longer-term attitudes and measurements of
change in performance over time would need to be considered. Nevertheless, designers of other concordancing
interfaces could consider adding features like these if they wish to make their software more learner-friendly.
A focus on meaning also seems to be evident both from a judgement of the software and task and from
empirical evidence in the form of questionnaire responses. The high rating of the compare feature suggests that
students were interested in understanding how different words were used. The reported use of the software as
part of a writing task also provides some evidence that students could see how the software could be used to help
communicate their meaning effectively in writing, although as was mentioned earlier the logs suggest that these
attitudes were probably based on their ideas about how the software could be used, rather than based on their
actual experience using the software. Clearly, a longer study with log data matching reported views would be
desirable.
In terms of “authenticity”, the task design was highly relevant given the number of students who go on to
take language tests such as IELTS as well as tests for their EAP modules, but it lacked the authenticity of being
actually part of the degree programme itself. However, the learners clearly demonstrated a belief that the
software would be useful in the classroom or for self-study, and the overwhelmingly positive indication that they
would definitely or at least probably want to make use of the software again in the future is good evidence that
the software has to some extent met its aims as being a tool suitable for classroom or home use.
The “impact” of the software could be measured in terms of the comprehensiveness of feedback and
software logs. While the log data was a little disappointing in terms of quantity, the evaluation has demonstrated
that the level of detail which can be provided about different actions made by users of this system does have
great potential. It is certainly clear that students rated the experience of using the software as a positive
experience and in this respect the evaluation so far has been highly successful. The limited evidence of actual
use of the software, especially after the main face-to-face part of the evaluation, points to a need for further
research in order to ensure that the positive impact in terms of the perceptions of the students would also follow
through to a positive impact on longer-term use. One of the main limitations of the evaluation in terms of its
face validity was that although the participants were completing a writing task suited to their learning context,
the essay was not part of their formal studies and was administered towards the end of the semester when other
pressures such as assessed coursework and upcoming exams may have meant they were less inclined to put the
usual amount of care and attention into it. In order to encourage greater use of the software so that attitudes
would be based on more direct and prolonged exposure to the interface and results, participants could be given
opportunities to access it over a longer period. The software needs to be made available so students can access it
as and when they encounter language learning needs. Even in a shorter term study, if permission could be
gained for students to bring with them early drafts of assignments or materials from their classes, participants
would be much more likely to look up more words and phrases than when writing for an additional essay which
may not have any long- or short-term benefits beyond general improvement of their language abilities.
Regarding the use of concordancers by language learners, the results of the first questionnaire were
consistent with Timmis (2003) in that the participants had had very little prior exposure to direct use of
concordancing software. Given the learning background of learners in China, it would be unrealistic to expect a
sudden shift in their understanding of effective language learning processes, but the highly positive response to
the software suggests that providing students with a new way of looking at language can be very effective,
especially when supported by the kind of evidence which The Prime Machine can readily provide. Of course, a
very important consideration with any kind of teaching software is whether or not teachers will be interested and
willing to make use of it and to recommend it to their students. The design of the software was made by
drawing on my own extensive experience as a language teacher and as a manager of language teachers.
However, the importance of getting teacher input on software design (Krishnamurthy & Kosem, 2007) and
responding to teachers’ fears (Scott, 2008) should not be overlooked. Clearly, further exploration of the
perceptions of teachers and input from them will be a key to making The Prime Machine a well-used tool as well as
a useful tool for language learning.
The last quality is that of “practicality”. The fact that the evaluation ran smoothly with a single server
which was actually a desktop machine purchased in 2011 and was located outside the university local area
network suggests that the minimum requirements are reasonable. The Prime Machine has now been running on a
central server at the university for more than a year, and in the near future, this server will be accessible from
outside its host institution. For further details see www.theprimemachine.com.
Conclusion
This paper has focussed on one aspect of the evaluation of The Prime Machine. It has considered the results of the
small scale evaluation which took place over a short period of a few days, and it has also considered the scope of
this evaluation within a wider framework. Despite being somewhat limited in size and duration, the
questionnaire-based study has provided interesting insights into the acceptability of the software, its face validity
and student attitudes before and after use, and has also identified some concrete areas for future development.
While much ground remains for detailed evaluation of the software as a learning and teaching tool, drawing on
frameworks from Computer Assisted Language Learning, this initial evaluation has provided confidence
that the project meets its overall aims. While there is also much scope for detailed evaluation of
specific features and mark-up processes, as well as opportunities for performance enhancement of the computer
processes behind the software, the participants’ enthusiasm suggests that the software is providing some
meaningful data and provides at least face validity for the hidden processes.
Through this small evaluation involving undergraduate students, the software has been shown to have
considerable potential as a tool for the writing process. Since this evaluation was carried out, The Prime Machine
has been developed further and now includes additional tools for exploring vocabulary in terms of semantic tags and
other features. As it continues to be developed, it is believed that The Prime Machine will be a very useful corpus
tool which, while simple to operate, provides a wealth of information for English language teaching and self-
tutoring.
Notes
1. For a fuller explanation of the way these features work, for more details about the other features of the software
and for the pedagogical reasons behind the design, see Jeaco (2015) and Jeaco (2017).
2. At the time of the evaluation, the label on this tab was "Primings Tab", and the questionnaire asked
respondents to comment on it using this name. However, the label was subsequently changed to "Graphs Tab"
as this better matches the purpose and scope of the tab.
References
Anthony, L. (2004). AntConc: A learner and classroom friendly, multi-platform corpus analysis toolkit. Paper
presented at the Interactive Workshop on Language e-Learning, Waseda University, Tokyo.
Bernardini, S. (2004). Corpora in the classroom: An overview and some reflections on future developments. In J.
M. Sinclair (Ed.), How to Use Corpora in Language Teaching (pp. 15-36). Amsterdam: John Benjamins.
BNC. (2007). The British National Corpus (Version 3 BNC XML ed.): Oxford University Computing Services
on behalf of the BNC Consortium. URL: http://www.natcorp.ox.ac.uk/.
Bolitho, R., Carter, R., Hughes, R., Ivanič, R., Masuhara, H., & Tomlinson, B. (2003). Ten questions about
Language Awareness. ELT Journal, 57(3), 251-259.
Chapelle, C. (2001). Computer Applications in Second Language Acquisition: Foundations for Teaching, Testing and Research.
Cambridge: Cambridge University Press.
Cobb, T. (1999). Giving learners something to do with concordance output. Paper presented at the ITMELT '99
Conference, Hong Kong.
Coniam, D. (1997). A practical introduction to corpora in a teacher training language awareness programme.
Language Awareness, 6(4), 199-207.
Doughty, C. (1991). Second Language Instruction Does Make a Difference. Studies in Second Language Acquisition,
13(4), 431.
Firth, J. R. ([1951]1957). A synopsis of linguistic theory, 1930-1955. In F. R. Palmer (Ed.), Selected Papers of J R
Firth 1952-59 (pp. 168-205). London: Longman.
Gabel, S. (2001). Over-indulgence and under-representation in interlanguage: Reflections on the utilization of
concordancers in self-directed foreign language learning. Computer Assisted Language Learning, 14(3-4), 269-
288.
Hindawi. (2013). Hindawi's open access full-text corpus for text mining research. Retrieved 6 November, 2013,
from http://www.hindawi.com/corpus/
Hoey, M. (2005). Lexical Priming: A New Theory of Words and Language. London: Routledge.
Horst, M., Cobb, T., & Nicolae, I. (2005). Expanding academic vocabulary with an interactive on-line database.
Language Learning & Technology, 9(2), 90-110.
Hunston, S. (2002). Corpora in Applied Linguistics. Cambridge: Cambridge University Press.
Jeaco, S. (2015). The Prime Machine: a user-friendly corpus tool for English language teaching and self-tutoring
based on the Lexical Priming theory of language. Unpublished Ph.D. dissertation, University of
Liverpool. Retrieved from https://livrepository.liverpool.ac.uk/2014579/
Jeaco, S. (2017). Concordancing lexical primings. In M. Pace-Sigge & K. J. Patterson (Eds.), Lexical Priming:
Applications and advances (pp. 273-296). Amsterdam: John Benjamins.
Johns, T. (1991). Should you be persuaded: Two samples of data-driven learning materials. In T. Johns & P. King
(Eds.), Classroom Concordancing (Vol. 4, pp. 1-13). Birmingham: Centre for English Language Studies,
University of Birmingham.
Johns, T. (2002). Data-driven Learning: The perpetual change. In B. Kettemann, G. Marko & T. McEnery (Eds.),
Teaching and Learning by Doing Corpus Analysis (pp. 107-117). Amsterdam: Rodopi.
Kennedy, C., & Miceli, T. (2010). Corpus-assisted creative writing: Introducing intermediate Italian learners to a
corpus as a reference resource. Language Learning & Technology, 14(1), 28-44.
Kennedy, G. D. (1998). An Introduction to Corpus Linguistics. London: Longman.
Kettemann, B. (1995). On the use of concordancing in ELT. TELL&CALL, 4, 4-15.
Krashen, S. (1989). We acquire vocabulary and spelling by reading: additional evidence for the Input Hypothesis.
The Modern Language Journal, 73(iv), 440-464.
Krishnamurthy, R., & Kosem, I. (2007). Issues in creating a corpus for EAP pedagogy and research. Journal of
English for Academic Purposes, 6(4), 356-373.
Laufer, B., & Hulstijn, J. (2001). Incidental vocabulary acquisition in a second language: The construct of task-
induced involvement. Applied Linguistics, 22(1), 1-26.
Lee, D. Y. W. (2001). Genres, registers, text types, domains, and styles: Clarifying the concepts and navigating a
path through the BNC jungle. Language Learning and Technology, 5(3), 37-72.
Lee, J. H., Lee, H., & Sert, C. (2015). A corpus approach for autonomous teachers and learners: Implementing
an on-line concordancer on teachers’ laptops. Language Learning & Technology, 19(2), 1-15.
Luo, Q., & Liao, Y. (2015). Using Corpora for Error Correction in EFL Learners' Writing. Journal of Language
Teaching & Research, 6(6), 1333-1342.
Meyer, C. F. (2002). English Corpus Linguistics: An Introduction. Cambridge: Cambridge University Press.
Mills, J. (1994). Learner autonomy through the use of a concordancer. Paper presented at the Meeting of
EUROCALL, Karlsruhe, Germany.
Nation, I. S. P. (1995-6). Best practice in vocabulary teaching and learning. EA Journal, 3(2), 7-15.
Pérez-Paredes, P., Sanchez-Tornel, M., Alcaraz Calero, J. M., & Jimenez, P. A. (2011). Tracking learners' actual
uses of corpora: guided vs non-guided corpus consultation. Computer Assisted Language Learning, 24(3), 233-
253.
Schmidt, R. W. (1990). The role of consciousness in second language learning. Applied Linguistics, 11(2), 129-158.
Scott, M. (2008). Developing WordSmith. International Journal of English Studies, 8(1), 95-106.
Sinclair, J. M. (1991). Corpus, Concordance, Collocation. Oxford: Oxford University Press.
Stevens, V. (1991). Classroom concordancing: Vocabulary materials derived from relevant, authentic text. English
for Specific Purposes, 10(1), 35-46.
Sun, Y.-C. (2003). Learning process, strategies and web-based concordancers: a case study. British Journal of
Educational Technology, 34, 601-613.
Thurstun, J. (1996). Teaching the vocabulary of academic English via concordances. Paper presented at the
Annual Meeting of the Teachers of English to Speakers of Other Languages, Chicago.
Timmis, I. (2003). Corpora and Materials: Towards a Working Relationship. In B. Tomlinson (Ed.), Developing
materials for language teaching (pp. 461-474). London: Continuum.
Tomlinson, B. (1994). Pragmatic awareness activities. Language Awareness, 3(3-4), 119-129.
Tomlinson, B. (2008). Language acquisition and language learning materials. In B. Tomlinson (Ed.), English
Language Learning Materials: A Critical Review (pp. 3-13). London: Bloomsbury Publishing.
Tsui, A. B. M. (2004). What teachers have always wanted to know - and how corpora can help. In J. M. Sinclair
(Ed.), How to Use Corpora in Language Teaching (pp. 39-61). Amsterdam: John Benjamins.
Varley, S. (2009). I'll just look that up in the concordancer: integrating corpus consultation into the language
learning environment. Computer Assisted Language Learning, 22(2), 133-152.
www.ielts.org. IELTS | Researchers - Band descriptors, reporting and interpretation. Retrieved 16 January, 2014,
from http://www.ielts.org/researchers/score_processing_and_reporting.aspx
Yeh, Y., Liou, H.-C., & Li, Y.-H. (2007). Online synonym materials and concordancing for EFL college writing.
Computer Assisted Language Learning, 20(2), 131-152.
Yoon, H. (2008). More than a linguistic reference: The influence of corpus technology on L2 academic writing.
Language Learning & Technology, 12(2), 31-48.
Heidi Brumbaugh*
Simon Fraser University, Burnaby, BC, Canada
Trude Heift*
Simon Fraser University, Burnaby, BC, Canada
Abstract
This article describes a research study that determined the depth of vocabulary knowledge of 28 intermediate ESL learners.
The study was carried out with Bricklayer, a vocabulary assessment tool for L2 English which tested the ESL learners on 72
words. Two post-tests collected evidence for concurrent validity. A semantic distance test captured incremental knowledge
for 36 words, but Bricklayer’s predictive power for this partial knowledge was weak. A standard multiple-choice test of the
remaining 36 words showed that Bricklayer predicted 61% of known words and 69% of unknown words; results were better
for words which were strongly predicted to be known or unknown. These findings provide promising evidence that
Bricklayer’s assessment paradigm can assist in building up models of students’ knowledge and behaviour in CALL environments.
Keywords: Computer Assisted Language Learning, vocabulary assessment, vocabulary depth, meta-cognition,
self-assessment
Introduction
It may seem intuitive, even obvious, that language learners need to know words of the target language in order to
communicate effectively. Nonetheless, Zimmerman (1997) points out that despite vocabulary’s central role in
language, over the course of the history of language teaching, vocabulary has not been emphasized. A surge of
interest in vocabulary over the past decade has shifted this focus. Nation (2013), for instance, states that “over 30
per cent of the research on vocabulary that has appeared in the last 110 years was published in the past eleven
years” (p. 5). This new body of research informs strategies for incorporating vocabulary instruction in the
language classroom including computer-assisted language learning (CALL) contexts.
At its most basic level, vocabulary knowledge involves connecting the word form (written or spoken) with
its associated meaning. Vocabulary researchers, however, have recognized that word knowledge is complex, and
thus have tried to articulate a broader structure for vocabulary knowledge (Henriksen, 1999; Nation, 1990, 2001;
Richards, 1976). These frameworks capture the idea that word knowledge is multifaceted. In addition to
knowledge about a word’s meaning, word knowledge also includes such features as associative knowledge, form
production and recognition, morphology, collocations, etc.
*Tel: (1) 831 247 1379; Fax: (+1) 866 216-8918; E-mail: heidi@vocabsystems.com; 5733 Hollister Ave. Suite 7, Goleta, CA
93117 USA
**Tel: (1) 778 782 3369; E-mail: heift@sfu.ca; Robert C. Brown Hall Building, Room 9201, 8888 University Drive,
Burnaby, BC V5A 1S6 Canada
Apart from the multifaceted nature of word knowledge, lexical knowledge is also acquired incrementally.
In fact, the idea that a learner does not progress immediately from being unfamiliar with a word to having
complete knowledge of all its meanings and usages was observed as far back as the early part of the twentieth
century (Dolch, 1927). Durso and Shore (1991) characterize this intermediate level of knowledge as partially
known words, or so-called “frontier” words (see also Shore and Kempe, 1999). Durso and Shore’s studies show
that although learners denied that the word was part of their language knowledge, they nonetheless were able to
access some semantic content about the word.
An accurate assessment of the learner’s vocabulary knowledge and stage of acquisition is especially critical
for the L2 classroom because it informs and drives instructional strategies. CALL, in particular, is well suited for
this task. Consider, for instance, that a computer could track and keep a record of whether a particular word was
mostly known, mostly unknown, or a frontier word, and construct a model (i.e., a representation) of the learner’s
vocabulary knowledge accordingly.
Such a model is called a learner model or student model and is an integral part of a computerized intelligent
tutoring system (ITS). A learner model allows an ITS to deliver individualized content for each student by
considering each learner’s behaviour and performance and tailoring instruction to their individual needs (see
Heift & Schulze, 2003). For example, words which are mostly known by the learner would not need to be
targeted for direct instruction, whereas mostly unknown words could be targeted for instruction or initial
exposure. Unknown or partially known words in the text could be targeted for hyperlink glosses.
By identifying frontier words which are in the process of being assimilated into the mental lexicon, an ITS
could target such words for what Nation (2001) calls “rich instruction,” which “involves giving elaborate attention
to a word, going beyond the immediate demands of a particular context of occurrence” (p. 95).
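The learner-model idea sketched above can be illustrated with a minimal data structure. This is a hypothetical sketch of the concept, not code from any actual ITS; the words and state labels are invented for illustration:

```python
# Hypothetical learner model: each word is tracked as "known",
# "frontier" (partially known), or "unknown".
LEARNER_MODEL = {
    "basket": "known",
    "hotel": "frontier",
    "loquacious": "unknown",
}

def instruction_strategy(word, model):
    """Map a word's knowledge state to a teaching action."""
    state = model.get(word, "unknown")
    if state == "known":
        return "no direct instruction"
    if state == "frontier":
        return "rich instruction"         # elaborate attention (Nation, 2001)
    return "initial exposure or gloss"    # e.g. a hyperlink gloss in a text
```

A real learner model would of course update these states from observed behaviour and performance rather than hard-code them.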
The following section discusses the most common vocabulary assessment tools in language instruction and
evaluates the extent to which current assessment tools can capture multi-faceted and incremental word knowledge.
We also identify gaps in current vocabulary assessment techniques and introduce the CALL program Bricklayer
which presents a new paradigm for L2 vocabulary assessment. We then describe a study which we conducted
with 28 ESL learners to validate Bricklayer’s performance. After presenting the results of our study, we discuss
the merits of different types of vocabulary assessment tools and conclude with improvement suggestions for
Bricklayer.
Vocabulary Assessment
Vocabulary assessment tools can generally be classified into two main types: breadth tests and depth tests.
Breadth Tests
The goal of the breadth test is to measure a learner’s overall vocabulary size. Two widespread assessment tools of
this type are the Vocabulary Levels Test (VLT) (Nation, 1983, 1990; Schmitt et al., 2001) and the Vocabulary
Size Test (VST) (Nation & Beglar, 2007). These tests rely on sampling across different frequency bands or ranges
in order to generate a comprehensive vocabulary score. The tests provide strong examples of content validity, in
that a test is typically considered to be a sample of a particular domain (Messick, 1989).
Nonetheless, breadth tests are not designed to assess specific vocabulary items. For example, if the Levels
Test indicates that the student knows 400 of the words at the 3,000 frequency band, there is no way of telling
which 400 words are known and which 600 words are unknown. Furthermore, as Milton and Vassiliu (2000) point
out, “learners acquire their knowledge from course books and not from frequency lists” (p. 446). The authors
researched a small corpus of three first-year EFL course books for Greek students and found that the vocabulary
was thematic and idiosyncratic. In addition, vocabulary at the 2,000 word range was underrepresented and
vocabulary at the 3,000 word range was overrepresented, challenging the notion that vocabulary is acquired by
students in the order suggested by frequency lists. Neither the VLT nor the VST pinpoints specific gaps in
vocabulary.
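The band-sampling logic behind breadth tests can be sketched as follows. The band size, sample sizes, and scores here are illustrative assumptions, not the actual VLT or VST item design:

```python
# Illustrative band-sampling size estimate (not the actual VLT/VST scoring).
# Assume each frequency band holds 1,000 word families; the proportion of
# sampled items answered correctly is extrapolated to the whole band.
BAND_SIZE = 1000

def estimate_vocab_size(results):
    """results maps a band label to (correct, sampled); returns the
    estimated number of word families known across the sampled bands."""
    return sum(
        (correct / sampled) * BAND_SIZE
        for correct, sampled in results.values()
    )

# 10/10 at the 1,000 band, 8/10 at 2,000 and 4/10 at 3,000
# extrapolates to roughly 2,200 word families.
size = estimate_vocab_size({1000: (10, 10), 2000: (8, 10), 3000: (4, 10)})
```

As the critique above notes, such an estimate says nothing about which particular words within a band are known.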
Depth Tests
There are several vocabulary assessments designed to detect the learner’s depth of word knowledge, and they
differ in the types of lexical depth they measure: Webb (2005): different knowledge types; Schmitt (1998): four
different kinds of word knowledge; Meara (2009): word association knowledge; Schmitt and Meara (1997): depth
of word association as well as depth of knowledge for verbal suffixes; Nagy et al. (1985) and Collins-Thompson
and Callan (2007): precision of semantic meaning; Qian (2002): synonymy, polysemy, and collocational knowledge;
Schmitt (1998b) and Crossley et al. (2010): polysemy; and Laufer and Nation (2001): fluency.
One of the more ambitious assessment instruments is the Vocabulary Knowledge Scale (VKS) (Paribakht
& Wesche, 1997; Wesche & Paribakht, 1996), which aims to measure depth of different kinds of word knowledge
via varying levels of questions. In the VKS, first students indicate whether they have seen a word, then whether
the meaning is known; if known, they first produce the word meaning, and then a sentence. Scoring is based on
how much knowledge was indicated.
The advantage to the VKS is that gradations of understanding can be captured. The downside is that it is
very time-consuming to administer; furthermore, Laufer and Goldstein (2004) point out that it does not
necessarily measure what it purports to measure. Indeed, most of the assessment instruments mentioned above
are designed for research purposes. Some are very arduous to administer (Webb’s (2005) assessment, for example,
requires ten questions for each word), making them impractical for the L2 classroom.
After the board is full, the game goes into mini-quiz mode. At this point, starting from the top, one random
brick per row lights up and the player is given a multiple-choice quiz for that word. For instance, Figure 2
displays the quiz for the word fat, which the user placed on the top left brick. The user must take the quiz in
order to continue. If the player picks the correct definition for a word on a brick, the brick becomes solid. If the
incorrect definition is chosen, the brick is destroyed.
Continuing on, the player is then quizzed on a random word on each of the following rows. The trick to
Bricklayer is that if the player picks the wrong definition, not only is the brick that the word was on destroyed,
but the bricks above the quizzed brick are also destroyed. The metaphor used in the game is that each brick
needs to be supported by the bricks below it. For instance, in Figure 3, the player has incorrectly answered a quiz
question for the brick hotel. Therefore, the player lost not only that brick but the four bricks above it.
After the player has taken one quiz per row, the game ends, and the player gets points for all the bricks left
on the board. The game score is presented as the percentage of all bricks remaining.
Each Bricklayer game presents the learner with a list of words. If this list has a range of words such that
some are known, some are unknown, and some are partially known, then this application has the potential to
measure not simply whether or not the player knows a word, but how well the player knows the word, at least in
comparison to all words contained in the word bank. If a player places a word on the top row and then “loses”
the word by an incorrect guess, not much is lost. Therefore, a player may “risk” putting an unknown word on the
higher rows. However, maintaining solid bricks on the lowest rows is critical to success, since one wrong guess can
knock out many bricks above. Therefore, the strategy for success in the game is for players to put the words they
think they know the best at the bottom, words they know pretty well in the middle, and words they are less
certain about near the top. For the purposes of the research study described below, participants were explicitly
instructed in this strategy.
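The quiz and scoring mechanics described above can be sketched as a small simulation. This is our simplified reading of the rules, not the authors' implementation: we assume destruction propagates straight up the quizzed brick's column, and the grid geometry is invented:

```python
import random

def play_bricklayer(board, knows):
    """Simulate the mini-quiz phase. board is a grid of words (top row
    first); knows(word) says whether the learner answers correctly.
    Returns the game score: the percentage of bricks remaining."""
    rows, cols = len(board), len(board[0])
    alive = [[True] * cols for _ in range(rows)]
    for r in range(rows):                  # one quiz per row, top to bottom
        c = random.randrange(cols)         # a random brick in the row lights up
        if alive[r][c] and not knows(board[r][c]):
            for rr in range(r + 1):        # the brick and every brick above it fall
                alive[rr][c] = False
    remaining = sum(sum(row) for row in alive)
    return 100.0 * remaining / (rows * cols)
```

In this sketch a wrong answer on the bottom row wipes out a whole column, while a wrong answer on the top row costs a single brick, which is what makes placing the best-known words on the lowest rows the winning strategy.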
Research Questions
In order to assess the efficacy of Bricklayer, we conducted a study with 28 ESL learners who were tested on 72
words (Brumbaugh, 2015). For the purpose of this article, we report the results with regards to the following two
research questions:
RQ 1: Does the learner’s behavior in the Bricklayer game provide a way to accurately predict the learner’s
knowledge for that word?
RQ 2: Does the strength of this prediction provide a measurement for the learner’s depth of knowledge for
that word?
Methodology
Participants
Twenty-eight ESL learners participated in the study, which took place at a mid-sized Canadian university.1 Study
participants were recruited from the university’s English for Academic Purposes program, which is a remedial
ESL program for intermediate-level students seeking admission to the university. According to a background
questionnaire, the study participants were about evenly split by gender, ranged from 17 to 21 years old, and had
been studying English for an average of 7 years. The participants’ English language skills were from lower to
upper intermediate according to their self-reported IELTS test scores and student placement in the program. All
participants were native speakers of non-Indo-European languages: Chinese (20 participants), Vietnamese (5
participants), and Turkish (1 participant).
Materials
Aside from the ethics release form which was provided as hard copy, all remaining study and assessment
materials were presented to the participants sequentially on a web site. In addition to the background
information questionnaire, materials included an instructional video, the computer program Bricklayer with the
72 words chosen for the study, and two post-tests.2
The two post-tests assessed the learners’ word knowledge for each of the 72 test items. The items were
divided equally between two post-test categories: a standard multiple-choice test and a semantic distance test. As
discussed by Meara (1997) in the context of vocabulary acquisition, “[m]ultiple choice vocabulary tests, of the
sort typically used to assess incidental learning, may not be sensitive enough to pick up what is going on
[cumulative vocabulary acquisition]” (p. 119). For this reason, the semantic distance test was designed to measure
gradations of word knowledge.
All test questions were made up of the correct word’s definition and a set of distractors. Note that the
definition of a given distractor word was used instead of the distractor word itself, so that each distractor option was the
definition of an actual word. The definitions were all drawn from the Merriam-Webster Learner’s Dictionary.
The multiple-choice test contained three distractors which were not semantically related to the correct
answer. Table 1 provides an example of the answer and distractor set for the word basket, a word in the multiple-
choice test condition.
Table 1
Sample Multiple-choice Quiz

Target word: basket
Correct answer: a container usually made by weaving together long thin pieces of material
Distractor 1:   a covering for the hand that has separate parts for each finger
Distractor 2:   a strong building or group of buildings where soldiers live
Distractor 3:   a piece of cloth with a special design that is used as a symbol of a nation or group
The semantic distance test included distractors of varying semantic distance from the target word (Nagy et
al., 1985). The choices for a semantic distance item contained the correct answer (for which a full score of 2 is
given), two words with a strong semantic relationship to the target (for which a partial score of 1 is given), and two
unrelated words (for which a score of 0 is given).
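The scoring scheme can be stated compactly in code (a minimal sketch; the option labels and tags are hypothetical, not the study's materials):

```python
# Sketch of the semantic distance scoring scheme described above:
# the correct definition scores 2, a semantically related distractor 1,
# and an unrelated distractor 0. The answer key below is invented.

SEMANTIC_SCORE = {"correct": 2, "related": 1, "unrelated": 0}

# Hypothetical answer key for one item: each option is tagged by type
# (one correct definition, two related distractors, two unrelated ones).
item_key = {
    "A": "correct",
    "B": "related",
    "C": "related",
    "D": "unrelated",
    "E": "unrelated",
}

def score_response(choice, key=item_key):
    """Map a participant's selected option to its semantic distance score."""
    return SEMANTIC_SCORE[key[choice]]

print(score_response("B"))  # → 1 (partial knowledge)
```

Unlike a dichotomous multiple-choice item, this scale credits a learner who confuses the target with a near neighbour.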
Table 2 provides an example of the answer and distractor set for the word straw, a word in the semantic
distance test condition. Table 2 also shows the word on which each distractor definition was based,
although participants saw only the definitions. During Bricklayer gameplay, the same question/answer sets for
the words were used as the mini-quizzes.
Table 2
Sample Semantic Distance Quiz

Target word: straw
Correct answer:        the dry stems of wheat and other grain plants
Distractor 1 (corn):    the seeds of the corn plant eaten as a vegetable
Distractor 2 (tractor): a large vehicle that has two large back wheels and two smaller front wheels and that is used to pull farm equipment
Distractor 3 (tin):     a soft, shiny, bluish-white metal that has many different uses
Distractor 4 (sew):     to make or repair something (such as a piece of clothing) by using a needle and thread
Data Collection
A video tutorial provided instructions to orient the participants to the Bricklayer game. The video emphasized
the strategy for scoring well. Specifically, it showed the player that placing the words they know best on the
lowest row of the game board is the best strategy to minimize the risk of losing all the supported bricks due to a
missed quiz question. After two practice games, study participants then played 8 rounds of Bricklayer as part of
the research study. There were a total of 72 words, 18 on each board. Given that Bricklayer essentially forces
students to rank word knowledge, each word was presented in two different boards because its ranking may
depend on which other words are on the board. Finally, the participants took the two post-tests for all 72 words.
Findings
In order to examine whether the learner’s placement of each word predicted his or her knowledge for that word
(RQ 1), the results were modeled using two Rasch logistic regressions. First the multiple-choice test set was
modeled, and then the semantic distance test set.
In the Rasch model, independent variables are referred to as facets. In this way, the effect of each individual
item is calculated. The facets for the model reported here include all the scores that may have influenced the final
prediction for word knowledge. The most important facet is called the wordscore; this is based on the word’s final
position on the board as placed by the learner. Another facet is the gamescore which measures how well the learner
performed on that individual game. Which board was played (board) is included as a facet because the board
difficulty may influence the prediction. Finally, learner and word are included as facets for the model because, in
item response theory, word difficulty and learner ability each contribute a measure to the prediction (see
Brumbaugh, 2015 for a more detailed analysis of the individual scoring values used in the research study).
The dependent variable is the value of the post-test score. Half of the data, selected randomly, was used as
training data to assign weights to the facets. The other half was used for testing purposes to assess the validity of
the weights. All results here are from the test set. The resulting prediction for each observation is referred to as
the target score.
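The facet-based prediction can be illustrated with a plain logistic sketch (this is NOT the many-facet Rasch software used in the study; the facet names follow the text, but the data, effect sizes, and fitting routine are invented for illustration): each facet level gets one parameter, and the predicted probability that a word is known, the target score, is the logistic of the summed facet effects.

```python
import math
import random

# Illustrative logistic approximation of the facet model described in the
# text (not the Rasch software used in the study; all data are synthetic).
# P(word known) = sigmoid(learner_ability - word_difficulty + row_effect)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

random.seed(0)
n_learners, n_words, n_rows = 5, 8, 7   # hypothetical board with 7 rows

# Synthetic observations: (learner, word, row, known), where words placed
# on higher-confidence rows are generated as more likely to be known.
data = []
for learner in range(n_learners):
    for word in range(n_words):
        row = random.randrange(n_rows)          # final placement (wordscore)
        known = 1 if random.random() < sigmoid(0.5 * row - 1.5) else 0
        data.append((learner, word, row, known))

# One parameter per facet level, fitted by stochastic gradient ascent
# on the logistic log-likelihood.
ability = [0.0] * n_learners
difficulty = [0.0] * n_words
row_effect = [0.0] * n_rows

for _ in range(300):
    for learner, word, row, known in data:
        p = sigmoid(ability[learner] - difficulty[word] + row_effect[row])
        err = known - p                          # gradient of the log-likelihood
        ability[learner] += 0.05 * err
        difficulty[word] -= 0.05 * err
        row_effect[row] += 0.05 * err

# The "target score": the fitted probability that learner 0 knows word 0
# when it is placed on the highest-confidence row.
target = sigmoid(ability[0] - difficulty[0] + row_effect[n_rows - 1])
print(round(target, 3))
```

In the study itself the facets also included gamescore and board; the sketch only shows how facet effects combine into a probabilistic target score for each observation.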
The Rasch model provides goodness-of-fit results for individual facets and so was used to provide an
analysis of individual lexical items. These results are presented in the following section.
Table 3
Rasch Model Chi-Squared Statistics for Multiple-Choice Test Condition

Variable    Fixed chi-square [df]   Sig.    Random chi-square [df]   Sig.
Learner     71.3 [25]               <.01*   19.2 [24]                .74**
Board       4.3 [3]                 .23     1.8 [2]                  .41**
Word        140.2 [35]              <.01*   28.6 [34]                .73**
Wordscore   14.7 [6]                .02*    4.3 [5]                  .51**
Gamescore   33.7 [19]               .02*    12.0 [18]                .85**

* Fixed chi-square is significant <.05 and indicates the probability that items are equal on a rating scale.
** Random chi-square significance indicates the probability that these items could have been randomly sampled
from a normal population.
The random chi-square results identify the probability that the items could have been sampled from a
normal population. The highest probability is found with gamescore (χ2 (18) = 12.0, p = .85), followed by learner
(χ2 (24) = 19.2, p = .74), word (χ2 (34) = 28.6, p = .73), wordscore (χ2 (5) = 4.3, p = .51), and board (χ2 (2) = 1.8,
p = .41).
The Rasch model can also be evaluated by means of a confusion matrix, which gives the accuracy of the
model’s predictions in percentages. Table 4 organizes the observed scores (the multiple-choice post-test scores) in
rows and the model predictions in columns. Once again, the model was based on individual observations, with one prediction made for each observation.
Table 4
Confusion Matrix for Rasch Results of Multiple-Choice Test Condition

Observation   Predicted 0   Predicted 1   No prediction*   % Correct
0 (unknown)   351**         140           15               69.4%
1 (known)     167           261**         2                60.7%
Total         518           401           17               65.4%

* Note. There is no prediction for values of .5.
** Accurate predictions.
The first row shows the results for observed scores of 0, that is, cases in which an incorrect answer was
given on the post-test. Of these incorrect words, 351 were accurately predicted to be incorrect and 140 were
inaccurately predicted to be correct. There were 15 unknown words for which no prediction was made (see
below for a discussion), thus the incorrect results were accurately predicted 69.4% of the time. In the next row,
the words which were tested to be known are given. Of these, 167 words were inaccurately predicted to be
unknown and 261 were accurately predicted to be known. There were 2 words with no prediction. The accuracy
rate for known words was 60.7%. Overall, 518 words were predicted to be unknown, 401 were predicted to be
known, and 17 words had no prediction. The overall accuracy rate of the model is 65.4%.
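These accuracy rates can be reproduced directly from the cell counts in Table 4 (a small sketch; only the counts themselves come from the study):

```python
# Recomputing the accuracy rates in Table 4 from its cell counts.
# Rows are observed post-test outcomes; columns are model predictions.

table4 = {
    0: {"pred0": 351, "pred1": 140, "none": 15},   # observed unknown
    1: {"pred0": 167, "pred1": 261, "none": 2},    # observed known
}

def row_accuracy(row, correct_col):
    """Percentage of a row's observations that the model predicted correctly."""
    return 100.0 * row[correct_col] / sum(row.values())

acc_unknown = row_accuracy(table4[0], "pred0")
acc_known = row_accuracy(table4[1], "pred1")
overall = 100.0 * (351 + 261) / sum(sum(r.values()) for r in table4.values())

print(round(acc_unknown, 1), round(acc_known, 1), round(overall, 1))
# → 69.4 60.7 65.4, matching the reported rates
```

Note that the "no prediction" cells count against accuracy here, exactly as in the reported percentages.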
It is important to understand that although these predictions are presented as binary, the Rasch model
actually generates an expected value which is between 0 and 1. In the case of the multiple-choice data, if the
expected value is lower than .5, 0 is predicted. If it is above .5, 1 is predicted. At .5, the model makes no
prediction; that is, there is an even probability that the word is known. Because this measurement is probabilistic,
expected values close to the midpoint of .5 are less certain than values further from the midpoint (Bond & Fox,
2007). Accordingly, the further away the expected value is from the midpoint, the more accurate the prediction
will be. Table 5 provides data to confirm this assumption. It shows a set of four confusion matrices for the Rasch
multiple-choice results drawn from various ranges of expected values. In the first matrix, all results are modeled,
and the predictions are 65.4% accurate. In the second matrix, data from the mid 20% of predictions are
omitted, and the model is 69.1% accurate (although only 78.5% of the data are analyzed). The following two
matrices model even less data but the overall predictions are more accurate. In the third matrix, the mid 40% of
the predictions are omitted with an accuracy rate of 72.2%, and in the fourth matrix, the mid 60% of the
predictions are omitted for an accuracy rate of 75.6%.
Table 5
Confusion Matrix: Various Prediction Levels Modeled

All results
Observed   Pred. 0   Pred. 1   None*   % Correct   % of data included   Range
0          351       140       15      69.4%
1          167       261       2       60.7%
Total                                  65.4%       100%

* There is no prediction for values of .5.
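The trimming procedure behind these matrices can be sketched as follows (entirely synthetic data; only the trimming logic is illustrated): rank predictions by their distance from the .5 midpoint, drop the least certain fraction, and recompute accuracy on the remainder.

```python
import random

# Sketch of the mid-range trimming used for Table 5 (synthetic data; the
# study's actual expected values are not reproduced here). Predictions
# near the .5 midpoint are the least certain, so dropping them should
# raise accuracy on the remaining, more confident predictions.

random.seed(1)
# (expected_value, observed) pairs: the observation agrees with the
# prediction more often when the expected value is far from .5.
pairs = []
for _ in range(1000):
    ev = random.random()
    p_agree = 0.5 + abs(ev - 0.5)        # certainty grows away from .5
    pred = 1 if ev > 0.5 else 0
    obs = pred if random.random() < p_agree else 1 - pred
    pairs.append((ev, obs))

def accuracy_omitting_mid(pairs, omit_fraction):
    """Drop the omit_fraction of predictions closest to .5, then score the rest."""
    ranked = sorted(pairs, key=lambda p: abs(p[0] - 0.5))
    kept = ranked[int(len(ranked) * omit_fraction):]
    correct = sum(1 for ev, obs in kept if (1 if ev > 0.5 else 0) == obs)
    return 100.0 * correct / len(kept)

for frac in (0.0, 0.2, 0.4, 0.6):
    print(f"omit mid {frac:.0%}: {accuracy_omitting_mid(pairs, frac):.1f}% correct")
```

As in Table 5, each trimming step trades coverage (% of data included) for accuracy on the confident remainder.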
For the semantic distance test set, the partial credit Rasch model developed by Masters (1982) is used, since the middle scores in the semantic distance post-test correspond to
partial knowledge.
As in the previous model, half of the data was used to train the model and the other half was used for
testing purposes; all results reported here are from the test set. In order to reduce the level of factoring in the
model, the results of the semantic distance test were binned into three groups rather than the original five scores,
and then converted to integers (for the purposes of the modeling software).
The chi-square tests of statistical significance and probability are reported in Table 6 for each of the
individual facets (degrees of freedom are given in brackets). Accordingly, Table 6 indicates that, for the fixed chi-
square, the results are significant for learner (χ2 (25) = 59.3, p < .01), word (χ2 (35) = 161.2, p < .01), and
gamescore (χ2 (18) = 44.9, p < .01). Neither board (χ2 (3) = 6.3, p = .10) nor wordscore (χ2 (6) = 11.9, p = .06) had
a significant effect on the model. As for the random chi-square, learner (χ2 (24) = 17.7, p = .82) and gamescore (χ2
(18) = 12.6, p = .82) have the highest probability of having been sampled from a normal population, followed by
word (χ2 (34) = 29.0, p = .71), wordscore (χ2 (5) = 3.9, p = .56), and board (χ2 (2) = 2.0, p = .36).
Table 6
Rasch Model Chi-Squared Statistics for Semantic Distance Condition

Variable    Fixed chi-square [df]   Sig.    Random chi-square [df]   Sig.
Learner     59.3 [25]               <.01*   17.7 [24]                .82**
Board       6.3 [3]                 .10     2.0 [2]                  .36**
Word        161.2 [35]              <.01*   29.0 [34]                .71**
Wordscore   11.9 [6]                .06     3.9 [5]                  .56**
Gamescore   44.9 [18]               <.01*   12.6 [18]                .82**

* Fixed chi-square is significant <.05 and indicates the probability that items are equal on a rating scale.
** Random chi-square significance indicates the probability that these items could have been randomly sampled
from a normal population.
Table 7 shows the confusion matrix results for the semantic distance condition. In this case, a high
predicted probability that the word is known corresponds to full knowledge, an intermediate probability
corresponds to partial knowledge, and a low probability corresponds to no knowledge.
Table 7
Confusion Matrix for Rasch Results for Semantic Distance Condition

Observation   Predicted 0   Predicted 1   Predicted 2   % Correct
0 (unknown)   143**         112           44            47%
1 (partial)   135           101**         87            31%
2 (known)     62            91            161**         51%
Total correct                                           43%

** Accurate predictions.
As in the case of the previous confusion matrix displayed in Table 4, each row in Table 7 contains the
results for an observed score. When the score was observed to be 0 (the participant selected an unrelated
distractor), the model accurately predicted an incorrect score 143 times, inaccurately predicted partial knowledge
112 times, and full knowledge 44 times, for an accuracy rate of 47%. Words which were observed to be partially
known (the participant selected a semantically similar distractor) were inaccurately predicted to be unknown 135
times, accurately predicted to be partially known 101 times, and inaccurately predicted to be known 87 times for
an accuracy rate of 31%. Words which were observed to be known (the participant selected the correct
definition) were inaccurately predicted to be unknown 62 times and partially known 91 times; 161 times they
were accurately predicted to be known, for an accuracy rate of 51%. The model’s overall accuracy rate was 43%.
In summary, applying the partial credit Rasch model to the semantic distance test results weakens the
predictive power (which was shown in the multiple-choice test set) to approximately chance values (43%),
implying that the game does not confidently predict the learners’ depth of semantic knowledge for words.3
Moreover, in the Rasch analysis, which included words and learners as factors by taking into account the
difficulty of each word as well as the ability level of each learner, the three independent variables learner, word,
and gamescore are significant factors, in contrast to wordscore and board, which are insignificant.
Discussion
In his seminal article on test validation, Messick (1989) emphasizes the importance of test use in validating an
assessment instrument. It is not enough to ask whether a test measures what it purports to measure; it must
also be considered whether the results are appropriate to the particular purpose for which the test was designed.
Bricklayer was designed to generate a learner model in the context of an ITS. Thus, it is appropriate to discuss
the results of this study in that context.
Unlike a teacher (who may be attuned to the general ability level of his or her students), an ITS could
construct a detailed model of a learner’s lexical knowledge. In a vocabulary ITS, an assessment tool would be
used to seed this model. This learner model should be dynamic, adjusting instruction to the learner’s behaviours
and knowledge states as they evolve and manifest themselves during system use. Such a model is similar to the
one described by Mislevy et al. (2002), which “refers to a piece of machinery: a set of variables in a probability
model, for accumulating evidence about students” (p. 482).
Brumbaugh (2015) compared Bricklayer’s results to a standard checkbox assessment, which has also been
used in an ITS (Rosa & Eskenazi, 2013), and found that the checkbox assessment fared slightly better overall
than the Bricklayer assessment for the multiple-choice word set when words were binned into two categories
(known or unknown).
However, the Bricklayer prediction model also reports the probability that a word is known. Examining the
results more deeply, Bricklayer does a better job of modeling the “edge conditions” – words which are strongly
predicted to be known or unknown. This is shown by the analysis in Table 5 which models various prediction
levels. The two assessments may therefore be better suited for different tasks. The checkbox offers a quick way to
make assessments for many words, suggesting that the checkbox test would be useful for breadth
assessments or for evaluations which require comparisons between students. In contrast, Bricklayer is more accurate
at identifying words which are either very likely known or unknown; in an ITS environment the remaining words
at the middle ranges might be considered frontier words which merit attention. Even the fact that words in this
predictive range are just as likely to be known as unknown might turn out to be indicative of frontier knowledge.
A word on the edge of acquisition may be subject to inconsistent test results as the memory trace for the word
may be incomplete or not always accessible.
At this point, an ITS could provide additional focused tasks for these words, for example, readings, games,
quizzes, concordance exercises, and other activities. Subsequent learner behaviour such as clicking a word to look
up the meaning or correctly answering a cloze activity would then present opportunities to update the learner
model with more precise information for these words. In other words, Bricklayer’s assessment results are not
considered definitive, but rather one piece of data in the larger construction of a learner model.
There are some shortcomings of Bricklayer which might be addressed in order to improve its performance,
as well as possible limitations in the experimental design of the current study which may have adversely affected
the statistical results.
Computer Adaptation
Neither Bricklayer nor any game based on this forced-choice ranking model will meet its full potential until
it is able to adapt to the word knowledge of the learner. Games with word sets which are either all known or all
unknown simply do not do a good job of distinguishing knowledge. Due to the challenges of programming and
data analysis, an adaptive study was not feasible for this initial research.
In an ITS context, such computer adaptive testing techniques are used generally to target content to
individual learners (Beatty, 2010). Furthermore, a student model that keeps a record of student behavior and
performance could ultimately track not only lexical knowledge and ability levels for students, but student “fit” to
the model as well.
Conclusion
This research study introduced Bricklayer, an assessment tool which can identify strongly known and unknown
words, and which can suggest which words might be on the frontier of acquisition. An analysis of the results also
ascertained ways in which the tool’s performance might be improved by fine-tuning the scoring rubric and by
using computer adaptive testing techniques to customize game boards for each learner.
Bricklayer, which presents a new paradigm for L2 vocabulary assessment, connects with research on
vocabulary acquisition by providing a mechanism to capture partial word knowledge. While Bricklayer was the
primary focus of the empirical investigation, the original contributions to the vocabulary assessment field are not
about Bricklayer per se, but rather about some fundamental characteristics unique to Bricklayer. From this
perspective, Bricklayer is a working exemplar of a novel self-assessment paradigm.
Bricklayer essentially presents learners with the meta-cognitive task of ranking a list of words according to
how well they know them. This differs in a qualitative way from typical self-assessments, which force a binary
choice. Learners must consider not just whether they know a word, but how well they know it. It is possible that
this leads to a deeper level of cognitive reflection. In the Bricklayer study, participants only spent about a minute
in total on the three screens of checkbox items (n=24 words); in contrast, they spent on average 2 ½ minutes on
each game (n=18 words per game). This may indicate that they were giving more focused attention to the game
task.
There are certain drawbacks to ranking data. Primarily, if two words are equally known or equally
unknown, the ranking data are not useful. This could be mitigated in several ways in task design. For example,
participants might be presented with two or three words and then instructed to rank them in terms of
knowledge. In a computer interface, this could be achieved by dragging the words into an ordered list. Words
could be repeated in different contexts and then results subjected to an item response analysis such as Rasch.
Alternatively, the participant could simply report, for example, by pressing a button on a screen, that both words
are known or both are unknown.
From a quantitative point of view, measurements derived from rankings provide a mechanism for
sensitivity to partial lexical knowledge. Implementing such a modification to the standard self-assessment tools
might result in more robust results with a higher level of structural validity.
Currently, vocabulary assessment falls into two broad categories: traditional tests in which the learner must
select or give the correct answer, and checkbox self-assessments in which the test administrator must either rely
on the learner’s response or depend on pseudowords to gauge the learner’s accuracy. The assessment paradigm
on which Bricklayer is based offers a third option: random spot-checks of learners’ self-assessments. The mini-
quizzes in the game serve three important functions. Firstly, they give a way to validate the learner’s responses.
Secondly, they provide accountability to the learners – since they know the test may be coming, they have a
reason not to misrepresent their knowledge. Finally, they provide a mechanism for clarifying the expectation
about what type of word knowledge is being tested.
There are typically three uses for assessment: evaluation, instruction, and research. In a context in which a
student is being evaluated for aptitude for a given program or in which learning gains for a course are being
assessed, Bricklayer’s probabilistic results might be too subtle to accomplish the test purpose. However, in
instructional contexts, such as a classroom or ITS environment, Bricklayer’s paradigm might be well-suited to
identify frontier words which would benefit from further, direct instruction.
Endnotes
1. Two of the participants were excluded from the final data analysis due to incorrect usage of the software
which may have corrupted the results.
2. It should be noted that these are post-tests in the sense that they are taken after the main part of the study
for the purposes of collecting data for concurrent validity; this study did not use a pre-test/post-test
design.
3. Interestingly, although the results could not predict partial knowledge, deeper analysis of the data showed
that Bricklayer was sensitive to this knowledge (Brumbaugh, 2015).
References
Anderson, R. C., & Freebody, P. (1983). Reading comprehension and the assessment and acquisition of word
knowledge. In B. Huston (Ed.), Advances in Reading/Language Research (Vol. 2, pp. 231–256). Greenwich, CT:
JAI Press.
Beatty, K. (2010). Teaching and researching computer-assisted language learning (2nd ed.). Harlow, England; New York:
Longman.
Beeckmans, R., Eyckmans, J., Janssens, V., Dufranne, M., & Van de Velde, H. (2001). Examining the Yes/No
Vocabulary Test: Some Methodological Issues in Theory and Practice. Language Testing, 18(3), 235–274.
Bond, T. G., & Fox, C. M. (2007). Applying the Rasch Model: Fundamental Measurement in the Human Sciences, Second
Edition (2nd ed.). Routledge.
Brumbaugh, H. (2015). Self-assigned ranking of L2 vocabulary: Using the Bricklayer computer game to assess depth of word
knowledge (Doctoral dissertation, Arts & Social Sciences). Retrieved from http://summit.sfu.ca/item/15287
Collins-Thompson, K., & Callan, J. (2007). Automatic and Human Scoring of Word Definition Responses. In C.
L. Sidner, T. Schultz, M. Stone, & C. Zhai (Eds.), HLT-NAACL (pp. 476–483). The Association for
Computational Linguistics.
Crossley, S., Salsbury, T., & McNamara, D. (2010). The Development of Polysemy and Frequency Use in English
Second Language Speakers. Language Learning, 60(3), 573–605.
Dolch, E. W. (1927). Reading and word meanings. Ginn and Company.
Durso, F. T., & Shore, W. J. (1991). Partial knowledge of word meanings. Journal of Experimental Psychology: General,
120(2), 190–202. http://doi.org/10.1037/0096-3445.120.2.190
Eyckmans, J. (2004). Measuring receptive vocabulary size: reliability and validity of the yes/no vocabulary test for French-
speaking learners of Dutch. Utrecht: LOT.
Heift, T., & Schulze, M. (2003). Student Modeling and ab initio Language Learning. System, 31(4), 519–535.
Henriksen, B. (1999). Three Dimensions of Vocabulary Development. Studies in Second Language Acquisition, 21(2),
303–317.
Horst, M., & Meara, P. (1999). Test of a model for predicting second language lexical growth through reading.
Canadian Modern Language Review/La Revue Canadienne Des Langues Vivantes, 56(2), 308–328.
Laufer, B., & Goldstein, Z. (2004). Testing Vocabulary Knowledge: Size, Strength, and Computer Adaptiveness.
Language Learning, 54(3), 399–436.
Laufer, B., & Nation, I. S. P. (2001). Passive Vocabulary Size and Speed of Meaning Recognition: Are They
Related? EUROSLA Yearbook, 1, 7–28.
LeBlanc, R., & Painchaud, G. (1985). Self‐Assessment as a Second Language Placement Instrument. Tesol
Quarterly, 19(4), 673-687.
Masters, G. N. (1982). A Rasch Model for Partial Credit Scoring. Psychometrika, 47(2), 149–74.
Meara, P. (1990). Some notes on the Eurocentres Vocabulary Tests. In J. Tommola (Ed.), Vieraan kielen
ymmärtäminen ja tuottaminen (Foreign Language Comprehension and Production) (pp. 103–113). Turku: Suomen
Soveltavan Kielitieteen Yhdistys AFinLA.
Meara, P. (1997). Towards a new approach to modelling vocabulary acquisition. In N. Schmitt & M. McCarthy
(Eds.), Vocabulary: Description, Acquisition and Pedagogy (pp. 109–121). Cambridge, UK: Cambridge University
Press.
Meara, P. (2009). Connected Words: Word Associations and Second Language Vocabulary Acquisition. Amsterdam: John
Benjamins Pub. Co.
Meara, P., & Buxton, B. (1987). An Alternative to Multiple Choice Vocabulary Tests. Language Testing, 4(2), 142–
154.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational Measurement (3rd ed, pp. 13–103). Washington, D.C.:
American Council on Education.
Milton, J., & Vassiliu, P. (2000). Frequency and the lexis of low-level EFL texts. In Proceedings of the 13th Symposium
in Theoretical and Applied Linguistics, Aristotle University of Thessaloniki (pp. 444–55).
Mislevy, R. J., Steinberg, L. S., & Almond, R. G. (2002). Design and Analysis in Task-Based Language
Assessment. Language Testing, 19(4), 477–496.
Mochida, A., & Harrington, M. (2006). The Yes/No test as a measure of receptive vocabulary knowledge.
Language Testing, 23(1), 73–98. http://doi.org/10.1191/0265532206lt321oa
Nagy, W. E., Herman, P. A., & Anderson, R. C. (1985). Learning Words from Context. Reading Research Quarterly,
20(2), 233–253.
Nation, I. S. P. (2013). Learning Vocabulary in Another Language (2nd ed.). Cambridge: Cambridge University Press.
Nation, I. S. P. (1983). Testing and teaching vocabulary. Guidelines, 5(1), 12–25.
Nation, I. S. P. (1990). Teaching and Learning Vocabulary. New York: Newbury House Publishers.
Nation, I. S. P. (2001). Learning Vocabulary in Another Language. Cambridge: Cambridge University Press.
Nation, I. S. P., & Beglar, D. (2007). A vocabulary size test. The Language Teacher, 31(7), 9–13.
Paribakht, T. S., & Wesche, M. (1997). Vocabulary enhancement activities and reading for meaning in second
language vocabulary acquisition. In J. Coady & T. N. Huckin (Eds.), Second Language Vocabulary Acquisition: A
Rationale for Pedagogy (pp. 175–200). Cambridge, U.K: Cambridge University Press.
Pellicer-Sánchez, A., & Schmitt, N. (2012). Scoring Yes–No vocabulary tests: Reaction time vs. nonword
approaches. Language Testing, 29(4), 489–509. http://doi.org/10.1177/0265532212438053
Qian, D. D. (2002). Investigating the Relationship between Vocabulary Knowledge and Academic Reading
Performance: An Assessment Perspective. Language Learning, 52(3), 513–536.
Read, J. (1993). The Development of a New Measure of L2 Vocabulary Knowledge. Language Testing, 10(3), 355–
371.
Read, J. (1995). Validating the word associates format as a measure of depth of vocabulary knowledge. In 17th
language testing research colloquium, Long Beach, CA.
Rosa, K. D., & Eskenazi, M. (2013). Self-Assessment in the REAP Tutor: Knowledge, Interest, Motivation, &
Learning. International Journal of Artificial Intelligence in Education (IOS Press), 21(4), 237–253.
Richards, J. C. (1976). The Role of Vocabulary Teaching. TESOL Quarterly, 10(1), 77–89.
Schmitt, N. (1998). Tracking the Incremental Acquisition of Second Language Vocabulary: A Longitudinal
Study. Language Learning, 48(2), 281–317.
Schmitt, N. (2014). Size and Depth of Vocabulary Knowledge: What the Research Shows. Language Learning, 64,
913–951.
Schmitt, N., & Meara, P. (1997). Researching Vocabulary through a Word Knowledge Framework: Word
Associations and Verbal Suffixes. Studies in Second Language Acquisition, 19(1), 17–36.
Schmitt, N., Schmitt, D., & Clapham, C. (2001). Developing and Exploring the Behaviour of Two New Versions
of the Vocabulary Levels Test. Language Testing, 18(1), 55–88.
Shore, W. J., & Kempe, V. (1999). The Role of Sentence Context in Accessing Partial Knowledge of Word
Meanings. Journal of Psycholinguistic Research, 28(2), 145–163. http://doi.org/10.1023/A:1023258224980
Webb, S. (2005). Receptive and Productive Vocabulary Learning: The Effects of Reading and Writing on Word
Knowledge. Studies in Second Language Acquisition, 27(1), 33–52.
Wesche, M., & Paribakht, T. S. (1996). Assessing Second Language Vocabulary Knowledge: Depth versus
Breadth. The Canadian Modern Language Review/La Revue Canadienne Des Langues Vivantes, 53(1), 13–40.
Zimmerman, C. B. (1997). Historical trends in second language vocabulary instruction. In J. Coady & T. N.
Huckin (Eds.), Second Language Vocabulary Acquisition: A Rationale for Pedagogy (pp. 5–19). Cambridge, U.K:
Cambridge University Press.
Trude Heift is Professor of Linguistics in the Department of Linguistics at Simon Fraser University, Canada. Her
research focuses on the design and evaluation of CALL systems with a particular interest in learner-computer
interactions and learner language. She is co-editor of Language Learning & Technology.
Ahmed Masrai*
King Abdulaziz Military Academy, Saudi Arabia
James Milton
Swansea University, UK
Abstract
Research has shown that general vocabulary knowledge (e.g., Milton & Treffers-Daller, 2013), academic vocabulary
knowledge (e.g., Townsend et al., 2012) and general intelligence (e.g., Laidra et al., 2007) are good predictors of academic
achievement. While the effect of these factors has mostly been examined separately, Townsend et al. (2012) have tried to
model the contribution of general and academic vocabulary to academic achievement and find that academic vocabulary
knowledge adds only marginally to the predictive ability of general vocabulary knowledge. This study, therefore, examines
further factors as part of a more extensive predictive model of academic performance, including L1 vocabulary knowledge,
L2 general and academic vocabulary knowledge, and intelligence (IQ) as predictors of overall academic achievement among
learners of EFL. Performance on these measures was correlated with Grade Point Average (GPA) as a measure of academic
achievement for undergraduate Arabic L1 users (N = 96). The results show significant positive correlations between all the
measures and academic achievement. However, academic vocabulary knowledge shows the strongest correlation (r = .72),
suggesting that the pedagogical use of academic word lists remains important. To further explore the data, multiple regression and factor
analyses were performed. The results show that academic and general vocabulary knowledge combined can explain about
56% of the variance in students’ GPAs. The findings thus suggest that, in addition to L1 and L2 vocabulary size and IQ,
knowledge of academic vocabulary is an important factor that explains additional variance in learners’ academic
achievement.
Introduction
Academic achievement is crucial in impacting students’ future employability and the opportunity to obtain better
jobs. It is also a major concern for higher education institutions. Thus, research which taps into modelling the
potential factors that might influence student academic success is worthwhile. A number of studies have
investigated factors which are thought to influence students’ academic success in various contexts (e.g., Laidra,
Pullmann, & Allik, 2007; Milton & Treffers-Daller, 2013; Roche & Harrington, 2013; Townsend, Filippini,
Collins, & Biancarosa, 2012). Among the factors identified as being associated with learners’ overall academic
performance have been intelligence, general L2 vocabulary size, L2 academic vocabulary knowledge, and first
language (L1) vocabulary size. Despite the influence of these factors on academic success, there is a scarcity of
studies examining their effect on achievement with native Arabic learners in the Arab world, with the exception
of two studies by Roche and Harrington (2013) and Harrington and Roche (2014), who studied the effect of
vocabulary knowledge on students’ academic success in Oman. Thus, this study is an attempt to explore the
effect of vocabulary knowledge, in L1 and L2, and intelligence on academic performance with learners from an L1
Arabic context. There are currently many schools and universities in the Middle East which deliver their
programmes through the medium of English, and academic achievement is one of their main concerns. Thus,
this study was motivated by both a desire to expand our understanding of the predictors of academic
achievement in general and in the Arab world context in particular, and by the scarcity of research on L1 Arabic
users studying at higher education institutions through the medium of English in an environment where English
is not the primary language used outside the classroom.
Success when studying through a foreign language is likely to be influenced by a range of possible factors,
and while we have some understanding of these factors from studies which investigate them individually,
examining multiple factors as part of an overall predictive model of academic performance is likely to be more
useful. Few studies have attempted to place these various factors, including vocabulary knowledge, into an overall
model for the prediction of academic success. This study, therefore, incorporates four
independent variables into a model in order to predict the academic achievement of native Arabic speakers studying
through the medium of English in Saudi higher education institutions.
the scores on these measures to calculate the contribution to academic success that the two types of knowledge
can make. The contribution of scores from the two measures to academic success was calculated both
individually and combined. They conclude that academic vocabulary knowledge contributes unique variance to
achievement across disciplines even when the overall breadth of vocabulary knowledge is controlled. The
explanatory power of vocabulary size as a whole was larger than that of academic word knowledge, between
26% and 43% of variance according to discipline. However, academic word knowledge can still add an
additional 2% to 7%, depending on discipline, to this explanatory power. These findings appear to suggest that
developing a reasonably large vocabulary is more effective for success but that knowledge of the AWL has some
additional, marginal influence on academic performance.
The findings from Townsend et al.'s (2012) study are supported by results from Roche and Harrington
(2013), who attempted a variety of methodological changes to their test to better understand how vocabulary and
academic performance are linked. Their results for the impact of vocabulary size are similar to those of
Townsend et al. (2012) and in their study vocabulary size can explain about 25% of the variance in students’
GPAs.
The Study
The aim of this study is to model a number of factors as part of a predictive model of academic performance
among native Arabic speakers from an undergraduate population. These factors are general vocabulary
knowledge, academic vocabulary knowledge, L1 vocabulary knowledge, and general intelligence. To examine the
effectiveness and the predictive power of these factors, individually and combined, on a measure of academic
achievement, four research questions were addressed:
1. What are the levels of correlation of general vocabulary, academic vocabulary, L1 vocabulary, and IQ
with GPA?
2. What is the contribution of each of these variables to academic achievement?
3. Can general vocabulary knowledge and academic vocabulary knowledge explain a unique variance in
academic achievement?
4. Can factor analysis allow us to identify whether the vocabulary-based variables are identifying separate
factors which contribute to GPA?
Method
Participants
Participants in this study were 96 undergraduate students (aged 20-22 years) from two universities in Saudi
Arabia. The students were following degree courses in Languages and Translation. The two universities from which
the participants were drawn implement very similar programmes in English language and translation, so at least
in part the input factor from the language classroom is controlled. The participants in both institutions were
attending levels two, three and four of a four-year degree programme when the data collection for this study took
place. Informed consent was obtained from all participants. Also, as a monolingual Arabic vocabulary size test was
administered to the participants, only native Arabic speakers were included in the study. The
participants’ involvement was voluntary.
Instruments
Four measures were used to collect the required data for the current study.
1. The first was a general vocabulary size test (XK-Lex; Masrai & Milton, 2012), which was used to
measure the receptive vocabulary knowledge of the participants in the most frequent 10,000 words in
English. The XK-Lex is a yes/no test of decontextualised words sampled from the first ten 1,000-word
frequency bands in English and includes non-words to control for guessing.
2. The second was a written receptive vocabulary knowledge test (Arabic-Lex; Masrai & Milton, 2017),
which was used to estimate the participants’ L1 (Arabic) vocabulary size. This test is similar in its
construct to the XK-Lex, but was designed to measure the knowledge of the most frequent 50,000
words in Arabic.
3. The third was a newly developed receptive academic vocabulary size test (AVST; Masrai & Milton,
forthcoming) used to assess students’ academic vocabulary knowledge of the 570 words from
Coxhead’s (2000) AWL. The test is similar in its design (frequency based test) to the English and Arabic
vocabulary size tests.
4. The final tool was Raven's Standard Progressive Matrices (SPM), a non-verbal IQ test. We chose this
version because it was developed to measure a wide range of mental
ability and to be equally usable with persons of all ages, regardless of their education, background and
physical condition (Raven, Raven, & Court, 1998). The test consists of 60 problems divided into five
sets (A, B, C, D, and E), each of which includes 12 problems. All the testing materials were delivered in pencil
and paper format and were not timed. However, each of the three vocabulary measures should not
take longer than 10-15 minutes to complete. The non-verbal IQ test, on the other hand, should take about
45 minutes to finish.
Yes/No tests have been reported in the literature as suitable, reliable and valid measures of breadth of
vocabulary knowledge (e.g., Harrington & Carey, 2009; Milton, 2009; Mochida & Harrington, 2006; Read,
2000). They allow for the sampling of a large number of items, and are easy and economical to administer and score.
The scoring system of the three Yes/No tests used in the current study is straightforward. Yes
responses to real words are counted to give a participant's raw score, and yes responses to non-words are
false alarms. The false alarms result in a reduction in the participant's total score. The scoring matrix of Yes/No
tests is presented in Table 1.
tests is presented in Table 1.
Table 1
Matrix of Possible Responses in XK-Lex, Arabic-Lex, and AVST, Where UPPER CASE = Correct Responses
Response    Word    Non-word
Yes         HIT     False alarm
No          Miss    CORRECT REJECTION
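To make the scoring concrete, the hits-minus-false-alarms adjustment can be sketched as below. This is a minimal illustration of one common correction; the exact formulas used for XK-Lex, Arabic-Lex, and the AVST are not reproduced here, and the words-per-item scaling factor is an assumption for illustration.

```python
def yes_no_score(hits, false_alarms, words_per_item=100):
    """Estimate vocabulary size from Yes/No test responses.

    Each 'yes' to a real word is a hit; each 'yes' to a non-word is a
    false alarm, and here each false alarm cancels one hit (a simple
    guessing correction; the published tests may use a different
    adjustment). Each real-word item is assumed to represent
    `words_per_item` words of the frequency list it was sampled from.
    """
    adjusted_hits = max(hits - false_alarms, 0)
    return adjusted_hits * words_per_item

# A learner saying yes to 80 real words with 2 false alarms, on a test
# where each item stands for 100 words, scores (80 - 2) * 100 = 7800.
```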
Procedure
Participants were tested on two consecutive days at each institution to avoid testing fatigue. After instructions
were delivered, participants were first presented with the two general vocabulary size measures (Arabic-Lex and
XK-Lex) followed by the academic vocabulary size test (AVST), which was administered after a short break. On
the second day, the non-verbal IQ test was delivered to the same participants. All the testing procedures were
performed with the help of volunteer lecturers at each institution.
Results
Correlation Analysis
In this study four predictor variables of academic success were used (XK-Lex as a measure of general English
vocabulary size, Arabic-Lex as a measure of Arabic vocabulary size, AVST as a measure of English academic
vocabulary knowledge, and SPM as a measure of non-verbal IQ). The descriptive statistics of the four predictor
variables are shown in Table 2.
Table 2
Descriptive Statistics for Four Variables (IQ, Arabic-Lex, XK-Lex, and AVST)
N Minimum Maximum Mean Std. Deviation
IQ 96 12 39 28.25 5.55
Arabic_Lex 96 8500 43000 30843.75 7784.03
XK_Lex 96 1100 6400 3125.00 1310.40
AVST 96 20 470 171.48 86.62
Table 3
Correlation between Variables in The Study
IQ Arabic_Lex XK_Lex AVST GPA
IQ - .501** .340** .411** .469**
Arabic_Lex - .512** .446** .590**
XK_Lex - .782** .683**
AVST - .728**
GPA -
Note. ** = Correlation is significant at the 0.01 level.
In order to examine research question 1 (i.e., the relationship between the four measures in our study and
academic achievement), correlational analysis was conducted between the observed scores from the four
measures and students’ GPA. The correlation of the four predictor variables with GPA is shown in Table 3. All
predictor variables correlate significantly with GPA. This indicates the validity of these measures, since
they had all been identified on theoretical grounds as related to academic success, although most of the
previous studies have examined their relationship with academic success individually. XK-Lex and AVST scores
show the strongest correlation with GPA, followed by Arabic-Lex and IQ. Also, the strongest correlation between
the predictor variables is reported between XK-Lex and AVST (r = .782), which may indicate that a test of
academic vocabulary (AVST) resembles very strongly a test of overall vocabulary size (XK-Lex) as suggested in
Masrai and Milton (forthcoming). The correlation matrix reported in Table 3 provides a preliminary indication
of the effect size (ES) of each independent variable on the dependent variable (GPA). However, to examine the
ES in more depth, partial Eta Squared was calculated for each predictor variable.
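For illustration, a correlation of the kind reported in Table 3 can be computed as follows; the score and GPA vectors here are invented, not the study's data.

```python
import numpy as np

def pearson_r(scores, gpa):
    """Pearson correlation between a set of test scores and GPAs."""
    scores = np.asarray(scores, dtype=float)
    gpa = np.asarray(gpa, dtype=float)
    return float(np.corrcoef(scores, gpa)[0, 1])

# Invented example: scores that rise perfectly with GPA give r = 1.0.
r = pearson_r([10, 20, 30, 40], [2.0, 2.5, 3.0, 3.5])
```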
Analysis of variance showed large effect sizes (ES) for L2 general and academic vocabulary knowledge on academic
achievement (F(31, 64) = 6.79, p = .001, ηp2 = .77; F(35, 60) = 63.54, p = .001, ηp2 = .97, respectively). The other
two variables (Arabic-Lex scores and IQ scores) were also found to have an effect on academic
performance, but to a lesser extent (F(27, 68) = 5.75, p = .001, ηp2 = .69; F(18, 77) = 3.31, p = .001, ηp2 = .44,
respectively). Although correlation analysis and ES measures provide insight into how well the independent variables
relate to the dependent variable, a more detailed analysis is needed to gain further understanding of the
predictive power of the different predictor variables. Thus, regression analysis was performed to calculate the
explanatory power of the variables individually and combined.
Regression Analysis
Since high inter-correlations were observed between some of the predictor variables, multicollinearity
diagnostics were performed prior to the regression analysis. The results show no indication of multicollinearity
(all values for tolerance were > .02 and all values for VIF were < 5).
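As a sketch of what such diagnostics compute (the study's statistical package is not specified), tolerance and VIF for each predictor can be derived by regressing that predictor on the others: VIF_j = 1 / (1 - R2_j), and tolerance is its reciprocal.

```python
import numpy as np

def vif_and_tolerance(X):
    """Variance inflation factor and tolerance for each column of X.

    VIF_j = 1 / (1 - R2_j), where R2_j comes from an OLS regression of
    column j on the remaining columns; tolerance_j = 1 / VIF_j.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        y = X[:, j]
        # Regress column j on an intercept plus the other columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1.0 - (resid @ resid) / (((y - y.mean()) ** 2).sum())
        vifs.append(1.0 / (1.0 - r2))
    return vifs, [1.0 / v for v in vifs]
```

With fully uncorrelated predictors, every VIF and tolerance comes out at 1; values rising toward the cut-offs signal redundancy among predictors.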
To examine research question 2 (i.e., the predictive power of scores on XK-Lex, Arabic-Lex, AVST, and
IQ measures for students’ academic achievement measured by GPA), regression analysis was performed.
Table 4
Explained Variance in The Regression Model Predicting GPA with The Four Measures Combined
Model    R       R2     SE
1        .80a    .64    .49
Note. a. Predictors: (Constant), IQ, XK_Lex, Arabic_Lex, AVST.
First, a multiple regression was carried out with GPA as the dependent variable and XK-Lex, Arabic-Lex,
AVST and IQ as independent variables, using the Enter method. This led to a significant model (F(4, 91) = 39.675, p
< .001) which explains about 64% of the variance in students’ GPA.
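The 64% figure is the R2 of an ordinary least squares model; a generic sketch of that computation (not the study's statistical output) is:

```python
import numpy as np

def r_squared(X, y):
    """R2 from an OLS regression of y (e.g., GPA) on the columns of X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # add an intercept
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return 1.0 - (resid @ resid) / (((y - y.mean()) ** 2).sum())
```

Called with a matrix of the four predictor scores and the GPA vector, this returns the proportion of GPA variance the combined model explains.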
However, we are also interested in the individual contribution of each predictor variable towards the
predictive power of the regression model. To examine this, multiple regressions were carried out to compute the
effect of each variable individually. The model summaries are reported in Table 5.
Table 5
Explained Variance in The Regression Model Predicting GPA with Each of The Four Measures
Predictor Model R R2 SE
XK_Lex 1 .68 .47 .58
Arabic_Lex 2 .59 .35 .64
AVST 3 .73 .53 .55
IQ 4 .47 .22 .71
As shown in Table 5, each variable can explain variance in students' success. XK-Lex and AVST scores
explain the greatest variance in students' GPA (R2 = .47 and .53, respectively). The other variables, Arabic-Lex
and IQ, also explain substantial amounts of variance in the students' achievement (R2 = .35 and .22,
respectively).
To further examine the explanatory power, we carried out hierarchical regression with the L2 vocabulary size
measures (XK-Lex and AVST) in block 1, and the L1 vocabulary size measure (Arabic-Lex) and IQ in block 2. Since
XK-Lex and AVST are the best predictors, there is a danger that the contribution of the less well correlated predictors,
Arabic-Lex and IQ, will be lost in a combined model. Dividing the factors this way allows the contribution of
these other, less well correlated variables to the model to be estimated. The result is shown in Table 6.
Table 6
Hierarchical Regression Models
                              Change Statistics
Model   R      R2    SE    R2 Change   F Change   df1   df2   Sig. F Change
1       .75a   .56   .53   .56         59.92      2     93    .000
2       .80b   .64   .49   .07         9.05       2     91    .000
Note. a. Predictors: (Constant), AVST, XK_Lex; b. Predictors: (Constant), AVST, XK_Lex, IQ, Arabic_Lex.
The variables in block 1 produce a significant model (F(2, 93) = 59.92, p < .001) which predicts about 56%
of the variance in GPA, and this is substantial. The other variables in block 2, however, can still be shown to
contribute marginally to the predictive power of the regression model. The addition to R2 is still significant (F(2,
91) = 9.05, p < .001) and these two factors appear to explain an additional 7% of the variance in GPA. These
results indicate that when general L2 vocabulary knowledge (measured with XK-Lex) and L2 academic
vocabulary knowledge (measured with AVST) are combined they can have a very strong positive effect on
learners’ performance when studying through the medium of English but that the predictive power of other
factors can still improve on this result.
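The block structure above amounts to comparing R2 before and after adding the block-2 predictors; a minimal sketch with invented data (not the study's) is:

```python
import numpy as np

def _r2(X, y):
    # OLS R-squared with an intercept term
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return 1.0 - (resid @ resid) / (((y - y.mean()) ** 2).sum())

def hierarchical_r2(block1, block2, y):
    """Return (R2 for block 1 alone, R2 change from adding block 2)."""
    y = np.asarray(y, dtype=float)
    X1 = np.asarray(block1, dtype=float)
    X2 = np.column_stack([X1, np.asarray(block2, dtype=float)])
    base = _r2(X1, y)
    return base, _r2(X2, y) - base
```

In the study's terms, block 1 would hold the XK-Lex and AVST scores and block 2 the Arabic-Lex and IQ scores, with the R2 change quantifying what the second block adds.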
To provide an answer to research question 3 (i.e., whether L2 general vocabulary knowledge can explain a
unique variance in academic success) we first had to control for the scores from AVST, the academic vocabulary
knowledge test, as these two variables were highly correlated. Interpretation of this strong correlation will be
provided in the discussion section of the study. However, to measure if a unique predictive power can be
explained by L2 general vocabulary per se, a stepwise regression model was generated including the R2 change,
but with AVST scores removed from the model. The result is summarised in Table 7.
Table 7
Predictive Power of General L2 Vocabulary When Academic Vocabulary Is Controlled for
                              Change Statistics
Model   R      R2    SE    R2 Change   F Change   df1   df2   Sig. F Change
1       .68a   .47   .58   .47         82.35      1     94    .000
2       .74b   .55   .54   .08         16.07      1     93    .000
3       .75c   .57   .53   .02         4.89       1     92    .030
Note. a. Predictors: (Constant), XK_Lex; b. XK_Lex, Arabic_Lex; c. XK_Lex, Arabic_Lex, IQ.
The result in Table 7 shows a significant unique contribution of L2 general vocabulary knowledge in
explaining academic success. The R2 of .47, explaining about 47% of the variance, has already been
shown in Table 5. But the two other factors are able to enhance this and, combined, add a further 10% to the
explanation of variance in GPA scores.
Factor Analysis
Factor analysis was run in an attempt to provide an answer to research question 4 (i.e., examining whether
different factors can be discerned in the four sets of results). The factor analysis results are summarised in the
Scree plot in Figure 1 and the component matrix in Table 8.
Table 8
Component Matrix from The Four Sets of Data
Component
1
XK_Lex .855
AVST .854
Arabic_Lex .767
IQ .679
Note. Extraction Method: Principal Component Analysis; a. 1 component extracted.
There appears to be only one component extracted with an eigenvalue above 1, and it is concluded that
the four variables examined in this study are measuring the same construct.
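The extraction criterion implied here (retain components with an eigenvalue above 1) can be illustrated with a minimal principal components sketch over invented data:

```python
import numpy as np

def pca_eigenvalues(X):
    """Eigenvalues of the correlation matrix of the columns of X,
    in descending order. Components whose eigenvalue exceeds 1 are
    retained under the Kaiser criterion used in analyses like this one.
    """
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    return np.sort(np.linalg.eigvalsh(R))[::-1]

# Two perfectly correlated columns collapse onto a single component.
vals = pca_eigenvalues([[1, 2], [2, 4], [3, 6]])
```

Run on the four score vectors (XK_Lex, AVST, Arabic_Lex, IQ), a single eigenvalue above 1 would reproduce the one-component result reported in Table 8.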
Discussion
In this study, the contributions of four variables were investigated to assess their impact on students' academic
performance as measured by GPA. These variables were L2 general vocabulary knowledge, L2 academic
vocabulary knowledge, L1 vocabulary knowledge, and non-verbal IQ. While the predictive power of these
variables on academic achievement is widely reported in the literature, we investigated their power to predict
academic performance among Arabic university students incorporating the four variables in one experimental
setting. The study also aimed at finding out whether including academic vocabulary knowledge among other
factors would explain a unique variance and remain the greatest contributing factor towards students' academic
success.
Research Question 1: The Relationship between the Four Measures in This Study
In answer to research question 1, all the measures show statistically significant correlations with students'
academic performance, as measured by GPA (see Table 3). This finding is broadly in line with what is reported in
the literature (e.g., Alderson, 2005; Milton & Treffers-Daller, 2013; Laidra et al., 2007; Townsend et al., 2012).
The strongest correlation (r = .728) is between L2 academic vocabulary knowledge and GPA. The correlation
between L2 general vocabulary knowledge and GPA is moderate to strong (r = .683). The other two factors, L1
vocabulary knowledge and non-verbal IQ, also display moderate correlations with GPA, which are less strong
than the two L2 vocabulary knowledge factors. Since all four test variables correlate moderately to strongly with
GPA, it should not be a surprise that they also correlate moderately with each other as is shown in Table 3.
There is a particularly strong correlation between L2 general vocabulary knowledge and L2 academic
vocabulary knowledge (r = .782). The way that the L2 academic vocabulary knowledge test is likely also to test
general vocabulary knowledge has already been suggested in the second sub-section of the literature review. The
L2 academic vocabulary knowledge test is based on the AWL and these words occur in general frequency lists
spread across the most frequent bands. Good correlations should therefore be expected between any test based
on the AWL and any well-formed general vocabulary size test, as is noted in Masrai and Milton (forthcoming).
Although the four measures show significant correlations with GPA, multiple regression analyses were required to
quantify the effect of each measure on academic performance.
Research Question 2: The Contribution of Each Variable to Academic Achievement
A multiple regression, reported in Table 4, indicates that the four variables combined can explain 64% of the variance
in GPA scores. This combined result is greater than the two factor model investigated in Townsend et al. (2012),
which examined only L2 language knowledge factors. This suggests that the variables examined in this study
would be particularly useful in a practical setting where, for example, university and school teachers need to
anticipate which of their students are at risk of low academic performance and are in need of support in their
academic studies.
The further regression analyses carried out in this study are designed to examine the way the variables
interact with each other, to better understand in what proportions these variables combine in their interactions
with GPA. The strongest predictors among the four variables are the L2 vocabulary factors, academic vocabulary
knowledge and general vocabulary size. It has been indicated above that these two tests may be testing a single
factor and the hierarchical regression reported in Table 6 has therefore been carried out to separate out the L2
language factor from the potential contribution of the other variables in gaining good GPA scores. The results
suggest that L1 Arabic vocabulary size and IQ combined can add slightly more than 7% to the predictiveness of
the L2 language factor. The 56% of variance explained by general and academic vocabulary knowledge rises to
64% once IQ and L1 vocabulary size are added in (note there is some rounding of numbers in Table 6). The
regression analysis summarised in Table 7 separates out the contribution of IQ and L1 vocabulary size; for
this analysis, scores from the L2 academic vocabulary size variable have been omitted because of their
collinearity with general vocabulary size, the better to examine the effect of the other factors. The results in
Table 7 suggest that both IQ and L1 vocabulary can make separate and unique contributions to the predictive
ability of the model; with this combination of variables, L1 vocabulary size appears to add some 8% to the
47% of variance explained by L2 general vocabulary size. IQ appears to add a further 2%. This last figure need
not contradict the suggestion of Rohde and Thompson (2007) that between about 25% and 50% of variance in
academic achievement can be explained by IQ alone, since studies of the effect of IQ rarely include in their
models the powerful effects of L2 vocabulary as measured with the sophistication of the most recent L2 tests.
Research Question 3: The Relationship between General and Academic Vocabulary Knowledge
The collinearity of the L2 vocabulary factors, L2 general vocabulary size and L2 academic vocabulary knowledge,
has been noted above and has raised the question whether academic vocabulary knowledge is capable of making
a unique contribution to the variance in GPA scores, over and above the impact of general vocabulary size. In
this study, as distinct from Townsend et al.'s (2012), it is the L2 academic vocabulary knowledge test which is
the best individual predictor among the four variables, slightly better than L2 general vocabulary size. The
difference between the results might largely be attributed to the academic word measures used in both studies.
Townsend et al. (2012) used the academic part of the revised version of the Vocabulary Levels Test (VLT) (Schmitt,
Schmitt, & Clapham, 2001), which includes a sample of only 30 of the 570 AWL words (Coxhead, 2000). The low
sampling rate and also the problematic sampling technique (see Schmitt et al., 2001) of this part of the VLT
might explain, in part, why the predictive power of the test scores is lower than for their general vocabulary
measure. On the other hand, the test used in the current study (AVST) is thought to produce more credible
scores, as it features a high sampling rate (1:5) and is based on frequency selection of its items (Masrai & Milton,
forthcoming).
Nonetheless, this result mirrors the findings reported in Masrai and Milton (forthcoming). This suggests
that while the results produced by the L2 academic vocabulary knowledge test must include L2 general
vocabulary size (its construction using words drawn from across the most frequent general vocabulary bands
means it cannot avoid this), the two types of knowledge can nonetheless still be differentiated. Our best
interpretation of the data is that L2 general vocabulary size is crucial to academic achievement and that
academic vocabulary knowledge will add marginally to this. It appears that knowledge of the AWL specifically
can add an additional 7% to L2 general vocabulary knowledge in explaining variance in GPA. This conclusion is
strikingly similar to the results obtained in the studies by Townsend et al. (2012) and Masrai and Milton
(forthcoming), and similar too to other studies (e.g., Harrington & Roche, 2014; Milton & Treffers-Daller, 2013;
Saville-Troike, 1984).
Research Question 4: How Many Separate Factors Can Be Identified in These Variables?
One argument used in Masrai and Milton (forthcoming) to suggest that a test based on the AWL is likely to
function also as a general vocabulary size test, is that when scores for the two different tests were subjected to
factor analysis, only one component could be identified, leading to the conclusion that they were testing the same
construct. Figure 1 and Table 8 report the results of factor analysis with the four sets of data obtained in this
study. In line with the earlier study, the results here also suggest that the two L2 factors, L2 general vocabulary
size and L2 academic knowledge, are part of the same component. But the results in Figure 1 and Table 8 also
suggest that the other two variables investigated in the study, IQ and L1 vocabulary size, are included in the same
component and are also, in some way, measuring the same construct.
Perhaps it should not be a surprise if all three of the language related variables form part of the same
component. L1 and L2 vocabulary size have been demonstrated to correlate closely among native Arabic
speakers who use English as a foreign language (e.g., Masrai, 2015). But there are suggestions at the level of
theory too, for example Cummins' Common Underlying Proficiency ideas (Cummins, 2000), that L1 and L2
vocabulary size should be related. There may be a general language ability factor at play here. However, it is not
so clear why IQ scores should form part of the same factor. The tests used in this study have been deliberately
chosen to be non-verbal assessments with the intention that this would avoid potential interference from language
knowledge and ability. The tests used are abstract reasoning tasks which involve completing a pattern or figure
with a part missing by choosing the correct missing piece from among six alternatives. However, there is some
evidence that these types of reasoning task can function well in predicting language learning aptitude in young
children (Milton & Alexiou, 2006), and may pick up on the ability to infer rules and structures in language. It
must be noted that all four variables correlate quite strongly with each other and there is a long-standing
tradition that a wide variety of variables can all fall under a single general intelligence factor as in Spearman’s G
factor (Spearman, 1927). Nonetheless, this idea that the four variables may all be part of a single factor need not
detract from the evidence of the regression analyses which suggests the four variables investigated here interact
with academic performance as measured by GPA in slightly different ways and that a unique contribution to
GPA for each of them can be found.
Conclusions
The attempt to use several factors to predict and explain academic performance has produced results which are
very encouraging. The combined model of four variables in this study can predict nearly two-thirds of variance
in academic performance as measured by GPA, stronger than any individual factor. This suggests greater
predictiveness than most other studies even where several factors are combined in a predictive model (e.g.
Townsend et al., 2012; Daller & Yixin, 2016; Roche & Harrington, 2013). This may be the result of the
particular circumstances of the learners, and of the staff who provided the marks for GPA, in this study. The
bulk of the explanatory power is provided by L2 knowledge factors but the regression analyses suggest it is
possible to identify a unique, if sometimes marginal, contribution to variance in GPA scores for all the factors
investigated here.
The L2 general vocabulary and L2 academic vocabulary scores are strongly correlated, and it is difficult to
decide how independently these two variables function. Our best interpretation of the results is to confirm
Townsend et al.’s (2012) conclusion that knowledge of academic words provides some unique, albeit marginal,
variance to general academic success as measured by GPA, in addition to general vocabulary size. A focus on the
AWL in teaching, within this interpretation, appears a useful element of any English for academic purposes
course, provided it is implemented within the context of an overall programme of vocabulary development for
learners to reach the size of lexicon necessary for fluent language use.
The factor analysis suggests all the variables here are closely related, and here our best interpretation is
that there may be a general language ability factor at play which is linked to other factors, identified in other
studies, like IQ. Even though these factors appear closely related, the use of multiple tests in combination appears
to have potential for identifying learners at risk of academic failure. It may be possible to provide language
support for students at risk. The prominence of L2 vocabulary knowledge in predicting academic success
suggests that a wider use of vocabulary size tests speci,cally, in the acceptance process for learners at school or
university, could help improve the selection process and ensure those entering education and studying through
the medium of English as a foreign language have the skills to succeed academically.
While these results are encouraging, it must also be noted that this is a single study, drawing learners from a
homogeneous L1 Arabic-speaking background, with results drawn from two institutions in Saudi Arabia. Further
research is needed with larger samples, learners from different L1s, and including groups from different
disciplines, to con,rm the idea that combinations of factors can usefully predict students’ academic attainment.
References
Alderson, J. C. (2005). Diagnosing foreign language proficiency: The interface between learning and assessment. London:
Bloomsbury.
Astika, G. (1993). Analytical assessments of foreign students’ writing. RELC Journal, 24, 61–70.
Beglar, D., & Hunt, A. (1999). Revising and validating the 2000 word level and university word level vocabulary
tests. Language Testing, 16(2), 131–162.
Chen, Q., & Ge, C. (2007). A corpus-based lexical study on frequency and distribution of Coxhead’s AWL word
families in medical research articles. English for Specific Purposes, 26, 502–514.
Chung, M., & Nation, P. (2003). Technical vocabulary in specialized texts. Reading in a Foreign Language, 15, 103–
116.
Cobb, T., & Horst, M. (2004). Is there room for an AWL in French? In B. Laufer & P. Bogaards (Eds.), Vocabulary
in a second language: Selection, acquisition, and testing (pp. 15-38). Amsterdam: John Benjamins.
Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34, 213-238.
Cummins, J. (2000). Language, power and pedagogy: Bilingual children in the crossfire. Clevedon: Multilingual Matters.
Daller, M. H., & Phelan, D. (2013). Predicting international student study success. Applied Linguistics Review, 4(1),
173–193.
Daller, M., & Yixin, W. (2016). Predicting study success of international students. Applied Linguistics Review, (ahead
of print).
Harrington, M., & Carey, M. (2009). The online yes/no test as a placement tool. System, 37(4), 614−626.
Harrington, M. & Roche, T. (2014). Identifying academically at-risk students in an English-as-a-Lingua-Franca
university setting. Journal of English for Academic Purposes, 15, 37–47.
Jensen, A. R. (1998). The g factor: The science of mental ability. Westport, CT: Praeger.
Laidra, K., Pullmann, H., & Allik, J. (2007). Personality and intelligence as predictors of academic achievement:
A cross-sectional study from elementary to secondary school. Personality and Individual Differences, 42(3),
441-451.
Laufer, B. (1992). How much lexis is necessary for reading comprehension? In H. Bejoint & P. Arnaud (Eds.),
Vocabulary and applied linguistics (pp. 126-132). London: Macmillan.
Laufer, B. (1998). The development of passive and active vocabulary in a second language: Same or different?
Applied Linguistics, 19(2), 255-271.
Laufer, B. & Ravenhorst-Kalovski, G. (2010). Lexical threshold revisited: Lexical text coverage, learners’
vocabulary size and reading comprehension. Reading in a Foreign Language, 22(1), 15-30.
Lesaux, N., Kieffer, M., Faller, S. E., & Kelley, J. G. (2010). The effectiveness and ease of implementation of an
academic vocabulary intervention for linguistically diverse students in urban middle schools. Reading
Research Quarterly, 45, 196–228.
Masrai, A. (2015). Investigating and explaining the relationship between L1 mental lexicon size and organisation and L2
vocabulary development. Unpublished PhD thesis. Swansea University, UK.
Masrai, A., & Milton, J. (2012). The vocabulary knowledge of university students in Saudi Arabia. TESOL Arabia
Perspectives, 19(3), 13-20.
Masrai, A., & Milton, J. (2017). How many words do you need to speak Arabic? An Arabic vocabulary size test.
Language Learning Journal, (ahead of print).
Masrai, A., & Milton, J. (forthcoming). Measuring the contribution of academic and general vocabulary
knowledge to learners’ academic achievement. Journal of English for Academic Purposes.
Masrai, A., & Milton, J. (forthcoming). Frequency distribution of the words in AWL in BNC and BNC/COCA.
Milton, J. (2009). Measuring second language vocabulary acquisition. Bristol, UK: Multilingual Matters.
Milton, J. (2013). Measuring the contribution of vocabulary knowledge to proficiency in the four skills. In C.
Bardel, C. Lindqvist & B. Laufer (Eds.), L2 vocabulary acquisition, knowledge and use: New perspectives on
assessment and corpus analysis (pp. 57-78): EUROSLA monograph 2.
Milton, J., & Alexiou, T. (2006). What makes a good young language learner? In A. Kavadia, M. Joanopoulou &
A. Tsaggalidis (Eds.), New directions in applied linguistics (pp. 636-646). Thessaloniki, Greece: Aristotle University
of Thessaloniki.
Milton, J., & Fitzpatrick, T. (Eds.). (2014). Dimensions of vocabulary knowledge. Basingstoke, UK: Palgrave
Macmillan.
Milton, J., & Treffers-Daller, J. (2013). Vocabulary size revisited: The link between vocabulary size and academic
achievement. Applied Linguistics Review, 4(1), 151–172.
Milton, J., Wade, J., & Hopkins, N. (2010). Aural word recognition and oral competence in a foreign language. In
R. Chacón-Beltrán, C. Abello-Contesse & M. Torreblanca-López (Eds.), Further insights into non-native
vocabulary teaching and learning (pp. 83-98). Bristol: Multilingual Matters.
Mochida, A., & Harrington, M. (2006). The yes/no test as a measure of receptive vocabulary knowledge.
Language Testing, 23(1), 73–98.
Mudraya, O. (2006). Engineering English: A lexical frequency instructional model. English for Specific Purposes, 25,
235–256.
Nagy, W., & Townsend, D. (2012). Words as tools: Learning academic vocabulary as language acquisition.
Reading Research Quarterly, 47(1), 91-108.
Nation, I.S.P. (2001). Learning vocabulary in another language. Cambridge: Cambridge University Press.
Nation, I.S.P. (2004). A study of the most frequent word families in the British National Corpus. In P. Bogaards
& B. Laufer (Eds.), Vocabulary in a second language: Selection, acquisition and testing (pp. 3-13). Amsterdam:
John Benjamins.
Nation, I.S.P. (2006). How large a vocabulary is needed for reading and listening? The Canadian Modern Language
Review, 63(1), 59–81.
Neisser, U., Boodoo, G., Bouchard, T. J., Boykin, A. W., Brody, N., Ceci, S. J., et al. (1996). Intelligence: Knowns
and unknowns. American Psychologist, 51, 77–101.
Qian, D. (1999). Assessing the roles of depth and breadth of vocabulary knowledge in reading comprehension.
Canadian Modern Language Review, 56, 282–308.
Raven, J., Raven, J. C., & Court, J. H. (1998). Manual for Raven's Progressive Matrices and vocabulary scales:
Section 4 Advanced Progressive Matrices sets I and II, 1998 ed. Oxford, UK: Oxford Psychologists Press
Ltd.
Read, J. (2000). Assessing vocabulary. Cambridge: Cambridge University Press.
Roche, T., & Harrington, M. (2013). Recognition vocabulary knowledge as a predictor of academic performance
in an English as a foreign language setting. Language Testing in Asia, 3(1), 1-13.
Rohde, T. E., & Thompson, L. A. (2007). Predicting academic achievement with cognitive ability. Intelligence,
35(1), 83-92.
Saville-Troike, M. (1984). What really matters in second language learning for academic achievement? TESOL
Quarterly, 18(2), 199–219.
Schmitt, N., & Schmitt, D. (2014). A reassessment of frequency and vocabulary size in L2 vocabulary teaching.
Language Teaching, 47(4), 484-503.
Schmitt, N., Schmitt, D., & Clapham, C. (2001). Developing and exploring the behaviour of two new versions of
the Vocabulary Levels Test. Language Testing, 18(1), 55–88.
Spearman, C. (1927). The abilities of man. Oxford: Macmillan.
Stæhr, L. S. (2008). Vocabulary size and the skills of listening, reading and writing. Language Learning Journal, 36(2),
139–152.
Townsend, D., Filippini, A., Collins, P., & Biancarosa, G. (2012). Evidence for the importance of academic word
knowledge for the academic achievement of diverse middle school students. The Elementary School Journal,
112(3), 497-518.
West, M. (1953). A general service list of English words. London: Longman.
Zimmerman, K. (2004). The role of vocabulary size in assessing second language proficiency. Unpublished MA thesis.
Brigham Young University.
James Milton is Professor of Applied Linguistics at Swansea University, UK. He worked in Nigeria and in Libya
before coming to Swansea in 1985. A long-term interest in measuring lexical breadth and establishing normative
data for learning has produced extensive publications including Measuring Second Language Vocabulary Acquisition
(Multilingual Matters, 2009).
Shadan Roghani
Swansea University, UK
James Milton*
Swansea University, UK
Abstract
This paper reports an investigation into whether a test of productive vocabulary size using a category generation task can be
useful and effective. A category generation task is a simple task in which learners are asked to name as many words as they
can from a prescribed category such as animals or body parts. The virtue of this approach is that it potentially allows an
estimate of productive vocabulary size, comparable to receptive size estimates, to be made. Four such tasks were trialled on
92 learners ranging from elementary to advanced level. Subjects also took Nation’s Productive Vocabulary Levels Test
(PVLT) (2001) and Meara & Milton’s X-Lex (2003). The results suggest that category generation tasks can produce
vocabulary size estimates and these are comparable in size with PVLT and about one third of the size of a receptive
vocabulary size estimate (X-Lex). The tests appeared very reliable and can distinguish between learners of different levels of
performance. There are still issues to be resolved concerning the tasks which can be used and the volumes of vocabulary
they can potentially obtain. Factor analysis suggests the receptive and all the productive tasks test a single factor.
Key words: productive vocabulary, vocabulary size, category generation task, vocabulary assessment, frequency
vocabulary bands
Introduction
The acquisition of vocabulary knowledge, that is growing a lexicon of an appropriate size and quality, is crucial
to language learning success. Since it is an aspect of language knowledge which is so important, it would make
sense to measure and monitor its development among learners, and where this is done it appears that
measurements of knowledge can be very useful. So, for example, estimates of vocabulary size correlate well with
performance in all the language skills and in formal exams (e.g. Stæhr, 2008; Milton et al, 2010). Learners with
larger vocabularies tend to perform better than those with smaller vocabulary knowledge in these activities.
Approximate vocabulary sizes have been identified as requirements for passing formal exams such as Cambridge
FCE in English, and have been linked to hierarchies of communicative levels as in the CEFR (Milton, 2010;
Milton & Alexiou, 2009). Because vocabulary is so important, it is perhaps not surprising that students identify
shortcomings in their L2 vocabulary knowledge as a principal obstacle to comprehension (Laufer, 1989). The
importance of vocabulary is such that Long & Richards (2007, p.xii) suggest that ‘vocabulary can be viewed as
the core component of all the language skills.’
*Tel: (44) 1792 295678; E-mail: j.l.milton@swansea.ac.uk; Department of Applied Linguistics, Swansea University, Singleton
Park, Swansea SA2 8PP, UK.
While such measurements are clearly insightful, there is a feeling among academics that making them can be a
complicated business. Vocabulary knowledge, it seems, is multifaceted. It can include knowledge of both the
written and oral forms of words. It includes possessing a link between a word form and its meaning including the
associations which a word can carry and which can vary from one language to another. It can include a
knowledge of how words can combine into collocations and idioms, and a knowledge too of when, and when
not, to use some of these words. Vocabulary researchers usually make a distinction between vocabulary size or
breadth, the number of words a learner knows, and vocabulary depth, how well these words are known and how
well and idiomatically they can be used. It can also include making a distinction between receptive and
productive knowledge, an observation that goes back at least as far as Palmer (1921). Palmer identified a
difference between the words learners can recognise, what is called today receptive or passive vocabulary, and the
words a learner can use and communicate with, a sub-set of the receptively known words which learners can
readily call to mind for use in speech and writing and referred to today as productive or active vocabulary. He
suggested these different types of word knowledge should be assessed separately. Different tasks, it seems, appear
to activate different kinds of vocabulary knowledge (Webb, 2005), and different kinds of vocabulary knowledge
can impact on the different language skills. For example, Milton & Riordan (2006) observed that knowledge of
words in their oral form can be measured separately from word knowledge in written form and that oral word
recognition predicts success in speaking tests while the ability to recognise words in their written form does not.
These different dimensions and features of vocabulary knowledge cannot, of course, be entirely unrelated.
Possession of a large receptive vocabulary is a precondition of having a large productive vocabulary, for example,
and the various dimensions of knowledge generally correlate quite well with each other as is noted by Fitzpatrick
& Milton (2014). There are even arguments that suggest they can be collapsed into a single dimension of
vocabulary knowledge. Vermeer (2001) argues that breadth and depth are essentially the same construct. Meara
(1997) argues that automaticity in word production is a product of the number of links between words, a product
of depth therefore. Fitzpatrick & Milton (2014, p.177) in considering the strength of the inter-relationship
between the elements of vocabulary knowledge speculate that it may be possible, ‘through frequency (Ellis,
2002a; 2002b) to explain the driver behind all the aspects of knowledge in [Nation’s] table.’ Nonetheless,
multiple testing of vocabulary knowledge is often advocated so that a learner’s knowledge can be more fully
characterised (e.g. Nation, 2007; Richards & Malvern, 2007). While there seems general agreement that using
multiple tests is desirable it is not clear that this is actually done outside the realm of specialist researchers.
Perhaps this is because the standard tests of vocabulary are relatively few and are limited, largely, to testing
receptive vocabulary breadth. This paper is particularly concerned with assessing the potential of a test which
measures productive knowledge: how many words do learners have that they can easily activate and use for
communication? The hope is that this will make the process of multiple testing more practical.
There are several well recognised tests in the area of receptive vocabulary size, but well-established tests are
lacking in other areas of vocabulary knowledge such as productive vocabulary knowledge. Receptive vocabulary
size, or breadth, testing attempts to estimate how many words in the foreign language a learner can recognise,
and this type of testing is usually distinguished from vocabulary depth testing which attempts to assess how well
these words are known and whether they can be used appropriately. Receptive breadth tests have the advantage
in their creation that the writer can control the items being tested and make a principled selection of words from
which a good estimate of knowledge can be made. Both Nation’s Vocabulary Size Test (VST) (2012) and Meara
& Milton’s X-Lex (2003) work in this way and sample words across the frequency bands and this is used to form
an estimate of vocabulary size. These tests also have the advantage that they do not have to be customised to the
6rst language of the learners and can be quick to deliver and are easy to mark. Nation’s VST uses a multiple
choice format where the learners select a meaning for a test word from a choice of four explanations and where
the explanations are ‘in much easier language than the tested word’ (Nation, 2012, p.3). The checklist format in
Meara & Milton’s X-Lex is particularly minimalist, requiring only that the testee identifies words that they
recognise in a list, and the computer version of this takes only a few minutes to deliver and marks itself. With
both tests it appears relatively straightforward to produce parallel forms of the tests and the different forms are
reported to be equivalent (Nation, 2012; David, 2008).
However, even these tests have their drawbacks. Nation (2012) reports that VST may under-estimate where
learners are not motivated to perform on the test, but this could be said of any form of assessment. A more
serious consideration is the potential for the test to over-estimate where learners are prepared to use guesswork to
provide answers to words they do not know. The multiple choice format means that there is a one in four chance
of getting the right answer by guesswork and there appears to be no mechanism for recognising where this is
occurring and adjusting for it when it does occur. X-Lex does have such a mechanism and includes false words,
and, where the testee identifies these as known words, an arithmetic formula is applied and the score is reduced.
But X-Lex’s simple checklist method is also prey to potential problems especially in terms of dealing with
learners’ uncertainty over their knowledge of a word. This form of test takes no account of partial or incomplete
knowledge, and low level learners in particular are often unsure over things like spelling and may not, therefore,
be able to represent the knowledge that they have. Nonetheless, both tests are reported to be robust and reliable.
In an ideal world a test of productive vocabulary knowledge would have the good qualities of the receptive tests
and would be easy to use and capable of accessing a sufficient and principled sample of the learner’s vocabulary
from which to form a good estimate of size. Ideally it should be able to demonstrate good reliability so test and
retest scores, for example, should not differ significantly if there is no change in the vocabulary knowledge being
tested. It should be able to demonstrate the same kinds of construct validity that receptive tests have, as in the
ability to draw on a principled sample of words from across the frequency bands so that a good estimate of size
can be made. It should possess good concurrent validity and correlate appropriately with other scores of the
same or similar quality. So, a good productive test, if it is working well, should correlate with other tests of
productive vocabulary size and should probably correlate too, although perhaps less well, with receptive
vocabulary size which is generally considered a different although related construct.
Well recognised productive tests are harder to find than receptive tests. This may be because in many
productive tasks, the choice of words is that of the testee and this may prevent a useful sample of words being
created from which meaningful conclusions about vocabulary size or knowledge can be drawn. Thus, measures
of lexical diversity and sophistication (e.g. Meara & Bell, 2001, P-Lex) appear sensitive to genre (van Hout &
Vermeer, 2007) so the scores they produce may say more about the nature of the text rather than the lexicon
which produced it. These measures are also sensitive to length and a minimal length, usually several hundred
words, is needed before stable results are achieved (e.g. Meara & Bell, 2001). These approaches do not generally
produce an estimate of size but the exception to this is Meara & Miralpeix’s V-Size (2008) which analyses a
testee’s text and calculates the proportions of vocabulary occurring in five frequency bands to produce a curve.
This curve can then be compared with curves from other texts where the size of the writer’s lexicon is known and
an estimate of the testee’s lexical size can be made. Meara & Miralpeix’s initial conclusions are that this
approach is not sensitive to genre or to the length of text and that it can discriminate between learners of
different ability levels. The idea is an interesting one but our experience is that the scores it produces are rather
erratic and more work is probably needed to demonstrate the reliability of this approach.
Other approaches to productive vocabulary testing use controlled methods for eliciting knowledge. Laufer
& Nation’s Productive Vocabulary Levels Test (PVLT) (1999) takes a sample of words from the second, third, fifth
and tenth 1000 word frequency ranges, and from the university word list as the target vocabulary for their test.
Students are presented with a sentence giving context with the target word missing from the context, although
the initial letters of the target are provided. Testees fill in the missing word. This approach has the considerable
merit that its sample of words is directly equivalent to Nation’s receptive Vocabulary Levels Test (Nation, 2001)
and so productive and receptive scores ought to be directly comparable. The approach has been criticised,
however, in that the degree of contextualisation may be so great that it becomes a receptive test in another form
(Webb, 2008). This strikes at the heart of the issue in the creation of a test of productive vocabulary knowledge.
Productive performance requires some kind of prompt and there is no agreed construct of productive knowledge
to guide us as to how rich or minimal in contextualisation such a prompt should be. Webb (2008) considers a less
rich context in testing, therefore, and suggests the merit of a translation test where the testees are presented with a
prompt in their native language to elicit a translation into the foreign language target word. The approach is a
simple one which ought to allow the test writer to make the kind of sample of knowledge that an estimate of
vocabulary size could be drawn from. In terms of practicality, however, this approach will not be so
straightforward in, say, a class of learners from many different first language backgrounds and where multiple
different forms of the test will be needed. It seems there is still the opportunity for a convincing methodology to
emerge in this area to produce meaningful and useful estimates of productive vocabulary size which, like the
receptive tests described above, are simple enough to be used by learners from all language backgrounds and
with a simple enough prompt to avoid replicating a receptive test in another form.
The research presented in this paper aims to access and measure productive vocabulary size using a new
test format to see if category generation tasks can be a useful addition to testing in this area.
Research Questions
The intention in this study is to use four category generation tasks with EFL learners and to use the words that
testees produce to calculate estimates of productive vocabulary size which might be seen as equivalent to the
receptive vocabulary size estimates produced by X-Lex. The broad aim, therefore, is to examine whether these
estimates can be fairly described as believable, reliable and valid. Do category generation tasks have potential as
useful measures of vocabulary knowledge?
To achieve this broad aim we have set a number of specific research questions.
1. Is there a frequency effect in learning to suggest that a test targeted on the first five 1000 word bands is
appropriate in a productive test?
2. Does the test produce sufficient data for estimates of size to be made?
3. Do the scores from parallel forms of the test suggest that the test is reliable? Do they produce estimates
which are similar in size and which correlate?
4. Are the scores comparable with other equivalent tests of vocabulary size and knowledge: Laufer &
Nation’s PVLT (1999) and Meara & Milton’s X-Lex (2003)?
5. Are estimates on the test capable of distinguishing between learners at different levels of knowledge and
performance: beginner, intermediate and advanced levels?
6. Do these tests and PVLT access a single factor of knowledge, productive vocabulary size, and can this be
distinguished from the receptive vocabulary size measure, X-Lex?
Method
Participants
A total of 92 EFL learners were tested in a foreign language teaching institute in Iran. The learners came from
three different levels of knowledge: basic, intermediate and advanced, as categorised by the institute. The
92 learners comprised 43 male and 49 female participants, were aged between 15 and 40, and were distributed
among the three levels as shown in Table 1.
Table 1
Participant Levels
Level Basic Intermediate Advanced Total
Number 36 23 33 92
The Tests
Four category generation tasks were used: animals, clothes, body parts and furniture. These categories are described by
Izura et al. (2005, p.386) as ‘commonly used in cognitive, neuropsychological and linguistic research’ and which
proved capable of prompting considerable language output from the participants.
Laufer & Nation’s Productive Levels Test version C (Nation, 2001, p.425-428) was used as a second test of
productive vocabulary knowledge. Scores from versions of Nation’s VLT are widely used as a proxy for
vocabulary size (e.g. Stæhr, 2008). The entire test was not administered and only the 2,000, 3,000 and 5,000
levels were used. This was converted to a productive vocabulary size estimate out of 5,000 using the formula:
size = (2000 level score / 18) × 2000 + (3000 level score / 18) × 1000 + (5000 level score / 18) × 2000
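As an illustration only (the paper itself contains no code), this weighting can be sketched in Python. The function name is ours; the figure of 18 items per level comes from the denominators in the formula above.

```python
def pvlt_size_estimate(score_2000, score_3000, score_5000, items=18):
    """Scale each PVLT level score (out of 18 items) by the width of the
    frequency band it samples: 2,000 words for the 2,000 level, 1,000 for
    the 3,000 level, and 2,000 for the 5,000 level (estimate out of 5,000)."""
    return (score_2000 / items * 2000
            + score_3000 / items * 1000
            + score_5000 / items * 2000)
```

A learner with full marks at every level would thus be credited with the full 5,000 words.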
A paper version of Meara & Milton’s X-Lex (2003) was used as a second measure of vocabulary
knowledge. This version tests 20 words in each sample across each of the five most frequent 1000 word bands
taken from Hindmarsh (1980) and Nation (1984). The test contains a further 20 false words. Testees are required
to indicate if they know each of these words. Yes responses to the false words are taken to indicate that the testee
is over-estimating their knowledge and the score drawn from the Yes responses to the real words is adjusted
downwards accordingly.
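This adjustment can be illustrated with a short sketch. The text above specifies 100 real words and 20 false words; the particular constants assumed here (each real-word Yes contributing 50 points, each false alarm deducting a proportional 250-point penalty, with a floor at zero) are one plausible proportional implementation, not a statement of the exact published formula.

```python
def xlex_adjusted_score(yes_real, yes_false, n_real=100, n_false=20, max_size=5000):
    """Adjusted X-Lex estimate out of 5,000. Each Yes to a real word adds
    max_size / n_real points; each Yes to a false word deducts a
    proportional guessing penalty of max_size / n_false points."""
    raw = yes_real * (max_size / n_real)        # 50 points per real word
    penalty = yes_false * (max_size / n_false)  # 250 points per false alarm
    return max(0.0, raw - penalty)
```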
Procedure
The participants took the tests in class in the order: X-Lex, the generation tasks, and finally the PVLT. They were
given a booklet to record all their answers. Instructions were given orally in English. There was no time limit
imposed but all students completed the tasks within the 45 minutes of the class.
Analytical Procedure
The tests can be argued to have good construct validity if they can be shown to generate words across the first
five 1000 word frequency bands, and it is expected that frequency effects should be visible in the data produced
by students. Learners should score more in the higher frequency bands than the less frequent ones. If the
responses do not display this kind of frequency profile then this will undermine the potential for category
generation tasks, as we are using them, to provide a good estimate of size.
The number of words available for selection from each of the four categories, separated by the five 1000
word frequency bands (taken from the BNC/COCA lists), is shown in Table 2.
Table 2
Availability of Words in the First Five Frequency Bands Divided by Category.
1000 2000 3000 4000 5000 Total
Animals 6 6 15 14 15 56
Clothes 16 10 4 7 14 51
Body parts 24 12 10 17 10 73
Furniture 20 39 4 10 10 83
The words produced by learners from each of these bands are compared with the number of words
available for selection in each frequency band, and these figures are used to generate an estimate of knowledge
out of 5000. For example, if a learner were able to produce 28 of the 56 available words in the animal category
then it would be assumed that this represented productive knowledge of 50% of the 5,000 most frequent words
in English; a score of 2,500 words.
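The worked example above can be expressed as a small sketch, using the availability figures from Table 2 (names are illustrative):

```python
# Words available per category across the first five 1,000-word bands (Table 2)
AVAILABLE = {"animals": 56, "clothes": 51, "body parts": 73, "furniture": 83}

def category_size_estimate(category, words_produced, max_size=5000):
    """Scale the proportion of available category words a learner produced
    up to an estimate of productive knowledge of the 5,000 most frequent
    words in English."""
    return words_produced / AVAILABLE[category] * max_size
```

So a learner naming 28 of the 56 available animal words would be credited with 2,500 words, as in the example in the text.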
In testing this format’s reliability the results from the four categories can be used to generate a calculation
for Cronbach's Alpha. If the tests work well then the calculations generated by each test should correlate well and
the Alpha score should be high.
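The standard formula for Cronbach’s Alpha over the four task scores can be sketched as follows (illustrative code, not taken from the paper):

```python
from statistics import variance

def cronbach_alpha(task_scores):
    """Cronbach's Alpha for parallel tasks. task_scores is a list of score
    lists, one per task, with learners in the same order in each list."""
    k = len(task_scores)
    # Total score per learner across all tasks
    totals = [sum(per_learner) for per_learner in zip(*task_scores)]
    task_var = sum(variance(scores) for scores in task_scores)
    return k / (k - 1) * (1 - task_var / variance(totals))
```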
The category generation tasks can be argued to be valid if results correlate well when compared with
results from other tests of the same quality. It might be expected that they should correlate well with PVLT,
which tests the same construct of productive vocabulary knowledge. They should correlate too with X-Lex,
though perhaps not so well since X-Lex is, in theory, testing a slightly different construct. The tasks, if they are
producing useful estimates of productive vocabulary size, should also be able to distinguish between low level
learners and high level learners for example. It would be expected, too, that frequency effects should be visible.
Learners should score more in the higher frequency bands than the lower frequency bands.
Finally, it might be expected that if the category generation tasks and PVLT are testing the same quality of
productive vocabulary size then factor analysis and the calculation of eigenvalues will confirm that a single
factor underlies the results of all five tests. If receptive vocabulary knowledge is a separate and distinct construct
then these calculations should show that a second factor underlies the X-Lex scores.
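The eigenvalue step can be sketched as below (an illustrative numpy sketch, not the software used in the study): a single eigenvalue above 1, with the remainder well below it, is the usual sign that one factor underlies all the tests.

```python
import numpy as np

def correlation_eigenvalues(test_scores):
    """Eigenvalues of the inter-test correlation matrix, largest first.
    test_scores is array-like with one row per test and one column per
    learner (learners in the same order in every row)."""
    corr = np.corrcoef(np.asarray(test_scores, dtype=float))
    return sorted(np.linalg.eigvalsh(corr), reverse=True)
```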
Results
Table 3
Total Responses by Frequency Band
1000 2000 3000 4000 5000
Animals 316 216 168 262 229
Clothing 455 292 88 127 205
Furniture 501 449 210 47 40
Body parts 508 369 104 103 124
[Figure 1. Mean vocabulary size scores (max 1,000) across the five 1000 word frequency bands.]
Table 3 and Figure 1 demonstrate a visible frequency effect, with the bulk of learners’ vocabulary
knowledge lying in the most frequent 1000 and 2000 word bands. Beyond this mark the frequency effect is no
longer visible. Nonetheless, productive vocabulary knowledge resembles receptive vocabulary knowledge, with
the presence of the frequency profile as suggested by Ellegård (1960) and Meara (1982), and as observed in
Waring (1997). The implication of this is that the category generation tasks are capable of providing a
characterisation of the scale of a learner’s productive vocabulary size. Since such an estimate is similar in its
calculation to a test such as X-Lex, which also draws its estimate from these frequency bands, this should allow
productive and receptive vocabulary size to be meaningfully compared.
There are no normalised figures for the size of productive vocabularies for learners at the levels in this study, and
the significance of these figures can only become apparent when compared with results from the other tests.
Table 4
Mean Word Knowledge by Category Generation Task
Mean productive vocabulary size SD
animals 1155.86 386.90
clothing 1243.61 380.38
furniture 790.99 243.96
body parts 1026.84 357.88
There are several reasons why the tasks used here might vary in the scores they produce. One is that the
topics, taken from the literature on testing in psychology, have not been chosen with EFL testing specifically in
mind. However, they are thematic areas which are typically contained in teaching texts for young and beginner
learners of EFL although we have no way of knowing exactly what lexis is contained in these teaching texts nor
how the treatment of this lexis may vary from one theme to another in terms of presentation and recycling. It is
conceivable that these differences in measured knowledge may accurately reflect differences in the presentation
of the material, and this might challenge the usefulness of this approach as a quick and easily replicable method
of generating consistent measures of productive vocabulary size. A second is that the size of the estimate may vary
according to the theme chosen for testing and not just the overall vocabulary knowledge of the learner. A third
possibility is that these differences may be related to the size of the category generation task itself. Thus, the
furniture category which has the largest number of words available for production produces the smallest size
estimate, and clothing which has the smallest number of words available produces the largest estimate. It is also
quite possible, however, that these differences are the by-product of different task forms and different
administrations, where some variation in scores is inevitable even in well-constructed and regulated tests. Nation’s
14,000 word multiple choice test, for example, has parallel forms which in trials, he reports (2012, p.5), produce
different scores.
These differences in the means between all four category generation tasks are statistically significant, and
the results of t-test and Cohen’s D comparisons are given in Table 5. If parallel forms of this task consistently
produce scores which are different then this challenges the validity of the testing method and the usefulness of
the technique as a method for quickly and easily assessing productive vocabulary size. However, the Cohen’s D
calculations show that the effect size is highly variable. It is not yet clear, therefore, whether these differences do
challenge the test’s validity in this way or are simply part of the kind of variation which repeated testing produces
and which Nation (2012), for example, reports in relation to receptive vocabulary size testing.
Table 5
T-test Comparisons between the 4 Category Generation Tasks
Clothes Test Furniture Test Body Test
t-score Cohen’s D t-score Cohen’s D t-score Cohen’s D
Animal Test 2.617** 0.223 11.502** 1.13 3.511** 0.35
Clothes Test 17.836** 1.42 9.607** 0.79
Furniture Test 8.708** 0.59
Note. ** = significant at the 0.01 level
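The comparisons in Table 5 can be reproduced with sketches of the two statistics. The pooled-variance form of Cohen’s D and the paired form of the t-test are our assumptions; the paper does not state which variants were used.

```python
from math import sqrt
from statistics import mean, stdev

def cohens_d(a, b):
    """Cohen's d using a pooled standard deviation (assumed variant)."""
    na, nb = len(a), len(b)
    pooled = sqrt(((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2)
                  / (na + nb - 2))
    return (mean(a) - mean(b)) / pooled

def paired_t(a, b):
    """Paired-samples t statistic, appropriate here since every learner
    took all four category generation tasks."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
```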
Reliability Calculations
There are moderate to good correlations between scores on the four category generation tasks. All correlations
are statistically significant at the 0.01 level. The figures are shown in Table 6.
Table 6
Category Task Inter-test Correlations
Clothes Test Furniture Test Body Test
Animal Test .554** .618** .649**
Clothes Test .688** .830**
Furniture Test .781**
Note. ** = significant at the 0.01 level
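The inter-test correlations in Table 6 are ordinary Pearson correlations between learners’ scores on pairs of tasks, which can be sketched as (illustrative code):

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two sets of task scores, one value per
    learner, with learners in the same order in both lists."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den
```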
The Body parts task scores correlate particularly well with both the Furniture and the Clothes task scores, while
the Animals task scores correlate least well with the others. This observation might be connected to the number
of words available for production in these tests. The Body parts and Furniture tasks have the highest number of
words in the 5,000 word bands, 73 and 83 words respectively, while the Animal task has only 56 words. For
comparison, it might be considered that the receptive X-Lex test samples 100 words from this 5,000 word range,
and in the Animal and Clothing tasks there are only about half this number available for production. The
reliability of the task might be influenced by the sampling rate and, as a general rule, a larger sample is likely to
produce a more useful estimate. However, in this type of task a very large sample may challenge the immediate
recall ability of the learner and lead to under-estimation. A thematic prompt where there are 20 words available
from the 5,000 word range under examination is an achievable task, but a similar task with 2,000 words is not.
The impact of the potential sample size available from different themes and tasks is something to be investigated.
The calculation of Cronbach's Alpha using the 4 parallel forms of the productive task can be taken as an
indication of the degree to which these tests measure a single construct. The Cronbach's Alpha result was .885
(N = 4). Notwithstanding potential difficulties with individual category tasks and their sampling rates, the score
of .885 is good and can be taken as confirmation that these tasks can produce results which are both reliable and
consistent.
Table 7
Mean Productive Vocabulary Size Scores by Level
Level animals clothes furniture body parts
mean sd mean sd mean sd mean sd
Beginner 910 343 745 200 606 176 940 209
Intermediate 1102 298 995 223 754 114 1185 195
Advanced 1461 265 1357 289 1019 182 1616 296
The productive size scores generated by all four tasks increase with the level of the students, as is expected.
The advanced group of learners produce in each task, on average, more words from the 5,000 word frequency
ranges than the intermediate level students who, in turn, produce more words on average than the students
at the beginner level. An ANOVA confirms that this relationship is statistically significant and the results are
shown in Table 8. Tukey tests confirm that there are statistically significant differences between the means at all
levels in all tests. The ability of these tasks to discriminate meaningfully between learners at different levels of
knowledge and performance supports the construct behind the test and suggests this technique is valid.
Table 8
ANOVA Scores from the Category Generation Tasks
test degrees of freedom F Sig
animals between groups 2 28.439 < .001
within groups 89
clothes between groups 2 55.648 < .001
within groups 89
furniture between groups 2 54.277 < .001
within groups 89
body parts between groups 2 68.634 < .001
within groups 89
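The F ratios in Table 8 come from a standard one-way ANOVA across the three level groups; 92 learners in 3 groups give the degrees of freedom 2 and 89 reported above. A minimal sketch of the F calculation (illustrative data, not the study's scores):

```python
def one_way_anova(groups):
    """One-way ANOVA over a list of score lists, one per group.
    Returns (F, df_between, df_within)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    # Between-groups and within-groups sums of squares
    ss_between = sum(len(g) * ((sum(g) / len(g)) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    df_b, df_w = k - 1, n - k
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w
```

The significance value would then be read from the F distribution with (df_between, df_within) degrees of freedom, and pairwise Tukey comparisons would follow as a post-hoc step.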
Table 9
PVLT Scores Divided by Level
n mean sd
Beginner 36 982 745
Intermediate 23 910 717
Advanced 33 2124 1348
Total 92 1338 1138
Table 10
X-Lex Scores Divided by Level
n mean sd
Beginner 36 3084 845
Intermediate 23 2737 567
Advanced 33 3685 790
Total 92 3213 847
Correlations between PVLT and X-Lex scores, and the scores on the four category generation tasks are
shown in Table 11.
Table 11
Correlations between Category Generation Task Scores and PVLT and X-Lex Scores
PVLT X-Lex
Animals test 0.494** 0.362**
Body parts test 0.408** 0.424**
Furniture test 0.344** 0.324**
Clothes test 0.481** 0.353**
Both PVLT scores and X-Lex scores indicate, broadly, that the vocabulary size of the learners increases
with level, as might be expected, and this is confirmed by ANOVAs (PVLT F(2,89) = 14.539, sig < .001; X-Lex
F(2,89) = 11.237, sig < .001). Tukey tests, however, indicate that neither test is able to produce a statistically
significant difference in the means between the Beginner and Intermediate students. The category generation
tasks were capable of doing this, and one interpretation is that the category generation tasks are better
able to distinguish levels of knowledge among lower level learners than the other tests. PVLT produces an
estimate of size which is slightly larger than the estimates produced by the category generation tasks. An analysis
of variance used to calculate effect size suggests a moderately large effect size, but this result is not statistically
significant (F(2,89) = 5.312, sig = .171). This may be a product of the different methodologies and knowledge being
accessed. PVLT provides quite extensive context and a letter cue for each test word where the category
generation tasks do not. Or it may be an outcome of the formula for turning PVLT scores into a size estimate,
where not all frequency bands are tested and knowledge in these missing bands has to be inferred from
knowledge elsewhere. The difference between the means for the PVLT and the largest scoring category task,
Clothes, is not statistically significant. The difference between the means for PVLT and Animals is significant
only at the .05 level (t = 2.077, sig = .041). There are significant differences between PVLT
and the means for the other two tests (Furniture t = 3.138, sig = .002; Body parts t = 5.177, sig < .001).
X-Lex produces a larger estimate of vocabulary size than either the category generation tasks or PVLT. X-
Lex is a receptive vocabulary size test, and it is expected that receptive size estimates will be larger than
productive size estimates. An analysis of variance used to calculate effect size produces a result that is not
statistically significant (F(2,89) = 1.016, sig = .622). In reviews of the literature in this area, Milton (2009), Nation
(1990) and Schmitt (2000) report that the difference between these scores varies but that, typically, receptive sizes
are about double productive sizes. In this study the scores suggest that the productive size estimates are
between one third and one half of the size of the receptive estimates, and the relationship is summarised in Figure
2. This figure suggests that while the five productive size mean scores can be distinguished statistically, they are
of similar scale and in the right kind of proportion in relation to receptive vocabulary size. It may be that refining
the category generation tasks can make them perform more consistently in producing more similar size estimates.
Factor Analysis
Since PVLT and the four category generation tasks are all designed to access productive vocabulary knowledge
and produce estimates of productive size, it is expected that factor analysis should reveal a single factor
underlying the scores. Factor analysis and the calculation of Eigen values allows this to be investigated. The scree
plot (Figure 3) and component matrix (Table 12) suggest that this is the case. The scree plot identi6es only one
component with a score above 1. The component matrix indicates that the four category generation tasks all
correlate well with this factor while the correlation produced with PVLT is smaller but still satisfactory.
Table 12
Component Matrix for Productive Vocabulary Size Tests
Component 1
Animals .806
Clothing .865
Furniture .854
Body parts .928
PVLT .626
It is expected, too, that when the five productive tests and X-Lex are compared, more than one factor
should be visible, since X-Lex is designed to access a different construct from the others and receptive
knowledge is considered to be qualitatively and quantitatively different from productive knowledge. It is not clear
from the factor analysis that this is visible. The scree plot (Figure 4) and component matrix (Table 13) suggest that
a single factor underlies the scores in all six tests, even if X-Lex, like PVLT, correlates less well with this single
factor than the category tasks. The implication of this is that receptive and productive knowledge scores are all,
largely, explained by just one factor. We presume this is vocabulary size, but it could be other things. It could be a
general vocabulary knowledge factor or it could be something non-linguistic like intelligence.
It is fashionable to think of vocabulary as multidimensional, but these results suggest that one of the oldest
divisions of vocabulary knowledge, receptive and productive knowledge, may not be quite the division that is
thought. Of course, receptive and productive knowledge cannot be completely unrelated. A condition of having
a large productive vocabulary is having a large receptive vocabulary; it is presumably
impossible to produce words meaningfully in a foreign language that are not even recognised as words. In
principle, it should be possible for the reverse to be true and for a large number of words to be recognised even if
knowledge is so limited that they cannot be activated and used. However, our interpretation of the factor
analysis, and of the correlations between the productive and receptive tests, is that in practice productive
knowledge tends to grow with receptive knowledge. Co-linearity is a feature of the studies which compare
vocabulary size with automaticity in production (Schoonen, 2010).
Table 13
Component Matrix All Vocabulary Size Tests
Component 1
Animals .794
Clothing .856
Furniture .828
Body parts .902
PVLT .669
X-Lex .599
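The loadings in Tables 12 and 13 are what a principal-components extraction from the inter-test correlation matrix would produce: each loading is the test's weight on the leading eigenvector, scaled by the square root of the leading eigenvalue. A rough, dependency-free sketch using power iteration (an illustration of the technique, not the software used in the study; it assumes the leading eigenvalue is unique):

```python
def first_component(corr):
    """Leading eigenvalue and component loadings of a correlation
    matrix via power iteration."""
    k = len(corr)
    v = [1.0] * k
    for _ in range(200):
        # Repeatedly apply the matrix and renormalise: v converges to
        # the eigenvector of the largest eigenvalue.
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(k))
                     for i in range(k))
    loadings = [eigenvalue ** 0.5 * x for x in v]
    return eigenvalue, loadings
```

The "score above 1" criterion in the scree plot then amounts to retaining only components whose eigenvalue exceeds 1.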
Conclusions
What can we conclude from this? It is possible to make a case that the category generation task is, potentially, a
useful test format which can measure, and put a size on, productive vocabulary knowledge. The tests have proven
reliable and, in certain ways, valid. The category generation task triggers learners at all levels to produce a large
number of words with minimum direction or interference from the teacher or a text. It is able to target a
predictable range of words in the frequent vocabulary bands so that a workable estimate of productive
knowledge can be formed, and these estimates correlate reasonably well with each other so the Alpha score is
high. It distinguishes between low, intermediate and high level learners well, arguably rather better than PVLT
or X-Lex. It correlates, although modestly, with other tests of productive and receptive vocabulary knowledge,
and this suggests that teaching effects may not be significantly affecting the ability of the technique to make a
good estimate of size. It also consistently produces scores which are smaller than receptive vocabulary size
estimates, which makes sense. It is a very easy format that requires very little adaptation to work across learners from different
language backgrounds, and it may be particularly useful in assessing knowledge among very low level learners.
This type of test for productive vocabulary size seems to have potential, therefore, but this study has raised
questions about the use of the technique and the estimates it creates which need to be investigated more
thoroughly.
One is that the separate scores from the different category generation tests and the PVLT all produce
different mean scores and, with one exception, the differences are sufficiently great to be statistically significant.
Parallel forms which give a stable size estimate are necessary if the test is to perform like the receptive tests of
vocabulary size and be capable of being used as a standard test in this area. Nonetheless, the scores that are
produced are all about one third of the estimate of receptive vocabulary knowledge, and that ties in with other
studies in the literature which compare receptive and productive vocabulary knowledge. It has already been
noted that parallel test forms rarely produce identical scores, but what should be made of the scale of variation
seen here is, as yet, unclear. As Meara (2009) points out, the words produced for assessment in productive tasks
are dependent on the task, the genre and the prompt itself so, perhaps, a range of scores is what we should be
seeing if students are responding to a range of tasks even if their vocabulary remains unchanged. The construct
of productive vocabulary would benefit from a more precise specification to help us work through these
difficulties.
It has to be noted too that this is just one study based on learners with one language background and in
one country. It would make sense to repeat this form of testing on other learners with different learning and
language backgrounds as a check to see that the technique is applicable beyond learners in Iran.
There are issues too with the prompts used in this study, which are a small group of prompts drawn from
the psychology literature. These prompts were chosen not least because they are also areas typically covered in
young learner syllabuses. But this may make the scores they produce potentially misleading, since words drawn
this way may also challenge the underlying idea that a good estimate of size is made by using a random sample
of words across the frequency bands. A sample that draws on the subject areas that we know learners have
covered is not a random sample. The effect of such a choice of prompt also needs to be clarified, although it is
not clear from this study that any effect that does exist is very great.
The sampling across the frequency bands produced by these prompts yields a workable selection from
which an estimate can be made. But it is notable that the selections this produced are of different sizes and not
evenly spread across the frequency bands. The effect on the estimate this produces will need to be measured and
appraised. Given the issues which may surround the size of the potential sample a thematic prompt can produce,
it would also make sense to repeat this work with other prompts. It would make sense to investigate prompts
capable of producing larger samples in order to test the effect of this on the size of the estimate. Large prompts
seem likely to produce smaller estimates. It would be useful to know at what levels the estimates appear less than
useful. It would make sense, too, to investigate prompts capable of producing better and more equally sized
samples. This would seem likely to help control for the variation in scores produced by the four tests used in this
study. This would require the use of themes other than the four used in this study, which were, in any case, taken
from psychology. If the methodology is to prove useful in EFL then a wider variety of themes, perhaps more
directly applicable to EFL testing, might be appropriate. It might even be useful to test the use of other prompts
such as letters of the alphabet rather than thematic cues, although in the psychology literature these appear to
work rather differently.
Finally, the factor analysis raises an unexpected question, since it appears that the productive and receptive
vocabulary knowledge measures used here are not the separate constructs they are generally portrayed as but are all
tapping into a single factor which may be some general vocabulary knowledge or size. Maybe that should not be
surprising, since the various dimensions of vocabulary knowledge ought to be connected. The ability to produce a
word has as a precondition that the word is known receptively, so it follows that a large productive vocabulary
knowledge must be associated with a large receptive score. High productive and low receptive scores ought to be
impossible if the construct of the lexicon is as we understand it, and the tests we use to access knowledge are
working tolerably well. The opposite may potentially be true, where a high receptive knowledge might be
associated with a small productive knowledge, but it is hard to imagine the circumstances of teaching and
learning that might produce a very highly disparate set of scores. The common acceptance of the idea of multi-
dimensionality in vocabulary knowledge and the need for multiple testing, therefore, should not blind us to the way
these dimensions necessarily interconnect. Our interpretation of the factor analysis in this study is that for most
practical purposes, the need for multiple testing in vocabulary is probably not as important as is thought.
Multiple testing may be useful in the research community, but it seems as though for most practical purposes a
single well-constructed test is likely to give a good impression of all aspects of vocabulary knowledge.
This study suggests that in its present form the test would be useful in schools in order to generate an
estimate of size so learners can be ranked or compared on their productive knowledge. Where a productive test
in particular is wanted, this will likely work well. However, it is not yet in a state where parallel forms can be
generated and a stable estimate of size produced and used, as receptive vocabulary size tests are, in
research or to link with other factors of language performance like exam performance.
References
Cobb, T. (2014). http://www.lextutor.ca/ (accessed 31st August 2014).
David, A. (2008). Vocabulary breadth in French L2 learners. Language Learning Journal, 36(2), 167-180.
Ellegård, A. (1960). Estimating vocabulary size. Word, 16, 219-244.
Ellis, N. C. (2002a). Frequency effects in language processing: A review with implications for theories of
implicit and explicit language acquisition. Studies in Second Language Acquisition, 24(2), 143-188.
Ellis, N. C. (2002b). Reflections on frequency effects in language processing. Studies in Second Language Acquisition,
24(2), 297-339.
Fitzpatrick, T. and Milton, J. (2014). Reconstructing vocabulary knowledge. In Milton, J. and Fitzpatrick, T. (eds.)
Dimensions of Vocabulary Knowledge (pp. 173-177). Basingstoke: Palgrave.
Hindmarsh, R. (1980). Cambridge English Lexicon. Cambridge: Cambridge University Press.
Izura, C., Hernández-Muñoz, N. and Ellis, A. (2005). Cognitive norms for 500 Spanish words in five semantic
categories. Behavior Research Methods, 37(3), 385-397.
Laufer, B. (1989). What percentage of text is essential for comprehension? In Lauren, C. and Nordman, M. (eds.)
Special Language: From Humans Thinking to Thinking Machines (pp. 316-323). Clevedon: Multilingual Matters.
Laufer, B. & Nation, P. (1999). A vocabulary size test of controlled productive ability. Language Testing, 16(1), 33-
51.
Long, M. and Richards, J. (2007). Series Editors' Preface. In Daller, H., Milton, J. and Treffers-Daller, J. Modelling
and Assessing Vocabulary Knowledge (pp. xii-xiii). Cambridge: Cambridge University Press.
Meara, P. (1982). Word association in a foreign language: a report on the Birkbeck vocabulary project. Nottingham
Linguistic Circular, 11, 29-37.
Meara, P. (1997). Towards a new approach to modelling vocabulary acquisition. In N. Schmitt and M.
McCarthy (Eds.) Vocabulary: Description, Acquisition and Pedagogy (pp. 109-121). Cambridge: Cambridge
University Press.
Meara, P. (2009). Connected Words: Word associations and second language vocabulary acquisition. Amsterdam: John
Benjamins.
Meara, P. and Bell, H. (2001). P-Lex: A simple and effective way of describing the lexical characteristics of short
L2 texts. Prospect, 16(3), 323-337.
Meara, P. and Milton, J. (2003). The Swansea Levels Test. Newbury: Express.
Meara, P. M. and Miralpeix, I. (2008). Vocabulary size estimations: V_Size. Paper presented at the 41st Annual
Meeting of the British Association for Applied Linguistics (BAAL), Swansea, UK.
McKinney, K. L. (2009). Lexical Errors Produced During Category Generation Tasks by Bilingual Adults and Bilingual Typically
Developing and Language-Impaired Seven to Nine-Year-Old Children. Unpublished MA thesis, The University of
Texas at Austin.
Milton, J. (2007). Lexical profiles, learning styles and the construct validity of lexical size tests. In Daller, H.,
Milton, J. and Treffers-Daller, J. (eds.) Modelling and Assessing Vocabulary Knowledge (pp. 47-58). Cambridge:
Cambridge University Press.
Milton, J. (2009). Measuring second language vocabulary acquisition. Bristol: Multilingual Matters.
Milton, J. (2010). The development of vocabulary breadth across the CEFR levels. In Vedder, I., Bartning, I. &
Martin, M. (eds.) Communicative proficiency and linguistic development: Intersections between SLA and language testing
research (pp. 211-232). Second Language Acquisition and Testing in Europe Monograph Series 1.
Milton, J. & Alexiou, T. (2009). Vocabulary size and the Common European Framework of Reference for
Languages. In Richards, B., Daller, M., Malvern, D., Meara, P., Milton, J. & Treffers-Daller, J. (eds.)
Vocabulary Studies in First and Second Language Acquisition (pp. 194-211). Basingstoke: Palgrave.
Milton, J. & Riordan, O. (2006). Level and script effects in the phonological and orthographic vocabulary size of
Arabic and Farsi speakers. In Davidson, P., Coombe, C., Lloyd, D. and Palfreyman, D. (eds) Teaching and
Learning Vocabulary in Another Language (pp. 122-133). UAE: TESOL Arabia.
Milton J., Wade, J. & Hopkins, N. (2010). Aural word recognition and oral competence in a foreign language. In
Chacón-Beltrán, R., Abello-Contesse, C. & Torreblanca-López, M. (eds.) Further insights into non-native
vocabulary teaching and learning (pp. 83-98). Bristol: Multilingual Matters.
Nation, I.S.P. (ed.) (1984). Vocabulary Lists: Words, affixes and stems. Wellington, New Zealand: Victoria University
English Language Institute.
Nation, I.S.P. (1990). Teaching and Learning Vocabulary. Boston: Heinle and Heinle.
Nation, I.S.P. (2001). Vocabulary Levels Test. In Nation, I.S.P. (2001) Learning Vocabulary in Another Language (pp.
416-424). Cambridge: Cambridge University Press.
Nation, I.S.P. (2007). Fundamental issues in modelling and assessing vocabulary knowledge. In Daller, H.,
Milton, J. & Treffers-Daller, J. (Eds.) Modelling and Assessing Vocabulary Knowledge (pp. 33-43). Cambridge:
Cambridge University Press.
Nation, I.S.P. (2012). Vocabulary Size Test instructions at http://www.victoria.ac.nz/lals/about/staff/paul-nation
(accessed 31st August 2015).
Palmer, H.E. (1921). The Principles of Language Study. London: Harrap.
Richards, B.J., & Malvern, D.D. (2007) Validity and threats to the validity of vocabulary measurement. In Daller,
H., Milton, J. & Treffers-Daller, J. (Eds.) Modelling and Assessing Vocabulary Knowledge (pp. 79-92).
Cambridge: Cambridge University Press.
Schmitt, N. (2000). Vocabulary in Language Teaching. Cambridge: Cambridge University Press.
Schoonen, R. (2010). The development of lexical proficiency knowledge and skill. Paper presented at the
Copenhagen Symposium on Approaches to the Lexicon, Copenhagen Business School, 8-10 December
2010. Accessed at https://conference.cbs.dk/index.php/lexicon/lexicon/schedConf/presentations on
03.03.2011.
Stæhr, L.S. (2008). Vocabulary size and the skills of listening, reading and writing. Language Learning Journal, 36(2),
139-152.
Tinkham, T. (1997). The effects of semantic and thematic clustering on the learning of second language
vocabulary. Second Language Research, 13(2), 138-163.
van Hout, R. & Vermeer, A. (2007). Comparing measures of lexical richness. In Daller, H., Milton, J. & Treffers-
Daller, J. (Eds.) Modelling and Assessing Vocabulary Knowledge (pp. 95-115). Cambridge: Cambridge University
Press.
Vermeer, A. (2001). Breadth and depth of vocabulary in relation to L1/L2 acquisition and frequency of input.
Applied Psycholinguistics (2001) 22, 217-234.
Waring, R. (1997). Comparison of the receptive and productive vocabulary knowledge of some second language
learners. Immaculata: The Occasional Papers of Notre Dame Seishin University, 1997, 94-114.
Webb, S. (2005). The effects of reading and writing on word knowledge. Studies in Second Language Acquisition, 27,
33-52.
Webb, S. (2008). The effects of context on incidental vocabulary learning. Reading in a Foreign Language, 20(2), 232-
245.
James Milton is Professor of Applied Linguistics at Swansea University, UK. He worked in Nigeria and in Libya
before coming to Swansea in 1985. A long-term interest in measuring lexical breadth and establishing normative
data for learning has produced extensive publications including Measuring Second Language Vocabulary Acquisition
(Multilingual Matters, 2009).
Hedy McGarrell*
Brock University, Canada
Abstract
This study reports on the analysis of a widely used "General English" textbook to explore the relationship between lexical
bundles included in the text and lexical bundles identified in relevant corpora, to determine the appropriateness of the text's
vocabulary in relation to its stated objective. Appropriateness is examined through the analysis of usefulness and functions,
and the relationship between the two, by comparing the usefulness scores of various functions. The results show a relatively
low level of usefulness of the lexical bundles in the textbook, meaning low frequency and small range of usage for the
analysed items. The function analysis showed that the textbook includes all the functions. The most common function was
referential, followed by stance, special conversational, and discourse organizing functions. The current study offers an initial
step for future research on lexical bundles, their functions, and usefulness in language teaching and teaching materials
development; specifically, it suggests a possible methodology to be used in such research. Moreover, the results of this study
provide insights into the value of lexical bundles in teaching and the development of teaching materials.
Introduction
Textbooks in second or foreign language learning programs are typically the main or even sole source of
vocabulary input for learners in classroom contexts and thus have a major impact on the vocabulary learners
encounter (McDonough, Shaw & Masuhara, 2012; Neary-Sundquist, 2015). However, researchers, teachers
and their learners have repeatedly questioned whether the language included in these textbooks reflects the
language used in real life situations (Biber & Reppen, 2002). Increasingly, studies show that language in use, as
captured in corpora, and the language of teaching materials are often at odds (Gabrielatos, 2006;
Koprowski, 2005; Meunier & Gouverneur, 2009; Shortall, 2007). The limited availability of suitable techniques for
analysis may have prevented more extensive research in the past, but increasingly corpus linguistics, with its large
data banks of naturally occurring text, provides a promising way of investigating such questions. One such
technique involves the analysis of multi-word combinations that co-occur repeatedly within the same register in
native speaker usage but are not typically fixed nor structurally or semantically complete (Csomay, 2013). Conrad
& Biber (2004) show that approximately 20 percent of the words (tokens) in written academic
*Tel. 1 905 688 5550, ext. 3757. Email: hmcgarrell@BrockU.CA, Department of Applied Linguistics, Brock University, 1812
Sir Isaac Brock Way, St. Catharines, ON, Canada, L2S 3A1
texts occur within three or four word groups of such multiword combinations, which makes them an important
focus for further investigation as they have the potential to support second language (L2) learning. The
question then is whether textbooks include the multiword combinations typical for the stated purpose of a given
L2 textbook, in this study specifically an English as a Second Language (ESL) 'general English'
textbook.
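Computationally, identifying recurrent multi-word combinations amounts to an n-gram frequency count over a tokenised corpus. A toy sketch of the idea (published bundle studies use normalised per-million-word frequency cut-offs and a dispersion criterion across texts; the threshold and data below are purely illustrative):

```python
from collections import Counter

def lexical_bundles(tokens, n=4, min_freq=2):
    """Count contiguous n-word sequences in a token list and keep those
    at or above a raw frequency threshold -- a toy version of
    lexical-bundle extraction."""
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {g: c for g, c in grams.items() if c >= min_freq}
```

Running this over, say, "on the other hand we see on the other hand" surfaces the recurring four-word sequence "on the other hand", which is exactly the kind of fixed-looking but structurally incomplete unit the bundle literature describes.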
Researchers interested in the relationship between actual language use and the language presented in
textbooks and other teaching materials have pointed out how corpora can be used to answer questions about
variation in language across registers, lexico-grammatical associations, discourse variables, and language acquisition.
McCarten (2010) argues that corpora provide sources for textbook developers to compile systematic lexico-
grammatical syllabi based on authentic texts. Research that focuses on the study of frequently occurring
multiword combinations (Biber, Johansson, Leech, Conrad, & Finegan, 1999; Cortes, 2004; Sinclair, 1991), often
referred to as lexical bundles in recent work, encountered in the texts represented in corpora is particularly
relevant for the current study. These vocabulary focused studies have investigated multiword combinations and
their structural and functional characteristics in various disciplines and registers such as academic prose,
conversation and classroom discourse, demonstrating their importance in diverse naturally occurring registers, which
in turn makes them an important component for learners' vocabulary development. Increasingly, research findings
support arguments in favour of including lexical bundles in textbooks and other pedagogic materials.
Considering the discrepancies pointed out in recent research between the vocabulary included in textbooks
and its occurrence in authentic language illustrated in corpora, the current study presents an analysis of a widely
used General English textbook, English File intermediate (student's book) by Latham-Koenig and Oxenden (2013).
The analysis focuses on lexical bundles and seeks to determine whether the lexical bundles included in the
textbook represent broader usage as indicated in corpora. Given the research that demonstrates the frequency of
lexical bundles in a broad range of naturally occurring texts, with different bundles and functions depending on
register, such bundles have been shown to be present in textbooks (Biber, 2006; Hyland, 2005), but with important
disciplinary differences. The underlying question that motivates the current study is whether the lexical bundles
in language learning texts reflect those bundles researchers have identified as particularly frequent in language
situations relevant to the stated purpose of such a textbook.
The literature review below serves to define key terms used in the study and to provide background from
recent, directly related studies. The role of corpora in materials development is discussed first. It is followed by a
definition of lexical bundles and a discussion of their role in natural language. The section concludes with a
definition of function in relation to lexical bundles.
specific areas: grammar features included (types of adjectives), order of grammar topics (simple and progressive
aspect), and vocabulary used to present these areas. Their analyses showed that the relevant
materials in the selected textbooks did not reflect the frequency data in corpora, that the sequence of grammar
points presented was not grounded in actual use, and that there was little consistency in selecting vocabulary.
Biber and Reppen concluded that the textbooks analysed were developed based on instinct rather than language
in use. Considering their findings, they argued that frequency information should be a key factor in materials
development choices, as frequently occurring vocabulary and grammar features are likely more useful for
learners. A replication of Biber and Reppen's study, with more recent editions of either the same or comparable
grammar texts (Lee & McGarrell, 2011), suggests increasing awareness of the existing gap between materials and
language in use. These more recent editions were either corpus-based or corpus-informed (McCarthy, 2008),
and thus were expected to reflect a more authentic description of the specific areas being analysed. Lee &
McGarrell's analysis showed that the more recent texts tended to represent corpus findings more closely, but still
left considerable room for improvement in terms of reflecting actual language use. Similarly, Cheng and Warren
(2007) examined 15 EFL textbooks endorsed by the Hong Kong Education and Manpower Bureau and
compared them to the findings generated from the Hong Kong Corpus of Spoken English (HKCSE). Analyses
showed that the vocabulary and language forms introduced in the textbooks were low-frequency items associated
primarily with academic registers, thus more complex and explicit than the forms found in the HKCSE. Finally,
two studies investigated specifically the use of multiword combinations. Koprowski (2005) investigated the
usefulness of lexical phrases, in terms of their frequency and range, in contemporary textbooks compared to
corpus data. The analysis involved 822 items and their usefulness scores generated from the frequency and
range data in the COBUILD Bank of English, a computerized corpus containing 17 different British and
American native-speaker subcorpora (e.g., newspapers, magazines, books, radio, informal conversations).
Findings showed that one third of the lexical phrases used in the textbooks analysed were low-frequency items,
thus unlikely to be useful in most real communication. Koprowski questioned the validity of the lexical selections,
suggesting that they were likely again based on the textbook writers' intuitions and experience rather than real language.
The studies discussed in this section show that despite the availability of corpora and corpus research
findings, materials writers rely heavily on intuition. The paucity of textbooks that incorporate insights from
corpora may be attributed to the fact that early corpora tended to be designed for linguists and were difficult
for materials designers and teachers to access. In his investigation into the attitudes of textbook writers towards
corpus materials, Burton (2012) discovered that many of these authors share a lack of knowledge of corpora, in
terms of their existence, benefits, and exploitation. Considering these findings, he agrees with McCarthy (2008) in
his conclusion that to effect change, teachers and their students will need to request that publishers produce
materials that reflect the most accurate portrayals of language. This, in turn, underlines the need for language
teacher education programs to include readings on corpus linguistics and to encourage student teachers to become
familiar with the exploitation of corpus materials for language learning and teaching. Timmis (2013) stresses the
value of viewing corpora as contributors to course materials rather than arbiters of lexical-grammatical choices.
He points out that such a view allows corpus frequency information to be reconsidered to accommodate, for example,
developmental sequences, local needs, intuition, and cultural and pedagogic considerations, and concludes that
corpora do not tell practitioners what or how to teach; they do, however, provide valuable information on the
nature of language and language production for consideration in materials design.
components as fixedness, idiomaticity, completeness, and intuitive recognition by native speakers of English. The
combination is an idiom that has a fixed form and is recognized by native speakers as one unit (one is unlikely to
hear native speakers use a variation such as at the hat’s drop), with low transparency in meaning. Multiword
combinations thus differ from idiomatic expressions and collocations in both form and scope. For a full
discussion see, e.g., Wray (2002). Nattinger and DeCarrico (2001) refer to multiword combinations as lexical
phrases, stressing the importance of fixedness and pragmatic completeness, while Bahns and Eldaw (1993) use the
term word combination, which for their purposes does not include the fixedness component. Building on their own
and earlier work, Conrad and Biber refer to multiword items as lexical bundles, and point out two main criteria:
frequency and register. The first criterion, frequency, relates to cut-points, meaning the number of times a lexical
bundle occurs in a corpus, in relation to the size of the overall corpus and the research goals. The second
criterion relates to multitext occurrences (i.e., dispersion), typically at least five texts in any one register, but again
dependent on the corpus and research goals. This criterion is intended to rule out the personal preferences of
individual writers in their use of lexical bundles. While Conrad and Biber (2004) recognize that other features are
involved in defining multiword combinations, they have identified frequency and multitext occurrences as the
most important. They argue that such lexical bundles represent “the most frequent recurring fixed lexical
sequences in a register” (p. 59).
Researchers have identified lexical bundles of varying lengths but increasingly
focus their analyses on 4-word bundles. The structure of 4-word bundles tends to contain 3-word bundles
(Cortes, 2003; Hyland, 2008; Wood, 2013) but excludes most non-standard or meaningless bundles of two or
three words (Hyland, 2008). Wood (2013) points out that 5- and 6-word bundles are relatively less common than
4-word bundles, thus the longer bundles would provide more limited frequency data.
Research shows that the use of lexical bundles is connected to improved fluency in learners’ spoken and
written discourse (Fan, 2009; Nation, 2001; Wood, 2010; Wood & Appel). From a psycholinguistic perspective,
there is an underlying assumption that such lexical bundles are stored as one unit, making their recognition and
retrieval easier, faster, and less attention-demanding, thereby freeing up processing capacity for
greater fluency (Conrad & Biber, 2004; Wood, 2010). To examine the relationship between ESL learners’ use of
lexical bundles in academic writing and their English language ability, Appel (2016) analysed argumentative
essays the learners wrote for the Canadian Academic English Language (CAEL) test. The resulting corpus of
essays was divided into three subcorpora: the Lower Level Corpus (LLC), which included essays that the
examiners had judged to be at a beginner level; the Medium Level Corpus (MLC), texts produced by
intermediate level writers; and the High Level Corpus (HLC), from upper-intermediate and advanced level
writers. The lexical bundles in each subcorpus were then examined in terms of their frequency, similarity, and
length. The findings showed that high-level writers tended to use more lexical bundles than low-level writers. In
addition, HLC writers typically used shorter bundles with less repetition. Appel’s study thus provides
support for the notion that lexical bundle use is correlated with ability level in ESL learners.
and to have occurred in at least five different texts in the academic prose corpus. The researchers compared the
resulting 4000 bundles from the conversation sub-corpus and the 3000 bundles from the academic subcorpus
based on three criteria: frequency in each register, structural pattern, and function. The frequency analysis
showed that the bundles appeared more frequently in conversation (28%) than in academic texts (20%). The
structural analysis showed that most of the bundles in conversation included part of a verb phrase, while most of
the bundles in academic texts included parts of noun phrases and/or prepositional phrases. Finally, the function
analysis, which focused only on 4-word bundles as longer bundles are less frequent and typically include 4-word
bundles as part of their structures, showed that register resulted in noticeable differences between the function
bundles. For example, epistemic stance and discourse organizing bundles were more frequent in conversation,
while referential bundles occurred widely in academic texts. An additional category of function, special
conversational bundles, which covered such functions as politeness routines (thank you very much), simple inquiry (what are
you doing?), and reporting clauses (I said to him), was identified in conversational discourse only. Further qualitative
analysis showed that epistemic stance bundles in conversation were widely used to express personal uncertainty,
opinions, desires, and intentions, while stance bundles in academic prose reflected personal certainty. Discourse
organizing bundles in conversation were used to introduce or focus on a topic or as clarification, while the same
type of bundles in academic texts was used to convey explicit contrast. Conrad and Biber concluded that while
lexical bundle use is frequent in both conversations and written academic texts, the type of bundle used depends
on register, context, and purpose. Their findings show that lexical bundle use is not accidental but reflects
common patterns and types of bundles that vary depending on register, context, and purpose. One conclusion
suggested by these findings is that language learners would likely benefit from some explicit instruction in
the most common patterns and bundles relevant to their learning goals.
In related research, Wood and Appel (2013), in their analyses of the lexical bundles from business and
engineering textbooks, showed that referential bundles were the most frequently occurring (62%), followed by
discourse organizing (24%) and stance bundles (14%). The researchers attribute the large number of referential
bundles to the fact that textbooks typically point out and explain subject matter. Wood and Appel suggest that
awareness of high-frequency lexical bundles used in different disciplines is likely to assist teachers and materials
developers in selecting the most appropriate items to include in textbooks of various disciplines. The inclusion of
lexical bundles in language teaching should thus serve to benefit learners’ awareness and linguistic ability.
An investigation of lexical bundles and their functions in relation to discourse structure is also the focus of
Csomay (2013), who examines classroom discourse. A corpus based on selected data from the TOEFL 2000
Spoken and Written Academic Language corpus and the first six units of 196 university classroom sessions in the
Michigan Corpus of Academic Spoken English was analysed. The 84 4-word bundles identified were analysed
for their functions. The findings show that stance bundles were used more frequently in the opening phase of a
classroom session, while referential bundles were used more frequently in the instructional phase of the classroom
discourse. Stance bundles were typically used to convey personal obligation (e.g., I don’t know; do you think so),
while directive (e.g., it is necessary to; you don’t have to) and referential bundles (e.g., at the same time; one of the most) were
used to express time, place, and the specification of attributes. Discourse organizing bundles (e.g., what do you
think; on the other hand) were the least frequent in classroom discourse. Similar to previous studies, Csomay
concluded that the use of different types of lexical bundles varied according to the communicative context and
purpose, and also suggested that the inclusion of different types of lexical bundles in pedagogy would likely
enhance students’ understanding of these lexical items in academic settings.
The above studies support the notion that various registers are associated with different types of lexical
bundles, based on the context and purposes of a register. Further research will likely clarify and confirm the
various associations. In the meantime, the authors of the above studies tend to agree that findings should be
reflected in textbooks and other classroom materials. While textbooks might be expected to reflect frequently
occurring lexical bundles, studies exploring the relationship between textbooks and relevant corpora are not yet
readily available. Yet the underlying assumption is that explicit explanations and illustrations in appropriate text
selections will have beneficial effects on learners’ language development. To address this perceived gap in the
literature, the study described in the following was designed to determine the extent to which a textbook used for
intermediate level English learners incorporates both relevant examples of lexical bundles and their functions.
Findings serve to shed light on the relationship between textbook language and language use as reflected in
corpora.
This Study
The general English textbook English File: Intermediate Student’s Book (Latham-Koenig & Oxenden, 2013) was
selected as it is widely used, with much of it available online. The text is intended to focus on spoken English
for general purposes and consists of 10 units divided into sections including grammar, vocabulary, pronunciation,
and practical English episodes.
Methods
The data for the current study consist of an electronic version of the textbook under investigation. The 41,752-
word corpus created includes the reading texts, dialogues, and listening transcripts from all parts of all units, but
excludes grammar and vocabulary exercises, which include tasks such as matching, fill-in-the-blank, answering
questions, and instructional language. Similarly, items with names of people, nicknames, names of countries,
states, websites, and social media were excluded from the analysis to avoid coincidences related to the
textbook itself.
Four-word bundles were generated through use of the kfNgram concordancing software (Fletcher, 2012), a free
tool that extracts lexical bundles and provides frequency counts. To generate the bundles, kfNgram was set to
extract 4-word bundles that had at least three occurrences, which reflects the frequency cut-off of 40-99 times
per million words identified in Biber et al. (2004). This frequency requirement resulted in a total of 222 4-word
bundles. These 4-word bundles were analysed to identify sequences that were true 4-word bundles rather than 3-
word bundles with variable slots (Wood, 2013). The procedure entails the separation of each 4-word bundle into
two 3-word bundles. If frequency counts indicate that a 3-word bundle is more frequent than the 4-word
bundle, the 3-word bundle is considered to be the base structure. For example, the 4-word bundle in other words the
can be separated into in other words and other words the. As the frequency of the former is greater than that of the
latter, in other words is considered the base structure and the article the is considered a variable slot and placed in
parentheses. Once all 4-word bundles generated by kfNgram had been analysed, a list of 169 4-word bundles with
between three and 10 occurrences resulted, as illustrated in Table 1.
Table 1
Frequency of 4-Word Bundles

Frequency    Number of items    Percentage    Example
10           2                  1.2           I don’t know; I don’t think
8            2                  1.2           I don’t want; do you think you
7            6                  3.6           at the end of; don’t want to
6            5                  2.9           as soon as I; if you don’t
5            6                  3.6           do you think you; I was going to
4            22                 13.0          and there is a; do you have a
3            127                75.1          about going to the; can you pass the
Total        169                100%
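The base-structure check described above can be sketched as a short script. This is a minimal illustration only: the actual extraction was carried out with kfNgram, the sample text in the usage example below is hypothetical, and the cut-off of three occurrences corresponds to roughly 72 per million words in the 41,752-word corpus, within the 40-99 band cited above.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-word sequences in a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def base_structures(tokens, min_freq=3):
    """Classify each 4-word bundle meeting the cut-off as either a true
    4-word bundle or a more frequent 3-word base with a variable slot,
    following the procedure described above (after Wood, 2013)."""
    three = Counter(ngrams(tokens, 3))
    four = Counter(ngrams(tokens, 4))
    results = {}
    for bundle, freq in four.items():
        if freq < min_freq:
            continue
        first3, last3 = bundle[:3], bundle[1:]
        # A constituent 3-gram occurs at least as often as its 4-gram;
        # a strictly greater count means it also occurs outside the
        # 4-gram, so the 3-gram is taken as the base structure.
        if three[first3] > freq:
            # The final word becomes a variable slot, e.g. "in other words (the)".
            results[bundle] = " ".join(first3) + " (" + bundle[3] + ")"
        elif three[last3] > freq:
            results[bundle] = "(" + bundle[0] + ") " + " ".join(last3)
        else:
            results[bundle] = " ".join(bundle)
    return results
```

On the paper’s own example, in other words the is reported as in other words (the) whenever in other words outnumbers the full 4-word sequence.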
Analyses
Three stages of analysis served to address the research questions. The first stage assessed the 4-word bundles
based on their usefulness score. The second stage identified the various functions of the 4-word bundles in the
textbook corpus. A quantitative and qualitative analysis of the usefulness scores and functions was carried out in
the third stage of analysis. Each stage is described in the following.
Research referred to above has shown that the importance of a given lexical item is reflected in its
frequency in naturally occurring texts from different but relevant sources. Koprowski (2005) suggested a
procedure to assign usefulness scores, i.e., a value that captures the frequency of lexical items in terms of
occurrences per million words in specific corpora, in addition to information about range, which refers to the
number of registers or text types in which a given lexical item can be found. Following Koprowski, usefulness
scores were assigned to the 4-word bundles in the textbook under investigation by comparing the analysed items with
the COBUILD concordance to determine their frequency data in five sub-corpora of different text types, where
the analysed items were most commonly found. For the first stage of analysis, the five individual frequency scores
were averaged to provide the usefulness score for each 4-word bundle, reflecting its frequency and range across
five text types.
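Computed this way, the usefulness score amounts to a simple mean of per-million frequencies across the sub-corpora consulted. The sketch below illustrates the arithmetic; the helper names are mine, and the illustrative figures in the test are taken from the scores later reported in Table 3 (which are computed over two corpora, COCA and the BNC) rather than from the COBUILD sub-corpora.

```python
def per_million(raw_count, corpus_size_words):
    """Normalize a raw frequency to occurrences per million words."""
    return raw_count / corpus_size_words * 1_000_000

def usefulness(freqs_per_million):
    """Usefulness score after Koprowski (2005): the mean of a bundle's
    per-million frequencies across the sub-corpora consulted. A zero in
    any one sub-corpus drags the score down, so the mean captures range
    as well as raw frequency."""
    return sum(freqs_per_million) / len(freqs_per_million)
```

For example, with the two frequencies later reported for at the end of (68.87 and 91.68 per million), the score is (68.87 + 91.68) / 2 = 80.275; with five sub-corpora the mean simply runs over five values.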
A second stage of analysis involved identifying the various functions of all 4-word bundles in the textbook
corpus. Whilst the purpose of functions varies depending on register, Conrad and Biber (2004) identified three
types of functions of 4-word bundles: stance expressions, discourse organizers, and referential expressions. Table 2 shows
that the function stance expressions includes bundles that reflect personal or impersonal attitudes towards an action
or event in a text and is sub-divided into epistemic bundles and attitudinal/modality bundles, a group that is further
divided into desire, obligation/directive, intention/prediction, and ability. The function discourse organizers is divided into the
sub-categories topic introduction/focus and topic elaboration/clarification bundles. The former introduces new topics or
directs attention toward specific topics; the latter provides additional information about or clarification of a topic. The
third function includes referential bundles, which indicate specific features of physical or abstract entities. Referential
bundles are divided into the four sub-categories identification/focus, imprecision, specification of attributes, and multi-
functional bundles. These sub-categories serve, respectively, to stress the importance of an object; to reflect imprecision or
uncertainty about an object; to focus on selected aspects of an object, including quantity and physical or abstract
attributes; and, in the case of the fourth sub-category, to refer to various time-related aspects. Conrad and Biber also identified a
specifically conversational function, which includes categories such as politeness routines, simple inquiry, and
reporting clauses. Bundles from this last function appear in their conversation sub-corpus only. A summary of
these functions and their sub-categories is offered in Table 2.
Conrad and Biber’s four functions and their sub-categories served to classify the 4-word bundles from the
textbook corpus. Bundles that did not clearly fit into any of these functional categories were placed into a no-
function category for further analysis.
The third stage of analysis entailed the quantitative and qualitative analysis of the usefulness scores and
functions of the extracted 4-word bundles. Each function and its subcategories was allocated an overall
usefulness score obtained by averaging the usefulness scores of the items under the functions and their
subcategories. The purpose of this final stage of analysis was to determine the relationship between the functions
of 4-word bundles and their usefulness. Together, these stages served to answer three specific research questions:
1. What is the relationship between the 4-word lexical bundles identified in the textbook under
investigation and corpus-research findings in terms of their frequency and range?
2. How do the 4-word lexical bundles presented in the textbook reflect corpus-research findings in
terms of their functions?
3. What is the relationship between the usefulness and functions of the 4-word lexical bundles in the
textbook?
A key assumption underlying these questions is that textbooks intended for general language purposes reflect
frequently occurring 4-word lexical bundles in corpora collected from naturally occurring language.
Table 2
Functions of Bundles According to Conrad and Biber (2004)
______________________________________________________________________________
Function                    Sub-categories
______________________________________________________________________________
Stance expressions          Epistemic;
                            attitudinal/modality (desire;
                            obligation/directive;
                            intention/prediction;
                            ability)
______________________________________________________________________________
Discourse organizers        Topic introduction/focus;
                            topic elaboration/clarification
______________________________________________________________________________
Referential expressions     Identification/focus;
                            imprecision;
                            specification of attributes;
                            multifunctional
______________________________________________________________________________
Special conversational      Politeness routines;
                            simple inquiry;
                            reporting clause
______________________________________________________________________________
Findings
The findings from the analyses described above are presented in response to each of the three research
questions. The first question, What is the relationship between the 4-word lexical bundles identified in the textbook under
investigation and corpus-research findings in terms of their frequency and range?, is addressed through the usefulness score. This
score, representing frequency and range, was determined based on information from COCA and the BNC and
shows that the 169 4-word bundles vary in usefulness between a high of 93.78 and a low of 0, with an average
usefulness score across all items of 4.4. Nineteen of the 169 lexical bundles identified in the textbook, 11.2% of
the total number of 4-word bundles, reach a usefulness score of over 10, as shown in Table 3.
A total of 20 (11.8%) of the 169 4-word bundles in General English have usefulness scores of zero,
indicating that they did not occur in either COCA or the BNC, while another 88 (52%) 4-word bundles in General
English have usefulness scores between 0.005 and 0.995. In addition, 13 (7.7%) of the 4-word bundles have a raw
frequency of one to four occurrences in both corpora, or one to four occurrences in one corpus and zero
occurrences in the other. For example, the bundle it is considered bad has zero occurrences in COCA and two in the
BNC. The limited number of 4-word bundles with high usefulness scores, the low average usefulness score, and the
large percentage of items with zero usefulness scores suggest that the 4-word bundles included in the textbook
have comparatively low range and frequency in everyday language as reflected in COCA and the BNC.
Table 3
Four-word Bundles with Usefulness Scores of over 10
Item                    Frequency per million    Frequency per million    Usefulness score
                        words in COCA            words in BNC
the end of the 83.24 104.32 93.78
at the end of 68.87 91.68 80.275
for the first time 63.18 53.4 58.29
on the other hand 48.38 52.62 50.5
one of the most 54.18 40.49 47.335
in the middle of 48.51 28.07 38.29
the middle of the 31.58 22.11 26.845
was one of the 26.36 23.05 24.705
what do you think 30.9 12.4 21.65
the back of the 21.36 20.82 21.09
I’d like to 17.22 15.75 16.485
I was going to 18.39 10.8 14.595
do you want to 14.58 11.25 12.915
in one of the 13.52 12.1 12.81
from time to time 9.28 16.32 12.8
a member of the 23.7 0 11.85
a bit of a 7.63 15.74 11.685
what do you mean 11.8 9.72 10.76
a lot of money 13.02 7.87 10.445
To answer the second question asked in this study, How do the 4-word lexical bundles presented in the textbook reflect
corpus-research findings in terms of their functions?, the 169 4-word bundles identified in the textbook examined were
analysed and sorted according to the different functions identified in Conrad and Biber (2004). The findings
show that 55 (32.5%) of the 4-word bundles reflect identifiable functions, of which referential ones were the most
frequent, followed by stance, special conversational and, the least frequent, discourse organizer functions, as
summarized in Table 4.
The most frequent of the functions identified were referential expressions with 21 (12%) items, of which 6
(3.6%) items fall into the subcategory of identification/focus (e.g., in one of the, one of my best), 12 (7.1%) items
under specification of attributes (e.g., a bit of a, as soon as I), and 3 (1.8%) items under multi-functional (e.g., the end
of the, and in the end), while no items were identified for the imprecision subcategory. The second most
frequently occurring function in the textbook is that of stance expressions with 16 (9.5%) items, of which 9
(5.3%) are epistemic bundles (e.g., do you know if, I don’t know) and 7 (4.1%) attitudinal bundles (e.g., do you want to, I
was going to). The function of special conversational expressions included 14 (8.3%) items, 2 (1.2%) of which belong
to the subcategory of politeness routines (e.g., no thanks I’m) and 12 (7.1%) to simple inquiries (e.g., can you
tell me), with none for reporting clauses. The least frequently identified category of functions, discourse organizers,
includes 4 (2.4%) items that belong to topic elaboration/clarification (e.g., on the other hand) and none for the
topic introduction/focus subcategory.
Table 4
Summary of functions in General English textbook
Functions                                  Number of occurrences    Percentage
Referential Total 21 12.0
Identification/focus 6 3.6
Specification of attributes 12 7.1
Multi-functional 3 1.8
Imprecision 0 0
Stance Total 16 9.5
Epistemic 9 5.3
Attitudinal 7 4.1
Special conversational Total 14 8.3
Politeness routines 2 1.2
Simple inquiries 12 7.1
Reporting clause 0 0
Discourse organizer Total 4 2.4
Topic elaboration/clarification 4 2.4
Topic introduction/focus 0 0
No-function Total 113 66.9
Collocational phrases 14 8.3
Context specific 28 16.6
No subcategory 71 42.0
The no-function category includes 113 (66.9%) 4-word bundles, a large enough category to warrant further
analysis. This analysis shows that 14 (8.3%) of these bundles belong to the collocational phrases (Conrad & Biber,
2004) subcategory (e.g., had a great time, o’clock in the morning), while an additional 28 (16.6%) no-function bundles
belong to the context specific subcategory (e.g., lawyer of the defence, the docklands light railway). The remaining 71
(42%) bundles could not be attributed to any subcategories reflected in the literature (e.g., and there is a, I usually
have a). Figure 1 summarizes these findings, reflecting that all the functions and most of the subcategories
identified in Conrad and Biber were also identified in the textbook under analysis, but more than half of the 4-
word bundles in the textbook could not be attributed to any of the function categories identified.
The third stage of analysis was designed to answer the question What is the relationship between the usefulness and
functions of 4-word lexical bundles in the textbook? The analysis of the 4-word bundles that reflect one of the functions
shows that their overall usefulness score is 11.9. A breakdown of the different function categories identified is
presented in Figure 2.
[Figure 2 is a bar chart of usefulness scores by function: referential expressions 22.7; discourse organizers 15.4; stance expressions 6.8; special conversational 0.4; no-function 0.4.]
Figure 2. Usefulness Score of Functions
Figure 2 shows that the highest usefulness score, 22.7, was achieved by referential expressions, followed by
discourse organizers at 15.4, stance expressions at 6.8, and special conversational expressions at 0.4. The
no-function expressions attained a usefulness score of 0.4.
The usefulness scores for each function’s subcategories were also calculated and are reflected in Figure 3.
It shows that the sub-categories of the referential function achieved usefulness scores of 14.8 for
identification/focus, 22.3 for specification of attributes, and 40.9 for multi-functional expressions.
Within stance expressions, epistemic expressions reached a usefulness score of 5.9, while attitudinal expressions
reached 4.1. Discourse organizers include items from only the topic elaboration/clarification
subcategory, which obtained an overall usefulness score of 15.4. Finally, the subcategories of the special
conversational function, politeness routines and simple inquiries, achieved usefulness scores of 0 and 0.5
respectively. The no-function subcategory of collocational phrases received a usefulness score of 1.2, the context
specific subcategory 0.6, and the uncategorized group 0.6. The above findings show that lexical bundles with
functions tend to have higher usefulness scores than those without functions. The most useful items identified in
the textbook corpus are part of referential expressions, followed by discourse organizers and stance expressions, with
special conversational functions showing the lowest usefulness scores. The most useful subcategory is that of
multi-functional expressions, the least useful the one covering politeness routines.
Discussion
Key findings from this exploratory study of the usefulness and function of lexical bundles identified in a textbook
for general English language learners are discussed in the order of the specific research questions raised. The first
research question explored the level of usefulness of the 4-word bundles generated from the textbook. Usefulness
was determined through the numeric scores developed in Koprowski (2005), scores comprising frequency and
range data from COCA and the BNC. The findings show a comparatively low level of usefulness of the analysed
items, determined by their low frequency of usage in the various registers and text types reflected in corpora of
general language use. The findings in this study are consistent with those reported in Koprowski and in
Cheng and Warren (2007), whose work also found low-frequency items and inconsistencies between the
vocabulary items included in teaching materials and those found in actual language use reflected in corpus data.
The findings also reflect the observation in other studies on teaching materials (Biber & Reppen, 2002; Gabrielatos,
2006; Koprowski, 2005; Lee & McGarrell, 2011; Meunier & Gouverneur, 2009; Shortall, 2007) that the
language presented in these materials does not closely match naturally occurring language as
reflected in corpora. Although the stated purpose of the textbook analysed for the current study is to improve
students’ general English abilities, the findings suggest that most of the lexical bundles included have highly
limited usage in general communication contexts. This lack of convergence between textbook and corpus
material suggests that the textbook developers may have relied on intuition in the selection of material, as
discussed in Biber and Reppen (2002) and Lee and McGarrell (2011), rather than actual data sources, or that the
selection criteria used were unable to identify material representative of general language use.
The second research question examined the functions, as defined by Conrad and Biber (2004), of the 4-word
lexical bundles identified in the textbook. The findings show that over 65% of the lexical bundles do not fall
within any identifiable function. This may, in part, be due to the low number of lexical bundles with at least
three occurrences in the textbook, suggesting that the textbook lacks the kind of repetition typically needed for
language development. The most frequently identified function of the textbook bundles was referential, followed
by stance, special conversational and, least frequent, discourse organizing. The finding that referential bundles
are the most frequently occurring in the textbook under discussion reflects an academic influence in its language
focus, as shown in previous research. Wood (2013), in an analysis of business and engineering textbooks, showed
that the referential function was the most frequently occurring in those academically oriented textbooks. Wood’s
study shows discourse organizing and stance bundles as the second and third most frequently used functions,
respectively. Similarly, Conrad and Biber showed that referential bundles are more common in academic prose
than in other language uses. The objectives of the textbook and its focus on academic language again suggest a
misalignment between the two. The second most frequent function of lexical bundles in the current study, stance,
shows that the textbook also focuses on conversation and speaking registers, but to a lesser extent. These functions
are associated with more informal language use, as indicated in Conrad and Biber, whose investigation of bundles
from conversation and academic prose discovered that stance and special conversational bundles are more frequent
in conversation. In light of these findings, the textbook under discussion thus presents a mix of academic and
conversational registers. This mix, combined with the relatively low recurrence of bundles, may prevent learners
from encountering relevant functions in sufficient numbers for each register to internalise them successfully. In
turn, this may impede register-appropriate production, as the information available to learners lacks clear
distinctions of function use in different registers.
The third research question investigated the relationship between the usefulness and the functions of the 4-word
bundles identified in the textbook. To address this question, each function and its subcategories were given an
overall usefulness score by averaging the usefulness scores of the items under them. The findings show that
lexical bundles serving an identifiable function have higher usefulness scores than those that cannot be
attributed to any function. This finding links directly to past studies that have stressed the importance of
referring to frequency information on actual language use in teaching and materials development
(Biber & Reppen, 2002; Cheng & Warren, 2007; Koprowski, 2005; Lee & McGarrell, 2011). A
detailed analysis also shows that the referential function, which is typically associated with more formal and
academic language, has the highest usefulness score in the textbook under discussion. In addition, the second
stage of the analysis shows that referential bundles are also the most common type of bundle identified in the
current study. They include the items with the highest usefulness scores, such as the end of the (93.7), at the end of
(80.2), and for the first time (58.2). This finding suggests that the inclusion of referential bundles in teaching syllabi and
textbooks working on academic registers may be particularly valuable in support of language learners’ ability to
acquire native-like multiword expressions. The second most useful type of lexical bundle belongs to the discourse
organizing function. For example, the bundle on the other hand (50.5) was also shown as frequently occurring in
academic prose in Conrad and Biber (2004). Although the discourse organizing function is considered the least
common type of function, its high usefulness score suggests that it is a valuable item for inclusion in teaching
materials. The stance function, with noticeably lower scores, is third in terms of usefulness in the current study.
As the detailed presentation in the results section shows, the stance bundles in the textbook have low usefulness
scores, with a few exceptions such as what do you think (21.6). A careful analysis of corpus data may help materials
developers identify selections that are useful in terms of broad actual language use. The fourth function, the
special conversational function, has the lowest usefulness score, even though it was found to be more frequent
than the discourse organizing function. One explanation for this may be the range criterion imposed in
determining usefulness scores, that is, the criterion ensuring that the lexical items learned are useful in varied contexts.
As the special conversational function is expected to occur in the conversational register only, the range of
bundles in this category is, by definition, limited. Because usefulness scores combine both frequency and range
data, such bundles will not yield high scores even if they are frequent in their own register. A
textbook concentrating on conversational English might reasonably be expected to have many lexical bundles
that fall within the conversational function. Again, careful matching of corpus data in light of the purposes of a
given textbook would seem to be a key objective for materials designers. Finally, the lexical bundles with the
lowest usefulness scores, even though they account for over half of the bundles identified in the textbook under
discussion, were those that fell within the no-function category. One potential explanation for the large number
of no-function bundles may be the subject matter around which the textbook presents language items, subject
matter that may be guided more by introspection and intuition than by an analysis of general language needs
in relation to data on actual language use reflected in corpus materials. Koprowski (2005, p. 328) noticed a similar
outcome in his analysis of the usefulness of lexical phrases in contemporary textbooks and attributed such low
usefulness to “an unprincipled and careless selection process” by textbook developers, a process that, he adds,
likely centres on the selection of themes and topics rather than the usefulness of lexical phrases.
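Because the function-level comparison above rests on averaging item scores within each function, it can be reproduced mechanically. The following is a minimal Python sketch using only the four bundle scores quoted in this section; the grouping logic is illustrative and is not the study’s actual scoring pipeline:

```python
# Average the usefulness scores of bundles grouped by discourse function.
# Scores are the ones quoted in the text; the data structure is illustrative.

bundles = {
    "the end of the":     ("referential", 93.7),
    "at the end of":      ("referential", 80.2),
    "for the first time": ("referential", 58.2),
    "on the other hand":  ("discourse organizing", 50.5),
    "what do you think":  ("stance", 21.6),
}

# Group the scores under each function, then average them.
by_function = {}
for function, score in bundles.values():
    by_function.setdefault(function, []).append(score)

averages = {fn: sum(scores) / len(scores) for fn, scores in by_function.items()}

# Rank functions by average usefulness, highest first.
ranking = sorted(averages, key=averages.get, reverse=True)
```

With these five bundles the ranking comes out referential, discourse organizing, stance, mirroring the order of usefulness reported above.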
These findings point to the role corpus data can play in the identification and selection of lexical items for a textbook. As Timmis (2013) points out, corpora are less
appropriate as arbiters of what to teach and how to teach it, but they are valuable in reflecting details about the
nature of language and language use. In the case of general English, the present findings suggest a considerable difference
between corpora and the lexical bundles and functions presented in the textbook. A broader question is the
relationship between corpora, which typically include long passages of text, and typical textbooks, with their
short, often unconnected texts representing different genres. This question has been addressed in part by studies
that highlight differences in the use of lexical bundles depending on genre (e.g., Biber, 2006; Hyland, 2008). In
the case of a textbook, one question is whether such learning and teaching materials might be developed to
include relevant, engaging topics that serve to illustrate language that is truly general and widely used. The
inclusion of frequently recurring lexical bundles is particularly important as research shows that even advanced
learners of ESL have difficulties producing texts that reflect native speaker usage (Grami & Alkazemi, 2016).
Yet pedagogical materials rarely include activities or instruction on which words go together (Alali & Schmitt,
2012). Increased attention to the careful selection of lexical strings that reflect actual language use, as captured in
relevant corpora, can only support the challenging task of developing vocabulary skills, which includes the appropriate
use of lexical bundles. Such attention, combined with explorations of how the acquisition of lexical strings might be
facilitated in ESL classes, as illustrated in Jones and Haywood (2004) and more recently AlHassan (2016) and
AlHassan and Wood (2015), promises to further support the very challenging task of vocabulary
development in subsequent language learning.
References
Alali, F., & Schmitt, N. (2012). Teaching formulaic sequences: The same or different from teaching single words?
TESOL Journal, 3(2), 153–180.
AlHassan, L. (2016). Learning all the parts of the puzzle: Focused instruction of formulaic sequences through the
lens of activity theory. In H.M. McGarrell & D. Wood (Eds.), Contact - Refereed Proceedings of TESL Ontario
Research Symposium, 42(2), 44-65. Available at: http://www.teslontario.net/publication/research-
symposium
AlHassan, L., & Wood, D. (2015). The effectiveness of focused instruction of formulaic sequences in augmenting
L2 learners' academic writing skills: A quantitative research study. Journal of English for Academic Purposes,
17, 51-62.
Appel, R. (2016). Lexical bundles in L2 English academic writing: Proficiency level differences. In H.M.
McGarrell & D. Wood (Eds.), Contact - Refereed Proceedings of TESL Ontario Research Symposium, 42(2), 66-81.
Available at: http://www.teslontario.net/publication/research-symposium
Ari, O. (2006). Review of three software programs designed to identify lexical bundles. Language Learning &
Technology, 10(1), 30-37.
Biber, D. (2006). University language: A corpus-based study of spoken and written registers. Amsterdam: Benjamins.
Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). Longman grammar of spoken and written English.
London, UK: Longman.
Biber, D., & Reppen, R. (2002). What does frequency have to do with grammar teaching? Studies in Second
Language Acquisition, 24, 199–208.
Biber, D., Conrad, S., & Cortes, V. (2004). If you look at…: Lexical bundles in university teaching and textbooks.
Applied Linguistics, 25(3), 371-405.
Burton, G. (2012). Corpora and coursebooks: destined to be strangers forever? Corpora, 7(1), 91-108.
Byrd, P., & Coxhead, A. (2010). On the other hand: Lexical bundles in academic writing and in the teaching of
EAP. University of Sydney Papers in TESOL, 5, 31-64. Available at:
http://faculty.edfac.usyd.edu.au/projects/usp_in_tesol/pdf/volume05/Article02.pdf
Carter, R., & McCarthy, M. (1995). Grammar and the spoken language. Applied Linguistics, 16(2), 141-158.
Cheng, W., & Warren, M. (2007). Checking understandings: Comparing textbooks and a corpus of spoken
English in Hong Kong. Language Awareness, 16(3), 190-207.
Conrad, S., & Biber, D. (2004). The frequency and use of lexical bundles in conversation and academic prose.
Lexicographica, 20, 56-71.
Cortes, V. (2004). Lexical bundles in published and student disciplinary writing: Examples from history and
biology. English for Specific Purposes, 23, 397-423.
Csomay, E. (2013). Lexical bundles in discourse structure: A corpus-based study of classroom discourse. Applied
Linguistics, 34(3), 369-388.
Davies, M. (2008-) The Corpus of Contemporary American English: 520 million words, 1990-present. Available online at
http://corpus.byu.edu/
Fletcher, W. (2012). kfNgram: Information and help. Available at:
http://www.kwicfinder.com/kfNgram/kfNgramHelp.html
Gabrielatos, C. (2006). Corpus-based evaluation of pedagogical materials: If-conditionals in ELT coursebooks
and the BNC. In: 7th Teaching and Language Corpora Conference. Available online at
http://eprints.lancs.ac.uk/882/
Grami, G., & Alkazemi, B.Y. (2016). Improving ESL writing using an online formulaic sequence word-
combination checker. Journal of Computer Assisted Learning, 32(2), 95–104.
Hyland, K. (2008). As can be seen: Lexical bundles and disciplinary variation. English for Specific Purposes, 27, 4–21.
Jones, M., & Haywood, S. (2004). Facilitating the acquisition of formulaic sequences: An exploratory study in an
EAP context. In N. Schmitt (Ed.), Formulaic sequences (pp. 269-291). Amsterdam, Netherlands: John
Benjamins.
Koprowski, M. (2005). Investigating the usefulness of lexical phrases in contemporary coursebooks. ELT Journal,
59, 322–332.
Latham-Koenig, C., & Oxenden, C. (2013). English File: Intermediate student’s book. Oxford, UK: Oxford University
Press.
Lee, D., & McGarrell, H. (2011). Corpus-based/corpus-informed English language learner grammar textbooks:
An example of how research informs pedagogy. In H.M. McGarrell & D. Wood (Eds.). Contact - Refereed
Proceedings of TESL Ontario Research Symposium, 37(2), 78–100. Available at:
http://www.teslontario.net/publication/research-symposium
McCarten, J. (2010). Corpus-informed course book design. In A. O’Keeffe & M. McCarthy (Eds.), The Routledge
Handbook of Corpus Linguistics (pp. 413–427). London, UK: Routledge.
McCarthy, M. (2008). Accessing and interpreting corpus information in the teacher education context. Language
Teaching, 41(4), 563-574.
McDonough, J., Shaw, C., & Masuhara, H. (2012). Materials and methods in ELT: A teacher’s guide. Malden, MA:
Blackwell.
Meunier, F., & Gouverneur C. (2009). New types of corpora for new educational challenges: Collecting,
annotating and exploiting a corpus of textbook material. In K. Aijmer (Ed.), Corpora and Language
Teaching, (pp. 179-201). Amsterdam & Philadelphia: John Benjamins.
Nation, P. (2001). Learning vocabulary in another language. Cambridge, UK: Cambridge University Press.
Neary-Sundquist, C.A. (2015). Aspects of vocabulary knowledge in German textbooks. Foreign Language Annals,
48(1), 68–81.
Schmitt, N., & Carter, R. (2004). Formulaic sequences in action: An introduction. In N. Schmitt (Ed.), Formulaic
sequences: Acquisition, processing and use (pp. 1–22). Amsterdam, Netherlands: John Benjamins.
Shortall, T. (2007). The L2 syllabus: Corpus or contrivance? Corpora, 2(2), 157-185.
Timmis, I. (2013). Corpora and materials: Towards a working relationship. In B. Tomlinson (Ed.), Developing
materials for language teaching (2nd ed.) (pp. 461-474). London, UK: Bloomsbury Academic.
Wood, D., & Appel, R. (2013). Formulaic sequences in rst year university business and engineering textbooks: A
resource for EAP. In H.M. McGarrell & D. Wood (Eds.), Contact - Refereed Proceedings of TESL Ontario Research
Symposium, 39(2), 92-102. Available at: http://www.teslontario.net/publication/research-symposium
Wood, D. (2010). Formulaic language and second language speech fluency: Background, evidence and classroom applications.
Bloomsbury Publishing.
Wray, A. (2002). Formulaic language and the lexicon. Cambridge, UK: Cambridge University Press.
Betsy Quero*
Victoria University of Wellington, New Zealand
Abstract
The main goal of this study is to report on the number of words (vocabulary load) that native and non-native readers of medical
textbooks written in English need to know in order to meet the lexical demands of this type of subject-specific
(medical) text. For estimating the vocabulary load of medical textbooks, a corpus comparison approach and some existing
word lists, popular in ESP and EAP, were used. The present investigation aims to answer the following questions: (1) How
many words are needed beyond the General Service List (GSL; West, 1953), the Academic Word List (AWL; Coxhead,
2000), and the EAP Science List (Coxhead and Hirsh, 2007) to achieve a good lexical text coverage? and (2) What is the
vocabulary load of medical textbooks written in English? The implementation of this corpus comparison approach
consisted of: (1) making a written medical corpus of 5.4 million tokens, (2) compiling a general written corpus of the same
size (5.4 million tokens), (3) running both corpora (i.e., the medical and general) through some existing word lists (i.e., the
GSL, the AWL, and the EAP Science List), and (4) creating new subject-specific (medical) word lists beyond the existing
word lists used. The system for identifying medical words was based on Chung and Nation’s (2003) criteria for classifying
specialised vocabulary. The results of this investigation showed that there is a large number of subject-specific (medical)
words in medical textbooks. For both native and non-native speakers of English training to be health professionals, this
figure represents an enormous amount of vocabulary learning. This paper concludes by considering the value of creating
specialised medical word lists for research, teaching and testing purposes.
Key words: medical word lists, vocabulary load, English for medical purposes, text coverage.
Introduction
One of the main purposes of this study is to propose a methodology for the creation of subject-specific word lists
(i.e., medical word lists) that include the most salient vocabulary in medical texts. After reviewing previous
studies on the vocabulary load of medical textbooks, explaining the methodology, and presenting the
subject-specific lists of the most relevant words in medical texts, the results of this investigation attempt to: (1)
identify the lexical demands of medical texts using a corpus comparison approach, and (2) provide guidelines for
the creation of medical word lists organised by levels of frequency and salience.
Vocabulary Load
The number of known words (vocabulary load) needed for unassisted reading comprehension has been
investigated by several vocabulary researchers (Hirsh & Nation, 1992; Hu & Nation, 2000; Laufer, 1989; Nation,
2006). The first investigations (Laufer, 1989, 1992) on the vocabulary load of academic texts suggested a reading
* Tel: + 64 2102387831; E-mail: betsy.quero@vuw.ac.nz; PO Box 14416 Kilbirnie, Wellington 6241, New Zealand
comprehension threshold of 95% text coverage. More recent research on the vocabulary load of written texts
(Hu & Nation, 2000; Laufer & Ravenhorst-Kalovski, 2010; Nation, 2006; Schmitt, Jiang, & Grabe, 2011) has
indicated that a higher lexical threshold of 98% text coverage or more is required for optimal unassisted reading
comprehension. The present study explores the number of words that need to be known to achieve 98%
text coverage, and refers to 98% as an optimal lexical threshold.
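The coverage figures discussed throughout this article reduce to a simple proportion: the share of running words (tokens) in a text that belongs to the reader's known vocabulary. A minimal Python sketch of that calculation against the two thresholds; the token counts below are invented for illustration:

```python
# Lexical text coverage: percentage of a text's tokens that belong to the
# reader's known vocabulary. The 95% and 98% thresholds come from the
# studies cited above; the token counts here are invented.

def text_coverage(known_tokens: int, total_tokens: int) -> float:
    """Return coverage as a percentage of running words."""
    return 100.0 * known_tokens / total_tokens

# e.g., a reader who knows 5,300,000 of a corpus's 5,431,740 tokens
pct = text_coverage(known_tokens=5_300_000, total_tokens=5_431_740)

meets_earlier_threshold = pct >= 95.0  # Laufer's (1989, 1992) threshold
meets_optimal_threshold = pct >= 98.0  # optimal unassisted-reading threshold
```

In this invented case coverage falls between the two thresholds: adequate by the earlier 95% criterion but short of the optimal 98% one.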
Levels of Vocabulary
In order to estimate the number of words (vocabulary load) that learners of English for Medical Purposes (EMP)
need to know to meet the vocabulary demands of medical texts written in English and achieve
a suitable reading comprehension threshold (i.e., between 95% and 98% text coverage), the various levels of
vocabulary proposed by Schmitt and Schmitt (2012) and Nation (2001, 2013) will be identified in the corpus of
medical textbooks compiled for this study. Frequency (high-frequency, mid-frequency, and low-frequency words)
and text type (i.e., general, academic, scientific, technical or specialised) are the two main criteria currently used
to classify the vocabulary of academic and specialised texts.
Schmitt and Schmitt’s (2012) classification of the levels of vocabulary is a frequency-based one, and
consists of the following three bands or levels: high-frequency, mid-frequency, and low-frequency words. The
high-frequency level includes the first 3,000 most frequent words in a language. The mid-frequency level refers to
those words between the 4,000 and the 9,000 frequency levels. The low-frequency level comprises those words
beyond the 9,000 frequency band. The concept of mid-frequency vocabulary was first introduced in Schmitt and
Schmitt’s (2012) classification. The introduction of this frequency level has served to stress the importance of
mid-frequency vocabulary and of words beyond the 3,000 most frequent words of the English language.
Nation’s (2013) classification, which was initially presented in 2001 and then revised in 2013, is both a
frequency and text-type based classification. Nation’s (2001) frequency levels included two frequency bands (i.e.,
high-frequency vocabulary and low-frequency vocabulary) and two kinds of text-type words (academic
vocabulary and technical vocabulary). In 2013 Nation added to his classification of vocabulary levels the mid-
frequency band proposed by Schmitt and Schmitt in 2012. According to Nation (2013), there are three levels of
frequency-based words, that is, high-frequency words, mid-frequency words and low-frequency words, and two
levels of text-type words (academic words and technical words), which are particularly likely to occur in academic
and specialised texts. Both the frequency and text-type based aspects of Nation’s (2013) classification are analysed
and discussed in the findings and discussion sections of this study.
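The frequency-based side of the two classifications above amounts to a lookup on a word family's frequency rank. A small sketch, with the band boundaries as stated in the text (the function name is ours):

```python
# Frequency-band classification following Schmitt and Schmitt (2012) and
# Nation (2013): high-frequency = the first 3,000 word families,
# mid-frequency = up to the 9,000 level, low-frequency = beyond that.

def frequency_band(rank: int) -> str:
    """Classify a word family by its frequency rank (1 = most frequent)."""
    if rank <= 3_000:
        return "high-frequency"
    if rank <= 9_000:
        return "mid-frequency"
    return "low-frequency"
```

For example, a family ranked 4,500 falls in the mid-frequency band, while one ranked 12,000 is low-frequency. The text-type dimension (academic, technical) cuts across these bands and cannot be derived from rank alone.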
Despite the criticisms the GSL has received over the years, it is the general word list used in this study to replicate the corpus comparison
approach. The GSL is used in this investigation in order to: (1) serve as a starting point when estimating the
vocabulary load of medical texts, and (2) allow comparisons with previous studies in ESP that have also used the
GSL to look at the number of words in the health and medical sciences.
The other existing word list used in the present study is Coxhead’s (2000) Academic Word List (AWL). The
AWL works in conjunction with the GSL. That is, it includes words that do not occur in the GSL. Up to the
present, the AWL has been extensively used to learn, teach, and research academic vocabulary. To make the
AWL, Coxhead (2000) gathered a corpus of 3,513,330 tokens. This corpus comprised a variety of
academic texts from 28 academic subject areas, grouped, seven apiece, into the following four
disciplines: Arts, Commerce, Law, and Science. The AWL contains 570 word families and provides around
10% text coverage of academic texts. To validate the AWL, Coxhead (2000) created a second academic
corpus (comprising 678,000 tokens), over which the list accounted for 8.5% coverage.
Two new academic word lists have been recently developed: (1) The New Academic Word List (NAWL)
created by Browne, Culligan, and Phillips in 2013 and available at http://www.newacademicwordlist.org/, and
(2) The New Academic Vocabulary List (AVL) created by Gardner and Davies (2014) and available at
http://www.academicvocabulary.info/download.asp. Both the NAWL and the AVL were developed from large
academic corpora of 288 and 120 million tokens, respectively. Despite the current availability of these more
recently developed academic word lists (i.e., the NAWL and the AVL), the decision to use Coxhead’s (2000) AWL
for the present study is based on the fact that for more than a decade the AWL has been widely researched and
used by ESP researchers to calculate the lexical demands posed by written academic texts.
Drawing on some aspects of the methodology used by Coxhead (2000) to create the AWL, various subject-
specific word lists have been developed: an EAP Science Word List (Coxhead & Hirsh, 2007), three medical
academic word lists (Chen & Ge, 2007; Lei & Liu, 2016; Wang, Liang, & Ge, 2008), a nursing word list (Yang,
2015) a pharmacology word list (Fraser, 2007), some engineering word lists (Mudraya, 2006; Ward, 1999, 2009),
a business word list (Konstantakis, 2007), and an agricultural word list (Martínez, Beck, & Panza, 2009). While
some of these subject-specific lists have been developed to work in conjunction with the GSL (e.g., Yang’s (2015)
Nursing Word List, and Wang, Liang and Ge’s (2008) Medical Academic Word List), other word lists have been
created to work in conjunction with both the GSL and the AWL (e.g., Coxhead and Hirsh’s (2007) EAP Science List,
and Fraser’s (2007) Pharmacology Word List).
Coxhead and Hirsh’s (2007) EAP Science List is another existing word list used in the present study to
estimate the vocabulary load of medical textbooks. Coxhead and Hirsh’s (2007) study aims to create a science
word list that could make up for the lower coverage of the AWL over science texts (Coxhead, 2000). Criteria of
range, frequency of occurrence, and dispersion were considered for selecting the words to be added to the EAP
Science List. This list is based on a written science corpus of English comprising a total of 2,637,226 tokens. As
Coxhead and Hirsh (2007, p. 72) reported, the 318 word families in the EAP Science List cover 3.79% over the
science corpus compiled to create this list. Moreover, the EAP Science List covers 0.61% over the Arts subcorpus,
0.54% over the Commerce subcorpus, 0.34% over the Law subcorpus, and 0.27% over the fiction corpus
compiled by Coxhead (2000). The above-mentioned coverage results confirm the scientific nature of the EAP
Science List. Coxhead and Hirsh’s (2007) study also attempts to draw a line between the percentage of general
vocabulary versus the percentage of science-specific vocabulary in science texts written in English that EAP
students are required to read at university. In addition to the GSL and the AWL, Coxhead and Hirsh’s (2007)
EAP Science List is used in the present investigation when adopting the corpus comparison approach to estimate
the vocabulary load of medical textbooks.
Since the present study investigates the vocabulary load of medical texts beyond the most commonly used existing
general, academic and scientific word lists, these lists are used as the starting point to estimate the lexical
coverage of medical texts. By choosing a set of commonly used general/academic/scientific word lists, this study
focuses on general/academic/scientific vocabulary that has been extensively presented in EAP and ESP
teaching materials, assessments, and research. However, this investigation by no means attempts to undermine
the value of more recently created general (i.e., the two NGSLs) and academic (i.e., the NAWL and the AVL)
word lists. Also, to the best of our knowledge, no study has so far estimated the vocabulary load of medical
textbooks using this set of word lists widely used in EAP and ESP (i.e., the GSL, the
AWL, and the EAP Science List) as a starting point.
Moreover, the existing pedagogical vocabulary lists of general high-frequency words (West’s GSL),
academic words (Coxhead’s AWL), and scientific words (Coxhead and Hirsh’s EAP Science List) cannot provide
complete coverage of the kinds of vocabulary in subject-specific texts. This is particularly because the
GSL, the AWL and the EAP Science List were not designed to identify all the different kinds of vocabulary of
specialised texts. For this reason, a more inclusive approach to identifying the various levels of vocabulary that occur
in medical texts could provide a clearer picture of the vocabulary demands of medical textbooks.
Research Questions
The present investigation looks at the vocabulary load of medical texts and explores the role played by the levels
of vocabulary proposed by Nation (2013) and Schmitt and Schmitt (2012). In particular, the three frequency-
based levels of vocabulary (high-, mid-, and low-frequency words) and four topic-based sets of word lists (the GSL, the
AWL, the EAP Science List, and some specialised medical lists) that draw on words from these three frequency
levels were used in the analyses of the lexical frequency profiles of the medical texts investigated here. With the main
goal of estimating the vocabulary load of medical textbooks in mind, the findings of this study provide answers
to the following research questions:
1) How many words are needed beyond the General Service List (GSL; West, 1953), the Academic Word
List (AWL; Coxhead, 2000), and the EAP Science List (Coxhead and Hirsh, 2007) to achieve a good
lexical text coverage?
2) What is the vocabulary load of medical textbooks written in English?
Methodology
The methodology used to estimate the number of words (vocabulary load) associated with the various levels of
vocabulary found in a corpus of medical textbooks is discussed in this section. The implementation of this
methodology involves compiling the medical and general corpora, adopting a corpus comparison approach,
adapting a semantic rating scale, creating a series of medical word lists, and justifying the unit of counting
selected for the present study.
corpus comparison. General words referring to abbreviations, living organisms, parts of the body, and participants in
the health and medical community were classified as medical words. The manual checking of all the word types
(including content words, abbreviations, acronyms and proper nouns) classified using the semantic rating scale
involved: (1) looking up word types with unclear medical meanings in a specialised medical dictionary, and (2)
confirming the medical senses of these words in their actual contexts of occurrence in the medical corpus.
Results
How many words are needed beyond the General Service List (GSL; West, 1953), the Academic
Word List (AWL; Coxhead, 2000), and the EAP Science List (Coxhead and Hirsh, 2007) to achieve
a good lexical text coverage?
This question is answered by presenting the cumulative text coverage results of running three sets of word lists:
(Set 1) the GSL, AWL, and EAP Science List, (Set 2) the three 1,000 MGEN lists, and (Set 3) the twenty-three MED
lists through the medical corpus using the Range software (Heatley et al., 2002). First, the cumulative coverage
of the GSL1 and GSL2, the AWL and the EAP Science List, and the words outside these lists is presented in
Table 1. Then, the cumulative text coverage of these three sets of word lists is summarised in Table 2.
Table 1 suggests that a further 22.12% coverage from the words outside the lists (i.e., the GSL1 and GSL2, AWL, and EAP
Science List) is still needed to achieve an optimal lexical threshold of 98% (i.e., 75.88% coverage from word types
in the lists plus 22.12% coverage from word types outside the lists). In order to find out how many more word types
are required beyond the four existing word lists summarised in Table 1, we applied the semantic rating scale
described in the methodology section of the present study. This rating scale served as a semantic checking system
to classify over 30,000 medical word types (see Quero, 2015) occurring in the medical corpus and to create the 26
medical word lists whose text coverage results are summarised in Table 2.
Table 1
Cumulative Coverage of the GSL1 and GSL2, the AWL and the EAP Science List over the Medical Corpus including the Words
outside the Lists
Word List Coverage % Number of Word Types
GSL1, GSL2, AWL, EAP Science List 75.88 9,412
Words outside the lists 24.12 45,942
Total 100.00 55,354
Table 2
Cumulative Coverage of the GSL, the AWL, the EAP Science List, the Three 1,000 MGEN Lists, and the Twenty-three 1,000
MED Lists
Word List Number of Tokens Coverage % Number of Word Types
GSL1, GSL2, AWL, EAP Science List 4,121,539 75.88 9,412
MGEN (three 1,000) lists 607,498 11.18 3,000
MED (twenty-three 1,000) lists 542,747 10.00 23,000
Cumulative total of existing lists 5,271,784 97.06 35,414
Note in Table 2 that the cumulative text coverage of the GSL1, GSL2, AWL and EAP Science List
(75.88%) indicates that an additional 21.18% coverage was required to reach the 97.06% achieved by the full set of lists.
Moreover, the results in Table 2 show that 26,000 new medical word types (i.e., 3,000 medical word types in the MGEN
lists and 23,000 medical word types in the MED lists) need to be added to the GSL, AWL, and EAP Science List
for readers of medical texts to be able to understand 97.06% of the words they meet when they read medical
textbooks in English.
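The cumulative figures in Tables 1 and 2 follow from assigning every token to the first word list that contains it, as the Range software does. A toy Python sketch of that counting procedure; the mini word lists and sample sentence below are invented stand-ins, not the actual GSL/AWL/MED lists:

```python
from collections import Counter

# Range-style cumulative coverage: each token is assigned to the first
# word list that contains it; the remainder count as "words outside the
# lists". The word lists and text here are toy stand-ins.

word_lists = [
    ("GSL1", {"the", "is", "through", "and"}),
    ("AWL",  {"data", "analyse"}),
    ("MED1", {"artery", "vein", "blood"}),
]

tokens = "the blood is pumped through the artery and the vein".split()

counts = Counter()
for token in tokens:
    for list_name, members in word_lists:
        if token in members:
            counts[list_name] += 1
            break
    else:  # no list matched this token
        counts["outside"] += 1

coverage = {name: 100 * n / len(tokens) for name, n in counts.items()}
```

Summing the per-list percentages plus the "outside" share always yields 100%, which is why the tables report cumulative coverage and a residual words-outside-the-lists figure.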
Table 3
Coverage of the GSL1 and GSL2, the AWL and the EAP Science List over the Medical Corpus
Word List Coverage % Number of Word Types
GSL1 55.62 3,291
GSL2 5.97 2,415
AWL 8.23 2,418
EAP Science List 6.06 1,288
Cumulative total 75.88 9,412
In relation to the coverage of the AWL over medical texts, Table 3 shows that the AWL accounts for
8.23% of the 5.4 million tokens of the medical corpus. 527 of the 3,107 word types in the AWL were identified
as medical words. Examples of medical words in the AWL include depression, labour, and topical. When compared
with the coverage of the GSL1 over medical texts, the 8.23% coverage of the AWL represents good coverage of
academic words over medicine. Since the lexical coverage of the AWL is 2.26% higher than that of the GSL2,
these coverage results suggest that it may be more useful for ESP medical students to start learning the AWL
right after they have acquired the words in the GSL1. The AWL is a particularly useful word list to learn when
ESP medical students need to focus on academic words. For this reason, the AWL is a helpful list for medical
students taking first-year ESP reading courses.
As also indicated in Table 3, the high coverage of the EAP Science List over medicine (6.06%), when
compared with the coverage of the GSL1, GSL2, and the AWL over medical texts, shows that the EAP Science List
plays an important complementary role in helping ESP medical students become familiar with scientific words
that occur in texts of health and medicine (see Coxhead & Quero, 2015, for further discussion of the behaviour
of the EAP Science List over medical texts). Examples of scientific words with a medical meaning in the
EAP Science List are cell, anatomy, and digest. These results also suggest that the EAP Science List is of particular
interest to science and medical students rather than to learners of general English. Additionally, the lexical
coverage results of the GSL, AWL and EAP Science List over the medical corpus suggest that the learning of
high-frequency general, academic and scientific words in English could be sequenced differently for ESP medical
students.
Table 4
Cumulative Coverage of the Three 1,000 MGEN Lists
Word List Coverage % Number of Word Types
MGEN1 8.49 1,000
MGEN2 1.82 1,000
MGEN3 0.87 1,000
Cumulative total 11.18 3,000
Let us now look at the text coverage of the new general-purpose medical word lists (i.e., the three 1,000 MGEN lists). These
3,000 medical word types are divided into three 1,000-word lists, referred to as MGEN1, MGEN2, and
MGEN3 in Table 4. Examples of medical words in the MGEN lists are syndromes, radiologist, and anatomical. Note
also in Table 4 that the three 1,000 MGEN lists together provide a coverage of 11.18%. This means that the GSL, AWL,
EAP Science List and the three MGEN lists together cover 87.06% (i.e., 75.88% for the GSL, AWL and EAP
Science List, plus 11.18% for the three MGEN lists) of medical texts. This cumulative coverage of 87.06%
indicates that a further 10.94% coverage is still needed to reach an optimal lexical threshold of 98%.
Table 5 gives the coverage details of the twenty-three frequency-ranked 1,000 MED word lists that are
unique to the medical corpus. As can be observed in Table 5, a large number of low-frequency medical
words occur in medical texts. Examples of medical words in the 23 MED lists are subcutaneously, polyarteritis,
and catarrhalis.
Table 5
Coverage of the Twenty-three 1,000 MED Lists
Word List Coverage % Number of Word Types
MED1 5.16 1,000
MED2 1.46 1,000
MED3 0.82 1,000
MED4 0.54 1,000
MED5 0.39 1,000
MED6 0.30 1,000
MED7 0.23 1,000
MED8 0.18 1,000
MED9 0.15 1,000
MED10 0.12 1,000
MED11 0.10 1,000
MED12 0.09 1,000
MED13 0.07 1,000
MED14 0.06 1,000
MED15 0.06 1,000
MED16 0.05 1,000
MED17 0.04 1,000
MED18 0.04 1,000
MED19 0.04 1,000
MED20 0.04 1,000
MED21 0.02 1,000
MED22 0.02 1,000
MED23 0.02 1,000
Cumulative total 10.00 23,000
Table 6 shows that 2.94% of the tokens and 19,942 word types occur in the medical corpus but not in the 30
existing word lists. These words outside the lists include single letters of the alphabet or Roman numerals,
marginal medical words (e.g., chap, an abbreviation of chapter), prefixes (e.g., non- and micro-), and low-frequency
medical words (e.g., encephalographic and haematologist).
Table 6
Coverage of the GSL, the AWL, the EAP Science List, the Three 1,000 MGEN Lists, and the Twenty-three 1,000 MED
Lists, Including Words outside the Existing Lists
Word List Number of Tokens Coverage % Number of Word Types
Cumulative total of existing lists 5,271,784 97.06 35,414
Words outside the lists 159,956 2.94 19,942
Total 5,431,740 100.00 55,354
The cumulative coverage of all 30 existing lists (i.e., the GSL, AWL, EAP Science List, the three
MGEN lists, and the twenty-three MED lists) and of the words outside these lists is compared in Table 6. The results in
Table 6 show that readers of medical texts would need to know a large number of the 19,942 word types left
outside these 30 word lists in order to approach a 98% text coverage.
Based on the cumulative total coverage (97.06%) of the word lists shown in Table 6, we conclude that at least
twenty-two 1,000 low-frequency medical word lists would need to be added to these 30 existing lists to
increase the text coverage from 97.06% to 97.50% and start getting closer to 98% (the optimal lexical threshold).
Another way to get closer to 98% with a smaller number of word types could be to add word lists of high- and
mid-frequency words with general academic meaning that, for different reasons, are not included in the existing
general, academic, and scientific word lists (i.e., the GSL, AWL, EAP Science List) used as part of the present
investigation. (See also Appendix A for text coverage and occurrence figures for all the lists discussed in this
study.)
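As a rough illustration of the arithmetic behind this conclusion, the sketch below counts how many additional word lists would be needed before cumulative coverage reaches a target figure. The per-list coverage values in the example are hypothetical stand-ins chosen to match the reported 97.06% to 97.50% step, not the study's actual Appendix A figures.

```python
def lists_needed(band_coverages, start, target):
    """Return how many extra word lists are needed before cumulative
    coverage first reaches `target` percent, starting from `start`.
    Returns None if the supplied bands never reach the target."""
    total = start
    for n, coverage in enumerate(band_coverages, 1):
        total += coverage
        if total >= target - 1e-9:  # small tolerance for float drift
            return n
    return None

# Hypothetical tail of low-frequency lists, each adding about 0.02% coverage.
tail = [0.02] * 22
print(lists_needed(tail, start=97.06, target=97.50))  # 22
```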
Discussion
Next, we discuss the value of the twofold methodology adopted here for identifying medical words. This
discussion refers to the following aspects of the present study: (1) the semantic rating scale, (2) the size of the
corpus, (3) the corpus comparison approach, and (4) the new medical word lists.
corpus used for creating the lists. In spite of the usefulness of the semantic rating scale for making
decisions on the amount of content-area vocabulary found in medical texts, its implementation proved
to be very demanding and time-consuming.
3. Medical corpus limited to textbooks. The medical texts included in the medical corpus compiled for the
present investigation were restricted to textbooks. For future research to estimate the vocabulary load of
medical texts, it would be worth including a variety of text types (such as medical articles in specialised
journals and scientific magazines, book chapters, technical reports, and laboratory manuals) when
creating a specialised corpus of medical texts written in English.
4. Pedagogical value of the medical word lists. The results of this investigation have shown that readers of
medical textbooks need to know about 26,000 medical word types beyond existing word lists – as
represented by the GSL, AWL, and EAP Science List – to be able to meet the lexical
demands of medical textbooks. As detailed in Appendix A, the pedagogical value of the last two-thirds
of the new medical word lists (i.e., around 16,000 medical word types needed for an additional 1%
cumulative text coverage) is questionable. The acquisition of 26,000 medical words is a vocabulary
learning goal that seems unrealistic to achieve in the restricted time span (one to two years at most) of
most English for Medical Purposes reading courses. The need to learn these 26,000 medical word
types clearly indicates that the technical vocabulary of medicine is very large and represents a major
learning burden for students learning to read medical texts written in English.
Vocabulary expansion of medical terms should be an important goal for teachers of English for Medical
Purposes. In order to help ESP learners better cope with the lexical demands of medical texts and the large
number of medical words required to achieve an adequate lexical threshold, ESP teachers need to:
1. Design a lexical syllabus to teach vocabulary learning strategies (such as guessing from context,
using mnemonic techniques, using word cards, and doing extensive reading) that enable medical students to
cope with most of the new vocabulary independently.
2. Encourage learners to do extensive reading on topics that address the vocabulary they are trying to
learn.
3. Promote the use of genuine lexical contexts and provide authentic examples of medical vocabulary.
Examples of authentic reading materials for meeting and learning medical terms in context are
medical textbooks like those used to create the medical corpus mentioned in the present study.
4. Emphasise word relationships such as lexical bundles, word frequency, and phraseology.
5. Set ambitious vocabulary learning goals for students of around 50 words per week.
6. Group the vocabulary that needs to be learnt in a manageable format (e.g., word family lists).
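The scale of this learning task can be put in perspective with back-of-the-envelope arithmetic based on the figures above: at the suggested pace of about 50 words per week, the roughly 26,000 medical word types beyond the existing lists would take on the order of a decade, which is why strategy training rather than direct teaching of every word is the realistic route.

```python
# Back-of-the-envelope check on the vocabulary learning goal discussed above.
words_needed = 26_000    # medical word types beyond the existing lists
words_per_week = 50      # the ambitious weekly goal suggested above

weeks = words_needed / words_per_week
years = weeks / 52       # assuming uninterrupted year-round study

print(int(weeks))  # 520
print(years)       # 10.0
```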
In conclusion, it is important to equip medical students in the ESP classes at university with the vocabulary
learning strategies necessary to manage the acquisition of the massive number of words required to achieve good
reading comprehension of medical texts written in English.
References
Brezina, V., & Gablasova, D. (2013). Is there a core general vocabulary? Introducing the new general service list.
Applied Linguistics, 1–13. https://doi.org/10.1093/applin/amt018
Browne, C. (2013). The new general service list: Celebrating 60 years of vocabulary learning. The Language Teacher,
37(4), 13–16.
Chen, Q., & Ge, G.-C. (2007). A corpus-based lexical study on frequency and distribution of Coxhead’s AWL
word families in medical research articles (RAs). English for Specific Purposes, 26(4), 502–514.
Chung, T. M., & Nation, I. S. P. (2003). Technical vocabulary in specialised texts. Reading in a Foreign Language,
15(2), 103–116.
Chung, T. M., & Nation, I. S. P. (2004). Identifying technical vocabulary. System, 32(2), 251–263.
Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34(2), 213–238.
Coxhead, A., & Hirsh, D. (2007). A pilot science word list for EAP. Revue Française de Linguistique Appliquée, 7(2), 65–
78.
Coxhead, A., & Quero, B. (2015). Investigating a Science Vocabulary List in university medical textbooks.
TESOLANZ Journal, 23, 55–65.
Engels, L. K. (1968). The fallacy of word-counts. IRAL - International Review of Applied Linguistics in Language
Teaching, 6(3), 213–231. https://doi.org/10.1515/iral.1968.6.1-4.213
Fauci, A. S., Braunwald, E., Kasper, D. L., Hauser, S. L., Longo, D. L., Jameson, J. L., & Loscalzo, J. (2008).
Harrison’s principles of internal medicine (17th Edition). New York: McGraw-Hill. Retrieved from
http://highered.mcgraw-hill.com/sites/0071466339/information_center_view0/table_of_contents.html
Fraser, S. (2005). The lexical characteristics of specialized texts. In K. Bradford-Watts, C. Ikeguchi, & M.
Swanson (Eds.), JALT2004 conference proceedings (pp. 318–327). Tokyo: JALT. Retrieved from http://jalt-
publications.org/archive/proceedings/2004/E115.pdf
Fraser, S. (2006). The nature and role of specialized vocabulary: What do ESP teachers and learners need to
know? Hiroshima Studies in Language and Language Education, 9, 63–75.
Fraser, S. (2007). Providing ESP learners with the vocabulary they need: Corpora and the creation of
specialized word lists. Hiroshima Studies in Language and Language Education, 10, 127–145.
Gardner, D., & Davies, M. (2014). A new academic vocabulary list. Applied Linguistics, 35(3), 305–327.
https://doi.org/10.1093/applin/amt015
Goldman, L., & Ausiello, D. (Eds.). (2008). Cecil textbook of internal medicine (23rd edition). Philadelphia, PA: W.B.
Saunders Elsevier. Retrieved from http://www.us.elsevierhealth.com/cecil-medicine/goldman-cecil-
medicine-expert-consult/9781416028055/
Heatley, A., Nation, I. S. P., & Coxhead, A. (2002). Range [Computer software]. Wellington, New Zealand:
Victoria University of Wellington.
Hirsh, D., & Nation, I. S. P. (1992). What vocabulary size is needed to read unsimplified texts for pleasure?
Reading in a Foreign Language, 8, 689–696.
Hu, M., & Nation, I. S. P. (2000). Unknown vocabulary density and reading comprehension. Reading in a Foreign
Language, 13(1), 403–430.
Hwang, K., & Nation, I. S. P. (1989). Reducing the vocabulary load and encouraging vocabulary learning
through reading newspapers. Reading in a Foreign Language, 6(1), 323–335.
Hyland, K., & Tse, P. (2007). Is there an “academic vocabulary”? TESOL Quarterly, 41(2), 235–253.
https://doi.org/10.1002/j.1545-7249.2007.tb00058.x
Konstantakis, N. (2007). Creating a business word list for teaching business English. Elia, 7, 79–102.
Laufer, B. (1989). What percentage of text-lexis is essential for comprehension? Special Language: From Humans
Thinking to Thinking Machines, 316–323.
Laufer, B. (1992). How much lexis is necessary for reading comprehension? In H. Béjoint & P. J. Arnaud (Eds.),
Vocabulary and applied linguistics (Vol. 3, pp. 126–132). London: Macmillan.
Laufer, B., & Ravenhorst-Kalovski, G. C. (2010). Lexical threshold revisited: Lexical text coverage, learners’
vocabulary size and reading comprehension. Reading in a Foreign Language, 22(1), 15–30.
Lei, L., & Liu, D. (2016). A new medical academic word list: A corpus-based study with enhanced methodology.
Journal of English for Academic Purposes, 22, 42–53. https://doi.org/10.1016/j.jeap.2016.01.008
Martínez, I. A., Beck, S. C., & Panza, C. B. (2009). Academic vocabulary in agriculture research articles: A
corpus-based study. English for Specific Purposes, 28(3), 183–198.
Mudraya, O. (2006). Engineering English: A lexical frequency instructional model. English for Specific Purposes,
25(2), 235–256.
Nation, I. S. P. (2001). Learning vocabulary in another language. Cambridge: Cambridge University Press.
Nation, I. S. P. (2006). How large a vocabulary is needed for reading and listening? Canadian Modern Language
Review/La Revue Canadienne Des Langues Vivantes, 63(1), 59–82.
Nation, I. S. P. (2013). Learning vocabulary in another language (Second edition). Cambridge: Cambridge University
Press.
Nation, I. S. P. (2016). Making and using word lists for language learning and testing. Amsterdam: John Benjamins
Publishing Company. Retrieved from http://www.jbe-platform.com/content/books/9789027266279
Quero, B. (2015). Estimating the vocabulary size of L1 Spanish ESP learners and the vocabulary load of medical textbooks.
(Unpublished PhD thesis). Victoria University of Wellington, Wellington, New Zealand.
Read, J. (2000). Assessing vocabulary. Cambridge: Cambridge University Press.
Read, J. (2007). Second language vocabulary assessment: Current practices and new directions. International
Journal of English Studies, 7(2), 105–125.
Schmitt, N., Jiang, X., & Grabe, W. (2011). The percentage of words known in a text and reading
comprehension. The Modern Language Journal, 95(1), 26–43.
Schmitt, N., & Schmitt, D. (2012). A reassessment of frequency and vocabulary size in L2 vocabulary teaching.
Language Teaching. Advance online publication. https://doi.org/10.1017/S0261444812000018
Wang, J., Liang, S., & Ge, G. (2008). Establishment of a medical academic word list. English for Specific Purposes,
27(4), 442–458.
Wang, K., & Nation, I. S. P. (2004). Word meaning in academic English: Homography in the Academic Word
List. Applied Linguistics, 25(3), 291–314. https://doi.org/10.1093/applin/25.3.291
Ward, J. (1999). How large a vocabulary do EAP engineering students need? Reading in a Foreign Language, 12(2),
309–324.
Ward, J. (2009). A basic engineering English word list for less proficient foundation engineering undergraduates.
English for Specific Purposes, 28(3), 170–182.
West, M. P. (1953). A general service list of English words. London: Longman.
Yang, M.-N. (2015). A nursing academic word list. English for Specific Purposes, 37, 27–38.
https://doi.org/10.1016/j.esp.2014.05.003
Appendix A
Text Coverage and Frequency of Occurrence of the Medical Corpus by the GSL, AWL, EAP Science List and the Twenty-Six
Medical Word Lists
Word List Tokens # Tokens % Types # Types % Families #
GSL1 3,021,029 55.62 3,291 5.95 981
GSL2 324,020 5.97 2,415 4.36 886
AWL 447,254 8.23 2,418 4.37 565
EAP Sc. List 329,236 6.06 1,288 2.33 316
MGEN1 461,169 8.49 1,000 1.81 n/a
MGEN2 98,853 1.82 1,000 1.81 n/a
MGEN3 47,476 0.87 1,000 1.81 n/a
MED1 280,114 5.16 1,000 1.81 n/a
MED2 79,208 1.46 1,000 1.81 n/a
MED3 44,413 0.82 1,000 1.81 n/a
MED4 29,254 0.54 1,000 1.81 n/a
MED5 21,085 0.39 1,000 1.81 n/a
MED6 16,127 0.30 1,000 1.81 n/a
MED7 12,593 0.23 1,000 1.81 n/a
MED8 10,018 0.18 1,000 1.81 n/a
MED9 8,168 0.15 1,000 1.81 n/a
MED10 6,635 0.12 1,000 1.81 n/a
MED11 5,546 0.10 1,000 1.81 n/a
MED12 4,773 0.09 1,000 1.81 n/a
MED13 4,000 0.07 1,000 1.81 n/a
MED14 3,502 0.06 1,000 1.81 n/a
MED15 3,000 0.06 1,000 1.81 n/a
MED16 2,978 0.05 1,000 1.81 n/a
MED17 2,000 0.04 1,000 1.81 n/a
MED18 2,000 0.04 1,000 1.81 n/a
MED19 2,000 0.04 1,000 1.81 n/a
MED20 2,000 0.04 1,000 1.81 n/a
MED21 1,333 0.02 1,000 1.81 n/a
MED22 1,000 0.02 1,000 1.81 n/a
MED23 1,000 0.02 1,000 1.81 n/a
Words outside the lists 159,956 2.94 19,942 36.03 0
Total 5,431,740 100.00 55,354 100.00 2,748
Acknowledgements
I would like to thank Emeritus Professor Paul Nation and Dr. Averil Coxhead of Victoria University of
Wellington for their unfailing assistance and advice on an earlier version of this article. I am also grateful to the
two anonymous TESOL International Journal reviewers for their constructive critiques and comments that have
helped enhance the quality of this work.