TESOL International Journal
Teaching English to Speakers of Other Languages
Special Issue
Teaching, Learning, and Assessing Vocabulary
Guest Editors
Marina Dodigovic
Stephen Jeaco
Rining Wei
Chief Editor
Xinghua Liu
Published by the TESOL International Journal
http://www.tesol-international-journal.com
This book is in copyright. Subject to statutory exception no reproduction of any part may take place without
the written permission of English Language Education Publishing.
No unauthorized photocopying
All rights reserved. No part of this book may be reproduced, stored in a retrieval system or transmitted in any
form or by any means, electronic, mechanical, photocopying or otherwise, without the prior written permission
of English Language Education Publishing.
ISSN. 2094-3938
TESOL International Journal
Chief Editor
Xinghua Liu
Shanghai Jiao Tong University, China
Associate Editors
Hanh thi Nguyen - Hawaii Pacific University, USA
Dean Jorgensen - Gachon University, South Korea
Joseph P. Vitta - Queen’s University Belfast, UK
Khadijeh Jafari - Islamic Azad University of Gorgan, Iran
Editorial Board
Ai, Haiyang - University of Cincinnati, USA
Anderson, Tim - University of British Columbia, Canada
Arabmofrad, Ali - Golestan University, Iran
Batziakas, Bill - Queen Mary University of London, UK
Behfrouz, Behnam - University of Buraimi, Oman
Bigdeli, Rouhollah Askari - Yasouj University, Iran
Bretaña-Tan, Ma Joji - University of the Philippines, Philippines
Çakir, İsmail - Erciyes University, Turkey
Chang, Tzu-shan - Wenzao Ursuline University of Languages, Taiwan
Choi, Jayoung - Georgia State University, USA
Chuenchaichon, Yutthasak - Naresuan University, Thailand
Chung, Edsoulla - University of Cambridge, UK
Cutrone, Pino - Nagasaki University, Japan
Dang, Doan-Trang Thi - Monash University, Australia
Deng, Jun - Central South University, China
Derakhshan, Ali - Golestan University, Iran
Dodigovic, Marina - American University of Armenia, Armenia
Farsani, Mohammad Amini - Kharazmi University, Iran
Floris, Flora Debora - Petra Christian University, Indonesia
Hos, Rabia - Zirve University, Turkey
Ji, Xiaoling - Shanghai Jiao Tong University, China
Jiang, Xuan - St. Thomas University, USA
Kambara, Hitomi - University of Oklahoma, USA
Khajavi, Yaser - Shiraz University, Iran
Lee, Sook Hee - Charles Sturt University, Australia
Li, Chili - Hubei University of Technology, China
Li, Liang - Jilin Normal University, China
Li, Yiying - Wenzao Ursuline University, Taiwan
Lo, Yu-Chih - National Chin-Yi University of Technology, Taiwan
Nguyen, Ha Thi - Monash University, Australia
Niu, Ruiying - Guangdong University of Foreign Studies, China
O'Brien, Lynda - University of Nottingham Ningbo, China
Rozells, Diane Judith - Sookmyung Women’s University, S. Korea
Salem, Ashraf Atta Mohamed Safein - Sadat Academy for Management Sciences, Egypt
Sultana, Shahin - B. S. Abdur Rahman University, India
Ta, Thanh Binh - Monash University, Australia
Tran-Dang, Khanh-Linh - Monash University, Australia
Ulla, Mark B. - Mindanao State University, Philippines
Witte, Maria Martinez - Auburn University, USA
Wu, Chiu-hui - Wenzao Ursuline University of Languages, Taiwan
Yan, Yanxia - Xinhua University, China
Yu, Jiying - Shanghai Jiao Tong University, China
Zhang, Xinling - Shanghai University, China
Zhao, Peiling - Central South University, China
Zhao, Zhongbao - Hunan Institute of Science and Technology, China
Contents
A New Inventory of Vocabulary Learning Strategy for Chinese Tertiary EFL Learners
Xuelian Xu, Wen-Cheng Hsu 7
Stephen Jeaco
Xi’an Jiaotong-Liverpool University, China
Rining Wei
Xi’an Jiaotong-Liverpool University, China
*Tel.: +374 60612-740; Email: mdodigovic@aua.am; 40 Marshal Baghramyan Ave., Yerevan, 0019, Armenia
through which a new word must pass as it gains entry into the learner’s lexicon (p. 373). According to Gu and
Johnson (1996), vocabulary learning strategies are classified into four groups: metacognitive, cognitive, memory,
and activation strategies. Metacognitive strategies include selective attention as well as self-initiation strategies,
while cognitive strategies include the use of dictionaries, guessing and note-taking strategies. Memory strategies
consist of rehearsal and encoding strategies. Finally, activation strategies are those that learners utilize in order to
use new words in various contexts. Schmitt (1997) classifies vocabulary learning strategies into two groups. The
first group serves to determine the meaning of new vocabulary items which the learners face for the first time, and
contains determination and social strategies. The second group, on the other hand, entails strategies which
consolidate the meaning of vocabulary items when encountered again by the learners. This group consists of
cognitive, metacognitive, memory, and social strategies. However, not much is known about the relative
frequency or effectiveness of each of the above strategies. Xu and Hsu (this issue) look into such strategies and
their representation.
There are two types of vocabulary knowledge: receptive and productive (Nation, 2006). Receptive
vocabulary enables the learner to comprehend written and spoken texts. In this volume, Masrai and Milton (this
issue) discuss this aspect of vocabulary. Productive vocabulary, on the other hand, facilitates the productive skills
of speaking and writing. In addition to vocabulary size, which is expressed in the number of words a learner
knows, vocabulary is also measured in terms of depth (Beglar & Nation, 2007). Depth concerns everything a
learner knows about a word, including ways of spelling and pronouncing it, the sentence structure it requires, its
part of speech, the functions it can have in connected discourse, the contexts in which it can possibly occur, other
words that may accompany it, the idiomatic expressions it is known to build and the connotations it can have
(Folse, 2004). Brumbaugh and Heift (this issue) build on the concept of vocabulary depth. It is expected that in
productive skills, such as speaking and writing, a larger vocabulary size would have the effect of a greater lexical
range used, while a greater depth of vocabulary knowledge would result in a more accurate and skillful use of
vocabulary.
Tests such as the Vocabulary Size Test (VST) are often used to measure the size of learners’ vocabulary
(Beglar & Nation, 2007). This test has been specifically developed to “provide a reliable, accurate, and
comprehensive measure” (Beglar, 2010, p. 103) of L2 English learners’ receptive vocabulary in its written form,
covering the 14,000 most frequent word families in English. Other such tests are described in this issue.
However, it is more difficult to measure vocabulary depth in relation to productive vocabulary size. This is further
discussed by Roghani and Milton (this issue).
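The scaling logic behind such sampled size tests can be sketched in a few lines. The sketch below is illustrative only, not taken from Beglar and Nation’s materials; it assumes the commonly described VST design of 140 items sampled evenly across the 14,000 most frequent word families, so that each correct item stands for 100 families, and the function name is ours.

```python
# Illustrative sketch: scaling a sampled vocabulary test score up to a
# size estimate. Assumption: 140 items evenly sampling 14,000 word
# families, i.e. each correct item represents 100 families.

def estimate_vocabulary_size(correct_items, total_items=140, families_covered=14000):
    """Scale the number of correct answers up to the sampled frequency range."""
    if not 0 <= correct_items <= total_items:
        raise ValueError("correct_items must lie between 0 and total_items")
    families_per_item = families_covered // total_items  # 100 for the VST
    return correct_items * families_per_item

print(estimate_vocabulary_size(84))  # a learner with 84 correct items
```

Under these assumptions, a learner with 84 correct items would be credited with knowledge of roughly 8,400 word families.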
Instances of language use lacking in accuracy, otherwise known as language errors, are significant in three
respects: they inform the teacher about what should be taught; they inform the researcher about the course of
learning; and they are outcomes of the learner’s target language hypothesis testing (James, 1998). The sources of
error are deemed to be the redundancy of code (intralingual), various sources of interference (interlingual) and
unsuitable presentation (George, 1972). Similarly, James (1998) distinguished between a slip, a mistake and a
systematic error: a slip is expected to result in self-correction, a mistake calls for feedback, while an error requires
full correction of the erroneous utterance. In this volume, Augustin-Llach (this issue) examines a range of lexical
errors.
According to Cook and Singleton (2014), second language acquisition (SLA) is primarily concerned with the
interplay between a learner’s first (L1) and an additional language (L2). Thus Li (2014) identifies such an
interplay in the interlanguage of Chinese learners of English. According to Wang (2014), this is characterized by
the structural and lexical patterns of Chinese in the learner’s grammatical and lexical choices in English, which
are not necessarily transparent to other speakers of English, thus potentially obscuring comprehension. In
particular, lexis in L2 often adopts the L1 semantic features (Cook & Singleton, 2014). An example of this is a
Chinese student asking at the end of a presentation: “Do you have a problem?” The Chinese equivalent “问题
(wen ti)” means both a question and a problem. Collocations or multi-word units present another challenge for
L2 learners (Yamashita & Jiang, 2010). An example of the influence of L1 on collocations in English as L2 is “eat
medicine” (rather than “take medicine”), based on the Chinese “吃药 (chi yao)”. These examples represent evidence
of subordinate bilingualism, which according to Cook and Singleton (2014) has its roots in translation as a
teaching/learning method. Dodigovic (2014) found that learners with limited vocabulary use bilingual
dictionaries with only one English translation equivalent, which also restricts the depth of their English
vocabulary (Schmitt, 2010). In line with this, Dodigovic, Ma and Jing (this issue) pursue such patterns in the
writing of Chinese learners of English.
Vocabulary is ideally suited to corpus linguistic approaches in research and teaching. The term corpus
commonly “refers to an electronic text” (Holmes, 1999, p. 241) and is often in fact a compilation of text samples
that one wants to examine for vocabulary use or other features. Special software is applied to find out, for
example, which words or expressions are most frequently used by an author or a group of authors. Having a
corpus of authentic language data gives one the opportunity to either postulate very specific hypotheses or
identify patterns through corpus data analysis (Tognini-Bonelli, 2001). As a method, corpus linguistics allows for
a quantitative approach, in that it counts the occurrences of the examined linguistic phenomena.
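The quantitative step described above, counting occurrences across a compilation of text samples, can be sketched minimally as follows; the two sample sentences are invented for illustration, and real corpus tools of course use far more careful tokenisation.

```python
# A minimal sketch of corpus frequency counting: tokenise a small
# compilation of text samples and count word occurrences.
from collections import Counter
import re

corpus = [
    "The learner guessed the meaning of the new word from context.",
    "Learners often guess new words from context rather than a dictionary.",
]

def word_frequencies(texts):
    """Tokenise naively on letters/apostrophes and count occurrences."""
    tokens = []
    for text in texts:
        tokens.extend(re.findall(r"[a-z']+", text.lower()))
    return Counter(tokens)

freq = word_frequencies(corpus)
print(freq.most_common(3))
```

Even this toy example surfaces the kind of pattern corpus software reports at scale: function words dominate the top of the frequency list, while content words such as “context” recur across samples.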
When applied to language learning, this method can be very helpful. It can be used to gain a better
understanding of how the target student population uses language and what misconceptions the students might
have about the additional language they are learning. While some researchers prefer to profile the vocabulary
(Cobb, 2004) of either the learners or their learning resources, others use learner corpora to gain a better
understanding of learner errors (Granger, 2003). The latter include Dodigovic, Ma and Jing (this issue).
Furthermore, many have used target language corpora to teach language, a technology enhanced approach that
is sometimes called data-driven (Allan, 1999; Levy, 1997). This topic is also pursued by Jeaco (this issue).
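The vocabulary-profiling idea mentioned above can also be sketched briefly. The band lists below are tiny invented placeholders, not the established frequency lists used by real profilers such as Cobb’s (2004) Lexical Tutor; the sketch only shows the mechanism of assigning each token to a frequency band.

```python
# Toy sketch of lexical profiling: assign each token of a text to a
# frequency band and tally the result. Band contents are invented.
import re

BAND_1K = {"the", "a", "of", "and", "to", "in", "is", "was", "for", "on"}
BAND_2K = {"medicine", "question", "problem", "answer", "teacher"}

def profile(text):
    """Return per-band token counts and the total token count."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = {"1k": 0, "2k": 0, "off-list": 0}
    for tok in tokens:
        if tok in BAND_1K:
            counts["1k"] += 1
        elif tok in BAND_2K:
            counts["2k"] += 1
        else:
            counts["off-list"] += 1
    return counts, len(tokens)

counts, total = profile("The teacher explained the question and the answer.")
print(counts, total)
```

A profile of this kind lets a teacher see at a glance what proportion of a learner text, or of a teaching resource, falls outside the frequency bands the learners can be assumed to know.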
Vocabulary is also important in educational needs analysis. Needs analysis refers to a procedure in
language planning (Nunan, 1988). This procedure serves three main purposes (Richards, 1984, cited in Nunan,
1988). Firstly, it can be used to obtain wider input into content, design and implementation of a language
programme. Secondly, it can be used to develop goals, objectives and content for a language programme. Finally,
it can provide data for programme evaluation. It can be based either on soft data, such as opinions, or on hard
data, such as linguistic facts (Johns, 1997).
With its force of hard data evidence, the corpus approach is particularly useful in raising the teacher’s
awareness of their students’ learning needs, but it can also be used to demonstrate to the students and the
respective institution how their use of language differs from the targeted standard. Indeed, the level of
institutional language awareness can be raised to the point at which the institution becomes able to anticipate
learning problems and better facilitate teaching, learning and assessment. In particular, corpus analysis can help
institutions decide whether the teaching materials and methods used are conducive to learning success.
Technology plays a key role in making that hard evidence readily available. In this volume, Quero (this issue) as
well as McGarrell and Nguyen (this issue) take this approach. Other uses of technology with respect to
vocabulary are described by Jeaco (this issue) as well as Brumbaugh and Heift (this issue).
study compares the improvements in productive use of target structures for a treatment group, who received
explicit instruction, and a contrast group. The results demonstrate the gains of explicit instruction for the
production of topic-induced phrases and the paper explores some of the attitudes of the language learners
through analysis of interviews.
Jones and Waller present a quasi-experimental study examining textual and aural input enhancement for
vocabulary teaching at an elementary level in a higher education context. The enhancements provided for the
treatment group consisted of the bolding of target words in a menu and three repetitions of the modeling of the
words by the teacher. Their results demonstrate some clear benefits of both kinds of enhancement when
teaching lexis.
Augustin-Llach takes the evidence of lexical errors for a theoretical exploration of EFL vocabulary teaching,
reviewing previous research and suggesting new ways to engage pedagogically with lexical errors. By drawing on
a solid research base, the fusion of analyses from different studies in this important area leads directly into some
practical implications and calls for broader appreciation of the need for explicit vocabulary instruction through a
range of approaches.
Dodigovic, Ma and Jing reveal insights into first language (L1) lexical transfer within the context of L1
Chinese learners of English through analysis of individual words, collocations and multi-word units. In a cross-
sectional study of written work from university students, they demonstrate that the most frequent cause of errors
is the L1 polysemy of individual words, followed by multi-word unit (MWU) and collocation errors. They also
find a slight but not statistically significant drop in the frequency of lexical transfer errors in the more advanced
learner group in all three of these areas.
Jeaco discusses the use of corpora in vocabulary learning and reports on an evaluation of a concordancing
tool which was designed for English language learners and teachers. The software tool, called The Prime Machine
(Jeaco, 2015), includes support features for conducting searches on vocabulary and language patterns,
encouraging language discovery processes for the comparison of specific words and collocations. This paper
introduces some of the pedagogical perspectives on the software design, and reports on the positive reception of
the software from students with little or no prior experience in concordancing work.
Brumbaugh and Heift present an empirical investigation into the use of a Computer Assisted Language
Learning (CALL) tool for the assessment of the depth of vocabulary knowledge of intermediate L2 English
learners. The study introduces the design and use of Bricklayer and the 2ndings provide evidence of the validity of
this assessment tool, and the paper explains how such an approach strengthens models of both knowledge and
behavior for CALL adaptive systems.
Masrai and Milton’s paper explores predictors of academic achievement, building on work on general and
academic vocabulary knowledge (Townsend et al., 2012) and general intelligence (Laidra et al., 2007). Their
examination of these and additional factors adds to a predictive model, drawing on L1 vocabulary knowledge,
L2 general and academic vocabulary knowledge, and intelligence (IQ). They demonstrate the way in which each
element in the model makes unique contributions, and how the four elements explain different aspects of
variance in the academic achievement data.
Roghani and Milton investigate the usefulness and effectiveness of a category generation task for assessing
productive vocabulary size. In the task, learners are asked to list words within a specific category, and the
resulting list of words can then be compared with receptive vocabulary size estimates. Through analysis of
results from learners at different levels of performance, and comparison with two standardised tests, they
demonstrate that the category generation tasks are reliable and effective.
McGarrell and Nguyen tackle the question of optimal language input for institutional contexts where
textbooks form the basis for instruction. They present an analysis of lexical bundles in a popular textbook of
General English, comparing these with frequently occurring lexical bundles in corpora. The analysis examines
the functions of the lexical bundles covered and their usefulness. Their findings demonstrate limitations in the
usefulness of the lexical bundles in the textbook, and the authors argue for more attention to be paid to lexical
bundles in language teaching and materials development.
Last but not least, Quero reports on a subject-specific study into the vocabulary load of English medical
handbooks, considering the lexical demands in terms of the number of words needed for comprehension of
medical texts. The study used a corpus approach, drawing on existing word lists and making comparisons
between the medical text corpus and a corpus built from seven general English corpora. The results provide
insights into the vocabulary needs of medical students and health professionals, with a long list of subject-specific
(medical) words having been generated through this approach.
Conclusion
This issue covers a range of topics related to teaching, researching, learning and assessing vocabulary in an
additional language. Each of the papers furthers our understanding of issues such as incidental and deliberate
vocabulary learning in terms of vocabulary depth or size, and each considers their roles in areas such as
academic success, teaching of lexical phrases and their representation in textbooks as well as the vocabulary
required to succeed in certain academic disciplines. The editors are confident that each reader will be able to
identify at least some points of relevance in relation to their own research or practice.
References
Alderson, J. C. (2005). Diagnosing foreign language proficiency. London: Continuum.
Allan, M. (1999). Language awareness and the support role of technology. In R. Debski and M. Levy (Eds.)
WORLDCALL. Global Perspectives on Computer-Assisted Language Learning (pp. 303 – 318). Lisse: Swets &
Zeitlinger.
Beglar, D. (2010). A Rasch-based validation of the vocabulary size test. Language Testing, 27 (1), 101-118.
Beglar, D., & Nation, P. (2007). Vocabulary Size Test.
Cobb, T. (2004). The Compleat Lexical Tutor, www.lextutor.ca. Accessed 4 June 2004.
Cook, V. & Singleton, D. (2014). Key Topics in Second Language Acquisition. Bristol: Multilingual Matters.
Coxhead, A. (2000). The academic word list. TESOL Quarterly. 34 (2), 213 – 238.
Dodigovic, M. (2014). Strategies used by successful learners within the context of incidental vocabulary
acquisition in an additional language. RDF Project Completion Report, XJTLU.
Erman, B. (2009). Formulaic language from a learner perspective: What the learner needs to know. In R.
Corrigan, E. A. Moravcsik, H. Ouali, & K. M. Wheatley (Eds.), Formulaic Language, Volume 2 (pp. 323-
346). Philadelphia: John Benjamins Publishing Company.
Folse, K. (2004). Vocabulary myths: Applying second language research to classroom teaching. Ann Arbor: University of
Michigan Press.
George, H. (1972) Common Errors in Language Learning. Rowley: Newbury House.
Granger, S. (2003). Error-tagged learner corpora and CALL: A promising synergy, CALICO Journal 20 (3), 465 –
480.
Gu, Y., & Johnson, R. K. (1996). Vocabulary learning strategies and language learning outcomes. Language
Learning, 46, 643-679.
Hatch, E., & Brown, C. (1995). Vocabulary, Semantics and Language Education. New York: Cambridge University Press.
Holmes, G. (1999). Corpus CALL: Corpora in language and literature. In K. Cameron, (Ed.) CALL: Media, design
and applications (pp. 239 – 270). Lisse: Swets and Zeitlinger.
James, C. (1998). Errors in language learning and use: Exploring error analysis. Routledge.
Johns, A. M. (1997). Text, Role and Context: Developing Academic Literacies. New York: Cambridge University Press.
Levy, M. (1997). Theory-driven CALL and the Development Process. Computer Assisted Language Learning, 10 (1),
41-56.
Li, W. (2014). New Chinglish: Translanguaging Creativity and Criticality. Keynote speech, AILA World Congress
2014, Brisbane.
Nation, I. S. P. (2006). Language education - vocabulary. In K. Brown (ed.) Encyclopaedia of Language and Linguistics,
2nd Ed. Oxford: Elsevier. Vol 6: 494-499.
Nunan, D. (1988). The Learner-Centred Curriculum. Cambridge: Cambridge University Press.
Paribakht, T.S. & Wesche, M. (1996). Enhancing Vocabulary Acquisition Through Reading: A Hierarchy of
Text-Related Exercise Types. The Canadian Modern Language Review, 52(2), 155-178.
Schmitt, N. (1997). Vocabulary learning strategies. In N. Schmitt & M. McCarthy (Eds.), Vocabulary: Description,
acquisition and pedagogy (pp. 199-227). Cambridge: Cambridge University Press.
Schmitt, N. (2010). Researching vocabulary: A vocabulary research manual. London: Palgrave Macmillan.
Song Y. & Fox, R. (2008). Integrating Incidental Vocabulary Learning Using PDAs into Academic Studies:
Undergraduate Student Experiences. Lecture Notes in Computer Science 5169:2008, 238 – 249.
Tognini-Bonelli, E. (2001). Corpus linguistics at work. Amsterdam: John Benjamins.
Townsend, D., Filippini, A., Collins, P., & Biancarosa, G. (2012). Evidence for the importance of academic word
knowledge for the academic achievement of diverse middle school students. The Elementary School Journal,
112(3), 497-518.
Wang, Y. (2014). Chinese Speakers’ Attitudes Towards Their Own English: ELF or Interlanguage. Teaching
English in China, 5, 7 – 12.
Yamashita, J. & Jiang, N. (2010). L1 Influence on the Acquisition of L2 Collocations: Japanese ESL Users and
EFL Learners Acquiring English Collocations. TESOL Quarterly, 44 (4), 647 – 668.
Stephen Jeaco, PhD, is an Associate Professor at Xi'an Jiaotong-Liverpool University. He has worked in China
since 1999 in the fields of EAP, linguistics and TESOL. His PhD was supervised by Professor Michael Hoey and
focused on developing a user-friendly corpus tool based on the theory of Lexical Priming.
Rining WEI (Tony), PhD, is a Lecturer in the Department of English, Xi'an Jiaotong-Liverpool University,
Suzhou, China. His areas of research include argumentation, TESOL, and quantitative research methods. His
papers have appeared in journals including English Today, Asian EFL Journal, Journal of Multilingual and Multicultural
Development, and World Englishes.
Xuelian Xu*
Xi’an Jiaotong-Liverpool University, China
Wen-Cheng Hsu**
Xi’an Jiaotong-Liverpool University, China
Abstract
The past three decades have witnessed a surge of interest in vocabulary learning in EFL contexts since Meara (1980)
identified it as ‘a neglected aspect of language learning’ (p. 221). A mushrooming amount of literature has emerged on
various aspects of vocabulary and its acquisition (e.g., Carter, 1998; Coady & Huckin, 1997; Manyak, 2010; Meara, 1995,
2005; Nation, 1990, 2006; Read, 2000; Schmitt, 2000; Schmitt & McCarthy, 1997). With the movement from teaching-
orientedness to learner-centredness and learner autonomy, vocabulary learning strategies seem to have gained legitimacy
as one auxiliary approach to vocabulary learning. Despite this, there appears to be no satisfactory instrument for
assessing vocabulary learning strategy use in an EFL context, although a few researchers have attempted to develop one
(e.g., Gu & Johnson, 1996; Schmitt, 1997). To this aim, a new inventory for vocabulary learning, the Strategies Inventory
for Vocabulary Learning (SIVL), was proposed for Chinese EFL university learners. To validate the instrument, confirmatory and
exploratory factor analyses were employed to assess its psychometric properties. Results showed that the hypothesized
theoretical model proved to be a good representation of the sample data, and that the SIVL exhibited satisfactory
psychometric features. This positive evidence indicates that the SIVL can serve as a reliable and valid research instrument
for assessing Chinese EFL university learners’ vocabulary learning strategy use. It is suggested that the SIVL can be a
valuable resource for EFL learners and practitioners in that it can raise their awareness of strategy use and strategy training
by employing this instrument, leading to more successful vocabulary teaching and learning.
Key words: Vocabulary learning, Learning strategies, Vocabulary learning strategies, Strategy classification,
Strategy inventory, Factor analysis
* Tel: (+86) 512 88161328. Email: xuelian.xu@xjtlu.edu.cn. Language Centre, Xi’an Jiaotong-Liverpool University, No. 111
Ren’ai Road, HET, SIP, Suzhou, Jiangsu Province, P R China 215123
** Tel: (+86) 512 88161144. Email: wencheng.hsu@xjtlu.edu.cn. Language Centre, Xi’an Jiaotong-Liverpool University, No.
111 Ren’ai Road, HET, SIP, Suzhou, Jiangsu Province, P R China 215123
Introduction
To date, vocabulary learning strategies (VLS) have drawn increasing attention as an auxiliary approach to
vocabulary learning, in line with the movement from teaching-orientedness to learner-centredness and learner
autonomy. This shift reflects the complexity of word knowledge and the range of factors involved in knowing,
processing, storing, and applying a word (Carter, 1998), which calls for varying strategies. VLS are even more
important for low-frequency words: ‘because of the large number of low-frequency words and because of their
infrequent occurrence and narrow range, it is best to teach learners strategies for dealing with these words rather
than to teach the words themselves’ (Nation, 1990, p. 159). Moreover, classroom teaching time in foreign
language settings is notoriously limited, and it is impossible to teach everything about a word, so students must
become independent word learners (Waring, 2002). The use of VLS can help students deal with their vocabulary
learning independently. Schmitt (2000) claims that, in contrast to language tasks that involve several linguistic
skills, many learners do seem to use strategies for their vocabulary learning, possibly because the ‘relatively
discrete’ nature of vocabulary learning, compared to ‘more integrated’ language activities, makes it easier to
utilize strategies effectively. In addition, Nation and Newton (1997) point out that ‘[t]ime may be set aside for the
learning of strategies and learners’ mastery of strategies may be monitored and assessed’ (p. 241). VLS have thus
become essential inside and outside the classroom.
Theory and practice of VLS mainly stem from language learning strategies (LLS). The earlier literature in
SLA usually treats strategies as a cognitive learning process (e.g., O'Malley & Chamot, 1990), while scholars in
educational psychology view strategies from a social cognitive point of view which stresses the metacognitive,
affective and social domains (e.g., Schunk, 2001; Zimmerman, 1989, 2000). In recent decades, a few scholars in
SLA have attempted to look at strategies from a volitional perspective focusing on the metacognitive and
affective domains (e.g., Dörnyei, 2005; Tseng, Dörnyei, & Schmitt, 2006). Although different scholars claim
their own theoretical underpinnings, a consideration of metacognitive, cognitive and social cognitive perspectives
can offer a more holistic picture of VLS. It is under this proposition that an inventory tapping into all the phases
of vocabulary learning strategy use can be produced.
interact with others or ideationally control over affect. The three types are further categorised into several
subgroups. Metacognitive Strategies include four subgroups, which are defined as follows:
Selective attention: focusing on special aspects of learning tasks, as in planning to listen for key words or phrases.
Planning: planning for the organisation of either written or spoken discourse.
Monitoring: reviewing attention to a task, comprehension of information that should be remembered, or production while it is occurring.
Evaluation: checking comprehension after completion of a receptive language activity, or evaluating language production after it has taken place. (O’Malley & Chamot, 1990, p. 46)
Another influential classification of LLS is Oxford’s (1990) system (Table 1), which is divided into Direct
Strategies for handling the target language and Indirect Strategies for generally managing the learning of the
target language. The former is composed of Memory Strategies, Cognitive Strategies, and Compensation
Strategies. The latter includes Metacognitive Strategies, Affective Strategies and Social Strategies.
Table 1
Oxford’s (1990) System
Learning Strategies

Direct Strategies
  Memory Strategies
    Creating mental linkages: Grouping; Associating/elaborating; Placing new words into a context
    Applying images and sounds: Using imagery; Semantic mapping; Using keywords; Representing sounds in memory
    Reviewing well: Structured reviewing
    Employing action: Using physical response or sensation; Using mechanical techniques
  Cognitive Strategies
    Practising: Repeating; Formally practising with sounds and writing systems; Recognising and using formulas and patterns; Recombining; Practising naturalistically
    Receiving and sending messages: Getting the idea quickly; Using resources for receiving and sending messages
    Analysing and reasoning: Reasoning deductively; Analysing expressions; Analysing contrastively (across languages); Translating; Transferring
    Creating structure for input and output: Taking notes; Summarising; Highlighting
  Compensation Strategies
    Guessing intelligently: Using linguistic clues; Using other clues
    Overcoming limitations in speaking and writing: Switching to the mother tongue; Getting help; Using mime or gesture; Avoiding communication partially or totally; Selecting the topic; Adjusting or approximating the message; Coining words; Using a circumlocution or synonym

Indirect Strategies
  Metacognitive Strategies
    Centring your learning: Overviewing and linking with already known material; Paying attention; Delaying speech production to focus on listening
    Arranging and planning your learning: Finding out about language learning; Organising; Setting goals and objectives; Identifying the purpose of a language task; Planning for a language task; Seeking practice opportunities
    Evaluating your learning: Self-monitoring; Self-evaluating
  Affective Strategies
    Lowering your anxiety: Using progressive relaxation, deep breathing, or meditation; Using music; Using laughter
    Encouraging yourself: Making positive statements; Taking risks wisely; Rewarding yourself
    Taking your emotional temperature: Listening to your body; Using a checklist; Writing a language learning diary; Discussing your feelings with someone else
  Social Strategies
    Asking questions: Asking for clarification or verification; Asking for correction
    Cooperating with others: Cooperating with peers; Cooperating with proficient users of the new language
    Empathising with others: Developing cultural understanding; Becoming aware of others’ thoughts and feelings
From the above we can clearly see that there is a substantial amount of overlap between the two LLS
classification systems. First, O’Malley and Chamot’s (1990) Metacognitive Strategies have a direct counterpart
in Oxford’s (1990) system. This category generally covers planning, organising, and evaluating one’s own
language learning. Second, both systems involve strategies handling affect and social interaction. Affective
Strategies are techniques for learners to manage their emotional and motivational states, while Social Strategies
are techniques for learning the target language with other people. O’Malley and Chamot classify Affective
Strategies and Social Strategies as one single type, Socio-affective Strategies, whereas Oxford categorises them as
separate groups and lists far more affective and social strategies than O’Malley and Chamot. Third, O’Malley
and Chamot’s Cognitive Strategies roughly match a combination of Oxford’s Memory Strategies and Cognitive
Strategies, with the exception of ‘guessing from context (inferencing)’, which is part of O’Malley and Chamot’s
cognitive category but is listed by Oxford as a compensation strategy that makes up for missing knowledge. Unlike
O’Malley and Chamot, Oxford intentionally separates Memory Strategies from Cognitive Strategies because
‘Memory Strategies appear to have a very clear, specific function that distinguishes them from many Cognitive
Strategies’ (Hsiao & Oxford, 2002, p. 371). In other words, although Memory Strategies assist cognition, the
operations referred to as Memory Strategies are particular mnemonic devices that help learners store and
transfer information to long-term memory and retrieve it whenever necessary. Most Memory Strategies tend to
be associated with shallow processing, while Cognitive Strategies tend to contribute to deep processing
(Dörnyei, 2005). Lastly, Oxford classifies Compensation Strategies as a separate category because she seems to
believe that it is essential to make up for missing knowledge in any of the four language skills: listening, reading,
speaking, or writing. This category is intended to enable learners to use the target language for either
comprehension (i.e., listening, reading) or production (i.e., speaking, writing) in spite of the missing knowledge.
In the more specific area of VLS, some researchers have also sought to develop a vocabulary-specific strategy
classification system. There are two main typologies: Schmitt’s (1997) and Stoffer’s (1995). Schmitt (1997) claims
that Oxford’s classification system is generally suitable for VLS but unsatisfactory in a number of respects:
1. No category in Oxford’s system satisfactorily depicts the type of strategies employed by an individual
learner when he/she is faced with discovering a new word’s meaning without others’ help;
2. In Oxford’s system, it seems difficult to classify some strategies which could easily fit into two or more
groups;
3. In Oxford’s system, it remains unclear whether some strategies should be categorised as Memory Strategies
or Cognitive Strategies.
Therefore, Schmitt (1997) offers a vocabulary-specific strategy classification system by grouping VLS into
two broad categories: Discovery Strategies, i.e., strategies for the discovery of a new word’s meaning, and
Consolidation Strategies, i.e., strategies for consolidating a word once it has been encountered. The former
category involves two subcategories: Determination Strategies and Social Strategies. The latter group includes
Social Strategies, Memory Strategies, Cognitive Strategies, and Metacognitive Strategies. Schmitt also stresses
that, since the goal of both Cognitive Strategies and Memory Strategies is to aid recall of words through some
form of language manipulation, further criteria are needed to separate Memory Strategies from Cognitive
Strategies. He therefore adopted as additional criteria the five areas of storing and memory strategies from
Purpura (1994, cited in Schmitt, 1997, pp. 205-206), namely Repeating, Using mechanical means, Associating,
Linking with prior knowledge, and Using imagery.
Schmitt’s (1997) system seems to be the most comprehensive VLS taxonomy to date, and is a useful attempt
to display where general LLS and VLS intersect. However, Schmitt’s (1997) system still has its weaknesses. Firstly,
it does not include affective strategies. Secondly, a number of items fall into more than one subcategory; for
instance, ‘flashcards’ is grouped into both Determination Strategies and Cognitive Strategies, which causes
confusion in defining and classifying strategy categories. Lastly, there is no clear-cut distinction between
Discovery and Consolidation strategies.
While the systems discussed above are all based on theoretical induction, Stoffer (1995) attempted to
categorise strategies empirically. She developed an inventory of nine categories from the analysis of data
collected with a self-composed 53-item vocabulary learning strategy questionnaire. The nine factors
resulting from a factor analysis are listed below:
1. Strategies involving authentic language use
2. Strategies involving creative activities
3. Strategies used for self-motivation
4. Strategies used to create mental linkages
5. Memory strategies
6. Visual/auditory strategies
7. Strategies involving physical action
8. Strategies used to overcome anxiety
9. Strategies used to organize words
However, such a classification tends to result in an unidentifiable group of strategies in each factor. For
example, Item 13 ‘Use rhymes to remember new words’ falls into three factors: 5, 6 and 7. Item 18 ‘Break lists
into smaller parts’ falls into both factors 5 and 9.
Considering the strengths and limitations of the above classification systems, we developed an all-encompassing
inventory.
Table 2
Classification of VLS in This Study

Metacognitive Strategies (MET)
  Paying Attention: Deciding in advance to pay attention in general to a vocabulary learning task and to ignore distractions by directed attention, and/or to pay attention to specific aspects of vocabulary learning tasks or to situational details.
  Arranging & Planning: Finding out about vocabulary learning, organising the schedule, setting goals and objectives, considering task purposes, planning for tasks, and seeking chances to practise words.
  Monitoring & Evaluation: Identifying errors in understanding or producing the new word, tracking the source of important errors, trying to eliminate such errors, and evaluating one’s own progress in vocabulary learning.

Cognitive Strategies (COG)
  Guessing: Seeking and using linguistic or other (e.g., background knowledge) clues in order to guess the meaning of a new word.
  Using Dictionaries: Using dictionaries as a resource to find out the meaning and use of a new word, and ways of looking up a word in the dictionary.
  Using Study Aids: Using resources other than dictionaries to help learn or practise new words.
  Taking Notes: Putting synonyms or antonyms together in the notebook, or writing down the meaning of vocabulary when it is thought to be commonly used or interesting, when it is looked up in the dictionary, or when it can help distinguish between the meanings of words.
  Repetition: Saying, listening to, or writing a new word over and over.
  Word Lists: Using word lists and flashcards for the initial exposure to a word and reviewing it afterwards.
  Activation: Practising new words in listening, speaking, reading and writing, and practising new words in imaginary/realistic settings.

Memory Strategies (MEM)
  Grouping: Classifying words based on topic, type of word, practical function, similarity and opposition, etc.
  Association/Elaboration: Relating new words to known words or concepts, or relating one piece of information to another, to create associations in memory.
  Word Structure: Structurally analysing a new word to determine or consolidate its meaning.
  Auditory Encoding: Representing a new word’s phonological form to facilitate recall by creating a meaningful, sound-based association between new words and known words, using phonetic spelling, and using rhymes.
  Semantic Encoding: Producing semantic networks or grids to remember words.
  Contextual Encoding: Memorising new words in a context.
  Structured Reviewing: Going over new words soon after the initial meeting, and then at carefully planned intervals.
  Using Keywords: Remembering a new word by using auditory and visual links.
  Paraphrasing: Reformulating a word’s meaning to improve recall of the word.
  Physical Action: Physically acting out a new word, or meaningfully relating a new word to a physical feeling or sensation.

Socio-affective Strategies (SOC)
  Questioning for Clarification/Correction: Asking others to explain, paraphrase, correct, or give examples.
  Cooperation: Working with peers or proficient English users inside and/or outside class.
  Managing Emotion: Relaxing, encouraging and rewarding oneself, paying attention to signals given by the body, and discussing feelings with someone else.
Socio-affective strategies relate to the social and affective domains. The two domains are interrelated,
complementary, and not mutually exclusive: language learners, especially Chinese EFL learners, who are
inclined to be shy and reticent, tend, while using social strategies such as ‘I interact with native speakers’, to use
affective strategies such as ‘I try to relax whenever I am afraid of using a word’ at the same time, in order to keep
the conversation going. This is why the two dimensions are classified into one single group of strategies.
Socio-affective Strategies thus refer to the ways in which learners choose to interact with others or to exercise
ideational control over affect (O’Malley & Chamot, 1990). They include Questioning for
Clarification/Correction, Cooperation, and Managing Emotion.
As a result, the four strategy categories are further divided into 25 subcategories. The four categories and
their subcategories formed the basic framework for the SIVL.
Oxford’s (1990) SILL (Version for Speakers of Other Languages Learning English)
Oxford developed the SILL based on her strategy taxonomy. It has been the most popular and practical
instrument for assessing language learning strategy use in different cultural ESL/EFL contexts. The SILL is a 5-
point Likert-type scale containing 50 individual items, divided into six parts as discussed above. With regard to
the psychometric properties of the instrument, much evidence has shown that the SILL has utility,
reliability and validity in varying EFL contexts. The SILL has proved particularly useful in EFL
classrooms, with the main goal of revealing the relationships between strategy use and language performance,
and between strategy use and individual differences such as gender, motivation, and learning styles (Oxford &
Burry-Stock, 1995). Dörnyei (2005) also admits that the SILL is ‘a useful instrument for raising student
awareness of L2 learning strategies and for initiating class discussions’ (p. 183). In addition, the reliability of the
SILL has been checked across many cultural groups. For example, in the Taiwanese/Chinese EFL context, the
SILL has obtained a high reliability coefficient (Cronbach alpha) of .94 (Yang, 1999). As for the criterion-related
and construct validities of the SILL, there is considerable evidence based mainly on ‘its predictive and correlative
link with language performance (course grades, standardised test scores, ratings of proficiency), as well as its
confirmed relationship to sensory preferences’ (Oxford & Burry-Stock, 1995, p. 1).
The Vocabulary Learning Strategies section of Gu and Johnson’s (1996) questionnaire includes 91 items,
divided into two broad categories: Metacognitive Regulation and Cognitive Strategies. Metacognitive
Regulation was further categorised into Selective Attention (7 items) and Self-initiation (5 items). Cognitive
Strategies were further categorised into six main groups. The internal consistency reliabilities of the majority of
the categories and subcategories in the Vocabulary Learning Strategies section were over .60 (Cronbach alpha),
as suggested by Dörnyei (2003). Evidence for the validity of the instrument can be assumed to some extent, given
that ‘the questionnaire, written in Chinese, reflected previous quantitative and qualitative research (e.g., Ahmed,
1989; Oxford, 1990; Politzer & McGroarty, 1985) and item analyses that removed redundant items from two
earlier pilot versions’ (Gu & Johnson, 1996, p. 648). However, it should be noted that although this questionnaire
was developed particularly for Chinese university EFL learners, it lacks items related to social or affective
strategies.
Procedure
The 110-item SIVL was administered to 125 randomly selected undergraduates at a Chinese university. After an
initial elimination of unusable data, 107 valid cases remained, including 59 males and 48 females. Most students
spent about 20 minutes finishing the questionnaire.
Three statistical methods were employed to reduce the strategy items: item analysis using the reliability
procedure, descriptive analysis, and correlation analysis. First, an item analysis for a single construct was
conducted. Items whose item-to-total correlations were below 0.30 were considered for removal, since,
according to Denscombe (2003, p. 263), any correlation coefficient between 0.30 and 0.70 (plus or minus) is
generally regarded as a reasonable correlation between two variables. Descriptive statistics were then used to
obtain the means of the remaining individual items; items whose means were below 2.35 were deleted.
The reasons for setting 2.35 as the cut-off point were two-fold. First, the gap between the means above and
below it was relatively clear and wide (0.06) compared with other possible cut-off points among the individual
strategies of low frequency use. Second, after further items were deleted at this cut-off, the items remaining in
the SIVL could still reflect a comprehensive profile of strategy frequency use. Next, correlation analyses
(Spearman rho) on the remaining items were executed to test the relationships between individual items and the
four strategy categories. Any item whose correlation with its own strategy category was weaker than its
correlation with any of the other three categories was then considered for omission from the SIVL.
Lastly, reliability analysis was run to assess the reliability and validity of the newly developed SIVL. A
combination of item analysis and correlation analysis for multiple sub-constructs was run to validate the
theoretically assumed four-fold categorisation system in the SIVL (cf. Green & Salkind, 2003).
All the procedures above resulted in a new 72-item SIVL, involving 16 items under Metacognitive Strategies,
25 under Cognitive Strategies, 24 under Memory Strategies, and 7 under Socio-affective Strategies (See
Appendix A).
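The first two reduction steps described above can be sketched in code. The item names, Likert responses, and helper function below are invented for illustration; the study itself used standard reliability and descriptive procedures, not this code.

```python
# Illustrative sketch of the first two reduction steps:
# (1) drop items whose item-to-total correlation is below .30, then
# (2) drop items whose mean falls below the 2.35 cut-off point.
from statistics import mean

def pearson(x, y):
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def reduce_items(responses, r_cut=0.30, mean_cut=2.35):
    """responses: dict of item name -> one Likert rating per respondent."""
    # Step 1: keep items whose correlation with the total of the
    # remaining items reaches the r_cut threshold.
    kept = {}
    for name, scores in responses.items():
        rest_total = [sum(responses[o][i] for o in responses if o != name)
                      for i in range(len(scores))]
        if pearson(scores, rest_total) >= r_cut:
            kept[name] = scores
    # Step 2: drop low-frequency items (mean below mean_cut).
    return {n: s for n, s in kept.items() if mean(s) >= mean_cut}

data = {"A": [2, 3, 4, 5, 3], "B": [2, 3, 4, 5, 4],
        "C": [4, 2, 5, 2, 4], "D": [1, 2, 3, 3, 2]}
print(sorted(reduce_items(data)))  # "C" fails step 1; "D" fails step 2
```

With this toy data, item C is dropped for a weak item-to-total correlation and item D for a mean below 2.35, mirroring the two deletion criteria described above.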
Table 3 provides a detailed assessment of the model in terms of model fit criteria, levels of acceptable fit,
and evaluation of the instrument (i.e., the SIVL). In addition to Bagozzi and Yi (1988), the criteria provided in
Table 3 also draw on Bagozzi (1981), Byrne (2001), Doll, Xia, and Torkzadeh (1994), Hair et al. (1998), Marsh
and Hocevar (1985), and Tabachnick and Fidell (2001).
In terms of the preliminary fit criteria, three aspects of the statistics were checked: the correlations among the
four variables were very good, ranging between .60 and .73; the factor loadings fell within the acceptable range,
varying between .71 and .87; and their standard errors were appropriate. These results suggested that no
redundant variables existed, that the four subscales (i.e., strategy categories) were properly distinguished from
each other, and that the construct validity of the SIVL was acceptable.
Regarding the overall model fit, three clusters of goodness-of-fit measures were adopted. The first cluster of
fit statistics yielded a χ2 value of 5.06 with 2 degrees of freedom and a probability greater than .05 (p = .08), a
standardised RMR value of 0.01, and an RMSEA value of 0.054, thereby suggesting that the hypothesised model
was an adequate representation of the sample data and could be accepted. In addition, both the GFI (.995) and
the AGFI (.977), which essentially compare the hypothesised model with the null model, consistently reflected a
good fit to the sample data.
Regarding the second set of fit statistics, a number of incremental/comparative fit indices were all well
beyond the suggested value (>.90). The NFI and CFI, which compare the hypothesised model with the
independence model and provide a measure of complete covariation in the data (Byrne, 2001), were .996
and .997 respectively, as shown in Table 3, consistently indicating that the hypothesised model represented an
excellent fit to the sample data. The IFI, which addresses the issues of parsimony and sample size and is
acknowledged to be linked to the NFI, was .997, likewise pointing to a well-fitting model. The TLI/NNFI, like
the other indices discussed above, produces values ranging from zero to 1.00, with values close to .95 (for large
samples) reflecting good fit (Hu & Bentler, 1999). Accordingly, its value of .992 for the hypothesised model once
again suggested an excellent fit. Although these indices evaluate a model from slightly different perspectives, they
unanimously suggested that the hypothesised model was appropriate.
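These incremental indices, together with the RMSEA, can be computed from the model and null-model chi-square statistics. In the sketch below, the model values (χ2 = 5.06, df = 2, N = 528) come from the text, while the null (independence) model values are hypothetical figures chosen purely for illustration.

```python
import math

# Model fit statistics reported in the text.
chi2_m, df_m, n = 5.06, 2, 528
# Null (independence) model statistics: HYPOTHETICAL values for illustration.
chi2_0, df_0 = 1200.0, 6

# Normed fit index: proportional improvement over the null model.
nfi = (chi2_0 - chi2_m) / chi2_0
# Comparative fit index: improvement in noncentrality (chi-square minus df).
cfi = 1 - max(chi2_m - df_m, 0) / max(chi2_0 - df_0, chi2_m - df_m, 0)
# Tucker-Lewis index: improvement in chi2/df ratios, penalising complexity.
tli = (chi2_0 / df_0 - chi2_m / df_m) / (chi2_0 / df_0 - 1)
# RMSEA depends only on the model itself, not the null model.
rmsea = math.sqrt(max(chi2_m - df_m, 0) / (df_m * (n - 1)))

print(round(nfi, 3), round(cfi, 3), round(tli, 3), round(rmsea, 3))
```

Note that the RMSEA, which does not involve the null model, matches the reported .054; the other three indices depend on the assumed null-model values.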
Table 3
Evaluation of the Measurement Model

Model Fit Criteria | Levels of Acceptable Fit | Evaluation of the SIVL

Preliminary Fit Criteria
Correlations among variables | Not too close to or greater than 1.00 | Very good (.60 ~ .73)
Factor loadings | .50 < λ < .95 | Very good (see Figure 1)
Standard errors | Absence of too large or too small standard errors | Very good (.04 ~ .06)

Overall Model Fit
Chi-square value | Nonsignificant with p-value ≥ .05 | Good (χ2 = 5.06, df = 2, p = .08)
χ2/df | ≤ 2 ~ 5 | Good (2.53)
Standardised root mean square residual (RMR) | ≤ .05 | Good (0.01)
Goodness-of-fit index (GFI) | > .90 | Very good (.995)
Adjusted goodness-of-fit index (AGFI) | > .90 | Very good (.977)
Incremental fit index (IFI) | > .90 | Very good (.997)
Normed fit index (NFI) | > .90 | Very good (.996)
Comparative fit index (CFI) | > .90 | Very good (.997)
Tucker-Lewis index (TLI) | Close to .95 | Very good (.992)
Root mean square error of approximation (RMSEA) | < .05 ~ .08 | Good (.054)
Hoelter’s Critical N (CN) | Hoelter’s .05 and .01 CN values > 200 | Very good (N = 625 at .05, N = 961 at .01)
Ratio of sample size to number of free parameters | Ratio > 5:1 | Very good (ratio ≈ 66:1)

Fit of Internal Structure of a Model
Individual item reliability | ≥ .50 | Good (see Figure 1)
Composite reliability | ≥ .70 | Very good (.89)
Variance extracted | ≥ .50 | Very good (.66)
Significant parameter estimates confirming hypotheses | t value > ±1.96 at p < .05, or t value > ±2.576 at p < .01 | Very good (all > 17 at p < .01)
Regarding the last set of fit statistics, Hoelter’s Critical N (CN) (labelled as Hoelter’s .05 and .01 indices) is
considerably different from the indices discussed earlier in that it focuses directly on the adequacy of the
sample size rather than on model fit (Byrne, 2001). In other words, it estimates the sample size that would be
large enough to yield an adequate model fit for a test. A value over 200 indicates that a model sufficiently
represents the sample data (Hoelter, 1983). As displayed in Table 3, both the .05 and .01 CN values for the
hypothesised model were in excess of 200 (625 and 961 respectively). This finding indicates that the sample size
in this study (N = 528) was satisfactory. Moreover, the ratio of sample size to number of free parameters was
about 66:1, providing additional evidence that the hypothesised model was well-fitting and meaningful.
All the results and findings regarding overall model fit pointed to the overall adequacy of the
hypothesised model; that is, the model was an excellent representation of the sample data. Nevertheless, they did
not explicitly provide information on the nature of individual parameters or other aspects of the internal
structure of the model. It is critical to examine such information in the present situation, as certain parameters
corresponding to hypothesised relations could still be nonsignificant, and/or measures of low reliability could
exist even when the overall model fit reflects a satisfactory model (Bagozzi & Yi, 1988). In other words, overall
model fit is necessary but insufficient proof of model adequacy. Therefore, the fit of the internal structure of the
model was scrutinised for the reliability of the construct. As listed in Table 3, four criterion aspects were
examined, i.e., individual item reliability, the composite reliability of the whole scale, the average variance
extracted from a set of measures of a latent variable, and significant parameter estimates confirming the
hypotheses. While the individual item reliabilities, i.e., the squared multiple correlations of the four indicators,
and the significant parameter estimates can be obtained directly from Amos 20, the composite reliability and
average variance extracted need to be calculated manually using the following two formulas:
1. Construct reliability = (Sum of standardised loadings)2 / [(Sum of standardised loadings)2 + Sum of indicator measurement error]

2. Variance extracted = Sum of squared standardised loadings / [Sum of squared standardised loadings + Sum of indicator measurement error]
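These two formulas can be checked numerically. In the sketch below, the four standardised loadings are illustrative values within the range reported for the strategy categories (.71–.87), not the study's exact estimates; for standardised loadings, each indicator's measurement error is 1 − λ2.

```python
# Composite (construct) reliability and average variance extracted,
# following the two formulas above. Loadings are illustrative values
# within the reported range (.71-.87), not the study's exact estimates.

def construct_reliability(loadings):
    # (sum of loadings)^2 / [(sum of loadings)^2 + sum of measurement errors]
    errors = [1 - l * l for l in loadings]   # error = 1 - lambda^2
    num = sum(loadings) ** 2
    return num / (num + sum(errors))

def variance_extracted(loadings):
    # sum of squared loadings / [sum of squared loadings + sum of errors]
    squared = [l * l for l in loadings]
    errors = [1 - s for s in squared]
    return sum(squared) / (sum(squared) + sum(errors))

loadings = [0.83, 0.87, 0.83, 0.71]  # one per strategy category (illustrative)
print(round(construct_reliability(loadings), 2))
print(round(variance_extracted(loadings), 2))
```

With these illustrative loadings the formulas give values close to the .89 and .66 reported in Table 3.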
As shown in Table 3, the individual item reliabilities for the four strategy categories ranged from moderate to
high in value. The composite reliability was quite high, with the value of .89 greatly exceeding the recommended
threshold value of .70. As a complementary measure to the construct reliability value, the overall amount of
variance extracted in the four indicators (i.e., strategy categories) accounted for by the latent construct of VLS
reached 66%, which also went beyond the suggested level of .50. These results imply that the SIVL is a practical
construct with a satisfactory overall reliability. As for parameter estimates, all the parameter estimates turned out
to be significant at the .01 level, with all t-values greater than ±2.576 (all actually above 17 at p < .01). This
finding suggests that all four indicators were justifiable and key to the hypothesised model (Bagozzi & Yi,
1988; Byrne, 2001).
The evaluation of the measurement model discussed above established the construct reliability and validity
of the SIVL but did not explicitly provide information in terms of the unidimensionality of the scale. An EFA
using the principal axis factoring method was conducted for this purpose. The EFA results revealed that the
majority of the variance was explained by one single factor (above 74%), and the eigenvalue of the second
largest factor was marginal in comparison with the first (.45 vs 2.98). The factor loadings of the four strategy categories on the
one unrotated factor were .83 for Metacognitive Strategies, .87 for Cognitive Strategies, .83 for Memory
Strategies, and .71 for Socio-affective Strategies, which displayed a consistently high pattern. All the above results
provide good evidence for the unidimensionality of the scale; that is, the four strategy categories tapped into one
single underlying trait.
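The variance share reported above follows directly from the eigenvalues: with four standardised observed variables the total variance is 4, so the first factor's share is its eigenvalue divided by 4.

```python
# Share of total variance explained by the first (unrotated) factor,
# using the eigenvalue reported in the text.
first_eigenvalue, n_variables = 2.98, 4
share = first_eigenvalue / n_variables
print(round(share * 100, 1))  # percentage of total variance
```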
Once the unidimensionality of the instrument had been ensured, it was more justifiable to assess its internal
consistency reliability using Cronbach alpha, since one of the assumptions of Cronbach alpha is that the scale is
unidimensional (Hair et al., 1998). As a whole, the 72-item SIVL turned out to have very good internal
consistency reliability, with a Cronbach alpha of .95. The four theoretically assumed strategy categories also
showed consistent reliability, with acceptable Cronbach alpha indices of .84 (MET), .89 (COG), .91 (MEM) and
.75 (SOC) respectively, all beyond the recommended threshold level of .60 (Dörnyei, 2003).
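Cronbach alpha itself is straightforward to compute from raw item responses; a minimal stdlib sketch follows (the response data here are invented for illustration).

```python
# Cronbach alpha: k/(k-1) * (1 - sum of item variances / variance of totals).
from statistics import pvariance

def cronbach_alpha(items):
    """items: one list of responses per item, respondents in the same order."""
    k = len(items)
    totals = [sum(resp) for resp in zip(*items)]  # each respondent's total
    return k / (k - 1) * (1 - sum(pvariance(i) for i in items) / pvariance(totals))

# Perfectly consistent (identical) items give an alpha of 1.
print(cronbach_alpha([[1, 2, 3, 4, 5]] * 3))
```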
The last stage of the validation procedures was to statistically explore the theoretically assigned subcategories
within each of the four strategy categories by running an EFA.
Within Metacognitive Strategies, four factors were retained, explaining a total variance of about 54%. Table
4 demonstrates the factor loadings of each item on its corresponding factor(s). Although four items seemed to
load on two factors, their loadings on one of the factors (e.g. .72 for Item 9 on F1) were far higher than those on
the other (e.g. .35 for Item 9 on F4). Therefore, Metacognitive Strategies could be decomposed into four
subcategories, which were identified and labelled as follows:
Organising and Monitoring (F1, 5 items, i.e., Items 2, 9-10, 15-16)
Directed Attention (F2, 4 items, i.e., Items 11-14)
Selective Attention (F3, 4 items, i.e., Items 3-6)
Learning to Learn (F4, 3 items, i.e., Items 1, 7-8)
Consequently, the three theoretically assumed subcategories (Paying Attention (Items 1-6), Arranging &
Planning (Items 7-13), and Monitoring & Evaluation (Items 14-16)) were replaced by the four newly yielded
counterparts, which seemed to be supported both statistically and practically.
Table 4
Factor Loadings of the Four Subcategories within Metacognitive Strategies
Metacognitive Strategies
Item Brief Description F1 F2 F3 F4
stra9 Plan schedule to have enough time for word study .72 .35
stra10 Have clear goals of improving vocabulary .65 .32
stra16 Self-test vocabulary .62
stra15 Think about progress in learning words .62 .33
stra2 Break lists into parts .47
stra12 Use various means to make clear unsure words .72
stra11 Care about words the teacher doesn’t emphasise .68
stra14 Aware when I incorrectly used a word and use the information to do better .66
stra13 Associate a new word with a known one that sounds similar .63
stra4 Know when a new word is essential for comprehension .72
stra3 Know when to skip a new word .72
stra5 Know important words for learning .70
stra6 Look up interesting words .46 .37
stra7 Try to find as many ways as possible to use new words .73
stra8 Try to find ways to become a better word learner .30 .72
stra1 Pay attention to vocabulary use in speech .49
Within Cognitive Strategies, seven factors were extracted, accounting for a total variance of over 63%. Seven
out of the 25 items turned out to load on two factors. We decided to place each of them under the factor on
which it had the higher loading, although one of the seven items (i.e., Item 26) loaded highly on both factors (F4
and F6). This may be because F4 and F6 are both concerned with referring to resources. The results turned out
to be generally consistent with the theoretically assigned subcategories, except that Using Dictionaries was split
into two factors (i.e., F3 and F4). On closer inspection, the 5 items pooled together under F3 seemed to focus on
referring to dictionaries as a lexical resource, while the 4 items under F4 seemed to be more concerned with how
to look up a word. Consequently, the seven factors within Cognitive Strategies were identified and labelled as follows:
Activation (F1, 5 items, i.e., Items 37-41)
Guessing (F2, 4 items, i.e., Items 17-20)
Choosing Dictionaries as a Lexical Resource (F3, 5 items, i.e., Items 21-25)
Looking Up (F4, 4 items, i.e., Items 26-29)
Taking Notes (F5, 3 items, i.e., Items 32-34)
Using Study Aids (F6, 2 items, i.e., Items 30-31)
Repetition (F7, 2 items, i.e., Items 35-36)
Within Memory Strategies, five factors, accounting for a total variance of over 54%, were retained. Fourteen
of the 24 items loaded on more than one factor. Thirteen of them were placed under whichever factor they
loaded on more highly. The only exception (Item 61), which loaded on both F2 and F3, was put under F3 even
though its loading there was slightly lower than on F2 (.43 vs .48). As a result, the five factors were identified and
labelled as:
Association/Elaboration (F1, 7 items, i.e., Items 43-46, 48-50)
Word Structure (F2, 4 items, i.e., Items 42, 51-53)
Other Memory Strategies (F3, 6 items, i.e., Items 60-65)
Applying Images (F4, 4 items, i.e., Items 47, 54-55, 59)
Visual Encoding (F5, 3 items, i.e., Items 56-58)
Compared with the theoretically assumed subcategories within Memory Strategies, three factors (i.e., F1, F2
& F5) were named after three of the theoretically assumed subcategories, as they were generally consistent with
each other, although several individual items under the three subcategories were relocated. F4 was termed
Applying Images because the four items under it were all concerned with using images to memorise vocabulary.
As for F3, which involved the three items theoretically assigned to Contextual Encoding plus three others
originally representing three theoretically assumed subcategories (i.e., Reviewing, Using Keywords, and
Paraphrasing), its items had far less in common than those of the other factors, so it could not be labelled in the
same way. Therefore, we labelled it Other Memory Strategies.
Within Socio-affective Strategies, two clear factors were extracted, explaining a total variance of about 61%.
This finding turned out to be in accordance with the theoretically assumed two subcategories: Questioning for
Clarification (F2, 2 items, i.e., Items 66-67) and Managing Emotion (F1, 5 items, i.e., Items 68-72).
Therefore, it seems that the theoretically assumed subcategories within each of the four strategy categories
are generally supported by the results of the factor analyses. On the one hand, the results provide plenty of
evidence for the existence of the theoretically assumed subcategories within Cognitive Strategies and Socio-
affective Strategies. On the other hand, the four subcategories within Metacognitive Strategies resulting from the
factor analysis seem more justifiable than the original three, although they do share similarities with them to a
certain extent. Memory Strategies turned out to be a more complicated category with multiple subcategories.
Conclusion
The Strategy Inventory for Vocabulary Learning (SIVL) was developed through three stages. In the first stage,
170 items were pooled from various existing inventories and reduced to 110 items. In the second stage, the
instrument was shortened, mainly using the results of descriptive statistics and item analysis, and was then
validated using reliability analysis and a combination of item analysis and correlation analysis. As a result, a
shorter, 72-item version of the SIVL emerged with good reliability and content-related and construct-related
validity. Finally, the psychometric properties of the refined SIVL were assessed using confirmatory and
exploratory factor analyses. The results revealed that the SIVL had satisfactory psychometric features and that
the hypothesised theoretical model had a good fit to the sample data. This confirming evidence implies that the
SIVL can serve as a reliable and valid research instrument for evaluating Chinese EFL learners’ vocabulary
learning strategy use at the tertiary level.
References
Ahmed, M. O. (1989). Vocabulary learning strategies. In P. Meara (Ed.), Beyond Words (pp. 3-14). London: CILT.
Bagozzi, R. P. (1981). An examination of the validity of two models of attitude. Multivariate Behavioural Research,
16, 323-359.
Bagozzi, R. P., & Yi, Y. (1988). On the evaluation of structural equation models. Journal of the Academy of Marketing
Science, 16(1), 74-94.
Byrne, B. M. (2001). Structural Equation Modeling with AMOS: basic concepts, applications, and programming. Mahwah,
New Jersey: Lawrence Erlbaum Associates.
Carter, R. (1998). Vocabulary: applied linguistic perspectives (2nd ed.). London: Routledge.
Chesterfield, R., & Chesterfield, K. B. (1985). Natural order in children's use of second language learning
strategies. Applied Linguistics, 6(1), 45-59.
Coady, J., & Huckin, T. (Eds.). (1997). Second Language Vocabulary Acquisition. Cambridge: CUP.
Dörnyei, Z. (2003). Questionnaires in Second Language Research: construction, administration, and processing. Mahwah, New
Jersey: Lawrence Erlbaum Associates, Inc.
Dörnyei, Z. (2005). The Psychology of the Language Learner: individual differences in second language acquisition. Mahwah,
NJ: Lawrence Erlbaum.
Denscombe, M. (2003). The Good Research Guide: for small-scale social research projects (2nd ed.). Buckingham: Open
University Press.
DeVellis, R. F. (1991). Scale Development Theory and Applications. London: Sage Publications.
Doll, W. J., Xia, W., & Torkzadeh, G. (1994). A confirmatory factor analysis of the end-user computing
satisfaction instrument. MIS Quarterly, 18(4), 453-461.
Gefen, D. (2003). Assessing unidimensionality through LISREL: an explanation and example. Communications of
the Association for Information Systems, 12, 23-47.
Green, S. B., & Salkind, N. J. (2003). Using SPSS for Windows and Macintosh: analyzing and understanding data (3rd ed.).
Upper Saddle River: Prentice Hall.
Gu, Y., & Johnson, R. K. (1996). Vocabulary learning strategies and language learning outcomes. Language
Learning, 46(4), 643-679.
Hair, J. F., Anderson, R. E., Tatham, R. L., & Black, W. C. (1998). Multivariate Data Analysis (5th ed.). Upper
Saddle River, New Jersey: Prentice-Hall.
Hatch, E., & Lazaraton, A. (1991). Design and Statistics for Applied Linguistics: the research manual. New York: Newbury
House Publishers.
Hoelter, J. W. (1983). The analysis of covariance structures: goodness-of-fit indices. Sociological Methods & Research,
11, 325-344.
Hsiao, T., & Oxford, R. (2002). Comparing theories of language learning strategies: a confirmatory factor
analysis. The Modern Language Journal, 86, 368-383.
Hu, L.-T., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: conventional
criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6, 1-55.
Kudo, Y. (1999). L2 vocabulary learning strategies (NFLRC NetWork #14) [HTML document]. Honolulu:
University of Hawai`i, Second Language Teaching & Curriculum Center. Retrieved June 24, 2003, from
http://www.nflrc.hawaii.edu/NetWorks/NW14/
Manyak, P. C. (2010). Vocabulary instruction for English learners: Lessons from MCVIP. The Reading Teacher,
64(2), 143-147.
Marsh, H. W., & Hocevar, D. (1985). Application of confirmatory factor analysis to the study of self-concept:
first- and higher order factor models and their invariance across groups. Psychological Bulletin, 97(3), 562-
582.
Meara, P. (1980). Vocabulary acquisition: a neglected aspect of language learning. Language Teaching and Linguistics:
abstracts, 15(4), 221-246.
Meara, P. (1995). The importance of an early emphasis on L2 vocabulary. The Language Teacher, 19(2), available at
http://jalt-publications.org/tlt/files/95/feb/meara.html (date of access: 19 January 2005).
Meara, P. (2005). Lexical frequency profiles: A Monte Carlo analysis. Applied Linguistics, 26(1), 32-47.
Nation, I. S. P. (1990). Teaching and Learning Vocabulary. Boston: Heinle & Heinle.
Nation, I. S. P. (2006). Vocabulary: Second Language. In K. Brown (Ed.), Encyclopedia of Language & Linguistics
(2nd ed., pp. 448–454). Oxford: Elsevier.
Nation, I. S. P., & Newton, J. (1997). Teaching vocabulary. In J. Coady & T. Huckin (Eds.), Second Language
Vocabulary Acquisition (pp. 238-254). Cambridge: CUP.
O'Malley, J. M., & Chamot, A. U. (1990). Learning Strategies in Second Language Acquisition. Cambridge: CUP.
Oxford, R. (1990). Language Learning Strategies: what every teacher should know. Boston: Heinle & Heinle.
Oxford, R., & Burry-Stock, J. (1995). Assessing the use of language learning strategies worldwide with the
ESL/EFL version of the strategy inventory for language learning (SILL). System, 23(1), 1-23.
Politzer, R., & McGroarty, M. (1985). An exploratory study of learning behaviors and their relationship to gains
in linguistic and communicative competence. TESOL Quarterly, 19, 103-123.
Read, J. (2000). Assessing Vocabulary. Cambridge: CUP.
Schmitt, N. (1997). Vocabulary learning strategies. In N. Schmitt & M. McCarthy (Eds.), Vocabulary: description,
acquisition and pedagogy (pp. 199-227). Cambridge: CUP.
Schmitt, N. (2000). Vocabulary in Language Teaching. Cambridge: CUP.
Schmitt, N., & McCarthy, M. (Eds.). (1997). Vocabulary: description, acquisition and pedagogy. Cambridge: CUP.
Schunk, D. H. (2001). Social cognitive theory and self-regulated learning. In B. J. Zimmerman & D. H. Schunk
(Eds.), Self-regulated Learning and Academic Achievement: theoretical perspectives (2nd ed., pp. 125-151). Mahwah,
NJ: Lawrence Erlbaum Associates.
Stoffer, I. (1995). University Foreign Language Students' Choice of Vocabulary Learning Strategies as Related to Individual
Difference Variables. Doctoral dissertation, The University of Alabama, Tuscaloosa, Alabama.
Tabachnick, B. G., & Fidell, L. S. (2001). Using Multivariate Statistics (4th ed.). Boston: Allyn and Bacon.
Tseng, W., Dörnyei, Z., & Schmitt, N. (2006). A new approach to assessing strategic learning: the case of self-
regulation in vocabulary acquisition. Applied Linguistics, 27(1), 78-102.
Waring, R. (2002). Basic principles and practice in vocabulary instruction. Retrieved 19 January, 2005, from
http://www.jalt-publications.org/tlt/articles/2002/07/waring
Yang, N. (1999). The relationship between EFL learners' beliefs and learning strategy use. System, 27, 515-535.
Zimmerman, B. J. (1989). A social cognitive view of self-regulated academic learning. Journal of Educational
Psychology, 81, 329-339.
Zimmerman, B. J. (2000). Attaining self-regulation: a social cognitive perspective. In M. Boekaerts, P. R. Pintrich
& M. Zeidner (Eds.), Handbook of Self-regulation (pp. 13-39). San Diego: Academic Press.
Appendix A
The 72-item SIVL
29. I try to integrate dictionary definitions into the context where the unknown was met and arrive at a
contextual meaning by adjusting for complementation and collocation, part of speech, and breadth of
meaning.
30. I use audio, video, computer aids to learn or consolidate my vocabulary.
31. I learn words written on commercial items.
32. I make a note of the meaning of a new word when I think it is commonly-used or interesting.
33. I take notes when I look up a word.
34. I make notes when I want to help myself distinguish between the meanings of two or more words.
35. I remember a new word by saying it repeatedly.
36. I memorise a new word by writing it repeatedly.
37. I try to read as much as possible so that I can make use of the words I tried to remember.
38. I make up my own sentences using the words I just learned.
39. I try to use the newly learned words as much as possible in speech and writing.
40. I try to use newly learned words in real situations.
41. I try to use newly learned words in imaginary situations in my mind.
61. I deliberately read books in my areas of interest so that I can find out and remember the special terminology
that I know in Chinese.
62. I associate a new word with its preceding/following words to remember it better.
63. I review new words soon after the initial meeting.
64. I link new words to similar sounding Chinese words.
65. I paraphrase the word’s meaning.
Dr. Wen-Cheng Hsu obtained his PhD from the University of Nottingham. His teaching experience spans more
than 15 years across various levels and cultures. He is now teaching in the Language Centre, Xi’an Jiaotong-
Liverpool University. His research interests revolve around learning strategies, learner autonomy, teacher
autonomy, and EAP.
Jelena Colovic-Markovic*
West Chester University of Pennsylvania, USA
Abstract
This study attempts to determine whether the students who receive explicit instruction make more gains in their abilities to
use topic-induced phrases in their writing than those who do not. Additionally, through interviews with a selected group of
students from the treatment group, the study attempts to glean insights into the approaches learners use for written
production of the target phrases. Data was collected from 54 ESL students in high-intermediate writing classes at an IEP
who were assigned to the contrast (N=19) and treatment (N=35) groups based on their class enrollment. Over a period of
four days, the treatment group received training on 15 target structures. The contrast group received no vocabulary
instruction. Both groups were exposed to the target phrases through reading the same course materials and discussing them
in class. The data included the scores participants received on the production of the target structures in their essays at the
beginning and end of term. A repeated-measures ANOVA revealed that while both groups improved, the
treatment group made significantly greater gains in its ability to produce topic-induced phrases than the contrast
group did. The interview findings indicated that students' perceptions of the usefulness of the target structures may
influence whether or not learners employ them in writing. The study findings suggest that explicit instruction
benefits writers' abilities to produce topic-induced phrases. These findings have implications for ESL writing pedagogy.
Key words: explicit instruction, topic-induced phrases, topic-related vocabulary, ESL writing.
* Tel: + 1 610-436-3371; E-mail: jmarkovic@wcupa.edu; 233 Mitchell Hall, West Chester University of Pennsylvania, West
Chester, PA 19383, United States of America
The importance of vocabulary in writing is also seen from the perspective of ESL learners. In a survey that
Leki and Carson (1994) employed with 128 ESL undergraduate students to gather data on the student perceived
effectiveness of an English for academic purposes writing course, learners reported that it was vocabulary
instruction that they had needed the most. Similarly, and more recently, in an interview that Coxhead (2012)
conducted with learners of English as an additional language in New Zealand, learners reported the need for
technical, academic, or professional words to express their ideas in writing.
The evidence coming from the literature on vocabulary and writing, research on the factors contributing to
ESL essay quality, assessment tools used in evaluation of ESL essays, and students’ perceptions of what needs to
be included in ESL writing instruction emphasizes the need for focused attention on vocabulary in ESL
writing instruction.
Research Questions
To fill this gap in the literature, the present study aims to answer the two questions presented below. The first
question is addressed through quantitative and the second through qualitative data elicitation and analysis, as
described in the next section of the paper.
1. Do the students who receive explicit instruction make more gains in their abilities to use topic-induced
phrases in their writing than those who do not?
2. If so, how do the students receiving explicit instruction go about producing topic-induced phrases in
their writing?
Methodology
Overview of the research design
This study was a part of a larger study investigating the effects of explicit teaching of multi-word phrases on ESL
writers. The research project uses a quasi-experimental design in which the study participants are assigned to
treatment and contrast groups based on the class in which they are enrolled. The study was conducted in writing
classes for intermediate-level proficiency students at an Intensive English Program (IEP). Instructional periods in
the IEP are divided into terms of eight weeks, with two terms occurring each semester. The writing teacher
taught the contrast group first and then the treatment group.
The class focused on writing argumentative essays. For the writing course, participants wrote three multi-
draft essays. Both groups followed the same syllabus. They read and discussed the same reference materials prior
to submission of the final draft of each essay. They completed the same activities from the textbook for the
course and were taught by the same instructor, who was different from the researcher, to reduce the effects of the
teacher variable on the results. The contrast and treatment groups were given the same composition assignments.
As noted previously, students wrote three multi-draft essays for the class. This study concerns the
essays written on the third topic examined in the class. The treatment group was taught topic-induced
phrases on the topics of the two other essays prior to submission of their respective final drafts.
The contrast group received no explicit instruction on topic-induced phrases. The teacher was directed to
instruct students in this group as she had been doing prior to the participation in the present study. It is possible,
however, that the teacher explained the meaning of specific vocabulary, including the target items, when students
asked about or appeared confused by some words during post-reading activities and in-class discussions. The
group was exposed to the target phrases only through reading, in-class discussions and textbook activities, that is,
in a manner of delivery that the teacher had been using prior to the present study.
Besides gathering data for quantitative analysis, the study attempts to glean insights into the approaches
learners used for written production of the target phrases, specifically the strategies that would distinguish
the learners whose production was limited (low performing) from those whose production was more
extensive (high performing). For this purpose, individual semistructured interviews were conducted at the end of
the treatment with a subset of students from the treatment group. The interview questions referred to
the students' writing topic and were as follows:
a) How did you go about incorporating the phrases about international adoptions in your first/second
essay?;
b) In the writing class, your teacher used many different activities to help you learn the phrases on the topic
of international adoptions. In your opinion, which of these activities helped you learn the phrases best?; and
c) Which of the activities were not helpful to you?
The wording and sequence of the interview questions remained the same for each informant; however,
probes were used to elicit additional information as the need arose.
Participants
Data was collected from 54 ESL students from five intact high-intermediate writing classes at an Intensive
English Program (IEP) in the western United States. The ESL courses at the IEP are designed to support
development of language skills for academic studies primarily, but also professional communication. The study
participants had all taken a standardized English proficiency placement exam for the IEP. Some were placed
directly in the high-intermediate level class by the internal placement test; others moved from the intermediate to
the high-intermediate level after passing the final exams at the previous level. There were 19 students in
the contrast and 35 students in the treatment group. The participants came from various language backgrounds
(Arabic=11, Bambara=1, French=1, Japanese=26, Korean=6, Mandarin=1, Portuguese=1, Russian=1,
Spanish=2, Thai=2, and Turkish=1). 41% were male and 59% were female. 46% of the participants were under
the age of 20, 50% were between the ages of 21 and 30, 2% were between the ages 31 and 40, and 2% were
over the age of 41.
Target Items
The target topic-induced word combinations were taken from the passages in Numrich’s (2009) Raise the Issues: An
integrated approach to critical thinking, Unit 3. The texts were a part of required reading materials on the topic of
international adoption. The target items were initially located using KeyWords Extractor v.1 (2007) and N-gram
Phrase Extractor v.4 (Cobb, 2010) and subsequently submitted to manual investigation. Both programs were
available at no cost at www.lextutor.ca. KeyWords Extractor v.1, lexical software used to identify
single words that appear in a text with unusual frequency compared to a reference, calculates word
frequencies on a per million word basis and uses the Brown corpus, a corpus of one million words of American
English, as a reference. The N-gram Phrase Extractor program generates a list of n-grams occurring with the
frequency of two and higher in the texts under investigation.
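The two extraction steps can be approximated in a few lines. The sketch below is illustrative only and is not the actual lextutor code: `extract_ngrams` mimics the frequency-two-and-higher cutoff described for the N-gram Phrase Extractor, and `per_million` shows the normalisation that a keyword comparison against the Brown corpus relies on. All function names and the sample text are hypothetical.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of length n in a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def extract_ngrams(text, n_values=(2, 3), min_freq=2):
    """N-grams occurring min_freq times or more, mirroring the
    frequency-two-and-higher cutoff described for the extractor."""
    tokens = text.lower().split()
    counts = Counter()
    for n in n_values:
        counts.update(ngrams(tokens, n))
    return {g: c for g, c in counts.items() if c >= min_freq}

def per_million(count, total_tokens):
    """Normalise a raw frequency to a per-million-words basis, as the
    keyword comparison against a reference corpus requires."""
    return count / total_tokens * 1_000_000

sample = ("adoption agencies screen prospective adoptive parents "
          "and adoption agencies place a child for adoption")
print(extract_ngrams(sample))  # {'adoption agencies': 2}
```

A real keyness calculation would then compare each word's per-million frequency in the study texts against its per-million frequency in the reference corpus.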
The words identified through KeyWords Extractor v.1 (2007) were manually compared to the formulaic
sequences produced by the N-gram Phrase Extractor (Cobb, 2010) analysis because there were several instances in
which words the former program identified as key words were not found in the list generated by the N-gram
Phrase Extractor program. Since the writing assignment required that the students write in favor of or in opposition
to international adoptions, it was important to select word combinations that could be used in support of both
sides of the controversial issue. The subsequent manual investigation yielded additional word combinations that
were included in the final list of target vocabulary items. There were 15 topic-induced word combinations used
in explicit instruction (i.e., a victim/s of violence; adoption agency/ies; corruption in a country/adoption; criteria for adoption;
foreign adoptions; inter-country adoptions; international adoptions; orphaned children; place a child for adoption; place a child in a
foreign family; prospective adoptive parents; reopen adoption to foreigners; requirements for adoption; to be adopted into; to be placed
with a family/families).
Table 1
Overview of the Research Design
Week Treatment group Contrast group
2 Data collection (pre-test)
7-8 Explicit teaching No explicit teaching
8 Data collection (post-test)
Over a period of four classes, the students completed five multi-step activities (see Table 2). They spent
about 60 minutes of class time on the activities. The teacher referred to the topic-induced word combinations as
“phrases”, monitored students’ production of the target phrases and provided feedback when necessary.
At the start of the first class, students were given a 249-word passage on the topic of international
adoptions to read as many times as they could within a five-minute time frame. The passage was created by the
researcher based on the reading materials from the course textbook. After the students read the text, they were
presented with the same text but with segments of the target vocabulary removed. They were asked to fill in the
missing word parts and, upon completion, to compare answers with a partner. Next, the students were
presented with a set of questions designed to elicit productive recall of the target items.
At the end of the second class, the treatment group was asked to do a matching cloze-type activity
consisting of selected topic-induced word combinations offered in a box and referred to as a “word bank” and
sentences with blanks. The activity required that students a) examine selected phrases in a word bank and
sentences below the phrases and b) complete the sentences using the items in the word bank. They were directed
to make changes to the phrases in order to produce grammatical sentences. Students worked in pairs.
In the next class, students engaged in the 2/1/30 activity, a modified version of the 4/3/2
activity (Nation & Gu, 2007). They sat in two rows facing one another. The learners sitting in one row were
assigned the role of speaker and those sitting in the other row the role of listener. They were given a copy of
the text used in the previous day's activity to read as many times as they could within three minutes. After
reading the passage, the speakers were directed to retell the passage to one partner within 2 minutes, to another
within one minute, and finally to the third within 30 seconds. The listeners were directed to listen, take notes, and
not to interrupt the speakers. Having delivered their speeches to three different partners, learners changed roles.
When done, learners were asked to briefly review their notes and compare their own performance to the
performance of their partners when doing the speaking task.
The final activity was entitled “Build an argument.” It was a two-part writing activity. The students worked
in pairs. First, the students were directed to utilize a selected subgroup of target word combinations in building
three arguments and write them down. One argument had to be written in support of international adoption.
The second argument had to be created in opposition to international adoption. For the third argument, students
could choose whether to support or refute the controversial issue. Once the writing was completed, students
underlined the target phrases in the written arguments and exchanged them with another pair of students for
peer review. In the second part of the activity, students were asked to revise and edit the three arguments
completed by the other group to the best of their abilities. They were asked to focus their attention on the use of
the target phrases (those that had been underlined in the arguments).
All of the activity types (e.g., matching, fill in the blanks, build an argument), with the exception of
the 2/1/30 activity, were piloted with a group of high-intermediate students not included in the study. Based on
the input received from the teacher, the matching activity was modified from a group to a pair activity, and less
material was removed from the target phrases in the fill-in-the-blanks activities.
Table 2
Overview of the Activities by Lessons for the Treatment Group
Lesson Treatment group
1 5-minute read and word completion
Answering questions
2 Matching cloze
3 “2/1/30” activity
4 “Build an argument” activity
The posttest was administered at the end of the treatment, which coincided with the end of the term.
During the posttest, participants in the study were allowed access to the reading materials on the topic of
international adoptions required for the course, just as writers normally have access to their writing resources. The texts
accessible to the treatment group had no target phrases in bold type. The students wrote essays by hand. The
essays were collected in the classroom.
The contrast group, as noted previously, read and discussed the same texts as the treatment group. While
the treatment group was receiving explicit instruction, the contrast group was engaged in extended discussion
tasks based on the content of the reading materials, analysis of the arguments presented in the texts, and brief
writing-oriented tasks, as the teacher devised.
3 - correct phrase; spelling issues possible but cannot be mistaken for issues with inflectional
and/or derivational affixation;
2 - correct phrase; problems with inflectional morphology (e.g., reopen adoption to foreigner instead of
reopen adoption to foreigners);
1 - incorrect phrase, but an evident attempt at producing the correct phrase, with one of the following:
a) problems with derivational morphology (e.g., victims of violent instead of victims of violence);
b) substitution of a preposition (e.g., place a child of adoption instead of place a child for adoption);
c) omission of a function word inside the phrase (e.g., place a child adoption instead of place a child
for adoption);
0 - no attempt to produce a target phrase OR any combination of the issues described under the
rating of 1.
Figure 1. Scale for Measuring the Production of Topic-Induced Word Combinations in Writing.
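The logic of the Figure 1 scale can be expressed as a small scoring function. This is a sketch of the rubric only: the boolean flags stand in for the manual morphological judgements the researcher made by hand (they are not automated here), the spelling tolerance at level 3 is left to the human rater, and every name is hypothetical rather than part of the study's materials.

```python
def score_phrase(produced, target, *,
                 inflection_issue=False,
                 derivation_issue=False,
                 preposition_substituted=False,
                 function_word_dropped=False):
    """Score one phrase occurrence on the Figure 1 scale (0-3).
    The flags encode the manual judgements described in the rubric."""
    flaws = [derivation_issue, preposition_substituted, function_word_dropped]
    if produced == target:
        return 3                       # correct phrase
    if inflection_issue and not any(flaws):
        return 2                       # inflectional morphology only
    if sum(flaws) == 1 and not inflection_issue:
        return 1                       # exactly one level-1 flaw, attempt evident
    return 0                           # no attempt, or combined issues

print(score_phrase("reopen adoption to foreigner",
                   "reopen adoption to foreigners",
                   inflection_issue=True))          # 2
print(score_phrase("victims of violent",
                   "victims of violence",
                   derivation_issue=True))          # 1
```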
Two computer programs, namely, Text-Lex Compare v.2.2 (Cobb, 2010) and Microsoft Word version
2007, were used for identification of the target topic-induced word combinations in participants' compositions.
While the former was employed to detect the presence of the target items in the students’ texts, the latter, with its
search feature “Find”, was used to identify the location of the target structures in the participants’ compositions.
Each time the target structure was located, the researcher examined the topic-induced word combination to
determine whether a) the form and use of the structure matched the form and use of the target item; b) the word
combinations were a part of students’ prose or the quoted and/or unquoted reference materials; c) there were
instances of an overlap of two or more target items. The researcher bolded all of the target items in the
document and recorded her notes in the table along with the results of the Text-Lex Compare program.
After the topic-induced word combinations identified by the Text-Lex Compare program were located and
marked in bold in the text, the researcher continued the examination of the compositions using the Microsoft
Word program and its "Find" feature to locate possibly flawed structures (e.g., issues with spelling, problems with
morphology, dropped words within the formulaic sequences). The search was conducted by entering partially
realized forms of the target items as search criteria. To illustrate, when the essays were examined for the
occurrences of victims of violence, the following search criteria were submitted: victim and violen. The topic-induced
phrases that appeared in the essay prompt (orphaned children, international adoption) were included in the analysis.
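The partial-form search can also be mimicked programmatically. The sketch below is a rough stand-in for the manual Word "Find" procedure described above, not a tool the study used: it reports character offsets where every partial stem co-occurs within a short window; the window size and example sentence are arbitrary assumptions.

```python
import re

def candidate_spans(text, stems, window=40):
    """Offsets where the first stem occurs and all remaining stems
    appear within `window` characters of it -- a rough programmatic
    stand-in for searching with partial forms such as 'victim' and
    'violen' to catch flawed realisations of a target phrase."""
    hits = []
    for m in re.finditer(re.escape(stems[0]), text, re.IGNORECASE):
        span = text[m.start():m.start() + window]
        if all(re.search(re.escape(s), span, re.IGNORECASE) for s in stems[1:]):
            hits.append(m.start())
    return hits

essay = "Many victims of violent need protection."
print(candidate_spans(essay, ["victim", "violen"]))  # [5]
```

Each flagged offset would still need manual inspection, just as in the study, since a stem match does not guarantee an attempt at the target phrase.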
The process of identification of the target items in the students' compositions was repeated three times
over a period of two days to assure the reliability of scoring of data. The researcher took 15- to 30-minute breaks
between searches after every 5 target items.
After the researcher located and bolded the target structures in the students' compositions, she reviewed the
essays to exclude from the analysis the word combinations that appeared to be a part of the material borrowed
from reference sources and not student-generated text. The researcher evaluated the formulaic sequences using
the scoring guide presented in Figure 1. The final score given to an essay was the sum of the scores given to each
phrase occurrence in the text. If there were multiple occurrences of the same topic-induced word combination,
an average of the scores assigned to each occurrence was computed and included in the calculation of the final
score.
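The aggregation rule (sum across phrases, with repeated occurrences of a phrase contributing the mean of their scores) is easy to state in code. A minimal sketch, with hypothetical names and made-up occurrence scores for illustration:

```python
from statistics import mean

def essay_score(occurrence_scores):
    """Final essay score: sum over distinct target phrases, where a
    phrase produced several times contributes the mean of its
    occurrence scores, per the scoring procedure described above.

    occurrence_scores maps each target phrase to the list of rubric
    scores (0-3) its occurrences received."""
    return sum(mean(scores) for scores in occurrence_scores.values())

scores = {
    "adoption agencies": [3, 3],      # two correct occurrences -> 3
    "victims of violence": [1],       # one flawed occurrence   -> 1
    "criteria for adoption": [3, 1],  # mean of 3 and 1         -> 2
}
print(essay_score(scores))  # 6
```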
The data for the study included the scores students received on the pre- and posttests on the production of
topic-induced word combinations in an unannounced in-class 40-minute argumentative essay.
Results
Differences Between Contrast And Treatment Groups
Table 3 offers the means and standard deviations for the scores participants received on the production of topic-
induced word combinations in essays at the start and end of the term.
Table 3
Mean Scores and Standard Deviations for both Measures by Group
                 Contrast                    Treatment
Measure          n      M       SD           n      M       SD
Pretest          19     4.08    2.16         35     3.79    1.90
Posttest         19     4.84    2.27         35     8.71    5.40
The research question that motivated this study was whether the students who received explicit instruction
improved their abilities to use the target topic-induced phrases in writing more, from pre-test to posttest, than
those who did not. To compare the gains over time between the two groups, an ANOVA with repeated measures
was performed with time (pretest vs. posttest) as a within-subjects factor and group (treatment vs. contrast) as a
between-subjects factor. The assumptions of normal distribution of data and homogeneity of variances were not met.
Larson-Hall (2016) explains that the problem with violating these assumptions is that statistical differences that
exist between groups of participants may not be found (p. 100). The analysis for this study nonetheless finds
statistically significant results, as described below.
There was a statistical interaction between group and time, meaning that the groups did not perform the
same way at the two time points (F(1, 52) = 10.84, p = .0017886, generalized eta-squared = .08). The interaction
between group and time accounted for 8% of the variance in the model. Because each factor had only two
levels (two times and two groups), no sphericity statistics were offered.
There was also a statistical effect for time (F(1, 52) = 32.75, p < .0001, generalized eta-squared = .20). In this
model, time made the bigger difference, accounting for 20% of the variance. Since only two times were
tested, it can be concluded from the mean scores (see Table 3) that the participants did better on the
posttest than on the pretest. There was a statistical effect for group (treatment vs. contrast) as well (F(1, 52) = 5.27,
p = .03, generalized eta-squared = .06). The effect for group was not as great as the effect for time, accounting for
6% of the variance. Since there are only two groups, it can be concluded from the mean scores (see Table 3) that the
treatment group performed better than the contrast group.
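In a 2x2 mixed design, the group-by-time interaction is equivalent to a one-way ANOVA on gain scores (posttest minus pretest), which makes the key comparison easy to check from first principles. The sketch below computes the F ratio by hand on made-up gain scores; the numbers are illustrative only and are not the study's data.

```python
def one_way_f(*groups):
    """F statistic for a one-way ANOVA, computed from first principles:
    between-groups mean square divided by within-groups mean square."""
    all_vals = [x for g in groups for x in g]
    grand = sum(all_vals) / len(all_vals)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_vals) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

contrast_gains = [1, 0, 2, 1, 0]    # modest pre-to-post gains
treatment_gains = [5, 4, 6, 5, 4]   # larger gains under instruction
f = one_way_f(contrast_gains, treatment_gains)
print(round(f, 1))  # 57.1
```

A large F on gains corresponds to the significant group-by-time interaction reported above; a dedicated statistics package would also supply the p value and effect size.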
The results suggest that both groups made gains in their abilities to produce topic-induced word
combinations from pretest to posttest, but that the treatment group made greater gains than the contrast group.
These findings suggest that, at least for intermediate ESL writers, students who receive direct instruction seem
to improve their abilities to employ the topic-induced word combinations in their compositions more than the
learners who do not.
Interviews
Follow-up interviews were conducted with a subset of participants from the treatment group who were selected
on the basis of their abilities to produce topic-induced word combinations on the posttest. Three informants
were male and two were female. Interviews followed a semistructured guide comprised of open-ended questions
about the students’ backgrounds, academic goals, English language training, and, more importantly, about the
strategies students applied to producing the phrases and the attitudes towards the instructional intervention (see
section Overview of the research design for specific questions). Interviews were conducted and tape-recorded by the
researcher. The researcher listened to the information as many times as was necessary in order to represent the
information accurately and take notes while listening. The researcher analyzed the data from the interview by
looking for patterns in the responses of the informants. Pseudonyms are used for all of the informants to ensure
confidentiality. Their language and education profiles are presented in Table 4.
Table 4
Informants’ Language and Education Profiles
Informant      Language      Educational background                              Future plans
Al             Japanese      High-school diploma from a home country             University education in home country
Jumi           Japanese      High-school diploma from a home country             University education in the US
Jack, Jihan    Arabic,       Master's degrees in business and in business        Employment in home country
               Turkish       administration from a home country
Ju             Portuguese    Bachelor's degree in business from a home country   Employment in the US
Al was a recent high-school graduate from Japan. He had lived in the US for two months. His academic
goal was to pursue a degree in teaching English as a foreign language in his native country. Prior to enrolling in
the ESL classes in the US, he had had little opportunity to write extensively in English. He reported that when
writing the in-class essay at the beginning of the semester, he was focused on the content of his composition and
the ideas to use in support of his position on international adoptions; however, when writing in-class on the same
topic at the end of the semester and after having been taught the target phrases, he was focused more on the
vocabulary, paying attention not only to what to say but also how to say it. He used the target phrases in the end-
of-the-semester timed essay because they were important in the discussion of the topic and because he felt that
the phrases could help him express ideas clearly. His attitude towards all of the in-class vocabulary activities was
positive. He felt that all activities provided substantial practice in production. From the course syllabus and
previous writing experience in class, he knew that he would be expected to write another essay, so he paid
attention to the activities in class.
Jumi, like Al, had recently graduated from high school in Japan. She had been studying English in the U.S. for
10 months during which time she had completed four terms at the IEP. Her plan was to study sports medicine at
a university in the United States. Although she had taken four writing classes prior to participation in the study,
she found writing difficult. She explained that a lack of knowledge of the essay topic and of the words to use to
discuss it accounted for her limited use of the topic-induced phrases in the in-class pretest essay. This was,
however, not the case on the posttest when she employed the target phrases in her composition. Among the
activities used in teaching topic-induced phrases in the writing class, Jumi found the one with an immediate
connection to her own writing the most useful. When discussing the pros and cons of international adoptions,
she was in favor of inter-country adoptions, which is why she found it useful to write an argument for foreign
adoptions, a segment of the Build an Argument activity, and why she viewed negatively the other segment of the
activity, which asked her to write against inter-country adoptions. Speaking under time constraints for the 2/1/30 activity was
not enjoyable. The remainder of the activities used in class she found moderately useful.
Ju was a female participant from Brazil. She held a bachelor’s degree in business from her native country
and had been attending ESL classes at the IEP for thirteen months with the goal of finding employment in the
US. Similar to Jumi and Al, Ju was searching for ideas to use in the essay, paying limited attention to the
vocabulary to use. On the posttest, however, having realized that the target phrases were important and necessary
in a discussion of the topic, she purposely used the target phrases and alternated synonymous phrases (e.g., inter-
country adoption, foreign adoption) to improve the quality of her text. Ju reported learning the target phrases on the
topic of adoption in class and was proud that at posttest, she was able to write them down from memory. She
found the phrases taught in class very useful because they related to the topic of the essay she would be asked to
write next. She pointed out how the teacher had been using them in class, how the peers produced them in class
discussions, and how the authors employed them in the texts she read. She saw a purpose in using the topic-
induced phrases in her writing. Similar to Al, she reported that the phrases were important. They helped her
express ideas clearly and talk about the same idea without repeating the same phrase. She concluded her answer
to the question on how she went about using the target phrases with a falling intonation, indicating that there was
nothing more to be said except I used them because I had to use them! Ju reported that, among the activities used in
teaching topic-induced formulaic sequences, Build an Argument was the most useful, and she could not think
of any activities used in instruction that were not helpful to her.
Jihan was a male participant from Turkey. He held one master’s degree in business and another in
engineering. He had been in the US for about nine months. He had completed four sessions of ESL classes. His
professional goal was to find employment in a prestigious foreign firm in his home country. He explained that the
phrases taught in class were not the vocabulary he felt he needed to learn. The vocabulary he explored and
focused on was the vocabulary he self-selected, either because the items were new or interesting to learn. He said
that when writing essays for the class, he was focusing on creating a well-organized, unified, and coherent essay; it
was a problem for him to focus on vocabulary. His approach was to think in his native language and then
translate to English, paying special attention to the writing conventions taught in the writing class. Although
Jihan’s attitude towards the instruction on topic-induced phrases in the writing class was generally negative, he
thought that the Build an Argument activity was useful.
Jack was a male student from the United Arab Emirates. He had been in the United States for a year and
three months. His goal was to continue his academic studies in the United States. He did not use the target
phrases in his writing because he wanted to talk about adoptions in general, not necessarily about international
adoptions. His attitude towards the activities used in instruction of topic-induced phrases was generally neutral
but he, similar to other informants, had a more positive attitude towards the activity Build an Argument.
In summary, while each interview participant reported difficulties in focusing on the vocabulary aspect of
their writing at pretest, only those with a positive attitude towards the instructional intervention knew the
content well enough at posttest to allocate attention to the use of target phrases. This group of students concurred that the
topic-induced word combinations taught in class helped them express their ideas more clearly, which is one
of the main reasons why students employed them in writing. On the other hand, those who failed to recognize
the contribution the target phrases make to the discussion of the topic as well as to appreciate most of the in-
class vocabulary-focused activities, also failed to use the topic-induced phrases in their writing. Interestingly
enough, when a vocabulary-focused activity was both integrated with the writing task and also closely aligned
with the major writing assignments, all of the interview participants expressed appreciation for the teaching
strategy.
Discussion
The findings of this study suggest that ESL learners can improve their abilities to use topic-induced word
combinations in writing when reading texts on a given topic and discussing their content with peers in class.
These results are not surprising. The target phrases the study considered are essential for an effective discussion
of a topic (Erman, 2009); that is, even when relatively few topic-induced phrases are used, they key a reader into
content of the text. Students seemed to recognize this. Also, in class they had exposure to the target items; for
four days, they read on the topic and discussed the readings. Because the students had access to the reading
materials as they wrote their essays both at pretest and posttest, it cannot be claimed that they produced the
target phrases from memory. This may apply particularly to the production of the two target phrases (international
adoption and orphaned children) that were additionally present in the writing prompt (see section Instruments for
qualitative data elicitation and evaluation), given that previous research reports that ESL writers often borrow lexical
phrases from the writing prompts they are given (e.g., Ohlrogge, 2009). There were also no participants from the
contrast group interviewed to provide further evidence on how they went about using the phrases in their essays.
What we do know, however, is that through extended exposure they became familiar enough with the target
phrases to recognize their usefulness and employ them in their own writing. What we still need to find out
through a qualitative analysis is which types and forms of the target topic-induced phrases the students in the
contrast group used in their essays.
The study findings also suggest that the ESL students who receive explicit instruction improve their abilities
to employ the topic-induced word combinations in their compositions more than the learners who do not receive
this instructional intervention. These findings, to an extent, support the findings of Lee (2003) and Lee and
Muncie (2006) on the positive effects of direct instruction on topic-related vocabulary use in writing. The
findings of this study were a result of carefully planned explicit instruction consisting of giving students reading
materials with topic-induced word combinations in bold type; stressing the contribution of the target phrases to
the message of a text; having students produce the topic-induced phrases in controlled situations; directing them
to read, listen, speak, and write the target phrases in an activity under time constraints; and asking them to use
the target phrases in a writing task that is aligned in purpose with the very next major written assignment.
Another very important feature of the instructional intervention was that the target phrases were assumed to be
useful to L2 writers because they had an immediate application to their writing. The findings of the study
support the call for integration of the explicit teaching of vocabulary in writing (i.e., Coxhead & Byrd, 2007;
Folse, 2008; Schmitt, 2000), particularly the teaching of vocabulary students need for their writing (Folse, 2008).
Where discrete differences in the use of the target topic-induced phrases lie between the two groups of students
may be more directly observed through a qualitative analysis of the types and forms of the target topic-induced
phrases in the students’ essays. It might be that the students receiving direct instruction were able to use a greater
variety of the phrases with, perhaps, better accuracy in their end-of-the-term essays. If so, it might be that the
instructional intervention helped students improve vocabulary use overall in their essays. However, whether
or not the use of the topic-induced phrases helped students in the contrast group improve the lexical quality of
their writing remains to be investigated.
As noted previously, the students were allowed to use the reading materials as they wrote their essays, so it may
or may not be the case that they were producing the phrases from memory. What we do know is that due to explicit
teaching, they recognized the relevance and utility of the target phrases to their own writing more than the
students in the contrast group did; and thus incorporated the phrases better in their compositions written at the
end of the term. This suggests, as previous research within the contexts of L1 (Cortes, 2006) and L2 (Jones and
Haywood, 2004) academic writing has indicated, that due to direct instruction, students may increase their
awareness about the importance of the use of multi-word combinations in writing. It is possible, however, that
some of the students in the present study learned, due to the treatment, the target phrases well enough to
produce them from memory. One of the informants, who was considered a high performing participant based on
her ability to use the target phrases at posttest, claimed to have recalled the target items from memory.
The interview data provided details about the students’ strategies for production of the target phrases in
writing, their attitudes towards the target phrases, and the activities used in explicit instruction. The participants
concurred that their written production was affected by their perceived need to employ the target phrases in their
writing. The informants who understood how relevant the target phrases were to the topic their essays examined
were those who employed them more in their writing, while those who did not chose, for the most part, to
disregard them. In addition, it seemed that most of the time, the production of the target phrases was motivated
by students’ intention to showcase knowledge on the topic.
Additionally, helping students realize the utility of the topic-induced phrases in the reading materials on a
specific subject is worth noting. Some students were alerted to the importance of the topic-induced phrases upon
receipt of the reading materials with the target phrases in bold type recurring in a single text and/or across
multiple texts.
With respect to the strategies for production of the target phrases in writing on the first timed essay,
students grappled with generating content for their essays, which ultimately affected the vocabulary choices they
made, so fewer target phrases were used. On the second timed essay, the high performing students felt they knew
the content well enough to pay attention to how to convey meaning with the precision and clarity that the
topic-induced phrases allowed.
Relative to the activities used in the instructional intervention, the interview data indicate that high
performing students value all of the activities focusing on the topic-induced phrases while low performing
students enrolled in writing classes appreciate activities with a direct connection to their own writing. All of the
informants, low performing and high performing alike, noted that one activity that resembled the upcoming
major assignment in purpose and content was most useful.
There are several limitations to be noted in the present study. First, the number of participants in the study
was small and they were all at one level of language proficiency (i.e., high-intermediate). To obtain more
generalizable results and to compare the effect of treatment across proficiency levels, future research would need
to include more participants at various levels of language proficiency. In addition, since the reading materials
were accessible to the students during the writing sessions, the study could not gather information on the
effects of explicit instruction on the students’ abilities to produce topic-induced phrases in free production. Third,
in an effort to minimize the task effects on the students in the treatment group and also to avoid possibly alerting
students to the study being conducted, the target phrases related to the topics of the two other essays were
explicitly taught prior to submission of their respective final drafts. Although the topic-induced phrases
concerned topics different from the one used in data collection, the explicit teaching sessions were similar to the
treatment activities before the data collection in that the students received reading material with the target
phrases marked in bold and completed activities that focused on the production of the target phrases. Future
research could control for this variable. Fourth, in an attempt to minimize the teacher-investigator variable in the
study, the course instructor was different from the study investigator. The researcher was present on the days
when the data for the study was collected. She was in regular contact with the course instructor to provide
materials for the study, to confirm with the teacher that vocabulary was not explicitly taught during the data
collection from the contrast group, and to receive reports on the delivery of the explicit teaching sessions;
however, observations of actual teaching were not conducted. Future research should consider including
observations of the teaching sessions or possibly recording the session for later viewing and review. Fifth, the
present study did not examine descriptively the types and forms of the target topic-induced phrases in the essays
written by the contrast and treatment groups, nor did it explore whether and to what extent the treatment had an
impact on the students’ quality of writing. Further research on the aforementioned limitations is warranted
to refine our understanding of the effects of explicit teaching of the topic-induced phrases on ESL writers.
Relative to the design of tasks that integrate vocabulary and writing, teachers may want to link them as closely as
possible to the purpose for which students are writing their major assignments. By so doing, they are more likely
to contextualize explicit teaching of the topic-induced word combinations thus making instruction meaningful to
the students.
References
Barkaoui, K. (2010). Do ESL essay raters’ evaluation criteria change with experience? A mixed-methods,
cross-sectional study. TESOL Quarterly, 44(1), 31-57.
Biber, D., & Barbieri, F. (2007). Lexical bundles in university spoken and written registers. English for Specific
Purposes, 26, 263-286.
Biber, D., & Conrad, S. (1999). Lexical bundles in conversation and academic prose. In H. Hasselgard and S.
Oksefjell (Eds.), Out of Corpora: Studies in Honor of Stig Johansson (pp. 181-190). Amsterdam:
Rodopi.
Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). Longman grammar of spoken and
written English. London: Longman.
Cobb, T. (2010). N-gram Phrase Extractor (Version 4). Retrieved from http://lextutor.ca/tuples/eng/
Cobb, T. (2010). Text Lex Compare (Version 2.2). Retrieved from http://www.lextutor.ca/text_lex_compare/
Cortes, V. (2006). Teaching lexical bundles in the disciplines: An example from a writing intensive history class.
Linguistics and Education, 17, 391-406.
Cowie, A. P. (1992). Multiword lexical units and communicative language teaching. In P. Arnaud & H. Bejoint
(Eds.), Vocabulary and applied linguistics. London: MacMillan.
Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34, 213-238.
Coxhead, A. (2008). Phraseology and English for academic purposes: Challenges and opportunities. In F.
Meunier and S. Granger (Eds.), Phraseology in foreign language learning and teaching (pp. 149-162).
Amsterdam: John Benjamins Publishing Company.
Coxhead, A. (2012). Academic vocabulary, writing and English for academic purposes: Perspectives from second
language learners. RELC Journal, 43, 137-145.
Coxhead, A. (2014). New Ways in Teaching Vocabulary. Alexandria, Virginia: TESOL Inc.
Coxhead, A., & Byrd, P. (2007). Preparing writing teachers to teach the vocabulary and grammar of academic
prose. Journal of Second Language Writing, 16, 129–147.
Ellis, N., Simpson-Vlach, R., & Maynard, C. (2008). Formulaic language in native and second language speakers:
Psycholinguistics, corpus linguistics, and TESOL. TESOL Quarterly, 42, 375-396.
Ellis, R. (2009). The Study of Second Language Acquisition. Oxford: Oxford University Press.
Engber, C. A. (1995). The relationship of lexical proficiency to the quality of ESL compositions. Journal of
Second Language Writing, 4, 139-155.
Erman, B. (2009). Formulaic language from a learner perspective: What the learner needs to know. In R.
Corrigan, E. A. Moravcsik, H. Ouali, and K. M. Wheatley (Eds.), Formulaic language, Volume 2 (pp.
323-346). Philadelphia: John Benjamins Publishing Company.
Erman, B., & Warren, B. (2000). The idiom principle and the open choice principle. Text, 20, 29–62.
Ferris, D. (1994). Lexical and syntactic features of ESL writing by students at different levels of L2 pro1ciency.
TESOL Quarterly, 28(2), 414-420.
Ferris, D. (2015). Supporting multilingual writers through the challenges of academic literacy: Principles for
English for academic purposes and composition. In N. W. Evans, N. J. Anderson, & W. G. Eggington (Eds.),
ESL readers and writers in higher education: Understanding challenges. New York: Routledge.
Folse, K. (2008). Myth 1: Teaching vocabulary is not the writing teacher's job. In J. Reid (Ed.), Writing Myths:
Applying Second Language Research to Classroom Teaching (pp.1-17). Ann Arbor, MI: University of Michigan
Press.
Granger, S. (1998). Prefabricated patterns in advanced EFL writing: Collocations and formulae. In A. P. Cowie
(Ed.), Phraseology: Theory, analysis, and applications (pp. 146-160). Oxford: Oxford University Press.
Harley, B., & King, M. L. (1989). Verb lexis in the written compositions of young L2 learners. Studies in Second
Language Acquisition, 11, 415-439.
Hinkel, E. (2004). Teaching academic ESL writing: Practical techniques in vocabulary and grammar. Mahwah,
NJ: Lawrence Erlbaum Associates.
Howarth, P. (1998). The phraseology of learners’ academic writing. In A. P. Cowie (Ed.), Phraseology: Theory,
analysis and applications (pp. 161-187). Oxford: Clarendon Press.
Hyland, K. (2008). Lexical clusters: Text patterning in published and post-graduate writing. International Journal
of Applied Linguistics, 18, 41-61.
Jacobs, Hartfiel, Hughey, & Wormuth (1981). Testing ESL composition: A practical approach. Boston: Newbury
House.
Jones, M., & Haywood, S. (2004). Facilitating the acquisition of formulaic sequences. In N. Schmitt, (Ed.),
Formulaic sequences (pp. 269-300), Philadelphia: John Benjamins Publishing.
KeyWords Extractor (Version 1).
Larson-Hall, J. (2016). A Guide to Doing Statistics in Second Language Research Using SPSS and R (2nd ed.). New York:
Routledge.
Lee, S. H. (2003). ESL learners’ vocabulary use in writing and the effects of explicit vocabulary instruction.
System, 31, 537-561.
Lee, S. H., & Muncie, J. (2006). From receptive to productive: Improving ESL learners’ use of vocabulary in a
postreading composition task. TESOL Quarterly, 40, 295-320.
Leki, I., & Carson, J. (1994). Students’ perceptions of EAP writing instruction and writing needs across the
disciplines. TESOL Quarterly, 28, 81-101.
Lewis, M. (1997). Pedagogical implications of the lexical approach. In J. Coady & T. Huckin (Eds.), Second
language vocabulary acquisition: A rationale for pedagogy (pp. 255-270). Cambridge: Cambridge University
Press.
Li, J., & Schmitt, N. (2009). The acquisition of lexical phrases in academic writing: A longitudinal case study.
Journal of Second Language Writing, 18, 85-102.
Linnarud, M. (1986). Lexis in composition: A performance analysis of Swedish learners’ written English.
Malmö, Sweden: Liber Förlag Malmö.
McClure, E. (1991). A comparison of lexical strategies in L1 and L2 written English narratives. Pragmatics and
Language Learning, 2, 141-154.
Nattinger, J. R., & DeCarrico, J. S. (1992). Lexical phrases and language teaching. Oxford: Oxford University Press.
Nation, I. S. P. (2005). Teaching and learning vocabulary. In E. Hinkel (Ed.), Handbook of Research in Second
Language Teaching and Learning (pp. 581-596). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Nation, I.S.P. & Gu, P. Y. (2007). Focus on Vocabulary. Sydney, Australia: National Center for English Language
Teaching and Research Macquarie University.
Numrich, C. (2009). Raise the Issues: An integrated approach to critical thinking. Pearson Education ESL.
Ohlrogge, A. (2009). Formulaic expressions in intermediate EFL writing assessment. In R. Corrigan, E. A.
Moravcsik, H. Ouali, and K. M. Wheatley (Eds.), Formulaic language, Volume 2 (pp. 375-386).
Philadelphia: John Benjamins Publishing Company.
Santos, T. (1988). Professors’ reactions to the academic writing of nonnative-speaking students. TESOL
Quarterly, 22, 69-90.
Scott, M., & Tribble, C. (2006). English for academic purposes: Building an account of expert and apprentice
performances in literary criticism. In M. Scott and C. Tribble (Eds.), Textual patterns: key words and
corpus analysis in language education (pp. 131-159). Amsterdam: John Benjamins Publishing
Company.
Simpson-Vlach, R., & Ellis, N. C. (2010). An academic formulas list: New methods in phraseology research.
Applied Linguistics, 31, 487-512.
Song, B., & Caruso, I. (1996). Do English and ESL faculty differ in evaluating the essays of native-English
speaking and ESL students? Journal of Second Language Writing, 5, 163-182.
Thornbury, S. (2002). How to Teach Vocabulary. Harlow: Pearson Longman.
Yorio, C. A. (1989). Idiomaticity as an indicator of second language proficiency. In K. Hyltenstam & L. K. Obler
(Eds.), Bilingualism across the lifespan (pp. 55-72). Cambridge: Cambridge University Press.
Zimmerman, C. B. (2009). Word knowledge: A vocabulary teacher’s handbook. Oxford: Oxford University Press.
Christian Jones
University of Liverpool, UK*
Daniel Waller
University of Central Lancashire, UK**
Abstract
This article reports on a quasi-experimental study investigating the effectiveness of two different teaching approaches,
explicit teaching and explicit teaching combined with textual and aural input enhancement, used to teach lexical items to
elementary level learners of Turkish in a higher education context. Forty participants were divided into two equal groups
and given a pre-test measuring productive and receptive knowledge of nine targeted lexical items naming common types of
food and drink. Each group was then given sixty minutes of instruction on ‘restaurant Turkish’, using a direct communicative
approach. Group one (contrast group) received explicit teaching only, while group two (treatment group) received the same
teaching but also used a menu where the target words were bolded (textual input enhancement) and listened to the target
words modelled by the teacher three times (aural input enhancement). Following the treatment, tests measuring productive
and receptive knowledge of the target items were administered. This process was repeated with a delay of two weeks
following the treatment. Analysis of gain scores for receptive and productive tests made at the pre-, post- and delayed stage
reveal larger gains for the treatment group in each test. These were statistically significant when compared with the contrast
group’s scores for production at the immediate post-test stage. Within-group tests showed that each treatment had a
significant impact on receptive and productive knowledge of the vocabulary targeted, with a larger short-term effect on the
treatment group. Previous studies in this area have tended to focus on the use of input enhancement in relation to the
learning of grammatical forms, but these results demonstrate some clear benefits when teaching lexis, which have clear
implications for further research and teaching.
Key words: Input enhancement; textual enhancement; aural enhancement; Turkish vocabulary; beginners
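The gain-score comparison described in the abstract can be sketched as follows. This is an illustrative sketch only: the scores, group sizes, and helper functions below are hypothetical and are not the authors' data or their actual statistical procedure.

```python
# Illustrative sketch of a between-groups gain-score comparison:
# per-participant gains (post-test minus pre-test) are computed for a
# contrast and a treatment group, then compared with Welch's t-statistic.
from statistics import mean, variance
from math import sqrt

def gain_scores(pre, post):
    """Per-participant gains: post-test score minus pre-test score."""
    return [b - a for a, b in zip(pre, post)]

def welch_t(x, y):
    """Welch's t-statistic for two independent samples (unequal variances)."""
    vx, vy = variance(x), variance(y)
    return (mean(x) - mean(y)) / sqrt(vx / len(x) + vy / len(y))

# Hypothetical productive-test scores out of 9 target items
contrast_pre,  contrast_post  = [2, 1, 3, 2, 2], [4, 3, 5, 4, 3]
treatment_pre, treatment_post = [2, 2, 1, 3, 2], [7, 6, 5, 8, 6]

g_contrast  = gain_scores(contrast_pre,  contrast_post)
g_treatment = gain_scores(treatment_pre, treatment_post)

print(mean(g_contrast), mean(g_treatment))          # mean gain per group
print(round(welch_t(g_treatment, g_contrast), 2))   # larger t => larger group difference
```

Welch's statistic is used here because it does not assume equal variances between the groups; the significance tests the authors actually ran may have used a different procedure.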
Introduction
The importance of learning vocabulary explicitly from the early stages of studying a second language is now
well-established (e.g., McCarthy, 1999; Schmitt, 2000). While there has also been a great deal of research which
gives clear suggestions about how many and which lexical items and chunks may be of primary importance
to teach learners (e.g., O’Keeffe, McCarthy & Carter, 2007; Shin & Nation, 2007), there is less consensus about
how instruction can best aid this process. There is evidence that explicit teaching of grammatical and lexical
items has a greater impact upon learning than implicit teaching (Norris & Ortega, 2000, 2001; Spada & Tomita,
2010) but as yet there are no definitive answers to the type of explicit teaching which results in the most effective
learning of second languages. It may also be the case that the effects of explicit teaching can be increased
* Tel. 44 1517942724, Email: Christian.jones2@liverpool.ac.uk, Department of English, 19-23 Abercromby Square, University
of Liverpool, Liverpool, L69 7ZG
**Tel. + 44 1772 893672, Email: dwaller@uclan.ac.uk, School of Language and Global Studies, University of Central
Lancashire, Preston, PR1 2HE
through making the input learners receive as salient as possible. One area of consistent focus in the research has
been upon the use of input enhancement (IE) as a means of promoting noticing and learning and in particular
upon the use of textual enhancement (TE) of various kinds. TE commonly involves enhancing a text through
making target items bold, italicised or underlined.
The impact of TE has been researched in regard to a range of second languages alongside forms of
explicit teaching (e.g., Alanen, 1995) and as a variable in its own right (e.g., Petchko, 2011) but results have been
mixed (Han, Park & Combs, 2008). Aural enhancement (AE), whereby listening texts are manipulated to increase
the saliency of target items (such as making the recording of those items louder or repeating target items) has
been researched a great deal less, and the results which are available are similarly inconclusive (e.g., Reinders &
Cho, 2011). However, much TE research has focused upon grammatical structures as opposed to lexical items
and AE and TE have been under-researched in combination with explicit teaching. This study is an attempt to
fill this gap and provide some evidence that TE can be a useful addition to vocabulary learning, as it can quickly
draw learners’ attention to the form and use of a word, something Nation (1999) suggests can be helpful. As a
strategy, TE has the benefit of being potentially extremely versatile. It could be used for incidental learning or
targeted learning based on resources such as Coxhead’s Academic Word List (2000) or even used by learners
themselves as a deliberate learning strategy. The use of such strategies by learners has been identified by Folse
(2004) as an essential feature of successful vocabulary learning.
Input Enhancement
The term ‘input enhancement’ is credited to Sharwood Smith (1991, 1993), who suggested that some form of
enhancement may be helpful to make input more salient to learners. Without such salience, he suggests, learners
may fail to notice forms within the input they receive because much input is likely to be processed for meaning.
Noticing, as described by Schmidt (1990, 1995, 2001, 2010), can be defined broadly as ‘conscious registration of
attended specific instances of language’ (Schmidt, 2010, p.725). It is this conscious registration which is
considered to be the first step needed to convert input into intake, and input enhancement may be viewed as one
type of ‘consciousness raising’ (Sharwood Smith, 1981) activity, which teachers and researchers can use to help
learners notice forms within input they comprehend. Sharwood Smith (1993) suggests a number of methods
which might be used to enhance input, including the bolding of texts for visual input and repeating targeted
items for aural input. The use of input enhancement would seem to be of particular relevance in the instruction
of beginners since in the initial stages of learning a new language all input is potentially significant to the learner
and there is often little indication as to which pieces of language are essential or more useful in the long term.
There have been a number of studies aimed at investigating the effect of input enhancement upon the
learning of targeted forms. Many studies of this nature have focussed upon textual enhancement (TE), either as
a variable of its own or in combination with other variables such as input flood or explicit rule-based instruction.
In a review of the research in this area, Han, Park and Combs (2008) found that most research has sought to
compare TE with another form of instruction such as explicit rule-based instruction or output practice and that
TE has often been combined with additional means to augment its effect, such as asking learners to attend to the
targeted form. Studies have generally focused on grammatical forms in a variety of languages, including English
relative clauses (Doughty, 1991), Spanish impersonal imperative (Leow, 2001), French past participle agreement
in relative clauses (Wong, 2003) and English passive forms (Lee, 2007), although some studies have concentrated
upon lexical items (e.g., Kim, 2003; Petchko, 2011). Treatments have varied greatly in length from fifteen minutes
to two weeks, as have sample sizes, which have varied from fourteen to two hundred and fifty-nine participants
(Han, Park & Combs, 2008). Generally, studies have employed an experimental design, employing a pre-test,
treatment and immediate post-test design, with a tendency not to include a delayed-test. Results have mostly
been measured by analysing productive and receptive tests statistically, although there is inconsistency in the type
of test employed. For example, some studies (e.g. Reinders & Cho, 2011) have just used one receptive test type,
commonly a grammaticality judgement test. Several studies have attempted to measure noticing through
measures such as think-aloud protocols (e.g., Alanen, 1995; Rosa & O’Neill, 1999) and to employ such data to
demonstrate that learners who noticed aspects of the targeted language achieved better results in tests.
Perhaps because of the varied nature of the studies, results indicating positive effects for TE have
themselves been mixed in terms of its impact upon noticing and learning of the targeted forms. Doughty (1991),
Shook (1994) and Alanen (1995) for example, all report that TE had some positive effects on learning of the
targeted forms, whilst Izumi (2002) and Wong (2003) report that there were no positive effects on learning. Other
studies report mixed results, with TE having a positive impact in the area of noticing but not in terms of learning
(e.g., Izumi 2002), and in some cases there were no discernible effects with regard to noticing or learning (e.g.,
Leow, Egi, Nuevo & Tsai, 2003; Petchko, 2011). Jourdenais, Stauffer, Boyson and Doughty (1995) did find that
TE had a significant impact upon noticing and immediate production of the targeted forms but the lack of a
delayed test makes it difficult to suggest the forms were acquired. A possible cause of the mixed results may also
be simply that not all studies have sought to measure both noticing and learning (Han, Park & Combs, 2008,
p.602); there has often been a presumption that TE will cause noticing, that learning will therefore follow, and
that it is thus only noticing of the forms in focus which needs to be measured. This is in itself not entirely
unreasonable if we accept Schmidt’s often quoted assertion that ‘noticing is the necessary and sufficient condition
for converting input to intake’ (Schmidt, 1990, p.129), but there is a case for suggesting that researchers need to
differentiate between what learners have noticed and what they appear to have acquired and are able to produce.
Although these are not mutually exclusive, we would suggest that not every aspect of language which learners
notice will be acquired, in the sense that learners will be able to produce it. Measuring an internal process
(noticing) is also not without difficulty, and the use of measures such as think-aloud protocols has been criticised.
Barkaoui (2011, p. 53) identifies the issue of veridicality in such approaches, where relationships between the
unconscious processing and the measurement process are indirect at best and relationships can only be inferred.
Dornyei (2007, p. 148) notes that thinking aloud whilst performing a task is not a natural process and therefore
requires some training. This training may result in what Stratman and Hamp-Lyons (1994) term reactivity, in that
it influences the kind of data produced so that learners produce more (or fewer) instances of noticing than they
would otherwise do. The method also relies on a learner’s ability to verbalise what they have noticed and it will
clearly be the case that some learners may be more confident at expressing this in a written form, either as they
notice, or after noticing. For these reasons, as we will discuss, we suggest that noticing can be measured through
testing receptive knowledge and learning through testing productive knowledge and areas of crossover can then
be analysed.
While studies in TE have been frequent, those employing aural enhancement (AE) have been much less
frequent. H. Y. Kim (1995) reports on an early study which attempted to explore whether AE could influence the
phonological aspects which learners perceive in connected speech. Two groups of Korean learners of English
were asked to listen to a series of short texts and complete a visual comprehension task, choosing a picture which
best matched each passage. For one group, the speed of speech was slower with more frequent pausing at phrase
boundaries, while the other group listened to the texts at normal speed. Immediately following the listening,
students were asked what they had heard and why they chose each picture. Student reports suggested that the
elements of speech which students comprehended most easily were words which contained tonic syllables within
a tone unit, suggesting that slower speech may allow a greater chance to perceive these elements. However,
results indicated that there were no statistically significant differences between the groups in terms of
comprehension.
There have also been a number of studies conducted using enhanced listening materials, particularly with
video (e.g., Baltova, 1999; Hernandez, 2004; Grgurović & Hegelheimer, 2007). However, these studies have
focussed upon different effects of TE upon listening, such as the extent to which listening comprehension and
intake can be aided by subtitled video or by using transcripts while listening. Although the effects have been
positive in some cases (e.g., Baltova, 1999) these results have not been consistent across a number of studies
(Perez & Desmet, 2012). In addition, use of subtitles and transcripts is perhaps better described as TE and not
AE because nothing has been done to enhance the recordings themselves.
Jensen and Vinther (2003) did examine the use of repetition of listening materials as a form of AE.
Students learning Spanish were played the same DVD material three times and given different treatments. Each
group heard the clip three times, either fast-slow-fast, fast-slow-slow or fast-fast-fast as treatment between pre-
and post-tests. No significant differences were found between the treatment types but there was a significant effect
of all treatments when compared to a control group. This suggests that all forms of repetition as AE had a
positive effect in this study. Reinders and Cho (2010, 2011) conducted a study using digital technology to aurally
enhance adverb placement and passives with sixteen Korean learners of English. The volume was raised on each
instance of the targeted structures in an audio file given to students, whilst a contrast group was given the same
audio file but without the targeted structures being enhanced. Each group was asked to listen to the audio file
once and was given no further instructions. Despite the interesting nature of the study, no statistical differences
were found in the test results of each group and some participants even reported that the raised volume was
distracting.
Whilst the body of research in TE in particular is plentiful, there are clearly some elements which have
been under-researched and aspects of study design which have been inconsistent. The first of these is the failure
in some cases to provide both receptive and productive tests as a measure of the treatment given, something
Schmitt (2010) suggests is vital when assessing receptive and productive knowledge as aspects of vocabulary
learning. Clearly, if a learner can recognise a correct form in a measure such as a grammaticality judgement test,
this only provides evidence of receptive knowledge of the item in question. It cannot be equated with an ability
to produce the target items. Providing both types of test can help us to measure noticing (receptive tests) and
learning (productive tests). Tests of lexis also have to be developed in order to assess the aspects of lexical
knowledge that are relevant to the situation. Nation (1999, p. 340) sets out a table detailing different levels of
word knowledge for both reception and production, in terms of both written and spoken contexts. At the early
stages of learning, such as the situation in which the learners in this study were, assessment is most likely
to be mainly receptive in nature with only limited production being possible.
Secondly, as we have noted, not all studies have employed a longitudinal element, in the form of a
delayed-test, something Schmitt (2010) also suggests is essential if we wish to provide evidence of durable
learning. This weakness is also one which Han, Park and Combs (2008) recognise and one which they argue must
be addressed if we hope to provide more reliable results in future TE studies. Although there is disagreement
about what constitutes an acceptable delay, it is generally recognised that a week or more is needed after
treatment in order to establish longer term effects of any intervention (Schmitt, 2010).
Thirdly, there have been notably fewer studies which have attempted to assess the impact of TE and AE
on the learning of lexical items. Those that have focussed upon lexical items (e.g., Bishop, 2004; Cho, 2016;
Y. Kim, 2003; Petchko, 2011) have not employed both TE and AE as treatment variables and have also found little
effect for TE. Y. Kim (2003) sought to investigate the effect of TE and implicit, explicit or no lexical elaboration
(explicit = meaning plus definition, implicit = appositive phrase following the target items) on two hundred and
ninety-seven Korean learners of English. The findings show that TE alone did not have a significant effect on
learners’ ability to recognise form or meaning of the lexical items, whilst lexical elaboration of both types aided
meaning recognition of the item. Bishop (2004) assessed the effects of TE on noticing formulaic sequences in a
reading text and overall comprehension of that text. Two groups were compared—a control group which read
an unenhanced text and an experimental group which read a text with targeted formulaic sequences
typographically enhanced. Students were able to click on words or sequences they were unsure of, and these were
often provided with an explanation of the meaning. They then answered a series of comprehension questions on
the text. Results showed that the TE group clicked on the enhanced formulaic sequences significantly more than
single words and also performed significantly better on the comprehension test when compared with the
control group. Petchko (2011) explored the impact of TE upon incidental vocabulary learning whilst reading
with forty-seven intermediate students of English as a foreign language. Students in the treatment group had
twelve non-words enhanced, whilst the control group did not. Non-words were chosen to ensure that the
effect of the treatment alone was measured. Both groups were given productive and receptive tests to measure
the effects of the treatment upon their recognition of word meaning and recall of the target items’ meanings.
Although both groups made gains when recognising form and recalling meaning in post-tests, there were no
statistically significant differences found between the groups’ scores in either test. Cho (2016) investigated the
effect of TE on the learning of collocations. Two groups were compared – one which read a passage with target
collocations enhanced in the text and another group which read the text without the collocations being
enhanced. Groups received a post-test on the target collocations following the reading and a test to check their
recall of the whole text. They also had their length of eye fixation measured using eye-tracking software. Results
showed that the TE group performed significantly better than the contrast group on the target collocations test
and also spent more time looking at the enhanced forms. However, they also recalled significantly less of the non-
enhanced text. This suggests that while TE can increase noticing of targeted lexis, the increased attention on
these items may reduce the ability to recall texts. As these results found mixed effects for TE alone, there seems to
be a clear need for more studies which attempt to investigate the impact of TE alongside AE on the learning of
lexical items. Such attempts are particularly merited when we consider the argument that one important way for
learners to increase their vocabulary is to notice form and meaning (Schmidt, 1990) as much as possible when
they encounter them and TE and AE are one way this could be achieved. This would seem to be particularly the
case when investigating the impact upon beginners learning an L2, as a large part of their time can usefully be
spent trying to acquire a basic vocabulary as quickly as possible (McCarthy, 2004). Nation (2006) emphasises the
need for a deliberate approach to the learning of vocabulary and TE and AE potentially offer a way to direct
learning to the most important vocabulary. While the current study investigates the use of TE and AE in the
classroom, both types of input enhancement could also form the basis for self-directed study or independent
learning strategies.
Research Questions
To our knowledge, no studies have attempted to combine AE and TE with explicit instruction. Whilst the effects
of TE are mixed and AE has been under-researched, there is a great deal of evidence which demonstrates the
benefits of explicit instruction in language teaching, in developing lexical, grammatical and pragmatic
competency (e.g., Alsadhan, 2011; Halenko & Jones, 2011; Norris & Ortega, 2000, 2001; Spada & Tomita,
2010). The current study is an attempt to address some of these issues through a focus on comparing TE/AE
alongside explicit vocabulary teaching, in comparison to explicit vocabulary teaching alone. It also attempts to
address the lack of a longitudinal element in some studies through the inclusion of a delayed-test, which can
provide evidence of durable learning (Schmitt, 2010, p.268) and to measure both receptive and productive
knowledge through these tests. The study seeks to answer the following research questions:
1. To what extent does TE and AE + explicit teaching improve the receptive knowledge of the target lexical
items when compared to explicit teaching alone?
2. To what extent does TE and AE + explicit teaching improve the productive knowledge of the target
lexical items when compared to explicit teaching alone?
Methodology
Participants
The participants consisted of two groups of 20 first year undergraduate students. All students were studying for a
degree in TESOL and Modern Languages, combining TESOL with Arabic, Chinese, French, German, Japanese
or Spanish as their main second language. English was the first language of all participants. The research was
conducted as part of four hours of classes which students undertook in order to experience learning a second
language through Direct Method teaching, as beginners. Students had undertaken just two hours of classes in
Turkish prior to the study taking place and none had studied the language previously. In total there were
nineteen male and twenty-one female participants, with a mean age of 21.5 in the contrast group and 22.6 in the
treatment group. Participants were randomly assigned to each group.
Research Design
The study followed a quasi-experimental classroom research design, as outlined by Dornyei (2007) and Cohen,
Manion and Morrison (2011), and is described as such here because there was no control group employed but rather
two groups who received different types of instruction, which took place within a classroom setting. Although a
control group (receiving no instruction but undertaking each test) would have been a useful addition to the study, this
was not possible, as the participants undertook instruction as part of their undergraduate programme. In
addition, the intention was to measure the effects of a key variable in the instruction upon the learning of the
targeted lexis (in this case types of input enhancement) and not whether instruction itself has any effect. The
study employed a pre-test, treatment, post- and delayed test structure, with the delayed tests taking place two
weeks after instruction and representing the longitudinal aspect of the study. The design can be summarised in
Table 1.
Table 1
Research Design
Contrast group (N = 20)
Pre-test: Receptive and productive vocabulary tests focused on the target items, e.g. ayran (a drink made from
yoghurt, salt and water)
Treatment: One hour of explicit teaching only, focused on ‘restaurant Turkish’, including the food and drink
items tested in the pre-, post- and delayed tests
Post-test: Receptive and productive vocabulary tests focused on the target items
Delayed post-test (2 weeks after instruction): Receptive and productive vocabulary tests focused on the target
items

Treatment group (N = 20)
Pre-test: Receptive and productive vocabulary tests focused on the target items, e.g. ayran (a drink made from
yoghurt, salt and water)
Treatment: One hour of explicit teaching with textual and aural input enhancement for the target lexical items,
focused on ‘restaurant Turkish’, including the food and drink items tested in the pre-, post- and delayed tests
Post-test: Receptive and productive vocabulary tests focused on the target items
Delayed post-test (2 weeks after instruction): Receptive and productive vocabulary tests focused on the target
items
A number of items were included in the tests, based upon several factors. Firstly, two items were chosen as
they contained a potential cognate (salata [salad] and alkollu [alcoholic]) but also contained a word which would
not be recognisable to the learners. The second set of items were not recognisable but were used multiple times
in various forms (içecekler [drinks]), and finally words were chosen which would be entirely unfamiliar and would
not be easily translatable into English (ayran [a drink made from yoghurt, salt and water] and beyaz/kırmızı şarap
[white/red wine]). All students were first given a productive and then a receptive test focussing on the target
items, for reasons outlined in the literature review (see appendix A for the target items and appendix B for a
sample of the tests). The productive test entailed learners translating the target items into English and the
receptive test entailed learners reporting whether they believed they knew the target item or not. As noted earlier,
tests of lexis have to be developed in order to assess the aspects of lexical knowledge that are relevant to the
situation. As the classes focused on learners at beginner level, this meant that the test needed to centre on
establishing meaning of new lexis and then the linking of form to this (Batstone & Ellis, 2009) and thus the focus
was on whether learners were able to recognise the words and link them to the appropriate forms. To ensure
reliability each receptive test also contained an equal number of real and invented words following Nation’s
(1999) format for vocabulary recognition tests. The addition of these words reduces the likelihood of participants
simply ticking all of the options. The order of items was changed for each test.
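As an illustration of how a yes/no recognition test of this kind might be scored, the sketch below counts ‘yes’ responses to real target words (hits) against ‘yes’ responses to invented words (false alarms). The item lists, the learner’s answers and the hits-minus-false-alarms adjustment are all hypothetical; this is one common way of discouraging indiscriminate ticking, not necessarily the scoring procedure used in the present study.

```python
# Sketch: scoring a yes/no vocabulary recognition test that mixes real
# target words with invented (pseudo) words, following Nation's format.
# The adjusted score (hits minus false alarms) is illustrative only.

def score_yes_no_test(responses, real_words, pseudo_words):
    """responses maps each item to True ('I know this word') or False.
    Returns (hits, false_alarms, adjusted_score)."""
    hits = sum(1 for w in real_words if responses.get(w, False))
    false_alarms = sum(1 for w in pseudo_words if responses.get(w, False))
    return hits, false_alarms, hits - false_alarms

# Hypothetical item lists and one learner's answers:
real = ["ayran", "salata", "içecekler"]
pseudo = ["borat", "kelim", "sulap"]  # invented distractor items
answers = {"ayran": True, "salata": True, "içecekler": False,
           "borat": True, "kelim": False, "sulap": False}

hits, fas, adjusted = score_yes_no_test(answers, real, pseudo)
print(hits, fas, adjusted)  # → 2 1 1
```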
Each group received an hour of explicit instruction about ‘restaurant Turkish’ using a direct,
communicative method, meaning all instruction was delivered in the target language. Implicit teaching was
taken to be ‘learning without awareness of what has been learned’ whilst explicit teaching was taken to mean
‘the learner is aware of what has been learned’ (Richards & Schmidt, 2002, p. 250). This was realised by the
teacher explicitly stating the aims and intended outcomes of the class before it started. The lesson followed a
presentation and practice framework. Students were first shown pictures of the items and drilled on them.
Explanations of items which were not immediately obvious from the picture were given in Turkish (e.g. ayran [a
drink made from yoghurt, salt and water]). Later on in the lesson the menu was presented in enhanced and
unenhanced form (see appendix A). Finally, the students did a short role-play based on a model dialogue where
they took the part of customers in a café while the teacher took the role of the waiter.
The treatment group were given identical materials to the contrast group but each targeted word was
bolded for this group, in order to operationalise TE. Aural enhancement was operationalised by the instructor
modelling each targeted item three times for the experimental group and only once for the contrast group. This
procedure was intended to replicate the oral repetition which Sharwood Smith (1991) suggests can be used for
aural input enhancement. Students in the treatment group were not given any additional instruction, such as
asking them to pay attention to the enhanced words. Both groups were asked not to revise the words between
classes.
Test data was analysed for statistical significance using between-group and within-group measures. To
answer the two research questions, gain scores at the pre-post, post-delayed and pre-delayed stages were compared
using an independent samples t-test to compare groups. Productive and receptive gains were also compared for
each group using paired samples t-tests. Effect sizes were measured where significance was found, using Pearson’s
r, which Cohen (1988) suggests can be considered in the following ways: small effect = 0.10, medium effect =
0.30, large effect = 0.50.
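The analysis described above can be sketched as follows. The gain scores below are invented for illustration and are not the study’s data; the helper functions simply implement the standard pooled-variance and paired t-test formulas and the conversion of t to Pearson’s r.

```python
# Sketch of the statistical analysis described above, using invented
# gain scores rather than the study's actual data.
import math
import statistics

def independent_t(a, b):
    """Independent samples t-test (pooled variance); returns (t, df)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        pooled * (1 / na + 1 / nb))
    return t, na + nb - 2

def paired_t(x, y):
    """Paired samples t-test on two measures from the same learners; (t, df)."""
    d = [xi - yi for xi, yi in zip(x, y)]
    t = statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))
    return t, len(d) - 1

def effect_size_r(t, df):
    """Pearson's r from t: r = sqrt(t^2 / (t^2 + df)).
    Cohen (1988): 0.10 small, 0.30 medium, 0.50 large."""
    return math.sqrt(t ** 2 / (t ** 2 + df))

# Hypothetical pre-to-post gain scores (NOT the study's data):
contrast_gains = [3, 5, 4, 6, 5, 4, 7, 5, 6, 4]
treatment_gains = [6, 7, 5, 8, 7, 6, 9, 7, 6, 8]
t, df = independent_t(treatment_gains, contrast_gains)
print(round(effect_size_r(t, df), 2))  # → 0.66 for these invented scores
```

Applying the same conversion to a value reported later, for instance t(19) = 7.430, gives r = √(55.2/74.2) ≈ 0.86.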
Results
Research Question 1: To what extent does TE and AE + explicit teaching improve the receptive
knowledge of the target lexical items when compared to explicit teaching alone?
Table 2
Receptive Test Results
Pre-test Post-test Delayed test
Group 1 (contrast) M = 1.5000 M = 6.5500 M = 5.6500
N = 20 SD = 1.73205 SD = 2.21181 SD = 2.60111
Group 2 (treatment) M = 1.5000 M = 8.2000 M = 6.9000
N = 20 SD = 1.60591 SD = 1.43637 SD = 1.94395
Note. Maximum score = 9
It is clear from this data that both groups made gains from pre- to post- and pre- to delayed tests. Paired sample t-
tests show that these gains were significant for both groups. For the contrast group, gains at the pre-post stage
were most positive (M = 5.05000, SD = 2.68475), t(19) = 8.412, p < .001, r = 0.88, and were maintained to
some degree at the pre-delayed stage (M = 4.15000, SD = 2.49789), t(19) = 7.430, p < .001, r = 0.86.
Although there was notable attrition from the post to delayed stage (M = -.9000, SD = 2.82657), this was not
found to be significant. For the treatment group, gains were largest at the pre-post stage (M = 6.70000,
SD = 2.51522), t(19) = 11.913, p < .001, r = 0.94, and were maintained to some degree at the pre-delayed stage
(M = 5.4000, SD = 3.20197), t(19) = 7.542, p = .001, r = 0.87. There was also attrition at the post to delayed test
stage (M = -1.3000, SD = 1.97617), but this was not found to be significant. These results show that both types of
instruction had a durable benefit for the receptive knowledge of the target lexis. They also show that the gains
were larger in general and the effect sizes larger at the pre-post and pre-delayed stages for the treatment group,
indicating a clear short-term benefit for explicit teaching combined with TE and AE. However, despite these
notable gains, when compared with independent samples t-tests, no statistically significant differences were found
between the groups at any of the test stages.
Research Question 2: To what extent does TE and AE + explicit teaching improve the productive
knowledge of the target lexical items when compared to explicit teaching alone?
Table 3 gives the descriptive statistics for the productive tests, for group 1 (the contrast group, who received
explicit teaching only) and for group 2 (the treatment group, who received explicit teaching and AE/TE).
Table 3
Productive Test Results
Pre-test Post-test Delayed test
Group 1 (contrast) M = .9000 M = 4.9500 M = 3.6000
N = 20 SD = .96791 SD = 2.13923 SD = 2.23371
Group 2 (treatment) M = .0000 M = 6.3500 M = 3.5000
N = 20 SD = .0000 SD = 2.32322 SD = 2.94690
Note. Maximum score = 9.
It is again clear from this data that both groups made gains from pre- to post- and pre- to delayed tests. Paired
sample t-tests show that these gains were also significant for both groups. For the contrast group, gains at the pre-
post stage were most positive (M = 4.0500, SD = 2.13923), t(19) = 8.467, p < .001, r = 0.88, and were
maintained to some degree at the pre-delayed stage (M = 2.70000, SD = 2.22663), t(19) = 5.423, p < .001,
r = 0.78. Again there was notable attrition from the post to delayed stage (M = -1.3500, SD = 2.51888), and this
was found to be significant, t(19) = -2.397, p = .027, r = 0.47. For the treatment group, gains were again largest
at the pre-post stage (M = 6.35000, SD = 2.32322), t(19) = 12.224, p < .001, r = 0.78, and were maintained to
some degree at the pre-delayed stage (M = 3.50000, SD = 2.94690), t(19) = 5.132, p < .001, r = 0.47. There was
also attrition at the post to delayed test stage (M = -2.85000, SD = 2.79614), and this was found to be significant,
t(19) = -4.558, p < .001, r = 0.72.
These results show that both types of instruction had a durable benefit for the productive knowledge of
the target lexis. They also again show that the gains were larger in general at the pre-post stage for the
treatment group, indicating a clear short-term benefit for explicit teaching combined with TE and AE.
An independent samples t-test also revealed that there was a significant difference (with a medium effect size)
between the groups in terms of their pre-post gains, demonstrating the superiority of the results for the treatment
group (contrast group: M = 4.0500, SD = 2.13923; treatment group: M = 6.3500, SD = 2.32322),
t(38) = -3.257, p = .002, r = 0.46.
Overall, results for both tests show what we might expect at this level: both types of treatment helped
learners to improve their receptive and productive knowledge of the target lexical items. The effects of the
instruction were not fully sustained over time, but gains made at the pre-delayed stage were significant for both
groups and for both test types. The greater gains for the treatment group in general, and at the post-test stage in
particular, indicate that explicit teaching plus AE/TE had a stronger effect in this study, particularly in terms of
productive knowledge. This suggests that an addition of AE/TE to explicit teaching can aid learning of lexis and
could heighten noticing and retention of targeted lexis. The absence of significant differences between the
groups at the delayed test stages may be due to the fact that TE/AE are a relatively implicit form of input
enhancement (Gascoigne, 2006) and may impact on learners for a short time only. To ensure a longer lasting
effect, students at elementary levels in particular may need very explicit forms of TE and AE to accompany
explicit teaching. Gascoigne (2006), for example, found a positive effect for explicit input enhancement in a study
investigating diacritics in beginners learning French and Spanish. Her study found that learners who were asked
to re-type a passage in either French or Spanish and given keycodes showing them how to produce diacritics had
a significantly higher recall of diacritics than a control group. This suggests that explicit measures such as asking
students to pay attention to the enhanced forms may be more effective at this level, particularly if combined with
repeated and longer exposure to the targeted items. Lastly, it is possible that administering a post-treatment
questionnaire to assess whether the AE and TE did in fact draw learners’ attention to the targeted items could
have demonstrated the impact of these enhancements upon noticing. White (1998), for example, found in a study
of TE with French texts that participants in her study believed that TE did make them attend to the targeted
forms. If there is evidence that learners are paying more attention to the targeted forms as a result of TE then it
can be argued that this is likely to lead to more noticing and durable learning.
Conclusion
The results of this study demonstrate that TE and AE did, to some extent, produce a more positive effect upon
durable learning than explicit teaching alone. When the groups were compared, this was significant in the short
term in the gains of productive knowledge for the experimental group and for both measures, gains were larger
for the treatment group. Within group tests demonstrated that instruction had a positive and signi(cant impact
on both receptive and productive knowledge for both groups, when we compare gains made from the pre-post
and pre-delayed test. Given that both groups were beginners, we would of course hope and expect that this
would happen. However, the results do indicate that the use of enhanced input, particularly for beginners, could
be extremely beneficial. Koprowski (2005) makes the salient point that materials often present learners with
possible language without any signal of which language may be more useful. For example, the chunk ‘play
football’ is more likely to be useful than ‘do judo’. The issue is that at the outset of learning a language all words
and phrases presented are potential input and the learner does not necessarily know which words or patterns are
more worthy of attention. Enhanced input, directed to high-frequency/highly useful lexis would seem to provide
a potential way of signalling to learners that certain pieces of language are noteworthy, as well as guiding
teachers to provide particular emphasis on these.
As mentioned in the literature review, TE and AE also have the potential to be utilised not only by
teachers to guide explicit vocabulary learning in class, but as a possible strategy for independent study for
language learners. This could be done informally, with learners simply highlighting post-reading lexis that they
feel is useful to them. It could also be carried out in conjunction with the use of word lists such as Coxhead’s
Academic Wordlist (2000) or the lists provided by English Profile (2014). There are a number of tools available to
learners (and teachers) which can profile vocabulary using a range of input word lists, such as the Compleat
Lexical Tutor (Cobb, 2017) site. AE could be carried out by learners recording (on a smartphone or similar
device) texts and pausing before the key lexis they wish to remember, or by repeating those words a number of
times.
There are, however, certain limitations of the study which may have impacted upon the results. Firstly, as
discussed above, a more explicit form of TE and AE may have produced superior long term results. This could
have been realised with more listening for the AE aspect, such as playing the experimental group dialogues with
the target items repeated a number of times and asking learners to pay attention to the items they hear most
often. For TE, their attention could also have been drawn to the bolded words by simply asking them to try and
remember those words. Although this may seem unnecessarily mechanical, it may be the case that beginners
learning a second language focus their attention on all aspects of the input they receive and implicit input
enhancement may not be processed. Secondly, although we were able to assess both receptive and productive
knowledge, it can be argued (e.g., Schmitt, 2010) that a test battery is the most effective measure of vocabulary
learning. This could involve the type of tests used plus a constrained constructed response test (such as a gap-fill)
and a freer productive test (such as an elicited role play). If vocabulary learning is measured in these ways, it can
allow for a more robust analysis and tell us under what conditions learners really know a set of target items.
It is clear that the results of this study offer some evidence that TE and AE can have a positive impact
upon learning. If this is indeed the case, and if it were followed by other studies demonstrating similar results, it
would be a simple and easy change for second language teachers to make to classroom practice. Teachers could
simply use TE to enhance target language within written texts and AE to enhance listening texts. Clearly though,
more research is needed, particularly in regard to the effects of AE. Future studies could focus on a greater use of
AE realised through measures such as teacher repetition and increased volume and stress on target items in
listening texts when combined with explicit teaching. It would also be useful to replicate studies such as this at
different levels, as we would suspect that AE and TE are likely to be more effective beyond elementary levels,
when learners can begin to focus on different aspects in the input they receive.
References
Alanen, R. (1995). Input enhancement and rule presentation in second language acquisition. In R. W. Schmidt
(Ed.), Attention and awareness in foreign language learning (pp. 259–302). Hawai’i: University of Hawai’i Press.
Alsadhan, R.O. (2011). Effect of textual enhancement and explicit rule presentation on the noticing and
acquisition of L2 grammatical structures: A meta-analysis. Unpublished Master’s dissertation, Colorado
State University, USA.
Baltova, I. (1999). The effect of subtitled and staged video input on the learning and retention of content and
vocabulary in a second language. Unpublished Doctoral dissertation, University of Toronto, Canada.
Barkaoui, K. (2011). Think-aloud protocols in research on essay rating: An empirical study of their veridicality
and reactivity. Language Testing, 28(1), 51–75.
Batstone, R., & Ellis, R. (2009). Principled grammar teaching. System, 37, 194–204.
Bishop, H. (2004). The effect of typographical salience on the look up and comprehension of formulaic
sequences. In N. Schmitt (Ed.), Formulaic sequences: Acquisition, processing and use (pp. 227–244). Amsterdam:
John Benjamins.
Cambridge English Language Assessment; Cambridge University Press; The British Council; The University of
Cambridge; The University of Bedfordshire; English UK. (2014). English Vocabulary Profile. Retrieved from
www.englishprofile.org.
Cobb, T. (2017). Compleat Lexical Tutor. Retrieved from www.lextutor.ca.
Cho, S. (2016). Processing and learning of English collocations: An eye movement study. Language Teaching
Research. Advance online publication.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. New York: Lawrence Erlbaum Associates.
Cohen, L., Manion, L., & Morrison, K. (2011). Research methods in education (Seventh Edition). New York:
Routledge.
Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34(2), 213–238.
Dörnyei, Z. (2007). Research methods in applied linguistics. Oxford: Oxford University Press.
Doughty, C. (1991). Second language instruction does make a difference: Evidence from an empirical study of SL relativization. Studies in Second Language Acquisition, 13, 431–469.
Folse, K. (2004). Vocabulary myths: Applying second language research to classroom teaching. Ann Arbor: University of Michigan Press.
Gascoigne, C. (2006). Explicit input enhancement: Effects on target and non-target aspects of second language acquisition. Foreign Language Annals, 39(4), 551–564.
Grgurović, M., & Hegelheimer, V. (2007). Help options and multimedia listening: Students' use of subtitles and the transcript. Language Learning and Technology, 11(1), 45–66.
Halenko, N., & Jones, C. (2011). Teaching pragmatic awareness of spoken requests to Chinese EAP learners in the UK: Is explicit instruction effective? System, 39(1), 240–250.
Hernandez, S. S. (2004). The effects of video and captioned text and the influence of verbal and spatial abilities on second language listening comprehension in a multimedia environment. Unpublished Doctoral dissertation, New York University, USA.
Izumi, S. (2002). Output, input enhancement and the noticing hypothesis: An experimental study on ESL relativization. Studies in Second Language Acquisition, 24(4), 541–577.
Jensen, E.D., & Vinther, T. (2003). Exact repetition as input enhancement in second language acquisition.
Language Learning, 53(3), 373–428.
Jourdenais, R., Ota, M., Stauffer, S., Boyson, B., & Doughty, C. (1995). Does textual enhancement promote noticing? A think-aloud protocol analysis. In R. Schmidt (Ed.), Attention and awareness in second language learning (pp. 183–216). Hawai'i: University of Hawai'i Press.
Kim, H.Y. (1995). Intake from the speech stream: Speech elements that L2 learners attend to. In R. W. Schmidt
(Ed.), Attention and Awareness in Foreign Language Learning (pp. 65–84). Hawai’i: University of Hawai’i Press.
Kim, Y. (2003). Effects of input elaboration and enhancement on second language vocabulary acquisition through reading by Korean learners of English. Unpublished Doctoral dissertation, University of Hawai'i, USA.
Koprowski, M. (2006). Investigating the usefulness of lexical phrases in contemporary coursebooks. ELT Journal,
59(4), 322–332.
Lee, S. (2007). Effects of textual enhancement and topic familiarity on Korean EFL students' reading comprehension and learning of passive form. Language Learning, 57(1), 87–118.
Leow, R. P. (2001). Do learners notice enhanced forms while interacting with the L2? Hispania, 84(3), 496–509.
Leow, R., Egi, T., Nuevo, A., & Tsai, Y. (2003). The roles of textual enhancement and type of linguistic item in adult L2 learners' comprehension and intake. Applied Language Learning, 13(2), 1–16.
McCarthy, M. (1999). What constitutes a basic vocabulary for spoken communication? Studies in English Language
and Literature 1, 233–249.
McCarthy, M. (2004). Touchstone: From corpus to coursebook. Cambridge: Cambridge University Press.
Nation, I.S.P. (1999). Learning Vocabulary in Another Language. Wellington: Victoria University of Wellington.
Nation, I.S.P. (2006). Language education – vocabulary. In K. Brown (ed.) Encyclopaedia of Language and Linguistics,
2nd Ed. Oxford: Elsevier. Vol 6: 494-499.
Norris, J. M., & Ortega, L. (2000). Effectiveness of L2 instruction: A research synthesis and quantitative meta-
analysis. Language Learning, 50(3), 417–528.
Norris, J. M., & Ortega, L. (2001). Does type of instruction make a difference? Substantive findings from a meta-analytic review. In R. Ellis (Ed.), Form-focused instruction and second language learning (pp. 157–213). Oxford: Blackwell.
O'Keeffe, A., McCarthy, M., & Carter, R. (2007). From Corpus to Classroom. Cambridge: Cambridge University
Press.
Petchko, K. (2011). Input enhancement, noticing and incidental vocabulary acquisition. The Asian EFL Journal
Quarterly, 3(4), 228–255.
Perez, M. M., & Desmet, P. (2012). The effect of input enhancement in L2 listening on incidental vocabulary learning: A review. Procedia – Social and Behavioral Sciences, 34, 153–157.
Reinders, H., & Cho, M. (2010). Extensive listening and input enhancement using mobile phones: Encouraging out-of-class learning with mobile phones. TESL-EJ, 14(2). Available from http://www.tesl-ej.org/wordpress/issues/volume14/ej54/ej54m2/
Reinders, H., & Cho, M. (2011). Encouraging informal language learning with mobile technology: Does it work? Journal of Second Language Teaching and Research, 1(1), 3–29. Available from www.uclan.ac.uk/jsltr
Richards, J. C., & Schmidt, R. W. (2002). Longman dictionary of language teaching and applied linguistics (Third Edition).
Harlow: Pearson Education Limited.
Rosa, E., & O'Neill, M. (1999). Explicitness, intake and the issue of awareness: Another piece to the puzzle.
Studies in Second Language Acquisition, 21(4), 511–566.
Schmidt, R. W. (1990). The role of consciousness in second language learning. Applied Linguistics, 11(2), 129–158.
Schmidt, R. W. (1993). Awareness and second language acquisition. Annual Review of Applied Linguistics, 13, 206–
226.
Schmidt, R. W. (1995). Consciousness and foreign language learning: A tutorial on the role of attention and
awareness in learning. In R. W. Schmidt, (Ed.), Attention and Awareness in Foreign Language Learning (pp. 1–
63). Hawai’i: University of Hawai’i Press.
Schmidt, R. W. (2001). Attention. In P. Robinson (Ed.), Cognition and Second Language Instruction (pp. 3–32).
Cambridge: Cambridge University Press.
Schmidt, R. W. (2010). Attention, Awareness and Individual Differences in Language Learning. In W. M. Chan,
S. Chi, K. N. Cin, J. Istanto, M. Nagami, J. W. Sew, T. Suthiwan, & I. Walker (Eds.), Proceedings of CLaSIC
2010, Singapore, December 2–4 (pp. 721–737). Singapore: University of Singapore Centre for Language
Studies.
Schmitt, N. (2000). Vocabulary in language teaching. Cambridge: Cambridge University Press.
Schmitt, N. (2010). Researching vocabulary: A vocabulary research manual. London: Palgrave Macmillan.
Sharwood Smith, M. (1981). Consciousness-raising and the second language learner. Applied Linguistics, 2(2), 159–168.
Sharwood Smith, M. (1991). Speaking to many different minds: On the relevance of different types of language information for the L2 learner. Second Language Research, 7(2), 118–132.
Sharwood Smith, M. (1993). Input enhancement in instructed SLA. Studies in Second Language Acquisition, 15, 165–179.
Shin, D., & Nation, P. (2007). Beyond single words: The most frequent collocations in spoken English. English Language Teaching Journal, 62(4), 339–348.
Shook, D. (1994). FL/L2 reading, grammatical information, and the input-intake phenomenon. Applied Language Learning, 5(2), 57–93.
Spada, N., & Tomita, Y. (2010). Interactions between type of instruction and type of language feature: A meta-analysis. Language Learning, 60(2), 263–308.
Stratman, J. F., & Hamp-Lyons, L. (1994). Reactivity in concurrent think-aloud protocols: Issues for research. In P. Smagorinsky (Ed.), Speaking about writing: Reflections on research methodology (pp. 89–111). Thousand Oaks, CA: Sage Publications.
VanPatten, B. (1990). Attending to form and content in the input: An experiment in consciousness. Studies in
Second Language Acquisition, 12, 287-301.
White, J. (1998). Getting the learners' attention: A typographical input enhancement study. In C. Doughty & J. Williams (Eds.), Focus on form in classroom second language acquisition (pp. 85–113). Cambridge: Cambridge University Press.
Wong, W. (2003). The effects of textual enhancement and simplified input on L2 comprehension and acquisition of non-meaningful grammatical form. Applied Language Learning, 14, 109–132.
Appendix A
Sample Enhanced menu (target items in bold, translations not given to learners)
Içecekler [drinks]
Çay [tea]
Kahve [coffee]
Kola [cola]
Fanta [fanta]
Cips [chips]
Appendix B
Sample tests
Productive test: Write what you think is the English equivalent of each word.
Bira …………………………………………………..
Fanta …………………………………………………..
Çay …………………………………………………..
Içecekler …………………………………………………..
Soğuk içecekler …………………………………………………..
Sıcak içecekler …………………………………………………..
Kola …………………………………………………..
Maden suyu …………………………………………………..
Ayran …………………………………………………..
Beyaz şarap …………………………………………………..
Kırmızı şarap …………………………………………………..
Vodka …………………………………………………..
Alkollu içecekler …………………………………………………..
Kahve …………………………………………………..
Rakı …………………………………………………..
Tavuk şiş …………………………………………………..
Iskembe çorbası …………………………………………………..
Balık …………………………………………………..
Peynır salatası …………………………………………………..
Patlıcan salatası …………………………………………………..
Daniel Waller works at the University of Central Lancashire in the School of Language & Global Studies. He has
taught in various countries including Turkey, China and Cyprus. He completed his PhD in Language Assessment
at the Centre for Research in English Language Learning and Assessment at the University of Bedfordshire and
his research interests include written assessment, corpus linguistics and lexis.
Mª Pilar Agustín-Llach*
Universidad de La Rioja, Spain
Abstract
This paper offers a theoretical approach to vocabulary instruction based on the evidence provided by lexical errors as the main sources of difficulty in the EFL acquisition process. It reviews previous research and from it suggests new ways of dealing with lexical errors in the classroom. Some practical implications are drawn which rely on lexical error categories identified in previous studies. Our main starting point is that lexical errors can serve as a guideline for teachers and researchers to improve vocabulary instruction. Identifying the main causes of lexical errors can help teachers understand the difficulties of their learners and assist them in planning and designing lessons and materials for the vocabulary class. Embarking from this premise, we have reviewed the main lexical error sources identified in the literature and provided some suggestions for vocabulary instruction.
Keywords: lexical errors, cross-linguistic influence (CLI) in vocabulary, remedying strategies, vocabulary instruction, explicit teaching
Introduction
Previous research on lexical errors has revealed a series of difficulty areas within lexical acquisition. Descriptive studies reporting on lexical errors allow researchers, teachers or material designers to identify the nature as well as the origin or source of lexical errors. We believe that this information can be used to act upon the problematic aspects identified and help deal with them. Lexical learning is a difficult and lifelong task, and lexical errors are most undesirable since they distort communication and can have a negative impact on the image of the learners. However, they are also positive signs of vocabulary development. We believe that teaching learners the origin and causes of their lexical misuse and how to remedy and prevent it is a good start for successful and effective lexical acquisition (Agustín-Llach, 2004, 2015; Hemchua & Schmitt, 2006). This paper intends to compile the main findings and tendencies drawn from lexical error analysis in English as a Foreign Language (EFL) vocabulary acquisition as a starting point to propose a set of actions to help learners overcome those difficulties.
Analysis of these studies shows that we need to go into further detail beyond simple L1 versus L2 influenced errors. In fact, these studies show that considering the L1 as a unitary source of influence is an oversimplification. L1 influence intermingles and collaborates with other sources, mainly L2 influence via overgeneralization or confusion, in originating lexical errors. Descriptive studies of lexical errors have achieved a refinement in etiologies which has allowed us to identify the most problematic areas which should be dealt with in the foreign language classroom.
In what follows, we intend to, first, give an account of the most frequent lexical error types found by previous research and of their outstanding role in vocabulary acquisition, and then to propose some pedagogical interventions or actions aimed at teaching vocabulary and remedying and preventing lexical errors in the interlanguage of EFL learners in the light of those previous findings.
*Tel: (34) 941 299435; Fax: (+34) 941 299433; E-mail: maria-del-pilar.agustin@unirioja.es; C/ San José de Calasanz, 33,
26004, Logroño, La Rioja, Spain
educational context. Establishing the source or main causes of lexical errors in EFL productions will allow us to draw some pedagogical implications for vocabulary instruction, as hinted above. Among the most frequent and important lexical error types in EFL, previous findings highlight the following (Agustín-Llach, 2011; Bouvy, 2000; James, 1998; Warren, 1982):
1) Borrowings, which are bare L1 insertions into the L2 syntax; for instance, from Spanish L1:
My ciudad is very big (Eng. city).
We need to acknowledge that, while the use of native words is a very frequent cause of error in EFL learners with L1s typologically closer to English, like French, Spanish, or German, it is a much rarer cause of interference or difficulty in learners whose native languages are distant from English, such as Chinese, Thai, Hebrew, or Arabic. Nevertheless, code switching from the L1 is a communication strategy learners use to overcome a lack of lexical knowledge and to scaffold their acquisition process. In this sense, borrowings tend to be marked in the students' productions with e.g. inverted commas, capital letters, a change of intonation or pronunciation, or underlining. If the teacher and students share an L1, then inserting L1 words into the L2 discourse is a communication strategy which can result in successful message transmission regardless of the source L1.
2) Lexical adaptation of an L1 word to the L2 morphological or phonological rules so that it sounds or looks
English (Celaya & Torras, 2001, p.7). An example of such lexical error appears in the following sentence:
My favorite deport is football (Eng. sport, Sp. deporte).
Psychotypological perceptions of similarity, or rather of transferability (e.g. Kellerman, 1979), might explain these types of adaptations. If learners perceive that a lexical item can be transferred or is similar to the L2 target, then they will try to tailor it to the L2 norm. This strategy frequently succeeds, e.g. contribution from contribución (Sp.) or come from kommen (Ger.).
3) Semantic confusion originates when the learner confounds two words which are semantically related in
the L2 such as for example in
My uncle’s name is Ana (for aunt) or in
In my city there are very shops (for many).
Especially conspicuous is the confusion of two auxiliary verbs: to have and to be. It is frequent to find sentences in learners' data in which these two verbs are confused:
I’m an older sister, her name is Ana (for I have), or
I have eleven years old (for I am).
Some instances of this confusion can be traced back to L1 influence; however, in some other cases the explanations are unfortunately not so straightforward, and finding a plausible interpretation for this confusion is extremely difficult. Confusions can also have a formal origin, thus giving rise to lexical errors of the type:
I’m board (for bored) or
I lake playing basketball (for like).
We tend to call these phonetic or formal confusions. Semantic and formal confusions reveal a certain degree of word knowledge, albeit incomplete or imperfect. We might wonder whether the learner knows both the target and the error word and confuses them because of their similarity, or whether they ignore the target word and use a proximal, close word they do know. The first example might illustrate the first case, and the second example the latter:
My hear is blond (for hair)
My favourite eat is pasta with meat (for food)
4) Learners also tend to calque L1 words or expressions when they lack exact lexical knowledge of the L2 equivalents. A calque or literal translation originates when a learner literally translates an L1 word and transfers the semantic and even syntactic properties of the L1 word onto an L2 equivalent which has a different contextual distribution (cf. Zimmermann, 1986). Adjectival and verbal structures or word order in compounds or phrases are likely candidates for literal translation. The following sentences are good examples of this phenomenon:
I like ballhand (for Eng. handball, Sp. balonmano) and
My favourite plate is pasta and rice (from Sp. plato, Eng. dish)
5) Previous research with (for example) Spanish EFL learners has revealed that they display
wrong cognate use such as in the sentence, In the evenings, I go to an academy (Eng. private tuition
school, Sp. academia), where the word is used as it is in Spanish with the semantic and contextual
restrictions of the L1 and not of the L2. German EFL learners display a similar behaviour and
tend to use cognates in the L1 sense (Agustín-Llach, 2014) (for examples from other languages
see e.g. Bouvy, 2000; Ringbom, 2001; Warren, 1982). This type of lexical error could also be
considered as an extension or particular manifestation of word confusion (see above).
6) Spelling problems are probably the most frequent category of lexical errors in EFL learners’
writings (cf. Bouvy, 2000; Fernández, 1997; Lindell, 1973). These are violations of the
orthographic conventions of English. The lack of congruence between spelling and
pronunciation so characteristic of the English language is mostly responsible for these difficulties.
EFL learners face the problem of having to cope with the complicated English encoding system
in which one sound, especially vowel sounds, can be rendered in multiple ways, i.e. through
different letters, and vice versa where one letter can be pronounced in different ways. Double
letters, silent letters, or triphthongs also cause problems for learners. Thus, we find the following misspellings as examples: beautifull, verday, ritting, inteligent for beautiful, birthday, writing, and
intelligent, respectively. A particular type of spelling error arises as the result of what is called
phonetic spelling, i.e. writing the words the way they are pronounced. Thus, we find the
following examples that illustrate this phenomenon: Reichel for Rachel, keik for cake, spik for speak,
braun for brown, or saebyet for subject.
7) Construction errors make up the last category of lexical errors. These are the result of a faulty use of constructions regarding, for instance, choice of prepositions, reflexivity, or transitiveness. Very recent research trends within cognitive linguistics have identified constructions as central units of the language, which therefore take a relevant role in SLA (cf. Goldberg, e.g. 2006). Constructions represent the lexical-grammatical interface, and thus errors in the arguments of the verb could be termed "construction errors". Learning a new language implies learning new ways of encoding or conceptualizing reality; hence errors with transitive and reflexive verbs, with prepositions, phrases or characteristics of verb arguments (e.g. animate/inanimate) tend to be frequent, especially at higher levels of proficiency (Verspoor et al., 2012). In previous lexical error-related research, we were able to identify some lexical errors which could originate in constructions (Agustín-Llach, 2015):
I donate at poor, for I donate to the poor.
I can relax me, for I can relax.
I am writing to introduce you myself, for I am writing to introduce myself (to you).
I meet friends for play, for I meet friends to play.
He visit to me always, for He visits me always.
Films romantic doesn't love with me for I do not like/love romantic films.
In the examples above, we observe the misuse of a preposition in the first one, the reflexivization of a non-reflexive verb in the second one, the wrong use of the dative in the third one, the wrong preposition in a finality clause in the fourth example, the transformation of a transitive verb into a non-transitive one in the fifth example sentence, and, in the last one, the learner uses an inanimate subject in a sentence where an animate subject is necessary.
Construction errors could traditionally also have fallen under the heading of literal translations. The main difference is that construction errors pertain to more fixed expressions, whereas calques or literal translations appear in freer word combinations or compound words.
Focusing on these tendencies of lexical inconsistency identified in previous research on lexical errors, we are going to propose some instructional actions to tackle these problems in the classroom on the way to L2 vocabulary teaching and acquisition. This is not a treatise on error correction; rather, our intention is to take a deep look into the vocabulary areas which cause major problems for Spanish EFL learners and describe possible pedagogical interventions to remedy them. We have departed from identifying lexical errors in order to learn from them and use them as a starting point for lines of vocabulary instruction. The following section offers some suggestions for remedial and preventive vocabulary instruction.
where lexical items are presented and practiced in isolation deprived of communicative context. Computer assisted
instruction can be very useful to implement this focus on forms approach to minimize the effect of lexical errors.
Computer resources can enhance and facilitate vocabulary teaching, as well. We illustrate some possibilities of
computer enhanced vocabulary teaching below for the corresponding lexical problem. This manifold practice
approach should be the basis of an effective vocabulary teaching intervention.
This double-step approach mirrors the input-output orientation (cf. interactionist SLA perspectives, e.g. Long, 1996). First, learners are provided with input in the form of explicit explanations of the causes of errors and of what the correct version should look like. These explanations trigger noticing. Dictionaries, corpora, and thesauruses can also be used to provide learners with explicit lexical input (e.g. McWhinney, 2005). Additionally, promoting self-discovery and developing learners' autonomy are two crucial steps to remedy their lexical errors. Then, learners are encouraged to produce L2 lexical items, i.e. pushed output. This approach is believed to enhance lexical learning and provide learners with multiple opportunities for acquisition. Furthermore, pushing learners further towards lexical progress can also help prevent fossilization and help them move past a possible "plateau effect". If learners' attention is not drawn to recurrent errors, they might simply be unable to spot and correct them. Similarly, either consciously or, most frequently, unconsciously, learners stop developing their lexical accuracy when they have reached communicative success (cf. Richards, 2008). They need to be urged to continue learning and to be accurate.
Still, we can think of a transversal approach which is central to lexical learning and lexical error prevention, namely explicit vocabulary strategy training. Lexical errors are on many occasions the result of a faulty application of vocabulary learning or communication strategies. In this sense, it is recommended to train learners in the use of effective vocabulary strategies to improve their lexical production. Using cognate knowledge, using word parts (inflectional or derivational prefixes or suffixes, Latin and Greek roots), or using the dictionary sensibly will arguably result in fewer lexical errors and better lexical use (Graves et al., 2012). McWhinney (2005) proposes, together with dictionary use, two other strategies to maximize learners' full learning potential, namely recoding, or constructing new images or new concepts for new words or phrases, and linking word forms and meanings by relating them to L1 equivalents, as in the keyword method. We agree with McWhinney (2005) that these strategies can be very helpful in coping with learners' lexical learning challenge. Finally, learning collocations and chunks or fixed expressions is highly recommended to prevent and remedy lexical errors and to increase learners' vocabulary knowledge (cf. Richards, 2008).
successful vocabulary acquisition, as Hemchua and Schmitt (2006) probed in their analysis of L1-originated errors in learners' compositions. Moreover, warning learners of the dangers of literal translation and of the lack of straightforward semantic and contextual equivalence between L1 and L2 words or expressions is essential (cf. Warren, 1982). This is of special relevance since, as Schmitt (2008) notes, learners firmly believe that translating will help them learn vocabulary words, idioms, and phrases.
5) Nevertheless, English shares a number of cognates with other languages, not to mention international words, most of which come from English, which can be very helpful in articulating discourse. Learners can be instructed in cognates and international words so that they can exploit these similarities and use them to their advantage. Instructing learners to draw on their L1 lexical knowledge by resorting to cognate use is a good way to increase their vocabulary competence. Moreover, teaching them false friends will presumably prevent erroneous word meaning inference.
Translation activities could also promote instances of positive L1 influence at the lexical level. Laufer and Girsai (2008) found translation to be a particularly successful instructional condition, since it is an ideal task for pushed output and fosters the mobilization of linguistic resources through contrastive comparisons. Schmitt (2008) also points to the benefits of using the L1 to establish initial form-meaning links; since at the first stages of acquisition learners are unlikely to absorb much contextualized knowledge about the words, there are few possible negative effects of L1 use. However, at more advanced stages of acquisition the value of the L1 lessens and words should be presented in context, because learners can learn more from this (Schmitt, 2008).
Phonetic or formal confusions arise when two similar-looking or similar-sounding words are mixed up. Warren (1982) suggests that learners should be taught the form-meaning link of both words, the incorrectly used word and the target word, contrasting them. Furthermore, teachers should also instruct learners on homophones and give them examples. Homophones, or words which sound the same but have a different meaning, can be a potential source of formal confusion. Similarly, words which have a similar meaning but a slightly different contextual distribution in the L1 and L2 are also strong candidates for explicit instruction. The teaching of formally, and especially of semantically, similar words has been prey to some controversy. Some authors have called for the simultaneous teaching of semantically related words such as synonyms, antonyms, or hyponyms (e.g. Nation, 2001; Tagashira et al., 2010). The idea that these semantic webs reflect the way the mental lexicon is organized underlies and justifies this technique (Nation, 2001; Stoller & Grabe, 1995). However, a different trend in research (Nation, 1990; Waring, 2007) has highlighted the higher likelihood of confusion when related words are taught together at the same time, and advocates teaching one member of the pair/triad first, and the other only when the first one has been properly mastered. From the evidence of lexical error production, we believe that contrasting formally or semantically similar words and teaching them accordingly might be an adequate approach to solving these problems of confusion. In this line of reasoning, we agree with Warren (1982) when she calls for the identification of the common semantic trait(s) or seme(s) of the confused words and the isolation of the distinguishing feature(s) to understand the confusion. This identification can proceed in two ways: either the teacher gives an explicit account of it, or they let learners deduce those features from a series of contextualized examples. Using pictures, as we suggest below, can be an efficient way of contextualizing lexical items.
Creating a meaningful context in which lexical learning is related to feelings and experiences is a technique which will surely enhance vocabulary acquisition through deep processing (cf. Arnold & Foncubierta, 2013). Establishing emotional links between lexical items and learners' personal memories and experiences will not only help them better remember the words they wish to learn, but will also presumably prevent lexical errors. This trend of exploring sensory-emotional intelligence and linking it to lexical learning has recently been brought to light by researchers such as Arnold and Foncubierta (2013), who propose tasks and exercises that exploit this relationship in the FL classroom. We certainly believe this is a very fertile avenue for lexical instruction and lexical error remediation.
We can think of a series of activities that can help learners reinforce the form-meaning link of new words, activate prior knowledge, and contrast word meanings (cf. Graves et al., 2012). Semantic or conceptual maps with both pictures and word forms make the relationships between words evident and the specific traits patent (cf. Barreras Gómez, 2004); in this sense, virtual tools are of special interest, since they can help the teacher and the students create conceptual mappings allowing for different colors, sizes, or movement. Semantic feature analysis has been found to lead to robust word learning, surpassing traditional vocabulary instruction (Bos, Allen, & Scanlon, 1989). This technique would allow learners to dissect the meaning(s) of the target and the error word and compare and contrast them. Additionally, a semantic field bingo (food, family, school) can be a fun tool for practicing related words while highlighting common and distinctive features (cf. Barreras Gómez, 2004). Finally, providing learners with the L1 equivalents of the target and error words might be the most effective intervention (Warren, 1982). In this vein, Webb and Kagimoto (2011) found a beneficial effect of providing glosses to help learners learn collocations in the L2.
6) Spelling problems are, as commented above, very numerous in EFL learners' productions, including the particular group of phonetic spellings. In traditional EFL classrooms, spelling and the link between spelling and pronunciation were not paid much attention. However, more recent teaching methodologies include explanations concerning the different written renderings of vowel and consonant sounds as well as the multiple plausible pronunciations of specific letters.
Grouping words according to their spelling and/or pronunciation is a good activity for learning how to write and pronounce them. These kinds of explanations and subsequent exercises can help learners become familiar with the grapho-phonological rules of the English language and thus overcome the problems posed by the discordance between spelling and pronunciation. Using morphological knowledge of e.g. inflectional suffixes, derivational prefixes or suffixes, or knowledge of Latin or Greek roots can greatly enhance spelling abilities and reduce misspellings considerably. Teaching learners English morphology and morphological patterns, building words from word parts (roots plus affixes), breaking words into morphemes, identifying lexical units within compound words, and teaching how this relates to lexical knowledge and how to apply it to avoid lexical errors can be a useful and interesting idea; for instance, morphemes such as -less, -ful, -able, in-, im-, un-, roots such as "tract" or "voc", or the units of complex or compound words such as screwdriver, schoolbag, blackboard.
Computer-assisted vocabulary instruction can be very useful in preventing learners from committing
misspellings and phonetic spelling errors. By using a sound and recording device, learners can be encouraged to
produce the problematic lexical items and to check the gap between the native pronunciation, their own
pronunciation, and the written rendering of the words. Similarly, using still and motion graphics and colors to
highlight new or difficult orthographic patterns, e.g. double consonants, silent letters, or affixes, can also be very
helpful.
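A minimal sketch of such highlighting, assuming nothing more than a regular expression over doubled consonants (the bracket notation is our own illustrative device, not drawn from any CALL tool cited here):

```python
import re

def highlight_double_consonants(word):
    # Wrap doubled consonants in brackets as a simple visual cue;
    # a real tool would render these in color or bold instead.
    return re.sub(r"([bcdfghjklmnpqrstvwxz])\1", r"[\1\1]", word)

for w in ["address", "committee", "occurrence"]:
    print(highlight_double_consonants(w))
# address   -> a[dd]re[ss]
# committee -> co[mm]i[tt]ee
```

The same backreference pattern could be extended to silent letters or affixes by swapping in other expressions.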
But teaching cannot stop at controlled and guided focus-on-forms activities; increasing free written and oral
production within communicative tasks would be a great step towards remedying and preventing misspellings.
Furthermore, if these communicative tasks include (language) games or ludic activities, such as crosswords, word
search puzzles, or hangman, their effectiveness towards the desired EFL learning outcomes could be augmented.
EFL teaching to young learners is starting to incorporate the “Jolly Phonics” method. This method was
designed by Lloyd and Wernham (1992, 2012) and has traditionally been used in English language teaching to
native-speaking children. It consists in relating pronunciation and spelling, joining isolated sounds to make up larger sound
combinations and form words. The segmentation of words into sounds is the complementary side of the method.
When generalizations or systematizations do not work, e.g. with words defying graphophonetic rules,
learners are encouraged to practice those words in extra activities. This method is especially appropriate for
children, since it presents words whose meanings can be inferred from actions, mimicry, pictures, flashcards, or
objects. But its multisensory character, which links new words with learners’ multiple intelligences such as
musical, kinesthetic, intrapersonal, or spatial (e.g. García de Celis, 2005; Gardner, 1994), makes it a good
candidate technique for vocabulary teaching at all levels. Furthermore, this relates to the above-mentioned idea
of linking new words to old experiences and making vocabulary teaching and acquisition an experiential and sensory
activity (cf. Arnold & Foncubierta, 2013). With these considerations in mind, we might contend that this method
is instructionally more helpful than previous attempts at teaching the pronunciation-spelling link.
7) Researchers working in cognitive linguistics and construction theory also advocate explicit instruction
of new and problematic structures in the lexico-grammar continuum. They have given this approach the
name “pedagogical grammar” (e.g. Dirven, 2001). Together with explicit explanations and L1-L2 comparisons,
cognitive linguistic and sociocultural approaches to language pedagogy call for other techniques and activities,
such as input enhancement, to increase the perceptual salience of lexical items, help retention, or
highlight their communicative relevance (Della Putta, 2015; Della Putta & Visigalli, 2012). We agree with Della
Putta (2015) on the need to help learners “unlearn” certain linguistic features and encourage them to
reconceptualize the reality around them according to the rules and codes of the L2. We need to clarify the way
the L2 embodies reality through explicit explanations, mimicry, or pictures, by promoting interaction and meaning
negotiation (e.g. Long, 1996), and by manipulating the input to lead learners to notice the new lexical items.
Lexical errors not only have pedagogical applications in teaching; we can also think of them as indicators of
quality. In general, their presence in learners’ productions makes them score lower. However, the
correlation is not straightforward. Lexical creations or misspellings do not represent important communication
breakdowns (cf. Agustín-Llach, 2011), but borrowings or calques are relevant communication disturbers. Their
seriousness resides in whether they cause intelligibility problems (Hughes & Lascaratou, 1982; Johansson, 1978).
Their relative importance also derives from the acquisition stage at which the learner finds him- or herself.
Research has been able to associate lexical error types with specific acquisition stages (cf. Agustín-Llach,
2011; Hemchua & Schmitt, 2006), so that if a learner commits lexical errors typical of later stages of
acquisition, these cannot be considered serious. Lyster et al. (2013) also highlight the importance of lexical
errors as crucial instruments in comprehension and call for the need to target them in L2 vocabulary instruction.
Conclusion
This is an exploratory theoretical paper in which we try to join two research trends. First, the examination and
systematization of lexical errors constitutes a major research area within SLA and lexical studies. Here, we have
not accounted for the lexical error results of a particular population, but have rather presented general findings of
some of the main previous lexical error studies of EFL learners. Research-based generalizations of lexical error
production guide our pedagogical implications. Thus, secondly, banking on these frequent developmental
lexical errors, we have tried to propose some lines of pedagogical action and intervention in vocabulary
instruction in EFL - a vivid line of research.
In the present study, we give no frequencies of lexical errors because we rely on generalizations from
previous studies. The review of lexical error types stems from the need to find tendencies or systematizations of
those lexical error categories, and mainly of their causes, since these are the stepping stones upon
which we propose some pedagogical actions. Similarly, we do not intend to rank lexical errors or
vocabulary teaching activities or tasks according to their impact on vocabulary acquisition, but rather to show some
general possibilities for the EFL classroom, always with the information on problem areas from lexical errors in
mind. The systematization of the causes of lexical errors in EFL learners allows us to suggest some vocabulary
instruction approaches and practical implementations.
This paper takes a theoretical stance with some aspiration to practical application. Likewise, we have not
conducted a specific study with actual informants and derived our proposal from its findings. Rather, we have
generalized findings from previous studies addressing the exploration of lexical errors in actual EFL learners’
productions, extracted common tendencies, and devised some lines for vocabulary instruction based on
those observed trends. Among the main conclusions to be drawn from this theoretical paper, however, we can
highlight one which affects foreign language teaching policies and refers to the need for explicit instruction of
vocabulary, not only as concerns the form-meaning link, but also its relation to L1 equivalents, the
spelling-pronunciation link, and its contextual distribution in syntactic, semantic, and pragmatic contexts. From
these observations and considerations, we also agree with Schmitt (2008), who concludes that the evidence from
research studies suggests that different teaching methods may be appropriate at different stages of vocabulary
learning.
Further research should focus on experimentally testing these suggestions in the EFL classroom to check
their effectiveness in vocabulary acquisition. A thorough analysis of lexical errors which extends over
several years can help us better understand the process of lexical development. Moreover, identifying the
variables that affect this process, such as learner age, gender, native language, instructional approach, or
intralexical factors, will be of great help in maximizing lexical learning. Applying the results of such studies to
practical vocabulary instruction is a task which should receive far more attention in future research.
Endnotes
1 At this point we need to make two clarifications. First, lexical errors can also derive simply from lack of word knowledge,
faulty rule application, overgeneralization, or transfer. Second, the application of vocabulary learning and communication
strategies does not necessarily lead to the commission of a lexical error. On the contrary, myriad are the examples of
successful application of vocabulary strategies that result in correct language use.
References
Agustín Llach, M.P. (2004). Pedagogical implications and application of lexical inconsistencies errors in second
language classroom. APAC of News, 52, 34-39.
Agustín-Llach, M.P. (2011). Lexical errors and accuracy in foreign language writing. Bristol: Multilingual Matters.
Agustín-Llach, M.P. (2014). Early Foreign Language Learning: The case of Mother Tongue In'uence in
Vocabulary use in German and Spanish Primary-School EFL Learners. European Journal of Applied
Linguistics. 2 (2), 287-310.
Agustín Llach, M.P. (2015). Lexical errors in writing at the end of Primary and Secondary Education:
description and pedagogical implications. Porta Linguarum. 23, 109-124.
Albrechtsen, D., Henriksen, B., & Faerch, C. (1980). Native Speaker Reactions to Learners’ Spoken
Interlanguage. Language Learning 30, 365-396.
Arnold, J. & Foncubierta, J.M. (2013). La atención a los factores afectivos en la enseñanza del español. Madrid: Edinumen.
Barreras Gómez, A. (2004). Vocabulario y edad: pautas para su enseñanza en las clases de Inglés de Educación
Primaria. Aula Abierta, 84, 63-84.
Bos, C. S., Allen, A. A., & Scanlon, D. J. (1989). Vocabulary instruction and reading comprehension with
bilingual learning disabled students. National Reading Conference Yearbook, 38, 173-179.
Bouvy, C. (2000). Towards the construction of a theory of cross-linguistic transfer. In J. Cenoz & U. Jessner (Eds),
English in Europe. The acquisition of a third language. (pp. 143-156). Clevedon: Multilingual Matters.
Celaya, M.L. & Torras, M.R. (2001). L1 influence and EFL vocabulary: do children rely more on L1 than adult
learners? Proceedings of the 25th AEDEAN Meeting. December 13-15, University of Granada. 1-14.
Corder, S.P. (1967). The Significance of Learner’s Errors. IRAL, 5, 161-170.
Della Putta, P. (forthcoming). How to discourage constructional negative transfer: Theoretical aspects and
classroom activities for Spanish learners of Italian. In K. Masuda & C. Arnett (Eds.), Cognitive Linguistics
and Sociocultural Theory in Second and Foreign Language Teaching, Mouton De Gruyter.
Della Putta, P. & Visigalli, M. (2012). A Classroom-based Study: Teaching the Italian Noun Phrase to
Anglophones. Paper presented at the Ireland International Conference on Education, 16th -18th April 2012,
Dublin.
Dirven, R. (2001). English phrasal verbs: theory and didactic application. In M. Pütz et al. (Eds.), Applied Cognitive
Linguistics II: Language Pedagogy. (pp. 3-27) Berlín/Nueva York: de Gruyter.
Ellis, R. (1994). The study of second language acquisition. Oxford: Oxford University Press.
Engber, C.A. (1995). The Relationship of Lexical Proficiency to the Quality of ESL Compositions. Journal of
Second Language Writing, 4, 139-155.
Fernández, S. (1997). Interlengua y análisis de errores en el aprendizaje del español como lengua extranjera. Madrid: Edelsa.
Ferris, D. R. (1999). The case for grammar correction in L2 writing classes: A response to Truscott (1996). Journal
of Second Language Writing, 8, 1–10.
García de Celis, G. (2005). Inteligencias múltiples y didáctica de lenguas extranjeras. Iberpsicología: Revista
Electrónica de la Federación Española de Asociaciones de Psicología, 10, 7.
Gardner, H. (1994). Multiple intelligences theory. In R. J. Sternberg (Ed.), Encyclopedia of human intelligence (Vol. 2).
(pp. 740-742). New York: Macmillan.
Goldberg, A. (2006). Constructions at Work: the nature of generalization in language. Oxford University Press.
Graves, M. F., August, D. & Mancilla-Martínez, J. (Eds.). (2012). Teaching vocabulary to English language learners. NY:
Teachers College Press.
Hemchua, S. & Schmitt, N. (2006). An analysis of lexical errors in the English compositions of Thai learners.
Prospect, 21(3), 3-25.
Hughes, A. and Lascaratou, C. (1982). Competing criteria for error gravity. ELT Journal, 36(3), 175-182.
James, C. (1998). Errors in language learning and use. Exploring error analysis. London: Longman.
Jiménez Catalán, R.M. (1992). Errores en la producción escrita del inglés y posibles factores condicionantes. Madrid: Editorial
de la Universidad Complutense. Colección Tesis Doctorales n° 73/92.
Johansson, S. (1978). Problems in studying the communicative effect of learner’s errors. Studies in Second Language
Acquisition 1, 10, 41-52.
Kellerman, E. (1979). Transfer and Non-Transfer: Where We Are Now. Studies in Second Language Acquisition,
2(1), 37-57.
Laufer, B. & N. Girsai. (2008). Form-focused instruction in second language vocabulary learning: a case for
contrastive analysis and translation. Applied Linguistics 29,4, 694−716.
Lindell, E. (1973). The Four Pillars: On the Goals of a Foreign Language Teaching. In J. Svartvik (Ed), Errata:
Papers in Error Analysis. (pp. 90-101). Lund: GWE Gleerup.
Lloyd, S. & Wernham, S. (1992). The Phonics Handbook. Essex: Jolly Learning Ltd.
Lloyd, S. & Wernham, S. (2012). Guía para padres/profesores. Essex: Jolly Learning Ltd.
Long, M. H. (1991). Focus on form: A design feature in language teaching methodology. In K. de Bot, R.B.
Ginsberg & C. Kramsch (Eds.), Foreign language research in cross-cultural perspective. (pp. 39-52). Amsterdam:
John Benjamins.
Long, M. H. (1996). The role of linguistic environment in second language acquisition. In W. Ritchie & T. Bhatia
(Eds.), Handbook of research on second language acquisition. (pp. 413−468). Malden, M.A.: Blackwell.
Lyster, R., K. Saito & M. Sato. (2013). Oral corrective feedback in second language classrooms. Language Teaching,
46, 1-40.
MacWhinney, B. (2005). Emergent Fossilization. In Z. Han & T. Odlin (Eds), Studies of Fossilization in Second
Language Acquisition. (pp. 134-156). Clevedon, U.K.: Multilingual Matters.
Meara, P. (1984). The Study of Lexis in Interlanguage. In A. Davies, C. Criper & A. P. R. Howatt (Eds),
Interlanguage. (pp. 225-239). Edinburgh: Edinburgh University Press.
Meara, P. (1996). The dimensions of lexical competence. In G. Brown, K. Malmkjaer & J. Williams (Eds),
Performance and Competence in Second Language Acquisition. (pp. 35-53). Cambridge: CUP.
Nation, P. (1990). Teaching and Learning Vocabulary. Boston: Heinle and Heinle Publishers.
Nation, P. (2001). Learning vocabulary in another language. Cambridge: Cambridge University Press.
Richards, J. (2008). Moving Beyond the Plateau: From Intermediate to Advanced Levels in Language Learning. New York:
Cambridge University Press.
Ringbom, H. (2001). Lexical Transfer in L3 Production. In J. Cenoz, B. Hufeisen & U. Jessner (Eds), Cross-linguistic
Influence in Third Language Acquisition: Psycholinguistic Perspectives. (pp. 59-68). Clevedon: Multilingual Matters.
Schmidt, R. (2001). Attention. In P. Robinson (Ed.), Cognition and Second Language Instruction. (pp. 3−32).
Cambridge University Press.
Schmitt, N. (2008). Teaching Vocabulary. Pearson Longman. Available online:
http://www.longmanhomeusa.com/content/FINAL-HIGH%20RES-Schmitt-Vocabulary
%20Monograph%20.pdf Accessed 22nd December 2014.
Solís Hernández, M. (2011). Raising Student Awareness about Grammatical and Lexical Errors via Email.
Revista de Lenguas Modernas, 14, 263-281.
Stoller, F. & Grabe, W. (1995). Implications for L2 vocabulary acquisition and instruction from L1 vocabulary
research. In T. Huckin, M. Haynes & J. Coady (Eds), Second language reading and vocabulary learning. (pp. 24-
45). Norwood, N.J.: Ablex Publishing Corporation.
Sunderman, G. & Kroll, J.F. (2006). First language activation during second language lexical processing. Studies in
Second Language Acquisition 28, 387-422.
Tagashira, K., Kida, S., & Hoshino, Y. (2010). Hot or gelid? The influence of L1 translation familiarity on the
interference effects in foreign language vocabulary learning. System, 38, 412-421.
Verspoor, M., Schmid, M.S., & Xu, X. (2012). A dynamic usage based perspective on L2 writing. Journal of Second
Language Writing, 21, 239-263.
Waring, R. (1997). The negative effects of learning words in semantic sets: A Replication. System 25,2, 261-274.
Webb, S. & Kagimoto, E. (2011). Learning collocations: do the number of collocates, position of the node word,
and synonymy affect learning? Applied Linguistics, 32(3), 259-276.
Zimmermann, R. (1986). Classification and Distribution of Lexical Errors in the Written Work of German
Learners of English. Papers and Studies in Contrastive Linguistics, 21, 31-40.
Acknowledgements
This research has been funded by the Spanish Ministry of Economy and Competitiveness through grant number
FFI2010-19334/FILO.
Chengchen Ma
Xi’an Jiaotong-Liverpool University, China
Song Jing
Australian National University, Australia
Abstract
This study aims to further the understanding of first language (L1) lexical transfer within the context of L1 Chinese learners
of English. Previous transfer research has often focused on a small subset of grammar errors, without examining how lexical
choices, especially in collocations and multi-word units (MWU), might have been influenced by L1 or L1-based assumptions
about vocabulary use. There is therefore a need to look for evidence of L1 transfer or word-for-word translation from the
native language in L2 production at each of the three levels: individual words, collocations, and MWU. Such errors point to
subordinate bilingualism, which is rooted in translation as a teaching/learning method (Cook, 2014), a practice common in
China (Edmunds, 2013). Therefore this paper addresses the following research questions: 1) To what extent does the transfer
of L1 word polysemy, collocations, and MWU impact Chinese learners’ English vocabulary use? 2) Are more advanced
learners as prone to L1 lexical transfer errors as the less advanced ones? The approach used here is corpus-linguistic. The
main research task is to examine an existing corpus of Chinese student writing in English and to analyze and classify the
identified lexical transfer errors. The findings indicate that the most common of these are errors caused by L1 polysemy in
individual words, followed by MWU and collocation errors. More advanced learners appear to be slightly but not
significantly less prone to lexical transfer errors. Instruction which follows the recommendations made in this paper is likely
to prevent the onset of such errors.
Introduction
First language (L1) transfer, sometimes also called L1 interference or ‘cross-linguistic influence’ (Jarvis &
Pavlenko, 2008), has been found at different linguistic levels, from phonology and spelling to discourse. Although
transfer research has so far mainly focused on grammar, MacWhinney (1992) suggested that a significant number
of L1 transfer cases were found at the lexical level as well. However, only a modest amount of research has been
conducted to investigate the impact of L1 transfer on L2 vocabulary acquisition. Yet, vocabulary is of
tremendous importance, as ‘without vocabulary nothing can be conveyed’ (Wilkins, cited in Thornbury, 2002).
Words build language structures and convey L2 learners’ intended meaning, but only when they are
appropriately selected and used (Shalaby, Yahya & El-Komi, 2009). Inappropriate use of words can lead to
errors and miscommunication. Such errors are called lexical errors (Agustín-Llach, 2011).
Errors have served as indices of writing quality in formal contexts. In other words, there is ‘a negative
correlation between quality writing and linguistics errors in general and lexical errors in particular’ (Agustín-
Llach, 2011, p. 67). In order to deal adequately with lexical errors found in learner writing, language
teachers should be aware of the sources of such errors. This is particularly important in Chinese contexts, where
corpus studies have already recognized a significant presence of lexical errors (Chan, 2010; Liu, 2011; Edmunds,
2013).
Relevant research (Hemchua & Schmitt, 2006; Zhou, 2010; Xia, 2013) suggests that lexical errors do not
only occur at the single word level, but also at the collocation and multi-word unit (MWU) levels (Gray & Biber, 2013).
Among those are lexical transfer errors, which have been identified at every level (Yang, Ma & Cao, 2013; Li,
2005; Yamashita & Jiang, 2010), although not necessarily all within the same study. In short, lexical transfer
errors are lexical errors caused by L1 transfer. The identification of lexical transfer errors in the English output
of Chinese learners coincides with the finding that the grammar-translation method happens to be the prevalent L2
instruction approach in China (Edmunds, 2013). If confirmed, such findings could have profound implications
for language teaching practice in Chinese contexts. Hence, the present study aims to identify and analyze the
cases of negative lexical transfer from Chinese to English caused by 1) Chinese word polysemy at the single word
level, 2) Chinese collocations, and 3) Chinese MWU.
knowledge. This is not surprising in view of Pienemann’s (2003) processability theory, which posits that L2
acquisition follows predictable developmental sequences, allowing learners to acquire only those linguistic
forms which are appropriate for their developmental stage. Thus, Pienemann et al. (2005) are more inclined to
believe that early errors are a consequence of stagewise L2 development, i.e. that they are developmental.
What Cook (2014) calls subordinate bilingualism is akin to the notion of L1 transfer. The concept of
‘transfer,’ a product of behaviorism, was used extensively in the early years of SLA to refer to the process in
which the learners’ L1 influences the L2 in a positive or negative way (Gass, 2013). ‘Transfer’ or ‘cross-linguistic
influence’ is preferred by linguists in SLA, while the term ‘interference’ is used more commonly in
psycholinguistic approaches to SLA (Jarvis & Pavlenko, 2008). More recently, there have been attempts to
redefine the terms. Thus, Grosjean (2001) pointed out that the term ‘transfer’ should be used to refer to the
permanent influence of L1 on L2, while ‘interference’ should be used to describe L1 features occurring from
time to time in L2. In this study, transfer is used in the sense of the definition rendered by Gass (2013). In the
same vein, L2 errors caused by L1 transfer are called transfer errors.
Transfer errors as well as developmental errors are deemed to contribute to the state of interlanguage
(Yip, 1995), or the language of L2 learners, which is found somewhere along the continuum between the L1 and
L2 (Han & Selinker, 1999). Within the interlanguage framework, those errors which defy correction and persist
despite repeated instruction are called fossilized errors (Gass & Selinker, 2008). Fossilization resembles Grosjean’s
(2001) narrow definition of transfer. It is connected with the Multiple Effects Principle (MEP), “which predicts
that when language transfer works in tandem with one or more second language acquisition processes” (Han &
Selinker, 1999, p. 248), interlanguage structures are more likely to stabilize, leading to a permanent influence of
L1 on L2 (Han & Odlin, 2006).
Cases of L1 transfer can be exacerbated by polysemous L2 words, that is, words that have more than one
meaning sense (Schmitt, 2000). For example, Lennon (1996) found that in speech production, L2 English
learners, even advanced ones, frequently made errors while using polysemous words such as ‘go,’ ‘put,’ and
‘take.’ Morimoto and Loewen (2007) conducted a quasi-experimental study involving 58 Japanese high-school L2
English learners to compare the effectiveness of two kinds of vocabulary instruction, image-schema-based
instruction and translation-based instruction, on the learning of English polysemous words. The results revealed that
image-schema-based instruction is more effective than translation-based instruction.
Conversely, few studies have addressed the influence of L1 word polysemy on L2 vocabulary acquisition.
Amongst the few is the study by Duan and Qin (2012), which argues that, unlike English, Chinese makes use of
the same word (character) to express a number of meanings, which allows the Chinese language to be
economical. However, this misleads Chinese learners into believing that English follows the same pattern, which
can result in L1 lexical transfer errors.
Another lexical area with potential for L1 transfer is collocation. Collocations are described as ‘the
combinations of words which occur naturally with greater than random frequency’ (Lewis, 1997, p. 25).
Collocational knowledge, as a parameter which ‘distinguishes native speakers from nonnative speakers’ (Schmitt,
2000, p. 79), is not easy for L2 learners to acquire. Even advanced learners appear to have difficulty with L2
collocations (Yamashita & Jiang, 2010).
L1 influence was found to play a critical role in a number of studies of L2 collocation, in which
researchers either asked learners to take elicitation tests of collocations or collected and analyzed collocations in
learner production. The former type is more common. Thus, a translation test was developed by Biskup (1990,
cited in Nesselhauf, 2005) to investigate Polish learners’ knowledge of English collocations, and it was discovered
that participants seldom made errors in translation from L2 collocations to L1, but a considerable number of
errors occurred when translating collocations from L1 to L2. Yamashita and Jiang (2010) conducted a study
involving 47 Japanese learners of English, in which participants were shown collocations on a computer
screen, after which they needed to judge whether or not those were acceptable English collocations.
The findings showed that English collocations which are congruent with Japanese collocations are more easily
learned than those that are incongruent. Nesselhauf (2003) analyzed the use of English collocations in a learner
corpus consisting of 32 essays written by L2 learners of English whose L1 was German. The study discovered
that 56% of collocation errors could be attributed to negative L1 transfer.
Studies with Chinese speakers have yielded similar results. Based on a cloze test administered to Chinese
learners, Shei (1999, cited in Nesselhauf, 2005) concluded that advanced Chinese learners found it more difficult
to learn English collocations than their European counterparts did. The research by Lombard
(1997, cited in Nesselhauf, 2005), which investigated the collocations produced by Chinese learners, also found
that 10% of non-native-like collocations might be due to L1 transfer, another source being the incorrect use of
English synonyms. Wang (2011) used three collocation tests to investigate language transfer in the acquisition of
light verb + noun collocations by Chinese learners. He found that 61.84% of the participants’ uses of English
light verb + noun collocations may be traced to either positive or negative transfer from Chinese. Hence he
concluded that the influence of Chinese on the acquisition of English collocation was obvious and significant. He
also suggested that priority in teaching L2 collocations should be given to the L1-incongruent ones. In Chen
and Lin’s (2011) study, 355 first-year college students from three different universities in Taiwan were asked to
complete a 50-item multiple-choice collocation test and a questionnaire. The results showed that L1 transfer,
together with overgeneralization and misapplication of synonyms, was one of the top three factors leading to
collocation errors.
Other scholars analyzed learner writing to explore common collocation errors made by Chinese learners
(Li, 2005; Duan & Qin, 2012; Yang, Ma, & Cao, 2013). For example, Duan and Qin (2012) found that some
common English collocation errors, such as ‘eat medicine,’ ‘find an object,’ and ‘pay time,’ are due to negative transfer
from Chinese collocations. More evidence of transfer of Chinese collocations comes from Yang, Ma and Cao
(2013). They argue that due to L1 transfer, Chinese learners of English frequently produce unacceptable
collocations in English, such as ‘learn knowledge’ or ‘strong competition.’
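By way of illustration only (the studies above classified such errors manually; the calque list and function below are hypothetical, not drawn from any of the cited work), calqued verb + noun pairs like these could in principle be flagged against a hand-built list:

```python
# Hypothetical mini-list mapping calqued collocations (discussed above)
# to native-like targets; a real study would need a far larger inventory.
CALQUES = {
    ("eat", "medicine"): ("take", "medicine"),
    ("learn", "knowledge"): ("acquire", "knowledge"),
    ("pay", "time"): ("spend", "time"),
}

def flag_calques(tokens):
    """Return (index, found_pair, suggested_pair) for each adjacent
    token pair that matches the calque list."""
    hits = []
    for i in range(len(tokens) - 1):
        pair = (tokens[i].lower(), tokens[i + 1].lower())
        if pair in CALQUES:
            hits.append((i, pair, CALQUES[pair]))
    return hits

sample = "Students often eat medicine and learn knowledge at school".split()
for i, found, suggested in flag_calques(sample):
    print(f"position {i}: {' '.join(found)} -> {' '.join(suggested)}")
```

Such a lookup captures only verbatim, adjacent pairs; inflected or discontinuous collocations would require lemmatization and a window-based search.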
Beyond collocations, which are usually restricted to two words, there are longer strings of words called
multi-word units or MWU (Schmitt, 2000). They have been categorized into four linguistic categories: ‘phrasal
verbs,’ ‘fixed phrases,’ ‘idioms’ and ‘proverbs’ (Schmitt, 2000). From the point of view of language production,
MWU are regarded as ‘formulaic expressions,’ ‘lexical phrases,’ or ‘lexical chunks’ which are stored in long-term
memory and can be easily activated (Pawley & Syder, 1983). Therefore, they are one of the key elements of
accurate, fluent and efficient linguistic production, playing a critical role in SLA (Biber et al., 1999; Erman and
Warren, 2000; Hyland, 2008).
Furthermore, except for idioms, whose meaning does not equal the meanings of their component parts,
most MWU are to some extent compositional (Nation, 2013). In other words, the meaning of each component
part in an MWU contributes to the meaning of the whole. As the selection of words in an MWU is not arbitrary,
knowledge of its parts is conducive to understanding of the whole MWU (Bogaards, 2001; Boers &
Lindstromberg, 2009). MWU are also regarded as transferable because they are semantically and
syntactically compositional. L2 MWU studies (e.g. Rafiee, Tavakoli & Amirian, 2011; Adel & Erman, 2012;
Karabacak & Qin, 2013) indicate that, compared with native speakers, L2 learners use fewer MWU, while also
being likely to overuse certain MWU and underuse others.
Few studies have so far addressed the L1 influence on the acquisition of L2 MWU. One of the
exceptions is Peromingo (2012), who explored the L1 influence on L2 learners’ production of both correct and
incorrect English MWU by analyzing argumentative writing from several learner corpora. The findings suggest
that L2 learners tend to overuse the MWU which are similar to L1 ones.
Paquot (2013) analyzed the writing of French learners of English from the first version of the International
Corpus of Learner English (ICLE), with special focus on their use of English 3-word sequences with a lexical
verb (marked). The results indicated that French learners made few errors in using English 3-word sequences
with a lexical verb. However, more errors were found in their selection of English unmarked word combinations,
whose French translation equivalents are easy to trace.
In conclusion, studies of lexical transfer have so far examined this phenomenon in three different
contexts: single word polysemy, collocation, and MWU, although not necessarily all at once. These contexts
coincide with the scope of the term lexis, which subsumes not only single words, but also collocations and MWU
(Schmitt, 2000; Thornbury, 2002). Hence, any comprehensive study of lexical transfer should not be restricted
to individual words, but would need to observe the effects of L1 on words in both collocational and MWU
contexts (Gray & Biber, 2013). In accordance with this, Dodigovic, Wei and Jing (2015) proposed the following
taxonomy of the contexts for lexical transfer from Chinese to English: 1) Chinese word polysemy, 2) Chinese
collocations, 3) Chinese MWU. This taxonomy is followed in the current study.
Cook (2014) identifies the grammar-translation approach to teaching as one of the causes of subordinate
bilingualism or lexical transfer. This approach appears to be the dominant L2 instruction method in China
(Edmunds, 2013). Vocabulary teaching in this method encourages establishing links between L1 and L2 single
words, but does not necessarily pay attention to collocations or MWU, which sets the stage for word-for-word
translation and hence lexical error. It is therefore important to investigate the evidence of lexical transfer in China
in all of its lexical contexts. The findings could have significant implications for L2 teaching practice in the
Chinese-speaking world. Hence, the current study aims to identify and analyze the cases of negative lexical
transfer from Chinese to English caused by 1) Chinese word polysemy at the single word level, 2) Chinese
collocations, and 3) Chinese MWU. In doing so, it addresses the need for a comprehensive approach to lexis in
lexical transfer research, since corpus studies have to some extent examined individual aspects, but not necessarily
the entire scope of lexis in Chinese-English interlanguage.
Research Questions
To obtain a better understanding of the influence of Chinese as L1 on English vocabulary learning, the present
study attempts to address the following research questions: 1) To what extent does the transfer of L1 word
polysemy, collocations, and MWU impact Chinese learners’ English vocabulary use? 2) Are more advanced
learners as prone to L1 lexical transfer errors as less advanced ones? Both Chinese learners of English and
their teachers stand to gain from the answers to these questions and the ensuing implications for teaching
practice. It is hoped that the insights gained through this study will contribute to increased English language
proficiency in Chinese contexts.
Methodology
Data
The learner corpus used in the present study consists of 100 samples of writing (541,482 words in total). Fifty of
those were written by first year students and 50 by fourth year students at a Sino-British English-medium
university in China. The students whose writing was included were native speakers of Chinese from the
department of English. They were aged between 18 and 23, and had been learning English for at least 6 years
(three years in middle school and three years in high school) prior to university enrolment. Research ethics
protocols were followed in accordance with university policy.
The genre and topics of year one and year four student writing are somewhat different. First year
students were required to write a 1,000-word essay to demonstrate their appreciation of a given movie, a poem
or a novel chapter, on which they were able to work for several weeks. In contrast, year four students presented
their Final Year Projects (FYP), on which they had worked for an entire year. Each FYP was at least 10,000 words
long. Despite the considerable differences, each type of assignment was level-appropriate, thus being a true
indicator of the writers’ ability in written English.
In the present study, each instance of negative lexical transfer from Chinese to English, in a word,
collocation, or MWU, was counted as one error. Instances of L1 lexical transfer, highlighted and marked in the
corpus, were grouped into three pre-defined categories: 1) those caused by Chinese word polysemy, 2) those caused by
Chinese collocations and 3) those caused by Chinese MWU. The average length of writing by first year students was 1,000 words,
while that by final year students was 10,000. Due to this considerable difference in word count, it was deemed
that raw frequencies could be misleading. In order to make the results comparable across sub-samples,
there was a need to ‘norm’ the raw frequencies (Biber, Conrad & Reppen, 1998). In the present study, the counts
were normed to a basis of 1,000 words using the following formula:
Normed error frequency = (raw number of errors / total word count) × 1,000
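The norming step above is simple proportional scaling, and can be sketched in a few lines of Python. The figures used below are illustrative only, not counts from the study:

```python
# Normalize raw error counts to a common basis of 1,000 words, so that
# samples of very different lengths can be compared directly.
def normed_frequency(error_count, total_words, basis=1000):
    """Return the error frequency per `basis` words."""
    return error_count / total_words * basis

# Hypothetical figures: 3 errors in a 1,000-word essay versus
# 30 errors in a 10,000-word Final Year Project.
print(normed_frequency(3, 1000))    # 3.0 errors per 1,000 words
print(normed_frequency(30, 10000))  # 3.0 errors per 1,000 words
```

On this basis, the two samples show the same error rate even though their raw counts differ tenfold, which is exactly the distortion the norming is meant to remove.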
The data was subsequently processed using the IBM SPSS Statistics Version 20 for both descriptive and
inferential analysis.
Procedure
The corpus of student writing was manually analysed for lexical errors by initially three and later two Chinese-
English bilinguals, who used dictionaries and the English corpora available through lextutor.ca to check the
accuracy of their linguistic judgement. The accuracy of their judgement was subsequently verified by a native
English speaker. Based on their native-speaker Chinese competence, the analysts also decided whether the
context causing an error was the polysemy of the Chinese equivalent of the target English word, an underlying
Chinese collocation or a Chinese MWU. Each instance was counted as one error. For example, in cases where the
error was based on an underlying Chinese collocation which also included a polysemous Chinese word, the error
was counted as a collocation error. Thus, larger lexical units took precedence over smaller ones in the error count. The
accuracy of the Chinese aspect of this research was subsequently verified by an L1 Chinese rater.
In order to investigate whether the polysemous Chinese words identified in the learner corpus as the
cause of lexical transfer are frequently used by native speakers of Chinese, the corpus of ‘Texts Of Recent
Chinese’ (ToRCH; TORCH2009, Texts of Recent Chinese, Brown family, 2009, 2013 summer
edition), available from http://111.200.194.212/cqp/torch09/, was used in the study. The ToRCH project was
initiated under the name CC2009 (Chinese Corpus 2009) by the Corpus Research Group at Beijing
Foreign Studies University. The current version was finalized on 20 July 2014 after the removal of some
duplicated portions of text. The corpus contains 1,066,347 tokenized Chinese words, or 1,670,356 Chinese
characters, drawn from texts of 15 types (Press: Reportage, Press: Editorial, Press: Reviews, Religion, Skills and hobbies,
Popular lore, Belles-lettres, Miscellaneous: Government and house organs, Learned, Fiction: General, Fiction:
Mystery, Fiction: Science, Fiction: Adventure, Fiction: Romance, and Humor). While the polysemy of the
Chinese words was judged by at least two native speakers of Chinese and later verified by an L1 Chinese rater,
frequency was measured in terms of a word’s frequency ranking within the ToRCH corpus.
Table 1
The Frequency of Some Chinese Polysemous Words in the Corpus of Texts of Recent Chinese
NO. Chinese Words/characters Frequency
1 有 6,539
2 都 3,747
3 大 3,078
4 还 2,581
5 好 2,160
6 后 1,893
7 看 1,811
8 用 1,548
9 做 1,365
10 国家 1,023
11 重要 940
12 需要 825
13 通过 681
14 主要 650
15 方面 632
16 情况 567
17 提高 564
18 作用 508
19 变化 500
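The frequencies in Table 1 are raw token counts within the 1,066,347-word ToRCH corpus. A common corpus-linguistic convention, not applied in the table itself, is to normalize such counts per million words so they can be compared with counts from corpora of other sizes. A minimal Python sketch, using the corpus size stated above:

```python
# Raw-count to per-million-words conversion for ToRCH frequencies.
# Corpus size (tokenized words) as reported for the ToRCH corpus.
TORCH_SIZE = 1_066_347

def per_million(raw_count, corpus_size=TORCH_SIZE):
    """Return a word's frequency per million words of running text."""
    return raw_count / corpus_size * 1_000_000

# Two entries from Table 1: 有 (6,539 tokens) and 变化 (500 tokens).
print(round(per_million(6539)))  # ~6132 occurrences per million words
print(round(per_million(500)))   # ~469 occurrences per million words
```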
The second research question (Are more advanced learners as prone to L1 lexical transfer errors as the
less advanced ones?) was answered by comparing the error frequency in the writing of year one and year four
students. Compared with the writing of year one students (less advanced learners), fewer lexical transfer errors
were identified in the writing of year four students (more advanced learners).
Figure 2. Error frequency in writing by year one and year four students
Figure 2 compares the frequency of lexical errors in the writing of first year and final year students
in terms of the three transfer categories on a 1,000-word basis. Specifically, in the papers written by first year
students, 0.4404 errors per 1,000 words could be attributed to Chinese word polysemy, while the number
of errors caused by the transfer of Chinese collocations was the same as the number caused by MWU
(0.2936 and 0.2936 respectively). These figures decreased in the writing of final year students: while
0.3605 errors were caused by the transfer of Chinese word polysemy, 0.1478 were caused by Chinese collocations
and 0.1924 by Chinese MWU.
An independent-samples t-test was conducted using SPSS. The difference in the frequency
of lexical transfer from Chinese at all levels was not statistically significant, t(56.609) = 1.788, p = .079. The
corresponding medium effect size, r = .231, suggests however that the additional years of tertiary study that year
four students had over year one students might have had a moderate bearing on the decline in lexical transfer errors.
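The reported effect size can be recovered from the t statistic and its degrees of freedom. The sketch below assumes the standard conversion r = sqrt(t² / (t² + df)); the paper does not state which formula was used, but this conversion reproduces the reported value:

```python
import math

def effect_size_r(t, df):
    """Convert an independent-samples t statistic to the effect size r,
    using the conventional formula r = sqrt(t^2 / (t^2 + df))."""
    return math.sqrt(t**2 / (t**2 + df))

# Values reported in the study: t(56.609) = 1.788
r = effect_size_r(1.788, 56.609)
print(round(r, 3))  # 0.231, a medium effect by conventional benchmarks
```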
A medium-frequency word is ‘认识 (ren shi),’ which can be used both as a verb and as a noun. Its English
translation equivalents include: know, realize, acquaint oneself with, be familiar with, recognition, and understanding.
It has a frequency of 360 in the ToRCH corpus and caused two errors in the present study.
1) have his/her own recognition of …
*[sb.] has his/her own understanding of
[某人]对…有自己的认识
2) they have the ability to realize these western goods.
* They have ability to get to know/get familiar with these western goods.
他们有能力去认识这些西方的商品。
Chinese collocations caused over one fifth of the instances of lexical transfer. Three types were
identified: verb + noun, adjective + noun and noun + noun. An example of the first is ‘学习 (xue xi)’ and ‘知识
(zhi shi),’ which always collocate with each other to make up ‘学习知识 (xue xi zhi shi),’ whose English word-for-
word translation could be ‘learn knowledge.’ However, the correct English collocation should be ‘gain knowledge’
or ‘acquire knowledge.’ Here are two instances of this transfer type in the corpus:
Verb + Noun Collocations
1) The most important task for student in university should be learning
knowledge.
*The most important task for students in university should be gaining
knowledge.
大学生的主要任务是学习知识。
2) to learn the real practical knowledge
*to gain the practical knowledge
去学习知识
Finally, errors caused by Chinese noun + noun collocations are discussed here.
Noun + Noun Collocations
1) tool books
*reference books
工具书
2) development space
*room for improvement
发展空间
In the present study, close to one third of transfer errors were caused by Chinese MWU. Some typical
examples are discussed in this section. Many English MWU that indicate the author’s point of view were
negatively transferred from the Chinese ones. For instance:
1) standing in perspective of…
*walk in the shoes of…/from the point of view of…/from the
perspective of…/in the perspective of…
站在…的角度
2) from this point to consider
*from this point of view
从这点考虑
3) from this point to see
*from this point of view
从这点看
Another type of transfer error appeared to be caused by differences in the order of language elements
in Chinese and English MWU.
1) have some extent impact
*have impact to some extent
在某种程度上有影响
2) have some degree of influence
*have some influence to some degree
在某种程度上有影响
3) In the present day world
*In the world today/in the contemporary world
在当今世界
4) I not only can
* I can not only
我不仅能够
Discussion
The results show that Chinese word polysemy caused the most transfer errors, followed by Chinese MWU and
Chinese collocations. The frequency of transfer errors in the three categories was lower in the writing of year
four students than in the writing of year one students. This indicates that while writing in English, more
advanced learners tend to make fewer connections to the Chinese lexical network. This finding is consistent with
Kroll and Stewart’s (1994) as well as Jiang’s (2002) argument that in the minds of less proficient learners,
L2 words are directly connected to their L1 equivalents. This finding, however, appears to counter Pienemann et
al.’s (2005) view that more advanced learners are more prone to L1 transfer than less advanced ones.
Despite the fact that final year students made fewer transfer errors, the difference in error frequency between
the two groups was not statistically significant. As statistical significance is dependent on sample size, a large
enough sample would yield a p value small enough to reach the desired level of statistical significance;
this was not the case in the current study. It is therefore more noteworthy that the additional years of
English-medium instruction seem to have had a moderate impact on the decline in the number of lexical transfer
errors.
However, the more advanced learners persisted with lexical transfer. This could be partly explained by
‘fossilization,’ a feature of L2 interlanguage (Yip, 1995). Moreover, as argued by Han and Selinker
(1999, p. 248), ‘there is a greater tendency for interlanguage structures to stabilize, leading to possible
fossilization in spite of repeated pedagogical intervention.’ In addition, the finding is consistent with Jiang’s
(2000) L2 processing model, which stipulates that the transition from the L1 lemma mediation stage to the final
stage could hardly happen due to the cessation of lexical development, or, more specifically, due to fossilization.
Based on the discussion above, it appears that the findings of the present study are in agreement with those
made by Yamashita and Jiang (2010). They concluded that L2 collocations which are not congruent with L1
collocations are more likely to cause negative transfer. In other words, the L2 collocations that cannot be
accurately represented through word-for-word translation from L1 would lead to transfer errors. This in turn
points to translation in the English language classroom as one of the likely precipitating factors in the case of
Chinese collocation transfer. The other is a possible lack of attention to collocations as such.
Conclusions
Intrigued by the role of L1 in L2 vocabulary acquisition, and the paucity of corpus-based research focusing on
L1 lexical transfer in Chinese contexts, the present study attempted to explore the lexical transfer errors caused
by 1) Chinese word polysemy, 2) Chinese collocations, and 3) Chinese MWU. A learner corpus containing 100
writing samples by 100 Chinese learners of English who were at the time studying at a Sino-British university in
China was compiled and manually analyzed. The results show that the majority of lexical transfer errors could
be attributed to Chinese word polysemy. Although less advanced learners made more lexical transfer
errors overall than the more advanced ones, the difference was not statistically significant.
The fact that more advanced learners did not significantly outperform the less advanced ones could be
explained by fossilization. Two possible underlying reasons for this were considered. The first could be the
Chinese learners’ lack of adequate depth of English vocabulary knowledge, due to the lack of extensive exposure
to English and the lack of awareness of the lexical features of English vocabulary. The second reason is the over-
reliance on the Chinese conceptual network while learning English, which is exacerbated by the grammar-
translation approach to English instruction.
well as the associated registers and contexts in which they can be used. All of the above also require English
teachers to have in-depth English vocabulary knowledge.
Secondly, teachers should make the learners aware of the fact that there are no exact overlaps between
translation equivalents across languages. Moreover, in order to reduce the negative transfer from L1, the use of
bilingual dictionaries should decrease, especially for intermediate or advanced L2 learners. In contrast, the use of
monolingual learners’ English dictionaries should be encouraged since they could provide L2 learners with more
accurate and in-depth lexical knowledge, and offer them the contexts in which the words are used.
Thirdly, as argued by Ellis (2008), production could facilitate acquisition only if the learner is pushed, so
teachers should require learners to produce L2 as frequently as possible. For instance, learners should be
encouraged to try to think in English while writing English papers. In this manner, the role played by Chinese
could be reduced, thus preventing negative L1 transfer.
Fourthly, different approaches to teaching L2 lexis should be employed with learners at different levels.
Novice L2 learners most likely correspond to the initial stage in Jiang’s (2000) L2 lexical processing model. L2
learners at this stage are hardly able to establish a direct connection between the concept and the L2 word.
Instead, they connect the L2 word with its L1 translation equivalent. Therefore, Jiang (2000) suggests that an
interlingual teaching approach, namely the use of L1 translation, could be used in moderation to help the novice
L2 learners establish the forms and core senses of L2 words. However, lexical teaching strategies should change
with intermediate or advanced learners, who are already at the L1 mediation stage. In order to help intermediate
or advanced learners overcome the lexical or semantic fossilization, which leads to subordinate bilingualism, the
use of L1 equivalents should be avoided, and authentic and contextualized L2 materials should be used.
In addition, as suggested by Shalaby, Yahya and El-Komi (2009), word lists containing L2 words that are
difficult to acquire could be very helpful in L2 teaching. This is particularly the case with the multiple English
equivalents of high-frequency polysemous Chinese words.
Similarly, since L2 collocations which are not congruent with L1 are found to cause transfer errors, lists of
English collocations that cannot be directly deduced from their L1 translation equivalents should be generated.
Furthermore, English collocations should be taught as unified wholes rather than as separate words. This is
especially important for beginners, who are vulnerable to the negative influence of L1 collocations. Finally,
learners should be made aware of MWU, especially the ones that do not translate to English word for word.
Teachers and learners could turn to English language corpora for help concerning many aspects of
vocabulary, in particular collocation and context of use. The Compleat Lexical Tutor available at
http://www.lextutor.ca is a website enabling access to several corpora and analytical tools, which could be
successfully used for this purpose.
References
Ädel, A., & Erman, B. (2012). Recurrent word combinations in academic writing by native and non-native
speakers of English: A lexical bundles approach, English for Specific Purposes, 31(2), 81-92.
Bialystok, E., Craik, F. I., & Freedman, M. (2007). Bilingualism as a protection against the onset of symptoms of
dementia, Neuropsychologia, 45(2), 459-464.
Biber, D., Johansson, S., Leech, G., Conrad, S., Finegan, E., & Quirk, R. (1999). Longman grammar of spoken and
written English (p. 1204) Harlow: Pearson Education Limited.
Biber, D., Conrad, S., & Reppen, R. (1998). Corpus linguistics: Investigating language structure and use. Cambridge:
Cambridge University Press.
Biber, D. (2009). A corpus-driven approach to formulaic language in English: Multi-word patterns in speech and
writing, International Journal of Corpus Linguistics, 14(3), 275-311.
Bogaards, P. (2001). Lexical units and the learning of foreign language vocabulary, Studies in second language
acquisition, 23(03), 321-343.
Boers, F., & Lindstromberg, S. (2009) Optimizing a lexical approach to instructed second language acquisition . Basingstoke:
Palgrave Macmillan.
Carroll, D. (2007). Psychology of language. Belmont: Thomson Higher Education.
Chan, A. Y. W. (2010). Toward a Taxonomy of Written Errors: Investigation Into the Written Errors of Hong
Kong Cantonese ESL Learners, TESOL Quarterly, 44 (2), 295 – 319.
Chen, M.-H. & and Lin, M. (2011). Factors and Analysis of Common Miscollocations of College Students in
Taiwan. Studies in English Language and Literature, 28, 57 – 72.
Cook, V. (2014). How Do Different Languages Connect in Our Minds? In Cook, V. & Singleton, D. (Eds.) Key
Topics in Second Language Acquisition. (pp. 1-16) Bristol: Multilingual Matters.
Dodigovic, M. & Wang, S. (2015). The misuse of academic English vocabulary in Chinese student writing, US-
China Foreign Language 13(5), 349-356.
Dodigovic, M. (2005). Arti>cial intelligence in second language learning: Raising error awareness . Clevedon: Multilingual
Matters.
Duan, M. & Qin, X. (2012). Collocation in English Teaching and Learning, Theory and Practice in Language Studies,
2(9), 1890-1894.
Edmunds, K. (2013). Chinese ESL Learners’ Overuse of the Denite Article: A Corpus Study, BA thesis; Emory
University.
Ellis, R. (2008). The study of second language acquisition. 2nd edition. Oxford University Press.
Erman, B., & Warren, B. (2000). The idiom principle and the open choice principle. Text-Interdisciplinary Journal for
the Study of Discourse, 20(1), 29-62.
Folse, K. S. (2006). The Effect of Type of Written Exercise on L2 Vocabulary Retention. TESOL Quarterly: A
Journal For Teachers Of English To Speakers Of Other Languages And Of Standard English As A Second Dialect, 40(2),
273-293.
Gass, S. (2013). Second Language Acquisition: An Introductory Course (4th Edition). New York: Routledge/Taylor Francis.
Gass, S. & L. Selinker (2008). Second Language Acquisition: An Introductory Course (3rd Edition). New York:
Routledge/Taylor Francis.
Gray, B. & Biber, D. (2013). Lexical frames in academic prose and conversation. Intemational Journal of Corpus
Linguistics, 18 (1), 109-135.
Grosjean, F. (2001). The bilingual’s language modes. In Nicol, J. (Ed.). One Mind, Two Languages: Bilingual Language
Processing (pp. 1-22). Oxford: Blackwell.
Han, Z. & Odlin, T. (Eds.), Studies of Fossilization in Second Language Acquisition . Clevedon, U.K.: Multilingual
Matters, 134-156.
Han, Z., & Selinker, L. (1999). Error resistance: Towards an empirical pedagogy. Language Teaching Research, 3(3),
248-275.
Hyland, K. (2008). As can be seen: Lexical bundles and disciplinary variation, English for Specific Purposes, 27(1), 4-
21.
Jarvis, S., & Pavlenko, A. (2008). Crosslinguistic influence in language and cognition. New York: Routledge.
Jiang, N. (2000). Lexical representation and development in a second language, Applied linguistics, 21(1), 47-77.
Kroll, J. F., Bobb, S. C., & Wodniecka, Z. (2006). Language selectivity is the exception, not the rule: Arguments
against a fixed locus of language selection in bilingual speech, Bilingualism: Language and Cognition, 9(2), 119-
135.
Kroll, J. F. & Stewart, E. (1994). Category interference in translation and picture naming: Evidence for
asymmetric connections between bilingual memory representations, Journal of Memory and Language, 33, 149 –
174.
Karabacak, E., & Qin, J. (2013). Comparison of lexical bundles used by Turkish, Chinese, and American
university students, Procedia-Social and Behavioral Sciences, 70, 622-628.
Lennon, P. (1996). Getting ‘easy’ verbs wrong at the advanced level, IRAL-International Review of Applied Linguistics
in Language Teaching, 34(1), 23-36.
Lewis, M. (1997). Implementing the lexical approach: putting theory into practice. Hove: Language Teaching Publications.
Li, S. (2005). An Investigation into Lexical Misuses by Chinese College Students under the Negative Influence of Their First
Language. Master’s thesis: Zhejiang University.
Liu, Z. (2011). Negative Transfer of Chinese to College Students’ English Writing, Journal of Language Teaching and
Research, 2 (5), 1061-1068.
Llach, M. P. A. (2011). Lexical errors and accuracy in foreign language writing. Claredon: Multilingual Matters.
MacWhinney, B. (1992). Transfer and competition in second language learning, Advances in Psychology, 83, 371–
390.
Morimoto, S., & Loewen, S. (2007). A comparison of the effects of image-schema-based instruction and
translation-based instruction on the acquisition of L2 polysemous words, Language Teaching Research, 11(3),
347-372.
Nation, I. S. P. (2013). Learning vocabulary in another language. 2nd edition. Cambridge: Cambridge University Press.
Nesselhauf, N. (2005). Collocations in a learner corpus. Amsterdam: John Benjamins Publishing.
Nesselhauf, N. (2003). The use of collocations by advanced learners of English and some implications for
teaching, Applied linguistics, 24(2), 223-242.
Pawley, A. & Syder, F. H. (1983). Two puzzles for linguistic theory: Native-like selection and native-like fluency. In
J. Richards & R. Schmidt (Eds.) Language and Communication. (pp. 121 – 225) London: Longman.
Pienemann, M., Di Biase, B., Kawaguchi, S., & Håkansson, G. (2005). Processing constraints on L1 transfer. In J.
F. Kroll & A. M. B. de Groot (Eds.), Handbook of bilingualism: Psycholinguistic approaches (pp. 128-153) Oxford:
Oxford University Press.
Paquot, M. (2013). Cross-linguistic influence and formulaic language: French EFL learners’ use of recurrent
word sequences under scrutiny, Learner Corpus Research 18(3), 391-417.
Peromingo, J. P. (2012). Corpus analysis and phraseology: Transfer of multi-word units, Linguistics and the Human
Sciences, 6(1-3), 321-343.
Rafiee, M., Tavakoli, M. & Amirian, Z. (2011). Structural Analysis of Lexical Bundles Across Two Types of
English Newspapers Edited by Native and Non-native Speakers, The Modern Journal of Applied Linguistics, 3
(2), 1 – 15.
Schmitt, N. (2000) Vocabulary in language teaching. Ernst Klett Sprachen.
Schmitt, N., & Meara, P. (1997). Researching vocabulary through a word knowledge framework: Word
associations and verbal suffixes. Studies in Second Language Acquisition, 19, 17 – 36.
Shalaby, N. A., Yahya, N., & El-Komi, M. (2009). Analysis of lexical errors in Saudi college students’
compositions, Ayn, Journal of the Saudi Association of Language and Translation, 2(3), 65-92.
Schmitt, N., & Hemchua, S. (2006). An analysis of lexical errors in the English composition of Thai learners,
Prospect: an Australian journal of TESOL, 21(3), 3-25.
Thornbury, S. (2002) How to teach vocabulary. Harlow : Longman.
Wang, D. (2011). Language Transfer and the Acquisition of English Light Verb + Noun Collocations by Chinese
Learners, Chinese Journal of Applied Linguistics, 34 (2), 107 – 125.
Wray, A. (2002) Formulaic sequences and the lexicon. New York: Cambridge University Press.
Wolter, B. (2006). Lexical network structures and L2 vocabulary acquisition: The role of L1 lexical/conceptual
knowledge, Applied Linguistics, 27(4), 741-747.
Xia, L. (2013). A Corpus-Driven Investigation of Chinese English Learners’ Performance of Verb-Noun
Collocation: A Case Study of Ability, English Language Teaching, 6 (8), 119 – 124.
Yip, V. (1995). Interlanguage and learnability: from Chinese to English. Philadelphia: John Benjamins Publishing.
Yamashita, J., & Jiang, N. (2010). L1 influence on the acquisition of L2 collocations: Japanese ESL users and EFL
learners acquiring English collocations, TESOL Quarterly, 44(4), 647-668.
Yang, L., Ma, A. P., & Cao, Y. (2013). Lexical Negative Transfer Analysis and Pedagogic Suggestions of Native
Language in Chinese EFL Writing. The proceedings of the 2013 Conference on Education Technology and Management
Science (ICETMS 2013). (pp. 669 – 672) Atlantis Press.
Zhou, S. (2010). Comparing receptive and productive academic vocabulary knowledge of Chinese EFL learners.
Asian Social Sciences, 6(10), 14–19.
Chengcheng Ma, MA in TESOL, is a graduate of Xi’an Jiaotong-Liverpool University and the University of
Liverpool. She served as a research assistant at the Research Centre for Language Technology at Xi’an Jiaotong-
Liverpool University. Currently, she is teaching English in Kunming, PR China.
Song Jing is a graduate of Xi’an Jiaotong-Liverpool University. He served as a research assistant on the
project titled Lexical Transfer from Chinese to English in the Writing of XJTLU Students at Xi’an Jiaotong-Liverpool
University. At present, he is pursuing an MA degree at the Australian National University.
Stephen Jeaco*
Xi’an Jiaotong-Liverpool University, China
Abstract
While studies exploring the overall effectiveness of Data Driven Learning activities have been positive, learner participants
often seem to report difficulties in deciding what to look up, and how to formulate appropriate queries for a search (Gabel,
2001; Sun, 2003; Yeh, Liou, & Li, 2007). The Prime Machine (Jeaco, 2015) was developed as a concordancing tool to be used
specifically for looking up, comparing and exploring vocabulary and language patterns for English language teaching and
self-tutoring. The design of this concordancer took a pedagogical perspective on the corpus techniques and methods to be
used, focusing on English for Academic Purposes and including important software design principles from Computer Aided
Language Learning. The software includes a range of search support and display features which try to make the
comparison process for exploring specific words and collocations easier. This paper reports on student use of this
concordancer, drawing on log data records from mouse clicks and software features as well as questionnaire responses from
the participants. Twenty-three undergraduate students from a Sino-British university in China participated in the
evaluation. Results from logs of search support features and general use of the software are compared with questionnaire
responses from before and after the session. It is believed that The Prime Machine can be a very useful corpus tool which,
while simple to operate, provides a wealth of information for language learning.
Key words: Concordancer, Data Driven Learning, Lexical Priming, Corpus linguistics.
Introduction
This paper presents the results of an evaluation of a concordancing program which the author developed as part
of his doctoral studies (Jeaco, 2015). After presenting a brief introduction to why the software was developed,
some of the theories and studies which had an influence on this work will be discussed. Then the basic design of
the software will be introduced and the evaluation itself will be presented.
* Tel: + 86 51288161301; E-mail: Steve.Jeaco@xjtlu.edu.cn; HS431, Xi’an Jiaotong-Liverpool University, 111 Ren’Ai Lu,
Suzhou Industrial Park, Suzhou, P. R. China
Given the limited time available in class and a deep sense of the need to help my Chinese learners of
English develop skills to explore language themselves, one of the main reasons for developing the concordancing
tool was to give my students an additional language resource to which they could turn: to
check the meaning and use of words as they were composing, to consider alternative wordings as they were
proof-reading and editing their own work or the work of a peer, and to explore in their own time some of the
vocabulary which they had encountered briefly in a class session and the different contexts and environments in
which it typically occurs.
effective than online dictionaries as reference resources in error correction in writing, but participants also
showed strong attitudes regarding difficulties related to the time needed, unknown words in the concordance
lines, rule induction, having cut-off sentences and having too many examples. In addition to these issues, as
Anthony (2004) argues as he presents his classroom concordancer (AntConc), software for concordance exploration
is not usually designed specically with learners in mind. It is true that AntConc goes some way towards
simplifying the interface of a concordancer, but there are still many obstacles to getting started and knowing
enough about the tools and functions in order to use them. It has been argued that effort should be put into
trying to make concordancing software better in terms of its user-friendliness and its suitability for language
learners (Horst, Cobb, & Nicolae, 2005; Krishnamurthy & Kosem, 2007).
The Prime Machine aims to make insights about language based on Hoey’s theory of Lexical Priming (2005)
accessible and rewarding. The software has been designed to provide a multitude of examples from corpus texts
and additional information about typical contextual environments. Hoey argues that priming is “the result of a
speaker encountering evidence and generalising from it” (2005, p. 185), and also considers some of the
challenges that learners of a foreign language face due to limited opportunities to encounter language data
naturally, and also due to the severe limitations of wordlists and isolated grammar rules. The Prime Machine was
developed following key principles from Second Language Acquisition. First and foremost, the concordancer
and concordancing activities are a means of leading language learners to read multiple examples from authentic
texts. The SLA principle of exposing language learners to target language in use (Krashen, 1989; Nation, 1995-
6) provides a basis for this. Another fundamental principle from SLA is that of focussed attention and noticing
(Doughty, 1991). Schmidt claims that “intake is what learners consciously notice” (1990, p. 149). A link
between concordancing activities and Laufer and Hulstijn’s involvement load hypothesis (Laufer & Hulstijn, 2001) has
also been made clear by Lee et al. (2015). Tomlinson argues that the positive effects of noticing language features
within authentic texts, and of learners’ recognition of a gap in their own language use, can be strengthened if
the discovery process is one in which the language learners uncover features for themselves (Bolitho et al.,
2003; Tomlinson, 1994, 2008). It is hoped that The Prime Machine goes some way towards providing a platform for
these kinds of discovery, as it has been designed specifically to facilitate noticing of patterns and tendencies
(Jeaco, 2017).
It is possible to evaluate a piece of software like The Prime Machine by carrying out a series of system
evaluations or by conducting a user evaluation. A user evaluation considers how well the system meets the
expectations of its users and how performance and accuracy affect their attitudes and actions. These can be
measured both through feedback mechanisms such as questionnaires, interviews or focus groups, and through
looking at the preferences expressed in records of users’ interactions with the software. Following a
user evaluation, priorities for further development become clear as software engineers can focus on ways to build
on the more positively viewed aspects of the software, or they can look at which parts of the system were
underappreciated or neglected and use system evaluation techniques to focus on these in isolation and attempt to
improve them. As the software was designed for language learning and teaching, it is important to consider how
principles from Computer Aided Language Learning (CALL) could be applied for the evaluation. Chapelle
(2001) makes suggestions for the judgemental analysis of CALL software (pp. 53-54), the appropriateness of tasks
(p. 59) and the empirical evaluation of tasks (p. 68). She provides a list of six qualities as follows:
Language learning potential
Learner fit
Meaning focus
Authenticity
Impact
Practicality
Each of these qualities should be considered when evaluating the effectiveness of a concordancing tool for
language learning. However, as Krishnamurthy and Kosem (2007) point out, it is also important for software
designers to get feedback from teachers in a pilot scheme in order to ensure teachers will want to use it. Scott’s
own reflections on perceptions of the user-friendliness of WordSmith Tools include an important point: teachers
need to have confidence in their own abilities to use software and in what it should be used for; otherwise,
their fear of losing face can be an inhibiting factor (Scott, 2008).
Figure 1. The Tabs Across the Top of the Screen in The Prime Machine Concordancer.
Search Tab
The usual starting point for language learners and teachers using the software is a specific word or collocation.
The search tab provides two boxes where words or phrases can be entered. As users start to type, the corpus
which is currently selected is accessed, bringing up lists of words and collocations for complete words. If the
word or phrase entered into the system is not found in the current corpus, the user can seek additional spelling
support, or click to check whether the word or phrase exists in any of the other corpora which are loaded into
the system. The software was designed to make it easy to compare two words, two word forms from the same
family, words with similar meanings, and related collocations, by providing search suggestions based on words
entered, and by presenting results for two searches side-by-side on screen. The search tab also allows
for comparisons of the same item across two corpora, and some other tools more tailored to corpus linguistic
research.
complete sentences above and below the sentence containing the node, with gentle highlighting of the line of
text which contains the node. At the top of each card, the caption shows strong collocations within the nearby
context of the node and the source type and citation is also prominently shown. The Cards Tab presents the list
of concordance lines in the form of cards, but compared with the Lines Tab, obviously fewer concordance lines
are visible.
Figure 2. Example of the Lines Tab Showing the Card for the Currently Selected Concordance Line.
(Incidental Data from a Query for the Word consequences in The British National Corpus)
learners with the opportunity to experience the phenomena introduced in one of Firth’s ([1951]1957)
memorable assertions: “A word in a usual collocation stares you in the face just as it is” (p. 182).
Other Tabs
Additional information about the typical environments in which the search query may be found in the corpus is
shown on the other tabs. When the user looks up a specific vocabulary item, icons indicating strong tendencies
draw attention to different aspects of its typical context. The Graphs Tab shows the proportion of concordance
lines within specific contexts, and should draw learners’ attention to a selection of features that will resonate with
language teachers and will help learners engage with the data in the concordance lines more easily, including the
use of articles and prepositions, passive voice and modal verbs. Pre-calculated summaries for words and
collocations are also provided covering a range of features from the theory of Lexical Priming. Information on
the other tabs also makes it possible for language learners and teachers using The Prime Machine to explore the
patterns of words or collocations occurring in texts or sections labelled with a wide range of metadata, and as
they occur with other words and collocations in different text categories. Finally, the Corpus Info. Tab provides
information about the currently viewed corpus and its division into text categories.
Research Questions
This paper follows a user evaluation and reports on attitudes of language learners who used the software in a
language learning activity. The following research questions are considered:
1. Can the students find examples which they consider helpful?
2. Which kinds of information do they look at most? How many results do they look at?
3. Which of the search support features are used most frequently?
4. How do they feel about the software? Would they want to use it in the future?
Methodology
Participants
Volunteers from an English-medium university in Eastern China were invited to participate in the project
through short announcements before lectures and through the student email system. None of the students were
currently studying modules taught by the researcher. Three sessions were scheduled for the same day, and these
face-to-face sessions took place on a Saturday to avoid any conflict with class teaching. Students were able to
indicate a preferred slot through the university’s virtual learning environment (VLE) system (Moodle version 1.9),
and an information sheet was also provided for them to review before the first session.
Materials
The materials for the evaluation included two questionnaires, a set of instructions demonstrating various aspects
of the software, a brief user manual for the software and a set of essay question prompts. The first questionnaire
included demographic questions as well as questions relating to the students’ own views on their use of a range
of language learning reference tools such as dictionaries, electronic dictionaries and search engines, etc.
Therefore, prior to using the new software, participants were presented with a broad range of relevant study
resources available as choices in the early part of the first questionnaire, and for the questions relating to student
habits and their attitudes regarding the best resource for several specific language learning issues, the option of
concordance lines was not in any way foregrounded. The first questionnaire also included questions about peer
review and more general attitudes towards language study.
The second questionnaire explicitly picked up on one of the questions from the first questionnaire and
asked students whether their view of the importance of examples had changed as a result of taking part in the
project. There were also questions about how much they used several of the main features of the software and
how useful they perceived them to be. There were also a range of questions designed to gather their views on
appropriate future uses of the software and any suggestions for improvements.
Both of these questionnaires were delivered electronically through the VLE. Examples of resources were
provided on a printed A3 sheet, so that students would not need to flip between screens. This sheet had examples of
dictionary entries, popular search engines or mobile phone apps and a picture of concordance lines.
Printed instructions were given to the participants, providing step-by-step guidance on the overall
procedure: answering the first questionnaire, downloading the software, working through the examples,
writing the essay, and performing the follow-up tasks later. In order to make the writing task relevant to students
from a wide range of university programmes, prompts were written on a range of topics related to contentious
but non-threatening issues which had been discussed in the news, following the style of popular language
proficiency examinations.
Procedure
Participants volunteering to take part in the project were required to attend a face-to-face session in one of the
university computer labs. At the beginning of each session, the information sheets and consent forms were
distributed and then students were invited to complete the first questionnaire on the VLE. After completing the
questionnaire, the students were free to start working through the instruction sheet, download the software and
look through the user manual. When the questionnaires had been completed, the researcher worked through all
the examples using a computer attached to a data projector. The participants were free to just watch or to try
using the software themselves. At the end of the presentation, blank lined sheets were distributed to students who
preferred writing essays by hand, while others loaded Microsoft Word and started to work on their essays on the
computers. The students were then given one hour to write their essays. During this time, they were free to
consult any other resources and to make use of the software. Formal examination conditions were not enforced.
Once students had submitted their essay to the researcher, they were free to leave. Within the next two
days, individual feedback on each essay was sent to each participant. The template used by the researcher for
this feedback included some comments based on each of the four criteria from the public band descriptors for
IELTS (www.ielts.org). The feedback also included three screen shots showing sets of concordance lines related
to three words or phrases used in the essay, as well as two Microsoft Excel spreadsheet attachments showing up to
100 more of the lines for these. A table of other single items or pairs of items to compare was also given. This
feedback was then sent to each participant and he or she was invited to complete the second questionnaire online
once he or she had reviewed the feedback, making use of the software again if he or she wished.
Four students participated in a pilot study several days before the main sessions took place, and a few minor
changes were made to the procedure, the wording of some items, and some small aspects of the software’s
operation.
Logs
For research into the use of corpus tools with language learners, Pérez-Paredes, Sanchez-Tornel et al. (2011)
argue that tracking of user actions through logs is essential in order to determine actual use rather than reported
use. The Prime Machine was designed to include the capability of collecting logs of various actions triggered by
mouse or keyboard movements during the evaluation.
Table 1 shows a summary of the kinds of actions which are logged. During formal evaluations where
participants have consented to the collection of this kind of data, logs are sent when the application is not busy
retrieving data from the server or when the application closes.
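As a sketch of this deferred delivery, the buffering logic might look something like the following. This is purely illustrative Python (The Prime Machine is not written in Python, and all names here are invented); it shows events queuing up and only being transmitted when the client flushes, either during an idle moment or at application close.

```python
import atexit
import json
import queue

class LogBuffer:
    """Queue user-action events and only ship them when the client is idle
    or when the application closes, so logging never delays retrievals."""

    def __init__(self, send):
        self.send = send              # callable that transmits one JSON batch
        self.pending = queue.Queue()
        atexit.register(self.flush)   # flush remaining events on application close

    def log(self, category, action, detail=None):
        self.pending.put({"category": category, "action": action, "detail": detail})

    def flush(self):
        """Called whenever the application is not busy retrieving data."""
        batch = []
        while not self.pending.empty():
            batch.append(self.pending.get())
        if batch:
            self.send(json.dumps(batch))
```

The point of the design is that a slow or unavailable logging server never blocks a concordance retrieval: events simply accumulate until the next quiet moment.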
Table 1
User Actions Which Can Be Automatically Logged by the Software
Action Category | Examples | Details logged
Search Support | Auto-complete for single words; auto-complete for collocations; suggestions for words with similar meanings; spelling support request; request for a word or collocation to be checked in other corpora; alternative corpus selected after other corpora have been checked for a word or collocation not found in the current corpus; use of other navigation buttons (“Back”, “Forward”, “Home” or “Swap”). | Words / collocation clicked
Query Blocked | Rules for query syntax not followed; too few or too many words entered in a single query; word or collocation not found in the currently selected corpus; combination of words not found in the currently selected corpus. | Search string
Query | Single search; compare mode search for two different queries; compare mode search for two different corpora; requests for more lines or collocation data. | Search string
Tab | Cards Tab; Lines Tab; Collocations Tab; Graphs Tab; Tags Tab; Associates Tab. | Number of seconds viewed
Other | A variety of other actions, including the use of filters, access to help screens, changes to options, changes of the main corpus and use of various visual elements including the “Priming Dock”. | Details such as the number of lines/cards viewed
As can be seen, a range of categories have been created, allowing the grouping of log data in terms of
search support features, actual queries, viewing of results and other features such as changes to options and
access to help.
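The grouping described above can be illustrated with a few lines of Python. The records below are invented for the sketch; the category names follow Table 1, but the field names and schema are assumptions, not the software’s actual log format.

```python
from collections import defaultdict

# Hypothetical log records; field names are illustrative, not the real schema.
logs = [
    {"user": "s01", "category": "Search Support", "action": "auto-complete word"},
    {"user": "s01", "category": "Query", "action": "single search"},
    {"user": "s02", "category": "Query Blocked", "action": "spelling error"},
    {"user": "s02", "category": "Tab", "action": "Lines Tab"},
    {"user": "s03", "category": "Tab", "action": "Cards Tab"},
]

def summarise(records):
    """Group raw events by action category, counting events and distinct users."""
    events = defaultdict(int)
    users = defaultdict(set)
    for rec in records:
        events[rec["category"]] += 1
        users[rec["category"]].add(rec["user"])
    return {cat: {"events": n, "users": len(users[cat])} for cat, n in events.items()}

summary = summarise(logs)
# e.g. summary["Tab"] == {"events": 2, "users": 2}
```

Aggregating per category in this way is what allows the Findings section below to report event counts and distinct-user counts side by side.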
Findings
A total of 25 students attended one of the face-to-face sessions, completing the questionnaire and submitting an
essay. All 25 participants were Chinese and came from Mainland China. The vast majority of the participants
were female, with just 3 male participants. In terms of the academic programmes from which the students came,
the most common was Financial Mathematics with 14 students, and this was followed by English and Finance (5
students), and 3 from engineering or computer science programmes, 2 from Chemistry and 1 from Economics.
The ages of the participants ranged from 18 to 22, with 3 students from Year 1, 7 students from Year 2 and 15
students from Year 3. Given the programmes represented, the gender balance and the home provinces of the
participants broadly reflected the whole student population from which they were drawn. The participants
reported that they had studied English for between 7 and 15 years, with 19 out of 25 students having studied
English for 10 years or more.
Following the demographic questions, the first set of questions in the questionnaire was related to the
students’ reported use of reference tools to help them with their English. As can be seen from Figure 3, by far
the most popular choice was mobile phone dictionary apps, with 21 students claiming to use these very often,
and 3 students selecting 4 out of 5 for this item. Just one student reported a lower score (2/5) tending towards
never. Interestingly, this student was the same student who indicated very often for concordance lines and one of
the four students who indicated 5/5 for English-English dictionary with Chinese translations. Following mobile
phone or electronic dictionaries, the next most popular choice was search engines. It is also clear that paper
dictionaries are disfavoured, and electronic means through mobile phone apps or search engines are clearly
favoured. As expected, the other clear finding was that for the majority of students concordance lines are not at
all regularly used, with 72% of respondents claiming never to use them at all, and a further 20% choosing the
second lowest rating. Three of the 5 students who chose 2/5 for concordance lines did not rate any of the
resources below 2. The student who rated concordance lines 3/5, also selected neutral scores for half of the
resources and did not select 1 or 5 for anything.
The next set of questions was related to which resource listed on the handout students thought would be
the most useful for five specific kinds of language problems. Figure 4 shows the number of students who selected
each of these.
Figure 4. Judgements Given by Participants on the Best Resource for a Variety of Language Issues.
It is clear that mobile phone or electronic dictionaries were perceived to be the best choice for spelling and
meaning, while English-English dictionaries were considered best to check prepositions, collocations or to find
examples. Interestingly, search engines were not considered the best choice by any students when checking the
meaning of words and were less popular than all three paper dictionary types and mobile phone dictionaries as a
source for examples. The only three areas where search engines were considered the first choice by 16% or more
of the students were for spelling (24%), prepositions (20%) and collocations (16%). This would suggest that
search engines are used for language purposes by the students to check spelling and co-text rather than to provide
information about meaning or examples.
Again, it is evident that concordance lines were not considered the best resource for any of these problems
by the vast majority of students. There was also an interesting mismatch between the answers to the previous
question about reported frequency of use and the resources which were considered most useful. Only three
students chose concordance lines for any of the problems, and all three of these students had reported actual use
of concordance lines as being 1 (never) or 2. The student who had rated concordance line usage so highly in the
earlier question chose the option for “Chinese-English or English-Chinese dictionary” and the option for
“Mobile phone or electronic dictionary” for all of the problems. This suggests that the student who had reported
using concordance lines very frequently was perhaps using them for other work or considered them to be a
supplementary resource rather than a key one.
Another obvious conclusion which can be drawn from these data is that the majority of students (16
out of 25) consider translation dictionaries or mobile phone and electronic dictionaries to be suitable resources to
check meanings. The wording of this question was “Checking a word which has several different meanings” and
it is surprising that students place confidence in dictionaries which often only have a limited range of translations.
As explained earlier, after submitting the essay, students left the first session and were sent individual
feedback within the next two days. They were then invited to complete the second questionnaire. Although 25
students took part in the face-to-face session, two students did not complete the second questionnaire.
Figure 5. Reported Frequency of Use During Different Stages of the Writing Task.
Average ratings were 2.57 for planning, 3.87 for writing, 3.74 for checking or editing before submission and
3.74 for reviewing feedback from the teacher. The similar average scores for Writing, Checking/Editing and
Reviewing (Wilcoxon Signed Ranks Tests: Checking/Editing-Writing z=-.408, p=.683, effect size r=-.060;
Reviewing-Writing z=-.408, p=.683, effect size r=-.060; Reviewing-Checking/Editing z=-.037, p=.971, effect
size r=-.005) mask individual differences, however, as different students reported use of the software at different
levels. Only three students rated these three areas equally.
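The effect sizes reported here are consistent with the common convention r = z/√N, where z comes from the normal approximation to the Wilcoxon signed-rank statistic and N is the total number of observations across both conditions (here 2 × 23 = 46). A minimal sketch of that calculation, using invented ratings rather than the study’s data and applying tie-averaged ranks but no continuity correction, might look like this:

```python
import math

def rank_abs(diffs):
    """1-based ranks of |d|, with tied values sharing the mean of their positions."""
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def wilcoxon_z_and_r(x, y):
    """z from the normal approximation to the Wilcoxon signed-rank test on
    paired ratings x, y, plus effect size r = z / sqrt(N) with N = 2 * len(x)."""
    d = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(d)
    ranks = rank_abs(d)
    w_pos = sum(r for r, diff in zip(ranks, d) if diff > 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_pos - mu) / sigma
    return z, z / math.sqrt(2 * len(x))
```

In practice a statistics package would be used for the p-value; the sketch is only meant to make the relationship between z, N and r explicit.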
However, it is hard to find evidence of actual use of the software in the logs, which suggests that students
were either exaggerating their use of the software or reporting attitudes rather than actual use. The strength of
the results is somewhat weakened if the question is interpreted as being representative of attitudes, but the varied
results do suggest that different students feel that the software would be useful for different stages of the writing
process.
Figure 6. Evaluation of the Usefulness of Some of the Main Features of the Software
From the graph in Figure 6, it is clear that students rated both the cards and lines tabs quite positively, with
approximately 74% of those who answered the second questionnaire choosing Useful or Very Useful. It is worth
noting that although the Cards Tab seems more mixed with 2 students reporting it was not very useful, 6 of the
23 students (26%) rated the Cards Tab above the Lines Tab. Having both ways of viewing the data may cater
for different learner preferences and different uses.
The Graphs Tab received the least positive feedback, with a much lower average rating (Wilcoxon Signed Ranks
Tests: Graphs-Cards z=-3.456, p=.001, effect size r=-.510; Graphs-Lines z=-3.337, p=.001, effect size r=-.492;
Collocations-Graphs z=-3.072, p=.002, effect size r=-.453). However, it is worth noting that 6 out of the 23
students (26%) rated it as very useful or useful. The student who rated the Graphs Tab as “Very useful” had
lower ratings for all the other features except the Cards Tab.
Feedback on the Collocations Tab was generally very positive. The student who rated the Collocations Tab at 2 also
rated the Cards Tab and Graphs Tab as 2, but rated the Lines Tab as 4 (useful). Clearly, this student preferred
looking at the information in the KWIC view, but from the logs it seems that he or she did not view the tables for
collocations.
By far the most striking result from Figure 6, however, is that being able to compare results side-by-side was
rated very highly indeed.
The results of the questionnaire questions related to the frequency of use during different stages of the task
and the students’ evaluation of the usefulness of some of the main features provide evidence that the first
research question has been positively answered: the learners reported that they could find examples which they
considered to be helpful.
Table 2
Logs Showing the Number of Views and Time Spent on Different Tabs in the Software
Tab | Number of views | Total time (s) | Average number of seconds
Cards Tab | 160 | 6485 | 40.5
Lines Tab | 113 | 9328 | 82.5
Graphs Tab | 53 | 2479 | 46.8
Collocations Tab | 70 | 4325 | 61.8
Tags Tab | 35 | 813 | 23.2
Associates Tab | 48 | 6615 | 137.8
Table 2 shows that the logs seem to support the views regarding the usefulness of different tabs, with Cards and
Lines having much higher event counts and generally more time being spent on Cards, Lines and Collocations.
When looking at these figures, however, it is worth noting that the Cards Tab was set as the default results tab for
all users, so this will have received a log for every search which was completed. However, looking at the number
of cards viewed for each event, the logs show that an average of 15.1 cards were viewed with a range between 1
and 65. Only 17 out of the 160 events had fewer than 10 cards marked as having been viewed. Since only a few
cards are visible unless the user scrolls down, this seems to confirm that some users viewed quite a few results on
the Cards Tab.
It is worth bearing in mind, however, that the vast majority of the events were from the sessions on
Saturday, and the time in Table 2 should be treated with caution since it is likely that students may have left a tab
visible when stopping to listen to another part of the demonstration. The times are calculated for the whole time
that the application is “active” (in the sense of being the window with the current focus), so this kind of data is
more reliable when students are completing a task in another window rather than switching attention to a data
projector during a demonstration or working on a paper-based activity.
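A sketch of this kind of focus-sensitive timing is shown below. The implementation details of The Prime Machine’s logging are not given here, so this is an invented illustration of the principle: time accrues against the visible tab only while the application window is active.

```python
class TabTimer:
    """Accumulate viewing seconds per tab, but only while the application
    window is active (has focus). All names are invented for illustration."""

    def __init__(self):
        self.totals = {}       # tab name -> accumulated seconds
        self.current = None    # tab currently visible
        self.focused = False   # does the application window have focus?
        self.since = 0.0       # timestamp when the current state began

    def _flush(self, now):
        # Credit the elapsed interval to the visible tab, if the window was active.
        if self.focused and self.current is not None:
            self.totals[self.current] = self.totals.get(self.current, 0.0) + (now - self.since)
        self.since = now

    def switch_tab(self, tab, now):
        self._flush(now)
        self.current = tab

    def focus(self, now):
        self._flush(now)
        self.focused = True

    def blur(self, now):
        self._flush(now)
        self.focused = False
```

Under this scheme, switching attention away from the application (a blur event) stops the clock, which is why the Saturday figures, gathered while students were also watching a projected demonstration, are less reliable than later, single-window use.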
From the logs, only 4 students seem to have made use of the software after Saturday, and figures for use
across different tabs for later use are shown in Table 3.
Table 3
Number of Views, Time Spent and the Number of Different Users for the Results Tabs After the Main Input Session.
Tab | Number of views | Total time (s) | Users
Cards Tab | 10 | 186 | 4
Lines Tab | 9 | 1679 | 4
Graphs Tab | 4 | 91 | 3
Collocations Tab | 7 | 74 | 4
Tags Tab | 4 | 22 | 2
Associates Tab | 3 | 26 | 2
Again, it is clear that most time was spent on the Lines Tab. Although figures for the Graphs Tab may
seem a little disappointing, it is worth noting that there were a total of 188 clicks on the priming icons on the
dock, and 18 users made use of this feature to switch to the Graphs Tab.
In terms of use of the ability to compare results side by side, the logs show that a fair proportion of
searches were made like this. Of the 281 logs from 22 users, 56% of searches were for one term only, while 44%
were made in compare mode. Three users did not appear to make any queries. Using the logs for the right-
hand retrieval only, 85% of the compare mode searches were to compare different queries across the same
corpus, while 15% were comparing the same query across two different corpora.
The summary of the log data which has been provided here addresses the second research question, which
was concerned with the kinds of information viewed and the number of results. It is clear that overall the students
spent most time on the Lines Tab, followed by the Cards and Collocations tabs. The logs also showed some
engagement of the students with the different kinds of information and the number of results, measured by the
number and range of events logged and the number of concordance cards viewed.
Support Features
As well as being able to compare results easily, another set of important design features were related to search
support. The third research question was to ascertain which of these search support features would be used most
frequently. A total of 54 queries from 16 users were logged as having been blocked by the software. Six of these
were related to spelling errors, and 1 was because a Chinese word had been entered. Nine blocked queries contained
collocations where the incorrect format had been given (lack of spaces or additional full stops, etc.), and 20
blocked queries were because the phrase was not stored as a collocation in the system. Four queries were blocked
because it seems nothing had been entered in the search box. A further 14 queries were blocked but information
is not provided in the logs.
As well as preventing users from making queries only to discover that no results are found, the
software also included other features such as auto-complete, collocation suggestions, synonyms, other word forms
and spelling support. From the logs, auto-complete for words was used 12 times, and 9 of these were for words
or word forms which did not form part of the demonstration. Collocations were selected from the drop-down
box 9 times, 8 of which were for collocations not part of the demonstration. Spelling support was requested 5
times, but from the logs it does not seem to be the case that the student made a subsequent search using the
correct spelling. This suggests that either the spelling component was too slow or did not provide useful
suggestions, or perhaps that students were trying it out rather than actually wanting to use it to assist with their
spelling.
A quarter of all the search queries in the concordancer were made for words or word forms not part of
the worksheet, and these were made by 13 different users. In the second questionnaire, students were asked to
report on whether or not they had looked up words or phrases not connected with their task. Eleven students
reported that they had; 7 said that the search was useful and 4 said it was interesting/fun, including one student who
chose both useful and interesting/fun. Just 1 student said that this was a waste of time, but it is worth noting that
overall this student was highly positive in his/her responses to the questions about the usefulness of each tab,
having rated everything 5/5 except the Graphs Tab which was still rated positively at 4/5. These results might
suggest that overall the software is likely to have potential for the kind of serendipitous learning which has been
reported in DDL and “discovery learning” activities (e.g. Bernardini, 2004).
These results provide an answer to the third research question, demonstrating that the most frequently
used search support features seem to be those which can be found on the main search screen such as the spelling
support and the auto-complete features for words and collocations.
to think differently and get some information”, and another mentioned that corpus examples were useful because
students have little opportunity to see how native speakers express themselves.
The second question in this group related to the importance of understanding collocations. All 23 students
responded positively to this question. In the comments, 9 students mentioned the need for this kind of
information to avoid making errors or to improve accuracy, and 8 students mentioned the importance of
knowing how to use words.
The last question in this group asked students whether the software tool was useful. Out of the twenty-three
students, all but one responded positively. The student who selected “no” was one of the two students who
used the software most after the Saturday session. However, the actual comment made by this student is still
positive about the software’s usefulness; as is clear from the full response, his/her reservation is due to his/her
belief that other software packages may be able to provide similar information in a more convenient way:
“It has many many tools and looks useful, but some important usage can be replace[sic] by other APP.”
Overall, it seems that the software was received very positively, especially considering that from the results
of the first questionnaire it is very clear that very few students had used concordancers before. All but one of the
students responded positively to the question about the usefulness of the software, and even the student who
responded negatively did so in a highly positive manner. As explained earlier, two students chose not to complete
the second questionnaire and their reasons for dropping out are not known. Neither student withdrew formally
from the project, and it is likely that other pressures such as coursework deadlines and mounting pressure for the
final examinations may have influenced their choice not to complete the second questionnaire. Nevertheless,
even if the non-participation of these students is interpreted as being lukewarm or negative towards the
usefulness of the software overall, the proportion of positive responses as a total of all 25 participants is still 88%.
Students who completed the second questionnaire gave a variety of reasons why they thought it was useful, with
4 mentioning being able to compare or see differences between words. Two mentioned the resources specically.
One student simply stated “It help [sic] students like a teacher”. Another student demonstrated a good
understanding of how different resources will be suitable for different occasions:
“This software may not be my first choice when I look up a word, because [an] electronic dictionary is much more
convenient. However, [the] function of the software is complete and I would like to use it as the complement of my first
choice.”
One other student mentioned that it was not so “convenient” to use; however, 4 other students commented
favourably on the “convenience” of the software. Another student focused specically on the way in which the
software can help students discover semantic associations of words, writing:
“I think, it can tell us whether a word is positive or negative. This is interesting and useful!”
Other comments included positive evaluations of the software in terms of helping students to learn
effectively (1), the amount of detail (3), and its potential in helping with academic writing (3). One student also
said that it was useful for students from different “levels”.
The positive response is also evident in all of the responses to the question “In future, do you think you
would like to use software like this again?” 10 out of 23 students chose “Yes, definitely”, and the remaining 13
chose “probably”. None of the students chose “Not sure”, “Probably not”, or “Definitely not”. When asked to
select from three situations when the software should be used, 7 chose “In class with a teacher”, 16 chose “In
class for pairwork activities”, and 14 chose “Outside class independently”. Given that almost 70% of the
students thought the software was suitable for pairwork, and 2 of these students had reported in the first
questionnaire that they did not think peer activities were useful, it seems that the software may have potential as a
teaching tool to enhance pairwork tasks.
The positive responses to the questions about corpus examples, collocation information and the software
itself, coupled with these highly positive responses to questions about possible future uses of the software go some
way to addressing the fourth research question. However, one factor which needs to be considered in relation to
these largely positive responses is that in China there is a cultural desire to please. It is hoped that the influence
of this on the questionnaire responses was reduced through the precaution of not revealing who had created the
software until the debrief message was sent. Nevertheless, the results should be considered in the light of these
cultural influences.
Discussion
Taking these results as a starting point for the evaluation of The Prime Machine, this section will return to the six
qualities of CALL software drawn from Chapelle (2001).
The first quality, “Language Learning Potential”, when applied to this project might include a judgemental
analysis of the level of interactivity and the suitability of the range of target forms the software can provide. It
would seem fair to award the software highly in this area since its very design encourages students to look up
vocabulary themselves and to interact with the different tabs of data which are presented, and it also supports a
wide range of comparisons between words and collocations or between corpora. It is also clear that the software
has great potential for providing students access to a very wide range of target forms, both in terms of the level
of analysis from individual word types, to similar words and collocations, and in terms of the range of text types
from different disciplines and genres which are contained in the corpora which have so far been used. The
question of whether target forms are acquired and retained, as has been mentioned above, is still one which
needs to be explored, but the responses to the second questionnaire as presented here suggest that students were
able to identify the importance of the software in supporting language use and accuracy and as a means of
obtaining information about language.
In terms of the second quality, “Learner fit”, the software would also seem to stand up very well. As a tool
for exploring words and phrases the software provides a great amount of control. The questionnaire responses
indicating how students viewed exploration of words or phrases not directly related to their essay writing also
provides evidence that the software has potential for incidental or less directed learning. To facilitate autonomy
and unsupervised exploration, one of the main aims for the design of the software was to provide more adequate
support, hints and guidance to learners, as compared with other leading concordancers. Within the context of
higher education, the software seems to have been very well received by students of different levels. The
evidence from the questionnaire on how students reported using the software, the variation in their preference for
different tabs of information and also the different views on how it could be used in future suggest that it might
cater well for different learners with different learning styles. Since students were overwhelmingly positive, but
positive about different aspects, it could be claimed that there is some empirical evidence that the software has
succeeded in this respect. Based on the positive responses from students, it would seem that the innovations in
the design of The Prime Machine alleviate some of the difficulties reported in previous DDL studies using other
software. The difficulties or frustration in formulating and performing search queries which were observed in
previous studies (Gabel, 2001; Sun, 2003; Yeh, et al., 2007) may have been alleviated by the search support
features. The availability of the Card view and being able to compare results side-by-side, could also explain why
there appeared to be fewer of the kinds of difficulties related to time or the presentation of results reported in
other studies (Luo & Liao, 2015; Thurstun, 1996). However, clearly longer-term attitudes and measurements of
change in performance over time would need to be considered. Nevertheless, designers of other concordancing
interfaces could consider adding features like these if they wish to make their software more learner-friendly.
A focus on meaning also seems to be evident both from a judgement of the software and task and from
empirical evidence in the form of questionnaire responses. The high rating of the compare feature suggests that
students were interested in understanding how different words were used. The reported use of the software as
part of a writing task also provides some evidence that students could see how the software could be used to help
communicate their meaning effectively in writing, although as was mentioned earlier the logs suggest that these
attitudes were probably based on their ideas about how the software could be used, rather than based on their
actual experience using the software. Clearly, a longer study with log data matching reported views would be
desirable.
In terms of “authenticity”, the task design was highly relevant given the number of students who go on to
take language tests such as IELTS as well as tests for their EAP modules, but it lacked the authenticity of being
actually part of the degree programme itself. However, the learners clearly demonstrated a belief that the
software would be useful in the classroom or for self-study, and the overwhelmingly positive indication that they
would definitely or at least probably want to make use of the software again in the future is good evidence that
the software has to some extent met its aims as being a tool suitable for classroom or home use.
The “impact” of the software could be measured in terms of the comprehensiveness of feedback and
software logs. While the log data was a little disappointing in terms of quantity, the evaluation has demonstrated
that the level of detail which can be provided about different actions made by users of this system does have
great potential. It is certainly clear that students rated the experience of using the software as a positive
experience and in this respect the evaluation so far has been highly successful. The limited evidence of actual
use of the software, especially after the main face-to-face part of the evaluation, points to a need for further
research in order to ensure that the positive impact in terms of the perceptions of the students would also follow
through to a positive impact on longer-term use. One of the main limitations of the evaluation in terms of its
face validity was that although the participants were completing a writing task suited to their learning context,
the essay was not part of their formal studies and was administered towards the end of the semester when other
pressures such as assessed coursework and upcoming exams may have meant they were less inclined to put the
usual amount of care and attention into it. In order to encourage greater use of the software so that attitudes
would be based on more direct and prolonged exposure to the interface and results, participants could be given
opportunities to access it over a longer period. The software needs to be made available so students can access it
as and when they encounter language learning needs. Even in a shorter term study, if permission could be
gained for students to bring with them early drafts of assignments or materials from their classes, participants
would be much more likely to look up more words and phrases than when writing for an additional essay which
may not have any long- or short-term benefits beyond general improvement of their language abilities.
Regarding the use of concordancers by language learners, the results of the first questionnaire were
consistent with Timmis (2003) in that the participants had had very little prior exposure to direct use of
concordancing software. Given the learning background of learners in China, it would be unrealistic to expect a
sudden shift in their understanding of effective language learning processes, but the highly positive response to
the software suggests that providing students with a new way of looking at language can be very effective,
especially when supported by the kind of evidence which The Prime Machine can readily provide. Of course, a
very important consideration with any kind of teaching software is whether or not teachers will be interested and
willing to make use of it and to recommend it to their students. The design of the software was made by
drawing on my own extensive experience as a language teacher and as a manager of language teachers.
However, the importance of getting teacher input on software design (Krishnamurthy & Kosem, 2007) and
responding to teachers’ fears (Scott, 2008) should not be overlooked. Clearly, further exploration of the
perceptions of teachers and input from them will be a key to making The Prime Machine a well-used tool as well as
a useful tool for language learning.
The last quality is that of “practicality”. The fact that the evaluation ran smoothly with a single server
which was actually a desktop machine purchased in 2011 and was located outside the university local area
network suggests that the minimum requirements are reasonable. The Prime Machine has now been running on a
central server at the university for more than a year, and in the near future, this server will be accessible from
outside its host institution. For further details see www.theprimemachine.com.
Conclusion
This paper has focussed on one aspect of the evaluation of The Prime Machine. It has considered the results of the
small scale evaluation which took place over a short period of a few days, and it has also considered the scope of
this evaluation within a wider framework. Despite being somewhat limited in size and duration, the
questionnaire-based study has provided interesting insights into the acceptability of the software, its face validity
and student attitudes before and after use, and has also identified some concrete areas for future development.
While much ground remains for detailed evaluation of the software as a learning and teaching tool, drawing on
frameworks from Computer Assisted Language Learning, this initial evaluation has provided confidence
that the project meets its overall aims. While there is also much scope for detailed evaluation of
specific features and mark-up processes, as well as opportunities for performance enhancement of the computer
processes behind the software, the participants’ enthusiasm suggests that the software is providing some
meaningful data and provides at least face validity for the hidden processes.
Through this small evaluation involving undergraduate students, the software has been shown to have
considerable potential as a tool for the writing process. Since this evaluation was carried out, The Prime Machine
has been developed further and now includes additional tools for exploring vocabulary in terms of semantic tags and
other features. As it continues to be developed, it is believed that The Prime Machine will be a very useful corpus
tool which, while simple to operate, provides a wealth of information for English language teaching and self-
tutoring.
Notes
1. For a fuller explanation of the way these features work, for more details about the other features of the software
and for the pedagogical reasons behind the design, see Jeaco (2015) and Jeaco (2017).
2. At the time of the evaluation, the label on this tab was "Primings Tab", and the questionnaire asked
respondents to comment on it using this name. However, the label was subsequently changed to "Graphs Tab"
as this better matches the purpose and scope of the tab.
References
Anthony, L. (2004). AntConc: A learner and classroom friendly, multi-platform corpus analysis toolkit. Paper
presented at the Interactive Workshop on Language e-Learning, Waseda University, Tokyo.
Bernardini, S. (2004). Corpora in the classroom: An overview and some reflections on future developments. In J.
M. Sinclair (Ed.), How to Use Corpora in Language Teaching (pp. 15-36). Amsterdam: John Benjamins.
BNC. (2007). The British National Corpus (Version 3 BNC XML ed.): Oxford University Computing Services
on behalf of the BNC Consortium. URL: http://www.natcorp.ox.ac.uk/.
Bolitho, R., Carter, R., Hughes, R., Ivanič, R., Masuhara, H., & Tomlinson, B. (2003). Ten questions about
Language Awareness. ELT Journal, 57(3), 251-259.
Chapelle, C. (2001). Computer Applications in Second Language Acquisition: Foundations for Teaching, Testing and Research.
Cambridge: Cambridge University Press.
Cobb, T. (1999). Giving learners something to do with concordance output. Paper presented at the ITMELT '99
Conference, Hong Kong.
Coniam, D. (1997). A practical introduction to corpora in a teacher training language awareness programme.
Language Awareness, 6(4), 199-207.
Doughty, C. (1991). Second Language Instruction Does Make a Difference. Studies in Second Language Acquisition,
13(4), 431.
Firth, J. R. ([1951]1957). A synopsis of linguistic theory, 1930-1955. In F. R. Palmer (Ed.), Selected Papers of J R
Firth 1952-59 (pp. 168-205). London: Longman.
Gabel, S. (2001). Over-indulgence and under-representation in interlanguage: Reflections on the utilization of
concordancers in self-directed foreign language learning. Computer Assisted Language Learning, 14(3-4), 269-
288.
Hindawi. (2013). Hindawi's open access full-text corpus for text mining research. Retrieved 6 November, 2013,
from http://www.hindawi.com/corpus/
Hoey, M. (2005). Lexical Priming: A New Theory of Words and Language. London: Routledge.
Horst, M., Cobb, T., & Nicolae, I. (2005). Expanding academic vocabulary with an interactive on-line database.
Language Learning & Technology, 9(2), 90-110.
Hunston, S. (2002). Corpora in Applied Linguistics. Cambridge: Cambridge University Press.
Jeaco, S. (2015). The Prime Machine: a user-friendly corpus tool for English language teaching and self-tutoring
based on the Lexical Priming theory of language. Unpublished Ph.D. dissertation, University of
Liverpool. Retrieved from https://livrepository.liverpool.ac.uk/2014579/
Jeaco, S. (2017). Concordancing lexical primings. In M. Pace-Sigge & K. J. Patterson (Eds.), Lexical Priming:
Applications and advances (pp. 273-296). Amsterdam: John Benjamins.
Johns, T. (1991). Should you be persuaded: Two samples of data-driven learning materials. In T. Johns & P. King
(Eds.), Classroom Concordancing (Vol. 4, pp. 1-13). Birmingham: Centre for English Language Studies,
University of Birmingham.
Johns, T. (2002). Data-driven Learning: The perpetual change. In B. Kettemann, G. Marko & T. McEnery (Eds.),
Teaching and Learning by Doing Corpus Analysis (pp. 107-117). Amsterdam: Rodopi.
Kennedy, C., & Miceli, T. (2010). Corpus-assisted creative writing: Introducing intermediate Italian learners to a
corpus as a reference resource. Language Learning & Technology, 14(1), 28-44.
Kennedy, G. D. (1998). An Introduction to Corpus Linguistics. London: Longman.
Kettemann, B. (1995). On the use of concordancing in ELT. TELL&CALL, 4, 4-15.
Krashen, S. (1989). We acquire vocabulary and spelling by reading: additional evidence for the Input Hypothesis.
The Modern Language Journal, 73(iv), 440-464.
Krishnamurthy, R., & Kosem, I. (2007). Issues in creating a corpus for EAP pedagogy and research. Journal of
English for Academic Purposes, 6(4), 356-373.
Laufer, B., & Hulstijn, J. (2001). Incidental vocabulary acquisition in a second language: The construct of task-
induced involvement. Applied Linguistics, 22(1), 1-26.
Lee, D. Y. W. (2001). Genres, registers, text types, domains, and styles: Clarifying the concepts and navigating a
path through the BNC jungle. Language Learning and Technology, 5(3), 37-72.
Lee, J. H., Lee, H., & Sert, C. (2015). A corpus approach for autonomous teachers and learners: Implementing
an on-line concordancer on teachers’ laptops. Language Learning & Technology, 19(2), 1-15.
Luo, Q., & Liao, Y. (2015). Using Corpora for Error Correction in EFL Learners' Writing. Journal of Language
Teaching & Research, 6(6), 1333-1342.
Meyer, C. F. (2002). English Corpus Linguistics: An Introduction. Cambridge: Cambridge University Press.
Mills, J. (1994). Learner autonomy through the use of a concordancer. Paper presented at the Meeting of
EUROCALL, Karlsruhe, Germany.
Nation, I. S. P. (1995-6). Best practice in vocabulary teaching and learning. EA Journal, 3(2), 7-15.
Pérez-Paredes, P., Sanchez-Tornel, M., Alcaraz Calero, J. M., & Jimenez, P. A. (2011). Tracking learners' actual
uses of corpora: guided vs non-guided corpus consultation. Computer Assisted Language Learning, 24(3), 233-
253.
Schmidt, R. W. (1990). The role of consciousness in second language learning. Applied Linguistics, 11(2), 129-158.
Scott, M. (2008). Developing WordSmith. International Journal of English Studies, 8(1), 95-106.
Sinclair, J. M. (1991). Corpus, Concordance, Collocation. Oxford: Oxford University Press.
Stevens, V. (1991). Classroom concordancing: Vocabulary materials derived from relevant, authentic text. English
for Specific Purposes, 10(1), 35-46.
Sun, Y.-C. (2003). Learning process, strategies and web-based concordancers: a case study. British Journal of
Educational Technology, 34, 601-613.
Thurstun, J. (1996). Teaching the vocabulary of academic English via concordances. Paper presented at the
Annual Meeting of the Teachers of English to Speakers of Other Languages, Chicago.
Timmis, I. (2003). Corpora and Materials: Towards a Working Relationship. In B. Tomlinson (Ed.), Developing
materials for language teaching (pp. 461-474). London: Continuum.
Tomlinson, B. (1994). Pragmatic awareness activities. Language Awareness, 3(3-4), 119-129.
Tomlinson, B. (2008). Language acquisition and language learning materials. In B. Tomlinson (Ed.), English
Language Learning Materials: A Critical Review (pp. 3-13). London: Bloomsbury Publishing.
Tsui, A. B. M. (2004). What teachers have always wanted to know - and how corpora can help. In J. M. Sinclair
(Ed.), How to Use Corpora in Language Teaching (pp. 39-61). Amsterdam: John Benjamins.
Varley, S. (2009). I'll just look that up in the concordancer: integrating corpus consultation into the language
learning environment. Computer Assisted Language Learning, 22(2), 133-152.
www.ielts.org. IELTS | Researchers - Band descriptors, reporting and interpretation. Retrieved 16 January, 2014,
from http://www.ielts.org/researchers/score_processing_and_reporting.aspx
Yeh, Y., Liou, H.-C., & Li, Y.-H. (2007). Online synonym materials and concordancing for EFL college writing.
Computer Assisted Language Learning, 20(2), 131-152.
Yoon, H. (2008). More than a linguistic reference: The influence of corpus technology on L2 academic writing.
Language Learning & Technology, 12(2), 31-48.
Heidi Brumbaugh*
Simon Fraser University, Burnaby, BC, Canada
Trude Heift*
Simon Fraser University, Burnaby, BC, Canada
Abstract
This article describes a research study that determined the depth of vocabulary knowledge of 28 intermediate ESL learners.
The study was carried out with Bricklayer, a vocabulary assessment tool for L2 English which tested the ESL learners on 72
words. Two post-tests collected evidence for concurrent validity. A semantic distance test captured incremental knowledge
for 36 words, but Bricklayer’s predictive power for this partial knowledge was weak. A standard multiple-choice test of the
remaining 36 words showed that Bricklayer predicted 61% of known words and 69% of unknown words; results were better
for words which were strongly predicted to be known or unknown. These findings provide promising evidence that
Bricklayer’s assessment paradigm can assist in building up models of students’ knowledge and behaviour in CALL environments.
Keywords: Computer Assisted Language Learning, vocabulary assessment, vocabulary depth, meta-cognition,
self-assessment
Introduction
It may seem intuitive, even obvious, that language learners need to know words of the target language in order to
communicate effectively. Nonetheless, Zimmerman (1997) points out that despite vocabulary’s central role in
language, over the course of the history of language teaching, vocabulary has not been emphasized. A surge of
interest in vocabulary over the past decade has shifted this focus. Nation (2013), for instance, states that “over 30
per cent of the research on vocabulary that has appeared in the last 110 years was published in the past eleven
years” (p. 5). This new body of research informs strategies for incorporating vocabulary instruction in the
language classroom including computer-assisted language learning (CALL) contexts.
At its most basic level, vocabulary knowledge involves connecting the word form (written or spoken) with
its associated meaning. Vocabulary researchers, however, have recognized that word knowledge is complex, and
thus have tried to articulate a broader structure for vocabulary knowledge (Henriksen, 1999; Nation, 1990, 2001;
Richards, 1976). These frameworks capture the idea that word knowledge is multifaceted. In addition to
knowledge about a word’s meaning, word knowledge also includes such features as associative knowledge, form
production and recognition, morphology, collocations, etc.
*Tel: (1) 831 247 1379; Fax: (+1) 866 216-8918; E-mail: heidi@vocabsystems.com; 5733 Hollister Ave. Suite 7, Goleta, CA
93117 USA
**Tel: (1) 778 782 3369; E-mail: heift@sfu.ca; Robert C. Brown Hall Building, Room 9201, 8888 University Drive,
Burnaby, BC V5A 1S6 Canada
Apart from the multifaceted nature of word knowledge, lexical knowledge is also acquired incrementally.
In fact, the idea that a learner does not progress immediately from being unfamiliar with a word to having
complete knowledge of all its meanings and usages was observed as far back as the early part of the twentieth
century (Dolch, 1927). Durso and Shore (1991) characterize this intermediate level of knowledge as partially
known words, or so-called “frontier” words (see also Shore and Kempe, 1999). Durso and Shore’s studies show
that although learners denied that the word was part of their language knowledge, they nonetheless were able to
access some semantic content about the word.
An accurate assessment of the learner’s vocabulary knowledge and stage of acquisition is especially critical
for the L2 classroom because it informs and drives instructional strategies. CALL, in particular, is well suited for
this task. Consider, for instance, that a computer could track and keep a record of whether a particular word was
mostly known, mostly unknown, or a frontier word, and construct a model (i.e., a representation) of the learner’s
vocabulary knowledge accordingly.
Such a model is called a learner model or student model and is an integral part of a computerized intelligent
tutoring system (ITS). A learner model allows an ITS to deliver individualized content for each student by
considering each learner’s behaviour and performance and tailoring instruction to their individual needs (see
Heift & Schulze, 2003). For example, words which are mostly known by the learner would not need to be
targeted for direct instruction, whereas mostly unknown words could be targeted for instruction or initial
exposure. Unknown or partially known words in the text could be targeted for hyperlink glosses.
By identifying frontier words which are in the process of being assimilated into the mental lexicon, an ITS
could target such words for what Nation (2001) calls “rich instruction,” which “involves giving elaborate attention
to a word, going beyond the immediate demands of a particular context of occurrence” (p. 95).
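The learner-model idea sketched above can be illustrated with a minimal data structure. This is a hypothetical sketch of the concept, not code from any actual ITS; the words and state labels are invented for illustration:

```python
# Hypothetical learner model: each word is tracked as "known",
# "frontier" (partially known), or "unknown".
LEARNER_MODEL = {
    "basket": "known",
    "hotel": "frontier",
    "loquacious": "unknown",
}

def instruction_strategy(word, model):
    """Map a word's knowledge state to a teaching action."""
    state = model.get(word, "unknown")
    if state == "known":
        return "no direct instruction"
    if state == "frontier":
        return "rich instruction"         # elaborate attention (Nation, 2001)
    return "initial exposure or gloss"    # e.g. a hyperlink gloss in a text
```

A real learner model would of course update these states from observed behaviour and performance rather than hard-code them.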
The following section discusses the most common vocabulary assessment tools in language instruction and
evaluates the extent to which current assessment tools can capture multi-faceted and incremental word knowledge.
We also identify gaps in current vocabulary assessment techniques and introduce the CALL program Bricklayer
which presents a new paradigm for L2 vocabulary assessment. We then describe a study which we conducted
with 28 ESL learners to validate Bricklayer’s performance. After presenting the results of our study, we discuss
the merits of different types of vocabulary assessment tools and conclude with improvement suggestions for
Bricklayer.
Vocabulary Assessment
Vocabulary assessment tools can generally be classified into two main types: breadth tests and depth tests.
Breadth Tests
The goal of the breadth test is to measure a learner’s overall vocabulary size. Two widespread assessment tools of
this type are the Vocabulary Levels Test (VLT) (Nation, 1983, 1990; Schmitt et al., 2001) and the Vocabulary
Size Test (VST) (Nation & Beglar, 2007). These tests rely on sampling across different frequency bands or ranges
in order to generate a comprehensive vocabulary score. The tests provide strong examples of content validity, in
that a test is typically considered to be a sample of a particular domain (Messick, 1989).
Nonetheless, breadth tests are not designed to assess specific vocabulary items. For example, if the Levels
Test indicates that the student knows 400 of the words at the 3,000 frequency band, there is no way of telling
which 400 words are known and which 600 words are unknown. Furthermore, as Milton and Vassiliu (2000) point
out, “learners acquire their knowledge from course books and not from frequency lists” (p. 446). The authors
researched a small corpus of three first-year EFL course books for Greek students and found that the vocabulary
was thematic and idiosyncratic. In addition, vocabulary at the 2,000 word range was underrepresented and
vocabulary at the 3,000 word range was overrepresented, challenging the notion that vocabulary is acquired by
students in the order suggested by frequency lists. Neither the VLT nor the VST pinpoints specific gaps in
vocabulary.
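The band-sampling logic behind breadth tests can be sketched as follows. The band size, sample sizes, and scores here are illustrative assumptions, not the actual VLT or VST item design:

```python
# Illustrative band-sampling size estimate (not the actual VLT/VST scoring).
# Assume each frequency band holds 1,000 word families; the proportion of
# sampled items answered correctly is extrapolated to the whole band.
BAND_SIZE = 1000

def estimate_vocab_size(results):
    """results maps a band label to (correct, sampled); returns the
    estimated number of word families known across the sampled bands."""
    return sum(
        (correct / sampled) * BAND_SIZE
        for correct, sampled in results.values()
    )

# 10/10 at the 1,000 band, 8/10 at 2,000 and 4/10 at 3,000
# extrapolates to roughly 2,200 word families.
size = estimate_vocab_size({1000: (10, 10), 2000: (8, 10), 3000: (4, 10)})
```

As the critique above notes, such an estimate says nothing about which particular words within a band are known.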
Depth Tests
There are several vocabulary assessments designed to detect the learner’s depth of word knowledge, and they
differ in the types of lexical depth they measure: Webb (2005): different knowledge types; Schmitt (1998): four
different kinds of word knowledge; Meara (2009): word association knowledge; Schmitt and Meara (1997): depth
of word association as well as depth of knowledge for verbal suffixes; Nagy et al. (1985) and Collins-Thompson
and Callan (2007): precision of semantic meaning; Qian (2002): synonymy, polysemy, and collocational knowledge;
Schmitt (1998b) and Crossley et al. (2010): polysemy; and Laufer and Nation (2001): fluency.
One of the more ambitious assessment instruments is the Vocabulary Knowledge Scale (VKS) (Paribakht
& Wesche, 1997; Wesche & Paribakht, 1996), which aims to measure depth of different kinds of word knowledge
via varying levels of questions. In the VKS, first students indicate whether they have seen a word, then whether
the meaning is known; if known, they first produce the word meaning, and then a sentence. Scoring is based on
how much knowledge was indicated.
The advantage to the VKS is that gradations of understanding can be captured. The downside is that it is
very time-consuming to administer; furthermore, Laufer and Goldstein (2004) point out that it does not
necessarily measure what it purports to measure. Indeed, most of the assessment instruments mentioned above
are designed for research purposes. Some are very arduous to administer (Webb’s (2005) assessment, for example,
requires ten questions for each word), making them impractical for the L2 classroom.
After the board is full, the game goes into mini-quiz mode. At this point, starting from the top, one random
brick per row lights up and the player is given a multiple-choice quiz for that word. For instance, Figure 2
displays the quiz for the word fat, which the user placed on the top left brick. The user must take the quiz in
order to continue. If the player picks the correct definition for a word on a brick, the brick becomes solid. If the
incorrect definition is chosen, the brick is destroyed.
Continuing on, the player is then quizzed on a random word on each of the following rows. The trick to
Bricklayer is that if the player picks the wrong definition, not only is the brick that the word was on destroyed,
but the bricks above the quizzed brick are also destroyed. The metaphor used in the game is that each brick
needs to be supported by the bricks below it. For instance, in Figure 3, the player has incorrectly answered a quiz
question for the brick hotel. Therefore, the player lost not only that brick but the four bricks above it.
After the player has taken one quiz per row, the game ends, and the player gets points for all the bricks left
on the board. The game score is presented as the percentage of all bricks remaining.
Each Bricklayer game presents the learner with a list of words. If this list has a range of words such that
some are known, some are unknown, and some are partially known, then this application has the potential to
measure not simply whether or not the player knows a word, but how well the player knows the word, at least in
comparison to all words contained in the word bank. If a player places a word on the top row and then “loses”
the word by an incorrect guess, not much is lost. Therefore, a player may “risk” putting an unknown word on the
higher rows. However, maintaining solid bricks on the lowest rows is critical to success, since one wrong guess can
knock out many bricks above. Therefore, the strategy for success in the game is for players to put the words they
think they know the best at the bottom, words they know pretty well in the middle, and words they are less
certain about near the top. For the purposes of the research study described below, participants were explicitly
instructed in this strategy.
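The quiz and scoring mechanics described above can be sketched as a small simulation. This is our simplified reading of the rules, not the authors' implementation: we assume destruction propagates straight up the quizzed brick's column, and the grid geometry is invented:

```python
import random

def play_bricklayer(board, knows):
    """Simulate the mini-quiz phase. board is a grid of words (top row
    first); knows(word) says whether the learner answers correctly.
    Returns the game score: the percentage of bricks remaining."""
    rows, cols = len(board), len(board[0])
    alive = [[True] * cols for _ in range(rows)]
    for r in range(rows):                  # one quiz per row, top to bottom
        c = random.randrange(cols)         # a random brick in the row lights up
        if alive[r][c] and not knows(board[r][c]):
            for rr in range(r + 1):        # the brick and every brick above it fall
                alive[rr][c] = False
    remaining = sum(sum(row) for row in alive)
    return 100.0 * remaining / (rows * cols)
```

In this sketch a wrong answer on the bottom row wipes out a whole column, while a wrong answer on the top row costs a single brick, which is what makes placing the best-known words on the lowest rows the winning strategy.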
Research Questions
In order to assess the efficacy of Bricklayer, we conducted a study with 28 ESL learners who were tested on 72
words (Brumbaugh, 2015). For the purpose of this article, we report the results with regards to the following two
research questions:
RQ 1: Does the learner’s behavior in the Bricklayer game provide a way to accurately predict the learner’s
knowledge for that word?
RQ 2: Does the strength of this prediction provide a measurement for the learner’s depth of knowledge for
that word?
Methodology
Participants
Twenty-eight ESL learners participated in the study, which took place at a mid-sized Canadian university.1 Study
participants were recruited from the university’s English for Academic Purposes program, which is a remedial
ESL program for intermediate-level students seeking admission to the university. According to a background
questionnaire, the study participants were about evenly split by gender, ranged from 17 to 21 years old, and had
been studying English for an average of 7 years. The participants’ English language skills were from lower to
upper intermediate according to their self-reported IELTS test scores and student placement in the program. All
participants were native speakers of non-Indo-European languages: Chinese (20 participants), Vietnamese (5
participants), and Turkish (1 participant).
Materials
Aside from the ethics release form which was provided as hard copy, all remaining study and assessment
materials were presented to the participants sequentially on a web site. In addition to the background
information questionnaire, materials included an instructional video, the computer program Bricklayer with the
72 words chosen for the study, and two post-tests.2
The two post-tests assessed the learners’ word knowledge for each of the 72 test items. The items were
divided equally between two post-test categories: a standard multiple-choice test and a semantic distance test. As
discussed by Meara (1997) in the context of vocabulary acquisition, “[m]ultiple choice vocabulary tests, of the
sort typically used to assess incidental learning, may not be sensitive enough to pick up what is going on
[cumulative vocabulary acquisition]” (p. 119). For this reason, the semantic distance test was designed to measure
gradations of word knowledge.
All test questions were made up of the correct word’s definition and a set of distractors. Note that the
definition of a given distractor word was used instead of the distractor word itself, so that each distractor option was the
definition of an actual word. The definitions were all drawn from the Merriam-Webster Learner’s Dictionary.
The multiple-choice test contained three distractors which were not semantically related to the correct
answer. Table 1 provides an example of the answer and distractor set for the word basket, a word in the multiple-
choice test condition.
Table 1
Sample Multiple-choice Quiz

Target word: basket
Correct answer: a container usually made by weaving together long thin pieces of material
Distractor 1:   a covering for the hand that has separate parts for each finger
Distractor 2:   a strong building or group of buildings where soldiers live
Distractor 3:   a piece of cloth with a special design that is used as a symbol of a nation or group
The semantic distance test included distractors of varying semantic distance from the target word (Nagy et
al., 1985). The choices for a semantic distance item contained the correct answer (for which a full score of 2 is
given), two words with a strong semantic relationship to the target (for which a partial score of 1 is given), and two
unrelated words (for which a score of 0 is given).
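The scoring scheme can be stated compactly in code (a minimal sketch; the option labels and tags are hypothetical, not the study's materials):

```python
# Sketch of the semantic distance scoring scheme described above:
# the correct definition scores 2, a semantically related distractor 1,
# and an unrelated distractor 0. The answer key below is invented.

SEMANTIC_SCORE = {"correct": 2, "related": 1, "unrelated": 0}

# Hypothetical answer key for one item: each option is tagged by type
# (one correct definition, two related distractors, two unrelated ones).
item_key = {
    "A": "correct",
    "B": "related",
    "C": "related",
    "D": "unrelated",
    "E": "unrelated",
}

def score_response(choice, key=item_key):
    """Map a participant's selected option to its semantic distance score."""
    return SEMANTIC_SCORE[key[choice]]

print(score_response("B"))  # → 1 (partial knowledge)
```

Unlike a dichotomous multiple-choice item, this scale credits a learner who confuses the target with a near neighbour.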
Table 2 provides an example of the answer and distractor set for the word straw, a word in the semantic
distance test condition. Table 2 also shows the word on which each distractor definition was based,
although participants saw only the definitions. During Bricklayer gameplay, the same question/answer sets for
the words were used as the mini-quizzes.
Table 2
Sample Semantic Distance Quiz

Target word: straw
Correct answer:        the dry stems of wheat and other grain plants
Distractor 1 (corn):    the seeds of the corn plant eaten as a vegetable
Distractor 2 (tractor): a large vehicle that has two large back wheels and two smaller front wheels and that is used to pull farm equipment
Distractor 3 (tin):     a soft, shiny, bluish-white metal that has many different uses
Distractor 4 (sew):     to make or repair something (such as a piece of clothing) by using a needle and thread
Data Collection
A video tutorial provided instructions to orient the participants to the Bricklayer game. The video emphasized
the strategy for scoring well. Specifically, it showed the player that placing the words they know best on the
lowest row of the game board is the best strategy to minimize the risk of losing all the supported bricks due to a
missed quiz question. After two practice games, study participants then played 8 rounds of Bricklayer as part of
the research study. There were a total of 72 words, 18 on each board. Given that Bricklayer essentially forces
students to rank word knowledge, each word was presented in two different boards because its ranking may
depend on which other words are on the board. Finally, the participants took the two post-tests for all 72 words.
Findings
In order to examine whether the learner’s placement of each word predicted his or her knowledge for that word
(RQ 1), the results were modeled using two Rasch logistic regressions. First the multiple-choice test set was
modeled, and then the semantic distance test set.
In the Rasch model, independent variables are referred to as facets. In this way, the effect of each individual
item is calculated. The facets for the model reported here include all the scores that may have influenced the final
prediction for word knowledge. The most important facet is called the wordscore; this is based on the word’s final
position on the board as placed by the learner. Another facet is the gamescore which measures how well the learner
performed on that individual game. Which board was played (board) is included as a facet because the board
difficulty may influence the prediction. Finally, learner and word are included as facets for the model because, in
item response theory, word difficulty and learner ability each contribute a measure to the prediction (see
Brumbaugh, 2015 for a more detailed analysis of the individual scoring values used in the research study).
The dependent variable is the value of the post-test score. Half of the data, selected randomly, was used as
training data to assign weights to the facets. The other half was used for testing purposes to assess the validity of
the weights. All results here are from the test set. The resulting prediction for each observation is referred to as
the target score.
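The facet-based prediction can be illustrated with a plain logistic sketch (this is NOT the many-facet Rasch software used in the study; the facet names follow the text, but the data, effect sizes, and fitting routine are invented for illustration): each facet level gets one parameter, and the predicted probability that a word is known, the target score, is the logistic of the summed facet effects.

```python
import math
import random

# Illustrative logistic approximation of the facet model described in the
# text (not the Rasch software used in the study; all data are synthetic).
# P(word known) = sigmoid(learner_ability - word_difficulty + row_effect)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

random.seed(0)
n_learners, n_words, n_rows = 5, 8, 7   # hypothetical board with 7 rows

# Synthetic observations: (learner, word, row, known), where words placed
# on higher-confidence rows are generated as more likely to be known.
data = []
for learner in range(n_learners):
    for word in range(n_words):
        row = random.randrange(n_rows)          # final placement (wordscore)
        known = 1 if random.random() < sigmoid(0.5 * row - 1.5) else 0
        data.append((learner, word, row, known))

# One parameter per facet level, fitted by stochastic gradient ascent
# on the logistic log-likelihood.
ability = [0.0] * n_learners
difficulty = [0.0] * n_words
row_effect = [0.0] * n_rows

for _ in range(300):
    for learner, word, row, known in data:
        p = sigmoid(ability[learner] - difficulty[word] + row_effect[row])
        err = known - p                          # gradient of the log-likelihood
        ability[learner] += 0.05 * err
        difficulty[word] -= 0.05 * err
        row_effect[row] += 0.05 * err

# The "target score": the fitted probability that learner 0 knows word 0
# when it is placed on the highest-confidence row.
target = sigmoid(ability[0] - difficulty[0] + row_effect[n_rows - 1])
print(round(target, 3))
```

In the study itself the facets also included gamescore and board; the sketch only shows how facet effects combine into a probabilistic target score for each observation.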
The Rasch model provides goodness-of-fit results for individual facets and so was used to provide an
analysis of individual lexical items. These results are presented in the following section.
Table 3
Rasch Model Chi-Squared Statistics for Multiple-Choice Test Condition

Variable    Fixed chi-square [df]   Sig.    Random chi-square [df]   Sig.
Learner     71.3 [25]               <.01*   19.2 [24]                .74**
Board       4.3 [3]                 .23     1.8 [2]                  .41**
Word        140.2 [35]              <.01*   28.6 [34]                .73**
Wordscore   14.7 [6]                .02*    4.3 [5]                  .51**
Gamescore   33.7 [19]               .02*    12.0 [18]                .85**

* Fixed chi-square is significant <.05 and indicates the probability that items are equal on a rating scale.
** Random chi-square significance indicates the probability that these items could have been randomly sampled
from a normal population.
The random chi-square results identify the probability that the items could have been sampled from a
normal population. The highest probability is found with gamescore (χ2 (18) = 12.0, p = .85), followed by learner
(χ2 (24) = 19.2, p = .74), word (χ2 (34) = 28.6, p = .73), wordscore (χ2 (5) = 4.3, p = .51), and board (χ2 (2) = 1.8,
p = .41).
The Rasch model can also be evaluated by means of a confusion matrix, which gives the accuracy of the
model’s predictions in percentages. Table 4 organizes the observed scores (the multiple-choice post-test scores) in
rows and the model predictions in columns. Once again, the model was based on individual observations, with one prediction made for each observation.
Table 4
Confusion Matrix for Rasch Results of Multiple-Choice Test Condition

Observation   Predicted 0   Predicted 1   No prediction*   % Correct
0 (unknown)   351**         140           15               69.4%
1 (known)     167           261**         2                60.7%
Total         518           401           17               65.4%

* Note. There is no prediction for values of .5.
** Accurate predictions.
The first row shows the results for observed scores of 0, that is, cases in which an incorrect answer was
given on the post-test. Of these incorrect words, 351 were accurately predicted to be incorrect and 140 were
inaccurately predicted to be correct. There were 15 unknown words for which no prediction was made (see
below for a discussion), thus the incorrect results were accurately predicted 69.4% of the time. In the next row,
the words which were tested to be known are given. Of these, 167 words were inaccurately predicted to be
unknown and 261 were accurately predicted to be known. There were 2 words with no prediction. The accuracy
rate for known words was 60.7%. Overall, 518 words were predicted to be unknown, 401 were predicted to be
known, and 17 words had no prediction. The overall accuracy rate of the model is 65.4%.
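These accuracy rates can be reproduced directly from the cell counts in Table 4 (a small sketch; only the counts themselves come from the study):

```python
# Recomputing the accuracy rates in Table 4 from its cell counts.
# Rows are observed post-test outcomes; columns are model predictions.

table4 = {
    0: {"pred0": 351, "pred1": 140, "none": 15},   # observed unknown
    1: {"pred0": 167, "pred1": 261, "none": 2},    # observed known
}

def row_accuracy(row, correct_col):
    """Percentage of a row's observations that the model predicted correctly."""
    return 100.0 * row[correct_col] / sum(row.values())

acc_unknown = row_accuracy(table4[0], "pred0")
acc_known = row_accuracy(table4[1], "pred1")
overall = 100.0 * (351 + 261) / sum(sum(r.values()) for r in table4.values())

print(round(acc_unknown, 1), round(acc_known, 1), round(overall, 1))
# → 69.4 60.7 65.4, matching the reported rates
```

Note that the "no prediction" cells count against accuracy here, exactly as in the reported percentages.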
It is important to understand that although these predictions are presented as binary, the Rasch model
actually generates an expected value which is between 0 and 1. In the case of the multiple-choice data, if the
expected value is lower than .5, 0 is predicted. If it is above .5, 1 is predicted. At .5, the model makes no
prediction; that is, there is an even probability that the word is known. Because this measurement is probabilistic,
expected values close to the midpoint of .5 are less certain than values further from the midpoint (Bond & Fox,
2007). Accordingly, the further away the expected value is from the midpoint, the more accurate the prediction
will be. Table 5 provides data to confirm this assumption. It shows a set of four confusion matrices for the Rasch
multiple-choice results drawn from various ranges of expected values. In the first matrix, all results are modeled,
and the predictions are 65.4% accurate. In the second matrix, data from the mid 20% of predictions are
omitted, and the model is 69.1% accurate (although only 78.5% of the data are analyzed). The following two
matrices model even less data but the overall predictions are more accurate. In the third matrix, the mid 40% of
the predictions are omitted with an accuracy rate of 72.2%, and in the fourth matrix, the mid 60% of the
predictions are omitted for an accuracy rate of 75.6%.
Table 5
Confusion Matrix: Various Prediction Levels Modeled

All results
Observed   Pred. 0   Pred. 1   None*   % Correct   % of data included   Range
0          351       140       15      69.4%
1          167       261       2       60.7%
Total                                  65.4%       100%

* There is no prediction for values of .5.
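The trimming procedure behind these matrices can be sketched as follows (entirely synthetic data; only the trimming logic is illustrated): rank predictions by their distance from the .5 midpoint, drop the least certain fraction, and recompute accuracy on the remainder.

```python
import random

# Sketch of the mid-range trimming used for Table 5 (synthetic data; the
# study's actual expected values are not reproduced here). Predictions
# near the .5 midpoint are the least certain, so dropping them should
# raise accuracy on the remaining, more confident predictions.

random.seed(1)
# (expected_value, observed) pairs: the observation agrees with the
# prediction more often when the expected value is far from .5.
pairs = []
for _ in range(1000):
    ev = random.random()
    p_agree = 0.5 + abs(ev - 0.5)        # certainty grows away from .5
    pred = 1 if ev > 0.5 else 0
    obs = pred if random.random() < p_agree else 1 - pred
    pairs.append((ev, obs))

def accuracy_omitting_mid(pairs, omit_fraction):
    """Drop the omit_fraction of predictions closest to .5, then score the rest."""
    ranked = sorted(pairs, key=lambda p: abs(p[0] - 0.5))
    kept = ranked[int(len(ranked) * omit_fraction):]
    correct = sum(1 for ev, obs in kept if (1 if ev > 0.5 else 0) == obs)
    return 100.0 * correct / len(kept)

for frac in (0.0, 0.2, 0.4, 0.6):
    print(f"omit mid {frac:.0%}: {accuracy_omitting_mid(pairs, frac):.1f}% correct")
```

As in Table 5, each trimming step trades coverage (% of data included) for accuracy on the confident remainder.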
For the semantic distance test set, the partial credit Rasch model developed by Masters (1982) is used, since the middle scores in the semantic distance post-test correspond to
partial knowledge.
As in the previous model, half of the data was used to train the model and the other half was used for
testing purposes; all results reported here are from the test set. In order to reduce the level of factoring in the
model, the results of the semantic distance test were binned into three groups rather than the original five scores,
and then converted to integers (for the purposes of the modeling software).
The chi-square tests of statistical significance and probability are reported in Table 6 for each of the
individual facets (degrees of freedom are given in brackets). Accordingly, Table 6 indicates that, for the fixed chi-
square, the results are significant for learner (χ2 (25) = 59.3, p < .01), word (χ2 (35) = 161.2, p < .01), and
gamescore (χ2 (18) = 44.9, p < .01). Neither board (χ2 (3) = 6.3, p = .10) nor wordscore (χ2 (6) = 11.9, p = .06) had
a significant effect on the model. As for the random chi-square, learner (χ2 (24) = 17.7, p = .82) and gamescore (χ2
(18) = 12.6, p = .82) have the highest probability of having been sampled from a normal population, followed by
word (χ2 (34) = 29.0, p = .71), wordscore (χ2 (5) = 3.9, p = .56), and board (χ2 (2) = 2.0, p = .36).
Table 6
Rasch Model Chi-Squared Statistics for Semantic Distance Condition

Variable    Fixed chi-square [df]   Sig.    Random chi-square [df]   Sig.
Learner     59.3 [25]               <.01*   17.7 [24]                .82**
Board       6.3 [3]                 .10     2.0 [2]                  .36**
Word        161.2 [35]              <.01*   29.0 [34]                .71**
Wordscore   11.9 [6]                .06     3.9 [5]                  .56**
Gamescore   44.9 [18]               <.01*   12.6 [18]                .82**

* Fixed chi-square is significant <.05 and indicates the probability that items are equal on a rating scale.
** Random chi-square significance indicates the probability that these items could have been randomly sampled
from a normal population.
Table 7 shows the confusion matrix results for the semantic distance condition. In this case, a high
predicted probability that the word is known corresponds to full knowledge, an intermediate probability
corresponds to partial knowledge, and a low probability corresponds to no knowledge.
Table 7
Confusion Matrix for Rasch Results for Semantic Distance Condition

Observation   Predicted 0   Predicted 1   Predicted 2   % Correct
0 (unknown)   143**         112           44            47%
1 (partial)   135           101**         87            31%
2 (known)     62            91            161**         51%
Total correct                                           43%

** Accurate predictions.
As in the case of the previous confusion matrix displayed in Table 4, each row in Table 7 contains the
results for an observed score. When the score was observed to be 0 (the participant selected an unrelated
distractor), the model accurately predicted an incorrect score 143 times, inaccurately predicted partial knowledge
112 times, and full knowledge 44 times, for an accuracy rate of 47%. Words which were observed to be partially
known (the participant selected a semantically similar distractor) were inaccurately predicted to be unknown 135
times, accurately predicted to be partially known 101 times, and inaccurately predicted to be known 87 times for
an accuracy rate of 31%. Words which were observed to be known (the participant selected the correct
definition) were inaccurately predicted to be unknown 62 times and partially known 91 times; 161 times they
were accurately predicted to be known, for an accuracy rate of 51%. The model’s overall accuracy rate was 43%.
In summary, applying the partial credit Rasch model to the semantic distance test results weakens the
predictive power (which was shown in the multiple-choice test set) to approximately chance values (43%),
implying that the game does not confidently predict the learners’ depth of semantic knowledge for words.3
Moreover, in the Rasch analysis, which included words and learners as factors by taking into account the
difficulty of each word as well as the ability level of each learner, the three independent variables learner, word,
and gamescore are significant factors, in contrast to wordscore and board, which are insignificant.
Discussion
In his seminal article on test validation, Messick (1989) emphasizes the importance of test use in validating an
assessment instrument. It is not enough to ask whether a test measures what it purports to measure; it must
also be considered whether the results are appropriate to the particular purpose for which the test was designed.
Bricklayer was designed to generate a learner model in the context of an ITS. Thus, it is appropriate to discuss
the results of this study in that context.
Unlike a teacher (who may be attuned to the general ability level of his or her students), an ITS could
construct a detailed model of a learner’s lexical knowledge. In a vocabulary ITS, an assessment tool would be
used to seed this model. This learner model should be dynamic, adjusting instruction to the learner’s behaviours
and knowledge states as they evolve and manifest themselves during system use. Such a model is similar to the
one described by Mislevy et al. (2002), which “refers to a piece of machinery: a set of variables in a probability
model, for accumulating evidence about students” (p. 482).
Brumbaugh (2015) compared Bricklayer’s results to a standard checkbox assessment, which has also been
used in an ITS (Rosa & Eskenazi, 2013), and found that the checkbox assessment fared slightly better overall
than the Bricklayer assessment for the multiple-choice word set when words were binned into two categories
(known or unknown).
However, the Bricklayer prediction model also reports the probability that a word is known. Examining the
results more deeply, Bricklayer does a better job of modeling the “edge conditions” – words which are strongly
predicted to be known or unknown. This is shown by the analysis in Table 5 which models various prediction
levels. The two assessments may therefore be better suited for different tasks. The checkbox offers a quick way to
make assessments for many words, suggesting that the checkbox test would be useful for breadth
assessments or for evaluations which require comparisons between students. In contrast, Bricklayer is more accurate
at identifying words which are either very likely known or unknown; in an ITS environment the remaining words
at the middle ranges might be considered frontier words which merit attention. Even the fact that words in this
predictive range are just as likely to be known as unknown might turn out to be indicative of frontier knowledge.
A word on the edge of acquisition may be subject to inconsistent test results as the memory trace for the word
may be incomplete or not always accessible.
At this point, an ITS could provide additional focused tasks for these words, for example, readings, games,
quizzes, concordance exercises, and other activities. Subsequent learner behaviour such as clicking a word to look
up the meaning or correctly answering a cloze activity would then present opportunities to update the learner
model with more precise information for these words. In other words, Bricklayer’s assessment results are not
considered definitive, but rather one piece of data in the larger construction of a learner model.
There are some shortcomings of Bricklayer which might be addressed in order to improve its performance,
as well as possible limitations in the experimental design of the current study which may have adversely affected
the statistical results.
Computer Adaptation
Neither Bricklayer nor any game based on this forced-choice ranking model will meet its full potential until
it is able to adapt to the word knowledge of the learner. Games with word sets which are either all known or all
unknown simply do not do a good job of distinguishing knowledge. Due to the challenges of programming and
data analysis, an adaptive study was not feasible for this initial research.
In an ITS context, such computer adaptive testing techniques are used generally to target content to
individual learners (Beatty, 2010). Furthermore, a student model that keeps a record of student behavior and
performance could ultimately track not only lexical knowledge and ability levels for students, but student “fit” to
the model as well.
Conclusion
This research study introduced Bricklayer, an assessment tool which can identify strongly known and unknown
words, and which can suggest which words might be on the frontier of acquisition. An analysis of the results also
ascertained ways in which the tool’s performance might be improved by fine-tuning the scoring rubric and by
using computer adaptive testing techniques to customize game boards for each learner.
Bricklayer, which presents a new paradigm for L2 vocabulary assessment, connects with research on
vocabulary acquisition by providing a mechanism to capture partial word knowledge. While Bricklayer was the
primary focus of the empirical investigation, the original contributions to the vocabulary assessment field are not
about Bricklayer per se, but rather about some fundamental characteristics unique to Bricklayer. From this
perspective, Bricklayer is a working exemplar of a novel self-assessment paradigm.
Bricklayer essentially presents learners with the meta-cognitive task of ranking a list of words according to
how well they know them. This differs in a qualitative way from typical self-assessments, which force a binary
choice. Learners must consider not just whether they know a word, but how well they know it. It is possible that
this leads to a deeper level of cognitive reflection. In the Bricklayer study, participants only spent about a minute
in total on the three screens of checkbox items (n=24 words); in contrast, they spent on average 2 ½ minutes on
each game (n=18 words per game). This may indicate that they were giving more focused attention to the game
task.
There are certain drawbacks to ranking data. Primarily, if two words are equally known or equally
unknown, the ranking data are not useful. This could be mitigated in several ways in task design. For example,
participants might be presented with two or three words and then instructed to rank them in terms of
knowledge. In a computer interface, this could be achieved by dragging the words into an ordered list. Words
could be repeated in different contexts and then results subjected to an item response analysis such as Rasch.
Alternatively, the participant could simply report, for example, by pressing a button on a screen, that both words
are known or both are unknown.
From a quantitative point of view, measurements derived from rankings provide a mechanism for
sensitivity to partial lexical knowledge. Implementing such a modification to the standard self-assessment tools
might result in more robust results with a higher level of structural validity.
Currently, vocabulary assessment falls into two broad categories: traditional tests in which the learner must
select or give the correct answer, and checkbox self-assessments in which the test administrator must either rely
on the learner’s response or depend on pseudowords to gauge the learner’s accuracy. The assessment paradigm
on which Bricklayer is based offers a third option: random spot-checks of learners’ self-assessments. The mini-
quizzes in the game serve three important functions. Firstly, they give a way to validate the learner’s responses.
Secondly, they provide accountability to the learners – since they know the test may be coming, they have a
reason not to misrepresent their knowledge. Finally, they provide a mechanism for clarifying the expectation
about what type of word knowledge is being tested.
There are typically three uses for assessment: evaluation, instruction, and research. In a context in which a
student is being evaluated for aptitude for a given program or in which learning gains for a course are being
assessed, Bricklayer’s probabilistic results might be too subtle to accomplish the test purpose. However, in
instructional contexts, such as a classroom or ITS environment, Bricklayer’s paradigm might be well-suited to
identify frontier words which would benefit from further, direct instruction.
Endnotes
1. Two of the participants were excluded from the final data analysis due to incorrect usage of the software
which may have corrupted the results.
2. It should be noted that these are post-tests in the sense that they are taken after the main part of the study
for the purposes of collecting data for concurrent validity; this study did not use a pre-test/post-test
design.
3. Interestingly, although the results could not predict partial knowledge, deeper analysis of the data showed
that Bricklayer was sensitive to this knowledge (Brumbaugh, 2015).
References
Anderson, R. C., & Freebody, P. (1983). Reading comprehension and the assessment and acquisition of word
knowledge. In B. Huston (Ed.), Advances in Reading/Language Research (Vol. 2, pp. 231–256). Greenwich, CT:
JAI Press.
Beatty, K. (2010). Teaching and researching computer-assisted language learning (2nd ed.). Harlow, England; New York:
Longman.
Beeckmans, R., Eyckmans, J., Janssens, V., Dufranne, M., & Van de Velde, H. (2001). Examining the Yes/No
Vocabulary Test: Some Methodological Issues in Theory and Practice. Language Testing, 18(3), 235–274.
Bond, T. G., & Fox, C. M. (2007). Applying the Rasch Model: Fundamental Measurement in the Human Sciences, Second
Edition (2nd ed.). Routledge.
Brumbaugh, H. (2015). Self-assigned ranking of L2 vocabulary: Using the Bricklayer computer game to assess depth of word
knowledge (Doctoral dissertation, Arts & Social Sciences). Retrieved from http://summit.sfu.ca/item/15287
Collins-Thompson, K., & Callan, J. (2007). Automatic and Human Scoring of Word Definition Responses. In C.
L. Sidner, T. Schultz, M. Stone, & C. Zhai (Eds.), HLT-NAACL (pp. 476–483). The Association for
Computational Linguistics.
Crossley, S., Salsbury, T., & McNamara, D. (2010). The Development of Polysemy and Frequency Use in English
Second Language Speakers. Language Learning, 60(3), 573–605.
Dolch, E. W. (1927). Reading and word meanings. Ginn and Company.
Durso, F. T., & Shore, W. J. (1991). Partial knowledge of word meanings. Journal of Experimental Psychology: General,
120(2), 190–202. http://doi.org/10.1037/0096-3445.120.2.190
Eyckmans, J. (2004). Measuring receptive vocabulary size: reliability and validity of the yes/no vocabulary test for French-
speaking learners of Dutch. Utrecht: LOT.
Heift, T., & Schulze, M. (2003). Student Modeling and ab initio Language Learning. System, 31(4), 519–535.
Henriksen, B. (1999). Three Dimensions of Vocabulary Development. Studies in Second Language Acquisition, 21(2),
303–317.
Horst, M., & Meara, P. (1999). Test of a model for predicting second language lexical growth through reading.
Canadian Modern Language Review/La Revue Canadienne Des Langues Vivantes, 56(2), 308–328.
Laufer, B., & Goldstein, Z. (2004). Testing Vocabulary Knowledge: Size, Strength, and Computer Adaptiveness.
Language Learning, 54(3), 399–436.
Laufer, B., & Nation, I. S. P. (2001). Passive Vocabulary Size and Speed of Meaning Recognition: Are They
Related? EUROSLA Yearbook, 1, 7–28.
LeBlanc, R., & Painchaud, G. (1985). Self‐Assessment as a Second Language Placement Instrument. Tesol
Quarterly, 19(4), 673-687.
Masters, G. N. (1982). A Rasch Model for Partial Credit Scoring. Psychometrika, 47(2), 149–74.
Meara, P. (1990). Some notes on the Eurocentres Vocabulary Tests. In J. Tommola (Ed.), Vieraan kielen
ymmärtäminen ja tuottaminen (Foreign Language Comprehension and Production) (pp. 103–113). Turku: Suomen
Soveltavan Kielitieteen Yhdistys AFinLA.
Meara, P. (1997). Towards a new approach to modelling vocabulary acquisition. In N. Schmitt & M. McCarthy
(Eds.), Vocabulary: Description, Acquisition and Pedagogy (pp. 109–121). Cambridge, UK: Cambridge University
Press.
Meara, P. (2009). Connected Words: Word Associations and Second Language Vocabulary Acquisition. Amsterdam: John
Benjamins Pub. Co.
Meara, P., & Buxton, B. (1987). An Alternative to Multiple Choice Vocabulary Tests. Language Testing, 4(2), 142–
154.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational Measurement (3rd ed, pp. 13–103). Washington, D.C.:
American Council on Education.
Milton, J., & Vassiliu, P. (2000). Frequency and the lexis of low-level EFL texts. In Proceedings of the 13th Symposium
in Theoretical and Applied Linguistics, Aristotle University of Thessaloniki (pp. 444–55).
Mislevy, R. J., Steinberg, L. S., & Almond, R. G. (2002). Design and Analysis in Task-Based Language
Assessment. Language Testing, 19(4), 477–496.
Mochida, A., & Harrington, M. (2006). The Yes/No test as a measure of receptive vocabulary knowledge.
Language Testing, 23(1), 73–98. http://doi.org/10.1191/0265532206lt321oa
Nagy, W. E., Herman, P. A., & Anderson, R. C. (1985). Learning Words from Context. Reading Research Quarterly,
20(2), 233–253.
Nation, I. S. P. (2013). Learning Vocabulary in Another Language (2nd ed.). Cambridge: Cambridge University Press.
Nation, I. S. P. (1983). Testing and teaching vocabulary. Guidelines, 5(1), 12–25.
Nation, I. S. P. (1990). Teaching and Learning Vocabulary. New York: Newbury House Publishers.
Nation, I. S. P. (2001). Learning Vocabulary in Another Language. Cambridge: Cambridge University Press.
Nation, I. S. P., & Beglar, D. (2007). A vocabulary size test. The Language Teacher, 31(7), 9–13.
Paribakht, T. S., & Wesche, M. (1997). Vocabulary enhancement activities and reading for meaning in second
language vocabulary acquisition. In J. Coady & T. N. Huckin (Eds.), Second Language Vocabulary Acquisition: A
Rationale for Pedagogy (pp. 175–200). Cambridge, U.K: Cambridge University Press.
Pellicer-Sánchez, A., & Schmitt, N. (2012). Scoring Yes–No vocabulary tests: Reaction time vs. nonword
approaches. Language Testing, 29(4), 489–509. http://doi.org/10.1177/0265532212438053
Qian, D. D. (2002). Investigating the Relationship between Vocabulary Knowledge and Academic Reading
Performance: An Assessment Perspective. Language Learning, 52(3), 513–536.
Read, J. (1993). The Development of a New Measure of L2 Vocabulary Knowledge. Language Testing, 10(3), 355–
371.
Read, J. (1995). Validating the word associates format as a measure of depth of vocabulary knowledge. In 17th
language testing research colloquium, Long Beach, CA.
Rosa, K. D., & Eskenazi, M. (2013). Self-Assessment in the REAP Tutor: Knowledge, Interest, Motivation, &
Learning. International Journal of Artificial Intelligence in Education (IOS Press), 21(4), 237–253.
Richards, J. C. (1976). The Role of Vocabulary Teaching. TESOL Quarterly, 10(1), 77–89.
Schmitt, N. (1998). Tracking the Incremental Acquisition of Second Language Vocabulary: A Longitudinal
Study. Language Learning, 48(2), 281–317.
Schmitt, N. (2014). Size and Depth of Vocabulary Knowledge: What the Research Shows. Language Learning, 64,
913–951.
Schmitt, N., & Meara, P. (1997). Researching Vocabulary through a Word Knowledge Framework: Word
Associations and Verbal Suffixes. Studies in Second Language Acquisition, 19(1), 17–36.
Schmitt, N., Schmitt, D., & Clapham, C. (2001). Developing and Exploring the Behaviour of Two New Versions
of the Vocabulary Levels Test. Language Testing, 18(1), 55–88.
Shore, W. J., & Kempe, V. (1999). The Role of Sentence Context in Accessing Partial Knowledge of Word
Meanings. Journal of Psycholinguistic Research, 28(2), 145–163. http://doi.org/10.1023/A:1023258224980
Webb, S. (2005). Receptive and Productive Vocabulary Learning: The Effects of Reading and Writing on Word
Knowledge. Studies in Second Language Acquisition, 27(1), 33–52.
Wesche, M., & Paribakht, T. S. (1996). Assessing Second Language Vocabulary Knowledge: Depth versus
Breadth. The Canadian Modern Language Review/La Revue Canadienne Des Langues Vivantes, 53(1), 13–40.
Zimmerman, C. B. (1997). Historical trends in second language vocabulary instruction. In J. Coady & T. N.
Huckin (Eds.), Second Language Vocabulary Acquisition: A Rationale for Pedagogy (pp. 5–19). Cambridge, U.K:
Cambridge University Press.
Trude Heift is Professor of Linguistics in the Department of Linguistics at Simon Fraser University, Canada. Her
research focuses on the design and evaluation of CALL systems with a particular interest in learner-computer
interactions and learner language. She is co-editor of Language Learning & Technology.
Ahmed Masrai*
King Abdulaziz Military Academy, Saudi Arabia
James Milton
Swansea University, UK
Abstract
Research has shown that general vocabulary knowledge (e.g., Milton & Treffers-Daller, 2013), academic vocabulary
knowledge (e.g., Townsend et al., 2012) and general intelligence (e.g., Laidra et al., 2007) are good predictors of academic
achievement. While the effect of these factors has mostly been examined separately, Townsend et al. (2012) have tried to
model the contribution of general and academic vocabulary to academic achievement and find that academic vocabulary
knowledge adds only marginally to the predictive ability of general vocabulary knowledge. This study, therefore, examines
further factors as part of a more extensive predictive model of academic performance, including L1 vocabulary knowledge,
L2 general and academic vocabulary knowledge, and intelligence (IQ) as predictors of overall academic achievement among
learners of EFL. Performance on these measures was correlated with Grade Point Average (GPA) as a measure of academic
achievement for undergraduate Arabic L1 users (N = 96). The results show significant positive correlations between all the
measures and academic achievement. However, academic vocabulary knowledge shows the strongest correlation (r = .72),
suggesting that the pedagogical use of academic word lists remains important. To further explore the data, multiple regression and factor
analyses were performed. The results show that academic and general vocabulary knowledge combined can explain about
56% of the variance in students’ GPAs. The findings thus suggest that, in addition to L1 and L2 vocabulary size and IQ,
knowledge of academic vocabulary is an important factor that explains additional variance in learners’ academic
achievement.
Introduction
Academic achievement is crucial in impacting students’ future employability and the opportunity to obtain better
jobs. It is also a major concern for higher education institutions. Thus, research which taps into modelling the
potential factors that might influence student academic success is worthwhile. A number of studies have
investigated factors which are thought to influence students’ academic success in various contexts (e.g., Laidra,
Pullmann, & Allik, 2007; Milton & Treffers-Daller, 2013; Roche & Harrington, 2013; Townsend, Filippini,
Collins, & Biancarosa, 2012). Among the factors identified as being associated with learners’ overall academic
performance have been intelligence, general L2 vocabulary size, L2 academic vocabulary knowledge, and first
language (L1) vocabulary size. Despite the influence of these factors on academic success, there is a scarcity of
studies examining their effect on achievement with native Arabic learners in the Arab world, with the exception
of two studies by Roche and Harrington (2013) and Harrington and Roche (2014), who studied the effect of
vocabulary knowledge on students’ academic success in Oman. Thus, this study is an attempt to explore the
effect of vocabulary knowledge, in L1 and L2, and intelligence on academic performance with learners from an L1
Arabic context. There are currently many schools and universities in the Middle East which deliver their
programmes through the medium of English, and academic achievement is one of their main concerns. Thus,
this study was motivated by both a desire to expand our understanding of the predictors of academic
achievement in general and in the Arab world context in particular, and by the scarcity of research on L1 Arabic
users studying at higher education institutions through the medium of English in an environment where English
is not the primary language used outside the classroom.
Success when studying through a foreign language is likely to be influenced by a range of possible factors,
and while we have some understanding of these factors from studies which investigate them individually,
examining multiple factors as part of an overall predictive model of academic performance is likely to be more
useful. Few studies have attempted to place these various factors, including vocabulary knowledge, into an overall
model for the prediction of academic success. This study, therefore, incorporates four
independent variables into a model in order to predict the academic achievement of native Arabic speakers studying
through the medium of English in Saudi higher education institutions.
the scores on these measures to calculate the contribution to academic success that the two types of knowledge
can make. The contribution of scores from the two measures to academic success was calculated both
individually and combined. They conclude that academic vocabulary knowledge contributes unique variance to
achievement across disciplines even when the overall breadth of vocabulary knowledge is controlled. The
explanatory power of vocabulary size as a whole was larger than that of academic word knowledge, between
26% and 43% of variance according to discipline. However, academic word knowledge can still add an
additional 2% to 7%, depending on discipline, to this explanatory power. These findings appear to suggest that
developing a reasonably large vocabulary is more effective for success but that knowledge of the AWL has some
additional, marginal influence on academic performance.
The findings from Townsend et al.'s (2012) study are supported by results from Roche and Harrington
(2013), who attempted a variety of methodological changes to their test to better understand how vocabulary and
academic performance are linked. Their results for the impact of vocabulary size are similar to those of
Townsend et al. (2012) and in their study vocabulary size can explain about 25% of the variance in students’
GPAs.
The Study
The aim of this study is to model a number of factors as part of a predictive model of academic performance
among native Arabic speakers from an undergraduate population. These factors are general vocabulary
knowledge, academic vocabulary knowledge, L1 vocabulary knowledge, and general intelligence. To examine the
effectiveness and the predictive power of these factors, individually and combined, on a measure of academic
achievement, four research questions were addressed:
1. What are the levels of correlation of general vocabulary, academic vocabulary, L1 vocabulary, and IQ
with GPA?
2. What is the contribution of each of these variables to academic achievement?
3. Can general vocabulary knowledge and academic vocabulary knowledge explain a unique variance in
academic achievement?
4. Can factor analysis allow us to identify whether the vocabulary-based variables are identifying separate
factors which contribute to GPA?
Method
Participants
Participants in this study were 96 undergraduate students (aged 20-22 years) from two universities in Saudi
Arabia. The students were following degree courses in Languages and Translation. The two universities from which
the participants were drawn implement very similar programmes in English language and translation, so at least
in part the input factor from the language classroom is controlled. The participants in both institutions were
attending levels two, three and four of a four-year degree programme when the data collection for this study took
place. Informed consent was obtained from all participants. Also, as a monolingual Arabic vocabulary size test was
administered to the participants, only native Arabic speakers were included in the study. The
participants’ involvement was voluntary.
Instruments
Four measures were used to collect the required data for the current study.
1. The first was a general vocabulary size test (XK-Lex; Masrai & Milton, 2012), which was used to
measure the receptive vocabulary knowledge of the participants in the most frequent 10,000 words in
English. The XK-Lex is a yes/no test of decontextualised words sampled from the first ten 1,000-word
frequency bands in English and includes non-words to control for guessing.
2. The second was a written receptive vocabulary knowledge test (Arabic-Lex; Masrai & Milton, 2017),
which was used to estimate the participants’ L1 (Arabic) vocabulary size. This test is similar in its
construct to the XK-Lex, but was designed to measure the knowledge of the most frequent 50,000
words in Arabic.
3. The third was a newly developed receptive academic vocabulary size test (AVST; Masrai & Milton,
forthcoming) used to assess students’ academic vocabulary knowledge of the 570 words from
Coxhead’s (2000) AWL. The test is similar in its design (frequency based test) to the English and Arabic
vocabulary size tests.
4. The final tool was Raven's Standard Progressive Matrices (SPM), a non-verbal IQ test. We chose this
version because it was developed to measure a wide range of mental
ability and to be equally usable with persons of all ages, regardless of their education, background and
physical condition (Raven, Raven, & Court, 1998). The test consists of 60 problems divided into five
sets (A, B, C, D, and E), each of which includes 12 problems. All the testing materials were delivered in pencil
and paper format and were not timed. However, each of the three vocabulary measures should not
take longer than 10-15 minutes to complete. The non-verbal IQ test, on the other hand, should take about
45 minutes to finish.
Yes/No tests have been reported in the literature as suitable, reliable and valid measures of breadth of
vocabulary knowledge (e.g., Harrington & Carey, 2009; Milton, 2009; Mochida & Harrington, 2006; Read,
2000). They allow for the sampling of a large number of items, and are easy and economical to administer and score.
The scoring system of the three Yes/No tests used in the current study is straightforward. Yes
responses to real words are counted to give a participant's raw score, and yes responses to non-words are
false alarms. The false alarms result in a reduction in the participant's total score. The scoring matrix of Yes/No
tests is presented in Table 1.
tests is presented in Table 1.
Table 1
Matrix of Possible Responses in XK-Lex, Arabic-Lex, and AVST, Where UPPER CASE = Correct Responses
Response    Word    Non-word
Yes         HIT     False alarm
No          Miss    CORRECT REJECTION
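To make the scoring concrete, the hits-minus-false-alarms adjustment can be sketched as below. This is a minimal illustration of one common correction; the exact formulas used for XK-Lex, Arabic-Lex, and the AVST are not reproduced here, and the words-per-item scaling factor is an assumption for illustration.

```python
def yes_no_score(hits, false_alarms, words_per_item=100):
    """Estimate vocabulary size from Yes/No test responses.

    Each 'yes' to a real word is a hit; each 'yes' to a non-word is a
    false alarm, and here each false alarm cancels one hit (a simple
    guessing correction; the published tests may use a different
    adjustment). Each real-word item is assumed to represent
    `words_per_item` words of the frequency list it was sampled from.
    """
    adjusted_hits = max(hits - false_alarms, 0)
    return adjusted_hits * words_per_item

# A learner saying yes to 80 real words with 2 false alarms, on a test
# where each item stands for 100 words, scores (80 - 2) * 100 = 7800.
```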
Procedure
Participants were tested on two consecutive days at each institution to avoid testing fatigue. After instructions
were delivered, participants were first presented with the two general vocabulary size measures (Arabic-Lex and
XK-Lex) followed by the academic vocabulary size test (AVST), which was administered after a short break. On
the second day, the non-verbal IQ test was delivered to the same participants. All the testing procedures were
performed with the help of volunteer lecturers at each institution.
Results
Correlation Analysis
In this study four predictor variables of academic success were used (XK-Lex as a measure of general English
vocabulary size, Arabic-Lex as a measure of Arabic vocabulary size, AVST as a measure of English academic
vocabulary knowledge, and SPM as a measure of non-verbal IQ). The descriptive statistics of the four predictor
variables are shown in Table 2.
Table 2
Descriptive Statistics for Four Variables (IQ, Arabic-Lex, XK-Lex, and AVST)
N Minimum Maximum Mean Std. Deviation
IQ 96 12 39 28.25 5.55
Arabic_Lex 96 8500 43000 30843.75 7784.03
XK_Lex 96 1100 6400 3125.00 1310.40
AVST 96 20 470 171.48 86.62
Table 3
Correlation between Variables in The Study
IQ Arabic_Lex XK_Lex AVST GPA
IQ - .501** .340** .411** .469**
Arabic_Lex - .512** .446** .590**
XK_Lex - .782** .683**
AVST - .728**
GPA -
Note. ** = Correlation is significant at the 0.01 level.
In order to examine research question 1 (i.e., the relationship between the four measures in our study and
academic achievement), correlational analysis was conducted between the observed scores from the four
measures and students’ GPA. The correlation of the four predictor variables with GPA is shown in Table 3. All
predictor variables correlate significantly with GPA. This indicates the validity of these measures, since
they had all been identified on theoretical grounds as related to academic success, although most of the
previous studies have examined their relationship with academic success individually. XK-Lex and AVST scores
show the strongest correlation with GPA, followed by Arabic-Lex and IQ. Also, the strongest correlation between
the predictor variables is reported between XK-Lex and AVST (r = .782), which may indicate that a test of
academic vocabulary (AVST) resembles very strongly a test of overall vocabulary size (XK-Lex) as suggested in
Masrai and Milton (forthcoming). The correlation matrix reported in Table 3 provides a preliminary indication
of the effect size (ES) of each independent variable on the dependent variable (GPA). However, to examine the
ES in more depth, partial Eta Squared was calculated for each predictor variable.
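For illustration, a correlation of the kind reported in Table 3 can be computed as follows; the score and GPA vectors here are invented, not the study's data.

```python
import numpy as np

def pearson_r(scores, gpa):
    """Pearson correlation between a set of test scores and GPAs."""
    scores = np.asarray(scores, dtype=float)
    gpa = np.asarray(gpa, dtype=float)
    return float(np.corrcoef(scores, gpa)[0, 1])

# Invented example: scores that rise perfectly with GPA give r = 1.0.
r = pearson_r([10, 20, 30, 40], [2.0, 2.5, 3.0, 3.5])
```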
Analysis of variance showed large effect sizes (ES) for L2 general and academic vocabulary knowledge on academic
achievement (F(31, 64) = 6.79, p = .001, ηp2 = .77; F(35, 60) = 63.54, p = .001, ηp2 = .97, respectively). The other
two variables (Arabic-Lex scores and IQ scores) were also found to have an effect on academic
performance, but to a lesser extent (F(27, 68) = 5.75, p = .001, ηp2 = .69; F(18, 77) = 3.31, p = .001, ηp2 = .44,
respectively). Although correlation analysis and ES measures provide insight into how well the independent variables
relate to the dependent variable, a more detailed analysis is needed to gain further understanding of the
predictive power of the different predictor variables. Thus, regression analysis was performed to calculate the
explanatory power of the variables individually and combined.
Regression Analysis
Since high inter-correlations were observed between some of the predictor variables, multicollinearity
diagnostics were performed prior to the regression analysis. The results show no indication of multicollinearity
(all values for tolerance were > .02 and all values for VIF were < 5).
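As a sketch of what such diagnostics compute (the study's statistical package is not specified), tolerance and VIF for each predictor can be derived by regressing that predictor on the others: VIF_j = 1 / (1 - R2_j), and tolerance is its reciprocal.

```python
import numpy as np

def vif_and_tolerance(X):
    """Variance inflation factor and tolerance for each column of X.

    VIF_j = 1 / (1 - R2_j), where R2_j comes from an OLS regression of
    column j on the remaining columns; tolerance_j = 1 / VIF_j.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        y = X[:, j]
        # Regress column j on an intercept plus the other columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1.0 - (resid @ resid) / (((y - y.mean()) ** 2).sum())
        vifs.append(1.0 / (1.0 - r2))
    return vifs, [1.0 / v for v in vifs]
```

With fully uncorrelated predictors, every VIF and tolerance comes out at 1; values rising toward the cut-offs signal redundancy among predictors.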
To examine research question 2 (i.e., the predictive power of scores on XK-Lex, Arabic-Lex, AVST, and
IQ measures for students’ academic achievement measured by GPA), regression analysis was performed.
Table 4
Explained Variance in The Regression Model Predicting GPA with The Four Measures Combined
Model    R       R2     SE
1        .80a    .64    .49
Note. a. Predictors: (Constant), IQ, XK_Lex, Arabic_Lex, AVST.
First, a multiple regression was carried out with GPA as the dependent variable and XK-Lex, Arabic-Lex,
AVST and IQ as independent variables, using the Enter method. This led to a significant model (F(4, 91) = 39.675, p
< .001) which explains about 64% of the variance in students’ GPA.
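The 64% figure is the R2 of an ordinary least squares model; a generic sketch of that computation (not the study's statistical output) is:

```python
import numpy as np

def r_squared(X, y):
    """R2 from an OLS regression of y (e.g., GPA) on the columns of X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # add an intercept
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return 1.0 - (resid @ resid) / (((y - y.mean()) ** 2).sum())
```

Called with a matrix of the four predictor scores and the GPA vector, this returns the proportion of GPA variance the combined model explains.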
However, we are also interested in the individual contribution of each predictor variable towards the
predictive power of the regression model. To examine this, multiple regressions were carried out to compute the
effect of each variable individually. The model summaries are reported in Table 5.
Table 5
Explained Variance in The Regression Model Predicting GPA with Each of The Four Measures
Predictor Model R R2 SE
XK_Lex 1 .68 .47 .58
Arabic_Lex 2 .59 .35 .64
AVST 3 .73 .53 .55
IQ 4 .47 .22 .71
As shown in Table 5, each variable can explain variance in students' success. XK-Lex and AVST scores
explain the greatest variance in students' GPA (R2 = .47 and .53, respectively). The other variables, Arabic-Lex
and IQ, also explain substantial amounts of variance in the students' achievement (R2 = .35 and .22,
respectively).
To further examine the explanatory power, we carried out hierarchical regression with the L2 vocabulary size
measures (XK-Lex and AVST) in block 1, and the L1 vocabulary size measure (Arabic-Lex) and IQ in block 2. Since
XK-Lex and AVST are the best predictors, there is a danger that the contribution of the less well correlated predictors,
Arabic-Lex and IQ, will be lost in a combined model. Dividing the factors this way allows the contribution of
these other, less well correlated variables to the model to be estimated. The result is shown in Table 6.
Table 6
Hierarchical Regression Models
                              Change Statistics
Model   R      R2    SE    R2 Change   F Change   df1   df2   Sig. F Change
1       .75a   .56   .53   .56         59.92      2     93    .000
2       .80b   .64   .49   .07         9.05       2     91    .000
Note. a. Predictors: (Constant), AVST, XK_Lex; b. Predictors: (Constant), AVST, XK_Lex, IQ, Arabic_Lex.
The variables in block 1 produce a significant model (F(2, 93) = 59.92, p < .001) which predicts about 56%
of the variance in GPA, and this is substantial. The other variables in block 2, however, can still be shown to
contribute marginally to the predictive power of the regression model. The addition to R2 is still significant (F(2,
91) = 9.05, p < .001) and these two factors appear to explain an additional 7% of the variance in GPA. These
results indicate that when general L2 vocabulary knowledge (measured with XK-Lex) and L2 academic
vocabulary knowledge (measured with AVST) are combined they can have a very strong positive effect on
learners’ performance when studying through the medium of English but that the predictive power of other
factors can still improve on this result.
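The block structure above amounts to comparing R2 before and after adding the block-2 predictors; a minimal sketch with invented data (not the study's) is:

```python
import numpy as np

def _r2(X, y):
    # OLS R-squared with an intercept term
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return 1.0 - (resid @ resid) / (((y - y.mean()) ** 2).sum())

def hierarchical_r2(block1, block2, y):
    """Return (R2 for block 1 alone, R2 change from adding block 2)."""
    y = np.asarray(y, dtype=float)
    X1 = np.asarray(block1, dtype=float)
    X2 = np.column_stack([X1, np.asarray(block2, dtype=float)])
    base = _r2(X1, y)
    return base, _r2(X2, y) - base
```

In the study's terms, block 1 would hold the XK-Lex and AVST scores and block 2 the Arabic-Lex and IQ scores, with the R2 change quantifying what the second block adds.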
To provide an answer to research question 3 (i.e., whether L2 general vocabulary knowledge can explain a
unique variance in academic success) we first had to control for the scores from AVST, the academic vocabulary
knowledge test, as these two variables were highly correlated. Interpretation of this strong correlation will be
provided in the discussion section of the study. However, to measure if a unique predictive power can be
explained by L2 general vocabulary per se, a stepwise regression model was generated including the R2 change,
but with AVST scores removed from the model. The result is summarised in Table 7.
Table 7
Predictive Power of General L2 Vocabulary When Academic Vocabulary Is Controlled for
                              Change Statistics
Model   R      R2    SE    R2 Change   F Change   df1   df2   Sig. F Change
1       .68a   .47   .58   .47         82.35      1     94    .000
2       .74b   .55   .54   .08         16.07      1     93    .000
3       .75c   .57   .53   .02         4.89       1     92    .030
Note. a. Predictors: (Constant), XK_Lex; b. XK_Lex, Arabic_Lex; c. XK_Lex, Arabic_Lex, IQ.
The result in Table 7 shows a significant unique contribution of L2 general vocabulary knowledge in
explaining academic success. The R2 of .47, explaining about 47% of the variance, has already been
shown in Table 5. But the two other factors are able to enhance this and, combined, add a further 10% to the
explanation of variance in GPA scores.
Factor Analysis
Factor analysis was run in an attempt to provide an answer to research question 4 (i.e., examining whether
different factors can be discerned in the four sets of results). The factor analysis results are summarised in the
Scree plot in Figure 1 and the component matrix in Table 8.
Table 8
Component Matrix from The Four Sets of Data
Component
1
XK_Lex .855
AVST .854
Arabic_Lex .767
IQ .679
Note. Extraction Method: Principal Component Analysis; a. 1 component extracted.
There appears to be only one component extracted with an eigenvalue above 1, and it is concluded that
the four variables examined in this study are measuring the same construct.
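The extraction criterion implied here (retain components with an eigenvalue above 1) can be illustrated with a minimal principal components sketch over invented data:

```python
import numpy as np

def pca_eigenvalues(X):
    """Eigenvalues of the correlation matrix of the columns of X,
    in descending order. Components whose eigenvalue exceeds 1 are
    retained under the Kaiser criterion used in analyses like this one.
    """
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    return np.sort(np.linalg.eigvalsh(R))[::-1]

# Two perfectly correlated columns collapse onto a single component.
vals = pca_eigenvalues([[1, 2], [2, 4], [3, 6]])
```

Run on the four score vectors (XK_Lex, AVST, Arabic_Lex, IQ), a single eigenvalue above 1 would reproduce the one-component result reported in Table 8.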
Discussion
In this study, the contributions of four variables were investigated to assess their impact on students' academic
performance as measured by GPA. These variables were L2 general vocabulary knowledge, L2 academic
vocabulary knowledge, L1 vocabulary knowledge, and non-verbal IQ. While the predictive power of these
variables on academic achievement is widely reported in the literature, we investigated their power to predict
academic performance among Arabic university students incorporating the four variables in one experimental
setting. The study also aimed at finding out whether including academic vocabulary knowledge among other
factors would explain a unique variance and remain the greatest contributing factor towards students' academic
success.
Research Question 1: The Relationship between the Four Measures in This Study
In answer to research question 1, all the measures show statistically significant correlations with students'
academic performance, as measured by GPA (see Table 3). This finding is broadly in line with what is reported in
the literature (e.g., Alderson, 2005; Milton & Treffers-Daller, 2013; Laidra et al., 2007; Townsend et al., 2012).
The strongest correlation (r = .728) is between L2 academic vocabulary knowledge and GPA. The correlation
between L2 general vocabulary knowledge and GPA is moderate to strong (r = .683). The other two factors, L1
vocabulary knowledge and non-verbal IQ, also display moderate correlations with GPA, which are less strong
than the two L2 vocabulary knowledge factors. Since all four test variables correlate moderately to strongly with
GPA, it should not be a surprise that they also correlate moderately with each other as is shown in Table 3.
There is a particularly strong correlation between L2 general vocabulary knowledge and L2 academic
vocabulary knowledge (r = .782). The way that the L2 academic vocabulary knowledge test is likely also to test
general vocabulary knowledge has already been suggested in the second sub-section of the literature review. The
L2 academic vocabulary knowledge test is based on the AWL and these words occur in general frequency lists
spread across the most frequent bands. Good correlations should therefore be expected between any test based
on the AWL and any well-formed general vocabulary size test, as is noted in Masrai and Milton (forthcoming).
Although the four measures show significant correlations with GPA, multiple regression analyses were required to
quantify the effect of each measure on academic performance.
Research Question 2: The Contribution of Each Variable to Academic Achievement
A multiple regression, reported in Table 4, indicates that the four variables combined can explain 64% of the variance
in GPA scores. This combined result is greater than the two factor model investigated in Townsend et al. (2012),
which examined only L2 language knowledge factors. This suggests that the variables examined in this study
would be particularly useful in a practical setting where, for example, university and school teachers need to
anticipate which of their students are at risk of low academic performance and are in need of support in their
academic studies.
The further regression analyses carried out in this study are designed to examine the way the variables
interact with each other, to better understand in what proportions these variables combine in their interactions
with GPA. The strongest predictors among the four variables are the L2 vocabulary factors, academic vocabulary
knowledge and general vocabulary size. It has been indicated above that these two tests may be testing a single
factor and the hierarchical regression reported in Table 6 has therefore been carried out to separate out the L2
language factor from the potential contribution of the other variables in gaining good GPA scores. The results
suggest that L1 Arabic vocabulary size and IQ combined can add slightly more than 7% to the predictiveness of
the L2 language factor. The 56% of variance explained by general and academic vocabulary knowledge rises to
64% once IQ and L1 vocabulary size are added in (note there is some rounding of numbers in Table 6). The
regression analysis summarised in Table 7 separates out the contribution of IQ and L1 vocabulary size; for
this analysis, scores from the L2 academic vocabulary size variable have been omitted because of their
collinearity with general vocabulary size, the better to examine the effect of the other factors. The results in
Table 7 suggest that both IQ and L1 vocabulary can make separate and unique contributions to the predictive
ability of the model; with this combination of variables, L1 vocabulary size appears to add some 8% to the
47% of variance explained by L2 general vocabulary size. IQ appears to add a further 2%. This last figure need
not contradict the suggestion of Rohde and Thompson (2007) that between about 25% and 50% of variance in
academic achievement can be explained by IQ alone, since studies of the effect of IQ rarely include in their
models the powerful effects of L2 vocabulary as measured with the sophistication of the most recent L2 tests.
Research Question 3: The Relationship between General and Academic Vocabulary Knowledge
The collinearity of the L2 vocabulary factors, L2 general vocabulary size and L2 academic vocabulary knowledge,
has been noted above and has raised the question whether academic vocabulary knowledge is capable of making
a unique contribution to the variance in GPA scores, over and above the impact of general vocabulary size. In
this study, as distinct from Townsend et al.'s (2012), it is the L2 academic vocabulary knowledge test which is
the best individual predictor among the four variables, slightly better than L2 general vocabulary size. The
difference between the results might largely be attributed to the academic word measures used in both studies.
Townsend et al. (2012) used the academic part of the revised version of the Vocabulary Levels Test (VLT) (Schmitt,
Schmitt, & Clapham, 2001), which includes a sample of only 30 of the 570 AWL words (Coxhead, 2000). The low
sampling rate and also the problematic sampling technique (see Schmitt et al., 2001) of this part of the VLT
might explain, in part, why the predictive power of the test scores is lower than for their general vocabulary
measure. On the other hand, the test used in the current study (AVST) is thought to produce more credible
scores, as it features a high sampling rate (1:5) and is based on frequency selection of its items (Masrai & Milton,
forthcoming).
Nonetheless, this result mirrors the findings reported in Masrai and Milton (forthcoming). This suggests
that while the results produced by the L2 academic vocabulary knowledge test must include L2 general
vocabulary size (its construction using words drawn from across the most frequent general vocabulary bands
means it cannot avoid this), the two types of knowledge can nonetheless still be differentiated. Our best
interpretation of the data is that L2 general vocabulary size is crucial to academic achievement and that
academic vocabulary knowledge will add marginally to this. It appears that knowledge of the AWL specifically
can add an additional 7% to L2 general vocabulary knowledge in explaining variance in GPA. This conclusion is
strikingly similar to the results obtained in the studies by Townsend et al. (2012) and Masrai and Milton
(forthcoming), and similar too to other studies (e.g., Harrington & Roche, 2014; Milton & Treffers-Daller, 2013;
Saville-Troike, 1984).
Research Question 4: How Many Separate Factors Can Be Identified in These Variables?
One argument used in Masrai and Milton (forthcoming) to suggest that a test based on the AWL is likely to
function also as a general vocabulary size test, is that when scores for the two different tests were subjected to
factor analysis, only one component could be identified, leading to the conclusion that they were testing the same
construct. Figure 1 and Table 8 report the results of factor analysis with the four sets of data obtained in this
study. In line with the earlier study, the results here also suggest that the two L2 factors, L2 general vocabulary
size and L2 academic knowledge, are part of the same component. But the results in Figure 1 and Table 8 also
suggest that the other two variables investigated in the study, IQ and L1 vocabulary size, are included in the same
component and are also, in some way, measuring the same construct.
Perhaps it should not be a surprise if all three of the language related variables form part of the same
component. L1 and L2 vocabulary size have been demonstrated to correlate closely among native Arabic
speakers who use English as a foreign language (e.g., Masrai, 2015). But there are suggestions at the level of
theory too, for example Cummins' Common Underlying Proficiency ideas (Cummins, 2000), that L1 and L2
vocabulary size should be related. There may be a general language ability factor at play here. However, it is not
so clear why IQ scores should form part of the same factor. The tests used in this study have been deliberately
chosen to be non-verbal assessments with the intention that this would avoid potential interference from language
knowledge and ability. The tests used are abstract reasoning tasks which involve completing a pattern or figure
with a part missing by choosing the correct missing piece from among six alternatives. However, there is some
evidence that these types of reasoning task can function well in predicting language learning aptitude in young
children (Milton & Alexiou, 2006), and may pick up on the ability to infer rules and structures in language. It
must be noted that all four variables correlate quite strongly with each other and there is a long-standing
tradition that a wide variety of variables can all fall under a single general intelligence factor as in Spearman’s G
factor (Spearman, 1927). Nonetheless, this idea that the four variables may all be part of a single factor need not
detract from the evidence of the regression analyses which suggests the four variables investigated here interact
with academic performance as measured by GPA in slightly different ways and that a unique contribution to
GPA for each of them can be found.
Conclusions
The attempt to use several factors to predict and explain academic performance has produced results which are
very encouraging. The combined model of four variables in this study can predict nearly two-thirds of variance
in academic performance as measured by GPA, stronger than any individual factor. This suggests greater
predictiveness than most other studies even where several factors are combined in a predictive model (e.g.
Townsend et al., 2012; Daller & Yixin, 2016; Roche & Harrington, 2013). This may be the result of the
particular circumstances of the learners, and of the staff who provided the marks for GPA, in this study. The
bulk of the explanatory power is provided by L2 knowledge factors but the regression analyses suggest it is
possible to identify a unique, if sometimes marginal, contribution to variance in GPA scores for all the factors
investigated here.
The L2 general vocabulary and L2 academic vocabulary scores are strongly correlated, and it is difficult to
decide how independently these two variables function. Our best interpretation of the results is to confirm
Townsend et al.’s (2012) conclusion that knowledge of academic words provides some unique, albeit marginal,
variance to general academic success as measured by GPA, in addition to general vocabulary size. A focus on the
AWL in teaching, within this interpretation, appears a useful element of any English for academic purposes
course, provided it is implemented within the context of an overall programme of vocabulary development for
learners to reach the size of lexicon necessary for fluent language use.
The factor analysis suggests all the variables here are closely related, and here our best interpretation is
that there may be a general language ability factor at play which is linked to other factors, identified in other
studies, like IQ. Even though these factors appear closely related, the use of multiple tests in combination appears
to have potential for identifying learners at risk of academic failure. It may be possible to provide language
support for students at risk. The prominence of L2 vocabulary knowledge in predicting academic success
suggests that a wider use of vocabulary size tests speci,cally, in the acceptance process for learners at school or
university, could help improve the selection process and ensure those entering education and studying through
the medium of English as a foreign language have the skills to succeed academically.
While these results are encouraging, it must also be noted that this is a single study, drawing learners from a
homogeneous L1 Arabic-speaking background, with results drawn from two institutions in Saudi Arabia. Further
research is needed with larger samples, learners from different L1s, and including groups from different
disciplines, to con,rm the idea that combinations of factors can usefully predict students’ academic attainment.
References
Alderson, J. C. (2005). Diagnosing foreign language proficiency: The interface between learning and assessment. London:
Bloomsbury.
Astika, G. (1993). Analytical assessments of foreign students’ writing. RELC Journal, 24, 61–70.
Beglar, D., & Hunt, A. (1999). Revising and validating the 2000 word level and university word level vocabulary
tests. Language Testing, 16(2), 131–162.
Chen, Q., & Ge, C. (2007). A corpus-based lexical study on frequency and distribution of Coxhead’s AWL word
families in medical research articles. English for Specific Purposes, 26, 502–514.
Chung, M., & Nation, P. (2003). Technical vocabulary in specialized texts. Reading in a Foreign Language, 15, 103–
116.
Cobb, T., & Horst, M. (2004). Is there room for an AWL in French? In B. Laufer & P. Bogaards (Eds.), Vocabulary
in a second language: Selection, acquisition, and testing (pp. 15-38). Amsterdam: John Benjamins.
Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34, 213-238.
Cummins, J. (2000). Language, power and pedagogy: Bilingual children in the crossfire. Clevedon: Multilingual Matters.
Daller, M. H., & Phelan, D. (2013). Predicting international student study success. Applied Linguistics Review, 4(1),
173–193.
Daller, M., & Yixin, W. (2016). Predicting study success of international students. Applied Linguistics Review, (ahead
of print).
Harrington, M., & Carey, M. (2009). The online yes/no test as a placement tool. System, 37(4), 614−626.
Harrington, M. & Roche, T. (2014). Identifying academically at-risk students in an English-as-a-Lingua-Franca
university setting. Journal of English for Academic Purposes, 15, 37–47.
Jensen, A. R. (1998). The g factor: The science of mental ability. Westport, CT: Praeger.
Laidra, K., Pullmann, H., & Allik, J. (2007). Personality and intelligence as predictors of academic achievement:
A cross-sectional study from elementary to secondary school. Personality and Individual Differences, 42(3),
441-451.
Laufer, B. (1992). How much lexis is necessary for reading comprehension? In H. Bejoint & P. Arnaud (Eds.),
Vocabulary and applied linguistics (pp. 126-132). London: Macmillan.
Laufer, B. (1998). The development of passive and active vocabulary in a second language: Same or different?
Applied Linguistics, 19(2), 255-271.
Laufer, B. & Ravenhorst-Kalovski, G. (2010). Lexical threshold revisited: Lexical text coverage, learners’
vocabulary size and reading comprehension. Reading in a Foreign Language, 22(1), 15-30.
Lesaux, N., Kieffer, M., Faller, S. E., & Kelley, J. G. (2010). The effectiveness and ease of implementation of an
academic vocabulary intervention for linguistically diverse students in urban middle schools. Reading
Research Quarterly, 45, 196–228.
Masrai, A. (2015). Investigating and explaining the relationship between L1 mental lexicon size and organisation and L2
vocabulary development. Unpublished PhD thesis. Swansea University, UK.
Masrai, A., & Milton, J. (2012). The vocabulary knowledge of university students in Saudi Arabia. TESOL Arabia
Perspectives, 19(3), 13-20.
Masrai, A., & Milton, J. (2017). How many words do you need to speak Arabic? An Arabic vocabulary size test.
Language Learning Journal, (ahead of print).
Masrai, A., & Milton, J. (forthcoming). Measuring the contribution of academic and general vocabulary
knowledge to learners’ academic achievement. Journal of English for Academic Purposes.
Masrai, A., & Milton, J. (forthcoming). Frequency distribution of the words in AWL in BNC and BNC/COCA.
Milton, J. (2009). Measuring second language vocabulary acquisition. Bristol, UK: Multilingual Matters.
Milton, J. (2013). Measuring the contribution of vocabulary knowledge to proficiency in the four skills. In C.
Bardel, C. Lindqvist & B. Laufer (Eds.), L2 vocabulary acquisition, knowledge and use: New perspectives on
assessment and corpus analysis (pp. 57-78): EUROSLA monograph 2.
Milton, J., & Alexiou, T. (2006). What makes a good young language learner? In A. Kavadia, M. Joanopoulou &
A. Tsaggalidis (Eds.), New directions in applied linguistics (pp. 636-646). Thessaloniki, Greece: Aristotle University
of Thessaloniki.
Milton, J., & Fitzpatrick, T. (Eds.). (2014). Dimensions of vocabulary knowledge. Basingstoke, UK: Palgrave
Macmillan.
Milton, J., & Treffers-Daller, J. (2013). Vocabulary size revisited: The link between vocabulary size and academic
achievement. Applied Linguistics Review, 4(1), 151–172.
Milton, J., Wade, J., & Hopkins, N. (2010). Aural word recognition and oral competence in a foreign language. In
R. Chacón-Beltrán, C. Abello-Contesse & M. Torreblanca-López (Eds.), Further insights into non-native
vocabulary teaching and learning (pp. 83-98). Bristol: Multilingual Matters.
Mochida, A., & Harrington, M. (2006). The yes/no test as a measure of receptive vocabulary knowledge.
Language Testing, 23(1), 73–98.
Mudraya, O. (2006). Engineering English: A lexical frequency instructional model. English for Specific Purposes, 25,
235–256.
Nagy, W., & Townsend, D. (2012). Words as tools: Learning academic vocabulary as language acquisition.
Reading Research Quarterly, 47(1), 91-108.
Nation, I.S.P. (2001). Learning vocabulary in another language. Cambridge: Cambridge University Press.
Nation, I.S.P. (2004). A study of the most frequent word families in the British National Corpus. In P. Bogaards
& B. Laufer (Eds.), Vocabulary in a second language: Selection, acquisition and testing (pp. 3-13). Amsterdam:
John Benjamins.
Nation, I.S.P. (2006). How large a vocabulary is needed for reading and listening? The Canadian Modern Language
Review, 63(1), 59–81.
Neisser, U., Boodoo, G., Bouchard, T. J., Boykin, A. W., Brody, N., Ceci, S. J., et al. (1996). Intelligence: Knowns
and unknowns. American Psychologist, 51, 77–101.
Qian, D. (1999). Assessing the roles of depth and breadth of vocabulary knowledge in reading comprehension.
Canadian Modern Language Review, 56, 282–308.
Raven, J., Raven, J. C., & Court, J. H. (1998). Manual for Raven's Progressive Matrices and vocabulary scales:
Section 4 Advanced Progressive Matrices sets I and II, 1998 ed. Oxford, UK: Oxford Psychologists Press
Ltd.
Read, J. (2000). Assessing vocabulary. Cambridge: Cambridge University Press.
Roche, T., & Harrington, M. (2013). Recognition vocabulary knowledge as a predictor of academic performance
in an English as a foreign language setting. Language Testing in Asia, 3(1), 1-13.
Rohde, T. E., & Thompson, L. A. (2007). Predicting academic achievement with cognitive ability. Intelligence,
35(1), 83-92.
Saville-Troike, M. (1984). What really matters in second language learning for academic achievement? TESOL
Quarterly, 18(2), 199–219.
Schmitt, N., & Schmitt, D. (2014). A reassessment of frequency and vocabulary size in L2 vocabulary teaching.
Language Teaching, 47(4), 484-503.
Schmitt, N., Schmitt, D., & Clapham, C. (2001). Developing and exploring the behaviour of two new versions of
the Vocabulary Levels Test. Language Testing, 18(1), 55–88.
Spearman, C. (1927). The abilities of man. Oxford: Macmillan.
Stæhr, L. S. (2008). Vocabulary size and the skills of listening, reading and writing. Language Learning Journal, 36(2),
139–152.
Townsend, D., Filippini, A., Collins, P., & Biancarosa, G. (2012). Evidence for the importance of academic word
knowledge for the academic achievement of diverse middle school students. The Elementary School Journal,
112(3), 497-518.
West, M. (1953). A general service list of English words. London: Longman.
Zimmerman, K. (2004). The role of vocabulary size in assessing second language proficiency. Unpublished MA thesis.
Brigham Young University.
James Milton is Professor of Applied Linguistics at Swansea University, UK. He worked in Nigeria and in Libya
before coming to Swansea in 1985. A long-term interest in measuring lexical breadth and establishing normative
data for learning has produced extensive publications including Measuring Second Language Vocabulary Acquisition
(Multilingual Matters, 2009).
Shadan Roghani
Swansea University, UK
James Milton*
Swansea University, UK
Abstract
This paper reports an investigation into whether a test of productive vocabulary size using a category generation task can be
useful and effective. A category generation task is a simple task in which learners are asked to name as many words as they
can from a prescribed category such as animals or body parts. The virtue of this approach is that it potentially allows an
estimate of productive vocabulary size, comparable to receptive size estimates, to be made. Four such tasks were trialled on
92 learners ranging from elementary to advanced level. Subjects also took Nation’s Productive Vocabulary Levels Test
(PVLT) (2001) and Meara & Milton’s X-Lex (2003). The results suggest that category generation tasks can produce
vocabulary size estimates and these are comparable in size with PVLT and about one third of the size of a receptive
vocabulary size estimate (X-Lex). The tests appeared very reliable and can distinguish between learners of different levels of
performance. There are still issues to be resolved concerning the tasks which can be used and the volumes of vocabulary
they can potentially obtain. Factor analysis suggests the receptive and all the productive tasks test a single factor.
Key words: productive vocabulary, vocabulary size, category generation task, vocabulary assessment, frequency
vocabulary bands
Introduction
The acquisition of vocabulary knowledge, that is growing a lexicon of an appropriate size and quality, is crucial
to language learning success. Since it is an aspect of language knowledge which is so important, it would make
sense to measure and monitor its development among learners, and where this is done it appears that
measurements of knowledge can be very useful. So, for example, estimates of vocabulary size correlate well with
performance in all the language skills and in formal exams (e.g. Stæhr, 2008; Milton et al, 2010). Learners with
larger vocabularies tend to perform better than those with smaller vocabulary knowledge in these activities.
Approximate vocabulary sizes have been identified as requirements for passing formal exams such as Cambridge
FCE in English, and have been linked to hierarchies of communicative levels as in the CEFR (Milton, 2010;
Milton & Alexiou, 2009). Because vocabulary is so important, it is perhaps not surprising that students identify
shortcomings in their L2 vocabulary knowledge as a principal obstacle to comprehension (Laufer, 1989). The
importance of vocabulary is such that Long & Richards (2007, p.xii) suggest that ‘vocabulary can be viewed as
the core component of all the language skills.’
*Tel: (44) 1792 295678; E-mail: j.l.milton@swansea.ac.uk; Department of Applied Linguistics, Swansea University, Singleton
Park, Swansea SA2 8PP, UK.
While such measurements are clearly insightful, there is a feeling among academics that making them can be a
complicated business. Vocabulary knowledge, it seems, is multifaceted. It can include knowledge of both the
written and oral forms of words. It includes possessing a link between a word form and its meaning including the
associations which a word can carry and which can vary from one language to another. It can include a
knowledge of how words can combine into collocations and idioms, and a knowledge too of when, and when
not, to use some of these words. Vocabulary researchers usually make a distinction between vocabulary size or
breadth, the number of words a learner knows, and vocabulary depth, how well these words are known and how
well and idiomatically they can be used. It can also include making a distinction between receptive and
productive knowledge, an observation that goes back at least as far as Palmer (1921). Palmer identified a
difference between the words learners can recognise, what is called today receptive or passive vocabulary, and the
words a learner can use and communicate with, a sub-set of the receptively known words which learners can
readily call to mind for use in speech and writing and referred to today as productive or active vocabulary. He
suggested these different types of word knowledge should be assessed separately. Different tasks, it seems, appear
to activate different kinds of vocabulary knowledge (Webb, 2005), and different kinds of vocabulary knowledge
can impact on the different language skills. For example, Milton & Riordan (2006) observed that knowledge of
words in their oral form can be measured separately from word knowledge in written form and that oral word
recognition predicts success in speaking tests while the ability to recognise words in their written form does not.
These different dimensions and features of vocabulary knowledge cannot, of course, be entirely unrelated.
Possession of a large receptive vocabulary is a precondition of having a large productive vocabulary, for example,
and the various dimensions of knowledge generally correlate quite well with each other as is noted by Fitzpatrick
& Milton (2014). There are even arguments that suggest they can be collapsed into a single dimension of
vocabulary knowledge. Vermeer (2001) argues that breadth and depth are essentially the same construct. Meara
(1997) argues that automaticity in word production is a product of the number of links between words, a product
of depth therefore. Fitzpatrick & Milton (2014, p.177) in considering the strength of the inter-relationship
between the elements of vocabulary knowledge speculate that it may be possible, ‘through frequency (Ellis,
2002a; 2002b) to explain the driver behind all the aspects of knowledge in [Nation’s] table.’ Nonetheless,
multiple testing of vocabulary knowledge is often advocated so that a learner’s knowledge can be more fully
characterised (e.g. Nation, 2007; Richards & Malvern, 2007). While there seems general agreement that using
multiple tests is desirable it is not clear that this is actually done outside the realm of specialist researchers.
Perhaps this is because the standard tests of vocabulary are relatively few and are limited, largely, to testing
receptive vocabulary breadth. This paper is particularly concerned with assessing the potential of a test which
measures productive knowledge: how many words do learners have that they can easily activate and use for
communication? The hope is that this will make the process of multiple testing more practical.
There are several well recognised tests in the area of receptive vocabulary size, but well-established tests are
lacking in other areas of vocabulary knowledge such as productive vocabulary knowledge. Receptive vocabulary
size, or breadth, testing attempts to estimate how many words in the foreign language a learner can recognise,
and this type of testing is usually distinguished from vocabulary depth testing which attempts to assess how well
these words are known and whether they can be used appropriately. Receptive breadth tests have the advantage
in their creation that the writer can control the items being tested and make a principled selection of words from
which a good estimate of knowledge can be made. Both Nation’s Vocabulary Size Test (VST) (2012) and Meara
& Milton’s X-Lex (2003) work in this way and sample words across the frequency bands and this is used to form
an estimate of vocabulary size. These tests also have the advantage that they do not have to be customised to the
6rst language of the learners and can be quick to deliver and are easy to mark. Nation’s VST uses a multiple
choice format where the learners select a meaning for a test word from a choice of four explanations and where
the explanations are ‘in much easier language than the tested word’ (Nation, 2012, p.3). The checklist format in
Meara & Milton’s X-Lex is particularly minimalist, requiring only that the testee identifies words that they
recognise in a list, and the computer version of this takes only a few minutes to deliver and marks itself. With
both tests it appears relatively straightforward to produce parallel forms of the tests and the different forms are
reported to be equivalent (Nation, 2012; David, 2008).
However, even these tests have their drawbacks. Nation (2012) reports that VST may under-estimate where
learners are not motivated to perform on the test, but this could be said of any form of assessment. A more
serious consideration is the potential for the test to over-estimate where learners are prepared to use guesswork to
provide answers to words they do not know. The multiple choice format means that there is a one in four chance
of getting the right answer by guesswork and there appears to be no mechanism for recognising where this is
occurring and adjusting for it when it does occur. X-Lex does have such a mechanism and includes false words,
and, where the testee identifies these as known words, an arithmetic formula is applied and the score is reduced.
But X-Lex’s simple checklist method is also prey to potential problems especially in terms of dealing with
learners’ uncertainty over their knowledge of a word. This form of test takes no account of partial or incomplete
knowledge, and low level learners in particular are often unsure over things like spelling and may not, therefore,
be able to represent the knowledge that they have. Nonetheless, both tests are reported to be robust and reliable.
In an ideal world a test of productive vocabulary knowledge would have the good qualities of the receptive tests
and would be easy to use and capable of accessing a sufficient and principled sample of the learner’s vocabulary
from which to form a good estimate of size. Ideally it should be able to demonstrate good reliability so test and
retest scores, for example, should not differ significantly if there is no change in the vocabulary knowledge being
tested. It should be able to demonstrate the same kinds of construct validity that receptive tests have, as in the
ability to draw on a principled sample of words from across the frequency bands so that a good estimate of size
can be made. It should possess good concurrent validity and correlate appropriately with other scores of the
same or similar quality. So, a good productive test, if it is working well, should correlate with other tests of
productive vocabulary size and should probably correlate too, although perhaps less well, with receptive
vocabulary size which is generally considered a different although related construct.
Well recognised productive tests are harder to find than receptive tests. This may be because in many
productive tasks, the choice of words is that of the testee and this may prevent a useful sample of words being
created from which meaningful conclusions about vocabulary size or knowledge can be drawn. Thus, measures
of lexical diversity and sophistication (e.g. Meara & Bell, 2001, P-Lex) appear sensitive to genre (van Hout &
Vermeer, 2007) so the scores they produce may say more about the nature of the text rather than the lexicon
which produced it. These measures are also sensitive to length and a minimal length, usually several hundred
words, is needed before stable results are achieved (e.g. Meara & Bell, 2001). These approaches do not generally
produce an estimate of size but the exception to this is Meara & Miralpeix’s V-Size (2008) which analyses a
testee’s text and calculates the proportions of vocabulary occurring in five frequency bands to produce a curve.
This curve can then be compared with curves from other texts where the size of the writer’s lexicon is known and
an estimate of the testee’s lexical size can be made. Meara & Miralpeix’s initial conclusions are that this
approach is not sensitive to genre or to the length of text and that it can discriminate between learners of
different ability levels. The idea is an interesting one but our experience is that the scores it produces are rather
erratic and more work is probably needed to demonstrate the reliability of this approach.
Other approaches to productive vocabulary testing use controlled methods for eliciting knowledge. Laufer
& Nation’s Productive Vocabulary Levels Test (PVLT) (1999) takes a sample of words from the second, third, fifth
and tenth 1000 word frequency ranges, and from the university word list as the target vocabulary for their test.
Students are presented with a sentence giving context with the target word missing from the context, although
the initial letters of the target are provided. Testees fill in the missing word. This approach has the considerable
merit that its sample of words is directly equivalent to Nation’s receptive Vocabulary Levels Test (Nation, 2001)
and so productive and receptive scores ought to be directly comparable. The approach has been criticised,
however, in that the degree of contextualisation may be so great that it becomes a receptive test in another form
(Webb, 2008). This strikes at the heart of the issue in the creation of a test of productive vocabulary knowledge.
Productive performance requires some kind of prompt and there is no agreed construct of productive knowledge
to guide us as to how rich or minimal in contextualisation such a prompt should be. Webb (2008) considers a less
rich context in testing, therefore, and suggests the merit of a translation test where the testees are presented with a
prompt in their native language to elicit a translation into the foreign language target word. The approach is a
simple one which ought to allow the test writer to make the kind of sample of knowledge that an estimate of
vocabulary size could be drawn from. In terms of practicality, however, this approach will not be so
straightforward in, say, a class of learners from many different first language backgrounds and where multiple
different forms of the test will be needed. It seems there is still the opportunity for a convincing methodology to
emerge in this area to produce meaningful and useful estimates of productive vocabulary size which, like the
receptive tests described above, are simple enough to be used by learners from all language backgrounds and
with a simple enough prompt to avoid replicating a receptive test in another form.
The research presented in this paper aims to access and measure productive vocabulary size using a new
test format to see if category generation tasks can be a useful addition to testing in this area.
Research Questions
The intention in this study is to use four category generation tasks with EFL learners and to use the words that
testees produce to calculate estimates of productive vocabulary size which might be seen as equivalent to the
receptive vocabulary size estimates produced by X-Lex. The broad aim, therefore, is to examine whether these
estimates can be fairly described as believable, reliable and valid. Do category generation tasks have potential as
useful measures of vocabulary knowledge?
To achieve this broad aim we have set a number of specific research questions.
1. Is there a frequency effect in learning to suggest that a test targeted on the first five 1000 word bands is
appropriate in a productive test?
2. Does the test produce sufficient data for estimates of size to be made?
3. Do the scores from parallel forms of the test suggest that the test is reliable? Do they produce estimates
which are similar in size and which correlate?
4. Are the scores comparable with other equivalent tests of vocabulary size and knowledge: Laufer &
Nation’s PVLT (1999) and Meara & Milton’s X-Lex (2003)?
5. Are estimates on the test capable of distinguishing between learners at different levels of knowledge and
performance: beginner, intermediate and advanced levels?
6. Do these tests and PVLT access a single factor of knowledge, productive vocabulary size, and can this be
distinguished from the receptive vocabulary size measure, X-Lex?
Method
Participants
A total of 92 EFL learners were tested in a foreign language teaching institute in Iran. The learners came from
three different levels of knowledge: basic, intermediate and advanced, as categorised by the institute. The
92 learners comprised 43 male and 49 female participants, were aged between 15 and 40, and were distributed
among the three levels as shown in Table 1.
Table 1
Participant Levels
Level Basic Intermediate Advanced Total
Number 36 23 33 92
The Tests
Four category generation tasks were used: animals, clothes, body parts and furniture. These categories are described by
Izura et al. (2005, p.386) as ‘commonly used in cognitive, neuropsychological and linguistic research’ and which
proved capable of prompting considerable language output from the participants.
Laufer & Nation’s Productive Levels Test version C (Nation, 2001, p.425-428) was used as a second test of
productive vocabulary knowledge. Scores from versions of Nation’s VLT are widely used as a proxy for
vocabulary size (e.g. Stæhr, 2008). The entire test was not administered and only the 2,000, 3,000 and 5,000
levels were used. This was converted to a productive vocabulary size estimate out of 5,000 using the formula:
size = (2000 level score / 18) × 2000 + (3000 level score / 18) × 1000 + (5000 level score / 18) × 2000
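As an illustration only (the paper itself contains no code), this weighting can be sketched in Python. The function name is ours; the figure of 18 items per level comes from the denominators in the formula above.

```python
def pvlt_size_estimate(score_2000, score_3000, score_5000, items=18):
    """Scale each PVLT level score (out of 18 items) by the width of the
    frequency band it samples: 2,000 words for the 2,000 level, 1,000 for
    the 3,000 level, and 2,000 for the 5,000 level (estimate out of 5,000)."""
    return (score_2000 / items * 2000
            + score_3000 / items * 1000
            + score_5000 / items * 2000)
```

A learner with full marks at every level would thus be credited with the full 5,000 words.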
A paper version of Meara & Milton’s X-Lex (2003) was used as a second measure of vocabulary
knowledge. This version tests 20 words in each sample across each of the five most frequent 1000 word bands
taken from Hindmarsh (1980) and Nation (1984). The test contains a further 20 false words. Testees are required
to indicate if they know each of these words. Yes responses to the false words are taken to indicate that the testee
is over-estimating their knowledge and the score drawn from the Yes responses to the real words is adjusted
downwards accordingly.
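This adjustment can be illustrated with a short sketch. The text above specifies 100 real words and 20 false words; the particular constants assumed here (each real-word Yes contributing 50 points, each false alarm deducting a proportional 250-point penalty, with a floor at zero) are one plausible proportional implementation, not a statement of the exact published formula.

```python
def xlex_adjusted_score(yes_real, yes_false, n_real=100, n_false=20, max_size=5000):
    """Adjusted X-Lex estimate out of 5,000. Each Yes to a real word adds
    max_size / n_real points; each Yes to a false word deducts a
    proportional guessing penalty of max_size / n_false points."""
    raw = yes_real * (max_size / n_real)        # 50 points per real word
    penalty = yes_false * (max_size / n_false)  # 250 points per false alarm
    return max(0.0, raw - penalty)
```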
Procedure
The participants took the tests in class in the order: X-Lex, the generation tasks, and finally the PVLT. They were
given a booklet to record all their answers. Instructions were given orally in English. There was no time limit
imposed but all students completed the tasks within the 45 minutes of the class.
Analytical Procedure
The tests can be argued to have good construct validity if they can be shown to generate words across the first
five 1000 word frequency bands, and it is expected that frequency effects should be visible in the data produced
by students. Learners should score more in the higher frequency bands than the less frequent ones. If the
responses do not display this kind of frequency profile then this will undermine the potential for category
generation tasks, as we are using them, to provide a good estimate of size.
The number of words available for selection from each of the four categories, separated by the five 1000
word frequency bands (taken from the BNC/COCA lists), is shown in Table 2.
Table 2
Availability of Words in the First Five Frequency Bands Divided by Category.
1000 2000 3000 4000 5000 Total
Animals 6 6 15 14 15 56
Clothes 16 10 4 7 14 51
Body parts 24 12 10 17 10 73
Furniture 20 39 4 10 10 83
The words produced by learners from each of these bands are compared with the number of words
available for selection in each frequency band, and these figures are used to generate an estimate of knowledge
out of 5000. For example, if a learner were able to produce 28 of the 56 available words in the animal category
then it would be assumed that this represented productive knowledge of 50% of the 5,000 most frequent words
in English; a score of 2,500 words.
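The worked example above can be expressed as a small sketch, using the availability figures from Table 2 (names are illustrative):

```python
# Words available per category across the first five 1,000-word bands (Table 2)
AVAILABLE = {"animals": 56, "clothes": 51, "body parts": 73, "furniture": 83}

def category_size_estimate(category, words_produced, max_size=5000):
    """Scale the proportion of available category words a learner produced
    up to an estimate of productive knowledge of the 5,000 most frequent
    words in English."""
    return words_produced / AVAILABLE[category] * max_size
```

So a learner naming 28 of the 56 available animal words would be credited with 2,500 words, as in the example in the text.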
In testing this format’s reliability the results from the four categories can be used to generate a calculation
for Cronbach's Alpha. If the tests work well then the calculations generated by each test should correlate well and
the Alpha score should be high.
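The standard formula for Cronbach’s Alpha over the four task scores can be sketched as follows (illustrative code, not taken from the paper):

```python
from statistics import variance

def cronbach_alpha(task_scores):
    """Cronbach's Alpha for parallel tasks. task_scores is a list of score
    lists, one per task, with learners in the same order in each list."""
    k = len(task_scores)
    # Total score per learner across all tasks
    totals = [sum(per_learner) for per_learner in zip(*task_scores)]
    task_var = sum(variance(scores) for scores in task_scores)
    return k / (k - 1) * (1 - task_var / variance(totals))
```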
The category generation tasks can be argued to be valid if results correlate well when compared with
results from other tests of the same quality. It might be expected that they should correlate well with PVLT,
which tests the same construct of productive vocabulary knowledge. They should correlate too with X-Lex,
though perhaps not so well since X-Lex is, in theory, testing a slightly different construct. The tasks, if they are
producing useful estimates of productive vocabulary size, should also be able to distinguish between low level
learners and high level learners for example. It would be expected, too, that frequency effects should be visible.
Learners should score more in the higher frequency bands than the lower frequency bands.
Finally, it might be expected that if the category generation tasks and PVLT are testing the same quality of
productive vocabulary size then factor analysis and the calculation of eigenvalues will confirm that a single
factor underlies the results of all five tests. If receptive vocabulary knowledge is a separate and distinct construct
then these calculations should show that a second factor underlies the X-Lex scores.
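The eigenvalue step can be sketched as below (an illustrative numpy sketch, not the software used in the study): a single eigenvalue above 1, with the remainder well below it, is the usual sign that one factor underlies all the tests.

```python
import numpy as np

def correlation_eigenvalues(test_scores):
    """Eigenvalues of the inter-test correlation matrix, largest first.
    test_scores is array-like with one row per test and one column per
    learner (learners in the same order in every row)."""
    corr = np.corrcoef(np.asarray(test_scores, dtype=float))
    return sorted(np.linalg.eigvalsh(corr), reverse=True)
```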
Results
Table 3
Total Responses by Frequency Band
1000 2000 3000 4000 5000
Animals 316 216 168 262 229
Clothing 455 292 88 127 205
Furniture 501 449 210 47 40
Body parts 508 369 104 103 124
[Figure 1. Mean vocabulary size scores (max 1,000) across the five 1000 word frequency bands.]
Table 3 and Figure 1 demonstrate a visible frequency effect, with the bulk of learners’ vocabulary
knowledge lying in the most frequent 1000 and 2000 word bands. Beyond this mark the frequency effect is no
longer visible. Nonetheless, productive vocabulary knowledge resembles receptive vocabulary knowledge, with
the presence of the frequency profile as suggested by Ellegård (1960) and Meara (1982), and as observed in
Waring (1997). The implication of this is that the category generation tasks are capable of providing a
characterisation of the scale of a learner’s productive vocabulary size. Since such an estimate is similar in its
calculation to a test such as X-Lex, which also draws its estimate from these frequency bands, this should allow
productive and receptive vocabulary size to be meaningfully compared.
There are no normalised figures for the size of productive vocabularies for learners at the levels in this study, and
the significance of these figures can only become apparent when compared with results from the other tests.
Table 4
Mean Word Knowledge by Category Generation Task
Mean productive vocabulary size SD
animals 1155.86 386.90
clothing 1243.61 380.38
furniture 790.99 243.96
body parts 1026.84 357.88
There are several reasons why the tasks used here might vary in the scores they produce. One is that the
topics, taken from the literature on testing in psychology, have not been chosen with EFL testing specifically in
mind. However, they are thematic areas which are typically contained in teaching texts for young and beginner
learners of EFL although we have no way of knowing exactly what lexis is contained in these teaching texts nor
how the treatment of this lexis may vary from one theme to another in terms of presentation and recycling. It is
conceivable that these differences in measured knowledge may accurately reflect differences in the presentation
of the material, and this might challenge the usefulness of this approach as a quick and easily replicable method
of generating consistent measures of productive vocabulary size. A second is that the size of the estimate may vary
according to the theme chosen for testing and not just the overall vocabulary knowledge of the learner. A third
possibility is that these differences may be related to the size of the category generation task itself. Thus, the
furniture category which has the largest number of words available for production produces the smallest size
estimate, and clothing which has the smallest number of words available produces the largest estimate. It is also
quite possible, however, that these differences are the by-product of different task forms and different
administrations, where some variation in scores is inevitable even in well-constructed and regulated tests. Nation’s
14,000 word multiple choice test, for example, has parallel forms which in trials, he reports (2012, p.5), produce
different scores.
These differences in the means between all four category generation tasks are statistically significant, and
the results of t-test and Cohen’s D comparisons are given in Table 5. If parallel forms of this task consistently
produce scores which are different then this challenges the validity of the testing method and the usefulness of
the technique as a method for quickly and easily assessing productive vocabulary size. However, the Cohen’s D
calculations show that the effect size is highly variable. It is not yet clear, therefore, whether these differences do
challenge the test’s validity in this way or are simply part of the kind of variation which repeated testing produces
and which Nation (2012), for example, reports in relation to receptive vocabulary size testing.
Table 5
T-test Comparisons between the 4 Category Generation Tasks
Clothes Test Furniture Test Body Test
t-score Cohen’s D t-score Cohen’s D t-score Cohen’s D
Animal Test 2.617** 0.223 11.502** 1.13 3.511** 0.35
Clothes Test 17.836** 1.42 9.607** 0.79
Furniture Test 8.708** 0.59
Note. ** = significant at the 0.01 level
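The comparisons in Table 5 can be reproduced with sketches of the two statistics. The pooled-variance form of Cohen’s D and the paired form of the t-test are our assumptions; the paper does not state which variants were used.

```python
from math import sqrt
from statistics import mean, stdev

def cohens_d(a, b):
    """Cohen's d using a pooled standard deviation (assumed variant)."""
    na, nb = len(a), len(b)
    pooled = sqrt(((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2)
                  / (na + nb - 2))
    return (mean(a) - mean(b)) / pooled

def paired_t(a, b):
    """Paired-samples t statistic, appropriate here since every learner
    took all four category generation tasks."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
```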
Reliability Calculations
There are moderate to good correlations between scores on the four category generation tasks. All correlations
are statistically significant at the 0.01 level. The figures are shown in Table 6.
Table 6
Category Task Inter-test Correlations
Clothes Test Furniture Test Body Test
Animal Test .554** .618** .649**
Clothes Test .688** .830**
Furniture Test .781**
Note. ** = significant at the 0.01 level
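The inter-test correlations in Table 6 are ordinary Pearson correlations between learners’ scores on pairs of tasks, which can be sketched as (illustrative code):

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two sets of task scores, one value per
    learner, with learners in the same order in both lists."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den
```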
The Body parts task scores correlate particularly well with both the Furniture and the Clothes task scores, while
the Animals task scores correlate least well with the others. This observation might be connected to the number
of words available for production in these tests. The Body parts and Furniture tasks have the highest number of
words in the 5,000 word bands, 73 and 83 words respectively, while the Animal task has only 56 words. For
comparison, it might be considered that the receptive X-Lex test samples 100 words from this 5,000 word range,
and in the Animal and Clothing tasks there are only about half this number available for production. The
reliability of the task might be influenced by the sampling rate and, as a general rule, a larger sample is likely to
produce a more useful estimate. However, in this type of task a very large sample may challenge the immediate
recall ability of the learner and lead to under-estimation. A thematic prompt where there are 20 words available
from the 5,000 word range under examination is an achievable task, but a similar task with 2,000 words is not.
The impact of the potential sample size available from different themes and tasks is something to be investigated.
The calculation of Cronbach's Alpha using the 4 parallel forms of the productive task can be taken as an
indication of the degree to which these tests measure a single construct. The Cronbach's Alpha result was .885
(N = 4). Notwithstanding potential difficulties with individual category tasks and their sampling rates, the score
of .885 is good and can be taken as confirmation that these tasks can produce results which are both reliable and
consistent.
Table 7
Mean Productive Vocabulary Size Scores by Level
Level animals clothes furniture body parts
mean sd mean sd mean sd mean sd
Beginner 910 343 745 200 606 176 940 209
Intermediate 1102 298 995 223 754 114 1185 195
Advanced 1461 265 1357 289 1019 182 1616 296
The productive size scores generated by all four tasks increase with the level of the students, as is expected.
The advanced group of learners produce in each task, on average, more words from the 5,000 word frequency
ranges than the intermediate level students who, in turn, produce more words on average than the students
at the beginner level. An ANOVA confirms that this relationship is statistically significant and the results are
shown in Table 8. Tukey tests confirm that there are statistically significant differences between the means at all
levels in all tests. The ability of these tasks to discriminate meaningfully between learners at different levels of
knowledge and performance supports the construct behind the test and suggests this technique is valid.
Table 8
ANOVA Scores from the Category Generation Tasks
test degrees of freedom F Sig
animals between groups 2 28.439 < .001
within groups 89
clothes between groups 2 55.648 < .001
within groups 89
furniture between groups 2 54.277 < .001
within groups 89
body parts between groups 2 68.634 < .001
within groups 89
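The F ratios in Table 8 come from a standard one-way ANOVA across the three level groups; 92 learners in 3 groups give the degrees of freedom 2 and 89 reported above. A minimal sketch of the F calculation (illustrative data, not the study's scores):

```python
def one_way_anova(groups):
    """One-way ANOVA over a list of score lists, one per group.
    Returns (F, df_between, df_within)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    # Between-groups and within-groups sums of squares
    ss_between = sum(len(g) * ((sum(g) / len(g)) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    df_b, df_w = k - 1, n - k
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w
```

The significance value would then be read from the F distribution with (df_between, df_within) degrees of freedom, and pairwise Tukey comparisons would follow as a post-hoc step.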
Table 9
PVLT Scores Divided by Level
n mean sd
Beginner 36 982 745
Intermediate 23 910 717
Advanced 33 2124 1348
Total 92 1338 1138
Table 10
X-Lex Scores Divided by Level
n mean sd
Beginner 36 3084 845
Intermediate 23 2737 567
Advanced 33 3685 790
Total 92 3213 847
Correlations between PVLT and X-Lex scores, and the scores on the four category generation tasks are
shown in Table 11.
Table 11
Correlations between Category Generation Task Scores and PVLT and X-Lex Scores
PVLT X-Lex
Animals test 0.494** 0.362**
Body parts test 0.408** 0.424**
Furniture test 0.344** 0.324**
Clothes test 0.481** 0.353**
Both PVLT scores and X-Lex scores indicate, broadly, that the vocabulary size of the learners increases
with level, as might be expected, and this is confirmed by ANOVAs (PVLT F(2,89) = 14.539, sig < .001; X-Lex
F(2,89) = 11.237, sig < .001). Tukey tests, however, indicate that neither test is able to produce a statistically
significant difference in the means between the Beginner and Intermediate students. The category generation
tasks were capable of doing this, and one interpretation is that the category generation tasks are better
able to distinguish levels of knowledge among lower level learners than the other tests. PVLT produces an
estimate of size which is slightly larger than the estimates produced by the category generation tasks. An analysis
of variance used to calculate effect size suggests a moderately large effect size, but this result is not statistically
significant (F(2,89) = 5.312, sig = .171). This may be a product of the different methodologies and knowledge being
accessed. PVLT provides quite extensive context and a letter cue for each test word where the category
generation tasks do not. Or it may be an outcome of the formula for turning PVLT scores into a size estimate,
where not all frequency bands are tested and knowledge in these missing bands has to be inferred from
knowledge elsewhere. The difference between the means for the PVLT and the largest scoring category task,
Clothes, is not statistically significant. The difference between the means for PVLT and Animals is significant
only at the .05 level (t = 2.077, sig = .041). There are significant differences between PVLT
and the means for the other two tests (Furniture t = 3.138, sig = .002; Body parts t = 5.177, sig < .001).
X-Lex produces a larger estimate of vocabulary size than either the category generation tasks or PVLT. X-
Lex is a receptive vocabulary size test, and it is expected that receptive size estimates will be larger than
productive size estimates. An analysis of variance used to calculate effect size produces a result that is not
statistically significant (F(2,89) = 1.016, sig = .622). In reviews of the literature in this area, Milton (2009), Nation
(1990) and Schmitt (2000) report that the difference between these scores varies but that, typically, receptive sizes
are about double productive sizes. In this study the scores suggest that the productive size estimates are
between one third and one half of the size of the receptive estimates, and the relationship is summarised in Figure
2. This figure suggests that while the five productive size mean scores can be distinguished statistically, they are
of similar scale and in the right kind of proportion in relation to receptive vocabulary size. It may be that refining
the category generation tasks can make them perform more consistently in producing more similar size estimates.
Factor Analysis
Since PVLT and the four category generation tasks are all designed to access productive vocabulary knowledge
and produce estimates of productive size, it is expected that factor analysis should reveal a single factor
underlying the scores. Factor analysis and the calculation of Eigen values allows this to be investigated. The scree
plot (Figure 3) and component matrix (Table 12) suggest that this is the case. The scree plot identi6es only one
component with a score above 1. The component matrix indicates that the four category generation tasks all
correlate well with this factor while the correlation produced with PVLT is smaller but still satisfactory.
Table 12
Component Matrix for Productive Vocabulary Size Tests
Component 1
Animals .806
Clothing .865
Furniture .854
Body parts .928
PVLT .626
It is expected, too, that when the five productive tests and X-Lex are compared, more than one factor
should be visible, since X-Lex is designed to access a different construct from the others and receptive
knowledge is considered to be qualitatively and quantitatively different from productive knowledge. It is not clear
from the factor analysis that this is visible. The scree plot (Figure 4) and component matrix (Table 13) suggest that
a single factor underlies the scores in all six tests, even if X-Lex, like PVLT, correlates less well with this single
factor than the category tasks. The implication of this is that receptive and productive knowledge scores are all,
largely, explained by just one factor. We presume this is vocabulary size, but it could be other things. It could be a
general vocabulary knowledge factor or it could be something non-linguistic like intelligence.
It is fashionable to think of vocabulary as multidimensional, but these results suggest that one of the oldest
divisions of vocabulary knowledge, receptive and productive knowledge, may not be quite the division that is
thought. Of course, receptive and productive knowledge cannot be completely unrelated. A condition of having
a large productive vocabulary is having a large receptive vocabulary; it is presumably
impossible to produce words meaningfully in a foreign language that are not even recognised as words. In
principle, it should be possible for the reverse to be true and for a large number of words to be recognised even if
knowledge is so limited that they cannot be activated and used. However, our interpretation of the factor
analysis, and of the correlations between the productive and receptive tests, is that in practice productive
knowledge tends to grow with receptive knowledge. Co-linearity is a feature of the studies which compare
vocabulary size with automaticity in production (Schoonen, 2010).
Table 13
Component Matrix All Vocabulary Size Tests
Component 1
Animals .794
Clothing .856
Furniture .828
Body parts .902
PVLT .669
X-Lex .599
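The loadings in Tables 12 and 13 are what a principal-components extraction from the inter-test correlation matrix would produce: each loading is the test's weight on the leading eigenvector, scaled by the square root of the leading eigenvalue. A rough, dependency-free sketch using power iteration (an illustration of the technique, not the software used in the study; it assumes the leading eigenvalue is unique):

```python
def first_component(corr):
    """Leading eigenvalue and component loadings of a correlation
    matrix via power iteration."""
    k = len(corr)
    v = [1.0] * k
    for _ in range(200):
        # Repeatedly apply the matrix and renormalise: v converges to
        # the eigenvector of the largest eigenvalue.
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(k))
                     for i in range(k))
    loadings = [eigenvalue ** 0.5 * x for x in v]
    return eigenvalue, loadings
```

The "score above 1" criterion in the scree plot then amounts to retaining only components whose eigenvalue exceeds 1.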
Conclusions
What can we conclude from this? It is possible to make a case that the category generation task is, potentially, a
useful test format which can measure, and put a size on, productive vocabulary knowledge. The tests have proven
reliable and, in certain ways, valid. The category generation task triggers learners at all levels to produce a large
number of words with minimum direction or interference from the teacher or a text. It is able to target a
predictable range of words in the frequent vocabulary bands so that a workable estimate of productive
knowledge can be formed, and these estimates correlate reasonably well with each other so the Alpha score is
high. It distinguishes between low, intermediate and high level learners well, arguably rather better than PVLT
or X-Lex. It correlates, although modestly, with other tests of productive and receptive vocabulary knowledge,
and this suggests that teaching effects may not be significantly affecting the ability of the technique to make a
good estimate of size. It also consistently produces scores which are smaller than receptive vocabulary size
estimates, which makes sense. It is a very easy format that requires very little adaptation to work across learners from different
language backgrounds, and it may be particularly useful in assessing knowledge among very low level learners.
This type of test for productive vocabulary size seems to have potential, therefore, but this study has raised
questions about the use of the technique and the estimates it creates which need to be investigated more
thoroughly.
One is that the separate scores from the different category generation tests and the PVLT all produce
different mean scores and, with one exception, the differences are sufficiently great to be statistically significant.
Parallel forms which give a stable size estimate are necessary if the test is to perform like the receptive tests of
vocabulary size and be capable of being used as a standard test in this area. Nonetheless, the scores that are
produced are all about one third of the estimate of receptive vocabulary knowledge, and that ties in with other
studies in the literature which compare receptive and productive vocabulary knowledge. It has already been
noted that parallel test forms rarely produce identical scores, but what should be made of the scale of variation
seen here is, as yet, unclear. As Meara (2009) points out, the words produced for assessment in productive tasks
are dependent on the task, the genre and the prompt itself so, perhaps, a range of scores is what we should be
seeing if students are responding to a range of tasks even if their vocabulary remains unchanged. The construct
of productive vocabulary would benefit from a more precise specification to help us work through these
difficulties.
It has to be noted too that this is just one study based on learners with one language background and in
one country. It would make sense to repeat this form of testing on other learners with different learning and
language backgrounds as a check to see that the technique is applicable beyond learners in Iran.
There are issues too with the prompts used in this study, which are a small group of prompts drawn from
the psychology literature. These prompts were chosen not least because they are also areas typically covered in
young learner syllabuses. But this may make the scores they produce potentially misleading, since words drawn
this way may also challenge the underlying idea that a good estimate of size is made by using a random sample
of words across the frequency bands. A sample that draws on the subject areas that we know learners have
covered is not a random sample. The effect of such a choice of prompt also needs to be clarified, although it is
not clear from this study that any effect that does exist is very great.
The sampling across the frequency bands produced by these prompts yields a workable selection from
which an estimate can be made. But it is notable that the selections this produced are of different sizes and not
evenly spread across the frequency bands. The effect on the estimate this produces will need to be measured and
appraised. Given the issues which may surround the size of the potential sample a thematic prompt can produce,
it would also make sense to repeat this work with other prompts. It would make sense to investigate prompts
capable of producing larger samples in order to test the effect of this on the size of the estimate. Large prompts
seem likely to produce smaller estimates. It would be useful to know at what levels the estimates appear less than
useful. It would make sense, too, to investigate prompts capable of producing better and more equally sized
samples. This would seem likely to help control for the variation in scores produced by the four tests used in this
study. This would require the use of themes other than the four used in this study, which were, in any case, taken
from psychology. If the methodology is to prove useful in EFL then a wider variety of themes, perhaps more
directly applicable to EFL testing, might be appropriate. It might even be useful to test the use of other prompts
such as letters of the alphabet rather than thematic cues, although in the psychology literature these appear to
work rather differently.
Finally, the factor analysis raises an unexpected question, since it appears that the productive and receptive
vocabulary knowledge measures used here are not the separate constructs they are generally portrayed as but are all
tapping into a single factor which may be some general vocabulary knowledge or size. Maybe that should not be
surprising, since the various dimensions of vocabulary knowledge ought to be connected. The ability to produce a
word has as a precondition that the word is known receptively, so it follows that a large productive vocabulary
knowledge must be associated with a large receptive score. High productive and low receptive scores ought to be
impossible if the construct of the lexicon is as we understand it, and the tests we use to access knowledge are
working tolerably well. The opposite may potentially be true, where a high receptive knowledge might be
associated with a small productive knowledge, but it is hard to imagine the circumstances of teaching and
learning that might produce a very highly disparate set of scores. The common acceptance of the idea of multi-
dimensionality in vocabulary knowledge and the need for multiple testing, therefore, should not blind us to the way
these dimensions necessarily interconnect. Our interpretation of the factor analysis in this study is that for most
practical purposes, the need for multiple testing in vocabulary is probably not as important as is thought.
Multiple testing may be useful in the research community, but it seems as though for most practical purposes a
single well-constructed test is likely to give a good impression of all aspects of vocabulary knowledge.
This study suggests that in its present form the test would be useful in schools in order to generate an
estimate of size so learners can be ranked or compared on their productive knowledge. Where a productive test
in particular is wanted, this will likely work well. However, it is not yet in a state where parallel forms can be
generated and a stable estimate of size produced and used, as receptive vocabulary size tests are, in
research or to link with other factors of language performance like exam performance.
References
Cobb, T. (2014). http://www.lextutor.ca/ (accessed 31st August 2014).
David, A. (2008). Vocabulary breadth in French L2 learners. Language Learning Journal, 36(2), 167-180.
Ellegård, A. (1960). Estimating vocabulary size. Word, 16, 219-244.
Ellis, N. C. (2002a). Frequency effects in language processing: A review with implications for theories of
implicit and explicit language acquisition. Studies in Second Language Acquisition, 24(2), 143-188.
Ellis, N. C. (2002b). Reflections on frequency effects in language processing. Studies in Second Language Acquisition,
24(2), 297-339.
Fitzpatrick, T. and Milton, J. (2014). Reconstructing vocabulary knowledge. In Milton, J. and Fitzpatrick, T. (eds.)
Dimensions of Vocabulary Knowledge (pp. 173-177). Basingstoke: Palgrave.
Hindmarsh, R. (1980). Cambridge English Lexicon. Cambridge: Cambridge University Press.
Izura, C., Hernández-Muñoz, N. and Ellis, A. (2005). Cognitive norms for 500 Spanish words in five semantic
categories. Behavior Research Methods, 37(3), 385-397.
Laufer, B. (1989). What percentage of text is essential for comprehension? In Lauren, C. and Nordman, M. (eds.)
Special Language: From Humans Thinking to Thinking Machines (pp. 316-323). Clevedon: Multilingual Matters.
Laufer, B. & Nation, P. (1999). A vocabulary size test of controlled productive ability. Language Testing, 16(1), 33-
51.
Long, M. and Richards, J. (2007). Series Editors' Preface. In Daller, H., Milton, J. and Treffers-Daller, J. Modelling
and Assessing Vocabulary Knowledge (pp. xii-xiii). Cambridge: Cambridge University Press.
Meara, P. (1982). Word association in a foreign language: a report on the Birkbeck vocabulary project. Nottingham
Linguistic Circular, 11, 29-37.
Meara, P. (1997). Towards a new approach to modelling vocabulary acquisition. In N. Schmitt and M.
McCarthy (Eds.) Vocabulary: Description, Acquisition and Pedagogy (pp. 109-121). Cambridge: Cambridge
University Press.
Meara, P. (2009). Connected Words: Word associations and second language vocabulary acquisition. Amsterdam: John
Benjamins.
Meara, P. and Bell, H. (2001). P-Lex: A simple and effective way of describing the lexical characteristics of short
L2 texts. Prospect, 16(3), 323-337.
Meara, P. and Milton, J. (2003). The Swansea Levels Test. Newbury: Express.
Meara, P. M. and Miralpeix, I. (2008). Vocabulary size estimations: V_Size. Paper presented at the 41st Annual
Meeting of the British Association for Applied Linguistics (BAAL), Swansea, UK.
McKinney, K. L. (2009). Lexical Errors Produced During Category Generation Tasks by Bilingual Adults and Bilingual Typically
Developing and Language-Impaired Seven to Nine-Year-Old Children. Unpublished MA thesis, The University of
Texas at Austin.
Milton, J. (2007). Lexical profiles, learning styles and the construct validity of lexical size tests. In Daller, H.,
Milton, J. and Treffers-Daller, J. (eds.) Modelling and Assessing Vocabulary Knowledge (pp. 47-58). Cambridge:
Cambridge University Press.
Milton, J. (2009). Measuring second language vocabulary acquisition. Bristol: Multilingual Matters.
Milton, J. (2010). The development of vocabulary breadth across the CEFR levels. In Vedder, I., Bartning, I. &
Martin, M. (eds.) Communicative proficiency and linguistic development: Intersections between SLA and language testing
research (pp. 211-232). Second Language Acquisition and Testing in Europe Monograph Series 1.
Milton, J. & Alexiou, T. (2009). Vocabulary size and the Common European Framework of Reference for
Languages. In Richards, B., Daller, M., Malvern, D., Meara, P., Milton, J. & Treffers-Daller, J. (eds.)
Vocabulary Studies in First and Second Language Acquisition (pp. 194-211). Basingstoke: Palgrave.
Milton, J. & Riordan, O. (2006). Level and script effects in the phonological and orthographic vocabulary size of
Arabic and Farsi speakers. In Davidson, P., Coombe, C., Lloyd, D. and Palfreyman, D. (eds) Teaching and
Learning Vocabulary in Another Language (pp. 122-133). UAE: TESOL Arabia.
Milton J., Wade, J. & Hopkins, N. (2010). Aural word recognition and oral competence in a foreign language. In
Chacón-Beltrán, R., Abello-Contesse, C. & Torreblanca-López, M. (eds.) Further insights into non-native
vocabulary teaching and learning (pp. 83-98). Bristol: Multilingual Matters.
Nation, I.S.P. (ed.) (1984). Vocabulary Lists: Words, affixes and stems. Wellington, New Zealand: Victoria University
English Language Institute.
Nation, I.S.P. (1990). Teaching and Learning Vocabulary. Boston: Heinle and Heinle.
Nation, I.S.P. (2001). Vocabulary Levels Test. In Nation, I.S.P. (2001) Learning Vocabulary in Another Language (pp.
416-424). Cambridge: Cambridge University Press.
Nation, I.S.P. (2007). Fundamental issues in modelling and assessing vocabulary knowledge. In Daller, H.,
Milton, J. & Treffers-Daller, J. (Eds.) Modelling and Assessing Vocabulary Knowledge (pp. 33-43). Cambridge:
Cambridge University Press.
Nation, I.S.P. (2012). Vocabulary Size Test instructions at http://www.victoria.ac.nz/lals/about/staff/paul-nation
(accessed 31st August 2015).
Palmer, H.E. (1921). The Principles of Language Study. London: Harrap.
Richards, B.J., & Malvern, D.D. (2007) Validity and threats to the validity of vocabulary measurement. In Daller,
H., Milton, J. & Treffers-Daller, J. (Eds.) Modelling and Assessing Vocabulary Knowledge (pp. 79-92).
Cambridge: Cambridge University Press.
Schmitt, N. (2000). Vocabulary in Language Teaching. Cambridge: Cambridge University Press.
Schoonen, R. (2010). The development of lexical proficiency knowledge and skill. Paper presented at the
Copenhagen Symposium on Approaches to the Lexicon, Copenhagen Business School, 8-10 December
2010. Accessed at https://conference.cbs.dk/index.php/lexicon/lexicon/schedConf/presentations on
03.03.2011.
Stæhr, L.S. (2008). Vocabulary size and the skills of listening, reading and writing. Language Learning Journal, 36(2),
139-152.
Tinkham, T. (1997). The effects of semantic and thematic clustering on the learning of second language
vocabulary. Second Language Research, 13(2), 138-163.
van Hout, R. & Vermeer, A. (2007). Comparing measures of lexical richness. In Daller, H., Milton, J. & Treffers-
Daller, J. (Eds.) Modelling and Assessing Vocabulary Knowledge (pp. 95-115). Cambridge: Cambridge University
Press.
Vermeer, A. (2001). Breadth and depth of vocabulary in relation to L1/L2 acquisition and frequency of input.
Applied Psycholinguistics (2001) 22, 217-234.
Waring, R. (1997). Comparison of the receptive and productive vocabulary knowledge of some second language
learners. Immaculata: The Occasional Papers of Notre Dame Seishin University, 1997, 94-114.
Webb, S. (2005). The effects of reading and writing on word knowledge. Studies in Second Language Acquisition, 27,
33-52.
Webb, S. (2008). The effects of context on incidental vocabulary learning. Reading in a Foreign Language, 20(2), 232-
245.
James Milton is Professor of Applied Linguistics at Swansea University, UK. He worked in Nigeria and in Libya
before coming to Swansea in 1985. A long-term interest in measuring lexical breadth and establishing normative
data for learning has produced extensive publications including Measuring Second Language Vocabulary Acquisition
(Multilingual Matters, 2009).
Hedy McGarrell*
Brock University, Canada
Abstract
This study reports on the analysis of a widely used "General English" textbook to explore the relationship between lexical
bundles included in the text and lexical bundles identified in relevant corpora, to determine the appropriateness of the text's
vocabulary in relation to its stated objective. Appropriateness is examined through the analysis of usefulness and functions,
and the relationship between the two, by comparing the usefulness scores of various functions. The results show a relatively
low level of usefulness of the lexical bundles in the textbook, meaning low frequency and small range of usage for the
analysed items. The function analysis showed that the textbook includes all the functions. The most common function was
referential, followed by stance, special conversational, and discourse organizing functions. The current study offers an initial
step for future research on lexical bundles, their functions, and usefulness in language teaching and teaching materials
development; specifically, it suggests a possible methodology to be used in such research. Moreover, the results of this study
provide insights into the value of lexical bundles in teaching and the development of teaching materials.
Introduction
Textbooks in second or foreign language learning programs are typically the main or even sole source of
vocabulary input for learners in classroom contexts and thus have a major impact on the vocabulary learners
encounter (McDonough, Shaw & Masuhara, 2012; Neary-Sundquist, 2015). However, researchers, teachers
and their learners have repeatedly questioned whether the language included in these textbooks reflects the
language used in real life situations (Biber & Reppen, 2002). Increasingly, studies show that language in use, as
captured in corpora, and the language of teaching materials are often at odds (Gabrielatos, 2006;
Koprowski, 2005; Meunier & Gouverneur, 2009; Shortall, 2007). The limited availability of suitable techniques for
analysis may have prevented more extensive research in the past, but increasingly corpus linguistics, with its large
data banks of naturally occurring text, provides a promising way of investigating such questions. One such
technique involves the analysis of multi-word combinations that co-occur repeatedly within the same register in
native speaker usage but are not typically fixed nor structurally or semantically complete (Csomay, 2013). Conrad
& Biber (2004) show that approximately 20 percent of the words (tokens) in written academic
*Tel. 1 905 688 5550, ext. 3757. Email: hmcgarrell@BrockU.CA, Department of Applied Linguistics, Brock University, 1812
Sir Isaac Brock Way, St. Catharines, ON, Canada, L2S 3A1
texts occur within three or four word groups of such multiword combinations, which makes them an important
focus for further investigation as they have the potential to support second language (L2) learning. The
question then is whether textbooks include the multiword combinations typical for the stated purpose of a given
L2 textbook, in this study specifically an English as a Second Language (ESL) 'general English'
textbook.
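Computationally, identifying recurrent multi-word combinations amounts to an n-gram frequency count over a tokenised corpus. A toy sketch of the idea (published bundle studies use normalised per-million-word frequency cut-offs and a dispersion criterion across texts; the threshold and data below are purely illustrative):

```python
from collections import Counter

def lexical_bundles(tokens, n=4, min_freq=2):
    """Count contiguous n-word sequences in a token list and keep those
    at or above a raw frequency threshold -- a toy version of
    lexical-bundle extraction."""
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {g: c for g, c in grams.items() if c >= min_freq}
```

Running this over, say, "on the other hand we see on the other hand" surfaces the recurring four-word sequence "on the other hand", which is exactly the kind of fixed-looking but structurally incomplete unit the bundle literature describes.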
Researchers interested in the relationship between actual language use and the language presented in
textbooks and other teaching materials have pointed out how corpora can be used to answer questions about
variation in language across registers, lexico-grammatical associations, discourse variables, and language acquisition.
McCarten (2010) argues that corpora provide sources for textbook developers to compile systematic lexico-
grammatical syllabi based on authentic texts. Research that focuses on the study of frequently occurring
multiword combinations (Biber, Johansson, Leech, Conrad, & Finegan, 1999; Cortes, 2004; Sinclair, 1991), often
referred to as lexical bundles in recent work, encountered in the texts represented in corpora is particularly
relevant for the current study. These vocabulary focused studies have investigated multiword combinations and
their structural and functional characteristics in various disciplines and registers such as academic prose,
conversation and classroom discourse, demonstrating their importance in diverse naturally occurring registers, which
in turn makes them an important component for learners' vocabulary development. Increasingly, research findings
support arguments in favour of including lexical bundles in textbooks and other pedagogic materials.
Considering the discrepancies pointed out in recent research between the vocabulary included in textbooks
and its occurrence in authentic language illustrated in corpora, the current study presents an analysis of a widely
used General English textbook, English File intermediate (student's book) by Latham-Koenig and Oxenden (2013).
The analysis focuses on lexical bundles and seeks to determine whether the lexical bundles included in the
textbook represent broader usage as indicated in corpora. Given the research that demonstrates the frequency of
lexical bundles in a broad range of naturally occurring texts, with different bundles and functions depending on
register, such bundles have been shown to be present in textbooks (Biber, 2006; Hyland, 2005), but with important
disciplinary differences. The underlying question that motivates the current study is whether the lexical bundles
in language learning texts reflect those bundles researchers have identified as particularly frequent in language
situations relevant to the stated purpose of such a textbook.
The literature review below serves to define key terms used in the study and to provide background from
recent, directly related studies. The role of corpora in materials development is discussed first. It is followed by a
definition of lexical bundles and a discussion of their role in natural language. The section concludes with a
definition of function in relation to lexical bundles.
specific areas: grammar features included (types of adjectives), order of grammar topics (simple and progressive
aspect), and vocabulary used to present these areas. Their analyses showed that the relevant
materials in the selected textbooks did not reflect the frequency data in corpora, that the sequence of grammar
points presented was not grounded in actual use, and that there was little consistency in selecting vocabulary.
Biber and Reppen concluded that the textbooks analysed were developed based on instinct rather than language
in use. Considering their findings, they argued that frequency information should be a key factor in materials
development choices, as frequently occurring vocabulary and grammar features are likely more useful for
learners. A replication of Biber and Reppen's study, with more recent editions of either the same or comparable
grammar texts (Lee & McGarrell, 2011), suggests increasing awareness of the existing gap between materials and
language in use. These more recent editions were either corpus-based or corpus-informed (McCarthy, 2008),
and thus were expected to reflect a more authentic description of the specific areas being analysed. Lee &
McGarrell's analysis showed that the more recent texts tended to represent corpus findings more closely, but still
left considerable room for improvement in terms of reflecting actual language use. Similarly, Cheng and Warren
(2007) examined 15 EFL textbooks endorsed by the Hong Kong Education and Manpower Bureau and
compared them to the findings generated from the Hong Kong Corpus of Spoken English (HKCSE). Analyses
showed that the vocabulary and language forms introduced in the textbooks were low-frequency items associated
primarily with academic registers, thus more complex and explicit than the forms found in the HKCSE. Finally,
two studies investigated specifically the use of multiword combinations. Koprowski (2005) investigated the
usefulness of lexical phrases, in terms of their frequency and range, in contemporary textbooks compared to
corpus data. The analysis involved 822 items and their usefulness scores generated from the frequency and
range data in the COBUILD Bank of English, a computerized corpus containing 17 different British and
American native-speaker subcorpora (e.g., newspapers, magazines, books, radio, informal conversations).
Findings showed that one third of the lexical phrases used in the textbooks analysed were low-frequency items,
thus unlikely to be useful in most real communication. Koprowski questioned the validity of the lexical selections,
suggesting that they were likely again based on the textbook writers' intuitions and experience rather than real language.
The studies discussed in this section show that despite the availability of corpora and corpus research
findings, materials writers rely heavily on intuition. The paucity of textbooks that incorporate insights from
corpora may be attributed to the fact that early corpora tended to be designed for linguists and were difficult
for materials designers and teachers to access. In his investigation into the attitudes of textbook writers towards
corpus materials, Burton (2012) discovered that many of these authors share a lack of knowledge of corpora, in
terms of their existence, benefits, and exploitation. Considering these findings, he agrees with McCarthy (2008) in
his conclusion that to effect change, teachers and their students will need to request that publishers produce
materials that reflect the most accurate portrayals of language. This, in turn, underlines the need for language
teacher education programs to include readings on corpus linguistics and to encourage student teachers to become
familiar with the exploitation of corpus materials for language learning and teaching. Timmis (2013) stresses the
value of viewing corpora as contributors to course materials rather than arbiters of lexical-grammatical choices.
He points out that such a view allows corpus frequency information to be reconsidered to accommodate, for example,
developmental sequences, local needs, intuition, and cultural and pedagogic considerations, and concludes that
corpora do not tell practitioners what or how to teach; they do, however, provide valuable information on the
nature of language and language production for consideration in materials design.
components as fixedness, idiomaticity, completeness, and intuitive recognition by native speakers of English. The
combination is an idiom that has a fixed form and is recognized by native speakers as one unit (one is unlikely to
hear native speakers use a variation such as at the hat’s drop), with low transparency in meaning. Multiword
combinations thus differ from idiomatic expressions and collocations in both form and scope. For a full
discussion see, e.g., Wray (2002). Nattinger and DeCarrico (2001) refer to multiword combinations as lexical
phrases, stressing the importance of fixedness and pragmatic completeness, while Bahns and Eldaw (1993) use the
term word combination, which for their purposes does not include the fixedness component. Building on their own
and earlier work, Conrad and Biber refer to multiword items as lexical bundles, and point out two main criteria:
frequency and register. The first criterion, frequency, relates to cut-points, meaning the number of times a lexical
bundle occurs in a corpus, in relation to the size of the overall corpus and the research goals. The second
criterion relates to multitext occurrences (i.e., dispersion), typically at least five texts in any one register, but again
dependent on the corpus and research goals. This criterion is intended to rule out the personal preferences of
individual writers in their use of lexical bundles. While Conrad and Biber (2004) recognize that other features are
involved in defining multiword combinations, they have identified frequency and multitext occurrences as the
most important. They argue that such lexical bundles represent “the most frequent recurring fixed lexical
sequences in a register” (p. 59).
Researchers have identified lexical bundles of varying lengths but increasingly
focus their analyses on 4-word bundles. The structure of 4-word bundles tends to contain 3-word bundles
(Cortes, 2003; Hyland, 2008; Wood, 2013) but excludes most non-standard or meaningless bundles of two or
three words (Hyland, 2008). Wood (2013) points out that 5- and 6-word bundles are relatively less common than
4-word bundles, thus the longer bundles would provide more limited frequency data.
Research shows that the use of lexical bundles is connected to improved fluency in learners’ spoken and
written discourse (Fan, 2009; Nation, 2001; Wood, 2010; Wood & Appel). From a psycholinguistic perspective,
there is an underlying assumption that such lexical bundles are stored as one unit, making their recognition and
retrieval easier, faster, and less attention-demanding, thereby freeing up processing capacity for
greater fluency (Conrad & Biber, 2004; Wood, 2010). To examine the relationship between ESL learners’ use of
lexical bundles in academic writing and their English language ability, Appel (2016) analysed argumentative
essays the learners wrote for the Canadian Academic English Language (CAEL) test. The resulting corpus of
essays was divided into three subcorpora: the Lower Level Corpus (LLC), which included essays that the
examiners had judged to be at a beginner level; the Medium Level Corpus (MLC), texts produced by
intermediate level writers; and the High Level Corpus (HLC), from upper-intermediate and advanced level
writers. The lexical bundles in each subcorpus were then examined in terms of their frequency, similarity, and
length. The findings showed that high-level writers tended to use more lexical bundles than low-level writers. In
addition, HLC writers typically used shorter bundles with less repetition. Appel’s study thus provides
support for the notion that lexical bundle use is correlated with ability level in ESL learners.
and to have occurred in at least five different texts in the academic prose corpus. The researchers compared the
resulting 4000 bundles from the conversation sub-corpus and the 3000 bundles from the academic subcorpus
based on three criteria: frequency in each register, structural pattern, and function. The frequency analysis
showed that the bundles appeared more frequently in conversation (28%) than in academic texts (20%). The
structural analysis showed that most of the bundles in conversation included part of a verb phrase, while most of
the bundles in academic texts included parts of noun phrases and/or prepositional phrases. Finally, the function
analysis, which focused only on 4-word bundles as longer bundles are less frequent and typically include 4-word
bundles as part of their structures, showed that register resulted in noticeable differences between the function
bundles. For example, epistemic stance and discourse organizing bundles were more frequent in conversation,
while referential bundles occurred widely in academic texts. An additional category of function, special
conversational bundles, which covered such functions as politeness routines (thank you very much), simple inquiry (what are
you doing?), and reporting clauses (I said to him), was identified in conversational discourse only. Further qualitative
analysis showed that epistemic stance bundles in conversation were widely used to express personal uncertainty,
opinions, desires, and intentions, while stance bundles in academic prose reflected personal certainty. Discourse
organizing bundles in conversation were used to introduce or focus on a topic or as clarification, while the same
type of bundles in academic texts was used to convey explicit contrast. Conrad and Biber concluded that while
lexical bundle use is frequent in both conversations and written academic texts, the type of bundle used depends
on register, context, and purpose. Their findings show that lexical bundle use is not accidental but reflects
common patterns and types of bundles that vary depending on register, context, and purpose. One conclusion
suggested by these findings is that language learners would likely benefit from some explicit instruction in
the most common patterns and bundles relevant to their learning goals.
In related research, Wood and Appel (2013), in their analyses of the lexical bundles from business and
engineering textbooks, showed that referential bundles were the most frequently occurring (62%), followed by
discourse organizing (24%) and stance bundles (14%). The researchers attribute the large number of referential
bundles to the fact that textbooks typically point out and explain subject matter. Wood and Appel suggest that
awareness of high-frequency lexical bundles used in different disciplines is likely to assist teachers and materials
developers in selecting the most appropriate items to include in textbooks of various disciplines. The inclusion of
lexical bundles in language teaching should thus serve to benefit learners’ awareness and linguistic ability.
An investigation of lexical bundles and their functions in relation to discourse structure is also the focus of
Csomay (2013), who examines classroom discourse. A corpus based on selected data from the TOEFL 2000
Spoken and Written Academic Language corpus and the first six units of 196 university classroom sessions in the
Michigan Corpus of Academic Spoken English was analysed. The 84 4-word bundles identified were analysed
for their functions. The findings show that stance bundles were used more frequently in the opening phase of a
classroom session, while referential bundles were used more frequently in the instructional phase of the classroom
discourse. Stance bundles were typically used to convey personal obligation (e.g., I don’t know; do you think so),
while directive (e.g., it is necessary to; you don’t have to) and referential bundles (e.g., at the same time; one of the most) were
used to express time, place, and the specification of attributes. Discourse organizing bundles (e.g., what do you
think; on the other hand) were the least frequent in classroom discourse. Similar to previous studies, Csomay
concluded that the use of different types of lexical bundles varied according to the communicative context and
purpose, and also suggested that the inclusion of different types of lexical bundles in pedagogy would likely
enhance students’ understanding of these lexical items in academic settings.
The above studies support the notion that various registers are associated with different types of lexical
bundles, based on the context and purposes of a register. Further research will likely clarify and confirm the
various associations. In the meantime, the authors of the above studies tend to agree that findings should be
reflected in textbooks and other classroom materials. While textbooks might be expected to reflect frequently
occurring lexical bundles, studies exploring the relationship between textbooks and relevant corpora are not yet
readily available. Yet the underlying assumption is that explicit explanations and illustrations in appropriate text
selections will have beneficial effects on learners’ language development. To address this perceived gap in the
literature, the study described in the following was designed to determine the extent to which a textbook used for
intermediate level English learners incorporates both relevant examples of lexical bundles and their functions.
Findings serve to shed light on the relationship between textbook language and language use as reflected in
corpora.
This Study
The general English textbook English File: Intermediate Student’s Book (Latham-Koenig & Oxenden, 2013) was
selected as it is widely used, with much of it available online. The text is intended to focus on spoken English
for general purposes and consists of 10 units divided into sections including grammar, vocabulary, pronunciation,
and practical English episodes.
Methods
The data for the current study consist of an electronic version of the textbook under investigation. The 41,752-
word corpus created includes the reading texts, dialogues, and listening transcripts from all parts of all units, but
excludes grammar and vocabulary exercises, which include tasks such as matching, fill-in-the-blank, answering
questions, and instructional language. Similarly, items with names of people, nicknames, names of countries,
states, websites, and social media were excluded from the analysis to avoid coincidences related to the
textbook itself.
Four-word bundles were generated through use of the kfNgram concordancing software (Fletcher, 2012), a free
tool that extracts lexical bundles and provides frequency counts. To generate the bundles, kfNgram was set to
extract 4-word bundles that had at least three occurrences, which reflects the frequency cut-off of 40-99 times
per million words identified in Biber et al. (2004). This frequency requirement resulted in a total of 222 4-word
bundles. These 4-word bundles were analysed to identify sequences that were true 4-word bundles rather than 3-
word bundles with variable slots (Wood, 2013). The procedure entails the separation of each 4-word bundle into
two 3-word bundles. If frequency counts indicate that a 3-word bundle is more frequent than the 4-word
bundle, the 3-word bundle is considered to be the base structure. For example, the 4-word bundle in other words the
can be separated into in other words and other words the. As the frequency of the former is greater than that of the
latter, in other words is considered the base structure and the article the is considered a variable slot and placed in
parentheses. Once all 4-word bundles generated by kfNgram had been analysed, a list of 169 4-word bundles with
between three and 10 occurrences resulted, as illustrated in Table 1.
Table 1
Frequency of 4-Word Bundles

Frequency    Number of items    Percentage    Example
10           2                  1.2           I don’t know; I don’t think
8            2                  1.2           I don’t want; do you think you
7            6                  3.6           at the end of; don’t want to
6            5                  2.9           as soon as I; if you don’t
5            6                  3.6           do you think you; I was going to
4            22                 13.0          and there is a; do you have a
3            127                75.1          about going to the; can you pass the
Total        169                100%
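The base-structure check described above can be sketched as a short script. This is a minimal illustration only: the actual extraction was carried out with kfNgram, the sample text in the usage example below is hypothetical, and the cut-off of three occurrences corresponds to roughly 72 per million words in the 41,752-word corpus, within the 40-99 band cited above.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-word sequences in a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def base_structures(tokens, min_freq=3):
    """Classify each 4-word bundle meeting the cut-off as either a true
    4-word bundle or a more frequent 3-word base with a variable slot,
    following the procedure described above (after Wood, 2013)."""
    three = Counter(ngrams(tokens, 3))
    four = Counter(ngrams(tokens, 4))
    results = {}
    for bundle, freq in four.items():
        if freq < min_freq:
            continue
        first3, last3 = bundle[:3], bundle[1:]
        # A constituent 3-gram occurs at least as often as its 4-gram;
        # a strictly greater count means it also occurs outside the
        # 4-gram, so the 3-gram is taken as the base structure.
        if three[first3] > freq:
            # The final word becomes a variable slot, e.g. "in other words (the)".
            results[bundle] = " ".join(first3) + " (" + bundle[3] + ")"
        elif three[last3] > freq:
            results[bundle] = "(" + bundle[0] + ") " + " ".join(last3)
        else:
            results[bundle] = " ".join(bundle)
    return results
```

On the paper’s own example, in other words the is reported as in other words (the) whenever in other words outnumbers the full 4-word sequence.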
Analyses
Three stages of analysis served to address the research questions. The first stage assessed the 4-word bundles
based on their usefulness score. The second stage identified the various functions of the 4-word bundles in the
textbook corpus. A quantitative and qualitative analysis of the usefulness scores and functions was carried out in
the third stage of analysis. Each stage is described in the following.
Research referred to above has shown that the importance of a given lexical item is reflected in its
frequency in naturally occurring texts from different but relevant sources. Koprowski (2005) suggested a
procedure to assign usefulness scores, i.e., a value that captures the frequency of lexical items in terms of
occurrences per million words in specific corpora, in addition to information about range, which refers to the
number of registers or text types in which a given lexical item can be found. Following Koprowski, usefulness
scores were assigned to the 4-word bundles in the textbook under investigation by comparing the analysed items with
the COBUILD concordance to determine their frequency data in five sub-corpora of different text types, where
the analysed items were most commonly found. For the first stage of analysis, the five individual frequency scores
were averaged to provide the usefulness score for each 4-word bundle, reflecting its frequency and range across
five text types.
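Computed this way, the usefulness score amounts to a simple mean of per-million frequencies across the sub-corpora consulted. The sketch below illustrates the arithmetic; the helper names are mine, and the illustrative figures in the test are taken from the scores later reported in Table 3 (which are computed over two corpora, COCA and the BNC) rather than from the COBUILD sub-corpora.

```python
def per_million(raw_count, corpus_size_words):
    """Normalize a raw frequency to occurrences per million words."""
    return raw_count / corpus_size_words * 1_000_000

def usefulness(freqs_per_million):
    """Usefulness score after Koprowski (2005): the mean of a bundle's
    per-million frequencies across the sub-corpora consulted. A zero in
    any one sub-corpus drags the score down, so the mean captures range
    as well as raw frequency."""
    return sum(freqs_per_million) / len(freqs_per_million)
```

For example, with the two frequencies later reported for at the end of (68.87 and 91.68 per million), the score is (68.87 + 91.68) / 2 = 80.275; with five sub-corpora the mean simply runs over five values.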
A second stage of analysis involved identifying the various functions of all 4-word bundles in the textbook
corpus. Whilst the purpose of functions varies depending on register, Conrad and Biber (2004) identified three
types of functions of 4-word bundles: stance expressions, discourse organizers, and referential expressions. Table 2 shows
that the function stance expressions includes bundles that reflect personal or impersonal attitudes towards an action
or event in a text and is sub-divided into epistemic bundles and attitudinal/modality bundles, a group that is further
divided into desire, obligation/directive, intention/prediction, and ability. The function discourse organizers is divided into the
sub-categories topic introduction/focus and topic elaboration/clarification bundles. The former introduces new topics or
directs attention toward specific topics; the latter provides additional information about or clarification of a topic. The
third function includes referential bundles, which indicate specific features of physical or abstract entities. Referential
bundles are divided into the four sub-categories identification/focus, imprecision, specification of attributes, and multi-
functional bundles. These sub-categories serve, respectively, to stress the importance of an object; to reflect imprecision or
uncertainty about an object; to focus on selected aspects of an object, including quantity and physical or abstract
attributes; and, in the case of the fourth sub-category, to refer to various time-related aspects. Conrad and Biber also identified a
specifically conversational function, which includes categories such as politeness routines, simple inquiry, and
reporting clauses. Bundles from this last function appear in their conversation sub-corpus only. A summary of
these functions and their sub-categories is offered in Table 2.
Conrad and Biber’s four functions and their sub-categories served to classify the 4-word bundles from the
textbook corpus. Bundles that did not clearly fit into any of these functional categories were placed into a no-
function category for further analysis.
The third stage of analysis entailed the quantitative and qualitative analysis of the usefulness scores and
functions of the extracted 4-word bundles. Each function and its subcategories was allocated an overall
usefulness score obtained by averaging the usefulness scores of the items under the functions and their
subcategories. The purpose of this final stage of analysis was to determine the relationship between the functions
of 4-word bundles and their usefulness. Together, these stages served to answer three specific research questions:
1. What is the relationship between the 4-word lexical bundles identified in the textbook under
investigation and corpus-research findings in terms of their frequency and range?
2. How do the 4-word lexical bundles presented in the textbook reflect corpus-research findings in
terms of their functions?
3. What is the relationship between the usefulness and functions of the 4-word lexical bundles in the
textbook?
A key assumption underlying these questions is that textbooks intended for general language purposes reflect
frequently occurring 4-word lexical bundles in corpora collected from naturally occurring language.
Table 2
Functions of Bundles According to Conrad and Biber (2004)
______________________________________________________________________________
Function                    Sub-categories
______________________________________________________________________________
Stance expressions          Epistemic;
                            attitudinal/modality (desire;
                            obligation/directive;
                            intention/prediction;
                            ability)
______________________________________________________________________________
Discourse organizers        Topic introduction/focus;
                            topic elaboration/clarification
______________________________________________________________________________
Referential expressions     Identification/focus;
                            imprecision;
                            specification of attributes;
                            multifunctional
______________________________________________________________________________
Special conversational      Politeness routines;
                            simple inquiry;
                            reporting clause
______________________________________________________________________________
Findings
The findings from the analyses described above are presented in response to each of the three research
questions. The first question, What is the relationship between the 4-word lexical bundles identified in the textbook under
investigation and corpus-research findings in terms of their frequency and range?, is addressed through the usefulness score. This
score, representing frequency and range, was determined based on information from COCA and the BNC and
shows that the 169 4-word bundles vary in usefulness between a high of 93.78 and a low of 0, with an average
usefulness score across all items of 4.4. Nineteen of the 169 lexical bundles identified in the textbook, 11.2% of
the total number of 4-word bundles, reach a usefulness score of over 10, as shown in Table 3.
A total of 20 (11.8%) of the 169 4-word bundles in General English have usefulness scores of zero,
indicating that they did not occur in either COCA or the BNC, while another 88 (52%) 4-word bundles in General
English have usefulness scores between 0.005 and 0.995. In addition, 13 (7.7%) of the 4-word bundles have a raw
frequency of one to four occurrences in both corpora, or one to four occurrences in one corpus and zero
occurrences in the other. For example, the bundle it is considered bad has zero occurrences in COCA and two in the
BNC. The limited number of 4-word bundles with high usefulness scores, the low average usefulness score, and the
large percentage of items with zero usefulness scores suggest that the 4-word bundles included in the textbook
have comparatively low range and frequency in everyday language as reflected in COCA and the BNC.
Table 3
Four-word Bundles with Usefulness Scores of over 10
Item                    Frequency per million    Frequency per million    Usefulness score
                        words in COCA            words in BNC
the end of the 83.24 104.32 93.78
at the end of 68.87 91.68 80.275
for the first time 63.18 53.4 58.29
on the other hand 48.38 52.62 50.5
one of the most 54.18 40.49 47.335
in the middle of 48.51 28.07 38.29
the middle of the 31.58 22.11 26.845
was one of the 26.36 23.05 24.705
what do you think 30.9 12.4 21.65
the back of the 21.36 20.82 21.09
I’d like to 17.22 15.75 16.485
I was going to 18.39 10.8 14.595
do you want to 14.58 11.25 12.915
in one of the 13.52 12.1 12.81
from time to time 9.28 16.32 12.8
a member of the 23.7 0 11.85
a bit of a 7.63 15.74 11.685
what do you mean 11.8 9.72 10.76
a lot of money 13.02 7.87 10.445
To answer the second question asked in this study, How do the 4-word lexical bundles presented in the textbook reflect
corpus-research findings in terms of their functions?, the 169 4-word bundles identified in the textbook examined were
analysed and sorted according to the different functions identified in Conrad and Biber (2004). The findings
show that 55 (32.5%) of the 4-word bundles reflect identifiable functions, of which referential ones were the most
frequent, followed by stance, special conversational and, the least frequent, discourse organizer functions, as
summarized in Table 4.
The most frequent of the functions identified were referential expressions with 21 (12%) items, of which 6
(3.6%) items fall into the subcategory of identification/focus (e.g., in one of the, one of my best), 12 (7.1%) items
under specification of attributes (e.g., a bit of a, as soon as I), and 3 (1.8%) items under multi-functional (e.g., the end
of the, and in the end), while no items were identified for the imprecision subcategory. The second most
frequently occurring function in the textbook is that of stance expressions with 16 (9.5%) items, of which 9
(5.3%) are epistemic bundles (e.g., do you know if, I don’t know) and 7 (4.1%) attitudinal bundles (e.g., do you want to, I
was going to). The function of special conversational expressions included 14 (8.3%) items, 2 (1.2%) of which belong
to the subcategory of politeness routines (e.g., no thanks I’m) and 12 (7.1%) to simple inquiries (e.g., can you
tell me), with none for reporting clauses. The least frequently identified category of functions, discourse organizers,
includes 4 (2.4%) items that belong to topic elaboration/clarification (e.g., on the other hand) and none for the
topic introduction/focus subcategory.
Table 4
Summary of functions in General English textbook
Functions                                  Number of occurrences    Percentage
Referential Total 21 12.0
Identification/focus 6 3.6
Specification of attributes 12 7.1
Multi-functional 3 1.8
Imprecision 0 0
Stance Total 16 9.5
Epistemic 9 5.3
Attitudinal 7 4.1
Special conversational Total 14 8.3
Politeness routines 2 1.2
Simple inquiries 12 7.1
Reporting clause 0 0
Discourse organizer Total 4 2.4
Topic elaboration/clarification 4 2.4
Topic introduction/focus 0 0
No-function Total 113 66.9
Collocational phrases 14 8.3
Context specific 28 16.6
No subcategory 71 42.0
The no-function category includes 113 (66.9%) 4-word bundles, a large enough category to warrant further
analysis. This analysis shows that 14 (8.3%) of these bundles belong to the collocational phrases (Conrad & Biber,
2004) subcategory (e.g., had a great time, o’clock in the morning), while an additional 28 (16.6%) no-function bundles
belong to the context specific subcategory (e.g., lawyer of the defence, the docklands light railway). The remaining 71
(42%) bundles could not be attributed to any subcategories reflected in the literature (e.g., and there is a, I usually
have a). Figure 1 summarizes these findings, reflecting that all the functions and most of the subcategories
identified in Conrad and Biber were also identified in the textbook under analysis, but more than half of the 4-
word bundles in the textbook could not be attributed to any of the function categories identified.
The third stage of analysis was designed to answer the question What is the relationship between the usefulness and
functions of 4-word lexical bundles in the textbook? The analysis of the 4-word bundles that reflect one of the functions
shows that their overall usefulness score is 11.9. A breakdown of the different function categories identified is
presented in Figure 2.
[Figure 2 is a bar chart of usefulness scores by function: referential expressions 22.7; discourse organizers 15.4; stance expressions 6.8; special conversational 0.4; no-function 0.4.]
Figure 2. Usefulness Score of Functions
Figure 2 shows that the highest usefulness score, 22.7, was achieved by referential expressions, followed by
discourse organizers at 15.4, stance expressions at 6.8, and special conversational expressions at 0.4. The
no-function expressions attained a usefulness score of 0.4.
The usefulness scores for each function’s subcategories were also calculated and are reflected in Figure 3.
It shows that the sub-categories of the referential function achieved usefulness scores of 14.8 for
identification/focus, 22.3 for specification of attributes, and 40.9 for multi-functional expressions.
Within stance expressions, epistemic expressions reached a usefulness score of 5.9, while attitudinal expressions
reached 4.1. Discourse organizers include items from only the topic elaboration/clarification
subcategory, which obtained an overall usefulness score of 15.4. Finally, the subcategories of the special
conversational function, politeness routines and simple inquiries, achieved usefulness scores of 0 and 0.5
respectively. The no-function subcategory of collocational phrases received a usefulness score of 1.2, the context
specific subcategory 0.6, and the uncategorized group 0.6. The above findings show that lexical bundles with
functions tend to have higher usefulness scores than those without functions. The most useful items identified in
the textbook corpus are part of referential expressions, followed by discourse organizers and stance expressions, with
special conversational functions showing the lowest usefulness scores. The most useful subcategory is that of
multi-functional expressions, the least useful the one covering politeness routines.
Discussion
Key findings from this exploratory study of the usefulness and function of lexical bundles identified in a textbook
for general English language learners are discussed in the order of the specific research questions raised. The first
research question explored the level of usefulness of the 4-word bundles generated from the textbook. Usefulness
was determined through the numeric scores developed in Koprowski (2005), scores comprising frequency and
range data from COCA and the BNC. The findings show a comparatively low level of usefulness of the analysed
items, determined by their low frequency of usage in the various registers and text types reflected in corpora of
general language use. The findings in this study are consistent with those reported in Koprowski and in
Cheng and Warren (2007), whose work also found low-frequency items and inconsistencies between the
vocabulary items included in teaching materials and those found in actual language use reflected in corpus data.
The findings also reflect the observation in other studies on teaching materials (Biber & Reppen, 2002; Gabrielatos,
2006; Koprowski, 2005; Lee & McGarrell, 2011; Meunier & Gouverneur, 2009; Shortall, 2007) that the
language presented in these materials does not closely match naturally occurring language as
reflected in corpora. Although the stated purpose of the textbook analysed for the current study is to improve
students’ general English abilities, the findings suggest that most of the lexical bundles included have highly
limited usage in general communication contexts. This lack of convergence between textbook and corpus
material suggests that the textbook developers may have relied on intuition in the selection of material, as
discussed in Biber and Reppen (2002) and Lee and McGarrell (2011), rather than actual data sources, or that the
selection criteria used were unable to identify material representative of general language use.
The second research question examined the functions, as defined by Conrad and Biber (2004), of the 4-word
lexical bundles identified in the textbook. The findings show that over 65% of the lexical bundles do not fall
within any identifiable function. This may, in part, be due to the low number of lexical bundles with at least
three occurrences in the textbook, suggesting that the textbook lacks the kind of repetition typically needed for
language development. The most frequently identified function of the textbook bundles was referential, followed
by stance, special conversational and, least frequent, discourse organizing. The finding that referential bundles
are the most frequently occurring in the textbook under discussion reflects an academic influence in its language
focus, as shown in previous research. Wood (2013), in an analysis of business and engineering textbooks, showed
that the referential function was the most frequently occurring in those academically oriented textbooks. Wood’s
study shows discourse organizing and stance bundles as the second and third most frequently used functions,
respectively. Similarly, Conrad and Biber showed that referential bundles are more common in academic prose
than in other language uses. The objectives of the textbook and its focus on academic language again suggest a
misalignment between the two. The second most frequent function of lexical bundles in the current study, stance,
shows that the textbook also focuses on conversation and speaking registers, but to a lesser extent. These functions
are associated with more informal language use, as indicated in Conrad and Biber, whose investigation of bundles
from conversation and academic prose discovered that stance and special conversational bundles are more frequent
in conversation. In light of these findings, the textbook under discussion thus presents a mix of academic and
conversational registers. This mix, combined with the relatively low recurrence of bundles, may prevent learners
from encountering relevant functions in sufficient numbers for each register to internalise them successfully. In
turn, this may impede register-appropriate production, as the information available to learners lacks clear
distinctions of function use in different registers.
The third research question investigated the relationship between the usefulness and the functions of the 4-word
bundles identified in the textbook. To address this question, each function and its subcategories were given an
overall usefulness score by averaging the usefulness scores of the items under them. The findings show that
lexical bundles serving an identifiable function have higher usefulness scores than those that cannot be
attributed to any function. This finding links directly to past studies that have stressed the importance of
referring to frequency information on actual language use in teaching and materials development
(Biber & Reppen, 2002; Cheng & Warren, 2007; Koprowski, 2005; Lee & McGarrell, 2011). A
detailed analysis also shows that the referential function, which is typically associated with more formal and
academic language, has the highest usefulness score in the textbook under discussion. In addition, the second
stage of the analysis shows that referential bundles are also the most common type of bundle identified in the
current study. They include the items with the highest usefulness scores, such as the end of the (93.7), at the end of
(80.2), and for the first time (58.2). This finding suggests that the inclusion of referential bundles in teaching syllabi and
textbooks working on academic registers may be particularly valuable in support of language learners’ ability to
acquire native-like multiword expressions. The second most useful type of lexical bundle belongs to the discourse
organizing function. For example, the bundle on the other hand (50.5) was also shown as frequently occurring in
academic prose in Conrad and Biber (2004). Although the discourse organizing function is considered the least
common type of function, its high usefulness score suggests that it is a valuable item for inclusion in teaching
materials. The stance function, with noticeably lower scores, is third in terms of usefulness in the current study.
As the detailed presentation in the results section shows, the stance bundles in the textbook have low usefulness
scores, with a few exceptions such as what do you think (21.6). A careful analysis of corpus data may help materials
developers identify selections that are useful in terms of broad actual language use. The fourth function, the
special conversational function, has the lowest usefulness score, even though it was found to be more frequent
than the discourse organizing function. One explanation for this may be the range criterion imposed in
determining usefulness scores, that is, the criterion ensuring that the lexical items learned are useful in varied contexts.
As the special conversational function is expected to occur in the conversational register only, the range of
bundles in this category is, by definition, limited. Because usefulness scores combine both frequency and range
data, such bundles will not yield high scores even if they are frequent in their own register. A
textbook concentrating on conversational English might reasonably be expected to have many lexical bundles
that fall within the conversational function. Again, careful matching of corpus data in light of the purposes of a
given textbook would seem to be a key objective for materials designers. Finally, the lexical bundles with the
lowest usefulness scores, even though they account for over half of the bundles identified in the textbook under
discussion, were those that fell within the no-function category. One potential explanation for the large number
of no-function bundles may be the subject matter around which the textbook presents language items, subject
matter that may be guided more by introspection and intuition than by an analysis of general language needs
in relation to data on actual language use reflected in corpus materials. Koprowski (2005, p. 328) noticed a similar
outcome in his analysis of the usefulness of lexical phrases in contemporary textbooks and attributed such low
usefulness to “an unprincipled and careless selection process” by textbook developers, a process that, he adds,
likely centres on the selection of themes and topics rather than the usefulness of lexical phrases.
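Because the function-level comparison above rests on averaging item scores within each function, it can be reproduced mechanically. The following is a minimal Python sketch using only the four bundle scores quoted in this section; the grouping logic is illustrative and is not the study’s actual scoring pipeline:

```python
# Average the usefulness scores of bundles grouped by discourse function.
# Scores are the ones quoted in the text; the data structure is illustrative.

bundles = {
    "the end of the":     ("referential", 93.7),
    "at the end of":      ("referential", 80.2),
    "for the first time": ("referential", 58.2),
    "on the other hand":  ("discourse organizing", 50.5),
    "what do you think":  ("stance", 21.6),
}

# Group the scores under each function, then average them.
by_function = {}
for function, score in bundles.values():
    by_function.setdefault(function, []).append(score)

averages = {fn: sum(scores) / len(scores) for fn, scores in by_function.items()}

# Rank functions by average usefulness, highest first.
ranking = sorted(averages, key=averages.get, reverse=True)
```

With these five bundles the ranking comes out referential, discourse organizing, stance, mirroring the order of usefulness reported above.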
These findings point to the role corpus data can play in the identification and selection of lexical items for a textbook. As Timmis (2013) points out, corpora are less
appropriate as arbiters of what to teach and how to teach it, but they are valuable in reflecting details about the
nature of language and language use. In the case of general English, the present findings suggest a considerable difference
between corpora and the lexical bundles and functions presented in the textbook. A broader question is the
relationship between corpora, which typically include long passages of text, and typical textbooks, with their
short, often unconnected texts representing different genres. This question has been addressed in part by studies
that highlight differences in the use of lexical bundles depending on genre (e.g., Biber, 2006; Hyland, 2008). In
the case of a textbook, one question is whether such learning and teaching materials might be developed to
include relevant, engaging topics that serve to illustrate language that is truly general and widely used. The
inclusion of frequently recurring lexical bundles is particularly important as research shows that even advanced
learners of ESL have difficulties producing texts that reflect native speaker usage (Grami & Alkazemi, 2016).
Yet pedagogical materials rarely include activities or instruction on which words go together (Alali & Schmitt,
2012). Increased attention to the careful selection of lexical strings that reflect actual language use, as captured in
relevant corpora, can only support the challenging task of developing vocabulary skills, which includes the appropriate
use of lexical bundles. Such attention, combined with explorations of how the acquisition of lexical strings might be
facilitated in ESL classes, as illustrated in Jones and Haywood (2004) and more recently AlHassan (2016) and
AlHassan and Wood (2015), promises to further support the very challenging task of vocabulary
development in subsequent language learning.
References
Alali, F., & Schmitt, N. (2012). Teaching formulaic sequences: The same or different from teaching single words?
TESOL Journal, 3(2), 153–180.
AlHassan, L. (2016). Learning all the parts of the puzzle: Focused instruction of formulaic sequences through the
lens of activity theory. In H.M. McGarrell & D. Wood (Eds.), Contact - Refereed Proceedings of TESL Ontario
Research Symposium, 42(2), 44-65. Available at: http://www.teslontario.net/publication/research-
symposium
AlHassan, L., & Wood, D. (2015). The effectiveness of focused instruction of formulaic sequences in augmenting
L2 learners' academic writing skills: A quantitative research study. Journal of English for Academic Purposes,
17, 51-62.
Appel, R. (2016). Lexical bundles in L2 English academic writing: Proficiency level differences. In H.M.
McGarrell & D. Wood (Eds.), Contact - Refereed Proceedings of TESL Ontario Research Symposium, 42(2), 66-81.
Available at: http://www.teslontario.net/publication/research-symposium
Ari, O. (2006). Review of three software programs designed to identify lexical bundles. Language Learning &
Technology, 10(1), 30-37.
Biber, D. (2006). University language: A corpus-based study of spoken and written registers. Amsterdam: Benjamins.
Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). Longman grammar of spoken and written English.
London, UK: Longman.
Biber, D., & Reppen, R. (2002). What does frequency have to do with grammar teaching? Studies in Second
Language Acquisition, 24, 199–208.
Biber, D., Conrad, S., & Cortes, V. (2004). If you look at…: Lexical bundles in university teaching and textbooks.
Applied Linguistics, 25(3), 371-405.
Burton, G. (2012). Corpora and coursebooks: destined to be strangers forever? Corpora, 7(1), 91-108.
Byrd, P., & Coxhead, A. (2010). On the other hand: Lexical bundles in academic writing and in the teaching of
EAP. University of Sydney Papers in TESOL, 5, 31-64. Available at:
http://faculty.edfac.usyd.edu.au/projects/usp_in_tesol/pdf/volume05/Article02.pdf
Carter, R., & McCarthy, M. (1995). Grammar and the spoken language. Applied Linguistics, 16(2), 141-158.
Cheng, W., & Warren, M. (2007). Checking understandings: Comparing textbooks and a corpus of spoken
English in Hong Kong. Language Awareness, 16(3), 190-207.
Conrad, S., & Biber, D. (2004). The frequency and use of lexical bundles in conversation and academic prose.
Lexicographica, 20, 56-71.
Cortes, V. (2004). Lexical bundles in published and student disciplinary writing: Examples from history and
biology. English for Specific Purposes, 23, 397-423.
Csomay, E. (2013). Lexical bundles in discourse structure: A corpus-based study of classroom discourse. Applied
Linguistics, 34(3), 369-388.
Davies, M. (2008-) The Corpus of Contemporary American English: 520 million words, 1990-present. Available online at
http://corpus.byu.edu/
Fletcher, W. (2012). kfNgram: Information and help. Available at:
http://www.kwicfinder.com/kfNgram/kfNgramHelp.html
Gabrielatos, C. (2006). Corpus-based evaluation of pedagogical materials: If-conditionals in ELT coursebooks
and the BNC. In: 7th Teaching and Language Corpora Conference. Available online at
http://eprints.lancs.ac.uk/882/
Grami, G., & Alkazemi, B.Y. (2016). Improving ESL writing using an online formulaic sequence word-
combination checker. Journal of Computer Assisted Learning, 32(2), 95–104.
Hyland, K. (2008). As can be seen: Lexical bundles and disciplinary variation. English for Specific Purposes, 27, 4–21.
Jones, M., & Haywood, S. (2004). Facilitating the acquisition of formulaic sequences: An exploratory study in an
EAP context. In N. Schmitt (Ed.), Formulaic sequences (pp. 269-291). Amsterdam, Netherlands: John
Benjamins.
Koprowski, M. (2005). Investigating the usefulness of lexical phrases in contemporary coursebooks. ELT Journal,
59, 322–332.
Latham-Koenig, C., & Oxenden, C. (2013). English File: Intermediate student’s book. Oxford, UK: Oxford University
Press.
Lee, D., & McGarrell, H. (2011). Corpus-based/corpus-informed English language learner grammar textbooks:
An example of how research informs pedagogy. In H.M. McGarrell & D. Wood (Eds.). Contact - Refereed
Proceedings of TESL Ontario Research Symposium, 37(2), 78–100. Available at:
http://www.teslontario.net/publication/research-symposium
McCarten, J. (2010). Corpus-informed course book design. In A. O’Keeffe & M. McCarthy (Eds.), The Routledge
Handbook of Corpus Linguistics (pp. 413–427). London, UK: Routledge.
McCarthy, M. (2008). Accessing and interpreting corpus information in the teacher education context. Language
Teaching, 41(4), 563-574.
McDonough, J., Shaw, C., & Masuhara, H. (2012). Materials and methods in ELT: A teacher’s guide. Malden, MA:
Blackwell.
Meunier, F., & Gouverneur C. (2009). New types of corpora for new educational challenges: Collecting,
annotating and exploiting a corpus of textbook material. In K. Aijmer (Ed.), Corpora and Language
Teaching, (pp. 179-201). Amsterdam & Philadelphia: John Benjamins.
Nation, P. (2001). Learning vocabulary in another language. Cambridge, UK: Cambridge University Press.
Neary-Sundquist, C.A. (2015). Aspects of vocabulary knowledge in German textbooks. Foreign Language Annals,
48(1), 68–81.
Schmitt, N., & Carter, R. (2004). Formulaic sequences in action: An introduction. In N. Schmitt (Ed.), Formulaic
sequences: Acquisition, processing and use (pp. 1–22). Amsterdam, Netherlands: John Benjamins.
Shortall, T. (2007). The L2 syllabus: Corpus or contrivance? Corpora, 2(2), 157-185.
Timmis, I. (2013). Corpora and materials: Towards a working relationship. In B. Tomlinson (Ed.), Developing
materials for language teaching (2nd ed.) (pp. 461-474). London, UK: Bloomsbury Academic.
Wood, D., & Appel, R. (2013). Formulaic sequences in rst year university business and engineering textbooks: A
resource for EAP. In H.M. McGarrell & D. Wood (Eds.), Contact - Refereed Proceedings of TESL Ontario Research
Symposium, 39(2), 92-102. Available at: http://www.teslontario.net/publication/research-symposium
Wood, D. (2010). Formulaic language and second language speech fluency: Background, evidence and classroom applications.
Bloomsbury Publishing.
Wray, A. (2002). Formulaic language and the lexicon. Cambridge, UK: Cambridge University Press.
Betsy Quero*
Victoria University of Wellington, New Zealand
Abstract
The main goal of this study is to report on the number of words (vocabulary load) that native and non-native readers of medical
textbooks written in English need to know in order to meet the lexical demands of this type of subject-specific
(medical) text. For estimating the vocabulary load of medical textbooks, a corpus comparison approach and some existing
word lists, popular in ESP and EAP, were used. The present investigation aims to answer the following questions: (1) How
many words are needed beyond the General Service List (GSL; West, 1953), the Academic Word List (AWL; Coxhead,
2000), and the EAP Science List (Coxhead and Hirsh, 2007) to achieve a good lexical text coverage? and (2) What is the
vocabulary load of medical textbooks written in English? The implementation of this corpus comparison approach
consisted of: (1) making a written medical corpus of 5.4 million tokens, (2) compiling a general written corpus of the same
size (5.4 million tokens), (3) running both corpora (i.e., the medical and general) through some existing word lists (i.e., the
GSL, the AWL, and the EAP Science List), and (4) creating new subject-specific (medical) word lists beyond the existing
word lists used. The system for identifying medical words was based on Chung and Nation’s (2003) criteria for classifying
specialised vocabulary. The results of this investigation showed that there is a large number of subject-specific (medical)
words in medical textbooks. For both native and non-native speakers of English training to be health professionals, this
figure represents an enormous amount of vocabulary learning. This paper concludes by considering the value of creating
specialised medical word lists for research, teaching and testing purposes.
Key words: medical word lists, vocabulary load, English for medical purposes, text coverage.
Introduction
One of the main purposes of this study is to propose a methodology for the creation of subject-specific word lists
(i.e., medical word lists) that include the most salient vocabulary in medical texts. After reviewing previous
studies on the vocabulary load of medical textbooks, explaining the methodology, and presenting the
subject-specific lists of the most relevant words in medical texts, the results of this investigation attempt to: (1)
identify the lexical demands of medical texts using a corpus comparison approach, and (2) provide guidelines for
the creation of medical word lists organised by levels of frequency and salience.
Vocabulary Load
The number of known words (vocabulary load) needed for unassisted reading comprehension has been
investigated by several vocabulary researchers (Hirsh & Nation, 1992; Hu & Nation, 2000; Laufer, 1989; Nation,
2006). The first investigations (Laufer, 1989, 1992) on the vocabulary load of academic texts suggested a reading
* Tel: + 64 2102387831; E-mail: betsy.quero@vuw.ac.nz; PO Box 14416 Kilbirnie, Wellington 6241, New Zealand
comprehension threshold of 95% text coverage. More recent research on the vocabulary load of written texts
(Hu & Nation, 2000; Laufer & Ravenhorst-Kalovski, 2010; Nation, 2006; Schmitt, Jiang, & Grabe, 2011) has
indicated that a higher lexical threshold of 98% text coverage or more is required for optimal unassisted reading
comprehension. The present study explores the number of words that need to be known to achieve 98%
text coverage, and refers to 98% as an optimal lexical threshold.
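The coverage figures discussed throughout this article reduce to a simple proportion: the share of running words (tokens) in a text that belongs to the reader's known vocabulary. A minimal Python sketch of that calculation against the two thresholds; the token counts below are invented for illustration:

```python
# Lexical text coverage: percentage of a text's tokens that belong to the
# reader's known vocabulary. The 95% and 98% thresholds come from the
# studies cited above; the token counts here are invented.

def text_coverage(known_tokens: int, total_tokens: int) -> float:
    """Return coverage as a percentage of running words."""
    return 100.0 * known_tokens / total_tokens

# e.g., a reader who knows 5,300,000 of a corpus's 5,431,740 tokens
pct = text_coverage(known_tokens=5_300_000, total_tokens=5_431_740)

meets_earlier_threshold = pct >= 95.0  # Laufer's (1989, 1992) threshold
meets_optimal_threshold = pct >= 98.0  # optimal unassisted-reading threshold
```

In this invented case coverage falls between the two thresholds: adequate by the earlier 95% criterion but short of the optimal 98% one.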
Levels of Vocabulary
In order to estimate the number of words (vocabulary load) that learners of English for Medical Purposes (EMP)
need to know to meet the vocabulary demands of medical texts written in English and achieve
a suitable reading comprehension threshold (i.e., between 95% and 98% text coverage), the various levels of
vocabulary proposed by Schmitt and Schmitt (2012) and Nation (2001, 2013) will be identified in the corpus of
medical textbooks compiled for this study. Frequency (high-frequency, mid-frequency, and low-frequency words)
and text type (i.e., general, academic, scientific, technical or specialised) are the two main criteria currently used
to classify the vocabulary of academic and specialised texts.
Schmitt and Schmitt’s (2012) classification of the levels of vocabulary is a frequency-based one, and
consists of the following three bands or levels: high-frequency, mid-frequency, and low-frequency words. The
high-frequency level includes the first 3,000 most frequent words in a language. The mid-frequency level refers to
those words between the 4,000 and the 9,000 frequency levels. The low-frequency level comprises those words
beyond the 9,000 frequency band. The concept of mid-frequency vocabulary was first introduced in Schmitt and
Schmitt’s (2012) classification. The introduction of this frequency level has served to stress the importance of
mid-frequency vocabulary and of words beyond the 3,000 most frequent words of the English language.
Nation’s (2013) classification, which was initially presented in 2001 and then revised in 2013, is both a
frequency and text-type based classification. Nation’s (2001) frequency levels included two frequency bands (i.e.,
high-frequency vocabulary and low-frequency vocabulary) and two kinds of text-type words (academic
vocabulary and technical vocabulary). In 2013 Nation added to his classification of vocabulary levels the mid-
frequency band proposed by Schmitt and Schmitt in 2012. According to Nation (2013), there are three levels of
frequency-based words, that is, high-frequency words, mid-frequency words and low-frequency words, and two
levels of text-type words (academic words and technical words), which are particularly likely to occur in academic
and specialised texts. Both the frequency and text-type based aspects of Nation’s (2013) classification are analysed
and discussed in the findings and discussion sections of this study.
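The frequency-based side of the two classifications above amounts to a lookup on a word family's frequency rank. A small sketch, with the band boundaries as stated in the text (the function name is ours):

```python
# Frequency-band classification following Schmitt and Schmitt (2012) and
# Nation (2013): high-frequency = the first 3,000 word families,
# mid-frequency = up to the 9,000 level, low-frequency = beyond that.

def frequency_band(rank: int) -> str:
    """Classify a word family by its frequency rank (1 = most frequent)."""
    if rank <= 3_000:
        return "high-frequency"
    if rank <= 9_000:
        return "mid-frequency"
    return "low-frequency"
```

For example, a family ranked 4,500 falls in the mid-frequency band, while one ranked 12,000 is low-frequency. The text-type dimension (academic, technical) cuts across these bands and cannot be derived from rank alone.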
Despite the criticisms the GSL has received over the years, it is the general word list used in this study to replicate the corpus comparison
approach. The GSL is used in this investigation in order to: (1) serve as a starting point when estimating the
vocabulary load of medical texts, and (2) allow comparisons with previous studies in ESP that have also used the
GSL to look at the number of words in the health and medical sciences.
The other existing word list used in the present study is Coxhead’s (2000) Academic Word List (AWL). The
AWL works in conjunction with the GSL. That is, it includes words that do not occur in the GSL. Up to the
present, the AWL has been extensively used to learn, teach, and research academic vocabulary. To make the
AWL, Coxhead (2000) gathered a corpus of 3,513,330 tokens. This corpus comprised a variety of
academic texts from 28 academic subject areas, grouped, seven apiece, into the following four
disciplines: Arts, Commerce, Law, and Science. The AWL contains 570 word families and provides around
10% text coverage of academic texts. To validate the AWL, Coxhead (2000) created a second academic
corpus (comprising 678,000 tokens), over which the list accounted for 8.5% coverage.
Two new academic word lists have been recently developed: (1) The New Academic Word List (NAWL)
created by Browne, Culligan, and Phillips in 2013 and available at http://www.newacademicwordlist.org/, and
(2) The New Academic Vocabulary List (AVL) created by Gardner and Davies (2014) and available at
http://www.academicvocabulary.info/download.asp. Both the NAWL and the AVL were developed from large
academic corpora of 288 and 120 million tokens, respectively. Despite the current availability of these more
recently developed academic word lists (i.e., the NAWL and the AVL), the decision to use Coxhead’s (2000) AWL
for the present study is based on the fact that for more than a decade the AWL has been widely researched and
used by ESP researchers to calculate the lexical demands posed by written academic texts.
Drawing on some aspects of the methodology used by Coxhead (2000) to create the AWL, various subject-
specific word lists have been developed: an EAP Science Word List (Coxhead & Hirsh, 2007), three medical
academic word lists (Chen & Ge, 2007; Lei & Liu, 2016; Wang, Liang, & Ge, 2008), a nursing word list (Yang,
2015) a pharmacology word list (Fraser, 2007), some engineering word lists (Mudraya, 2006; Ward, 1999, 2009),
a business word list (Konstantakis, 2007), and an agricultural word list (Martínez, Beck, & Panza, 2009). While
some of these subject-specific lists have been developed to work in conjunction with the GSL (e.g., Yang’s (2015)
Nursing Word List, and Wang, Liang and Ge’s (2008) Medical Academic Word List), other word lists have been
created to work in conjunction with both the GSL and the AWL (e.g., Coxhead and Hirsh’s (2007) EAP Science List,
and Fraser’s (2007) Pharmacology Word List).
Coxhead and Hirsh’s (2007) EAP Science List is another existing word list used in the present study to
estimate the vocabulary load of medical textbooks. Coxhead and Hirsh’s (2007) study aims to create a science
word list that could make up for the lower coverage of the AWL over science texts (Coxhead, 2000). Criteria of
range, frequency of occurrence, and dispersion were considered for selecting the words to be added to the EAP
Science List. This list is based on a written science corpus of English comprising a total of 2,637,226 tokens. As
Coxhead and Hirsh (2007, p. 72) reported, the 318 word families in the EAP Science List cover 3.79% over the
science corpus compiled to create this list. Moreover, the EAP Science List covers 0.61% over the Arts subcorpus,
0.54% over the Commerce subcorpus, 0.34% over the Law subcorpus, and 0.27% over the fiction corpus
compiled by Coxhead (2000). The above-mentioned coverage results confirm the scientific nature of the EAP
Science List. Coxhead and Hirsh’s (2007) study also attempts to draw a line between the percentage of general
vocabulary versus the percentage of science-specific vocabulary in science texts written in English that EAP
students are required to read at university. In addition to the GSL and the AWL, Coxhead and Hirsh’s (2007)
EAP Science List is used in the present investigation when adopting the corpus comparison approach to estimate
the vocabulary load of medical textbooks.
Since the present study investigates the vocabulary load of medical texts beyond the most commonly used existing
general, academic and scientific word lists, these lists are used as the starting point to estimate the lexical
coverage of medical texts. By choosing a set of commonly used general/academic/scientific word lists, this study
focuses on general/academic/scientific vocabulary that has been extensively presented in EAP and ESP
teaching materials, assessments, and research. However, this investigation by no means attempts to undermine
the value of more recently created general (i.e., the two NGSLs) and academic (i.e., the NAWL and the AVL)
word lists. Also, to the best of our knowledge, no study has so far estimated the vocabulary load of medical
textbooks using this set of word lists widely used in EAP and ESP (i.e., the GSL, the
AWL, and the EAP Science List) as a starting point.
Moreover, the existing pedagogical vocabulary lists of general high-frequency words (West’s GSL),
academic words (Coxhead’s AWL), and scientific words (Coxhead and Hirsh’s EAP Science List) cannot provide
complete coverage of the kinds of vocabulary in subject-specific texts. This is particularly because the
GSL, the AWL and the EAP Science List were not designed to identify all the different kinds of vocabulary of
specialised texts. For this reason, a more inclusive approach to identifying the various levels of vocabulary that occur
in medical texts could provide a clearer picture of the vocabulary demands of medical textbooks.
Research Questions
The present investigation looks at the vocabulary load of medical texts and explores the role played by the levels
of vocabulary proposed by Nation (2013) and Schmitt and Schmitt (2012). In particular, the three frequency-
based levels of vocabulary (high-, mid-, and low-frequency words) and four topic-based sets of word lists (the GSL, the
AWL, the EAP Science List, and some specialised medical lists) that draw on words from these three frequency
levels were used in the analyses of the lexical frequency profiles of the medical texts investigated here. With the main
goal of estimating the vocabulary load of medical textbooks in mind, the findings of this study provide answers
to the following research questions:
1) How many words are needed beyond the General Service List (GSL; West, 1953), the Academic Word
List (AWL; Coxhead, 2000), and the EAP Science List (Coxhead and Hirsh, 2007) to achieve a good
lexical text coverage?
2) What is the vocabulary load of medical textbooks written in English?
Methodology
The methodology used to estimate the number of words (vocabulary load) associated with the various levels of
vocabulary found in a corpus of medical textbooks is discussed in this section. The implementation of this
methodology involves compiling the medical and general corpora, adopting a corpus comparison approach,
adapting a semantic rating scale, creating a series of medical word lists, and justifying the unit of counting
selected for the present study.
corpus comparison. General words referring to abbreviations, living organisms, parts of the body, and participants in
the health and medical community were classified as medical words. The manual checking of all the word types
(including content words, abbreviations, acronyms and proper nouns) classified using the semantic rating scale
involved: (1) looking up word types with unclear medical meanings in a specialised medical dictionary, and (2)
confirming the medical senses of these words in their actual contexts of occurrence in the medical corpus.
Results
How many words are needed beyond the General Service List (GSL; West, 1953), the Academic
Word List (AWL; Coxhead, 2000), and the EAP Science List (Coxhead and Hirsh, 2007) to achieve
a good lexical text coverage?
This question is answered by presenting the cumulative text coverage results of running three sets of word lists:
(Set 1) the GSL, AWL, and EAP Science List, (Set 2) the three 1,000 MGEN lists, and (Set 3) the twenty-three MED
lists through the medical corpus using the Range software (Heatley et al., 2002). First, the cumulative coverage
of the GSL1 and GSL2, the AWL and the EAP Science List, and the words outside these lists is presented in
Table 1. Then, the cumulative text coverage of these three sets of word lists is summarised in Table 2.
Table 1 suggests that a further 22.12% coverage from the words outside the lists (i.e., the GSL1 and GSL2, AWL, and EAP
Science List) is still needed to achieve an optimal lexical threshold of 98% (i.e., 75.88% coverage from word types
in the lists plus 22.12% coverage from word types outside the lists). In order to find out how many more word types
are required beyond the four existing word lists summarised in Table 1, we applied the semantic rating scale
described in the methodology section of the present study. This rating scale served as a semantic checking system
to classify over 30,000 medical word types (see Quero, 2015) occurring in the medical corpus and to create the 26
medical word lists whose text coverage results are summarised in Table 2.
Table 1
Cumulative Coverage of the GSL1 and GSL2, the AWL and the EAP Science List over the Medical Corpus including the Words
outside the Lists
Word List Coverage % Number of Word Types
GSL1, GSL2, AWL, EAP Science List 75.88 9,412
Words outside the lists 24.12 45,942
Total 100.00 55,354
Table 2
Cumulative Coverage of the GSL, the AWL, the EAP Science List, the Three 1,000 MGEN Lists, and the Twenty-three 1,000
MED Lists
Word List Number of Tokens Coverage % Number of Word Types
GSL1, GSL2, AWL, EAP Science List 4,121,539 75.88 9,412
MGEN (three 1,000) lists 607,498 11.18 3,000
MED (twenty-three 1,000) lists 542,747 10.00 23,000
Cumulative total of existing lists 5,271,784 97.06 35,414
Note in Table 2 that the cumulative text coverage of the GSL1, GSL2, AWL and EAP Science List
(75.88%) indicates that an additional 21.18% coverage was required to reach the 97.06% achieved by the full set of lists.
Moreover, the results in Table 2 show that 26,000 new medical word types (i.e., 3,000 medical word types in the MGEN
lists and 23,000 medical word types in the MED lists) need to be added to the GSL, AWL, and EAP Science List
for readers of medical texts to be able to understand 97.06% of the words they meet when they read medical
textbooks in English.
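The cumulative figures in Tables 1 and 2 follow from assigning every token to the first word list that contains it, as the Range software does. A toy Python sketch of that counting procedure; the mini word lists and sample sentence below are invented stand-ins, not the actual GSL/AWL/MED lists:

```python
from collections import Counter

# Range-style cumulative coverage: each token is assigned to the first
# word list that contains it; the remainder count as "words outside the
# lists". The word lists and text here are toy stand-ins.

word_lists = [
    ("GSL1", {"the", "is", "through", "and"}),
    ("AWL",  {"data", "analyse"}),
    ("MED1", {"artery", "vein", "blood"}),
]

tokens = "the blood is pumped through the artery and the vein".split()

counts = Counter()
for token in tokens:
    for list_name, members in word_lists:
        if token in members:
            counts[list_name] += 1
            break
    else:  # no list matched this token
        counts["outside"] += 1

coverage = {name: 100 * n / len(tokens) for name, n in counts.items()}
```

Summing the per-list percentages plus the "outside" share always yields 100%, which is why the tables report cumulative coverage and a residual words-outside-the-lists figure.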
Table 3
Coverage of the GSL1 and GSL2, the AWL and the EAP Science List over the Medical Corpus
Word List Coverage % Number of Word Types
GSL1 55.62 3,291
GSL2 5.97 2,415
AWL 8.23 2,418
EAP Science List 6.06 1,288
Cumulative total 75.88 9,412
In relation to the coverage of the AWL over medical texts, Table 3 shows that the AWL accounts for
8.23% of the 5.4 million tokens of the medical corpus. 527 of the 3,107 word types in the AWL were identified
as medical words. Examples of medical words in the AWL include depression, labour, and topical. When compared
with the coverage of the GSL1 over medical texts, the 8.23% coverage of the AWL represents good coverage of
academic words over medicine. Since the lexical coverage of the AWL is 2.26% higher than that of the GSL2,
these coverage results suggest that it may be more useful for ESP medical students to start learning the AWL
right after they have acquired the words in the GSL1. The AWL is a particularly useful word list to learn when
ESP medical students need to focus on academic words. For this reason, the AWL is a helpful list for medical
students taking first-year ESP reading courses.
As also indicated in Table 3, the high coverage of the EAP Science List over medicine (6.06%), when
compared with the coverage of the GSL1, GSL2, and the AWL over medical texts, shows that the EAP Science List
plays an important complementary role in helping ESP medical students become familiar with scientific words
that occur in texts of health and medicine (see Coxhead & Quero, 2015, for further discussion of the behaviour
of the EAP Science List over medical texts). Examples of scientific words with a medical meaning in the
EAP Science List are cell, anatomy, and digest. These results also suggest that the EAP Science List is of particular
interest to science and medical students rather than to learners of general English. Additionally, the lexical
coverage results of the GSL, AWL and EAP Science List over the medical corpus suggest that the learning of
high-frequency general, academic and scientific words in English could be sequenced differently for ESP medical
students.
Table 4
Cumulative Coverage of the Three 1,000 MGEN Lists
Word List Coverage % Number of Word Types
MGEN1 8.49 1,000
MGEN2 1.82 1,000
MGEN3 0.87 1,000
Cumulative total 11.18 3,000
Let us now look at the text coverage of the new general-purpose medical word lists (i.e., the three 1,000 MGEN lists). These
3,000 medical word types are divided into three 1,000-word lists, referred to as MGEN1, MGEN2, and
MGEN3 in Table 4. Examples of medical words in the MGEN lists are syndromes, radiologist, and anatomical. Note
also in Table 4 that the three 1,000 MGEN lists together provide a coverage of 11.18%. This means that the GSL, AWL,
EAP Science List and the three MGEN lists together cover 87.06% (i.e., 75.88% for the GSL, AWL and EAP
Science List, plus 11.18% for the three MGEN lists) of medical texts. This cumulative coverage of 87.06%
indicates that a further 10.94% coverage is still needed to reach an optimal lexical threshold of 98%.
Table 5 gives the coverage details of the twenty-three frequency-ranked 1,000 MED word lists that are
unique to the medical corpus. As can be observed in Table 5, a large number of low-frequency medical
words occur in medical texts. Examples of medical words in the 23 MED lists are subcutaneously, polyarteritis,
and catarrhalis.
Table 5
Coverage of the Twenty-three 1,000 MED Lists
Word List Coverage % Number of Word Types
MED1 5.16 1,000
MED2 1.46 1,000
MED3 0.82 1,000
MED4 0.54 1,000
MED5 0.39 1,000
MED6 0.30 1,000
MED7 0.23 1,000
MED8 0.18 1,000
MED9 0.15 1,000
MED10 0.12 1,000
MED11 0.10 1,000
MED12 0.09 1,000
MED13 0.07 1,000
MED14 0.06 1,000
MED15 0.06 1,000
MED16 0.05 1,000
MED17 0.04 1,000
MED18 0.04 1,000
MED19 0.04 1,000
MED20 0.04 1,000
MED21 0.02 1,000
MED22 0.02 1,000
MED23 0.02 1,000
Cumulative total 10.00 23,000
Table 6 shows that 2.94% of the tokens and 19,942 word types occur in the medical corpus but not in the 30
existing word lists. These words outside the lists include single letters of the alphabet or Roman numerals,
marginal medical words (e.g., chap, an abbreviation of chapter), prefixes (e.g., non- and micro-), and low-frequency
medical words (e.g., encephalographic and haematologist).
Table 6
Coverage of the GSL, the AWL, the EAP Science List, the Three 1,000 MGEN Lists, and the Twenty-three 1,000 MED
Lists, Including Words outside the Existing Lists
Word List Number of Tokens Coverage % Number of Word Types
Cumulative total of existing lists 5,271,784 97.06 35,414
Words outside the lists 159,956 2.94 19,942
Total 5,431,740 100.00 55,354
The cumulative coverage of all 30 existing lists (i.e., the GSL, AWL, EAP Science List, the three
MGEN lists, and the twenty-three MED lists) and of the words outside these lists is compared in Table 6. The results in
Table 6 show that readers of medical texts would need to know a large number of the 19,942 word types left
outside these 30 word lists in order to approach a 98% text coverage.
Based on the cumulative total coverage (97.06%) of the word lists shown in Table 6, we conclude that at least
twenty-two 1,000 low-frequency medical word lists would need to be added to these 30 existing lists to
increase the text coverage from 97.06% to 97.50% and start getting closer to 98% (the optimal lexical threshold).
Another way to get closer to 98% with a smaller number of word types could be to add word lists of high- and
mid-frequency words with general academic meaning that, for different reasons, are not included in the existing
general, academic, and scientific word lists (i.e., the GSL, AWL, EAP Science List) used as part of the present
investigation. (See also Appendix A for text coverage and occurrence figures for all the lists discussed in this
study.)
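As a rough illustration of the arithmetic behind this conclusion, the sketch below counts how many additional word lists would be needed before cumulative coverage reaches a target figure. The per-list coverage values in the example are hypothetical stand-ins chosen to match the reported 97.06% to 97.50% step, not the study's actual Appendix A figures.

```python
def lists_needed(band_coverages, start, target):
    """Return how many extra word lists are needed before cumulative
    coverage first reaches `target` percent, starting from `start`.
    Returns None if the supplied bands never reach the target."""
    total = start
    for n, coverage in enumerate(band_coverages, 1):
        total += coverage
        if total >= target - 1e-9:  # small tolerance for float drift
            return n
    return None

# Hypothetical tail of low-frequency lists, each adding about 0.02% coverage.
tail = [0.02] * 22
print(lists_needed(tail, start=97.06, target=97.50))  # 22
```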
Discussion
Next, we discuss the value of the twofold methodology adopted here for identifying medical words. This
discussion refers to the following aspects of the present study: (1) the semantic rating scale, (2) the size of the
corpus, (3) the corpus comparison approach, and (4) the new medical word lists.
corpus used for creating the lists. In spite of the usefulness of the semantic rating scale for making
decisions on the amount of content-area vocabulary found in medical texts, its implementation proved
to be very demanding and time-consuming.
3. Medical corpus limited to textbooks. The medical texts included in the medical corpus compiled for the
present investigation were restricted to textbooks. For future research to estimate the vocabulary load of
medical texts, it would be worth including a variety of text types (such as medical articles in specialised
journals and scientific magazines, book chapters, technical reports, and laboratory manuals) when
creating a specialised corpus of medical texts written in English.
4. Pedagogical value of the medical word lists. The results of this investigation have shown that readers of
medical textbooks need to know about 26,000 medical word types beyond existing word lists – as
represented by the GSL, AWL, and EAP Science List – to be able to meet the lexical
demands of medical textbooks. As detailed in Appendix A, the pedagogical value of the last two-thirds
of the new medical word lists (i.e., around 16,000 medical word types needed for an additional 1%
cumulative text coverage) is questionable. The acquisition of 26,000 medical words is a vocabulary
learning goal that seems unrealistic to achieve in the restricted time span (one to two years at most) of
most English for Medical Purposes reading courses. The need to learn these 26,000 medical word
types clearly indicates that the technical vocabulary of medicine is very large and represents a major
learning burden for students learning to read medical texts written in English.
Vocabulary expansion of medical terms should be an important goal for teachers of English for Medical
Purposes. In order to help ESP learners better cope with the lexical demands of medical texts and the large
number of medical words required to achieve an adequate lexical threshold, ESP teachers need to:
1. Design a lexical syllabus to teach vocabulary learning strategies (such as guessing from context,
using mnemonic techniques, using word cards, and doing extensive reading) that enable medical students to
cope with most of the new vocabulary independently.
2. Encourage learners to do extensive reading on topics that address the vocabulary they are trying to
learn.
3. Promote the use of genuine lexical contexts and provide authentic examples of medical vocabulary.
Examples of authentic reading materials for meeting and learning medical terms in context are
medical textbooks like those used to create the medical corpus mentioned in the present study.
4. Emphasise word relationships such as lexical bundles, word frequency, and phraseology.
5. Set ambitious vocabulary learning goals for students of around 50 words per week.
6. Group the vocabulary that needs to be learnt in a manageable format (e.g., word family lists).
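The scale of this learning task can be put in perspective with back-of-the-envelope arithmetic based on the figures above: at the suggested pace of about 50 words per week, the roughly 26,000 medical word types beyond the existing lists would take on the order of a decade, which is why strategy training rather than direct teaching of every word is the realistic route.

```python
# Back-of-the-envelope check on the vocabulary learning goal discussed above.
words_needed = 26_000    # medical word types beyond the existing lists
words_per_week = 50      # the ambitious weekly goal suggested above

weeks = words_needed / words_per_week
years = weeks / 52       # assuming uninterrupted year-round study

print(int(weeks))  # 520
print(years)       # 10.0
```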
In conclusion, it is important to equip medical students in the ESP classes at university with the vocabulary
learning strategies necessary to manage the acquisition of the massive number of words required to achieve good
reading comprehension of medical texts written in English.
References
Brezina, V., & Gablasova, D. (2013). Is there a core general vocabulary? Introducing the new general service list.
Applied Linguistics, 1–13. https://doi.org/10.1093/applin/amt018
Browne, C. (2013). The new general service list: Celebrating 60 years of vocabulary learning. The Language Teacher,
37(4), 13–16.
Chen, Q., & Ge, G.-C. (2007). A corpus-based lexical study on frequency and distribution of Coxhead’s AWL
word families in medical research articles (RAs). English for Specific Purposes, 26(4), 502–514.
Chung, T. M., & Nation, I. S. P. (2003). Technical vocabulary in specialised texts. Reading in a Foreign Language,
15(2), 103–116.
Chung, T. M., & Nation, I. S. P. (2004). Identifying technical vocabulary. System, 32(2), 251–263.
Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34(2), 213–238.
Coxhead, A., & Hirsh, D. (2007). A pilot science word list for EAP. Revue Française de Linguistique Appliquée, 7(2), 65–
78.
Coxhead, A., & Quero, B. (2015). Investigating a Science Vocabulary List in university medical textbooks.
TESOLANZ Journal, 23, 55–65.
Engels, L. K. (1968). The fallacy of word-counts. IRAL - International Review of Applied Linguistics in Language
Teaching, 6(3), 213–231. https://doi.org/10.1515/iral.1968.6.1-4.213
Fauci, A. S., Braunwald, E., Kasper, D. L., Hauser, S. L., Longo, D. L., Jameson, J. L., & Loscalzo, J. (2008).
Harrison’s principles of internal medicine (17th Edition). New York: McGraw-Hill. Retrieved from
http://highered.mcgraw-hill.com/sites/0071466339/information_center_view0/table_of_contents.html
Fraser, S. (2005). The lexical characteristics of specialized texts. In K. Bradford-Watts, C. Ikeguchi, & M.
Swanson (Eds.), JALT2004 conference proceedings (pp. 318–327). Tokyo: JALT. Retrieved from http://jalt-
publications.org/archive/proceedings/2004/E115.pdf
Fraser, S. (2006). The nature and role of specialized vocabulary: What do ESP teachers and learners need to
know? Hiroshima Studies in Language and Language Education, 9, 63–75.
Fraser, S. (2007). Providing ESP learners with the vocabulary they need: Corpora and the creation of
specialized word lists. Hiroshima Studies in Language and Language Education, 10, 127–145.
Gardner, D., & Davies, M. (2014). A new academic vocabulary list. Applied Linguistics, 35(3), 305–327.
https://doi.org/10.1093/applin/amt015
Goldman, L., & Ausiello, D. (Eds.). (2008). Cecil textbook of internal medicine (23rd edition). Philadelphia, PA: W.B.
Saunders Elsevier. Retrieved from http://www.us.elsevierhealth.com/cecil-medicine/goldman-cecil-
medicine-expert-consult/9781416028055/
Heatley, A., Nation, I. S. P., & Coxhead, A. (2002). Range [Computer software]. Wellington, New Zealand:
Victoria University of Wellington.
Hirsh, D., & Nation, I. S. P. (1992). What vocabulary size is needed to read unsimplified texts for pleasure?
Reading in a Foreign Language, 8, 689–696.
Hu, M., & Nation, I. S. P. (2000). Unknown vocabulary density and reading comprehension. Reading in a Foreign
Language, 13(1), 403–430.
Hwang, K., & Nation, I. S. P. (1989). Reducing the vocabulary load and encouraging vocabulary learning
through reading newspapers. Reading in a Foreign Language, 6(1), 323–335.
Hyland, K., & Tse, P. (2007). Is there an “academic vocabulary”? TESOL Quarterly, 41(2), 235–253.
https://doi.org/10.1002/j.1545-7249.2007.tb00058.x
Konstantakis, N. (2007). Creating a business word list for teaching business English. Elia, 7, 79–102.
Laufer, B. (1989). What percentage of text-lexis is essential for comprehension? Special Language: From Humans
Thinking to Thinking Machines, 316–323.
Laufer, B. (1992). How much lexis is necessary for reading comprehension? In H. Béjoint & P. J. Arnaud (Eds.),
Vocabulary and applied linguistics (Vol. 3, pp. 126–132). London: Macmillan.
Laufer, B., & Ravenhorst-Kalovski, G. C. (2010). Lexical threshold revisited: Lexical text coverage, learners’
vocabulary size and reading comprehension. Reading in a Foreign Language, 22(1), 15–30.
Lei, L., & Liu, D. (2016). A new medical academic word list: A corpus-based study with enhanced methodology.
Journal of English for Academic Purposes, 22, 42–53. https://doi.org/10.1016/j.jeap.2016.01.008
Martínez, I. A., Beck, S. C., & Panza, C. B. (2009). Academic vocabulary in agriculture research articles: A
corpus-based study. English for Specific Purposes, 28(3), 183–198.
Mudraya, O. (2006). Engineering English: A lexical frequency instructional model. English for Specific Purposes,
25(2), 235–256.
Nation, I. S. P. (2001). Learning vocabulary in another language. Cambridge: Cambridge University Press.
Nation, I. S. P. (2006). How large a vocabulary is needed for reading and listening? Canadian Modern Language
Review/La Revue Canadienne Des Langues Vivantes, 63(1), 59–82.
Nation, I. S. P. (2013). Learning vocabulary in another language (Second edition). Cambridge: Cambridge University
Press.
Nation, I. S. P. (2016). Making and using word lists for language learning and testing. Amsterdam: John Benjamins
Publishing Company. Retrieved from http://www.jbe-platform.com/content/books/9789027266279
Quero, B. (2015). Estimating the vocabulary size of L1 Spanish ESP learners and the vocabulary load of medical textbooks.
(Unpublished PhD thesis). Victoria University of Wellington, Wellington, New Zealand.
Read, J. (2000). Assessing vocabulary. Cambridge: Cambridge University Press.
Read, J. (2007). Second language vocabulary assessment: Current practices and new directions. International
Journal of English Studies, 7(2), 105–125.
Schmitt, N., Jiang, X., & Grabe, W. (2011). The percentage of words known in a text and reading
comprehension. The Modern Language Journal, 95(1), 26–43.
Schmitt, N., & Schmitt, D. (2012). A reassessment of frequency and vocabulary size in L2 vocabulary teaching.
Language Teaching. Advance online publication. https://doi.org/10.1017/S0261444812000018
Wang, J., Liang, S., & Ge, G. (2008). Establishment of a medical academic word list. English for Specific Purposes,
27(4), 442–458.
Wang, K., & Nation, I. S. P. (2004). Word meaning in academic English: Homography in the Academic Word
List. Applied Linguistics, 25(3), 291–314. https://doi.org/10.1093/applin/25.3.291
Ward, J. (1999). How large a vocabulary do EAP engineering students need? Reading in a Foreign Language, 12(2),
309–324.
Ward, J. (2009). A basic engineering English word list for less proficient foundation engineering undergraduates.
English for Specific Purposes, 28(3), 170–182.
West, M. P. (1953). A general service list of English words. London: Longman.
Yang, M.-N. (2015). A nursing academic word list. English for Specific Purposes, 37, 27–38.
https://doi.org/10.1016/j.esp.2014.05.003
Appendix A
Text Coverage and Frequency of Occurrence of the Medical Corpus by the GSL, AWL, EAP Science List and the Twenty-Six
Medical Word Lists
Word List Tokens # Tokens % Types # Types % Families #
GSL1 3,021,029 55.62 3,291 5.95 981
GSL2 324,020 5.97 2,415 4.36 886
AWL 447,254 8.23 2,418 4.37 565
EAP Sc. List 329,236 6.06 1,288 2.33 316
MGEN1 461,169 8.49 1,000 1.81 n/a
MGEN2 98,853 1.82 1,000 1.81 n/a
MGEN3 47,476 0.87 1,000 1.81 n/a
MED1 280,114 5.16 1,000 1.81 n/a
MED2 79,208 1.46 1,000 1.81 n/a
MED3 44,413 0.82 1,000 1.81 n/a
MED4 29,254 0.54 1,000 1.81 n/a
MED5 21,085 0.39 1,000 1.81 n/a
MED6 16,127 0.30 1,000 1.81 n/a
MED7 12,593 0.23 1,000 1.81 n/a
MED8 10,018 0.18 1,000 1.81 n/a
MED9 8,168 0.15 1,000 1.81 n/a
MED10 6,635 0.12 1,000 1.81 n/a
MED11 5,546 0.10 1,000 1.81 n/a
MED12 4,773 0.09 1,000 1.81 n/a
MED13 4,000 0.07 1,000 1.81 n/a
MED14 3,502 0.06 1,000 1.81 n/a
MED15 3,000 0.06 1,000 1.81 n/a
MED16 2,978 0.05 1,000 1.81 n/a
MED17 2,000 0.04 1,000 1.81 n/a
MED18 2,000 0.04 1,000 1.81 n/a
MED19 2,000 0.04 1,000 1.81 n/a
MED20 2,000 0.04 1,000 1.81 n/a
MED21 1,333 0.02 1,000 1.81 n/a
MED22 1,000 0.02 1,000 1.81 n/a
MED23 1,000 0.02 1,000 1.81 n/a
Words outside the lists 159,956 2.94 19,942 36.03 0
Total 5,431,740 100.00 55,354 100.00 2,748
Acknowledgements
I would like to thank Emeritus Professor Paul Nation and Dr. Averil Coxhead of Victoria University of
Wellington for their unfailing assistance and advice on an earlier version of this article. I am also grateful to the
two anonymous TESOL International Journal reviewers for their constructive critiques and comments that have
helped enhance the quality of this work.