Frontiers of Language and Teaching, Volume 4 (2013)
Corpus Linguistics (CL) in the Design of English for Academic Purposes (EAP) Courses
Mariana Mussetta and Andrea Vartalitis

Universidad Nacional de Villa María, Argentina
* Corresponding Author’s Email: marianamussetta@hotmail.com
Abstract
Reading comprehension of academic texts in a foreign language, especially in English, is a
growing need in the Latin American academic world, and CL has become a valuable tool in
the design of reading comprehension courses in English for Academic Purposes (EAP). This
research work presents the design of a reading comprehension course aimed at university
professors and researchers, and whose purpose is the development of reading skills in the
interpretation of international academic calls in English.
Keywords: Corpus Linguistics, Reading Comprehension, English for Academic Purposes (EAP), Course Design
Introduction
Reading comprehension of academic texts in a foreign language, especially in English, is a
growing need in the Latin American academic world, and CL has become a valuable tool in
the design of reading comprehension courses in EAP. This research work presents the design
of a reading comprehension course aimed at university professors and researchers, and whose
purpose is the development of reading skills in the interpretation of international academic
calls of different kinds, for example calls for scholarships or grants, calls for academic
papers, calls for internships, and the like.
As a lingua franca in the academic world, English can represent an obstacle for those
professors and/or researchers who are not proficient in the use of this foreign language when
participating in international academic events, or when applying for scholarships, grants, or
internships.
Literature Review
Corpus Linguistics (CL) is the computer-based study of language relying on samples of
language use in its natural contexts (McEnery & Wilson, 1996). CL makes use of great
collections of oral and/or written texts (corpora) gathered according to specific criteria and
which contain millions of words. Such corpora can be electronically scanned (Biber et al.,
2007) and allow for quantitative analysis, that is, the possibility of
studying the frequency with which a particular linguistic phenomenon occurs. On the other
hand, CL also makes it possible to carry out a qualitative study: while the pattern association
counts represent a quantitative analysis, the interpretation of their numerous associations, that
is to say a functional approach, can also be carried out.
The relevance of using large corpora is linked to the fact that this allows the researcher to
observe the use of language in a great variety of different contexts not only at a micro level
but also at a macro level, within and throughout a variety of oral and/or written samples, as
well as throughout various texts, registers, and disciplines. Therefore, these corpora constitute
an important empirical reference of the descriptions that the researcher makes of the use of
language in natural contexts, as well as of the empirical patterns which contrast with and
usually contradict traditional descriptions of the organized nature of language (Boxer &
Pickering, 1995; McCarthy, 1998). For example, CL research work has demonstrated that
language follows, to a great extent, regular patterns of pre-structured chunks: what Sinclair
(1991) calls the idiom principle, as opposed to the open-choice principle, which refers to
combinations based on the selection of single words. One of the implications of the idiom
principle is that these pre-grouped phrases, and not isolated words, can and should be the
linguistic units of analysis. In addition, since the texts which make up a
corpus have been digitized, complex calculations can easily be made to reveal recurrent
lexical and structural patterns, calculations which would demand great amounts of
time and effort if done manually. In this way, computer-based linguistic analysis has smoothed the process of
decision making in terms of what, where, and how to conduct a search since the computer
fulfills the task of the statistical treatment of data.
Moreover, this approach enables the researcher to quickly and effectively verify working
hypotheses, contributing to a greater objectivity in linguistic research studies. Besides its
application in the description of language use in natural contexts, this method has been
employed in different areas of study, including the design of dictionaries and grammar reference
books (Biber et al., 1999; Carter & McCarthy, 2006), literary criticism (Louw, 2007), legal
discourse (Cotterill, 2004), advertising and media (O’Keeffe, 2006), politics (Charteris-Black,
2004), health discourse (Adolphs et al., 2004), and the design of ELT materials (Johns,
1991), among others.
In recent years, there has been a growing interest in the use of CL for the study of language
variations in specific academic and professional genres. From this perspective, the analysis of
academic genres—understood in terms of typification of rhetorical moves (Miller, 1984;
Bazerman, 1994; Berkenkotter & Huckin, 1995), or in terms of conventional and
purpose-oriented discursive social practices developed in stages (Martin et al., 1987; Martin, 1993), or
in terms of its communicative purpose (Swales, 1990; Bhatia, 1993)—can be considered in
general terms as the study of institutionalized linguistic behavior in academic and/or
professional contexts. For the purpose of such an analysis, Connor and Upton (1996) argue in
favor of the creation of specialized corpora saying:
While general corpora are important and provide a critical foundation for the study of
language structure and use, they are less conducive for analyzing language use in
specific academic and professional situations. Consequently, there is now a strong and
growing interest in compiling specialized corpora that focus on specific types of genres
within specific contexts. Instead of being compiled for representativeness of language
across a large number of communicative purposes, specialized corpora often focus on
one particular genre…or specific situation… (p. 2)
Therefore, specialized corpora have proven to be extremely useful in research works focused
on equally specific needs.
Methods
The present course is aimed at those professionals who, while lacking proficiency in English,
do often participate in conferences and scientific events, have published academic
papers, and are familiar with academic contexts in Spanish. Such background knowledge can
and should be drawn on if we consider Rosenblatt’s (2004) transactional theory of
reading comprehension, which holds that meaning is constructed in the reader’s mind by
means of a transactional process which involves the reader, the text, and the context, and
where the reader’s previous knowledge of the world merges with their linguistic knowledge.
Drawing from our target reader’s background knowledge, this course is designed to develop
the necessary reading comprehension strategies and skills in order to understand EFL texts in
academic calls.
Given the purpose of the course, the selection of the texts that comprise the specialized
corpus includes different genuine samples of calls for scholarships, grants and scientific
articles; guidelines for the presentation of reports, CVs, and research projects; international
research program agreements; financial, academic and institutional reviews of twinning
mechanisms; guides for applicants, and the like. The corpus is made up of approximately one
million words, which is the recommended number of words for a reliable corpus,
representative of the textual types studied. Likewise, a specialized parallel corpus in Spanish
was designed, which was highly useful to compare and contrast the uses of certain
expressions and/or structures in the genre across the two languages, yielding a more accurate
analysis. Once all the digital texts were gathered, they were converted into plain text files
(.txt) so that they could be analyzed with the concordance program called AntConc.
AntConc is a freeware concordance program developed by Laurence Anthony of Waseda
University, Japan, which can be run on Windows, Mac OS X, and Linux systems. The
program can generate Key Word in Context (KWIC) concordance lines and concordance
distribution plots, and it also has tools to analyze word clusters (lexical bundles), n-grams,
collocates, word frequencies, and keywords. AntConc facilitated the analysis of the linguistic
features of a great number of academic texts, which allowed for a purposeful selection of the
contents for the course, prioritizing the recurrent grammatical and lexical structures of these
texts. Furthermore, AntConc also allowed for easy-to-find relevant exemplification of those
contents with lexical items taken from their natural contexts.
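The kind of KWIC display described above can be approximated in a few lines of Python. The following is a minimal sketch and not AntConc's own implementation: the function names (tokenize, kwic, sort_by_right), the simple tokenizer, and the choice of sample sentences (taken from the call for proposals reproduced in the appendix) are assumptions made for illustration only.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+(?:['-][a-z0-9]+)*", text.lower())

def kwic(tokens, keyword, window=4):
    """Return (left context, key word, right context) for every hit of keyword."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            lines.append((left, tok, right))
    return lines

def sort_by_right(lines, position=1):
    """Sort concordance lines alphabetically by the nth word to the right of the key word."""
    def sort_key(line):
        right = line[2]
        return right[position - 1] if len(right) >= position else ""
    return sorted(lines, key=sort_key)

# Sample sentences taken from the appendix; a real study would read the .txt corpus files.
sample = (
    "Projects will be selected on a competitive basis. "
    "Regional projects will be granted up to US$20,000. "
    "Notification will be sent only to selected projects."
)

lines = sort_by_right(kwic(tokenize(sample), "will", window=3), position=2)
for left, kw, right in lines:
    # Key word aligned in the middle, context on either side.
    print(f"{' '.join(left):>30}  {kw}  {' '.join(right)}")
```

Sorting on the second word to the right groups the lines by the participle that follows will be, which is the same idea as choosing a sort position in a concordancer.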
Hypotheses about a number of specific features of the texts based on previous readings were
corroborated or refuted according to the empirical evidence that the computer-based
customized analysis of the corpus yielded. For example, the frequency of a
particular lexical item or morpheme considered a key word was assessed using the program,
which not only provided the number of occurrences of the word but also gave easy access to the
text of origin in each case, allowing for the analysis of its use by simply clicking on the link
(Fig. 1).
Figure 1
Figure 1 shows how, after running the program, AntConc has counted the number of times
the word appears in the selected corpus. The program displays concordance lines with the key
word in the middle of the line linked to its original text, whose name appears on the right.
This allows for an easy assessment of the frequency of the word as well as of its most
frequent collocations to the right and/or to the left, which constitute the data for further
linguistic analysis of the term. Furthermore, the program gives the researcher the possibility
to customize both the search and the way in which the concordance lines should be displayed:
in alphabetical order applied to the first, second, third or fourth word to the right or to the left
of the key word (Fig. 2).
Figure 2
This function facilitates the analysis of different types of specific collocations. In the case
shown in Figure 2, the frequency of passive voice structures can be analyzed by selecting the -ed
ending to the right or to the left combined with the key word been. AntConc also reports
the frequency of the words in the chosen corpus, and allows for easy access to valuable
information for the analysis of key words (Fig. 3).
Figure 3
Figure 3 clearly shows that in the corpus chosen for the example, research is the content
word with the highest rate of frequency in the texts: 1526 hits.
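A word frequency list of the kind shown in Figure 3 can be sketched as follows. This is an illustrative approximation only: the stop list FUNCTION_WORDS is a hypothetical, deliberately tiny set of function words, the helper names are our own, and the three sample sentences come from the appendix rather than from the full corpus.

```python
import re
from collections import Counter

# Hypothetical, very small stop list; a real study would use a fuller list of function words.
FUNCTION_WORDS = {"the", "of", "a", "an", "and", "in", "to", "for",
                  "be", "not", "must", "should", "will", "may"}

def word_frequencies(text):
    """Count every alphabetic word token, ignoring case."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def top_content_words(freqs, n=3):
    """Return the n most frequent words that are not in the stop list."""
    content = [(word, count) for word, count in freqs.most_common()
               if word not in FUNCTION_WORDS]
    return content[:n]

# Sentences taken from the appendix, used here only as a toy corpus.
sample = (
    "The abstract must reflect the content of the article including keywords. "
    "In a research report, the abstract should be informative, including actual results. "
    "The abstract should not contain confidential information."
)

freqs = word_frequencies(sample)
print(top_content_words(freqs, 2))
```

Filtering out function words before ranking is what makes it possible to speak of the most frequent content word, as opposed to the most frequent word overall, which in English is almost always a function word such as the.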
Findings
The above examples are some of the numerous possibilities of the different functions that this
program offers the researcher for the analysis of their corpora. In our case, these functions
also allowed for a purposeful search of relevant examples of the linguistic structures included
in the course. Furthermore, the computer-assisted linguistic analysis of our large specialized
corpus allowed us to reevaluate certain lexico-grammatical contents that have traditionally
been included in similar reading comprehension courses in English, prioritizing those which
proved to have the highest rate of frequency in the corpus and were therefore more relevant
given the objectives of the course. The appendix shows Course Unit VIII to
illustrate how exemplification from the specialized corpus, selected by means of
AntConc, worked in the course units. In this case, Unit VIII introduces a number of modal
verbs and their use in the passive voice.
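The structure that Unit VIII targets, a modal verb followed by be and a past participle, lends itself to a simple pattern search. The sketch below is an assumption-laden illustration, not the procedure followed with AntConc: the regular expression, the participle heuristic (-ed or -en endings plus a hypothetical short list of irregular forms), and the function names are our own, and the sample sentences are taken from the appendix.

```python
import re

MODALS = r"(?:must|should|will|can|could|may|might|would)"
# Modal, optionally followed by "not" or "also", then "be" and the next word.
CANDIDATE = re.compile(rf"\b({MODALS})(?:\s+(?:not|also))?\s+be\s+([a-z]+)", re.IGNORECASE)

# Hypothetical short list of irregular participles; far from complete.
IRREGULAR_PARTICIPLES = {"sent", "made", "held", "paid"}

def looks_like_participle(word):
    """Rough heuristic: regular -ed/-en endings or a listed irregular form."""
    return word.endswith(("ed", "en")) or word in IRREGULAR_PARTICIPLES

def modal_passives(text):
    """Return (modal, participle) pairs for candidate modal passive structures."""
    hits = []
    for match in CANDIDATE.finditer(text):
        modal, word = match.group(1).lower(), match.group(2).lower()
        if looks_like_participle(word):
            hits.append((modal, word))
    return hits

sample = (
    "Proposal should be submitted by email. "
    "It can also be sent by regular post. "
    "Further information may be requested. "
    "Manuscripts should be complete."
)

print(modal_passives(sample))
```

The last sample sentence is deliberately not a passive: should be complete is filtered out by the heuristic. Adjectives ending in -en (for example open) would still slip through, so the hits remain candidates that call for the qualitative reading of concordance lines described earlier.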
The computer-based treatment of large corpora that this program makes possible yielded data
that would have remained hidden had we conducted a more traditional processing of the data;
moreover, it would have taken longer without being as accurate or reliable. Furthermore, the
evidence gathered related to the frequency of certain structures in the academic texts of the
corpus contradicted the hypotheses that prescriptive traditional designs of courses of this type
have long supported. For example, reading comprehension courses in
English have traditionally included both future forms, will and going to, on the assumption that they are
equally relevant and are used with the same frequency to express future time. The
computer-based analysis carried out with AntConc rated the frequency of the two structures
and provided evidence that challenged this assumption. While the future form going to had 3
(three) hits, the use of the modal will realizing the meaning of future reached 1518 (one
thousand five hundred and eighteen) hits. Other examples included the different frequency in
the use of modal verbs of similar meaning, such as may and might. While may got 417 (four
hundred and seventeen) hits, might only got 23 (twenty-three). Likewise, the frequency of
other structures, such as the comparative and the superlative forms, was studied and
proven to correspond only partially to the rate of frequency they may show in other contexts.
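The comparison between competing forms can be made explicit with a short calculation. The sketch below uses the hit counts reported in this section; the helper names count_pattern and ratio are our own, and the raw string search is an assumption that does not reproduce the manual step of checking that each hit of will actually expresses future meaning.

```python
import re

def count_pattern(text, pattern):
    """Count whole-word, case-insensitive matches of a word or phrase."""
    return len(re.findall(rf"\b{re.escape(pattern)}\b", text, flags=re.IGNORECASE))

# Hit counts reported in the Findings section.
reported = {"will": 1518, "going to": 3, "may": 417, "might": 23}

def ratio(form_a, form_b):
    """How many times more frequent form_a is than form_b in the reported counts."""
    return reported[form_a] / reported[form_b]

print(f"will : going to = {ratio('will', 'going to'):.0f} : 1")
print(f"may : might     = {ratio('may', 'might'):.1f} : 1")

# The same counting step applied to two sentences from the appendix.
sample = (
    "Projects will be selected on a competitive basis. "
    "Notification will be sent only to selected projects."
)
print(count_pattern(sample, "will"), count_pattern(sample, "going to"))
```

On the reported figures, will is about 506 times as frequent as going to, and may about 18 times as frequent as might, which is the quantitative basis for prioritizing will and may in the course contents.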
Conclusion
To conclude, it can be said that the present research comes to affirm the value of applied
linguistics in general and of the program AntConc in particular for the study of language in
its natural contexts. In this particular case, this approach has provided the tools for the design
of a more efficient reading comprehension course of academic texts in English; however, the
great potential it has for the design of other language courses in a wide variety of contexts
can hardly be overlooked.
References
Adolphs, S. et al. (2004). Applying Corpus Linguistics in a health care context. International
Journal of Applied Linguistics, 1(1), 9 - 28.
Bazerman, C. (1994). Systems of genres and the enactment of social intentions. In Aviva
Freedman & Peter Medway (Eds.), Genre and New Rhetoric (pp. 79 - 101). London:
Taylor and Francis.
Berkenkotter, C. & Huckin, T. N. (1995). Genre knowledge in disciplinary
communication: Cognition/Culture/Power. Hillsdale, N. J.: Lawrence.
Biber, D. et al. (1999). Longman grammar of spoken and written English. Harlow: Longman.
Biber, D. et al. (2007). Discourse on the move: using corpus analysis to describe discourse.
Amsterdam: John Benjamins Publishing Company.
Bhatia, V. K. (1993). Analyzing genre: Language use in professional settings. London:
Longman.
Connor, U. & Upton, T. A. (1996). Discourse in the professions: Perspectives from Corpus
Linguistics. Amsterdam: John Benjamins Publishing Company.
Boxer, D. & Pickering, L. (1995). Problems in the presentation of speech acts in ELT
materials: The case of complaints. ELT Journal 49, 99 - 158.
McCarthy, M. (1998). Spoken language and applied linguistics. Cambridge: Cambridge
University Press.
Carter, R. & McCarthy, M. (2006). The Cambridge grammar of English: A comprehensive
guide to spoken and written grammar and usage. Cambridge: Cambridge University
Press.
Charteris-Black, J. (2004). Why “an angel rides in the whirlwind and directs the Storm.” A
Corpus-based comparative study of metaphor in British and American political
discourse. Language and Computers. 49, 1133 - 1150.
Cotterill, J. (2004). Collocation, connotation, and courtroom semantics: Lawyers’ control of
witness testimony through lexical negotiation. Applied Linguistics 25 (4), 513 - 37.
Johns, T. (1991). Should you be persuaded: Two examples of data driven learning. English
Language Research Journal. 4, 1 - 16.
Louw, W. (2007). Collocation as the determinant of verbal art. In Miller, D. R. & Turci, M.
(Eds.), Language and Verbal Art Revisited: Linguistic Approaches to the Study of
Literature (pp. 149–80). London: Equinox.
Martin, J. R. et al. (1987). Social processes in education: A reply to Sayer and Watson (et al).
In I. Reid (Ed.), The Place of Genre in Learning: Current Debates (pp. 46 - 57).
Geelong: Deakin University Press.
Martin, J. R. (1993). A contextual theory of language. In B. Cope & M. Kalantzis (Eds.),
The Powers of Literacy: A Genre Approach to Teaching Writing (pp. 116 – 136).
Pittsburgh, Pa: University of Pittsburgh Press.
McEnery, T. & Wilson, A. (1996). Corpus Linguistics. Edinburgh: Edinburgh University
Press.
Miller, C. R. (1984). Genre as social action. Quarterly Journal of Speech. 70, 157 - 178.
O’Keeffe, A. (2006). Investigating Media Discourse. London: Routledge.
Pang, E. et al. (2005). Teaching reading. Educational Practices Series. The International
Bureau of Education. France: UNESCO.
Rosenblatt, L. M. (2004). The transactional theory of reading and writing. In R. B. Ruddell &
N. J. Unrau (Eds.), Theoretical Models and Processes of Reading (pp. 1363 - 1398).
Newark, DE: International Reading Association.
Sinclair, J. (1991). Corpus, concordance, collocation. Oxford: Oxford University Press.
Swales, J. M. (1990). Genre analysis: English in academic and research settings. Cambridge:
Cambridge University Press.
Appendix
Unit VIII
Contents:
• Modal verbs
• Modal verbs in the passive voice
Read the following sentences.
Names of authors must appear with their affiliations.

The abstract must reflect the content of the article including keywords.
Applicants must ensure that proposals conform to the page limits and layout given in the
form for applicants.
EU funding requested must not exceed Euros 2.5 million.
Selected proposals have to provide evidence that a commitment was made by the relevant
national research programmes.
Participants will not have to submit their legal and financial information (and supporting
documents) each time they submit a proposal or negotiate a grant agreement.
Beneficiaries do not have to provide evidence of actual costs.
Participants should introduce their PIC (Participant Identification Code).
Manuscripts should be complete.
In a research report, the abstract should be informative, including actual results.
The abstract should not contain confidential information.
You should not annex any other file unless this is specially required.
Argentina will identify a number of new potential projects for expanding the twinning process
to include projects from other countries of the Mercosur region.
The workshop will include scientific sessions addressing issues of common interest and a
session reviewing the twinning mechanism.
Some changes (such as changes in the address or legal name of the beneficiary) in ongoing
grants will not require a formal amendment in each of the grant agreements.
Argentina can explore the possibilities to participate as a partner in IAPP.
The experienced researcher can act as the proposal coordinator before the call deadline, in
particular in order to submit the proposal.
If you cannot locate validation errors, please contact the EPSS help desk immediately.
The Research Executive Agency cannot at this stage indicate if there is a possibility of EU
funding.
This could contribute in the long-term to the integration of evaluation practices across
national research systems.
The project could contain a single building or a number of buildings located in one or more
countries.
Any reporting of the implementation of a major research programme would be incomplete
without a proper look at the results obtained and the impact achieved.
Without the help of an internship, many professionals would not be where they are today.
These costs might include travel and subsistence, networking activities, management costs
directly related to the implementation of the exchange.
In addition, ERA-NET topics might be present also in other calls.
Technical and management staff may also participate in the exchanges.
The Nordic Center of Excellence in Welfare Research may also contribute to the travel
expenses of participants.
These organizations may provide the consortium with facilities or assistance that enhances
the quality of work, but they may not be responsible for core activities of the project (e.g.
management, coordination, leader of work group, etc.).
The highlighted words are followed by a verb in its infinitive form. This is because
they are auxiliaries that give the verb different modalities.
The most frequent modal verbs in academic texts are:
must and have to express obligation
(Note: The negative of must indicates prohibition, and the negative of have to indicates lack of
necessity)
should indicates requirement
will indicates future
can indicates possibility, ability, or probability
could expresses possibility, ability, or probability (more tentative)
may and might express possibility
would expresses condition
The examples above are sentences in the active voice. However, modal verbs
also realize meaning in sentences in the passive voice. In that case, the modal verb
is followed by be + the past participle.
Identify sentences of this type in the following text.
United Nations Trust Fund for the Second International Decade of the World’s Indigenous Peoples
Project Proposals
• Deadline for submission of proposals: 1 September 2011
• Proposal should be submitted by email to indigenousfund@un.org
• It can also be sent by regular post to:
Secretariat of the Permanent Forum on Indigenous Issues
Division for Social Policy and Development
Department of Economic and Social Affairs
2 UN Plaza, Room DC2-1454
New York, NY 10017, U.S.A.
• Projects should be implemented over a period of 12 months;
• Projects will be selected on a competitive basis, with maximum funding of
US$10,000 per project
• Regional projects will be granted up to US$20,000;
• Projects can be submitted in English, French, Spanish, or Russian;
• Further information may be requested;
• Notification will be sent only to selected projects;
• Financial and administrative procedures will be undertaken as per United Nations
rules and regulations.
To cite this article:

Mussetta, M., & Vartalitis, A. (2013). Corpus Linguistics (CL) in the Design of
English for Academic Purposes (EAP) Courses. Frontiers of Language and Teaching,
Vol. 4, 45-53.