
Corpus Linguistics and Stylistics
PALA Summer School, Maribor, 2014
In this lecture...
Stylistics and style
Combining stylistics + corpus linguistics
Examples of studies combining corpus linguistics
and stylistics
Analysis of genres
Analysis of the works by particular authors
Analysis of individual texts
Analysis of variation inside texts
Corpus Tools
WMatrix
Stylistics
Stylistics is the study of literature
using methods, theories and
concepts from linguistics
(Leech and Short 2007: 1)

It is "[...] the study of the relationship between linguistic form and literary function [...]"
(Leech and Short 2007: 3).
Linguistic style
Style is a way in which language
is used
(Leech and Short 2007: 31)

[S]tyle consists in choices made from the repertoire of the language.
(Leech and Short 2007: 31)
Linguistic style
Stylistic choice is limited
to those aspects of
linguistic choice which
concern alternative ways
of rendering the same
subject matter
(Leech and Short 2007: 31)

e.g. horse vs. steed, but not horse vs. dog
Linguistic style
Style and genre, e.g. science fiction, romance
novels, etc.
Style and author
Style and text
Style and parts of texts (e.g. the narration or
speech of different characters)
Ways of analysing style
Analysts' intuitions
Manual comparative analysis
Ways of analysing style
Style and comparison
Even if style is defined as that variety of language
which correlates with context, the recognition and
analysis of styles are squarely based on comparison.
The essence of variation, and thus of style, is difference,
and differences cannot be analysed and described
without comparison.
(Enkvist 1973: 21)
Ways of analysing style
Comparative analysis manually
OK for shorter texts/extracts

Comparative analysis using computers:
Corpus linguistic methods/tools
Especially useful for longer texts, e.g. prose fiction
Combining corpus linguistics and stylistics
The corpus turn (Leech and Short 2007: 284).
An ongoing trend in stylistics to use methods and tools from corpus linguistics for the analysis of literary and other texts.
Usually referred to as corpus stylistics
Other terms:
digital stylistics (Louw 2008)
electronic text analysis (Adolphs 2006)
Examples of studies
Combining corpus linguistics and stylistics
Analysis of genres
Analysis of the works by particular authors
Analysis of individual texts
Analysis of variation inside texts
Genre style
Biber (1988): multivariate statistical techniques
factor analysis
many different variables
variables = linguistic features (e.g. passive constructions)
e.g. narrative versus non-narrative texts
important variables = past tense verbs, 3rd person
pronouns, perfect aspect, present participle
clauses
High scores = narrative
Low scores = non-narrative
A range, not a dichotomy:
[Figure: text types ranked along the narrative / non-narrative dimension, from the top text types to the bottom text types]
There is a whole range of text types in the middle: it's not just a two-way distinction.
Note also that spoken and written genres are mixed together along the narrative / non-narrative dimension.
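Biber's dimension scores can be sketched roughly as follows: each feature's frequency is normalised per 1,000 words, standardised across all texts as a z-score, and a text's score on a dimension is the sum of the z-scores of the features loading positively on it minus the sum for the features loading negatively. This is a minimal sketch assuming the feature counts are already available; the texts, counts and the helper names (per_thousand, z_scores, dimension_scores) are hypothetical, treating present tense as a negative feature is an assumption for illustration, and the factor analysis that decides which features group together is not shown.

```python
from statistics import mean, pstdev

def per_thousand(counts, n_words):
    """Normalise raw feature counts to frequencies per 1,000 words."""
    return {f: c * 1000 / n_words for f, c in counts.items()}

def z_scores(texts, feature):
    """Standardise one feature across all texts (mean 0, s.d. 1)."""
    values = [t[feature] for t in texts.values()]
    mu, sd = mean(values), pstdev(values)
    return {name: (t[feature] - mu) / sd if sd else 0.0
            for name, t in texts.items()}

def dimension_scores(texts, positive, negative):
    """Sum of z-scores of positive features minus those of negative features."""
    scores = {name: 0.0 for name in texts}
    for feature in positive:
        for name, z in z_scores(texts, feature).items():
            scores[name] += z
    for feature in negative:
        for name, z in z_scores(texts, feature).items():
            scores[name] -= z
    return scores

# Hypothetical raw counts (feature -> count) and text lengths in words
raw = {
    "novel_extract": ({"past_tense": 120, "third_person_pronoun": 90, "present_tense": 20}, 2000),
    "news_report": ({"past_tense": 60, "third_person_pronoun": 30, "present_tense": 50}, 2000),
    "lecture": ({"past_tense": 10, "third_person_pronoun": 10, "present_tense": 110}, 2000),
}
texts = {name: per_thousand(c, n) for name, (c, n) in raw.items()}
scores = dimension_scores(texts,
                          positive=["past_tense", "third_person_pronoun"],
                          negative=["present_tense"])
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:15} {score:+.2f}")
```

Because each feature is standardised, the scores place texts along a continuous scale rather than into two bins, which matches the point that narrative / non-narrative is a range, not a dichotomy.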
Genre style: direct speech
Corpus-based study of
speech, writing and thought
presentation
(Semino and Short 2004)
Genre style: direct speech
Corpus of approx. 260,000 words of (late) 20th-century written British English
120 text samples of approx. 2,000 words each, amounting to a total of 258,348 words
Genre style: direct speech
Corpus divided into three sections:
prose fiction (87,709 words),
newspaper news reports (83,603 words), and
biography and autobiography (87,036 words)

Each genre section is further divided into serious and popular sub-sections.
Genre style: direct speech
Corpus tagged manually
<sptag cat=NRS next=DS s=0.37 w=7>
The theme park's manager, Mike Slattery said:
<sptag cat=DS next=NRS s=1.63 w=18>
By closing Crinkley Bottom, the council has shot
Morecambe in the foot. And I'm out of a job.
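The tag lines in the example above are regular enough to be read back automatically. A minimal sketch, assuming each <sptag ...> line precedes the text it labels and holds space-separated key=value pairs; parse_tagged is a hypothetical helper. In this example w equals the number of whitespace-separated words in each stretch (7 and 18), so the sketch prints both for comparison; the meaning of s is not explained on the slide and is left uninterpreted.

```python
import re

# Matches one tag line such as: <sptag cat=NRS next=DS s=0.37 w=7>
TAG = re.compile(r"<sptag\s+([^>]*)>")

def parse_tagged(lines):
    """Pair each <sptag ...> line with the text line(s) that follow it."""
    segments = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        m = TAG.fullmatch(line)
        if m:
            attrs = dict(pair.split("=", 1) for pair in m.group(1).split())
            segments.append({"attrs": attrs, "text": ""})
        elif segments:
            # Text belongs to the most recent tag; wrapped lines are joined with a space
            seg = segments[-1]
            seg["text"] = (seg["text"] + " " + line).strip()
    return segments

sample = """<sptag cat=NRS next=DS s=0.37 w=7>
The theme park's manager, Mike Slattery said:
<sptag cat=DS next=NRS s=1.63 w=18>
By closing Crinkley Bottom, the council has shot
Morecambe in the foot. And I'm out of a job.""".splitlines()

for seg in parse_tagged(sample):
    a = seg["attrs"]
    n_tokens = len(seg["text"].split())
    print(a["cat"], "->", a["next"], "| w =", a["w"], "| whitespace tokens =", n_tokens)
```

Once the stretches are parsed, counting instances per category (e.g. DS) is a matter of tallying the cat values.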
Genre style: direct speech
Section of the corpus       Number of instances of DS
Whole corpus                2,974
Fiction                     1,569
Press                       770
(Auto)biography             635

Fiction sub-section         Number of instances of DS
Serious                     629
Popular                     940
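Because the three sections differ slightly in size, the raw DS counts are easier to compare once normalised, e.g. per 1,000 words. A minimal sketch using only the figures given above; it also checks that the section sizes and DS counts add up to the stated totals. Word counts for the serious and popular sub-sections are not given here, so those two can only be compared in raw counts.

```python
# Section sizes (words) and DS counts, as given in the slides above
sections = {
    "Fiction": (87_709, 1_569),
    "Press": (83_603, 770),
    "(Auto)biography": (87_036, 635),
}

total_words = sum(words for words, _ in sections.values())
total_ds = sum(ds for _, ds in sections.values())
assert total_words == 258_348          # matches the stated corpus total
assert total_ds == 2_974               # matches the whole-corpus DS count
assert 629 + 940 == sections["Fiction"][1]   # serious + popular fiction

def per_thousand(count, words):
    """Normalised frequency: instances per 1,000 words."""
    return count / words * 1000

for name, (words, ds) in sections.items():
    print(f"{name:16} {per_thousand(ds, words):6.2f} DS per 1,000 words")
print(f"{'Whole corpus':16} {per_thousand(total_ds, total_words):6.2f} DS per 1,000 words")
```

Fiction comes out at roughly 17.9 instances of DS per 1,000 words, against roughly 9.2 for the press and 7.3 for (auto)biography. These are instances of DS, not words of DS.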
Authorial style
Studies attempting to fingerprint authors: i.e. to
identify linguistic items that distinguish the works by
one author from those of others.
Burrows (1987): study of Jane Austen's novels, focusing on closed-class words such as the, and, of, a and to.
Burrows found that these words can distinguish the works of different authors, different novels, and even the words spoken by different characters.
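Comparisons of this kind start from each text's relative frequencies of closed-class words. A minimal sketch of that first step only, not Burrows's own statistical procedure: the word list is just the five words named above (studies of this kind typically use a longer list), and tokenize and function_word_profile are hypothetical helpers. The sample is the opening sentence of Pride and Prejudice.

```python
import re
from collections import Counter

# The closed-class words named on the slide; real studies use longer lists
FUNCTION_WORDS = ["the", "and", "of", "a", "to"]

def tokenize(text):
    """Lower-case word tokens; an internal apostrophe is kept (e.g. "don't")."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def function_word_profile(text, words=FUNCTION_WORDS):
    """Frequency per 1,000 tokens of each function word in a text."""
    tokens = tokenize(text)
    if not tokens:
        return {w: 0.0 for w in words}
    counts = Counter(tokens)
    return {w: counts[w] * 1000 / len(tokens) for w in words}

opening = ("It is a truth universally acknowledged, that a single man in "
           "possession of a good fortune, must be in want of a wife.")
print(function_word_profile(opening))
```

Profiles computed this way for many texts (or characters' speeches) can then be compared statistically.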
Authorial style
Hoover (2002) studied a series of corpora containing
chunks from novels by different authors.
For example, he looked at a corpus containing the
first 30,000 words of 29 novels by 17 different
authors.
The distribution of the 300 most frequent words in
the corpus as a whole correctly clusters 15 out of 17
novels.
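One widely used way of turning most-frequent-word distributions into distances between texts is Burrows's Delta: take the n most frequent words of the whole collection, convert each text's counts to relative frequencies, standardise them as z-scores across texts, and average the absolute z-score differences between two texts. This is a swapped-in technique for illustration, not necessarily the cluster analysis Hoover used; the helper names and the toy "texts" below are hypothetical, and n=4 replaces the 300 words of the study because the toy texts are tiny.

```python
import re
from collections import Counter
from statistics import mean, pstdev

def tokenize(text):
    """Lower-case alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def delta_distances(raw_texts, n=300):
    """Delta between every ordered pair of texts, using the n most frequent words."""
    counts = {name: Counter(tokenize(t)) for name, t in raw_texts.items()}
    sizes = {name: sum(c.values()) for name, c in counts.items()}
    total = Counter()
    for c in counts.values():
        total.update(c)
    mfw = [w for w, _ in total.most_common(n)]

    # Relative frequency of each most frequent word in each text
    rel = {name: {w: counts[name][w] / sizes[name] for w in mfw} for name in counts}

    # Standardise each word's relative frequency across texts (z-scores)
    z = {name: {} for name in counts}
    for w in mfw:
        values = [rel[name][w] for name in counts]
        mu, sd = mean(values), pstdev(values)
        for name in counts:
            z[name][w] = (rel[name][w] - mu) / sd if sd else 0.0

    # Delta: mean absolute difference between two texts' z-scores
    return {(a, b): mean(abs(z[a][w] - z[b][w]) for w in mfw)
            for a in counts for b in counts if a != b}

def nearest(distances, name):
    """The other text with the smallest Delta to `name`."""
    candidates = {b: d for (a, b), d in distances.items() if a == name}
    return min(candidates, key=candidates.get)

# Toy texts standing in for novel chunks by two hypothetical authors
texts = {
    "A1": "the cat and the dog and the bird and the fish",
    "A2": "the sun and the moon and the sea and the sky",
    "B1": "of a man of a time of a place of a kind",
    "B2": "of a day of a year of a week of a month",
}
d = delta_distances(texts, n=4)
for name in texts:
    print(name, "->", nearest(d, name))
```

A text is "correctly clustered" in this simplified sense when its nearest neighbour is another text by the same author.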
Authorial style
An analysis of the most frequent word sequences (n-grams) can also be useful, e.g.
of the
in the
to the
it was
he was
and the
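Word sequences like those above can be extracted by sliding a window of n words over the text and counting each sequence. A minimal sketch with hypothetical helper names; tokenisation here simply drops punctuation and lower-cases, which a real tool would handle more carefully. The sample is the opening of A Tale of Two Cities.

```python
import re
from collections import Counter

def tokenize(text):
    """Lower-case word tokens (punctuation dropped)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def ngrams(tokens, n):
    """All contiguous n-word sequences in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def top_ngrams(text, n=2, k=10):
    """The k most frequent n-grams in a text, with their frequencies."""
    return Counter(ngrams(tokenize(text), n)).most_common(k)

opening = "It was the best of times, it was the worst of times"
for gram, freq in top_ngrams(opening, n=2, k=3):
    print(" ".join(gram), freq)
```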
Authorial style
Mahlberg (2007, 2009, 2012)
Corpus stylistics and Dickens's fiction
Also shows that analysis of frequent
word sequences (clusters) can be
useful.

Clusters containing body parts:
his hands in his pockets
his head on one side
his hands upon his
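Clusters like these can be found by counting repeated word sequences of a fixed length and keeping those that contain a body-part noun. A minimal sketch, assuming 5-word clusters and a minimum frequency of 2; the body-part list, the thresholds and body_part_clusters are hypothetical choices for illustration, not Mahlberg's settings.

```python
import re
from collections import Counter

# Hypothetical, deliberately short list of body-part nouns
BODY_PARTS = {"hand", "hands", "head", "face", "eye", "eyes", "arm", "arms"}

def body_part_clusters(text, n=5, min_freq=2):
    """n-word clusters occurring at least min_freq times that contain a body-part noun."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {" ".join(cluster): freq for cluster, freq in counts.items()
            if freq >= min_freq and BODY_PARTS & set(cluster)}

sample = ("He stood with his hands in his pockets. "
          "Later he walked away with his hands in his pockets again.")
print(body_part_clusters(sample))
```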
Text style
Stubbs's (2005) study of Joseph Conrad's Heart of Darkness, first published in 1899.
Marlow, the protagonist and first-person narrator, tells of how he was contracted to travel up a river in the Belgian Congo in order to find an ivory trader called Kurtz, who was the subject of stories of madness and suspect practices. Kurtz, however, dies while travelling back down the river.
Text style
Main themes
hypocrisy of the colonizers
unreliability of progress and civilization
breakdowns in communication
Light vs. dark
Restraint vs. frenzy
Appearance vs. reality
Marlow's unreliable and distorted knowledge
(Stubbs 2005: 8-9)
Text style
Used WordSmith Tools (Scott 2007)
Compared one novel with a corpus of fictional texts
of around 700,000 words
Overused words in the novel include: seemed, mystery, darkness, absurd, horror, terror, desolation
Several of these words concern uncertainty, perception and knowledge.
They coincide with some of the novel's themes
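"Overused" words are identified by a keyness statistic that compares each word's frequency in the study text with its frequency in the reference corpus. A minimal sketch using log-likelihood (G2), one statistic commonly used for keyness; this is not necessarily the exact setting Stubbs used in WordSmith, and the helper names and toy token lists are hypothetical.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """Log-likelihood (G2) keyness score.

    a: frequency of the word in the study text (c tokens in total)
    b: frequency of the word in the reference corpus (d tokens in total)
    """
    e1 = c * (a + b) / (c + d)   # expected frequency in the study text
    e2 = d * (a + b) / (c + d)   # expected frequency in the reference corpus
    ll = 0.0
    if a:
        ll += a * math.log(a / e1)
    if b:
        ll += b * math.log(b / e2)
    return 2 * ll

def overused_words(study_tokens, reference_tokens, top=20):
    """Words relatively more frequent in the study text, ranked by G2."""
    fs, fr = Counter(study_tokens), Counter(reference_tokens)
    c, d = len(study_tokens), len(reference_tokens)
    scored = [(w, log_likelihood(fs[w], fr[w], c, d))
              for w in fs if fs[w] / c > fr[w] / d]
    return sorted(scored, key=lambda item: -item[1])[:top]

# Toy token lists; a real study compares a whole novel with a large reference corpus
study = "the darkness seemed a mystery and the darkness seemed absurd".split()
reference = "the man and the woman walked to the house and the garden".split()
for word, score in overused_words(study, reference, top=5):
    print(f"{word:10} {score:.2f}")
```

Words that are relatively less frequent in the study text ("underused") are filtered out here; a fuller tool would report them separately.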
Text style
Stubbs shows how the application of corpus
methods can provide:
further justification for well-established
interpretations,
new insights into the language and meaning
potential of the text.
Text style: variation inside texts
Culpeper (2002) used WordSmith Tools to do a key-
word analysis of the speech of the main characters in
Romeo and Juliet
A file with the words spoken by each character was
compared to a reference corpus containing the
words of all the other characters.
Findings are relevant to an understanding of how the
characters are linguistically constructed
(characterisation).
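The set-up described above (one character's words against a reference corpus of all the other characters' words) can be sketched as below. This is a minimal sketch assuming the play is available as (speaker, speech) pairs; it ranks words with log-likelihood (G2), one common keyness statistic, which may differ from the exact settings Culpeper used. The helper names are hypothetical, and the three lines of Romeo and Juliet are only a toy input.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lower-case word tokens; an internal apostrophe is kept (e.g. "o'er")."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def character_vs_rest(speeches, character):
    """Split (speaker, speech) pairs into one character's tokens and a
    reference corpus made of all the other characters' tokens."""
    target, reference = [], []
    for speaker, speech in speeches:
        (target if speaker == character else reference).extend(tokenize(speech))
    return target, reference

def log_likelihood(a, b, c, d):
    """G2 for frequency a in c target tokens vs frequency b in d reference tokens."""
    e1, e2 = c * (a + b) / (c + d), d * (a + b) / (c + d)
    return 2 * sum(o * math.log(o / e) for o, e in ((a, e1), (b, e2)) if o)

def character_keywords(speeches, character, top=10):
    """Words used relatively more by `character` than by everyone else, ranked by G2."""
    target, reference = character_vs_rest(speeches, character)
    ft, fr = Counter(target), Counter(reference)
    c, d = len(target), len(reference)
    scored = [(w, log_likelihood(ft[w], fr[w], c, d))
              for w in ft if ft[w] / c > fr[w] / d]
    return sorted(scored, key=lambda item: -item[1])[:top]

speeches = [
    ("JULIET", "If he be married, my grave is like to be my wedding bed."),
    ("NURSE", "His name is Romeo, and a Montague."),
    ("ROMEO", "With love's light wings did I o'er-perch these walls."),
]
print(character_keywords(speeches, "JULIET", top=2))
```

Running the same function for each speaker gives one key-word list per character, which is the basis for the characterisation findings.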
Text style: variation inside texts

Juliet's key-words (raw frequencies in brackets):
If (31), Or (25), Sweet (16), Be (59), News (9), My (92), Night (27), I (138), Would (20), Yet (18), Thou (71), Words (5), Name (11), Nurse (20), Tybalt's (6), Send (7), Husband (7), That (82), Swear (5)
Text style: variation inside texts

Key-words such as if, or, would and yet can be related to Juliet's tendency to express uncertainty and anxiety throughout the play:
I fear it is: and yet, methinks, it should not,
For he hath still been tried a holy man (IV.iii.)
[Context: Wondering whether the Friar has supplied a sleeping potion or poison]
Corpus tools
Corpus tools make comparison relatively easy
WordSmith Tools (Scott 2007)
WMatrix (Rayson 2009)
AntConc (Anthony 2011)
MLCT (Piao)
Summary
Style is the way in which language is used.
The notion of style is fundamentally based on
comparison
Corpus linguistic methods are relevant to the
analysis of style in fiction/literature.
They have been applied to the analysis of
genres, authors and texts.
Manual analysis and interpretation of the
output from corpus tools is needed.
Summary

[...] corpus stylistics is not purely a quantitative study of literature. Rather, it is still a qualitative stylistic approach to the study of the language of literature, combined with or supported by corpus-based quantitative methods and technology.
(Ho 2011: 10)
References
Culpeper, J. (2009) 'Keyness: words, parts-of-speech and semantic categories in the character-talk of Shakespeare's Romeo and Juliet', International Journal of Corpus Linguistics, 14(1): 29-59.
Ho, Y. (2011) Corpus Stylistics in Principles and Practice: A Stylistic Exploration of John Fowles' The Magus. London: Continuum.
Leech, G. (2008) Language in Literature: Style and Foregrounding. Harlow, UK: Pearson.
Louw, B. (2008) 'Consolidating empirical method in data-assisted stylistics: towards a corpus-attested glossary of literary terms', in Zyngier, S., Bortolussi, M., Chesnokova, A. and Auracher, J. (eds) Directions in Empirical Literary Studies, pp. 243-264. Amsterdam: Benjamins.
Mahlberg, M. (2007) 'Clusters, key clusters and local textual functions in Dickens', Corpora, 2(1): 1-31.
Mahlberg, M. (2009) 'Corpus stylistics and the Pickwickian watering-pot', in Baker, P. (ed.) Contemporary Corpus Linguistics, pp. 47-63. London: Continuum.
Mahlberg, M. (2012) Corpus Stylistics and Dickens's Fiction. London: Routledge.
McIntyre, D. (2010) 'Dialogue and characterization in Quentin Tarantino's Reservoir Dogs: a corpus stylistic analysis', in McIntyre, D. and Busse, B. (eds) Language and Style, pp. 162-182. Basingstoke: Palgrave.
McIntyre, D. and Walker, B. (2010) 'How can corpora be used to explore the language of poetry and drama?', in McCarthy, M. and O'Keeffe, A. (eds) The Routledge Handbook of Corpus Linguistics. London: Routledge.
Widdowson, H. G. (2008) 'The novel features of text: corpus analysis and stylistics', in Gerbig, A. and Mason, O. (eds) Language, People, Numbers: Corpus Linguistics and Society, pp. 293-304. Amsterdam: Rodopi.
WMatrix
Web-based corpus tool
Developed by Paul Rayson at Lancaster
University
Automated grammatical and semantic analysis
of texts/corpora
A web-based front end for CLAWS and USAS
WMatrix
Using a web interface:
Texts are uploaded onto the Wmatrix server
(at Lancaster)
The upload procedure automatically adds
(i) Grammatical or Part of Speech (POS) tags;
(ii) Semantic tags
WMatrix
CLAWS grammatical (POS) tagger.
CLAWS = Constituent Likelihood Automatic Word-
tagging System
USAS semantic tagger
USAS = UCREL Semantic Analysis System
(UCREL = University Centre for Computer Corpus Research on Language)
WMatrix
USAS
Assigns tags to each word using a hierarchical
framework of categorization
Based originally on McArthur's (1981)
Longman Lexicon of Contemporary English
The 21 top-level semantic categories of the USAS tag set
A  GENERAL & ABSTRACT TERMS
B  THE BODY & THE INDIVIDUAL
C  ARTS & CRAFTS
E  EMOTION
F  FOOD & FARMING
G  GOVERNMENT & PUBLIC DOMAIN
H  ARCHITECTURE, HOUSING & THE HOME
I  MONEY & COMMERCE (IN INDUSTRY)
K  ENTERTAINMENT
L  LIFE & LIVING THINGS
M  MOVEMENT, LOCATION, TRAVEL, TRANSPORT
N  NUMBERS & MEASUREMENT
O  SUBSTANCES, MATERIALS, OBJECTS, EQUIPMENT
P  EDUCATION
Q  LANGUAGE & COMMUNICATION
S  SOCIAL ACTIONS, STATES & PROCESSES
T  TIME
W  WORLD & ENVIRONMENT
X  PSYCHOLOGICAL ACTIONS, STATES & PROCESSES
Y  SCIENCE & TECHNOLOGY
Z  NAMES & GRAMMAR
WMatrix
G - Government and the public domain
G1  Government, politics and elections
    G1.1  Government, etc.
    G1.2  Politics
G2  Crime, law and order
G3  War, defence and the army: weapons
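The tag codes encode the hierarchy directly: a letter for the top-level category, a number for the subdivision, and further ".number" parts for finer subdivisions. A minimal sketch that expands a tag into its levels and looks up the top-level name from the table of 21 categories; usas_levels and TOP_LEVEL are hypothetical names, and the assumption that any further markers after the numbers (e.g. + or -) can simply be ignored is a simplification.

```python
import re

# Top-level USAS categories, taken from the table of 21 categories above
TOP_LEVEL = {
    "A": "General & abstract terms",
    "B": "The body & the individual",
    "C": "Arts & crafts",
    "E": "Emotion",
    "F": "Food & farming",
    "G": "Government & public domain",
    "H": "Architecture, housing & the home",
    "I": "Money & commerce (in industry)",
    "K": "Entertainment",
    "L": "Life & living things",
    "M": "Movement, location, travel, transport",
    "N": "Numbers & measurement",
    "O": "Substances, materials, objects, equipment",
    "P": "Education",
    "Q": "Language & communication",
    "S": "Social actions, states & processes",
    "T": "Time",
    "W": "World & environment",
    "X": "Psychological actions, states & processes",
    "Y": "Science & technology",
    "Z": "Names & grammar",
}

# A letter, a number, then optional ".number" parts; anything after is ignored
TAG = re.compile(r"([A-Z])(\d+)((?:\.\d+)*)")

def usas_levels(tag):
    """Expand a tag such as 'G1.1' into its levels: ['G', 'G1', 'G1.1']."""
    m = TAG.match(tag)
    if not m or m.group(1) not in TOP_LEVEL:
        raise ValueError(f"unrecognised USAS-style tag: {tag!r}")
    letter, number, subparts = m.groups()
    levels = [letter, letter + number]
    for part in subparts.split(".")[1:]:
        levels.append(levels[-1] + "." + part)
    return levels

for tag in ["G1.1", "G1.2", "G2", "G3"]:
    levels = usas_levels(tag)
    print(tag, levels, TOP_LEVEL[levels[0]])
```

Expanding tags this way lets frequencies be aggregated at any level of the hierarchy, e.g. all G1 tags together.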
WMatrix
Allows analysis of texts at:
the word level
the grammatical level (POS)
and the semantic level
WMatrix
Allows text comparison at:
the word level
the grammatical level (POS)
and the semantic level
WMatrix
Keyness
Word level Key-words
Grammatical level Key-POS
Semantic level Key-concepts
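Keyness works the same way at all three levels: only the unit being counted changes (word forms, POS tags or semantic tags). A minimal sketch, assuming each token is available as a (word, POS tag, semantic tag) triple; the tag values below are toy annotations that imitate the look of CLAWS and USAS codes but were not produced by the taggers, and key_items is a hypothetical helper. Log-likelihood (G2) is used as the keyness statistic.

```python
import math
from collections import Counter

# Position of each annotation level in a (word, POS tag, semantic tag) triple
LEVELS = {"word": 0, "pos": 1, "semantic": 2}

def log_likelihood(a, b, c, d):
    """G2 for frequency a in c study tokens vs frequency b in d reference tokens."""
    e1, e2 = c * (a + b) / (c + d), d * (a + b) / (c + d)
    return 2 * sum(o * math.log(o / e) for o, e in ((a, e1), (b, e2)) if o)

def key_items(study, reference, level):
    """Items (words, POS tags or semantic tags) overused in `study`, ranked by G2."""
    i = LEVELS[level]
    fs = Counter(tok[i] for tok in study)
    fr = Counter(tok[i] for tok in reference)
    c, d = len(study), len(reference)
    scored = [(item, log_likelihood(fs[item], fr[item], c, d))
              for item in fs if fs[item] / c > fr[item] / d]
    return sorted(scored, key=lambda pair: -pair[1])

# Toy tagged tokens (hypothetical annotations, for illustration only)
study = [("the", "AT", "Z5"), ("horse", "NN1", "L2"), ("and", "CC", "Z5"),
         ("the", "AT", "Z5"), ("steed", "NN1", "L2"), ("ran", "VVD", "M1")]
reference = [("the", "AT", "Z5"), ("man", "NN1", "S2"), ("and", "CC", "Z5"),
             ("the", "AT", "Z5"), ("woman", "NN1", "S2"), ("ran", "VVD", "M1")]

for level in LEVELS:
    print(level, key_items(study, reference, level))
```

In the toy data, horse and steed are each too rare to score highly as key words, but grouped under one semantic tag they form a stronger key concept, while the POS level shows no difference at all.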
