
Intonation units in spoken interaction:

Developing transcription skills


JUURD H. STELMA and LYNNE J. CAMERON

Abstract
This paper describes the transcription process and the development of transcription skills in a research project using recorded spoken interaction as its
main data. The spoken data was transcribed using intonation units, and the
paper traces the development of the first author's skills in identifying such
intonation units. Intertranscriber checks of transcription, involving three researchers, were used to highlight ways in which the identification of intonation units could be improved. Subsequent re-transcription of the data highlighted stretches of talk that included many hesitations, false starts, and
speech used to regulate ongoing spoken interaction. These features were
linked to low levels of intertranscriber agreement. It is argued that the existing literature on intonation units does not address how to best deal with
this quality of spontaneous spoken interaction. The paper concludes with an
agenda that may be used to improve the quality of transcription in similar
research projects, and to develop the transcription skills of the researchers
that are responsible for transcription.
Keywords:

intonation units; transcription; transcription skills; intertranscriber checks; spoken interaction; spontaneous talk.

1. Transcribing spoken interaction


In a 1979 landmark paper on transcription, Elinor Ochs (1979: 44) set out
to "consider with some care the transcription process" (emphasis added).
She went on to say,
We consider this process (a) because for nearly all studies based on [verbal] performance, the transcriptions are the researcher's data; (b) because transcription
is a selective process reflecting theoretical goals and definitions; and (c) because,
with the exception of conversational analysis . . . , the process of transcription
has not been foregrounded in empirical studies of verbal behavior. (emphases
added)

Text & Talk 27-3 (2007), pp. 361-393
DOI 10.1515/TEXT.2007.015
1860-7330/07/0027-0361
Online 1860-7349
© Walter de Gruyter

A main focus of Ochs's argument was to raise awareness about the effects
of various features of re-presenting spoken interaction in the form of transcripts. She discussed at length the significance of page layouts, top-to-bottom and left-to-right biases when reading transcripts, the representation of verbal versus nonverbal features of interaction, and the use of
transcription symbols. She not only made what was previously not-so-obvious obvious, she also built a careful case for treating transcription as
something theoretical in nature.
According to Lapadat and Lindsay (1998: 5), the period following
Ochs's contribution has been characterized by the following progression
of perspectives: "the search for [transcription] conventions, acceptance of
a multiplicity of conventions, and [then] abandonment of standardization
in favour of contextualized negotiation of method" (cf. also Lapadat and
Lindsay 1999). Standardized notation systems suggested in the literature
are usually designed to facilitate transcription according to a particular
perspective on spoken interaction. For example, conversation analysis
employs the Jeffersonian notation system (cf. Atkinson and Heritage
1984: ix-xvi), which is designed to explore the moment-to-moment
unfolding of turn-taking in interaction. Other systems are more eclectic,
but even so make their orientations explicit. For example, Gumperz and
Berenz (1993: 119) suggest a comprehensive set of conventions designed
to reveal the functioning of communicative signs in the turn-by-turn interpretation of talk. Finally, some systems are designed to account for
more specialized features of spoken interaction, such as Du Bois's (1991)
suggested standard notations for transcribing talk into intonation units,
which have as their theoretical basis the connecting of mental processing
and speech production (cf. also Du Bois et al. 1993). This latter notation
system will be discussed in detail later in this paper, as the transcription in
the current research in part draws on Du Bois's notation system.
The argument for standardized notation systems was followed by debate about the usefulness of such standardization (Lapadat and Lindsay
1998). This debate has included attempts at empirically deriving agreed
upon notation systems (e.g., Dressler and Kreuz 2000), suggestions for
general design principles to base transcription conventions on (Du Bois
1991; Edwards 1993), and arguments against standardization; for example, Muller and Damico (2002: 303) suggest that because transcription
involves recurrent interpretive cycles that filter, shape, and even recreate
data, there is no such thing as the complete transcript, and Cook
(1990) argues that one cannot claim objectivity when trying to re-present
contextual features in transcription. What has emerged from this debate
is what may be described as a considered approach, where standardization
is valued but not mandatory (cf. O'Connell and Kowal 1999). Gumperz
and Berenz (1993: 119) exemplify this line of thinking when they point
out that their notation system is not designed to record everything
that can be heard, adding: "Yet, at the same time, we seek to
remain as comprehensive and attentive to detail as possible in showing
what the phenomenological or perceptual bases of our interpretations
are."
Just as in Ochs's (1979) original contribution, the later debates have focused on transcription as re-presentation or interpretation, as something
which is more or less consistent, and as being informed by theoretical positions and analytic concerns. Where the term process is used, it is used in
Ochs's sense that the act, or activity, of transcription should be guided by
an awareness of the interpretive and theoretical nature of transcribing
talk (e.g., Green et al. 1997; Muller and Damico 2002). This focus neatly
avoids the very messy reality of actually doing transcription. Lapadat and
Lindsay (1998: 21) argue that empirical examination of transcription
processes, products, and their implications is singularly lacking in the research literature. With some exceptions, the empirical studies that do
exist generally focus on transcripts as products. For example, Lapadat
and Lindsay (1998) compared the different transcriptions produced by
students enrolled in a graduate course on language development. Romero
et al. (2002) explored how subjects' reading of the same segment of talk,
transcribed using different notation systems, compared to the original
audio recording of the talk. Roberts and Robinson (2004) studied the intertranscriber (they used the term inter-observer) agreement of four researchers, all transcribing the same segment of talk using the Jeffersonian
notation system. Finally, O'Connell and Kowal (1994: 140) explored how
transcripts inevitably include errors because like all language users, transcribers are in search of meaning. While valuable contributions, and a
step in the right direction as they offer the field of transcription a base of
empirical evidence for discussion, these contributions nevertheless lack
any focus on the actual process of transcribing talk.
A first exception to this picture is a number of comments in passing
made about the process of transcribing talk. For example, Lapadat
(2000: 204) suggests that the process of doing transcription . . . promotes
intense familiarity with the data, which leads to the methodological and
theoretical thinking essential to interpretation, and more specifically
that transcribing talk in interaction slows it down and focuses the
researcher's interpretive eye, allowing him or her to become intensely
familiar with the data, and to draw meanings out of them (2000: 215).
Lapadat (2000: 216) also suggests that a researcher should keep an audit
trail of decision points while transcribing, in the end using this record to
produce a code book of what was transcribed, how, and why. Again,
these are useful comments indicating the importance of approaching transcription in a systematic manner, but they do not address in any empirical
manner the process of actually doing transcription.
A more notable exception to the general picture is a small set of recent
studies that have explored the experiences of researchers doing transcription work. Gregory et al. (1997: 295) comment on the progressively more
common practice of hiring transcribers, made possible by the increasing
funding available to qualitative research, and how, in their field of health
research, this creates a need for examining the emotional laboring and
work worlds of transcribers. In a series of articles, Tilley (cf. Tilley
2003a, 2003b; Tilley and Powick 2002) has explored, through interviews
and narrative accounts, the particular experiences of transcribers who
themselves are not researchers on a particular project, and the challenges
this creates within the research projects. Finally, Bird (2005: 246) provides a personal narrative of her own initial and growing relationship
with the process of transcription across a series of transcription tasks
she undertook. These studies provide holistic accounts of transcribers' experiences in qualitative research projects. They do not, however, address
the particular challenges involved in doing the narrow types of transcription often required in discourse analysis or applied linguistics research projects. The studies, nevertheless, suggest that doing transcription
is a complex process, and that to uncover the real complexity of this process may well require further direct exploration.
Associated with the scarcity of studies on the process of doing transcription is the observation, made by several authors, that the researchers
who actually do transcription are often postgraduate students or research
assistants (e.g., Lapadat 2000; Tilley 2003a, 2003b). With this realization
must come the recognition that these research assistants are developing
their transcription skills concurrently with the activity of preparing transcripts for a project (Bird 2005; Tilley 2003a, 2003b; Tilley and Powick
2002). Conspicuous in this regard is Lapadat and Lindsay's (1998: 21) observation that transcription seldom appears as a topic of consideration in
the education of future researchers and practitioners who will be employing transcription in their work. McLellan et al. (2003: 73) come fairly
close to the mark when they point out that training data managers, transcribers and proofreaders [tasks often done by research assistants] is
highly variable given the research structure, the setting, the type and
volume of data collected, the data produced, and the analytic approach
taken. This highlights a number of implications that need careful consideration in research projects employing research assistants to do transcription. For example, how much time it takes to prepare quality transcriptions may need to be balanced against the time needed to develop
transcription skills, and the involvement of more senior researchers in
the development of the transcription skills of the research assistants needs
to be made explicit.
The present paper addresses these gaps in the literature, on the one
hand documenting the process involved in the transcription of spoken interaction using intonation units in an applied linguistics research project,
and on the other hand describing the gradual development of the transcription skills of the researcher with primary responsibility for preparing
the transcripts for the project. To achieve this, we next provide an introduction to the project from which this paper emerged, take a closer look
at the literature on intonation units and the transcription of these units,
and then outline our own view of what constitutes quality transcription.
This is followed by a detailed description of the transcription process we
engaged in and the intertranscriber checks we conducted. The final part
of the paper is an in-depth exploration of one researcher's developing
skills in transcribing spoken interaction using intonation units, employing
as evidence both the products and the processes of transcription in the
research project. We conclude the paper by presenting an agenda for the
development of transcription skills and quality of transcription that
emerged from our close engagement with the transcription process in our
project work.

2. The research project


The transcription process and the development of transcription skills described in this paper took place in the context of a research project whose
main aim was to explore the dynamics of metaphor use in conciliation
talk.1 The spoken data in this project consisted of about 3.5 hours of
video- and tape-recorded spoken interaction between two participants: a
perpetrator of a bombing and the daughter of a victim of this bombing.
For the most part, the interaction consisted of extended speaker turns,
with the listener producing occasional back channel responses. Only
sometimes did the interaction consist of shorter, more interactive turns.
The spoken interaction was nevertheless unscripted, and thereby spontaneous. The purpose of the interaction between the two participants was to
listen to each other, or to understand the story or journey of the Other,
as experienced before, during and after the bombing (Cameron and
Stelma 2004; Cameron forthcoming).

3. Intonation units in spoken interaction


The project team, i.e., the first and second authors of this paper, agreed at
an early stage to transcribe the conciliation talk using intonation units.
We were attracted to intonation units because there is evidence that these
units can account for the inherently dynamic interplay between speaking
and thinking (Chafe 1994, 1996), thereby responding to the research aim,
which was to explore the dynamics of metaphor use in talking and thinking in a context of post-conflict reconciliation.
Our use of the intonation unit (henceforth IU or IUs) is based primarily on Wallace Chafe's (1994, 1996) extensive work on naturally occurring
language data from a range of discourse contexts and languages, building
a case for a relationship between consciousness and language. In particular, Chafe (1996: 39) suggests that consciousness is a process in which
remembering, imagining, evaluating, and speaking come together to produce what we know as thought and language.2 One of the constant properties of consciousness, according to Chafe, is that it has a focus and a
periphery, much like human focal and peripheral vision. Moreover, focus
is a dynamic property, as there is a restless movement from one focus to
the next; consciousness does not stand still (1996: 38). In terms of speaking, the restless movement, or dynamic nature, of consciousness manifests
itself in speaking in the form of IUs (Chafe 1994, 1996). Chafe (1996: 40)
underlines the dynamic nature of consciousness, and the dynamic relationship between consciousness and speaking, by pointing out that IUs
(and consciousness) are produced in a series of brief spurts, typically between one and two seconds long.
This coupled relationship between consciousness and IUs, produced
in spurts, is supported by a number of arguments for the IU as a
cognitive unit (cf. Park 2002), as well as arguments for how the IU,
and intonation more generally, does work in spoken interaction (cf.
Couper-Kuhlen 2001; Wennerstrom 2001). One argument is based on
the observation that IUs, on average between 4 and 5 words long (Chafe
1994: 65; Crystal 1969: 256), are smaller than clauses. For this reason,
IUs may be constrained by something other than grammatical structure.
Cognitive constraints, such as how much information can be active in
consciousness at one time, might be such an alternative explanation
(Chafe 1994). On the other hand, some IUs are clause length. Since
clauses often encode propositions, it may be argued that IUs themselves
are vehicles for basic cognitive processes of information storage and
discourse processing (Park 2002: 639). However, this observation may
equally well be explained as a confluence between grammatical and intonation structure in discourse processing (cf. Ford and Thompson 1996).
A possibly more intriguing observation is how IUs seem to be encoding a
single message (Kreckel 1981), a single new idea (Chafe 1994), or a single
unit of information (Halliday 1967). In Chafe's (1993: 37) framework,
this includes substantive IUs, which are the contentful stretches of speech
that include ideas of people, objects, events, and states, and regulatory
IUs, which function, in one way or another, to regulate the flow of information. There is also evidence that different tunes marking boundaries
between so-called intonational phrases,3 or sets of intonation units, can
be used to interpret discourse-relevant relationships. That is, boundary
tunes between intonational phrases communicate meaning to interlocutors (Pierrehumbert and Hirschberg 1990). IUs are thus seen as playing
both cognitive and interactive roles in spoken discourse.
This perspective on consciousness and language is consistent with the
second author's conceptualization of the dynamic role of metaphor use
in thinking and speaking (Cameron 2003). Hence, we felt that transcription using IUs was appropriate for representing the dynamic unfolding of
mental and interactive processes in spoken data, and for facilitating the
investigation of metaphor use in the conciliation talk.

4. Identifying and transcribing intonation units


A central challenge, and a nontrivial one we shall claim, is the task of
identifying IUs in spoken interaction. Cruttenden (1986: 36) observes
that many linguists assume that the phonetic correlates of boundaries between intonation-groups are far more straightforward than they actually
are. Cruttenden's definition of intonation groups is similar to Chafe's
IUs, as well as our own understanding of IUs. Cruttenden further proposes that the difficulty of identifying boundaries between intonation
groups depends on whether the verbal data is speaking prepared texts
or more spontaneous speech. Cruttenden also suggests that adults' intonational competence can be very variable. Background knowledge in the
topic of a conversational event, whether a spoken event was rehearsed,
and probably also the extent to which new ideas are being generated by
the speech, may all affect the identifiability of IUs (Wichmann 2000).
For example, the IUs of an experienced news anchor, reading from a teleprompter, would presumably be more distinct, and thereby more easily
identifiable, than the IUs of the spoken interaction of, say, two people
who meet by chance in the street.
The literature is quite clear on how complete IUs may be recognized.
Chafe (1994: 58) suggests that the following six features be used in identifying boundaries between IUs: (i) changes in fundamental frequency, or
pitch (in musical terms, each key on a piano represents a different pitch),
(ii) changes in duration or tempo (manifesting itself as shortening and
lengthening of syllables and words), (iii) changes in intensity or loudness
(including stress and/or accents), (iv) alternations between vocalization and
silence (i.e., pausing), (v) changes in voice quality (e.g., creaky voice),
and (vi) changes in speaker turn. Chafe's account thus produces six characteristics of prototypical IUs:

1. Pitch: usually includes a resetting of the pitch baseline (as in a step-up or step-down in the pitch level) and a recognizable final pitch contour (e.g., falling or rising);
2. Duration: usually includes increased tempo at the beginning (as in a shortening of syllables and/or words), and then a gradual slowing down toward the end (as in a lengthening of syllables and/or words);
3. Intensity: usually includes one or more syllables and/or words spoken more loudly;
4. Pausing: is often preceded or followed by pausing (but may also contain pauses within its boundaries);
5. Voice quality: sometimes begins or ends with a creaky voice or whispering;
6. Speaker turn: may sometimes be associated with a change of speaker.

Cruttenden's (cf. 1986, 1997) description of criteria for identifying what
he calls intonation groups adds useful detail to Chafe's characteristics.
In our own use of IUs as a tool for making sense of the dynamics of
metaphor use, we have come to view Cruttenden's intonation groups
and Chafe's intonation units as similar units of speech. There are some
differences, however. Cruttenden makes a distinction between external
and internal criteria for intonation groups. Cruttenden's (1997: 29-34)
external criteria for identifying intonation groups include: (i) pausing
(unfilled and filled), (ii) anacrusis (increase in speech rate at the start of
an intonation group), (iii) lengthening of syllables (at the end of intonation groups), and (iv) changes in pitch level and/or pitch direction on unaccented syllables.
Cruttenden distinguishes these external criteria from prosodic features
that are internal to intonation groups. For example, he argues that a
step-up and step-down in pitch level is sometimes associated with an accented syllable, in which case it should not be interpreted as an intonation
group boundary. To help resolve potential ambiguity, Cruttenden (1986:
34) points out that accents in connected speech normally fall only on syllables that are lexically stressed. Hence, a step-up or step-down in pitch
level followed by a lexically stressed syllable or word is not an intonation
group boundary. Cruttenden's internal criteria for intonation groups,
then, are (i) an intonation group must contain at least one stressed syllable, and (ii) there must be a pitch movement to or from at least one accented syllable.
Another potential ambiguity is that pausing, anacrusis, and syllable
lengthening may all, according to Cruttenden, happen in the middle of
intonation groups, especially in the case of hesitation. To distinguish
between intonation group boundaries and hesitation, Cruttenden (1997:
35) suggests the following heuristic: if the features of pause, and/or anacrusis, and/or syllable lengthening divide an utterance into two part-utterances either one of which does not have the minimum internal structure of an intonation group, then any combination of these features is
taken as a hesitation rather than a boundary between intonation groups.
The later parts of this paper will show how this added detail, provided by
Cruttenden's external and internal criteria, helped us make sense of IUs
in our own data.
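
Cruttenden's heuristic for telling hesitations from boundaries can be read as a simple decision rule. The following is a minimal sketch of that reading, not a procedure used in the project: the class name, the field names, and the boolean annotations are hypothetical, and in practice the two internal criteria would have to be judged by ear.

```python
from dataclasses import dataclass


@dataclass
class PartUtterance:
    """One side of a candidate break. Both flags are hypothetical hand annotations."""
    text: str
    has_stressed_syllable: bool  # internal criterion (i)
    has_pitch_movement: bool     # internal criterion (ii): movement to or from an accented syllable

    def has_minimum_structure(self) -> bool:
        # An intonation group needs both internal criteria to be satisfied.
        return self.has_stressed_syllable and self.has_pitch_movement


def classify_break(before: PartUtterance, after: PartUtterance) -> str:
    """Classify a pause/anacrusis/lengthening break between two part-utterances.

    If either side lacks the minimum internal structure, the break is treated
    as a hesitation; otherwise it remains a possible boundary.
    """
    if before.has_minimum_structure() and after.has_minimum_structure():
        return "possible boundary"
    return "hesitation"


# Invented annotations, for illustration only.
left = PartUtterance("and I would", False, False)
right = PartUtterance("even defend actions I have taken", True, True)
print(classify_break(left, right))
```

The sketch returns "possible boundary" rather than "boundary" because the heuristic only rules hesitations out; the external criteria still have to support a boundary reading.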
A final point we would like to make about the identification of IUs is
related to the distinction between prepared and spontaneous talk, introduced earlier in this paper. There is a question about the extent to which
IUs, with complete intonation contours, as described by, e.g., Chafe's
(1994) characteristics listed above, reflect spoken interaction adequately.
Cruttenden (1986: 36) points out that in the case of spontaneous speaking
any clear and obvious division into intonation-groups is not so apparent
because of the broken nature of much spontaneous speech, including as
it does hesitation, repetitions, false starts, incomplete sentences and
sentences involving a grammatical caesura in their middle. Wichmann
(2000: 21) also observes that spontaneous speech will to a greater or
lesser extent display syntactic and prosodic disfluencies: hesitations,
repetitions, incomplete utterances. In sum, the extent to which IUs are
complete will vary a great deal when speakers have to deal with the
real-time dynamics of spontaneous speech.
The research by Lindsay and O'Connell (1995), reviewed above, suggests that hesitation phenomena and sentence fragments can cause particular difficulties in transcription of spoken discourse. Even so, there is little
information in the transcription literature on how to deal with this often
broken, or fragmentary, nature of spontaneous speech. The literature
does mention the possibility of incomplete intonation contours, called
fragmented (Chafe 1994) or truncated (Du Bois et al. 1993) IUs. However, there is little detail on how such fragmented or truncated IUs are
recognized, or how the presence of these affects the identification of more
complete intonation contours. As we will see, the presence of such truncated IUs was one of the features of our data that caused most problems
in the project transcription process.

5. Quality and the transcription process


Given the challenges we faced in the transcription of the project data, a
statement about our position on the transcription process is necessary at
this juncture. Our position may be summarized as post-positivist and realist. We reject the simplistic notion of real, or what Mishler (1991, cited
in Lapadat and Lindsay 1999: 73-74) calls naïve realism, in which a
transcription is held to be a true re-presentation of a discourse event. Instead, we align ourselves with the social realist approach developed by
Sealey and Carter (2004: 126), which requires us to take account of the
irreducible subjective realities of human consciousness and being while
also allowing that degrees of objectivity are possible in applied linguistic
research. We can thus avoid the inaction dictated by an extreme interpretivist or constructionist position. To some extent, every researcher who
listens to an audio-recorded conversation will hear something different;
people vary in the acuity of their hearing, in their awareness of melody
and intonation, and in other neurological or physical factors. However,
this variation does not rule out reaching useful levels of agreement between researchers on what was said in audio-recorded talk, or useful dialogues between researchers to improve the quality of a research process
such as transcription. People share sufficient neurological and physical
features, and, in the research context, usually sufficient common sociocultural and language understandings, for a level of agreement to be reached
that will produce a transcription that can be good enough for the particular research purposes. This, then, is similar to what we earlier referred to
as the considered approach that seems to have emerged, again echoing
Gumperz and Berenz's (1993: 119) advice to remain as comprehensive
and attentive to detail as possible in showing what the phenomenological
or perceptual bases of our interpretations are.
Hence, researchers working together on a project will need to set
acceptable levels of agreement, or alternatively detail how disagreement
is dealt with to increase the consistency of later transcription processes.
Meeting acceptable levels of agreement, and/or dealing with disagreement in ways that improve the consistency of transcription processes, will
be a measure of the quality of a transcription. The quality of a research
report will be enhanced by explicit statements about what is selected from
all that is possible to transcribe, and why, and about the levels of agreement reached in transcription, and how. From our perspective in this paper on the development of transcription skills, the process of learning
to transcribe will include: learning about how irreducible subjective realities affect what is heard and transcribed; learning about inter-researcher
variation; developing skills and strategies to work with these in the production of transcriptions to agreed levels of quality; acquiring knowledge
and skills to write about these decisions and processes in written reports
of the research.

6. Stages of the transcription process


The project transcription process included four stages. The first transcriber (the first author of this paper) prepared an initial transcript of a
five-minute sample segment of the spoken data. This was followed by
two additional researchers, one being the second author of the paper, preparing separate transcripts of the same five-minute sample segment of the
data. The first transcriber then used the results of these intertranscriber
comparisons to reflect on his transcription skills, and subsequently to develop his skills in identifying IUs further. This included the production of
what we have called an enhanced transcription of the five-minute sample
segment of data. In the last stage of the transcription process, the full spoken data set was transcribed by the first researcher.
Table 1 provides an overview of the different transcript versions of
the five-minute sample segment, as well as who prepared them. These
are referred to in the following discussion. See Appendix 1 for the transcription conventions and Appendix 2 for short illustrative segments of
the transcripts.
Table 1. Transcriptions of the five-minute sample segment

Transcriber                                          Version
First transcriber                                    Transcript 1
Second transcriber (experienced, but not with IUs)   Transcript 2
Third transcriber (experienced with IUs)             Transcript 3
First transcriber                                    Enhanced transcript

The first transcriber was employed as a research fellow, and had overall
responsibility for preparing the project transcripts. Prior to the start of the
project, this researcher had experience of transcribing bilingual spoken
classroom interaction, as part of his doctoral study, using IUs. He had
also done some transcription using IUs for an earlier pilot stage of the
project reported upon here. However, the use of IUs was not central to
the aims of these previous research experiences, and he did not have any
formal training in the transcription of IUs or any other form of transcription. The second transcriber (the second author of this paper) was the
principal investigator in the research project, with overall responsibility
for processes and outcomes of the research (of which transcription was
only one component). She had no first-hand experience of transcription
using IUs, but had extensive experience of transcribing discourse using
other discourse units and notational systems. The third transcriber acted
in an advisory capacity, offering feedback on the aims, processes, and
outcomes of the project as a whole. He had extensive experience of transcribing spoken interaction using IUs, both as a doctoral researcher and
later on a number of research projects using spoken interaction as primary data.
The initial transcript (Transcript 1) of the five-minute sample segment
of spoken interaction was prepared from a digitized audio recording,
played back using VoiceWalker software (Du Bois 2000). This software
is designed to automatically step through an audio file, playing a few
seconds of the audio file, then taking a small step back before again
playing a few seconds. This stepping through the audio file, in a "two
steps forward and one step back" manner, facilitates the transcription of
spoken interaction without the use of the more traditional tape recorder
and pedals system. The software allows both the forwards and backwards
steps to be adjusted, and also allows the researcher to repeat steps when
needed, thereby allowing for different types of spoken discourse, varying
quality of recordings, and researchers' different styles of transcription.
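
The stepping playback described above can be illustrated with a short sketch that computes overlapping playback windows. This is not VoiceWalker's implementation or interface: the function name, the parameter names, and the default durations are assumptions chosen only to show the "forward, then a small step back" pattern.

```python
def stepping_windows(total_seconds, play_seconds=4.0, step_back_seconds=1.0):
    """Return (start, end) playback windows that overlap by step_back_seconds.

    All parameter names and defaults are hypothetical; they stand in for the
    adjustable forward and backward steps mentioned in the text.
    """
    if step_back_seconds < 0 or play_seconds <= step_back_seconds:
        raise ValueError("play_seconds must exceed a non-negative step_back_seconds")
    windows = []
    start = 0.0
    while start < total_seconds:
        end = min(start + play_seconds, total_seconds)
        windows.append((start, end))
        if end >= total_seconds:
            break
        # Step back a little so the next window replays the end of this one.
        start = end - step_back_seconds
    return windows


# A ten-second recording, played four seconds at a time with a one-second step back.
for window in stepping_windows(10.0):
    print(window)
```

Each window replays the last second of the previous one, which gives the transcriber a second hearing of whatever fell at the edge of a window.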
Extract (1) shows a segment from the beginning of Transcript 1, illustrating the product of the first step of the transcription process.

(1) (Sample extract from Transcript 1 [by the first transcriber])


there is no right,
. . for me,
. . to sit here and be forgiven.
. . . if you understand me.
I mean in a sense there is the political thing,
I knew what I was doing,
. . and I would even defend actions I have taken,

The second and third transcribers prepared their transcripts under somewhat different conditions. They were given a verbatim transcription of the
five-minute segment of spoken interaction, presented as a continuous
block of text as per Extract (2).

(2) (Sample verbatim transcript segment for intertranscriber checks)


there is no right for me to sit here and be forgiven if you understand
me I mean in a sense there is the political thing I knew what I was
doing and I would even defend actions I have taken

Consistent with the aims of the intertranscriber checks, and working with
a digitized audio file, the task of the second and third transcribers was to


segment the continuous block of text into IUs. Although this was a somewhat different task, we felt that the approach was justified, especially considering
previous research that has established high agreement between experienced transcribers' identification of words and sounds only (Roberts and
Robinson 2004). The second and third transcribers were free to change
the words in the text they received according to their own hearing.
It was understood, by all the researchers involved, that the project was
working with Chafe's (1994) notion of IUs. For this reason, no formal
definition of IUs was provided, and the intertranscriber checks were
based on "rules as published" rather than "rules as agreed" through discussion (cf. Oelschlaeger and Thorne 1999). Just as the first transcriber did,
the second and third transcribers also indicated the transitional continuity
of each unit's intonation contour. Transitional continuity is defined as
"the degree of continuity that occurs at the transition point between one
intonation unit and the next" (Du Bois et al. 1993: 53). We will discuss
the marking of transitional continuity in greater detail later in the paper.
At this point, it is sufficient to note that there are three types of transitional continuity: final, continuing, and appeal. If an IU did not have an
identifiable transitional continuity, it was marked as truncated (see discussion of truncated IUs below). The specific instructions for the intertranscriber checks were:
Insert line breaks corresponding to intonation unit boundaries (press the RETURN key at the end of each intonation unit you identify);
Mark the end of each intonation unit for transitional continuity. Please use:
a period for final
a comma for continuing
a question mark for appeals
Alternatively, mark intonation units as truncated. Please use --
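A transcript segmented according to these instructions is easy to read back mechanically. The sketch below is only an illustration of the marking scheme, not a tool used in the project: the function name `parse_segmented` and the fallback label "unmarked" are assumptions, and the sample lines are taken from the extracts in this paper with pauses left out.

```python
# Final punctuation mark -> transitional continuity, as in the instructions.
CONTINUITY = {".": "final", ",": "continuing", "?": "appeal"}


def parse_segmented(text):
    """Return one (words, label) pair per non-empty line of a transcript."""
    units = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.endswith("--"):
            # Two hyphens mark a truncated intonation unit.
            units.append((line[:-2].strip(), "truncated"))
        elif line[-1] in CONTINUITY:
            units.append((line[:-1].strip(), CONTINUITY[line[-1]]))
        else:
            # Hypothetical fallback for lines without any end marking.
            units.append((line, "unmarked"))
    return units


sample = """there is no right,
for me,
to sit here and be forgiven.
and you know Im er --"""
units = parse_segmented(sample)
```

Each line break becomes one IU, and the last character (or the two hyphens) gives its transitional continuity.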

The second transcriber additionally recorded pauses and marked prominent words. The second transcriber listened to the audio file using the
same VoiceWalker software as the first transcriber. The third transcriber,
being a Mac user, commented that "there is no Mac version of VoiceWalker, so I'm just rewinding on Quick Time."

7. Intertranscriber checks
We struggled for some time to find a meaningful way to compare the different transcript versions. On the face of it, what we were doing seemed
like a straightforward inter-rater check that could be reported in terms
of levels of agreement and would tell us something about the reliability
of Transcript 1 prepared by the first transcriber. This would be commensurate with a positivist notion of transcription, where talk is an observable behavior that can be transcribed completely and accurately (Lapadat
2000). There are, in the literature, some precedents for such inter-rater
checks, some of which have developed quite sophisticated statistical measures (e.g., Roberts and Robinson 2004).

Table 2. Total number of IUs identified and IUs agreed as identical

              Total IUs identified   IUs identical as in Transcript 1
Transcript 1  159
Transcript 2  186                    93
Transcript 3  218                    98

Table 2 shows a simple count of IUs identified by each of the transcribers, as well as how many IUs were transcribed as identical. The results show that the two intertranscribers identified more IUs overall, 186
and 218 IUs respectively, as compared to the first transcriber's 159 IUs.
This result was not very encouraging. Moreover, of the 186 IUs that the
second transcriber identified, only 93 IUs were identical in the first researcher's transcript. Table 2 also shows that the numbers are similar
when the first and third transcribers' performances are compared.
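To make the scale of the mismatch concrete, the counts in Table 2 can be turned into simple proportions. The sketch below is an illustration only; the paper itself reports raw counts, and the function name `identical_shares` is a hypothetical helper.

```python
# Counts taken from Table 2.
totals = {"Transcript 1": 159, "Transcript 2": 186, "Transcript 3": 218}
identical = {"Transcript 2": 93, "Transcript 3": 98}


def identical_shares(totals, identical, reference="Transcript 1"):
    """Map each intertranscript to (share of reference IUs, share of its own IUs)."""
    shares = {}
    for name, shared in identical.items():
        shares[name] = (shared / totals[reference], shared / totals[name])
    return shares


for name, (of_reference, of_own) in identical_shares(totals, identical).items():
    print(f"{name}: {of_reference:.1%} of Transcript 1, {of_own:.1%} of its own IUs")
```

On these figures, roughly three fifths of the IUs in Transcript 1 have an identical counterpart, while about half or fewer of the IUs in Transcripts 2 and 3 do.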
Not only were these results discouraging, we also felt that this positivist
view did not fit the purpose of our intertranscriber checks, which was to
improve the quality of transcription in the conciliation talk project. That
is, we felt that any comparisons between the transcripts should facilitate
the development of the first researcher's transcription skills.
Extracts (3) and (4) contain parallel representations of a segment of
talk from Transcripts 1 and 2 and Transcripts 1 and 3, respectively. The
annotation in these parallel transcripts illustrates the approach we finally
used in order to produce a more meaningful comparison of the three transcripts. The lines in these parallel transcriptions are numbered to facilitate
line by line comparisons of IUs identified by the three transcribers. The
additional annotation, in the right-hand column in Extracts (3) and (4),
records different types of agreement, using Transcript 1 as the point of
reference. We recorded the following types of agreement:
identical IUs;
IUs with a common initial boundary;
IUs with a common final boundary;
talk identified as two or more IUs by the intertranscribers, but that
has the same overall initial and final boundaries as a single IU identified by the first transcriber.
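The four types of agreement listed above can be expressed as a small classification routine. The sketch below is an illustration under stated assumptions, not the procedure the authors followed (their comparison was done by hand): it assumes that both transcripts segment the same word sequence, that each IU is represented as a (start, end) pair of word offsets, and the names `boundaries_to_spans` and `classify` as well as the toy offsets are hypothetical.

```python
def boundaries_to_spans(boundaries):
    """Turn sorted end-of-IU word offsets into contiguous (start, end) spans."""
    spans, start = [], 0
    for end in boundaries:
        spans.append((start, end))
        start = end
    return spans


def classify(reference, other):
    """Label each reference IU by its agreement with the other transcript."""
    other_spans = set(other)
    starts = {s for s, _ in other}
    ends = {e for _, e in other}
    labels = []
    for s, e in reference:
        if (s, e) in other_spans:
            labels.append("identical")
        elif s in starts and e in ends:
            # Same outer boundaries, but the other transcriber heard more IUs.
            parts = sum(1 for os, oe in other if s <= os and oe <= e)
            labels.append(f"one IU transcribed as {parts} IUs")
        elif s in starts:
            labels.append("same initial boundary")
        elif e in ends:
            labels.append("same final boundary")
        else:
            labels.append("no agreement")
    return labels


# Toy word offsets, invented for illustration only.
first = boundaries_to_spans([2, 5, 9, 14, 18])
second = boundaries_to_spans([2, 4, 9, 11, 14, 18])
labels = classify(first, second)
```

Because the other transcript is a contiguous segmentation, a shared start and a shared end guarantee that the IUs in between tile the reference IU exactly, which is what the "two or more IUs" category requires.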

(3) (Comparison of Transcript 1 and Transcript 2)

    Transcript 1                                        Transcript 2                                          Type of agreement
1   and you know I'm er --                              and you know I'm --                                   same initial boundary
2                                                       er er
3   er I'm aware that's like er --                      I'm aware that's like er --                           same final boundary
4   it's a part of your sort of spiritual make up to,   it's a part of your sort of spiritual make up to --   identical
5   . . to confront . . the situation.                  . . to confront . . . the situation,                  identical
6
7   . . er and er . . move on from it.                  . . .(1.0) er and er --                               One IU transcribed as two IUs
8                                                       . . .(1.0) move on from it.
9   but again er --                                     but again er --                                       identical
10  . . .(1.0) dealing you know,                        . . .(2.0) dealing --                                 One IU transcribed as two IUs
11                                                      you know,
12  like having to handle that,                         like having to handle that,                           identical
13  you know and --                                     . . .(1.0) you know and --                            identical
14  . . or the enormity of it,                          or the enormity of it,                                identical

(4) (Comparison of Transcript 1 and Transcript 3)

    Transcript 1                                        Transcript 3                                          Type of agreement
1   and you know I'm er --                              and you know I'm er,                                  identical
2
3   er I'm aware that's like er --                      er I'm aware that's like er,                          identical
4   it's a part of your sort of spiritual make up to,   it's a part of your sort of spiritual make up to--    identical
5   . . to confront . . the situation.                  to confront--                                         One IU transcribed as two IUs
6                                                       the situation,
7   . . er and er . . move on from it.                  er and er,                                            One IU transcribed as two IUs
8                                                       move on from it.
9   but again er --                                     but again er,                                         identical
10  . . .(1.0) dealing you know,                        dealing you know like,                                same initial boundary
11
12  like having to handle that,                         having to handle that.                                same final boundary
13  you know and --                                     you know and [uh],                                    identical
14  . . or the enormity of it,                          or the--                                              One IU transcribed as two IUs
15                                                      enormity of it.


A comparison of the talk represented in Extracts (3) and (4), using the
above types of agreement, showed that the first and second transcribers
agreed fully on six IUs (lines 4, 5, 9, 12, 13, and 14; marked "identical" in
Extract [3]). The first and third transcribers agreed on five IUs (lines 1, 3,
4, 9, and 13; marked "identical" in Extract [4]). Equally interesting from
our perspective of developing transcription skills was that the first and
second transcribers agreed on the initial boundary in line 1 (marked
"same initial boundary" in Extract [3]), and on the final boundary in line
3 (marked "same final boundary" in Extract [3]). A similar pattern is evident between the first and third transcribers for lines 10 and 12 (see Extract [4]). Moreover, the single IUs identified by the first transcriber in
lines 7 and 10 are transcribed as two IUs by the second transcriber (while
retaining the same overall initial and final boundaries as the single IUs
identified by the first transcriber; see Extract [3]). The same relationship
holds between the first and the third transcribers for lines 5, 7, and 14.
That is, the first transcriber has transcribed these lines as single IUs,
whereas the third transcriber has identified two IUs (but retaining the
same overall initial and final boundaries as the single IUs identified by
the first transcriber; see Extract [4]). There were also cases in the data
where the second or third transcribers identified three or more IUs with
the same overall initial and final boundaries as a single IU identified by
the first transcriber. The parallel representation of transcriptions in Extract (5) illustrates such an instance.
(5) (A single IU in Transcript 1 identified as three IUs in Transcript 3)

Transcript 1                                           Transcript 3
so it was partly on on . . er . . a political thing,   so it was partly on on--
                                                       um,
                                                       a political thing,

Closer inspection of the full versions of the different transcripts showed
that these more subtle forms of agreement, as illustrated in Extracts (3),
(4), and (5), were frequent. Table 3 summarizes the level and different
types of agreement between the first and second transcribers, as well as
the number of IUs uniquely identified by the second transcriber. The first
row in Table 3 shows that the first and second transcribers agreed on 93
IUs as identical (also reported in Table 2). The second row shows that
there were 21 instances where the second transcriber identified two IUs
with the same overall initial and final boundaries as a single IU identified
by the first transcriber. In the same way, there were four instances where
the second transcriber identified three IUs where the first transcriber
heard only one, and one instance where the second transcriber identified
as many as four IUs corresponding to the one that the first transcriber
identified. Finally, there were 17 instances where the first and second
transcribers identified the same initial boundary and 11 instances where
they identified the same final boundary (see Extracts [3] and [4] for illustration). The final row in Table 3 records the number of IUs uniquely
identified by the second transcriber. Table 4 shows the same comparison,
but now between the first and the third transcribers.

Table 3. Identification of IUs in Transcript 2 as compared to Transcript 1

Instances when the first and second transcribers agreed on the identification of an IU  93
Instances when the same talk was identified as one IU by the first transcriber, but as  21
  two IUs by the second transcriber
Instances when the same talk was identified as one IU by the first transcriber, but as  4
  three IUs by the second transcriber
Instances when the same talk was identified as one IU by the first transcriber, but as  1
  four IUs by the second transcriber
Instances where both the first and the second transcriber agreed on the initial         17
  boundary of an IU
Instances where both the first and the second transcriber agreed on the final           11
  boundary of an IU
IUs uniquely identified by the second transcriber                                       7

Table 4. Identification of IUs in Transcript 3 as compared to Transcript 1

Instances when the first and third transcribers agreed on the identification of an IU   98
Instances when the same talk was identified as one IU by the first transcriber, but as  33
  two IUs by the third transcriber
Instances when the same talk was identified as one IU by the first transcriber, but as  6
  three IUs by the third transcriber
Instances when the same talk was identified as one IU by the first transcriber, but as  2
  four IUs by the third transcriber
Instances when the same talk was identified as one IU by the first transcriber, but as  1
  five IUs by the third transcriber
Instances where both the first and the third transcriber agreed on the initial          9
  boundary of an IU
Instances where both the first and the third transcriber agreed on the final            9
  boundary of an IU
IUs uniquely identified by the third transcriber                                        5

We found this more subtle analysis more productive for our purposes
than a straightforward inter-rater check. For one, it begins to explain
why the first transcriber identified only 159 IUs, as compared to the second transcriber's 186 IUs and the third transcriber's 218 IUs. Although
the absolute level of agreement reported in Table 2, counting only IUs
fully agreed upon, appears low, the first transcriber's identification of


boundaries between IUs appeared promising. That is, the more subtle
analysis shows that the real problem, as it were, was that the first transcriber did not identify enough, or all, boundaries between IUs, thereby
identifying fewer IUs overall. This was productive for the purpose of developing transcription skills, as it indicated a specific area that the first
transcriber could pay attention to, namely improving his identification of
boundaries between IUs.
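The link between missed boundaries and the lower IU count can be checked against the figures in Tables 2, 3, and 4. The sketch below rests on an assumption that the tables do not state explicitly: that each "same initial boundary", "same final boundary", and "uniquely identified" instance corresponds to one IU in the intertranscriber's transcript, and that each split instance corresponds to as many IUs as the row names. The function name `transcriber_total` is hypothetical.

```python
def transcriber_total(identical, splits, same_initial, same_final, unique):
    """Sum the IUs that a comparison table accounts for in the other transcript.

    `splits` maps the number of IUs heard by the intertranscriber to the
    number of instances, e.g. {2: 21} for 21 one-to-two splits.
    """
    split_ius = sum(parts * count for parts, count in splits.items())
    return identical + split_ius + same_initial + same_final + unique


# Figures from Table 3 (Transcript 2) and Table 4 (Transcript 3).
splits_t2 = {2: 21, 3: 4, 4: 1}
splits_t3 = {2: 33, 3: 6, 4: 2, 5: 1}
table3_total = transcriber_total(93, splits_t2, 17, 11, 7)
table4_total = transcriber_total(98, splits_t3, 9, 9, 5)

# Additional IUs that arise purely from splitting a single Transcript 1 IU.
extra_t2 = sum((parts - 1) * count for parts, count in splits_t2.items())
extra_t3 = sum((parts - 1) * count for parts, count in splits_t3.items())
```

Under that reading the rows add up to the 186 and 218 IUs reported in Table 2, and the split rows alone account for 32 and 55 additional IUs respectively, which is consistent with the observation that the first transcriber mainly missed boundaries rather than misplaced them.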

8. Developing transcription skills

In order to improve his skills in recognizing IU boundaries, the first researcher again consulted Chafe's six characteristics of IUs, reviewed in an
earlier part of this paper. Chafe (1994: 59) uses the following typical IU
to illustrate four out of his six characteristics:
. . and so the hall is re`al long%.
. . .(.36) [the next intonation unit]

Pausing: This IU is preceded by a very brief pause (indicated by two dots)
and followed by a slightly longer measured pause of 0.36 seconds.
Duration: The IU starts with three words spoken with an accelerated pace
(transcribed in small print: and so the). Such accelerated speech at the beginning of an IU is also called anacrusis (Cruttenden 1986: 39). Note also
that the final word in the IU is transcribed with an equal sign following
the vowel (long), indicating that this is spoken more slowly. There is
therefore a general pattern of slowing down from the beginning toward
the end of the IU. The significance of this for identifying IU boundaries
is that when the speech speeds up again this may indicate the start of a
next IU.
Pitch: Of the three prominent words in the IU (hall, real, and long),
the first word (hall) has the highest pitch (as measured in hertz), the second word (real) has a slightly lower pitch, and the final prominent word
(long) has the lowest pitch. This, then, is a declining pitch pattern within
the IU. IUs may often be part of yet larger units, spanning several IUs,
and with a marked pattern of declining pitch from IU to IU. These larger
units are variously called declination units (Schuetze-Coburn et al. 1991)
and paratones (Yule 1980). Declination units may help to identify individual IU boundaries, with each successive IU marked by a slight resetting of the pitch baseline within an overall pattern of declining pitch
across units (cf. Schuetze-Coburn et al. 1991). The IU is also characterized by a falling final pitch contour (marked by a period).


Voice quality: The end of the last word of the IU (long) is spoken with a
creaky voice (marked by the percent sign).
A similarly rigorous identification of IUs in the project data encountered
some difficulties. The first transcriber initially found Chafe's use of a period to mark sentence-final falling pitch, a comma to mark contours that
are not sentence-final, and a question mark to represent an appeal, potentially confusing. Du Bois et al. (1993) was helpful here. They also use
period, comma, and question mark to distinguish intonation contours as
being either final, continuing, or an appeal. According to Du Bois et
al. (1993: 53), this transitional continuity of intonation contours will have
various realizations, one of which is the final pitch movement of IUs. In
the above example, from Chafe, the falling pitch contour at the end of the
IU is associated with a final intonation contour. In sum, the first researcher resolved to re-learn the use of these familiar symbols, the punctuation marks, to describe the intonation contours of IUs, and he used final pitch movement of IUs as a heuristic to aid the identification of these
intonation contours. At the same time, the identification of final pitch
movement turned out to be helpful in the identification of IU boundaries.
The first transcriber encountered two additional difficulties. Firstly, so-called changes in voice quality at the beginning and end of IUs (e.g., Chafe's creaky voice) were difficult to recognize. Secondly, distinctly discernible declining patterns of pitch level were not found within or across the
identified IUs. We suspect that these may be rather subtle characteristics
of IUs, and that extensive experience may be needed before one may be
able to use these productively for identifying IU boundaries. We did,
however, notice that IU boundaries were sometimes associated with a
slight re-setting of the pitch baseline; i.e., IUs sometimes started with a
slightly lower or higher pitch than the preceding talk. In addition, we
added prominence as a characteristic of IUs in our data. That is, we often
saw that IUs in our project data contained at least one prominent syllable
or word.
The careful re-transcription of the five-minute sample segment, allowing for the above subtleties, resulted in the following characteristics of
IUs identified in the project data:
1. Pitch: Beginning with a re-setting of the pitch baseline and having a
recognizable final pitch movement
2. Duration: Beginning with a shortening of syllables/words (i.e., anacrusis) and ending with lengthened syllables/words
3. Prominence: Including at least one clearly prominent word, achieved
with altered intensity, pitch, or a combination of both
4. Pausing: Being preceded and followed by a pause


Note that these characteristics include elements not only from Chafe
(1994). We also include one of Cruttenden's (1986, 1997) internal criteria;
i.e., that there should be at least one prominent word or syllable in an IU.
We accept, however, that the way we recognize prominence is somewhat
different from the pitch movement to or from an accented syllable that
Cruttenden uses as an internal criterion for identifying intonation groups.
An important tool helping the first transcriber to arrive at, and practice
the use of, the above criteria in his identification of IUs, was what we call
an enhanced transcription. The conventions used in this enhanced transcription build on Chafe's (1994) narrow transcription of IUs, as illustrated above; include elements from Du Bois et al. (1993); and add our
own notation where necessary. Example (a) below illustrates the enhanced transcription conventions, and thereby also the four criteria we
used to identify IUs in the data. The enhanced conventions are, then, a
way to make transparent the decision-making involved in the identification of IUs in the five-minute sample segment of the conciliation talk. By
contrast, the conventions used in the final transcription of the full data set
were guided by what we needed to re-present for our exploration of metaphor dynamics. Hence, in the later full transcription we used the more
economical conventions illustrated by Example (b).
a. . . .(2.0) _back in the ^moment\
b. . . .(2.0) back in the moment.
In Example (a), there is a two-second pause at the beginning of the IU
(marked by the three dots and parentheses indicating the length of the
pause). At the beginning of the IU there is a resetting of the pitch baseline
(marked by the underscore). The first three words of the IU are accelerated (anacrusis; marked by the smaller font size). The first syllable of the
word moment is prominent (marked by the caret). The first and second
vowels of the word moment are both lengthened (marked by the equal
signs). Finally, the IU ends with falling final pitch (marked by the backward slash). In Extract (6) the conventions are applied to a longer stretch
of talk from the first researcher's enhanced transcription of the five-minute sample segment of conciliation talk. The line numbering in this
extract, as well as in subsequent extracts, corresponds to the line numbering in the enhanced transcript (see Appendix 2 for an illustrative
segment).
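The symbols just described lend themselves to a simple reading routine that lists which cues were marked on a given line. The sketch below is an illustration only and covers just the symbols explained in this section (pause dots with an optional length, underscore, caret, equal sign, backward slash, and the two hyphens for truncation); the function name `describe_iu` and the dictionary keys are assumptions, and font size, which marks anacrusis, cannot be recovered from plain text. The example string is Example (a) as it appears in this copy of the text.

```python
import re

# Leading pause: a run of dots, optionally followed by a length in seconds.
PAUSE = re.compile(r"^(?:\. ?)+(?:\((\d*\.?\d+)\))?\s*")


def describe_iu(line):
    """Return the cues marked on one line of enhanced transcription."""
    text = line.strip()
    cues = {"pause_marked": False, "pause_seconds": None}
    match = PAUSE.match(text)
    if match:
        cues["pause_marked"] = True
        if match.group(1):
            cues["pause_seconds"] = float(match.group(1))
        text = text[match.end():]
    cues["truncated"] = text.endswith("--")
    if cues["truncated"]:
        text = text[:-2].rstrip()
    cues["final_fall"] = text.endswith("\\")
    if cues["final_fall"]:
        text = text[:-1].rstrip()
    cues["pitch_reset"] = text.startswith("_")
    cues["lengthened"] = "=" in text
    # Words carrying a caret contain the prominent syllable.
    cues["prominent"] = [
        word.replace("^", "").strip("_") for word in text.split() if "^" in word
    ]
    return cues


example = describe_iu(". . .(2.0) _back in the ^moment\\")
```

Listing the cues per line makes it easy to see that a given IU was identified on the basis of some of the four criteria rather than all of them.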
(6) (Illustration of conventions used in the enhanced transcription)
39 Pat: and you ^know I'm er --
40      . . .(1.0) er I'm ^aware that's like er --
41      . . it's a part of your sort of spiritual ^make up to\
42      . . .(2.0) to con^front/
43 Jo:  [hmh]
44 Pat: . . .(1.0) _[the] situ^ation/
45      . . . _and er --
46      . . . move ^on from it\
47      . . .(1.0) but a^gain/ --
48      _er
49      . . .(3.0) _^dealing /--
50      _you know\
51      like . . ^having to handle that\
52      . . . _you know\
53      and er
54      . . .(1.0) _or the . . e^normity of it\

A number of things are evident from an examination of the enhanced
transcription in Extract (6). First of all, not many IUs encompass all the
characteristics of our typical IU. That is, any single IU is identified using
only a subset of the characteristics. This observation was uniform across
all parts of the five-minute sample segment. It is also evident that the talk
contains many incomplete utterances (often marked as truncated IUs)
and several so-called regulatory IUs (e.g., you know and but again; cf.
Chafe 1993). This less fluent nature of the talk was not uniform across
the five-minute sample segment of conciliation talk. However, we noticed
that less fluent stretches of talk were associated with lower levels of
agreement in the earlier intertranscriber checks, and also that the first researcher's enhanced transcription of these stretches was very different
from his initial transcription.
Extract (7) exemplifies how the transcription of another stretch of less
fluent talk changed a great deal between the first researcher's initial transcription and the enhanced transcription. Just as in Extract (6), the enhanced transcript in Extract (7) includes a number of truncated IUs (lines
146, 152, 154, and 155) and some regulatory IUs (you know in lines 152
and 157).
(7) (Transcript 1 and the enhanced transcript for a less fluent stretch of talk)

     Transcript 1                                            Enhanced transcript
146  and that . . that er --                                 . . and that --
147                                                          . . .(1.0) _that um\
148  that pain that loss . . was shared by . . by everyone.  . . _that ^pain
149                                                          _that ^loss
150                                                          . . . _was ^shared by\
151                                                          . . by ^everyone\
152  you know and after that,                                . . . you know an- --
153                                                          and after that
154  . . er the pain on on every side,                       . . .(1.0) _um --
155                                                          _the ^pain on --
156                                                          on ^every side\
157  you know,                                               . . _you know\
158  I felt it.                                              . . _I ^felt it\


Extract (8) shows a stretch of talk that was more fluent, in the sense that
it includes fewer truncated IUs (only one in Extract [8]: when you come --
in line 19) and fewer regulatory IUs (only one clear example in Extract
[8]: I mean in line 13). Such more fluent stretches of talk were also
common in the five-minute sample segment.
(8) (Enhanced transcription of a more fluent stretch of talk)
8  Pat: there's ^no right
9       . . .(1.0) ^for ^me
10      . . .(2.0) _to sit here and be for^given\
11      . . .(2.0) if you understand me
12 Jo:  hmh
13 Pat: _I mean
14      in a ^sense there's the political thing
15      I knew what I was ^doing\
16      . . and I would even de^fend actions I've taken
17      _etcetera
18      _^but
19      _when you come --
20      when it comes ^down to it\
21      . . I am ^sitting with somebody who's affected by\
22      . . . _^my actionns
The more fluent stretches of talk, as in Extract (8), were also different in
that they showed high levels of agreement in the earlier intertranscriber
checks, and the transcription of these stretches of talk did not change
much between the first researcher's initial transcription and the enhanced
transcription. The parallel representation in Extract (9) illustrates how the
transcription of the more fluent stretch of talk in Extract (8) did not
change much between the initial transcription (Transcript 1) and the enhanced transcription; only two IUs, in lines 14 and 22, are unique to the
enhanced transcription.

(9) (A more fluent stretch of talk from Transcript 1 and the enhanced transcript)

    Transcript 1                                                   Enhanced transcript
8   there is no right,                                             there's ^no right
9   . . for me,                                                    . . .(1.0) ^for ^me
10  . . to sit here and be forgiven.                               . . .(2.0) _to sit here and be for^given\
11  . . . if you understand me.                                    . . .(2.0) if you understand me
12                                                                 hmh
13  I mean in a sense there is the political thing,                _I mean
14                                                                 in a ^sense there's the political thing
15  I knew what I was doing,                                       I knew what I was ^do:ing\
16  . . and I would even defend actions I have taken,              . . and I would even de^fend actions I've taken
17  etcetera.                                                      _etcetera
18  but,                                                           _^but
19  when it come --                                                _when you come --
20  when it comes down to it,                                      when it comes ^down to it\
21  I am sitting with somebody who is affected by . . my actions.  . . I am ^sitting with somebody who's affected by\
22                                                                 . . . _^my actionns

It seems, then, that stretches of talk that included a lot of truncated and
regulatory IUs were particularly challenging for the first transcriber. Successfully dealing with these stretches of talk was therefore instrumental
for the development of his transcription skills. By contrast, the transcription of stretches of talk with fewer truncated and regulatory IUs was associated with higher levels of intertranscriber agreement, and changed
much less across the different stages of transcription.
The final stage, then, was the transcription of the full data set, putting
the skills that the first transcriber had developed to work. This transcription did not use the enhanced transcription conventions. Rather, the project now reverted to a simpler set of conventions agreed as sufficient
for the purposes of the research project. Extract (10) illustrates the conventions used in this final transcription.
(10) (Illustration of conventions in the final transcription of the conciliation talk)
39 Pat: and you know I'm er --
40      . . .(1.0) er I'm aware that's like er --
41      . . it's a part of your sort of spiritual make up to,
42      . . .(2.0) to confront.
43 Jo:  [hmh]
44 Pat: . . .(1.0) [the] situation.
45      . . . and er --
46      . . . move on from it.
47      . . .(1.0) but again --
48      er,
49      . . .(3.0) dealing --
50      you know,
51      like . . having to handle that.
52      . . . you know,
53      and er,
54      . . .(1.0) or the . . enormity of it.

9. Conclusion

In this paper, we have highlighted issues around the development of transcription skills, in particular the problems of working with spontaneous
talk, such as that found in spoken interaction on emotionally charged
topics. Our position is that the process of transcription is key to rigorous
research involving spoken interaction data, and the present paper offers,
we believe, a unique description of the challenges of the transcription process in one particular project, as well as the ways that these challenges
were overcome. In the following, we summarize this process of developing transcription skills in our research project. This summary may, at the
same time, be seen as a set of recommendations, or an agenda, that can
inform the transcription process of researchers engaged in other similar
projects working with spoken discourse data.
Our starting point was the decision to transcribe the conciliation talk
using IUs. Working toward this aim, we trialed our transcription using
a five-minute sample segment of talk. The first transcriber began by transcribing the segment based on definitions of IUs available in the literature. Next, we involved two more experienced intertranscribers in the first
transcriber's learning process. Following this, the first transcriber made
his decisions transparent using our enhanced transcription conventions.
Finally, we developed project-specific conventions, criteria, and procedures for the transcription of the full data set.


The first step, then, was to prepare an initial transcript of the five-minute sample segment of conciliation talk based primarily on the characteristics of IUs outlined by Chafe (1994), but also consulting Cruttenden (1986, 1997). However, as Chafe (1994: 62) points out, and we ourselves discovered at the end of this first step, skills in the transcription of
IUs require both instruction and practice, and take time and effort to
develop. In particular, the later intertranscriber checks, as well as the subsequent externalizing of the transcriber's decisions, showed that the criteria for identifying IUs were only partially internalized by the first transcriber in this first step of the transcription process.
Next, we had two additional researchers transcribe our five-minute
sample segment of conciliation talk. Both of these transcribers brought
unique and valuable experience to the task. Although we originally set
out to compare the three resulting transcriptions, we found this to be a
less fruitful exercise for the development of the first transcriber's skills.
Rather, anticipating the necessarily more autonomous task of transcribing the full data set, the first transcriber worked through the two additional transcriptions now available to him, noted both differences and
similarities, considered what the nature of these differences and similarities was, and
then reflected on whether these indicated areas where his own transcription might change and/or improve. For example, the detailed intertranscriber analysis showed that IU boundaries had been correctly identified,
but that a number of potential IU boundaries had been missed. With this
insight, and re-engaging with the literature on IUs, the first transcriber re-transcribed the five-minute sample
segment. In sum, this exercise may be described as listening, noticing, and reflecting using already prepared transcripts, in this
case the transcripts of two experienced researchers. The exercise, then,
involved learning "with" rather than "from" the transcripts of the more
experienced researchers. More generally, we believe that intertranscriber
checks should be geared toward improving the quality of transcription,
rather than acting as a simple objective measure validating the reliability
of transcription. This does not mean that we dismiss the need for reliability across transcribers. Rather, our experience tells us that such reliability
can only follow from a more general concern with the quality of transcription within a project.
In the following step of the transcription process, the first transcriber
used the enhanced transcription conventions (see Extract [6]). This exercise was instrumental in making transparent the decision-making process
involved in the identification of IUs. It also acted to highlight particular
challenges of transcribing IUs with consistency. For example, the first
transcriber found that paying equal attention to the beginnings and ends
of IUs increased the consistent identification of IUs. This also highlighted


characteristics of IUs, such as anacrusis and re-setting of the pitch baseline at the beginning of IUs, as well as syllable lengthening and terminal
pitch contours at the end of IUs. Importantly, this equal focus on both
the beginnings and ends of IUs was subtly different from the concern
with boundaries between intonation units, which had been the focus
of the first transcriber at the earlier intertranscriber stage. Making the
decision-making processes more transparent also highlighted that identifying IUs was less problematic in the case of more fluent stretches of
talk, and more problematic in the case of less fluent stretches of talk.
From this, we became aware that the existing literature provides insufficient detail on how to deal with hesitations, false starts, and talk used to
regulate ongoing spoken interaction, as well as the effect of these truncated intonation contours on the transcription of co-present more complete intonation contours. Finally, making the transcription decisions
transparent using our enhanced conventions helped ensure the development of transcription skills to a level where, when the first transcriber
went on to transcribe the full data set, he was working not at the limits
of his skill level, but within his skill level.
Finally, the development of project-specific conventions, criteria, and
procedures was an ongoing aim, affecting all stages of the transcription
process. At the beginning of the project, we decided to transcribe the conciliation talk using IUs. This was based on our intuitions that such a transcription would facilitate the investigation of the dynamics of metaphor
use in the conciliation talk. In later stages, more fine-grained decisions
were made about what features of the spoken discourse to represent in
the transcripts. These decisions were not taken in isolation from the first
transcriber's gradually developing skills. Rather, his developing skills in
the transcription of IUs acted at times to create affordances, i.e., opening
up new possibilities for what could be included in the final project transcription, and at times as a constraint, i.e., what the first transcriber could
do within his skill level limited what could be included in the final project
transcription.
One final activity that may usefully be added to this agenda for developing transcription skills is the possibility of learning from other transcribers' experiences. This may involve talking to other researchers with
similar responsibilities within a research project, or with more experienced researchers who in the past have had such transcription responsibilities. It may also include interacting with the emerging research methods
literature that describes the experiences of transcribers (e.g., Gregory et al.
1997; Bird 2005; Tilley 2003a, 2003b; Tilley and Powick 2002). The present paper is a further contribution to this emerging research methods
literature.


Appendix 1: Transcription conventions


Discourse feature         Convention               Description
Transitional continuity:  ,                        Continuing
                          .                        Final
                          ?                        Appeal
Final pitch contour:      /                        Rising
                          \                        Falling
                          _ (at the end of IU)     Level
Pitch baseline:           _ (at the start of IU)   Re-setting of the pitch baseline
Truncation:               -                        Truncated word
                          --                       Truncated intonation unit
Prominence:               ^                        Prominent syllable or word
Accelerated pace:         and so the               Small print
Slower pace:              =                        Equal sign
Pauses:                   ..                       Short pause (<0.4 s)
                          ...                      Medium pause (0.4 to 0.75 s)
                          . . .(2.0)               Longer pause (length in seconds indicated in parenthesis)
Overlapping speech:       [ ]                      Overlapping speech
Uncertain hearing:        <X X>                    Not clearly audible speech
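
As an illustration only, the pause conventions listed above could be read by machine along the following lines. This is a minimal Python sketch and not part of the project's procedures: the function name `classify_pause`, the regular expression, and the decision to accept dots written with or without spaces between them are all assumptions made here for the example.

```python
import re
from typing import Optional, Tuple

# Hypothetical helper (not from the paper): reads a pause marker at the
# start of a transcript line, following the Appendix 1 pause conventions.
# Assumption: dots may appear with or without a space between them.
_PAUSE = re.compile(r"^\s*(\.(?:\s?\.){1,2})(?:\((\d+(?:\.\d+)?)\))?")


def classify_pause(line: str) -> Optional[Tuple[str, Optional[float]]]:
    """Return (category, seconds) for a line-initial pause, else None."""
    match = _PAUSE.match(line)
    if match is None:
        return None
    dots = match.group(1).count(".")
    seconds = match.group(2)
    if dots == 2:
        return ("short", None)       # two dots: shorter than 0.4 s
    if seconds is None:
        return ("medium", None)      # three dots, untimed: 0.4 to 0.75 s
    return ("long", float(seconds))  # three dots plus length in seconds
```

For instance, `classify_pause(". . .(1.0) ^for ^me")` returns `("long", 1.0)`, while a line with no initial pause marker, such as `"you know,"`, returns `None`. Pauses that occur in the middle of a line are deliberately left out of this sketch.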

Appendix 2: Transcriptions of the five-minute sample segment


Transcript 1: First transcriber's initial transcript
Pat: you know,
it's broken some sort of taboo here.
and er --
it's into that territory,
you know,
of er --
there is no right,
. . for me,
. . to sit here and be forgiven.
. . . if you understand me.
I mean in a sense there is the political thing,
I knew what I was doing,
. . and I would even defend actions I have taken,
etcetera.
but,
when it come --
when it comes down to it,
I am sitting with somebody who is affected by . . my actions.
and er --
. . that's er --
. . . there's no preparation in the world for that.
I don't think.
. . you know it's er --
<X certain X> --
. . I think it is unique.
and er --
. . I couldn't possibly have anticipated,
er --
. . your response and what have.
I was aware from speaking to certain people,
how . . you saw this as a journey etcetera.

Transcript 2: Second transcriber (experienced, but not with IUs)


Pat: you know,
it's . . broken some sort of taboo here.
and er --
it goes into that territory,
and er --
there's no ^right --
. . .(1.0) ^for ^me,
. . .(1.0) to sit here and be forgiven.
. . .(3.0) if you understand me?
I mean in a sense there's the political thing,
I knew what I was doing,
. . .(1.0) and I would even ^defend actions I've taken,
etcetera.
but --
when you come --
when it comes down to it,
I am ^sitting with somebody who's ^affected by
. . .(1.0) ^my actions,
. . .(1.0) and er --
. . .(1.0) that's er --
. . there's no ^preparation in the world for that.
I don't think.
you know it's er something --
. . .(2.0) I think it's ^unique,
and er --
. . .(2.0) I couldn't ^possibly have anticipated-
. . .(2.0) um
. . .(2.0) ^your response . . and what have you,
I was aware from speaking to certain ^people,
how--,
. . .(1.0) you . . saw this as a journey etcetera.

Transcript 3: Third transcriber (experienced with IUs)


Pat: you know it's-broken some sort of taboo here,
and er it,
goes into that territory,
you know,
of er,
there's no right.
for me.
to sit here and be forgiven,
if you understand me.
I mean in a sense there's the political thing,
I knew what I was doing.
and I would even defend actions I have taken,
etcetera.
but-when you come-when it comes down to it,
I am sitting with somebody who's affected by-my actions.
and er,
that's er,
there's no preparation in the world for that.
I don't think.
you know it's er,
something-I think it's unique.
and er,
I couldn't possibly have anticipated,
um,
your response,
and what have I was aware from speaking to certain people,
how-you-saw this as a journey etcetera.

Enhanced transcript: First transcriber


 1 Pat:  you know it's
 2       . . broken some sort of ta^boo here/
 3 Jo:   . . . hmh
 4 Pat:  and er --
 5       it goes into that ^territory/
 6       _you know\
 7       . . . of er --
 8 Pat:  there's ^no right
 9       . . .(1.0) ^for ^me
10       . . .(2.0) _to sit here and be for^given\
11       . . .(2.0) if you understand me
12 Jo:   hmh
13 Pat:  _I mean
14       in a ^sense there's the political thing
15       I knew what I was ^doing\
16       . . and I would even de^fend actions I've taken
17       _etcetera
18       _^but
19       _when you come --
20       when it comes ^down to it\
21       . . I am ^sitting with somebody who's affected by\
22       . . . _^my actions
23       . . .(1.0) and er --
24       . . .(2.0) that's er --
25       . . .(2.0) there's no preparation in the ^world for that\
26 Jo:   [hmh]
27 Pat:  . . . _[I] don't think\
28       . . .(1.0) _you know it's er --
29       . . something --
30       . . .(1.0) I think it is unique\
31       . . and er\
32       . . .(2.0) I couldn't possibly have an^ticipated\
33       . . .(1.0) _er
34       . . _^your response and what have\
35       I was a^ware from speaking to certain people/
36       . . .(1.0) how . . . y- you --
37       . . saw this as a ^journey etcetera\


Notes
1. The project, "Using visual display to investigate the dynamics of metaphor in conciliation talk", was supported by the UK's Arts and Humanities Research Board under its Innovation Award scheme. We acknowledge that support, and also thank the participants in the talk for giving permission to use the data.
2. This view of consciousness does not necessarily dismiss a role for unconscious mental processes. In fact, Chafe (1996: 39) concedes that both the content . . . flow and management of consciousness is probably in large part unconsciously determined. Our assessment is that the exact division of labor, between conscious and unconscious processes, does not impact upon the process of transcribing intonation units.
3. Pierrehumbert and Hirschberg (1990) suggest that intonational phrases are made up of one or more intermediate units. These intermediate units are roughly similar to what we in this paper call intonation units.

References
Atkinson, J. M. and Heritage, J. (1984). Transcript notation. In Structures of Social Action: Studies in Conversational Analysis, J. M. Atkinson and J. Heritage (eds.), ix-xvi. Cambridge: Cambridge University Press.
Bird, C. M. (2005). How I stopped dreading and learned to love transcription. Qualitative Inquiry 11 (2): 226-248.
Cameron, L. (2003). Metaphor in Educational Discourse. London: Continuum.
Cameron, L. J. (forthcoming). Patterns of metaphor use in reconciliation talk. Discourse & Society.
Cameron, L. J. and Stelma, J. H. (2004). Metaphor clusters in discourse. Journal of Applied Linguistics 1 (2): 107-136.
Chafe, W. (1993). Prosodic and functional units of language. In Talking Data: Transcription and Coding in Discourse Research, J. A. Edwards and M. D. Lampert (eds.), 33-43. Hillsdale, NJ: Lawrence Erlbaum.
Chafe, W. (1994). Discourse, Consciousness, and Time: The Flow and Displacement of Conscious Experience in Speaking and Writing. Chicago: Chicago University Press.
Chafe, W. (1996). How consciousness shapes language. Pragmatics and Cognition 4 (1): 35-54.
Cook, G. (1990). Transcribing infinity: Problems of context presentation. Journal of Pragmatics 14: 1-24.
Couper-Kuhlen, E. (2001). Intonation and discourse: Current views from within. In The Handbook of Discourse Analysis, D. Schiffrin, D. Tannen, and H. E. Hamilton (eds.), 13-34. Malden, MA: Blackwell.
Cruttenden, A. (1986). Intonation. Cambridge: Cambridge University Press.
Cruttenden, A. (1997). Intonation, 2nd ed. Cambridge: Cambridge University Press.
Crystal, D. (1969). Prosodic Systems and Intonation in English. London: Cambridge University Press.
Dressler, R. A. and Kreuz, R. J. (2000). Transcribing oral discourse: A survey and a model system. Discourse Processes 29 (1): 25-36.
Du Bois, J. W. (1991). Transcription design principles for spoken discourse research. Pragmatics 1 (1): 71-106.
Du Bois, J. W. (2000). VoiceWalker: A discourse transcription utility. URL: <http://www.linguistics.ucsb.edu/resources/computing/download/documentation.htm> [accessed 27 May 2005].
Du Bois, J. W., Schuetze-Coburn, S., Cumming, S., and Paolino, D. (1993). Outline of discourse transcription. In Talking Data: Transcription and Coding in Discourse Research, J. A. Edwards and M. D. Lampert (eds.), 45-89. Hillsdale, NJ: Lawrence Erlbaum.
Edwards, J. A. (1993). Principles and contrasting systems of discourse transcription. In Talking Data: Transcription and Coding in Discourse Research, J. A. Edwards and M. D. Lampert (eds.), 3-31. Hillsdale, NJ: Lawrence Erlbaum.
Ford, C. E. and Thompson, S. A. (1996). Interactional units in conversation: Syntactic, intonational, and pragmatic resources for the management of turns. In Interaction and Grammar, E. Ochs, E. A. Schegloff, and S. A. Thompson (eds.), 134-184. Cambridge: Cambridge University Press.
Green, J., Franquiz, M., and Dixon, C. (1997). The myth of the objective transcript: Transcribing as a situated act. TESOL Quarterly 31 (1): 172-176.
Gregory, D., Russell, C. K., and Phillips, L. R. (1997). Beyond textual perfection: Transcribers as vulnerable persons. Qualitative Health Research 7 (2): 294-300.
Gumperz, J. J. and Berenz, N. (1993). Transcribing conversational exchanges. In Talking Data: Transcription and Coding in Discourse Research, J. A. Edwards and M. D. Lampert (eds.), 91-121. Hillsdale, NJ: Lawrence Erlbaum.
Halliday, M. A. K. (1967). Intonation and Grammar in British English. The Hague: Mouton.
Kreckel, M. (1981). Communicative Acts and Shared Knowledge in Natural Discourse. London: Academic Press.
Lapadat, J. C. (2000). Problematizing transcription: Purpose, paradigm and quality. International Journal of Social Research Methodology 3 (3): 203-219.
Lapadat, J. C. and Lindsay, A. C. (1998). Examining transcription: A theory-laden methodology. Paper presented at the Annual Meeting of the American Educational Research Association, San Diego, CA, 13-17 April (Eric Document Number ED 419 821).
Lapadat, J. C. and Lindsay, A. C. (1999). Transcription in research and practice: From standardization of technique to interpretive positionings. Qualitative Inquiry 5 (1): 64-86.
Lindsay, J. and O'Connell, D. C. (1995). How do transcribers deal with audio recordings of spoken discourse? Journal of Psycholinguistic Research 24 (2): 101-115.
McLellan, E., MacQueen, K. M., and Neidig, J. L. (2003). Beyond the qualitative interview: Data preparation and transcription. Field Methods 15 (1): 63-84.
Müller, N. and Damico, J. S. (2002). A transcription toolkit: Theoretical and clinical considerations. Clinical Linguistics & Phonetics 16 (5): 299-316.
Ochs, E. (1979). Transcription as theory. In Developmental Pragmatics, E. Ochs and B. B. Schieffelin (eds.), 43-72. New York: Academic Press.
O'Connell, D. C. and Kowal, S. (1994). The transcriber as language user. In The Dynamics of Language Processes: Essays in Honor of Hans W. Dechert, B. Guillermo (ed.), 119-142. Tübingen: Gunter Narr.
O'Connell, D. C. and Kowal, S. (1999). Transcription and the issue of standardization. Journal of Psycholinguistic Research 28 (2): 103-120.
Oelschlaeger, M. L. and Thorne, J. C. (1999). Application of the correct information unit analysis to the naturally occurring conversation of a person with aphasia. Journal of Speech, Language, and Hearing Research 42: 636-648.
Park, J. S.-Y. (2002). Cognitive and interactional motivations for the intonation unit. Studies in Language 26 (3): 637-680.
Pierrehumbert, J. and Hirschberg, J. (1990). The meaning of intonational contours in the interpretation of discourse. In Intentions in Communication, P. R. Cohen, J. Morgan, and M. E. Pollack (eds.), 271-311. Cambridge, MA: MIT Press.
Roberts, F. and Robinson, J. D. (2004). Interobserver agreement on first-stage conversation analytic transcription. Human Communication Research 30 (3): 376-410.
Romero, C., O'Connell, D. C., and Kowal, S. (2002). Notation systems for transcription: An empirical investigation. Journal of Psycholinguistic Research 31 (6): 619-631.
Schuetze-Coburn, S., Shapley, M., and Weber, E. G. (1991). Units of intonation in discourse: A comparison of acoustic and auditory analyses. Language and Speech 34: 207-234.
Sealey, A. and Carter, B. (2004). Applied Linguistics as a Social Science. London: Continuum.
Tilley, S. A. (2003a). Challenging research practices: Turning a critical lens on the work of transcription. Qualitative Inquiry 9 (5): 750-773.
Tilley, S. A. (2003b). Transcription work: Learning through coparticipation in research practices. Qualitative Studies in Education 16 (6): 835-851.
Tilley, S. A. and Powick, K. D. (2002). Distanced data: Transcribing other people's research tapes. Canadian Journal of Education 27 (2/3): 291-310.
Wennerstrom, A. (2001). The Music of Everyday Speech: Prosody and Discourse Analysis. Oxford: Oxford University Press.
Wichmann, A. (2000). Intonation in Text and Discourse: Beginnings, Middles and Ends. Harlow, Essex: Pearson Education/Longman.
Yule, G. (1980). Speakers' topics and major paratones. Lingua 52: 33-47.

Juurd H. Stelma received a Ph.D. in Education from the University of Leeds in 2003 and is currently a Lecturer in TESOL in the School of Education at the University of Manchester. His main research interest is the development of methodology for exploring the dynamical nature of language in use. Address for correspondence: School of Education, Humanities Devas Street Building, University of Manchester, Oxford Road, Manchester, M13 9PL, UK <Juup.Stelma@manchester.ac.uk>.

Lynne J. Cameron is Professor of Applied Linguistics in the Centre for Language and Communication at the Open University. Her research seeks to understand how language is used in building understanding between people, particularly through metaphor. She has published widely on the use of metaphor in different settings, including the co-edited Researching and Applying Metaphor (1999, with Graham Low) and her book Metaphor in Educational Discourse (2003). Address for correspondence: Centre for Language and Communication, Faculty of Education and Language Studies, The Open University, Milton Keynes, MK7 6AA, UK <L.J.Cameron@open.ac.uk>.
