
Effects of Training on Naïve Listeners' Judgments of the Speech Intelligibility of Children With Severe-to-Profound Hearing Loss
Lee W. Ellis
Svetlana A. Beltyukova
The University of Toledo, OH

Purpose: This study examined the effects of feedback training, familiarization training, and no training on naïve listeners' word identification (WI) and magnitude estimation scaling (MES) judgments of the speech intelligibility of children with severe-to-profound hearing impairments.
Method: Depending on the training group, listeners received a pretest, an immediate posttest, and/or a delayed posttest.
Results: Results indicated that repeated exposure, with or without training, led to improved WI scores. Beyond the effects of repeated exposure, listeners' WI judgments of the intelligibility of speech significantly increased immediately after training in which listeners received feedback regarding the accuracy of their WI responses. The MES results were less straightforward: listeners in the feedback group perceived speech samples as less intelligible after the training, perceptions of speech intelligibility stayed almost the same for the familiarization training group, and participants in the control group perceived speech samples as more intelligible at the posttest. For the training groups that were not pretested, perceptions improved from the immediate to the delayed posttest.
Discussion: Results may have both theoretical and clinical significance, particularly as they relate to contrasting theories of perceptual learning and the extent to which listener characteristics may be reflected in intelligibility judgments.
KEY WORDS: speech intelligibility, listener training, children with hearing loss

Several studies have reported that professionals who work with deaf or hard-of-hearing individuals tend to find their speech more understandable compared with those who are unfamiliar with the speech of these individuals (Klimacka, Patterson, & Patterson, 2001; McGarr, 1983; Monsen, 1983). Interestingly, in one study by Monsen (1983), the difference in scores between experienced and inexperienced listeners somewhat decreased as inexperienced listeners heard more examples of hearing-impaired speech. Such a finding suggests that, at least for some individuals, simply becoming more familiar with difficult-to-understand speech may be an important factor in the individual variance noted among listeners.

Beyond Monsen's observation, however, there has been no systematic investigation of the effect of familiarization on listeners' intelligibility judgments of the speech of hearing-impaired children. The effect of familiarizing listeners with particular speakers or particular types of speakers has been examined for listeners' intelligibility judgments of other speech-impaired populations, including individuals with dysarthria (e.g., DePaul & Kent, 2000; Hustad & Cahill, 2003; Liss, Spitzer, Caviness, & Adler, 2002; Tjaden & Liss, 1995) and children with delayed speech development (Flipsen, 1995). However, not all studies have found a significant effect for familiarization (e.g., Dowden & Stoel-Gammon, 1992; Yorkston & Beukelman, 1983).
Yorkston and Beukelman (1992) emphasized that listener training is an important and underinvestigated area and called for continued research. In addition to the need for further research regarding the effect of familiarization training, these researchers also suggested examining the effect of knowledge of results (i.e., feedback) on the listener's ability to understand disordered speech. The potential clinical and theoretical relevance of such research becomes apparent when one considers the implications of finding effective listener-centered intervention strategies that could enhance the intelligibility of hearing-impaired speech to naïve listeners. Thus, listener-centered training may contribute to the overall goal of maximizing communication between people with significantly reduced intelligibility and their listeners.
Furthermore, research suggests that additional insights into the effects of training on listener behaviors during speech intelligibility tasks can be obtained by examining those behaviors in the context of perceptual learning. Improvements in perceptual judgments with practice have
been demonstrated for more than a century (Goldstone,
1998). Evidence for perceptual learning with practice
comes from research in cognitive psychology, hearing
and speech sciences, psychophysics, neuroscience, expert/
novice differences, development, computer science, and
cross-cultural differences (Goldstone, 1998).
Perceptual learning can be viewed as a change in what is perceived (Gibson & Pick, 2000). According to Gibson and Pick, two contrasting viewpoints underlie theories of perceptual learning: the enrichment view and the differentiation view. In enrichment theories, the underlying view is that perceptual learning consists of supplementing initial sensory reception by the addition of something. Depending on the particular theory, what is assumed to be added to enrich the relatively barren, initial sensory experience might include associated ideas (e.g., schemata, representations), learned responses (e.g., stimulus-response discriminations), or some form of inference. The subsequent perception is viewed as having a decreased correspondence with initial sensory experience.
In contrast, differentiation theories view the eventual perception as having an increased correspondence
with sensory experience. Gibson and Pick's (2000) ecological theory of perceptual learning is perhaps the best
known and most detailed differentiation theory, in which
perceptual learning is defined as an increased ability to extract information from the environment as a result of practice and experience. From the differentiation theory
perspective, perception begins as vague impressions,
gradually becomes differentiated into more specific percepts, and finally becomes more finely tuned to variations
in what is presented in stimulation. Perceptual learning
is viewed as a self-regulating process of detecting new
information in stimulation. Discrimination, rather than
association, is the basis of learning.
Epistemological differences between the two theories of perceptual learning naturally result in different research perspectives on the effects of training on perceptual learning. For example, cognitive scientists (e.g., Goldstone, 1998) distinguish two broad classes of training mechanisms: mechanisms driven by feedback (supervised training) and mechanisms that require no feedback and, instead, operate on the structures inherent in environmentally supplied stimuli (unsupervised training). These two classes would be predicted to have different effects on perceptual learning depending on the underlying theory. Thus, from an enrichment perspective, particularly one that is based on learning theory, supervised training would be predicted to enhance perceptual learning. From a differentiation perspective, unsupervised training would be sufficient to engender perceptual change.
In the context of the present study, the question arises as to the extent to which supervised (i.e., feedback) or unsupervised (i.e., familiarization) training may impact listeners' intelligibility judgments. A differentiation theory would suggest that, in general, increased exposure to difficult-to-understand speech, without feedback, will increase accurate interpretation of this type of speech. An enrichment perspective might argue that knowledge of results (i.e., feedback) would be one possible mechanism for improving listeners' performance on intelligibility tasks.
A variety of research procedures have been used in
studies of speech production intelligibility. For the most
part, studies of the intelligibility of hearing-impaired
speakers have required listeners to identify or rate
isolated words or sentences read or imitatively produced
by hearing-impaired speakers. Most often, adolescents
or adults with severe and profound hearing losses have
served as the speakers in these studies.
Few studies have used spontaneously produced speech as stimulus material. In fact, possible difficulty determining the speaker's intended message has been cited as a major disadvantage of using spontaneous speech samples. Attempts have been made to minimize this potential difficulty. For example, Gordon-Brannan and Hodson (2000) used continuous speech samples from children who participated in a dialogue with the examiner about action pictures and a child's book. The children responded to open-ended questions and requests for information. In this way, the known context, coupled with verification of the examiner's transcript of the child's speech by an individual familiar with the child, served to increase the degree of certainty of the child's intended message. Gordon-Brannan and Hodson found that unfamiliar listeners judging the intelligibility of these spontaneous speech samples reached interjudge agreement levels ranging from .86 to .95 and intrajudge reliability levels of .91 to 1.00.
In the available research on the intelligibility of
hearing-impaired speakers, listeners have typically been
required either to identify or rate the stimuli presented to
them. Although there is general consensus in the literature that write-down identification tasks represent
the most valid of the identification procedures (Gordon-Brannan & Hodson, 2000), there is less consensus regarding the most appropriate rating procedure to use in
studies of intelligibility.
The most frequently used rating procedure has been
interval scaling, particularly equal-appearing interval
scaling, although several researchers have commented
on the limitations of equal-appearing interval scaling as
an intelligibility measure. Listeners who have not received training to use the scales might have difficulty
interpreting what the numbers mean for individual
speakers. Also, untrained listeners might experience
difficulties using the scales for an individual speaker
(Gordon-Brannan & Hodson, 2000).
Schiavetti (1992) questioned the appropriateness of
interval scales for rating intelligibility, arguing that the
intervals between numbers are not likely to be equal along
the continuum (an assumption that is often made by some
researchers and often criticized by others). Schiavetti proposed that magnitude estimation scaling (MES) would be
a more appropriate rating procedure. Other advantages of
MES over equal-appearing interval scaling procedures
that have been cited in the literature include not constraining the listener (i.e., allowing freedom of judgments), increased sensitivity of measurement, and ease of use, among
others (Beltyukova, Fox, Stone, & Ellis, 2006; Grant &
Kinney, 1990; Sennott-Miller, Murdaugh, & Hinshaw,
1988). Meek, Sennott-Miller, and Ferketich (1992) note
the following:
Judgments and interpretations of any given stimulus and the sensations associated with it are known to vary greatly from person to person, time to time, and instrument to instrument. . . . Phenomena that have been particularly difficult to measure include those that tap subjective feelings. Complex social perceptions such as creativity, intrusiveness, and difficulty, as well as the individual's perception of physiological states . . . have proved challenging to quantify. Magnitude estimation has been established as a useful approach to deal with many measurement issues involved with the perception of these phenomena. (p. 77)


Although not explicitly mentioned, speech intelligibility falls within the class of the above-mentioned phenomena. Thus, more intelligible speech can be described
as less difficult to understand, whereas less intelligible
speech can be said to be more difficult to understand
from a listener-centered approach. Therefore, using MES
to measure perceptions of speech intelligibility (or difficulty) in this study was deemed useful and appropriate.
Our decision to use both MES and word identification (WI) as measures of speech intelligibility was also based on the premise that many factors affect intelligibility and contribute to deficits in intelligibility; hence, more than a single measure is often needed to provide the most complete representation of a speaker's speech intelligibility. As Weismer and Laures (2002) have noted, using identification methods alone may make a study of intelligibility incomplete because these methods focus primarily on the segmental aspects of the speech signal and may be less sensitive to the nonsegmental aspects (e.g., voice, prosody). Including scaling methods that are more sensitive to these aspects may better represent the contribution of these factors to a speaker's overall intelligibility.
The purpose of the present study was to examine the effects of two methods of training (familiarization and feedback) on naïve listeners' WI and MES judgments of the intelligibility of speech of children with severe-to-profound hearing impairments. We also examined the nature of individual change by type of training. The WI and MES intelligibility judgments of naïve listeners following familiarization training or response feedback training were compared to the intelligibility judgments of other naïve listeners who did not receive training.
Three major research questions were addressed:

1. Is there an effect of training on WI and/or MES (either immediate or delayed)?

2. Is one training more effective than the other?

3. Are there any patterns in individual change by type of training?

These research questions were investigated for the WI measure, keeping in mind both the enrichment and differentiation theories. From the enrichment perspective, we expected an increase in WI scores for all training groups. However, from the differentiation perspective, we expected WI scores to improve over time, regardless of the training. These hypotheses seem conflicting but are supported by the two theories of perceptual learning presented earlier. We chose to test both, given the preliminary and exploratory nature of the present study and little previous research to guide us in forming more specific predictions or hypotheses. No specific hypotheses about MES were formulated either. Because WI and MES may not reveal the same aspects of intelligibility, results obtained with the two measures may differ for various reasons. Although, logically, one might expect similar changes in both measures, the limited literature related to MES as an intelligibility measure makes specific hypothesis formulation especially difficult at this point.

Method

Participants

Fifty-five undergraduate students (M = 20;0, SD = 0;4) from introductory classes in communication disorders volunteered to participate. None of the listeners reported any significant previous experience listening to, or judging the intelligibility of, the speech of children with severe-to-profound hearing impairments. None of the listeners reported any medical or physical conditions that could interfere with their participation. All of the participants reported speaking English as their primary language, and all passed screening tests of speech, language, and hearing. Screening of speech and language was completed by a licensed speech-language pathologist based on a brief case history, and observations were made as the participant read a paragraph and conversed with the examiner. None of the participants reported any previous history of speech, language, or hearing problems, and none of them evidenced any deviations in speech or language during the reading and conversation tasks. Pure-tone audiometric screening was performed to assess each participant's hearing acuity. Pure tones were presented through earphones to both ears at 25 dB for the frequencies of 1000 Hz, 2000 Hz, and 4000 Hz. Each of the participants responded to the frequencies presented, suggesting that hearing was within normal limits bilaterally.

The 55 participants were randomly assigned to five groups, each with 11 participants. Group A received a pre-test, familiarization training, and a post-test 1 week after training (referred to as the delayed post-test). Group B received a pre-test, response feedback training, and a delayed post-test. Group C (the control group) received a pre-test, no training, and a delayed post-test. Group D received familiarization training, an immediate post-test, and a delayed post-test. Group E received response feedback training, an immediate post-test, and a delayed post-test (see Table 1 for the details of the data collection). The training sessions were scheduled for different groups at different times, based on their availability.

Preparation of Stimulus Materials

Eight digitally recorded speech samples (also referred to as test samples) served as stimulus materials in pre- and post-testing. The speech samples were elicited from 4 boys and 4 girls (M = 9;4 [years;months], SD = 1;0) who were oral communicators with severe-to-profound hearing loss (these children had all been diagnosed as severely or profoundly hearing impaired prior to age 2). At the time of the sampling, the children were enrolled at a summer camp for hearing-impaired children, and all were enrolled in speech and language therapy at the camp.

A narrative sample of approximately the first 50 words of a narrative production (story retelling) was elicited from each child in a quiet room, yielding a total of eight samples. These were digitally recorded using a Marantz portable minidisk recorder (Model PMD 650; Marantz, Mahwah, NJ) with a lapel microphone. The samples had a mean duration of 60 s and ranged in duration from 46 s to 69 s.

The eight test samples were selected from a larger pool of samples from 12 children who were deaf or hard of hearing in order to represent a range of intelligibility, on the one hand, and to keep the measure short, on the other. In selecting the eight samples, procedures similar to those described by Gordon-Brannan and Hodson (2000) were followed. Two female graduate students in speech-language pathology orthographically transcribed the samples of the children. The transcripts were then compared to a scoring key developed for each child by an individual who was familiar with the child and who had elicited the actual sample from the child (Ellis & Pakulski, 2003). After the percentage of words correctly identified by each graduate student for each child was calculated, the scores of 4 children appeared to be very similar to the scores for other children, and these 4 samples were eliminated. The elimination procedure involved assigning numbers to similarly scored samples and asking one of the graduate students to choose one number from each pair of the numbers presented. The samples with the chosen numbers were excluded from use in the study. This left eight samples with intelligibility scores ranging from 52% to 100% correct.

Each of the eight samples was analyzed further to determine speech characteristics that may contribute to listeners' perceptions of intelligibility. Percentage of consonants correct (PCC) and the presence or absence of deviations in voice, resonance, and prosody were identified. Using the Rasch (1960, 1980) analysis, the samples characterized by higher PCC scores and an absence of other speech deviations were shown to be easier for listeners to understand compared with those that had lower PCC scores and different types of speech deviations. The Rasch analysis also revealed that the range of intelligibility was sufficient for the sample used in the study.

Procedures and Listener Tasks

The testing procedures were adapted from a previous study (Ellis & Fucci, 1992) and were identical for all participants. A Marantz portable minidisk recorder (Model PMD 650; Marantz, Mahwah, NJ) was used to deliver all of the stimuli.

Table 1. Data collection matrix.

Group   Pre-test   Training          Immediate post-test   Delayed post-test
A       O1         Familiarization   ___                   O2
B       O3         Feedback          ___                   O4
C       O5         ___               ___                   O6
D       ___        Familiarization   O7                    O8
E       ___        Feedback          O9                    O10

Stimuli were presented at the listeners' comfort level (approximately 65 dB SPL) through matched and calibrated supra-aural earphones (Audio-Technica ATH-908; Stow, OH). The participants were tested individually while seated in a soundproof booth. Each participant provided written WI responses and MES responses for each of the eight speech samples in the same session.
The participants were read the following instructions: "You will be listening to recordings of children telling stories. You will hear the recording of each child three times. As you hear each recording, write down all the words that you understand. After you listen to each child and write down what you think he said, I will ask you to give me a number that goes with how understandable the speech was for you. You can use any numbers that you wish, including whole numbers, decimals, and fractions. Give higher numbers for speech that is less understandable; give lower numbers for speech that is more understandable."
This chosen directionality of the numeric ratings of understandability is consistent with the tradition in measurement whereby lower numbers are assigned to less difficult (here, more understandable) stimuli, whereas higher numbers are associated with more difficult (here, less understandable) stimuli. In addition, because listeners were rating speech that was primarily less than
100% intelligible, having the scale go in this direction
allowed them to more easily scale the degree of reduced
intelligibility. To make sure that the instructions were
understood and would be properly followed, after hearing and reading these instructions, listeners wrote down
what they understood their tasks to be. Two of the participants demonstrated a misunderstanding of the directions; however, after re-reading the instructions, both
participants demonstrated correct understanding by accurately paraphrasing the instructions.
For each listener, the speech samples were presented
in a different randomized order. To minimize possible bias
in the data, no reference standard was used (Hellman &
Zwislocki, 1963). Each session lasted approximately 1 hr.
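To make the per-listener randomization of presentation order concrete, a minimal sketch follows; the sample labels, the seeding scheme, and the use of Python are illustrative assumptions rather than a description of the authors' actual procedure.

```python
import random

# Hypothetical labels for the eight test samples (illustrative only).
SAMPLES = [f"child_{i}" for i in range(1, 9)]

def presentation_order(listener_id: int, seed: int = 2008) -> list:
    """Return an independently shuffled sample order for one listener."""
    rng = random.Random(seed * 1000 + listener_id)  # reproducible per listener
    order = list(SAMPLES)
    rng.shuffle(order)
    return order

# A separately randomized order for each of the 55 listeners.
orders = {listener: presentation_order(listener) for listener in range(1, 56)}
```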
As mentioned previously (see also Table 1), participants in Groups A, B, and C completed these testing procedures initially as a pre-test. Within 1 week of this initial testing, participants in Group A and Group B received training. Familiarization training for participants in Group A consisted of listening to the training
samples while following along with a printed transcript
of what each child was attempting to say. Response
feedback training for participants in Group B involved
listening to the training samples, writing down the words
that were understood, and receiving feedback about
which words were identified correctly. Feedback consisted of providing the participants with their transcripts
on which correctly identified words were highlighted and
informing the participants that these were words they
had correctly identified. Approximately 1 week after the
training was completed, participants in Groups A, B, and C were tested again under conditions identical to
those during the initial testing, except that each participant responded to a different randomized sequence of
the samples.
Participants in Group D received the same familiarization training as Group A, and participants in Group E
received the same feedback training as Group B; however,
participants in these two groups did not undergo pretesting, and their post-testing occurred immediately after
the training was completed (immediate post-testing)
and again 1 week later (delayed post-testing).
As part of the training, Groups A, B, D, and E listened to a different set of eight samples three times each. These training samples were produced by the same children who produced the eight test samples. Each training sample consisted of the first 25 words of each child's retelling of a different story. Training sessions for
both the familiarization and the feedback conditions
lasted 60 min.
Several factors were considered in determining the
training parameters for this study, particularly the duration of training and the time intervals between testing
and training sessions. Available research on the effects
of familiarization training typically used a rather brief
period of training within a single session (e.g., Tjaden &
Liss, 1995). To control for time on task as a variable separate from type of training, we decided to keep the training relatively brief and limited to a single session for this
preliminary investigation. The decision to limit the period
of training to no more than 60 min was also based on
the researchers' desire to limit any influence of listener
fatigue and reduced motivation.
The decision to use two different time intervals between test periods was based on the need to capture the
possible immediate effects of training and maintenance
of change or other effects of training over time. Scheduling no more than a 1-week period between the training
and post-testing was also influenced by the availability of
participants and a concern about possible attrition if a
longer time period was used.


Previous literature related to familiarization training (e.g., Tjaden & Liss, 1995; Yorkston & Beukelman,
1983) was helpful in guiding our selection of training
materials for the familiarization and feedback conditions as well as procedures for the familiarization condition. Regarding specific procedures developed for the
feedback condition, our choices were motivated primarily by the general literature related to the use of feedback
during skill development, which tends to emphasize
the importance of prompt and reasonably contingent
feedback.
Participants' transcripts of the test samples (pre- and/or post-training) were scored to determine the number of correctly identified words. All correctly identified words (i.e., content or function words) were equally weighted in the scoring. Two female graduate students in speech-language pathology independently scored the transcripts using a scoring key. The scoring key consisted of the transcripts of the eight samples that represented each child's intended words as determined by the consensus between an individual familiar with the children (this individual had elicited the original samples) and the parent of each child.

Prior to scoring the samples, the two graduate students were trained in the scoring procedure by the first author, and both students achieved at least 90% agreement with the author when scoring transcripts similar to those that were used in the study. After scoring the actual samples, the two graduate students conferred and resolved any discrepancies in scoring by consensus.
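As an illustration of the word-level scoring just described, the sketch below computes a percent-words-correct score for one listener transcript against a scoring key. The tokenization and bag-of-words matching rule are assumptions made for demonstration; the article specifies only that content and function words were weighted equally, not the mechanical matching rules.

```python
from collections import Counter

def percent_words_correct(listener_transcript: str, scoring_key: str) -> float:
    """Percentage of key words credited in the listener's transcript.

    Content and function words are weighted equally, as in the study;
    the case-insensitive bag-of-words matching rule is an illustrative assumption.
    """
    heard = Counter(listener_transcript.lower().split())
    key = Counter(scoring_key.lower().split())
    credited = sum(min(heard[word], count) for word, count in key.items())
    return 100.0 * credited / sum(key.values())

# Example with a made-up key and listener response.
key = "the dog ran to the park and played with a ball"
response = "the dog ran to the park and played"
print(round(percent_words_correct(response, key), 1))  # 72.7
```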

Data Analysis
Because of the incomplete data matrix (see Table 1), the effect of training was determined by conducting two repeated measures analyses of variance (ANOVAs) on each dependent measure (i.e., WI and MES): one to test for change from pre-test to delayed post-test using the data from Groups A, B, and C, and the other to test for change from immediate post-test to delayed post-test using the data for Groups D and E and pre-test data for the control Group C.
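A minimal sketch of one such analysis is shown below, assuming the pingouin package and hypothetical column and file names for a long-format data set; the authors' statistical software is not identified in the article.

```python
import pandas as pd
import pingouin as pg

# Long-format data: one row per listener per test occasion
# (file name and column names are assumptions for illustration).
df = pd.read_csv("wi_scores_long.csv")  # columns: listener, group, time, wi

# Mixed-design ANOVA: time (pre-test vs. delayed post-test) as the
# within-subject factor and training group (A, B, C) as the
# between-subjects factor.
aov = pg.mixed_anova(data=df, dv="wi", within="time",
                     between="group", subject="listener")
print(aov[["Source", "F", "p-unc"]])
```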
The WI and MES scores used in these analyses were obtained from the Rasch analysis (Rasch, 1960, 1980), which transformed the raw percentages correct (the WI data) and the intelligibility perceptions (the MES data) into abstract, equal-interval units by log transformations of raw data odds and probabilistic equations (Bond & Fox, 2001). We chose this approach over computing total raw scores for the WI measure and over using average scores for the MES measure because we could not assume that the WI and MES scales were interval. For example, we could not assume that the difference and distance between WI scores of 50 and 55 was the same as the difference and distance between WI scores of 30 and 35. Such an assumption was even less plausible with the MES ratings, for which 33 different scales were observed, with the narrowest being 24 and the widest being 120. Rasch analysis also allowed us to accommodate the individual MES scales chosen by the listeners. This was accomplished by conducting the Partial Credit Rasch analysis (Masters, 1982). Typically, this analysis allows each item/stimulus to function on its own scale. However, in this case, persons used their own scales. Therefore, prior to the analysis, the data matrix was transposed such that the items and the listeners switched positions.
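The core log-odds idea behind this transformation can be illustrated as follows; this is only the basic logit conversion for a proportion correct, not a substitute for the full Partial Credit Rasch estimation the authors ran on the transposed data matrix.

```python
import math

def to_logit(p: float, eps: float = 1e-3) -> float:
    """Convert a proportion correct into a log-odds (logit) value.

    Raw proportions are not interval-scaled; logits place them on an
    approximately equal-interval continuum, which is the metric the
    Rasch model estimates. eps keeps 0% and 100% scores finite.
    """
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

# Equal raw differences do not correspond to equal logit differences,
# which is the argument against treating WI percentages as interval data:
print(round(to_logit(0.55) - to_logit(0.50), 3))  # 0.201
print(round(to_logit(0.35) - to_logit(0.30), 3))  # 0.228
```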
Multiple comparisons (t tests) were also explored. To keep the family-wise alpha level under control across the four tests, a Bonferroni adjustment was applied, yielding a new alpha level of .0125 that was used for all of these tests. Individual change was analyzed using the Rasch Differential Person Functioning procedure by building a 95% confidence band around each individual person's WI or MES estimate and examining the overlap in the confidence bands from assessment 1 to assessment 2. No overlap was interpreted as evidence of significant individual change.
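The non-overlap criterion for individual change can be sketched as follows, assuming that each person estimate comes with a standard error from the Rasch output; the function and the example values are illustrative.

```python
def significant_individual_change(est1: float, se1: float,
                                  est2: float, se2: float,
                                  z: float = 1.96) -> bool:
    """True if the 95% confidence bands around the two person estimates
    (assessment 1 vs. assessment 2) do not overlap."""
    lo1, hi1 = est1 - z * se1, est1 + z * se1
    lo2, hi2 = est2 - z * se2, est2 + z * se2
    return hi1 < lo2 or hi2 < lo1  # no overlap in either direction

# Example with made-up person measures (logits) and standard errors.
print(significant_individual_change(0.10, 0.15, 0.85, 0.18))  # True
print(significant_individual_change(0.10, 0.15, 0.40, 0.18))  # False
```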

Results
Table 2 summarizes descriptive statistics for the sample on both measures of speech intelligibility. For convenience of interpretation, the MES scores were reported on the same 0-100 scale as the WI scores. However, lower MES scores indicate more intelligible speech, whereas higher MES scores indicate less intelligible speech.

WI Results
The WI results of the repeated measures ANOVA for Groups A, B, and C showed a significant main effect for time, F(1, 30) = 83.64, p < .001. This means that regardless of the type of training, there was a statistically significant increase in WI scores from the average of 55% correct at the pre-test to the average of 58% correct at the delayed post-test (Cohen's d = 0.75). Neither the effect for the type of training nor the Training × Time interaction effect was significant. The three groups were statistically equivalent at the pre-test, F(2, 30) = 1.05, p > .05.
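For readers who wish to check the effect-size arithmetic against the raw means in Table 2, a simple sketch follows. The article does not state which formulation of Cohen's d was used (and the underlying analysis was run on Rasch-transformed scores), so the pooled-standard-deviation version below is an assumption for illustration only.

```python
import statistics

def cohens_d(mean_pre: float, sd_pre: float,
             mean_post: float, sd_post: float) -> float:
    """Standardized mean difference using a simple pooled SD (assumed form)."""
    pooled_sd = ((sd_pre ** 2 + sd_post ** 2) / 2) ** 0.5
    return (mean_post - mean_pre) / pooled_sd

# Group means and SDs from Table 2 (Groups A, B, C), pre-test vs. delayed post-test.
pre_means, pre_sds = [55.3, 53.8, 56.2], [3.7, 4.5, 3.6]
post_means, post_sds = [57.6, 57.1, 58.3], [2.6, 3.8, 2.3]

d = cohens_d(statistics.mean(pre_means), statistics.mean(pre_sds),
             statistics.mean(post_means), statistics.mean(post_sds))
print(round(d, 2))  # 0.74, close to the reported d = 0.75
```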
Similar WI results were obtained for Groups C, D, and E (i.e., comparing immediate and delayed post-test data). Although we did not collect the immediate post-test data for Group C, it was established that Group C at the pre-test was not statistically different from Groups D and E at the immediate post-test, F(2, 30) = 2.61, p > .05, and therefore we were able to use Group C pre-test data for this comparison.

Table 2. Mean scores and standard deviations for the Word Identification (WI) and Magnitude Estimation Scaling (MES) measures of speech intelligibility as a function of training and time. Values are M (SD); n = 11 per group.

                                        WI percentage correct                       MES scores
Type of Training                  Pre-test     Immediate    Delayed       Pre-test     Immediate    Delayed
                                               post-test    post-test                  post-test    post-test
Familiarization Training: Group A 55.3 (3.7)   ___          57.6 (2.6)    54.3 (3.2)   ___          53.8 (4.9)
Feedback Training: Group B        53.8 (4.5)   ___          57.1 (3.8)    53.7 (3.7)   ___          55.9 (4.3)
No Training: Group C              56.2 (3.6)   ___          58.3 (2.3)    56.5 (1.8)   ___          53.3 (2.0)
Familiarization Training: Group D ___          57.2 (3.1)   58.7 (2.5)    ___          54.7 (2.7)   52.7 (4.9)
Feedback Training: Group E        ___          58.8 (2.5)   60.0 (2.0)    ___          55.0 (2.3)   54.5 (2.3)

The results of the repeated measures ANOVA for Groups C, D, and E revealed a highly significant main effect for time, F(1, 30) = 46.57, p < .001. This means that regardless of the type of training, there was a statistically significant increase in WI scores from the average of 57% correct at the immediate post-test to the average of 59% correct at the delayed post-test (Cohen's d = 0.57). There was no significant effect for the type of training or the Training × Time interaction.

The individual change analysis and a series of multiple comparisons shed some additional light on the trends in the WI data. The individual change analysis revealed that in the familiarization groups, an average of 6 listeners (5 from Group A and 7 from Group D) showed significant individual growth. The average individual change for the feedback groups was 7 listeners (10 in Group B and 4 in Group E). Finally, 7 listeners improved their WI scores in the control Group C. Overall, 36% of the participants improved on the WI measure, which is consistent with the ANOVA finding of a significant improvement over time.
The results of the multiple comparisons tests (see Table 3) showed that the change in WI scores was significant when pre-test scores of familiarization Group A were compared to delayed post-test scores for the familiarization Groups A and D. When feedback Groups B and E were compared, the change in WI scores was significant from pre-test to immediate post-test and to delayed post-test. In other words, familiarization training Groups A and D showed an increase in WI over time but no immediate increase; the feedback training Groups B and E demonstrated improvement immediately and 1 week following training.

MES Results
The MES results of the repeated measures ANOVA for Groups A, B, and C showed a significant Time of Testing × Type of Training interaction, F(2, 30) = 4.87, p < .05. This means that, depending on the time of assessment, the perceptions of speech intelligibility differed across groups (see Figure 1). As seen in Figure 1, the MES score went up for the feedback Group B, stayed almost the same for the familiarization training Group A, and dropped for the control Group C. The three groups did not differ in their MES scores at the pre-test, F(2, 30) = 2.92, p > .05.
The repeated measures ANOVA results for the MES data for Groups C, D, and E (i.e., comparing immediate and delayed post-test data) were similar to the WI results. As with the WI data, we did not collect the immediate post-test MES data for Group C. However, it was established that Group C at the pre-test was not statistically different from Groups D and E at the immediate post-test, F(2, 30) = 1.97, p > .05, and therefore we were able to use Group C pre-test MES data for this comparison.
Table 3. Summary of Scheffé multiple comparisons of WI data.

Comparison                                            Mean difference     SE       p
Group A pre-test and Group A delayed post-test             2.32           1.96    .003
Group A pre-test and Group D immediate post-test           1.85           1.46    .220
Group A pre-test and Group D delayed post-test             3.38           1.36    .022
Group B pre-test and Group B delayed post-test             3.31           1.29    .000
Group B pre-test and Group E immediate post-test           5.11           1.55    .004
Group B pre-test and Group E delayed post-test             6.21           1.49    .000

Note. WI = Word Identification.


Figure 1. Change in MES judgments depending on time of assessment and type of training.

The results of the repeated measures ANOVA for Groups C, D, and E revealed a significant main effect for time, F(1, 30) = 12.07, p < .01. This means that regardless of the type of training, there was a statistically significant change in MES scores from the average of 55.4 at the immediate post-test to the average of 54.5 at the delayed post-test (Cohen's d = 0.67). There was no significant effect for the type of training or the Training × Time interaction. A series of multiple comparisons of the MES data showed no significant group differences in MES scores, and no significant individual change in perceptions was noted.

Discussion
This study attempted to answer three major research questions: whether there was an effect of training (either immediate or 1 week after the training) on listeners' WI performance and their perceptions of speech intelligibility, whether either of the two types of training (familiarization and feedback) was more effective than the other, and whether there were any patterns of individual change specific to the type of training. Overall, the study found no significant training effect and no training-specific patterns of individual change in WI performance and perceptions of speech intelligibility. However, several interesting findings merit discussion.
The results of this study indicate that overall, for the sample of listeners in this study, repeated exposure, with or without training, leads to improved WI scores and perception scores, as supported by both the group change and individual change analyses. Such an outcome would be supported by a differentiation theory of perceptual learning. The standardized effect sizes of these improvements ranged from medium to high, which is not surprising, given the small standard deviations. However, when these improvements are interpreted using a simple effect size (i.e., the raw difference between the means) and in the context of the location of the listeners on the scale of measurement, the clinical significance is limited. It appears that regardless of the training and exposure, listeners' WI scores and perceptions of speech intelligibility significantly improved but still remained in the range of 55% to 58%. Therefore, from the perspective of clinical significance or social validity of results, it might be argued that the discrepancy in results for the two measures or across groups on the same measure is more apparent than real.
The improvement itself was neither surprising nor very different from similar findings in the literature (e.g., Monsen, 1983; Peng, Spencer, & Tomblin, 2004), although procedures to control for the differential effect of repeated exposure were included in this study by providing all listeners with three consecutive exposures to each sample during each test occasion. The decision to use this common procedure was based on the available research (e.g., Epstein, Giolas, & Owens, 1968) suggesting that more than three exposures to single-word stimuli do not typically result in significant changes in listeners' WI responses. It is possible that the same procedure may not function in the same way when applied to the narrative speech samples used as stimuli in this study. Further research is warranted to determine the number of trials needed for narrative samples in order to control for the practice effect resulting simply from repeated exposure to the samples and/or increased familiarity with the judgment task itself.
Beyond the effects of repeated exposure, the study found that WI scores increased significantly immediately after training in which the listeners received feedback regarding the accuracy of their WI responses. This seems to support our earlier argument concerning the two broad classes of training mechanisms distinguished by cognitive scientists (Goldstone, 1998): supervised training driven by feedback and unsupervised training that requires no feedback. Supervised training, in the form of listeners receiving feedback about the accuracy of their responses, significantly impacted listeners' WI judgments. On the other hand, unsupervised training in the form of familiarization training (as well as simple repeated exposure of the listeners to the samples) operated on structures inherent in the stimuli themselves and resulted in the listeners being more finely tuned to (and better able to discriminate) what was presented to them.
These findings also suggest an interesting relationship between the WI and MES measures. The relationship appears to be positive when no training or no pre-testing is used. The results show that for the control group and for the groups receiving training followed by immediate and delayed post-tests (i.e., without pre-testing), changes in perceptions of intelligibility, as reflected in MES scores, paralleled changes in WI scores. In other words, increased WI scores were associated with the perception of increased intelligibility of the speech samples by participants in these groups. By contrast, for groups that received training and were assessed by pre-testing and delayed post-testing, the MES results varied in unexpected ways. Rather than changing in parallel with the increase in correct word identifications, the MES scores for the familiarization training group stayed about the same from pre-test to delayed post-test, indicating no significant change in the perceived intelligibility of the samples for these listeners. More surprisingly, the MES scores of the feedback training group indicated that perceived intelligibility of the samples decreased from pre-test to delayed post-test for these listeners. Apparently, not only the frequency of exposure but also the type of supervised training and the time between exposures to the samples may influence the intelligibility judgments of the naïve listeners in this study. More research is needed to clarify how perceptions of speech intelligibility are affected by the type of training, testing, and time. The number of participants and the duration of training may need to be increased in future studies to better test the trends identified in the present study. Additionally, immediate post-testing of the control group needs to be built into the study design to avoid some of the extrapolation procedures used in the current investigation.
Analysis of individual change in this study supported a common finding reported by other researchers, namely, the considerable variability in ability levels among listeners. For example, in a study examining the effects of listeners' experience on WI for speech samples of deaf or hard-of-hearing children, Klimacka et al. (2001) found that although a group of experienced listeners scored significantly higher than a group of inexperienced listeners, several of the inexperienced individuals actually performed more proficiently than some of the experienced judges.

In the present study, using the criterion of a 2-SD difference between the scores on assessment 1 (i.e., pre-test for Groups A, B, and C and immediate post-test for Groups D and E) and assessment 2 (delayed post-test for all five groups) as an indicator of significant change, it is clear that individual ability levels varied noticeably in most groups, with the notable exception of the feedback training Group B, which was pre-tested and received a delayed post-test. In this group, all but 1 of the participants demonstrated significant change following training, in contrast to the other groups, in which the number of listeners showing significant change ranged between 4 and 7. One potential benefit of this type of individual change analysis is to inform or facilitate further investigation of the individual characteristics of listeners that may help to distinguish more-proficient from less-proficient listeners. For example, knowing which listeners significantly improved and which did not, we could examine their error patterns. Once identified, these patterns can be used to understand the strategies used by more-proficient listeners. These strategies can then be implemented in training less-proficient listeners to increase their accuracy in WI of the speech of hearing-impaired speakers.
In conclusion, the results of this study have perhaps raised as many questions as they have helped to answer. Although feedback training significantly increased listeners' WI scores, the type of familiarization training provided to listeners did not affect overall group performance to any greater degree than providing listeners with repeated opportunities to respond to the speech of hard-of-hearing or deaf children. Nevertheless, consistent with findings reported in a number of other studies involving listeners' judgments of intelligibility, the results of the present investigation revealed considerable individual variability in listeners' responses. Further research is needed to enhance our understanding of listeners' performance on intelligibility tasks in general, and on tasks involving the speech of hard-of-hearing or deaf talkers in particular.

References

Beltyukova, S. A., Fox, C. M., Stone, G. E., & Ellis, L. (2006, April). Understanding the utility of magnitude estimation scaling. Paper presented at the meeting of the American Educational Research Association, San Francisco, CA.

Bond, T. G., & Fox, C. M. (2001). Applying the Rasch model: Fundamental measurement in the human sciences. Mahwah, NJ: Erlbaum.

DePaul, R., & Kent, R. (2000). A longitudinal case study of ALS: Effects of listener familiarity and proficiency on intelligibility judgments. American Journal of Speech-Language Pathology, 9(3), 230-240.

Dowden, P. A., & Stoel-Gammon, C. (1992). The effects of listener training on the speech intelligibility of severely dysarthric individuals. Seattle, WA: University of Washington.

Ellis, L. W., & Fucci, D. J. (1992). Effects of listeners' experience on two measures of intelligibility. Journal of Perceptual and Motor Skills, 74, 1099-1104.

Ellis, L., & Pakulski, L. (2003). Judgments of speech intelligibility and speech annoyance by mothers of children who are hard of hearing or deaf. Journal of Perceptual and Motor Skills, 96, 324-328.

Epstein, A., Giolas, T. G., & Owens, E. (1968). Familiarity and intelligibility of monosyllabic word lists. Journal of Speech and Hearing Research, 11, 435-438.

Flipsen, J. (1995). Speaker-listener familiarity: Parents as judges of delayed speech intelligibility. Journal of Communication Disorders, 28, 3-19.

Gibson, E. J., & Pick, A. D. (2000). An ecological approach to perceptual learning and development. Oxford, England: Oxford University Press.

Goldstone, R. L. (1998). Perceptual learning. Annual Review of Psychology, 49, 585-612.

Gordon-Brannan, M., & Hodson, B. (2000). Intelligibility/severity measures of prekindergarten children's speech. American Journal of Speech-Language Pathology, 9(2), 141-150.

Grant, J. S., & Kinney, M. (1990). Altered level of consciousness: Validity of a nursing diagnosis. Research in Nursing & Health, 13, 403-410.

Hellman, R., & Zwislocki, J. (1963). Monaural loudness function at 1000 cps and interaural summation. The Journal of the Acoustical Society of America, 35, 856-865.

Hustad, K. C., & Cahill, M. A. (2003). Effects of presentation mode and repeated familiarization on intelligibility of dysarthric speech. American Journal of Speech-Language Pathology, 12(2), 198-208.

Klimacka, L., Patterson, A., & Patterson, R. (2001). Listening to deaf speech: Does experience count? International Journal of Language and Communication Disorders, 36, 210-215.

Liss, J. M., Spitzer, S. M., Caviness, J. N., & Adler, C. (2002). The effects of familiarization on intelligibility and lexical segmentation in hypokinetic and ataxic dysarthria. The Journal of the Acoustical Society of America, 112, 3022-3030.

Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149-174.

McGarr, N. (1983). The intelligibility of deaf speech to experienced and inexperienced listeners. Journal of Speech and Hearing Research, 26, 451-458.

Meek, P. M., Sennott-Miller, L., & Ferketich, S. L. (1992). Focus on psychometrics: Scaling stimuli with magnitude estimation. Research in Nursing & Health, 15, 77-81.

Monsen, R. B. (1983). The oral speech intelligibility of hearing-impaired talkers. Journal of Speech and Hearing Disorders, 48, 286-296.

Peng, S., Spencer, L. J., & Tomblin, J. B. (2004). Speech intelligibility of pediatric cochlear implant recipients with seven years of device experience. Journal of Speech, Language, and Hearing Research, 47, 1227-1236.

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Danmarks Paedagogiske Institut.

Rasch, G. (1980). Probabilistic models for some intelligence and attainment tests (Expanded ed.). Chicago: University of Chicago Press.

Schiavetti, N. (1992). Scaling procedures for the measurement of speech intelligibility. In R. D. Kent (Ed.), Intelligibility in speech disorders (pp. 11-34). Amsterdam: John Benjamins.

Sennott-Miller, L., Murdaugh, C., & Hinshaw, A. S. (1988). Magnitude estimation: Issues and practical applications. Western Journal of Nursing Research, 10(4), 414-424.

Tjaden, K., & Liss, E. (1995). The influence of familiarity on judgments of treated speech. American Journal of Speech-Language Pathology, 4(1), 39-48.

Weismer, G., & Laures, J. (2002). Direct magnitude estimates of speech intelligibility in dysarthria: Effects of a chosen standard. Journal of Speech, Language, and Hearing Research, 45, 421-433.

Yorkston, K. M., & Beukelman, D. R. (1983). The influence of judge familiarization with the speaker on dysarthric speech intelligibility. In W. Berry (Ed.), Clinical dysarthria (pp. 155-163). San Diego: College Hill Press.

Yorkston, K. M., & Beukelman, D. R. (1992). Intelligibility measurement as a tool in the clinical management of dysarthric speakers. In R. D. Kent (Ed.), Intelligibility in speech disorders (pp. 265-286). Amsterdam: John Benjamins.

Received November 27, 2006
Revision received June 5, 2007
Accepted November 29, 2007
DOI: 10.1044/1092-4388(2008/06-0217)

Contact author: Lee W. Ellis, Department of Public Health and Rehabilitative Services, The University of Toledo, 2801 West Bancroft Street, Toledo, OH 43606. E-mail: lellis@utnet.utoledo.edu
