and Older Adults

Mitchell S. Sommers, Nancy Tye-Murray, and Brent Spehar

Objective: The purpose of the present study was to examine the effects of age on the ability to benefit from combining auditory and visual speech information, relative to listening or speechreading alone. In addition, the study was designed to compare visual enhancement (VE) and auditory enhancement (AE) for consonants, words, and sentences in older and younger adults.

Design: Forty-four older adults and 38 younger adults with clinically normal thresholds for frequencies of 4 kHz and below were asked to identify vowel-consonant-vowels (VCVs), words in a carrier phrase, and semantically meaningful sentences in auditory-only (A), visual-only (V), and auditory-visual (AV) conditions. All stimuli were presented in a background of 20-talker babble, and signal-to-babble ratios were set individually for each participant and each stimulus type to produce approximately 50% correct in the A condition.

Results: For all three types of stimuli, older and younger adults obtained similar scores in the A condition, indicating that the procedure for individually adjusting signal-to-babble ratios was successful at equating A scores for the two age groups. Older adults, however, had significantly poorer performance than younger adults in the AV and V modalities. Analyses of both AE and VE indicated no age differences in the ability to benefit from combining auditory and visual speech signals after controlling for age differences in the V condition. Correlations between scores for the three types of stimuli (consonants, words, and sentences) indicated moderate correlations in the V condition but small correlations for AV, AE, and VE.

Conclusions: Overall, the findings suggest that the poorer performance of older adults in the AV condition was a result of reduced speechreading abilities rather than a consequence of impaired integration capacities.
The pattern of correlations across the three stimulus types indicates some overlap in the mechanisms mediating AV perception of words and sentences and that these mechanisms are largely independent from those used for AV perception of consonants. (Ear & Hearing 2005;26:263–275)

Considerable evidence is now available to suggest that speech intelligibility improves when listeners can both see and hear a talker, compared with listening alone (Grant, Walden, & Seitz, 1998; Sumby & Pollack, 1954). Moreover, the benefits of combining auditory and visual speech information increase with the difficulty of auditory-only (A) perception (Sumby & Pollack, 1954). This increased performance for auditory-visual (AV) compared with A or visual-only (V) presentations is at least partially a result of complementary information available in the auditory and visual speech signals (Grant & Seitz, 1998; Grant et al., 1998; Summerfield, 1987). Thus, for example, if a listener is unable to perceive the acoustic cues for place of articulation, accurate intelligibility can be maintained if the speaker is visible because speechreading provides an additional opportunity to extract place information.

Grant et al. (1998) proposed a conceptual framework for understanding the improved performance for AV presentations, compared with either unimodal format (A or V), in which both peripheral and central mechanisms contribute to an individual's ability to benefit from combining auditory and visual speech information. In the initial step of the model, peripheral sensory systems (audition and vision) are responsible for extracting signal-related segmental and suprasegmental phonetic cues independently from the auditory and visual speech signals. These cues are then integrated and serve as input to more central mechanisms that incorporate semantic and syntactic information to arrive at phonetic and lexical decisions.
This model is particularly useful for comparing the benefits of combining auditory and visual speech information across different populations because it highlights that differential benefits can result from changes in peripheral, central, or a combination of peripheral and central abilities.

Department of Psychology (MSS), Washington University, St. Louis, Missouri; and Central Institute for the Deaf (N.T. and B.S.), St. Louis, Missouri.

0196/0202/05/2603-0263/0 • Ear & Hearing • Copyright 2005 by Lippincott Williams & Wilkins • Printed in the U.S.A.

One of the difficulties in comparing the benefits obtained from combining auditory and visual speech information across different populations, however, is that performance for unimodal presentations is often different across the populations of interest. In the present study, for example, we wanted to investigate the effects of age on the ability to combine auditory and visual speech information. However, this effort is complicated by well-documented declines in both A and V speech perception as a function of age (CHABA, 1988; Dancer, Krain, Thompson, Davis, & Glenn, 1994; Honneil, Dancer, & Gentry, 1991; Lyxell & Rönnberg, 1991; Middelweerd & Plomp, 1987; Shoop & Binnie, 1979). Thus, evidence for age differences in the ability to benefit from combining auditory and visual speech signals could be attributed to age-related differences in auditory sensitivity, age-related differences in speechreading, age differences in integrating auditory and visual speech information, or some combination of these factors.

Although a few studies comparing AV scores in older and younger adults have been designed to minimize the influence of unimodal performance differences (Cienkowski & Carney, 2002; Walden et al., 1993), other methodological concerns make their findings difficult to interpret. In one investigation, for example, Walden et al.
(1993) compared differences between AV, V, and A performance for consonant-vowels (CVs) and sentences in middle-aged (35 to 50 years of age) and older (65 to 80 years of age) adults. For the sentences (but not for the CVs), testing was conducted in the presence of speech-shaped noise, and noise levels were adjusted to obtain approximately 40 to 50% correct in the A condition. Visual enhancement for sentences, defined as the difference between the AV and A conditions, did not differ between the two groups. V scores for sentences, however, were significantly lower for the older (16.7%) than for the middle-aged adults (34.4%), preventing an analysis of age differences under conditions of equivalent unimodal performance. Moreover, scores in the AV condition for sentences were near ceiling for both groups (93.8 and 92% for the middle-aged and older adults, respectively), making it difficult to interpret the null effects of age.

Cienkowski and Carney (2002) minimized the contribution of presbycusis in an AV perception task by comparing normal-hearing younger and older adults on susceptibility to the McGurk effect (McGurk & MacDonald, 1976). In the McGurk effect, participants are presented with discrepant auditory and visual information and often perceive a fused response that differs from both inputs. For example, participants might be presented with an auditory VCV containing a medial bilabial stop (e.g., /aba/) while simultaneously viewing a face articulating a VCV with a medial velar stop (e.g., /aga/). On a certain percentage of trials, listeners will report hearing a VCV with a medial alveolar place of articulation (e.g., /ada/ or /aða/) that represents a fusion of the auditory and visual inputs. Cienkowski and Carney found that the percentage of fused responses did not differ between normal-hearing older and younger adults and suggested that this finding argued against age-related declines in auditory-visual integration.
The results of the Cienkowski and Carney study provide indirect evidence for similar enhancement from combining auditory and visual speech information as a function of age, but because their study was not designed to assess enhancement specifically, they did not include measures of unimodal performance. Consequently, it is not clear whether the comparable susceptibility to the McGurk effect in the two age groups was associated with equivalent A and V abilities.

One purpose of the present study, therefore, was to examine the effects of age on the ability to benefit from combining auditory and visual speech information after minimizing differences in unimodal performance. Toward this end, we tested normal-hearing younger and older adults under conditions that produced similar levels of A performance and that also avoided ceiling effects in the AV condition. In addition, we computed both visual enhancement (VE, the benefit obtained from adding a visual signal to an auditory stimulus) and auditory enhancement (AE, the benefit obtained from adding an auditory signal to a visual-only stimulus) as a means of examining age differences in the ability to combine auditory and visual speech information after normalizing for any differences in unimodal performance (see Methods section for additional details on computing these two measures).

A second goal of the study was to investigate age differences in AE and VE for consonants, words, and sentences. In general, only small to moderate correlations have been observed between VE for stimuli differing in semantic context (e.g., nonsense syllables, isolated words, and meaningful sentences), suggesting that the mechanisms mediating enhancement may differ as a function of the amount of semantic or lexical information available in the stimulus (Grant & Seitz, 1998; Grant et al., 1998).
Furthermore, to our knowledge, there have been no systematic investigations of the relationship between AE or VE for speech stimuli differing in semantic content, although a small number of investigations (Dekle, Fowler, & Funnell, 1992; Sams, Manninen, Surakka, Helin, & Kättö, 1998) have compared AV integration for different stimulus types using the McGurk effect. In the study most similar to the current investigation, Sams et al. (1998) found considerable variability in participants' susceptibility to the McGurk effect in Finnish as a function of semantic context (consonants, isolated words, words in sentences). Thus, similar to results with A identification (Pichora-Fuller, Schneider, & Daneman, 1995; Sommers & Danielson, 1999) and measures of auditory-visual integration with the McGurk effect (Sams et al., 1998), the effect of age on VE and AE may differ as a function of stimulus type. The current study was therefore designed to examine whether any observed age effects on VE and AE would be modulated by the amount of lexical and semantic information available.

METHODS

Participants

Thirty-eight younger adults (mean age 20.1 years, SD = 2.1) and 44 older adults (mean age 70.2 years, SD = 6.8) served as participants. Younger adults were all students at Washington University and were recruited through posted advertisements. Older adults were all community-dwelling residents and were recruited through a database maintained by the Aging and Development Program at Washington University. Testing required three 2.5-hour sessions that were conducted on separate days. All participants reported that English was their first language and that they had never had any lipreading training. Participants were paid $10/hour for taking part in the experiments.
All participants were screened for CNS dysfunction using an extensive medical history questionnaire that asked about significant CNS events, including stroke, open or closed head injury, concussions, any event in which the participant was rendered unconscious, dizziness, and current medications. In addition, participants were asked about conditions for which they were currently being treated. Any participant with a history of CNS disorders, who was currently being treated for a CNS condition, or who was currently taking drugs that affect CNS activity was excluded. Verbal abilities were assessed using the vocabulary subtest of the Wechsler Adult Intelligence Scale, with a maximum score of 70. Mean scores for older and younger adults were 55.3 (SD = 8.7) and 46.2 (SD = 4.2), respectively. An independent-samples t-test indicated that vocabulary scores were significantly higher for older than for younger adults, t(81) = 5.5, p < 0.001. Older participants were also screened for dementia using the Mini Mental Status Exam (Folstein, Folstein, & McHugh, 1975). Participants who scored below 24 (of 30) were excluded from further testing.

Participants were also screened for vision and hearing before testing. Participants whose normal or corrected visual acuity, as assessed with a Snellen eye chart, exceeded 20/40 were excluded from participating to minimize the influence of reduced visual acuity on the ability to encode visual speech information. Visual contrast sensitivity was measured using the Pelli-Robson contrast sensitivity chart (Pelli, Robson, & Wilkins, 1998), and participants whose score exceeded 1.8 were also excluded from further participation. Pure-tone air-conduction thresholds were obtained for all participants at octave frequencies from 250 to 4000 Hz, using a portable audiometer (Beltone 110) and headphones (TDH 39). Any participant whose threshold exceeded 20 dB HL (American National Standards Institute, 1989) was excluded from participating.
Participants with asymmetric hearing losses, operationalized as greater than a 10-dB threshold difference between the two ears at any of the test frequencies, were also excluded. Pure-tone averages (500, 1000, and 2000 Hz) for younger adults were 0.61 (SD = 4.3) and 0.67 (SD = 4.2) dB HL for the left and right ears, respectively. Corresponding values for older adults were 14.1 (SD = 6.6) and 13.2 (SD = 6.3) dB HL. A comparison of pure-tone averages revealed that thresholds were significantly greater for older than for younger adults (left ear: t(81) = 10.4, p < 0.001; right ear: t(81) = 10.1, p < 0.001). It should also be noted that although we did not obtain threshold measures at frequencies higher than 4000 Hz, based on previous studies of age-related hearing loss (Morrell, Gordon-Salant, Pearson, Brant, & Fozard, 1996), the older adults almost certainly had clinically significant hearing losses (e.g., greater than 20 dB HL) at frequencies above 4000 Hz. Thus, despite meeting the criteria for normal hearing for frequencies through 4 kHz, older adults had significantly poorer hearing than younger adults.

Stimuli and Procedures

Participants were presented with consonants, words (in a carrier phrase), and sentences in A, V, and AV conditions. All participants were first tested on consonants, followed by words and then sentences. Within each stimulus type, however, testing in the A, V, and AV conditions was counterbalanced such that approximately equal numbers of participants from each age group were tested in each possible order of testing modality. All testing was conducted in a double-walled sound-attenuating chamber (IAC, 4106). Stimuli were presented via a PC (Dell 420) equipped with a Sound Blaster Live audio card and a Matrox (Millennium G400 Flex) 3D video card. Auditory stimuli for the A and AV conditions were presented binaurally in a multitalker background babble (see below for details on babble level) over headphones (Sennheiser HD 265).
Signal level remained constant at 60 dB SPL for the A and AV conditions, as measured in a 6-cc supra-aural flat-plate coupler using the A-weighting scale (Brüel & Kjær 2231). Testing levels were evaluated against calibrated levels before every test using an in-line RMS meter. Visual stimuli for the V and AV conditions were presented on a 17-inch touch-screen monitor with participants sitting approximately 0.5 m from the display. The video image completely filled the 17-inch monitor. Participants viewed the head and neck of each talker as they articulated the stimuli.

Setting the Background Babble Level

As noted, one goal of the present study was to investigate age differences in VE and AE under conditions in which A performance was similar across groups and that did not produce ceiling-level scores in the AV condition. To accomplish this goal, all testing was conducted in a multitalker background babble, and signal-to-babble levels were set individually for each participant and stimulus condition (consonants, words, and sentences) so as to produce approximately 50% correct in the A condition. For the words and sentences, the stimuli used to establish the background babble level for a given condition were the same type as those used in the corresponding test condition (e.g., when words were used as test items, the background babble level was established using words). None of the stimuli used for setting the background babble levels for words and sentences were repeated in the test phase (see below for details on stimuli used for setting background babble level with consonants). The multitalker babble was captured from the Iowa Audiovisual Speech Perception Laserdisc (Tyler, Preece, & Tye-Murray, 1986) using 16-bit digitization and a sampling rate of 48 kHz.
Babble level was set independently for each participant and each stimulus type using a modified version of the procedure for establishing speech reception thresholds (American Speech and Hearing Association, 1988). Briefly, in the first phase of the procedure a starting babble level was established by initially setting the signal-to-babble ratio to +20 dB. Appropriate stimuli (consonants, words, or sentences) were then presented as babble level was increased in 10-dB steps until the participant's first incorrect response. The starting babble level for the remainder of the tracking was then set at a level 10 dB less than the level of the first incorrect response. In the next phase, babble level was incremented from the starting level (i.e., 10 dB below the level of the first incorrect response) in 2-dB steps (with two stimuli at each level) until the participant responded incorrectly for five of six trials. Babble level for testing was then established by subtracting the total number of correct responses from the starting level and adding a correction factor of 1. Mean signal-to-babble ratios for the three stimulus types were as follows: consonants (younger, M = −9.7, SD = 2.1; older, M = −4.8, SD = 3.7); words (younger, M = −8.6, SD = 1.2; older, M = −6.5, SD = 1.5); sentences (younger, M = −7.5, SD = 0.9; older, M = −5.6, SD = 1.5). Independent-measures t-tests indicated that older adults were tested at significantly higher signal-to-babble ratios than younger adults for all three types of stimuli (p < 0.001 for all comparisons).

Consonants

Participants received 42 repetitions of 13 consonants in an /iCi/ context. The consonants tested were /m/, /n/, /p/, /b/, //, /t/, /d/, /g/, /f/, /v/, /z/, /s/, and /k/ (e.g., participants were presented with /imi/, /ini/, etc.). The same male talker produced all of the stimuli for testing consonants.
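The two-phase babble-tracking procedure described under Setting the Background Babble Level can be sketched in a few lines of Python. This is an illustrative reading of the procedure, not code from the study; the function name and the simulated-listener callback are our own constructions.

```python
def track_babble_level(respond, signal_level=60.0):
    """Sketch of the two-phase babble-tracking procedure.

    `respond(babble_db)` stands in for presenting one stimulus at the
    given babble level and returning True for a correct response.
    """
    # Phase 1: coarse search. Start at a +20 dB signal-to-babble ratio
    # and raise the babble in 10-dB steps until the first error.
    babble = signal_level - 20.0
    while respond(babble):
        babble += 10.0
    start = babble - 10.0  # 10 dB below the level of the first error

    # Phase 2: fine search. Raise the babble in 2-dB steps, two stimuli
    # per level, until five of the last six responses are incorrect.
    level = start
    correct_total = 0
    recent = []
    while True:
        for _ in range(2):
            r = respond(level)
            correct_total += r
            recent = (recent + [r])[-6:]
        if len(recent) == 6 and sum(recent) <= 1:
            break
        level += 2.0

    # Test level: starting level minus total correct responses, plus the
    # +1 correction factor described in the text.
    return start - correct_total + 1.0
```

With a simulated listener who responds correctly whenever the babble stays below some fixed level, the routine returns a single babble level at which testing would proceed.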
The stimuli were digitized from existing Laserdisc recordings of the Iowa Consonant Test (Tyler et al., 1986) by connecting the output of the Laserdisc player (Laservision LD-V8000) into a commercially available PCI interface card for digitization (Matrox RT2000). Acquisition was controlled by software (Adobe Premiere). Video capture was 24-bit, 720 × 480, in NTSC-standard 4:3 aspect ratio at 29.97 frames per second to best match the original analog media. Audio was captured at 16 bits and a sampling rate of 48 kHz.

Consonant identification in the A, V, and AV conditions was measured using a 13-item closed-set test. Testing order was counterbalanced such that each modality was presented first, second, and third equally often to approximately the same number of participants in each age group. To familiarize participants with the test stimuli, the 13 test consonants were first presented (in the /iCi/ context) in the A condition without any background babble. Participants were instructed to respond by pressing the appropriate response area on a touch-screen monitor (ELO ETC-170C). Once participants were able to identify all 13 consonants presented in quiet correctly, the background babble was added for all conditions (including V) at the level established during pretesting (i.e., at the level yielding signal-to-babble ratios designed to produce approximately 50% correct A consonant performance). Participants then received 42 presentations of each consonant with presentation order determined pseudorandomly for each participant. No feedback was provided during testing.

Words

Stimulus materials for testing words were digitized from analog recordings of the Children's Auditory Visual Enhancement Test (CAVET) (Tye-Murray & Geers, 2001), using the same equipment and procedures as for the consonants.
The CAVET consists of three lists of 20 words and a practice list, with each item embedded in the carrier phrase ("Say the word") followed by a one- to three-syllable target word. Consequently, scores for each of the individual conditions (A, V, and AV) were based on a relatively small number of presentations (20 in each condition) compared with either the consonants or sentences. Despite this potential limitation, we elected to use the CAVET for several reasons. First, the CAVET was designed specifically to avoid floor and ceiling-level performance in the V condition. Second, each list is considered equally difficult to speechread and uses highly familiar words. The same female talker produced all stimuli. Participants were told the carrier phrase and were instructed to identify the word after the phrase. Lists were counterbalanced across the three presentation modalities (A, V, and AV) such that approximately equal numbers of older and younger adults received each list in a given condition.

Before testing, participants were informed that they would see, hear, or both see and hear a talker articulating the carrier phrase "Say the word" followed by a target word. Participants were told to say the target word aloud, and scoring was based on exact phonetic matches (i.e., adding, deleting, or substituting a single phoneme counted as an incorrect response). If participants were unsure of the target word, they were encouraged to guess. Participants received three practice trials in each condition (A, V, and AV) before testing, and none of the practice words appeared during the actual testing. All testing was performed using signal-to-babble levels established for words during pretesting. Participants received a total of 60 trials (20 each in the A, V, and AV conditions) with modality test order counterbalanced across participants. Within a given test condition (A, V, or AV), presentation order of the individual stimuli was determined pseudorandomly.
Sentences

Sentences were digitized from Laserdisc recordings of the Iowa Sentence Test (Tyler et al., 1986), using the same equipment and procedure as was used for the consonants and words. One hundred sentences (five lists of 20 sentences each) were digitized. Two of the lists were used for practice stimuli and the remaining three were used for testing. Each sentence in a list was produced by a different talker (half men, half women). Thus, within a list participants saw and/or heard a new talker on every trial. Lists were counterbalanced across participants such that each of the three lists was presented an equal number of times in the A, V, and AV conditions. All testing was conducted at signal-to-babble ratios established for sentences during pretesting. After the sentence was presented, participants were instructed to repeat as much of the sentence as possible and were again encouraged to guess if they were unsure of one or more words in the sentence. Scoring was based on five to seven key words presented in each sentence.

Calculating VE and AE

In the present study, VE was calculated relative to an individual's A performance (expressed as proportion correct) according to the following equation:

VE = (AV − A) / (1 − A)

This measure of VE has been used in several investigations of AV performance (Grant & Seitz, 1998; Grant et al., 1998; Rabinowitz, Eddington, Delhorne, & Cuneo, 1992) because it provides a method for comparing VE across a wide range of A and V scores. For example, an individual scoring 80% in A presentations and 90% with AV presentations would have the same enhancement score as an individual scoring 20% in A and 60% in AV, despite large absolute differences in overall A and AV performance. Furthermore, this equation avoids the bias inherent in the simple difference score AV − A, in which higher values of A necessarily lead to lower values of enhancement.
Although less common, we also calculated AE according to the following equation:

AE = (AV − V) / (1 − V)

In the present study, AE is less likely to be affected by differences in unimodal performance than the more traditional measure of VE because it normalizes for any age differences in V performance (recall that different signal-to-babble ratios were used to minimize age differences in A performance).

To further minimize differences in unimodal performance when comparing both VE and AE as a function of age, we used analysis of covariance (ANCOVA) to control for age differences in A or V scores. That is, when comparing AE as a function of age, we used A scores as a covariate to control for any differences in A performance that were not eliminated by using different signal-to-babble ratios. Similarly, when comparing VE as a function of age, we used V scores as a covariate to control for any age differences in V performance.

RESULTS AND DISCUSSION

Performance in A, V, and AV Conditions

Figure 1 displays mean percent correct for older and younger adults as a function of both stimulus type and presentation modality. A three-way, mixed-design analysis of variance was used to examine main effects and interactions for age, stimulus type, and presentation modality. Age (younger versus older) was treated as an independent-measures variable, and stimulus type (consonants, words, sentences) and presentation modality (auditory, visual, and auditory-visual) were treated as repeated-measures variables. Overall, performance differed significantly across stimulus type [F(2, 164) = 10.8, p < 0.001].
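Both enhancement measures reduce to one-line functions of proportion-correct scores. A minimal sketch (function names are ours; scores are assumed to be proportions in [0, 1), so the denominators stay positive):

```python
def visual_enhancement(av: float, a: float) -> float:
    """VE = (AV - A) / (1 - A): the AV gain as a share of the headroom above A."""
    return (av - a) / (1.0 - a)

def auditory_enhancement(av: float, v: float) -> float:
    """AE = (AV - V) / (1 - V): the AV gain as a share of the headroom above V."""
    return (av - v) / (1.0 - v)

# The worked example from the text: 80% A / 90% AV and 20% A / 60% AV
# yield the same VE despite very different absolute scores.
print(visual_enhancement(0.90, 0.80), visual_enhancement(0.60, 0.20))
```

Framed this way, each measure is the fraction of the available room for improvement that the second modality actually delivers, which is what makes scores comparable across very different unimodal baselines.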
Tukey honestly significant differences post hoc pairwise comparisons with a Bonferroni correction for multiple comparisons indicated significant differences between identification scores for all three types of stimuli, such that consonants (56.1%) were identified significantly better than words (51.1%) and words were identified significantly better than sentences (40%) (p < 0.001 for all comparisons). Significant differences were also observed for presentation modality [F(2, 164) = 12.3, p < 0.001], with scores for AV presentations (78.1%) significantly higher than for A (47.8%) and scores for A significantly higher than for V (21.9%) (p < 0.001 for all comparisons). Recall, however, that A scores were established using signal-to-babble ratios designed to produce approximately 50% correct in this condition. Therefore, although the data indicate that the procedure for establishing appropriate signal-to-babble levels in A was successful (overall A performance was very close to 50%), the measures do not reflect true group differences in A performance. Finally, older adults scored significantly poorer overall (46.7%) than younger adults [51.6%; F(1, 82) = 16.2, p < 0.001].

Of particular interest to the present investigation is that the three-way interaction of age × stimulus type × presentation modality was significant [F(4, 328) = 2.5, p < 0.05]. A series of Tukey honestly significant differences post hoc comparisons with a Bonferroni correction for multiple comparisons indicated no significant age differences in A performance for the three types of stimuli (p > 0.9 for all comparisons). This finding provides additional evidence that the procedure for equating A performance levels was effective across age groups and stimulus types. In the AV condition, however, older adults exhibited significantly poorer identification scores than younger participants for consonants and words (p < 0.05 for both comparisons) but not for sentences.
Older adults also exhibited significantly poorer V scores for both consonants (p < 0.001) and words (p < 0.05). The difference between V performance for sentences was not significant (p > 0.5), but this finding should be interpreted cautiously because absolute performance levels were relatively low for both older and younger adults. Post hoc comparisons of V performance as a function of stimulus type for older and younger adults indicated that younger adults had similar V scores for consonants and words (p > 0.3) but significantly lower scores for sentences (p < 0.001 for the differences between consonants and sentences and between words and sentences). For older adults, V performance was significantly better for words than for consonants and significantly better for consonants than for sentences (p < 0.001 for all comparisons).

Fig. 1. Percent correct identification of consonants (left), words (middle), and sentences (right) for younger (filled bars) and older (open bars) adults. Presentation modality refers to visual only (V), auditory only (A), and auditory-visual (AV). Error bars indicate standard deviation.

Comparison of VE and AE

As noted, evidence for age differences in the ability to extract visual speech information is a critical consideration when evaluating age-related changes in both VE and AE. In the current study, VE was examined using three separate ANCOVAs (one for each stimulus type). In all analyses, age (younger, older) served as an independent-measures variable and the appropriate V condition served as the covariate (i.e., performance in the consonant V condition served as a covariate when comparing consonant VE for older and younger adults). The left panel of Figure 2 displays the adjusted means for VE from the ANCOVAs as a function of age and stimulus type.
None of the differences between older and younger adults for VE reached statistical significance [for consonants, F(1, 81) = 2.1, p = 0.15; for words, F(1, 81) = 1.4, p = 0.23; for sentences, F(1, 81) < 1, p = 0.93].

Analyses paralleling those for VE were also conducted to examine age differences in AE. Age again served as an independent-measures variable, and the corresponding A scores (rather than V scores) served as the covariates. The right panel of Figure 2 displays the adjusted means for AE as a function of stimulus type and age. Separate ANCOVAs were conducted to examine age differences in AE for the three stimulus types, and none of the comparisons reached statistical significance (all F values less than 1).

Correlations Between Measures

Correlations Between V Performance for Consonants, Words, and Sentences

In addition to examining age-related changes in AE and VE, the present study was also designed to investigate the relationship between performance in the V and AV modalities as a function of stimulus type. Figure 3 displays the relationships between V performance for consonants, words, and sentences as a function of both age and stimulus type. Overall, correlations were higher for younger than for older adults and were stronger when correlating V for words and sentences than for either consonants and words or consonants and sentences. All correlations were significant at the 0.01 level except for the correlation between words and sentences for younger adults (p < 0.001). The correlation between consonants and words for older adults (p = 0.09) was not significant. These findings suggest that V performance for the three types of stimuli is mediated, at least in part, by a set of common mechanisms that are a critical component of speechreading and that operate independent of lexical or semantic constraints.
Correlations Between Measures of AV Performance

Figure 4 displays scatterplots and correlation coefficients for AV performance as a function of stimulus type and age group. In contrast to the findings with V scores, only one correlation, that between AV performance for words and sentences in younger participants, was significant (p < 0.01). However, the magnitude of the correlation (0.46) was relatively modest in that AV performance for words accounted for just slightly greater than 20% of the variance in AV performance with sentences. For older adults, there was a positive relationship between AV performance with words and sentences, but this did not reach statistical significance. Correlations between consonants and words and between consonants and sentences did not reach significance (and were actually negative for younger participants). One implication of these findings is that AV performance for consonants is mediated by mechanisms that are distinct from those used to identify words and sentences. For younger adults, the moderate correlation between words and sentences indicates some overlap in the mechanisms used to identify these two types of stimuli when both auditory and visual speech information is available. However, the relatively small correlations obtained for older adults across all three stimulus types suggest that they may rely on different mechanisms for AV perception of consonants, words, and sentences.

Fig. 2. Left, Adjusted means for visual enhancement for younger (filled bars) and older (open bars) adults as a function of stimulus type. Right, Same as in left panel except data are for auditory enhancement. Error bars indicate standard deviation.

Relationship Between Measures of VE and AE

Figures 5 and 6 display the scatterplots and correlation coefficients for VE and AE between consonants, words, and sentences as a function of age group.
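The "variance accounted for" values quoted in this and the following paragraphs follow directly from squaring the Pearson correlation coefficient. A minimal illustration (the coefficient is the one reported in the text; the code itself is ours):

```python
# Shared variance between two measures is the square of their Pearson
# correlation coefficient (r). The coefficient below is the reported value
# for AV words vs. AV sentences in younger adults.

def variance_explained(r):
    """Proportion of variance in one measure accounted for by the other."""
    return r ** 2

print(round(variance_explained(0.46), 3))  # -> 0.212, i.e., slightly above 20%
```

The same arithmetic underlies the later statement that correlations of small to moderate size account for roughly 8 to 14% of the variance (r of about 0.28 to 0.37).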
Overall, the findings are similar to those obtained with AV presentations in that the strongest positive correlations obtained were between words and sentences. For VE, the correlation between consonants and sentences was significant for older adults (p < 0.05) but not for younger adults who, similar to the results for AV presentations, exhibited a nonsignificant (p = 0.27) negative correlation between VE for consonants and sentences. For AE, the correlation between consonants and sentences was not significant for older adults but was significant and negative for younger adults. Note that, similar to the significant correlations observed in the AV condition, the relative magnitudes of the correlations for both VE and AE were small to moderate, accounting for 8 to 14% of the variance. Also similar to the AV results, older adults demonstrated a small, but nonsignificant, positive correlation for both VE and AE between consonants and words, but younger adults exhibited a small negative relationship between these two stimulus types. Taken together, these findings suggest that the mechanisms mediating VE and AE for consonants are largely distinct from those mediating VE and AE for words and sentences, and this is true for both older and younger adults.

Fig. 3. Scatterplots and Pearson product-moment correlations between visual-only scores for consonants and words (left), consonants and sentences (middle), and words and sentences (right). Solid circles and clear triangles represent data for younger and older adults, respectively. Darker solid line shows the best fitting regression line for the younger adult data and the lighter line shows the best fitting regression line for the older adults. Single (*), double (**), and triple (***) asterisks indicate significance levels of 0.05, 0.01, and 0.001, respectively.
The small but significant correlations between the two enhancement measures (AE and VE) for words and sentences suggest some overlap in the mechanisms underlying enhancement for these two types of stimuli but also indicate significant independence in the operations mediating the benefits of combined auditory and visual speech information for words and sentences. Finally, the similar correlations for both VE and AE between words and sentences in younger and older adults are consistent with the use of similar abilities in the two groups to improve word and sentence perception when both auditory and visual speech signals are available.

General Discussion

The present study was designed to investigate the effects of age on the ability to benefit from combining auditory and visual speech information. The findings indicate that older and younger adults exhibit comparable benefits, as indexed by measures of both VE and AE, for all three types of stimuli tested (consonants, words, and sentences). Moreover, older adults were able to achieve similar enhancement relative to either unimodal condition, despite significant reductions in V performance. Correlations between performance with the different stimulus types indicated significant positive correlations between V performance for consonants, words, and sentences, with younger adults generally showing stronger correlations than older adults. Correlations across stimulus types for AV, VE, and AE, however, were generally significant only for the relationship between words and sentences, with small (and sometimes negative) correlations between consonants and words and between consonants and sentences.

Age and Speechreading Performance

The results for the V condition are in good agreement with previous studies demonstrating age-related declines in speechreading (Dancer et al., 1994; Honneil et al., 1991; Lyxell & Ronnberg, 1991; Middelweerd & Plomp, 1987; Shoop & Binnie, 1979).
Shoop & Binnie (1979), for example, reported differences of approximately 13% between younger (40 to 50) and older (over age 71) adults on a CV speechreading test. This age difference is similar to the difference of approximately 17% between younger and older adults that was observed for the V condition with VCV stimuli in the present study. The V sentence data from the present study are also in good agreement with previous investigations of age-related changes in speechreading. Dancer et al. (1994) and Honneil et al. (1991), for instance, both reported declines of 8 to 10% in V sentence intelligibility for older, compared with younger, adults. These values compare favorably with the 6% difference in V sentence performance that was observed between older and younger adults in the present study. Thus, the findings from both the present study and previous investigations suggest that older adults are less able than younger participants to encode visual speech information.

Fig. 4. Same as Figure 3, except the data are for auditory-visual presentations.

To our knowledge, the current study also represents the first demonstration of age-related declines in V performance using older participants with clinically normal hearing (at least through 4 kHz). Viewed in conjunction with previous findings of declines in V performance for hearing-impaired older adults (Dancer et al., 1994; Honneil et al., 1991), our results suggest a dissociation between hearing abilities and speechreading, at least in older adults. Thus, it appears that aging is associated with declines in one or more capacities that are critical for successful encoding of V speech information and that these impairments are independent of hearing status.
One implication of these findings is that if speechreading training is to be successful with older adults, future research must be directed at identifying and correcting the mechanisms responsible for the poorer V performance of this population.

Another interesting aspect of the present V data is the reduced performance for sentences, compared with either consonants or words. This pattern of identification scores is exactly the opposite of what is observed for A presentations, where identification of words in sentences is generally higher than for isolated words or CVs (Bernstein, Demorest, & Tucker, 2000; Hutchinson, 1989; Nittrouer & Boothroyd, 1990; Sommers & Danielson, 1999). One explanation for the differential pattern of findings with A and V presentations is that the advantage for sentences with A presentations is dependent on the listener's ability to extract and use semantic and syntactic information to increase the predictability of target items (Miller et al., 1951; Sommers & Danielson, 1999). In the case of V presentations, however, extraction of semantic and syntactic cues is more difficult because identification of individual words is often not possible. The reduced availability of semantic and syntactic information, combined with the increased processing demands of understanding sentences (i.e., sentences present more information than either words or consonants), probably is the main reason that V performance for sentences was the poorest of the three stimulus types for both older and younger adults.

Fig. 5. Scatterplots and Pearson product-moment correlations between visual enhancement for consonants and words (left), consonants and sentences (middle), and words and sentences (right). Solid circles and clear triangles represent data for younger and older adults, respectively. Darker solid line shows the best fitting regression line for the younger adult data and the lighter line shows the best fitting regression line for the older adults. Asterisks indicate significance at the 0.05 level.

Despite the relatively poor performance with V presentations, significant correlations were obtained between V performance with consonants, words, and sentences, and the magnitude of these correlations is similar to what has been reported previously (Bernstein et al., 2000; Demorest, Bernstein, & DeHaven, 1996; Grant & Seitz, 1998). Grant & Seitz (1998), for example, reported a correlation coefficient of 0.49 between V scores for consonants and sentences. This value is similar to the correlations of 0.42 and 0.39 obtained for younger and older adults, respectively, in the present study. Bernstein et al. (2000) measured V performance for phonemes, words, and sentences in normal-hearing adults (18 to 45 years of age) and found significant correlations between scores for phonemes and words (0.39) and phonemes and sentences (0.43). Similarly, Demorest et al. (1996) reported that V performance for younger adult participants on nonsense syllables correlated approximately 0.5 with V performance on both words and sentences, with an even stronger correlation (approximately 0.8) between V performance with words and sentences. Again, the magnitudes of these correlations are comparable to those observed for younger adults in the V condition of the present study. Thus, the picture that is beginning to emerge from investigations of speechreading using different types of stimulus materials is that there is some overlap in the mechanisms used to extract V speech information from consonants, words, and sentences and that this set of core abilities is similar for both older and younger adults.
Auditory and Visual Enhancement

The finding that older and younger adults exhibit similar enhancement from combining auditory and visual speech information replicates and extends results from previous investigations examining the effects of age on AV speech perception. To our knowledge, the present investigation is the first to examine age effects on auditory-visual enhancement across three different types of stimuli after controlling for age-related changes in unimodal encoding. Consistent with the absence of age differences in enhancement obtained in the present study, Helfer (1997, 1998) reported that visual benefit for words presented in nonsense sentences was similar for younger normal-hearing adults and older participants with mild to moderate hearing impairments. These findings are particularly relevant to the current investigation because Helfer used one of the same relative measures of enhancement, VE, as in the present study, and older adults were tested at a slightly higher signal-to-noise ratio than younger adults. Taken together, the findings from Helfer (1998) and the present investigation suggest that older and younger adults exhibit similar enhancement from combining auditory and visual speech signals over a relatively large range of A performance. An important direction for future research will be to examine whether the age equivalence in enhancement observed in the present study holds for more adverse listening conditions, where A scores are reduced and the need for integrating auditory and visual speech signals is even greater.

Fig. 6. Same as in Figure 5, except data are for auditory enhancement.
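For readers unfamiliar with relative enhancement measures: one common way to define VE and AE expresses the bimodal gain relative to the room left for improvement over the unimodal score (cf. Sumby & Pollack, 1954). The exact formulas used in the study are not restated in this section, so the definitions below should be treated as an illustrative assumption rather than the authors' specification:

```python
# Hedged sketch of relative enhancement measures. VE scales the gain from
# adding vision by the headroom above auditory-only (A) performance; AE does
# the same for audition relative to visual-only (V) performance. These are
# common definitions, assumed here for illustration.

def visual_enhancement(av, a):
    """Relative gain from adding vision: (AV - A) / (1 - A)."""
    return (av - a) / (1 - a)

def auditory_enhancement(av, v):
    """Relative gain from adding audition: (AV - V) / (1 - V)."""
    return (av - v) / (1 - v)

# Example proportions: A = 0.50 (the study's target A level), V = 0.20, AV = 0.75
print(visual_enhancement(0.75, 0.50))   # -> 0.5: vision recovers half the gap
print(auditory_enhancement(0.75, 0.20))  # -> 0.6875
```

Because the denominator normalizes by unimodal headroom, the measure lets groups with different unimodal baselines be compared on how much of the available improvement they actually realize.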
The finding that older and younger adults exhibited comparable enhancement from combining auditory and visual speech information suggests that age differences in AV performance reflect age-related impairments in the ability to encode V speech signals (recall that A performance was manipulated to be approximately equivalent in the two groups) rather than a reduced ability to combine or integrate information across the two modalities. Within the model of AV perception proposed by Grant et al. (1998), this hypothesis argues that when older and younger adults are able to obtain similar amounts of visual and auditory speech information, both groups exhibit similar abilities to integrate the unimodal percepts into a unified AV perception. That is, the present findings suggest that any age differences in AV perception do not result from age differences in central integration capacities. Investigations are currently underway in our laboratory to obtain more direct measures of integration per se (i.e., independent of unimodal encoding), with the working hypothesis that older and younger adults will exhibit similar integration efficiency.

Pattern of Correlations for VE and AE

In contrast to the significant correlations between V performance for consonants, words, and sentences, both VE and AE for these three types of stimuli generally exhibited small, and in some cases even negative, relationships. In interpreting these results, it is important to note that a number of methodological differences between the conditions may have contributed to the relatively low correlations. Specifically, consonants were tested using a closed-set format, whereas words and sentences were tested using an open-set format. In addition, a single male talker produced the items for the consonant test, a single female talker produced the items for the word test, and both male and female talkers were used to produce items for the sentence test.
Despite these methodological differences, however, the findings are generally consistent with previous measures of visual enhancement as a function of stimulus type. Grant & Seitz (1998) also reported a near-zero (0.06) correlation between visual enhancement for consonants and sentences in a group of hearing-impaired adults ranging in age from 41 to 76. This pattern of results is readily explained within the model of AV perception proposed by Grant et al. (1998). Specifically, the model proposes that AV performance for consonants is determined primarily by bottom-up capacities that serve to extract appropriate linguistic cues from auditory and visual speech signals. For words and sentences, however, AV performance is determined both by bottom-up extraction of auditory and visual cues and by top-down lexical, syntactic, and semantic processing. Thus, the small correlations between enhancement (both AE and VE) for consonants and enhancement for other types of stimuli may reflect the increased importance of top-down processing abilities for the more linguistically complex words and sentences. Consistent with this proposal, the strongest correlations for both AE and VE were between words and sentences, suggesting some overlap in the mechanisms mediating enhancement for these two types of stimuli. Finally, it is useful to consider the pattern of correlations for both AE and VE as a function of stimulus type in relation to the analogous correlations for V and A. As noted, both the current study and previous investigations have found moderate to strong correlations between V performance for consonants, words, and sentences. Similarly, Humes et al. (1994) found that correlations between A performance with the same three types of stimuli (consonants, words, and sentences) ranged between 0.35 and 0.85.
These findings suggest that the absence of strong correlations for AE and VE across consonants, words, and sentences is a consequence of the mechanisms mediating integration rather than of within-modality differences in the mechanisms underlying identification of consonants, words, and sentences. An important goal for future research, therefore, will be to identify the unique demands imposed by integration and to specify why those demands reduce or eliminate correlations between enhancement for consonants, words, and sentences.

ACKNOWLEDGMENTS

Portions of these data were presented at the 144th meeting of the Acoustical Society of America, Cancun, Mexico. This research was supported by grant R01 AG 180294 from the National Institute on Aging. The authors thank Arnold Heidbreder for technical assistance.

Address for correspondence: Mitchell S. Sommers, Department of Psychology, Washington University, Campus Box 1125, St. Louis, MO 63130.

Received April 20, 2004; accepted December 12, 2004

REFERENCES

American National Standards Institute. (1989). Specification for audiometers. ANSI S3.6-1989. New York.
American Speech and Hearing Association. (1988). Guidelines for determining threshold levels for speech. American Speech and Hearing Association, 85–89.
Bernstein, L. E., Demorest, M. E., & Tucker, P. E. (2000). Speech perception without hearing. Perception and Psychophysics, 62, 233–252.
CHABA, Committee on Hearing, Bioacoustics, and Biomechanics, Working Group on Speech Understanding and Aging. (1988). Speech understanding and aging. Journal of the Acoustical Society of America, 83, 859–895.
Cienkowski, K. M., & Carney, A. E. (2002). Auditory-visual speech perception and aging. Ear and Hearing, 23, 439–449.
Dancer, J., Krain, M., Thompson, C., Davis, P., & Glenn, J. (1994). A cross-sectional investigation of speechreading in adults: Effects of age, gender, practice and education. Volta Review, 96, 31–40.
Dekle, D. J., Fowler, C. A., & Funnell, M. G. (1992).
Audiovisual integration in perception of real words. Perception & Psychophysics, 51, 355–362.
Demorest, M. E., Bernstein, L. E., & DeHaven, G. P. (1996). Generalizability of speechreading performance on nonsense syllables, words, and sentences: Subjects with normal hearing. Journal of Speech and Hearing Research, 39, 697–713.
Folstein, M. F., Folstein, S. E., & McHugh, P. R. (1975). Mini-mental state: A practical method for grading the cognitive state of patients for the clinician. Journal of Psychiatric Research, 12, 189–198.
Grant, K. W., & Seitz, P. F. (1998). Measures of auditory-visual integration in nonsense syllables and sentences. Journal of the Acoustical Society of America, 104, 2438–2450.
Grant, K. W., Walden, B. E., & Seitz, P. F. (1998). Auditory-visual speech recognition by hearing-impaired subjects: Consonant recognition, sentence recognition, and auditory-visual integration. Journal of the Acoustical Society of America, 103, 2677–2690.
Helfer, K. S. (1997). Auditory and auditory-visual perception of clear and conversational speech. Journal of Speech, Language, and Hearing Research, 40, 432–443.
Helfer, K. S. (1998). Auditory and auditory-visual recognition of clear and conversational speech by older adults. Journal of the American Academy of Audiology, 9, 234–242.
Honneil, S., Dancer, J., & Gentry, B. (1991). Age and speechreading performance in relation to percent correct, eyeblinks, and written responses. Volta Review, May, 207–212.
Humes, L. E., Watson, B. U., Christensen, L. A., Cokely, C. G., Halling, D. C., & Lee, L. (1994). Factors associated with individual differences in clinical measures of speech recognition among the elderly. Journal of Speech and Hearing Research, 37, 465–474.
Hutchinson, K. M. (1989). Influence of sentence context on speech perception in younger and older adults. Journal of Gerontology, 44, 36–44.
Lyxell, B., & Ronnberg, J. (1991).
Word discrimination and chronological age related to sentence-based speech-reading skill. British Journal of Audiology, 25, 3–10.
McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746–748.
Middelweerd, M. J., & Plomp, R. (1987). The effect of speechreading on the speech-reception threshold of sentences in noise. Journal of the Acoustical Society of America, 82, 2145–2147.
Miller, G. A., Heise, G. A., & Lichten, W. (1951). The intelligibility of speech as a function of the context of the test material. Journal of Experimental Psychology, 41, 329–335.
Morrell, C. H., Gordon-Salant, S., Pearson, J. D., Brant, L. J., & Fozard, J. L. (1996). Age- and gender-specific reference ranges for hearing level and longitudinal changes in hearing level. Journal of the Acoustical Society of America, 100, 1949–1967.
Nittrouer, S., & Boothroyd, A. (1990). Context effects in phoneme and word recognition by younger children and older adults. Journal of the Acoustical Society of America, 87, 2705–2715.
Pelli, D., Robson, J., & Wilkins, A. (1988). The design of a new letter chart for measuring contrast sensitivity. Clinical Vision Sciences, 2, 187–199.
Pichora-Fuller, M. K., Schneider, B. A., & Daneman, M. (1995). How younger and older adults listen to and remember speech in noise. Journal of the Acoustical Society of America, 97, 593–608.
Rabinowitz, W. M., Eddington, D. K., Delhorne, L. A., & Cuneo, P. A. (1992). Relations among different measures of speech reception in subjects using a cochlear implant. Journal of the Acoustical Society of America, 92, 1869–1881.
Sams, M., Manninen, P., Surakka, V., Helin, P., & Kättö, R. (1998). McGurk effect in Finnish syllables, isolated words and words in sentences: Effects of word meaning and sentence context. Speech Communication, 26, 75–87.
Shoop, C., & Binnie, C. A. (1979). The effects of age upon the visual perception of speech. Scandinavian Audiology, 8, 3–8.
Sommers, M. S., & Danielson, S. M. (1999).
Inhibitory processes and spoken word recognition in younger and older adults: The interaction of lexical competition and semantic context. Psychology and Aging, 14, 458–472.
Sumby, W. H., & Pollack, I. (1954). Visual contributions to speech intelligibility in noise. Journal of the Acoustical Society of America, 26, 212–215.
Summerfield, Q. (1987). Some preliminaries to a comprehensive account of audio-visual speech perception. In B. Dodd & R. Campbell (Eds.), Hearing by Eye: The Psychology of Lip-reading (pp. 3–51). Hillsdale, NJ: Lawrence Erlbaum Associates.
Tye-Murray, N., & Geers, A. (2001). Children's Audiovisual Enhancement Test. St. Louis, MO: Central Institute for the Deaf.
Tyler, R. D., Preece, J., & Tye-Murray, N. (1986). The Iowa laser videodisk tests. Iowa City, IA: University of Iowa Hospitals.
Walden, B. E., Busacco, D. A., & Montgomery, A. A. (1993). Benefit from visual cues in auditory-visual speech recognition by middle-aged and elderly persons. Journal of Speech and Hearing Research, 36, 431–436.