Hearing Science and Hearing Disorders
Ebook, 508 pages, 6 hours

About this ebook

Hearing Science and Hearing Disorders focuses on the nature of the processes in the inner ear and the nervous system that mediate hearing. Organized into eight chapters, this book first discusses the nature of speech communication, the extent of hearing problems, and the pathophysiology of hearing. Four core chapters follow, in which four areas of central importance to understanding hearing disorders and their effects are covered. These areas are assessment of auditory function, the scope for technological solutions, the nature of audio-visual speech perception, and the effects of deafness upon speech production. This book will be valuable to students; to academic and professional workers concerned with hearing, speech, and their disorders; and to scientifically or medically literate people in general.
Language: English
Release date: Jun 28, 2014
ISBN: 9781483295169


    Hearing Science and Hearing Disorders - M.E. Lutman


    1 Hearing for Speech: the Information Transmitted in Normal and Impaired Hearing

    Peter J. Bailey

    Publisher Summary

    This chapter discusses the information that is present in speech; it has been shown that such knowledge can be used to bring a degree of order and understanding into the difficulties that are faced by hearing-impaired listeners in understanding speech. Supra-segmental information can be distinguished from two types of segmental information—the voicing and place of articulation features. There are grounds for believing humans to be specialized for speech communication. Parallel specialization for speech perception would be plausible despite the absence of extensive experimental evidence for perceptual specialization. Several characteristics of speech render it an efficient code for the particular communicative role that it serves. It can support high rates of information flow by contrast with writing and reading, but it is flexible enough to allow the other extreme in trading off speed against accuracy, as when people pronounce clearly and slowly a crucial, unpredictable, or rare word, or a proper name.

    Contents

    I INTRODUCTION

    II WHAT FORMS CAN LINGUISTICALLY RELEVANT INFORMATION TAKE?

    III ARTICULATORY AND ACOUSTIC BASES FOR PHONETIC CONTRASTS

    A Acoustic theory of speech production

    B Acoustic correlates of articulatory dimensions

    IV REPRESENTATIONS OF ACOUSTIC AND AUDITORY SPEECH PATTERNS

    V PROPERTIES OF HEARING IMPAIRMENT RELEVANT TO SPEECH PERCEPTION

    VI DIVISION OF LABOUR BETWEEN PERCEPTION AND PRODUCTION FOR EFFICIENT SPEECH COMMUNICATION

    VII SPEECH PERCEPTION IN HEARING IMPAIRED LISTENERS

    VIII CONCLUDING SUMMARY

    I INTRODUCTION

    There are grounds for believing humans to be specialised for speech communication - most evidently for speech production by virtue of the shape and flexibility of movement of the vocal tract and its articulators. Parallel specialisation for speech perception would be plausible despite the absence of extensive experimental evidence for perceptual specialisation (Repp, 1982). More generally, there is no compelling evidence against the formal uniqueness of human linguistic behaviour (MacPhail, 1982). The implications of these points are: first, evolutionary pressures have rendered speech an efficient vehicle for linguistic communication (Lieberman, 1973); secondly, speaker/hearers appear to be specialised to produce and perceive linguistic information in this form; and thirdly, evolution of human culture has since made speech the main reason we have for hearing. To the extent that these observations are true it would seem perverse not to attempt to maximise access to speech for hearing-impaired people before trying other rehabilitative measures.

    Several characteristics of speech render it an efficient code for the particular communicative role it serves. It can support high rates of information flow by contrast with writing and reading (intelligibility falls significantly only when speech rates exceed about 160 words per minute), but it is flexible enough also to allow the other extreme in trading off speed against accuracy, as when people pronounce clearly and slowly a crucial, unpredictable, or rare word, or a proper name. The redundancy which permits these high rates also ensures substantial resistance to the masking effects of extraneous noise, so that speech remains intelligible despite considerable distortion or signal-to-noise ratios worse than 0 dB. Speech is well adapted to carry extra-linguistic information such as emphasis and emotional content by means of variations in pitch, loudness and rhythm. Speech also provides meta-communicative cues such as those which cede or deny the floor to another interlocutor; these facilitate fluent conversation. At a more practical level, there is clear virtue in a mode of communication which leaves hands, feet and limbs free for other skilled behaviour. For all these reasons there is good justification for expressing our understanding of hearing impairment in specific relation to the requirements of understanding speech. Traditionally in audiology this has only been done in a rather notional way, on which recent developments in the acoustical control of test materials have begun to improve (e.g. Fourcin, 1980).

    Much of the handicap of hearing impairment derives from a single intervening disability - the failure to understand speech easily, particularly when listening in a noisy environment. A common report from hearing-impaired people is that they can hear speech but cannot understand what is being said. My two main aims in this chapter are to characterise the main types of linguistically significant information present in speech, and to establish a basis for discussion of some factors which contribute to poor speech perception in hearing-impaired people. A complementary chapter (Summerfield, 1982; this volume) considers the limitations on, and implications of, speech perception using additional information derived from senses other than hearing, primarily vision.

    Ideally we might begin with a description of the process of speech understanding in normal hearing listeners. In that way, drawing on data on hearing impairment we might be able to deduce the effects of hearing impairment on speech processing and discuss the results of experiments which test such predictions. In a statistical fashion this has been done (see Haggard, 1982; this volume). But these purely quantitative formulations make no reference to the specific classes of information within speech or to the processes by which the information is analysed and interpreted and so, despite their usefulness, offer no element of explanation. Only with such understanding can hope be entertained of an effective approach to disability assessment, hearing aid design, and training in hearing tactics. Unfortunately, in spite of a considerable body of data on the relationships between speech production, speech acoustics and speech perception, there exists no adequate account of normal speech understanding that embraces the complex relationships reported and qualifies as a genuine theory. I shall concentrate, therefore, on a more descriptive level, clarifying the principles governing the form of acoustical patterns resulting from activities of the vocal apparatus, and examining how these patterns carry linguistically significant information for the listener. Many of the experiments I shall refer to have used listeners with normal hearing; this is in part because such experiments are the majority, but also because there are reasons to believe that improvements in auditory prostheses are now likely to follow from a better overall understanding of normal speech processes (Haggard, 1982; this volume). My general strategy will be to attempt to rationalise the patterns of speech perception deficits that are characteristic of certain kinds of hearing impairment, particularly hearing loss of sensorineural origin acquired post-lingually. 
Specifically, I shall comment on the relative efficacy with which various significant acoustical properties of speech are preserved in the listener’s impaired auditory system.

    II WHAT FORMS CAN LINGUISTICALLY RELEVANT INFORMATION TAKE?

    Most normal language users believe that understanding is an immediate and effortless consequence of listening to speech. Against this background, describing formally what is involved in successful speech understanding is surprisingly difficult. To describe spoken language demands a complex representation which can take many forms. A useful view of the speech communication process is as a set of sub-processes inside the brains of talkers and listeners. The first set is in the talker, starting with the intention to communicate, and involves a series of normally hierarchical stages where implicit knowledge about word meanings, syntax, word-sound correspondence etc. is used to encode a message into an acoustic signal. The listener is supposed to decode the signal using an approximately matched set of hierarchical but inverse perceptual processing stages, beginning with an auditory representation and terminating in recovery of the talker’s message and hence understanding. Each processing stage is assumed to transform the message from one internal representation to another, preserving linguistically relevant information. A full account of linguistic communication would thus require a specification of each representation and a detailed description of the mechanism of each processing stage. This view is not an explanatory model of the process but a starting framework within which detailed models could be proposed. The psychological reality of a particular model has then to be established by experimental investigation.
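    The shape of this encode/decode framework can be caricatured in code. The toy symbolic stages below (a message split into words and then into serially ordered segments, with "#" as an invented word-boundary marker) are my own illustrative assumptions, far simpler than any real linguistic processing stage; the point is only that each stage maps one representation to the next, and the listener applies an approximately inverse chain.

```python
# Deliberately toy sketch of the hierarchical encode/decode framework.
# The stages and the "#" word-boundary marker are illustrative assumptions.

def encode(message: str) -> list[str]:
    """Talker: message -> words -> serially ordered segments (the 'signal')."""
    segments: list[str] = []
    for word in message.split():
        segments.extend(word)      # word -> its segments
        segments.append("#")       # mark the word boundary in the signal
    return segments

def decode(segments: list[str]) -> str:
    """Listener: the approximately inverse chain, recovering the message."""
    return " ".join(w for w in "".join(segments).split("#") if w)

signal = encode("hear the bay")
print(signal[:4])                  # ['h', 'e', 'a', 'r']
print(decode(signal))              # hear the bay
```

    Real stages are of course not invertible symbol shuffles; the sketch shows only the framework's structure, in which each transformation must preserve the linguistically relevant information.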

    Although normal and abnormal linguistic and phonetic structures can be described in a fashion that is logically rigorous (see Cowie and Douglas-Cowie, 1982; this volume) the only readily accessible data which can be measured in a physical sense are the optical correlates of speech and the acoustic speech signal; if one is interested in production, various physiological measures of articulatory behaviour may be added. However, if used in isolation, conventional techniques for acoustical analysis of speech do not illuminate directly its linguistically significant properties. This issue - the nature of acoustic correlates of linguistic units - is a central one for this chapter and will be considered in detail. We must begin, however, with a brief discussion of some ways of conceptualising the elements of a linguistic message.

    I shall refer to the structures that generate speech - vocal cords, pharynx, soft and hard palate, tongue, teeth, jaw, lips, nasal passages etc. - as forming the vocal tract, and to the larger moving parts - lips, tongue and jaw - as the major articulators in the vocal tract. Measurements of articulator movement reveal intricate motor patterns; the simple demonstration of attending to all the detailed antics in your own vocal tract while speaking this sentence aloud in slow motion will confirm that speaking is a complex act which demands precise control and coordination of a large number of muscles. Despite this complexity when expressed in terms of the spatio-temporal coordinates of major articulators over time, a number of general principles of vocal tract action can be described which form the basis for a more manageable taxonomy of speech involving a set of intersecting articulatory classes. Articulatory classifications of speech elements are economical, and have historical respectability - they were employed by Sanskrit grammarians roughly 2600 years ago.

    A relatively small number of articulatory dimensions is sufficient to carry linguistically significant contrasts. Vowels (for example, /i/ and /a/ in deep, dark), semi-vowels (/w/ as in wailed), continuant consonants (/s/ as in monster) and interrupted consonants (/d/ as in dark, /g/ as in grotto) form a natural ranking of articulations with increasingly narrow constriction of the vocal tract. Another important dimension is the position in the vocal tract where the maximum constriction occurs; the initial consonants in gay, day and bay involve constriction at increasingly more forward vocal tract locations, towards the front of the mouth. These two dimensions correspond roughly to those known to phoneticians as manner and place of articulation. The voicing contrast, referring to the initial presence or absence of vibration of the vocal cords, as between the initial consonants /b/ and /p/ in bay and pay, allows further subdivisions of some of the above categories. This taxonomy allows the phonemes of a language to be represented as an intersecting set of features and hence allows utterances to be represented as articulatorily-defined segments arrayed serially in time. Thus the initial segment in bay is an interrupted, voiced consonant with bilabial place of articulation, that is with vocal tract constriction at the lips. The adequacy of such a description of the content of an utterance in terms of a series of phonetic segments or phonemes (consonant, vowel, consonant etc.) having in turn distinctive features (interrupted, voiced etc.) depends on the purpose for which the description is used. It shares much with schemes one might use to classify the orthography of written language; segments correspond roughly to alphabetical characters and features to properties like presence or absence of a vertical stroke in a character.
For speech, descriptions at this level are natural candidates for expressing economically some of the knowledge that language users have which makes them creative. For example, we can state simple prescriptive rules for the formation of the plural of English nouns never previously encountered. Although generally written with an s, the plural is realised phonetically in different ways, chiefly as /Iz/, /z/ or /s/ depending on the preceding segment. The ease with which this and similar rules can be stated in segmental terms contrasts sharply with their difficulty of expression in any other form, and is seen by those seeking a description of the sound pattern of languages as an argument for the fundamental nature of phonetic segments (Halle, 1964).
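    As a concrete illustration of how naturally such rules fall out of a segmental description, the plural rule can be stated in a few lines over a noun's final segment. The ASCII phoneme labels and the exact membership of the sibilant and voiceless sets below are simplified assumptions of mine, not drawn from the chapter.

```python
# Sketch of the phonetically conditioned English plural rule, stated over
# a noun's final segment. Phoneme labels are simplified ASCII stand-ins:
# "sh" for /S/, "zh" for /Z/, "ch" for /tS/, "jh" for /dZ/.

SIBILANTS = {"s", "z", "sh", "zh", "ch", "jh"}
VOICELESS = {"p", "t", "k", "f", "th"}          # voiceless non-sibilants

def plural_allomorph(final_segment: str) -> str:
    """Return the plural ending conditioned by the noun's final segment."""
    if final_segment in SIBILANTS:
        return "iz"   # /Iz/ as in buses, judges
    if final_segment in VOICELESS:
        return "s"    # /s/  as in cats
    return "z"        # /z/  as in dogs, bees (voiced consonants and vowels)

print(plural_allomorph("s"))   # bus -> iz
print(plural_allomorph("t"))   # cat -> s
print(plural_allomorph("g"))   # dog -> z
```

    Stating the same rule over spellings or raw acoustics would be far clumsier, which is the force of the argument attributed above to Halle (1964).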

    I have dwelt on the background to the phonetic specification of utterances because of its basic role in speech communication research. A string of phonetic symbols is sometimes taken also as the appropriate description of the input to speech production processes on the one hand, and the output of processes of speech perception on the other. However the convenience of a particular representation for capturing the intuitions of theoretical linguists about the structure of a language may not be a sufficient reason for according that representation the status of physiological or psychological reality. Although the phoneme concept remains useful there is only equivocal empirical support for a phonemic stage in speech perception. Serious consideration has been given to schemes wherein lexical access - the process of making contact with a word in the internal lexicon - can be achieved without invoking an intermediate phonemic representation (see Summerfield, 1982; this volume).

    The listener’s particular expectations, goal or task when presented with a speech signal may condition the different levels at which the listener may represent the signal internally. In this chapter I shall be concerned primarily with the aspects of speech perception that transform an auditory representation, such as may be present in the pattern of nerve activity in the VIIIth nerve, and deliver as output a representation analogous to a phonemic or perhaps lexical specification of the message. Since practical and ethical considerations limit the use of neurophysiological techniques with human subjects, many of the data on the internal representations at both these stages in humans are derived from psychophysical experiments where the perceptual consequences of systematic manipulations of acoustical stimulus structure are assessed from listeners’ judgements. Such experiments must be conducted using experimental methods designed to minimise effects such as response bias, practice and experience, even though these are of practical significance, since the aim is to characterise basic processes of auditory speech processing in general. Convergent with these results, the increasingly sophisticated electrophysiological techniques available for recording directly from single units in animal auditory systems discussed by Evans (1982; this volume) are revealing details of auditory representations in non-human animals.

    For some species (cat, for example) there exists a fairly large body of data on electrophysiological responses to sound patterns and also some behavioural data from psychoacoustical experiments using comparable stimulus manipulations (Pickles, 1980). In general the physiological and psychophysical data are in accord and are broadly consistent with the results of psychophysical experiments on humans. We may reasonably assume, therefore, that animal models do give useful insight into the form of the internal auditory representation in humans on which later perceptual processes must go to work.

    I shall not consider in detail here the large issues of representation or processing of the syntactic, semantic or pragmatic information in utterances. Although it is acknowledged that the fine tuning of central mechanisms may depend upon peripheral input, particularly during early maturation, hearing impairments acquired post-lingually are normally considered to spare these high level aspects of speech understanding.

    III ARTICULATORY AND ACOUSTIC BASES FOR PHONETIC CONTRASTS

    In producing speech the vocal tract and major articulators can be thought of as implementing a series of abstract logical states selected from a limited number of feature combinations. In fact at the level of physical measurements we have to realise that these configurations are only descriptions of targets that may not, and need not, be fully achieved. In order to characterise the acoustic concomitants of particular vocal tract configurations and to appreciate the acoustical consequences of articulatory manoeuvres, it is necessary to understand some of the acoustic theory underlying speech production.

    A Acoustic theory of speech production

    Broadly speaking the talker has control over two inter-related aspects of the state of the vocal tract - its gross shape (which will include the disposition of major articulators within the tract), and the type and amount of sound energy that is created and fed into the tract. The vocal tract acts as a filter to modify the character of an intrinsic sound source. In technical terms the sound radiated from the nose and lips has a spectrum determined by the product of the source spectrum and the spectral transfer function (i.e. the frequency response of the vocal tract); equivalently, in the time domain, the source waveform is convolved with the impulse response of the tract. The filtering properties of the vocal tract derive from the natural resonances of its cavities - like any other enclosed air spaces, cavities within the vocal tract have characteristic frequencies which are roughly (and inversely) dependent upon the cavity size. We are all familiar with this generalisation - large organ pipes produce low notes, and large animals tend to have low-pitched calls. Usually when we produce a vowel, the sound source (or excitation) is provided by periodic modulation of air flow between the vocal cords in the larynx, achieved by rapid opening and closing of the vocal folds. The periodicity of this modulation determines the spacing of harmonics in the source spectrum and the pitch of the vowel. The sound energy generated by vocal cord vibration has the spectrum shown in Fig. 1(a). The filtering action of the vocal tract has the effect of enhancing energy in the source at some frequencies as a result of vocal tract resonances, and of attenuating energy at other frequencies. Schematic vocal tract outlines and their corresponding filter transfer functions are shown for the vowels /i/, /a/ and /u/ in Fig. 1(b) and (c), and the spectra of the radiated speech waves are shown in Fig. 1(d). Different vowels are distinguished acoustically by the overall shape of the spectrum envelope, and particularly by the frequency of spectral peaks. The spectral peaks arise from vocal tract resonances and are referred to as formants, identified by number (F1, F2, F3, etc.), with the first formant having the lowest frequency.

    FIG. 1 (a) Energy spectrum of vocal fold vibration; (b) schematic vocal tract outlines for the vowels /i/, /a/ and /u/; (c) transfer functions corresponding to the vocal tract configurations in (b); (d) energy spectra of waves radiated at the lips for these vowels.

    A formant is not associated specifically with the resonance of a particular vocal tract cavity; there is no simple unique relationship between the size of a specific cavity and the frequency of a particular peak in the output spectrum. As Fig. 1 illustrates, vowels are distinguished in articulatory terms primarily by changes in the position of the lips and in the position and cross-sectional area of the maximum constriction in the vocal tract. A corollary of the general relationship which exists between overall vocal tract size and formant frequencies is that differences in formant frequencies for a given vowel are to be expected when the vowel is spoken by talkers of different physical size. Thus men, women and children will tend to have average formant frequencies ranked in ascending order of frequency, although the relationship is not equivalent to simple proportional scaling. Evidently, formant frequency changes resulting from changes in shape of a particular vocal tract will be relative to the output of that tract when in a neutral configuration. The periodic vocal tract excitation produced by vocal cord vibration characteristic of voiced speech is not the only source of sound energy used in speech. Excitation can also be provided by an aperiodic noise source resulting from turbulent air flow through a narrow aperture formed by constriction of some part of the vocal tract. For example, whispered speech and aspirated sounds (such as /h/) are excited by random noise. This is created by forcing air past the part-closed vocal cords at the rear of the vocal tract. The higher frequency noise excitation in the initial sounds of saw, four and shore is the result of forcing an airstream through a relatively narrow constriction at more forward positions in the
